Sankey Diagram

Chart overview

Sankey diagrams visualize flows and transfers between nodes, where the width of each arrow (link) is proportional to the quantity flowing through it.

Key points

In Python they are most commonly built with Plotly's go.Sankey, which represents the diagram as a set of arrays: the node labels plus the source, target, and value of every link.
Originally developed by Irish engineer Matthew Sankey to show steam-engine energy efficiency, they are now used for energy and budget flows, customer journeys, material and supply-chain flows, website click paths, and any process where the magnitude of transfers matters.

Practical guidance

Because flow is conserved - what enters a node must leave it - Sankey diagrams make it easy to spot inefficiencies, dominant pathways, and where volume is lost. Multi-level Sankeys chain several stages together (for example sources to conversion to end use), while node colors and semi-transparent links keep dense diagrams readable.
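
One way to check conservation before plotting is to compare each intermediate node's inflow with its outflow. The sketch below assumes a link table with source, target, and value columns; the node names and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical two-stage flow table (values invented for illustration)
links = pd.DataFrame({
    "source": ["Gas", "Solar", "Grid", "Grid"],
    "target": ["Grid", "Grid", "Homes", "Losses"],
    "value": [60, 40, 75, 25],
})

inflow = links.groupby("target")["value"].sum()
outflow = links.groupby("source")["value"].sum()

# Align on node name; nodes that only send or only receive get 0
balance = pd.DataFrame({"inflow": inflow, "outflow": outflow}).fillna(0)

# Only intermediate nodes (with both inflow and outflow) have to balance
intermediate = balance[(balance["inflow"] > 0) & (balance["outflow"] > 0)]
unbalanced = intermediate[intermediate["inflow"] != intermediate["outflow"]]

print(intermediate)
print("Unbalanced nodes:", list(unbalanced.index))
```

With non-integer values, an exact comparison is too strict; a tolerance check such as numpy.isclose would be a reasonable substitute. Modelling losses as their own target node, as "Losses" is here, keeps the intermediate node balanced.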

Example AI Prompt

"Create an interactive Sankey diagram showing 'US Energy Flow' from sources to end uses. Generate realistic data in quadrillion BTU: Sources (left): Coal (11), Natural Gas (32), Petroleum (35), Nuclear (8), Renewables (12). Transform through: Electricity Generation (38), Direct Use (60). End Uses (right): Residential (21), Commercial (18), Industrial (33), Transportation (26). Show energy losses in electricity generation as a separate flow to 'Rejected Energy' (24). Color flows by source type (Coal: gray, Gas: blue, Oil: brown, Nuclear: purple, Renewables: green). Add hover labels with flow values and percentages. Include a title with total energy consumption."

How to create this chart in 30 seconds

Upload Data

Drag & drop your Excel or CSV file. Plotivy securely processes it in your browser.

AI Generation

Our AI analyzes your data and generates the Sankey Diagram code automatically.

Customize & Export

Tweak the design with natural language, then export as high-res PNG, SVG or PDF.

Python Code Example

example.py

# === IMPORTS ===
import pandas as pd
import plotly.graph_objects as go

# === USER-EDITABLE PARAMETERS ===
# Change: Title of the Sankey diagram
title = "Energy Flow from Sources to Consumers (TWh/year)"

# Change: Color palette for the Sankey diagram
source_colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b']  # Sources
intermediate_colors = ['#e377c2', '#7f7f7f']  # Intermediate stages
consumer_colors = ['#17becf', '#bcbd22', '#ff9896', '#c5b0d5']  # Consumers

# Change: Node padding and thickness
node_padding = 10  # Change: Space between nodes (pixels)
node_thickness = 20  # Change: Thickness of nodes (pixels)

# Change: Font sizes
title_fontsize = 18
label_fontsize = 12

# === GENERATE EXAMPLE DATASET ===
# Create a realistic energy flow dataset
data = {
    'source': [
        'Solar', 'Solar', 'Wind', 'Wind', 'Coal', 'Coal', 
        'Natural Gas', 'Natural Gas', 'Nuclear', 'Nuclear', 'Hydro', 'Hydro'
    ],
    'target': [
        'Electric Grid', 'Direct Use', 'Electric Grid', 'Direct Use', 
        'Electric Grid', 'Industrial', 'Electric Grid', 'Direct Use',
        'Electric Grid', 'Industrial', 'Electric Grid', 'Direct Use'
    ],
    'value': [
        120, 15, 180, 25, 350, 80, 420, 150, 280, 40, 95, 35
    ]
}

# Add flows from intermediate stages to consumers
# (Coal and Nuclear already feed 'Industrial' directly, so no
# 'Industrial' -> 'Industrial' self-link is included)
intermediate_to_consumer = {
    'source': [
        'Electric Grid', 'Electric Grid', 'Electric Grid', 'Electric Grid',
        'Direct Use', 'Direct Use', 'Direct Use'
    ],
    'target': [
        'Residential', 'Commercial', 'Industrial', 'Transportation',
        'Residential', 'Commercial', 'Transportation'
    ],
    'value': [
        450, 380, 520, 180, 85, 120, 95
    ]
}

# Combine all data
all_data = {
    'source': data['source'] + intermediate_to_consumer['source'],
    'target': data['target'] + intermediate_to_consumer['target'],
    'value': data['value'] + intermediate_to_consumer['value']
}

df = pd.DataFrame(all_data)

# Node groups in left-to-right display order
source_nodes = ['Solar', 'Wind', 'Coal', 'Natural Gas', 'Nuclear', 'Hydro']
intermediate_nodes = ['Electric Grid', 'Direct Use']
consumer_nodes = ['Residential', 'Commercial', 'Industrial', 'Transportation']

# Build the node list explicitly instead of with set(): set ordering is
# arbitrary, so node indices and colors could change from run to run
all_nodes = source_nodes + intermediate_nodes + consumer_nodes

# Create node to index mapping
node_to_index = {node: i for i, node in enumerate(all_nodes)}

# Prepare source, target, and value arrays for Sankey
source_indices = [node_to_index[source] for source in df['source']]
target_indices = [node_to_index[target] for target in df['target']]
values = df['value'].tolist()

# Assign one color per node, matching each group to its palette by position
node_color_map = dict(zip(all_nodes, source_colors + intermediate_colors + consumer_colors))
node_colors = [node_color_map[node] for node in all_nodes]

# === CREATE SANKEY DIAGRAM ===
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=node_padding,
        thickness=node_thickness,
        line=dict(color='#000000', width=0.5),
        label=all_nodes,
        color=node_colors
    ),
    link=dict(
        source=source_indices,
        target=target_indices,
        value=values,
        hovertemplate='%{source.label} → %{target.label}<br>Energy: %{value} TWh/year<extra></extra>'
    )
)])

# Update layout
fig.update_layout(
    title=dict(
        text=title,
        x=0.5,
        y=0.98,
        font=dict(size=title_fontsize)
    ),
    font=dict(size=label_fontsize),
    margin=dict(t=130, b=80, l=80, r=80),
    hovermode='x'
)

# Print summary statistics
energy_sources = ['Solar', 'Wind', 'Coal', 'Natural Gas', 'Nuclear', 'Hydro']
consumer_sectors = ['Residential', 'Commercial', 'Industrial', 'Transportation']
# Sum only links leaving the energy sources; summing every link would
# count the same energy once per stage it passes through
source_totals = df[df['source'].isin(energy_sources)].groupby('source')['value'].sum().sort_values(ascending=False)
total_energy = source_totals.sum()
print(f"\nTotal Energy Supplied by Sources: {total_energy} TWh/year")
print(f"Number of Energy Sources: {len([n for n in all_nodes if n in energy_sources])}")
print(f"Number of Consumer Sectors: {len([n for n in all_nodes if n in consumer_sectors])}")
print("\nTop 3 Energy Sources:")
for source, value in source_totals.head(3).items():
    print(f"  {source}: {value} TWh/year")

# Display the plot
fig.show()

Console Output

Output (summary for the US Energy Flow example prompt, in quadrillion BTU)

Total Energy Input: 98 Quad BTU
Electricity Generation: 38 Quad BTU
Energy Loss: 24 Quad BTU (24.5%)
End Use: 123 Quad BTU

Common Use Cases

1. Energy production and consumption flows
2. Website user journey and conversion funnel analysis
3. Budget allocation and cash-flow tracking
4. Manufacturing and process visualization
5. Supply-chain and material flows
6. Carbon and emissions accounting

Pro Tips

Order nodes to minimize crossing flows

Use color to distinguish source types

Add hover labels with exact quantities

Keep node labels short so they don't overlap the links

Use semi-transparent RGBA link colors so overlapping flows stay legible

Set explicit node x/y positions in Plotly for clean multi-stage layouts

Frequently asked questions

When should you use a sankey diagram?

Sankey diagrams visualize flows and transfers between nodes, where the width of each arrow (link) is proportional to the quantity flowing through it. In Python they are most commonly built with Plotly's go.Sankey. Common applications include energy production and consumption flows, website user journey and conversion funnel analysis, and budget allocation and cash-flow tracking.

Which Python libraries can create a sankey diagram?

A Sankey diagram can be built in Python with plotly, which provides interactive hover, zoom, and web sharing. In Plotivy you describe the figure and it writes the plotly code for you.

Can I make a sankey diagram without writing Python code?

Yes. Describe the sankey diagram you need in plain language and upload your dataset — Plotivy's AI writes the Python code and renders a publication-ready figure. You still get the full, editable plotly source, so nothing is locked in a black box.

What are best practices for a clear sankey diagram?

Order nodes to minimize crossing flows. Use color to distinguish source types.

Library comparison for this chart

plotly

Best for interactive hover, zoom, and web sharing when collaborators need to inspect values directly from sankey-diagram figures.
