Hierarchical Clustering Heatmap in Python

Technique overview

Cluster rows and columns to reveal structure in gene expression, similarity matrices, and other multivariate datasets.

Hierarchical clustering heatmaps reorder rows and columns so structure in a matrix becomes visible. They are common in gene expression, proteomics, metabolomics, similarity matrices, and high-dimensional assay screens. A useful clustered heatmap depends on analysis choices: row scaling, the distance metric, and the linkage method can all change which clusters appear, and annotations shape how readers interpret them. The figure or its caption should state those choices.

Libraries: scipy, numpy, matplotlib, seaborn

Mathematical Foundation

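The method computes pairwise distances between rows (and, separately, between columns), merges the closest rows or clusters step by step into a linkage tree, and reorders the matrix by the tree's leaf order so that similar rows sit next to each other.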

Workflow

scale rows -> distance(row_i, row_j) -> linkage tree -> leaf order -> reordered heatmap (repeated for columns)
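
Written out for the choices used in the code on this page (row z-score, Pearson correlation distance, average linkage), the steps look roughly as follows. Here x_ik is the value of row i in column k and p is the number of columns; other metrics and linkages replace the second and third formulas with their own. Because Pearson correlation does not change when a row is shifted or positively rescaled, d(i, j) is the same whether it is computed on x or on z.

```latex
z_{ik} = \frac{x_{ik} - \bar{x}_i}{s_i}, \qquad
\bar{x}_i = \frac{1}{p}\sum_{k=1}^{p} x_{ik}, \qquad
s_i = \sqrt{\frac{1}{p-1}\sum_{k=1}^{p}\left(x_{ik} - \bar{x}_i\right)^2}

d(i, j) = 1 - r_{ij}, \qquad
r_{ij} = \frac{\sum_{k=1}^{p}\left(z_{ik} - \bar{z}_i\right)\left(z_{jk} - \bar{z}_j\right)}
              {\sqrt{\sum_{k=1}^{p}\left(z_{ik} - \bar{z}_i\right)^2}\,
               \sqrt{\sum_{k=1}^{p}\left(z_{jk} - \bar{z}_j\right)^2}}

d(A, B) = \frac{1}{|A|\,|B|}\sum_{i \in A}\sum_{j \in B} d(i, j) \quad \text{(average linkage between clusters } A \text{ and } B\text{)}
```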

Parameter breakdown

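distance metric: Euclidean, correlation, cosine, or another dissimilarity measure

linkage: rule for merging clusters, such as average, complete, or ward (ward is only well defined for Euclidean distances)

z-score scaling: row-wise normalization (subtract the row mean, divide by the row standard deviation) to emphasize patterns over absolute magnitude

dendrogram: tree showing the order and height at which rows or columns merge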
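
To see how the distance metric changes what counts as "similar", the sketch below compares pairwise distances for three hypothetical rows using scipy's pdist. It also checks that scipy's correlation distance equals 1 minus the Pearson correlation, which is the quantity behind the metric="correlation" argument used on this page.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical rows: row 1 is row 0 scaled by 10, row 2 is row 0 reversed
rows = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [4.0, 3.0, 2.0, 1.0],
])

for metric in ["euclidean", "correlation", "cosine"]:
    # squareform expands the condensed distance vector into a full matrix
    dist = squareform(pdist(rows, metric=metric))
    print(metric, "distances from row 0:", np.round(dist[0], 3))

# scipy's correlation distance is 1 minus the Pearson correlation coefficient
corr_dist = squareform(pdist(rows, metric="correlation"))
assert np.allclose(corr_dist, 1 - np.corrcoef(rows))
```

Euclidean distance places row 0 far from its 10x copy, while correlation and cosine distance treat the two as identical; correlation distance also rates the reversed row as maximally distant (a value of 2).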

When to use this technique

Use clustered heatmaps for exploratory pattern discovery in matrices where both samples and features may have meaningful groupings.

Apply This Technique Now

Run this analysis workflow with AI in seconds. Use the prepared technique prompt or bring your own dataset.

Example AI Prompt

"Create a hierarchical clustering heatmap from my matrix data, apply row z-score normalization, show both dendrograms, and label the most important clusters"

How to apply this technique in 30 seconds

Upload Data

Upload your CSV or Excel file in Analyze and keep your column names as-is.

Generate

Run the example prompt and let AI generate this technique automatically.

Refine and Export

Adjust code or prompt, then export publication-ready figures.

Implementation Code

The core data processing logic. Copy this block and replace the sample data with your measurements.

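import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

# Simulated example data: three groups of 8 genes measured in 6 samples
np.random.seed(4)
matrix = np.vstack([
    np.random.normal(1.2, 0.35, (8, 6)),
    np.random.normal(-0.8, 0.35, (8, 6)),
    np.random.normal(0.2, 0.35, (8, 6)),
])
# Shift samples 4-6 by a group-specific amount so each gene group has its own pattern
matrix[:, 3:] += np.repeat([0.7, -0.4, 0.2], 8)[:, None]
df = pd.DataFrame(matrix, index=[f"gene_{i+1}" for i in range(24)],
                  columns=[f"sample_{i+1}" for i in range(6)])

# Row z-score: subtract each gene's mean and divide by its standard deviation
row_z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)
# Correlation distance between genes, merged with average linkage
row_linkage = linkage(pdist(row_z.values, metric="correlation"), method="average")
# The dendrogram's leaf order is the row order for the heatmap
row_order = leaves_list(row_linkage)
print(row_z.index[row_order].tolist()[:5])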
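
The block above orders only the rows. A minimal extension, assuming you also want a column order and flat cluster labels (for example, to label the most important clusters), clusters the transposed matrix the same way and cuts the row tree with scipy's fcluster. The cut at three clusters is an illustrative choice matching the three simulated gene groups, not a recommendation.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list, fcluster
from scipy.spatial.distance import pdist

# Same simulated matrix and row z-scores as above
np.random.seed(4)
matrix = np.vstack([
    np.random.normal(1.2, 0.35, (8, 6)),
    np.random.normal(-0.8, 0.35, (8, 6)),
    np.random.normal(0.2, 0.35, (8, 6)),
])
matrix[:, 3:] += np.repeat([0.7, -0.4, 0.2], 8)[:, None]
df = pd.DataFrame(matrix, index=[f"gene_{i+1}" for i in range(24)],
                  columns=[f"sample_{i+1}" for i in range(6)])
row_z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

# Row tree (genes) and column tree (samples), both with correlation distance
row_linkage = linkage(pdist(row_z.values, metric="correlation"), method="average")
col_linkage = linkage(pdist(row_z.values.T, metric="correlation"), method="average")

# Reorder the matrix by the leaf order of both dendrograms
ordered = row_z.iloc[leaves_list(row_linkage), leaves_list(col_linkage)]

# Cut the row tree into at most 3 flat clusters (illustrative cutoff)
row_clusters = pd.Series(fcluster(row_linkage, t=3, criterion="maxclust"),
                         index=row_z.index, name="cluster")
print(ordered.index.tolist()[:5])
print(row_clusters.value_counts().sort_index())
```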

Visualization Code

Complete seaborn code (built on matplotlib) for a publication-ready figure. Copy, paste into your notebook, and adjust labels to match your data.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Same simulated data and row z-scores as in the implementation code
np.random.seed(4)
matrix = np.vstack([
    np.random.normal(1.2, 0.35, (8, 6)),
    np.random.normal(-0.8, 0.35, (8, 6)),
    np.random.normal(0.2, 0.35, (8, 6)),
])
matrix[:, 3:] += np.repeat([0.7, -0.4, 0.2], 8)[:, None]
df = pd.DataFrame(matrix, index=[f"gene_{i+1}" for i in range(24)],
                  columns=[f"sample_{i+1}" for i in range(6)])
row_z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

# clustermap clusters rows and columns with the same metric and linkage;
# the data are already row z-scored, so no z_score argument is passed
cluster = sns.clustermap(
    row_z,
    method="average",
    metric="correlation",
    cmap="vlag",                     # diverging colormap
    center=0,                        # z = 0 maps to the colormap midpoint
    figsize=(7, 8),
    dendrogram_ratio=(0.18, 0.12),   # (row, column) dendrogram size as figure fractions
    cbar_kws={"label": "Row z-score"},
)
cluster.fig.suptitle("Hierarchical Clustering Heatmap", y=1.02)
cluster.savefig("hierarchical_clustering_heatmap.png", dpi=300, bbox_inches="tight")
plt.show()
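
For readers who want to avoid the seaborn dependency, roughly the same figure can be sketched with scipy's dendrogram function and plain matplotlib axes. The axes positions, the RdBu_r colormap, and the output file name below are illustrative assumptions, not values from the original workflow.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

# Same simulated matrix and row z-scores as above
np.random.seed(4)
matrix = np.vstack([
    np.random.normal(1.2, 0.35, (8, 6)),
    np.random.normal(-0.8, 0.35, (8, 6)),
    np.random.normal(0.2, 0.35, (8, 6)),
])
matrix[:, 3:] += np.repeat([0.7, -0.4, 0.2], 8)[:, None]
df = pd.DataFrame(matrix, index=[f"gene_{i+1}" for i in range(24)],
                  columns=[f"sample_{i+1}" for i in range(6)])
row_z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

# Separate trees for rows (genes) and columns (samples)
row_link = linkage(pdist(row_z.values, metric="correlation"), method="average")
col_link = linkage(pdist(row_z.values.T, metric="correlation"), method="average")

# Manual layout: [left, bottom, width, height] in figure fractions
fig = plt.figure(figsize=(7, 8))
ax_row = fig.add_axes([0.05, 0.10, 0.15, 0.70])
ax_col = fig.add_axes([0.22, 0.81, 0.55, 0.12])
ax_heat = fig.add_axes([0.22, 0.10, 0.55, 0.70])
ax_cbar = fig.add_axes([0.90, 0.10, 0.02, 0.70])


def black_links(_link_id):
    """Color every dendrogram link black."""
    return "black"


# no_labels suppresses scipy's own leaf labels; the heatmap carries them instead
row_dn = dendrogram(row_link, orientation="left", ax=ax_row,
                    no_labels=True, link_color_func=black_links)
col_dn = dendrogram(col_link, ax=ax_col, no_labels=True,
                    link_color_func=black_links)
ax_row.axis("off")
ax_col.axis("off")

# Reorder by leaf order; origin="lower" puts the first leaf at the bottom,
# matching the left dendrogram, which lays out its leaves bottom to top
ordered = row_z.iloc[row_dn["leaves"], col_dn["leaves"]]
limit = np.abs(ordered.values).max()
im = ax_heat.imshow(ordered.values, aspect="auto", origin="lower",
                    cmap="RdBu_r", vmin=-limit, vmax=limit)

ax_heat.set_xticks(range(ordered.shape[1]))
ax_heat.set_xticklabels(ordered.columns, rotation=90)
ax_heat.set_yticks(range(ordered.shape[0]))
ax_heat.set_yticklabels(ordered.index, fontsize=7)
ax_heat.yaxis.tick_right()
fig.colorbar(im, cax=ax_cbar, label="Row z-score")
fig.suptitle("Hierarchical Clustering Heatmap (matplotlib + scipy)")
fig.savefig("hierarchical_clustering_heatmap_mpl.png", dpi=300, bbox_inches="tight")
```

The symmetric vmin/vmax plays the role of center=0 in the seaborn version, so zero always maps to the middle of the diverging colormap.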

Add Sample Group Color Annotations

Column color bars make experimental groups visible without crowding the heatmap labels.

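# Assumes df, row_z, sns, and plt from the visualization code above
sample_group = pd.Series(["control", "control", "control", "treated", "treated", "treated"], index=df.columns)
palette = {"control": "#888888", "treated": "#9240ff"}
# Map each sample to its group color; clustermap aligns the Series to the columns by index
col_colors = sample_group.map(palette)
sns.clustermap(row_z, method="average", metric="correlation", col_colors=col_colors, cmap="vlag", center=0)
plt.show()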
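
The color bar alone does not say which color means which group. One way to add a legend, sketched here under the assumption that a legend beside the column dendrogram suits your layout, is to build matplotlib Patch handles from the same palette. The title and placement arguments are illustrative.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.patches import Patch

# Same simulated matrix and row z-scores as above
np.random.seed(4)
matrix = np.vstack([
    np.random.normal(1.2, 0.35, (8, 6)),
    np.random.normal(-0.8, 0.35, (8, 6)),
    np.random.normal(0.2, 0.35, (8, 6)),
])
matrix[:, 3:] += np.repeat([0.7, -0.4, 0.2], 8)[:, None]
df = pd.DataFrame(matrix, index=[f"gene_{i+1}" for i in range(24)],
                  columns=[f"sample_{i+1}" for i in range(6)])
row_z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

sample_group = pd.Series(["control"] * 3 + ["treated"] * 3, index=df.columns)
palette = {"control": "#888888", "treated": "#9240ff"}
col_colors = sample_group.map(palette)

grid = sns.clustermap(row_z, method="average", metric="correlation",
                      col_colors=col_colors, cmap="vlag", center=0)

# Legend handles built from the same palette used for the color bar
handles = [Patch(facecolor=color, label=group) for group, color in palette.items()]
grid.ax_col_dendrogram.legend(handles=handles, title="Sample group",
                              loc="upper left", bbox_to_anchor=(1.02, 1.0),
                              frameon=False)
```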

Common Errors and How to Fix Them

Clusters are driven by absolute abundance only

Why: With magnitude-sensitive metrics such as Euclidean distance, rows with large absolute values dominate the distance calculation.

Fix: Use row z-score normalization when the goal is pattern clustering across samples.
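
A small check of why scaling matters, using three hypothetical rows and Euclidean distance: gene_b repeats gene_a's pattern at 100 times the magnitude, and gene_c reverses it.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical rows chosen to separate magnitude from pattern
raw = pd.DataFrame(
    [[1.0, 2.0, 3.0], [100.0, 200.0, 300.0], [3.0, 2.0, 1.0]],
    index=["gene_a", "gene_b", "gene_c"],
    columns=["s1", "s2", "s3"],
)
# Row z-score: center each row on its mean and divide by its standard deviation
row_z = raw.sub(raw.mean(axis=1), axis=0).div(raw.std(axis=1), axis=0)

# Euclidean distance matrices before and after scaling
raw_dist = pd.DataFrame(squareform(pdist(raw.values)), index=raw.index, columns=raw.index)
z_dist = pd.DataFrame(squareform(pdist(row_z.values)), index=raw.index, columns=raw.index)
print(raw_dist.round(2))
print(z_dist.round(2))
```

On raw values gene_a is about 2.83 from its mirror image gene_c but about 370 from gene_b; after row z-scoring, gene_a and gene_b coincide (distance 0) and gene_c stays about 2.83 away, so clustering follows the pattern rather than the magnitude.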

Too many labels overlap

Why: A heatmap with hundreds of rows cannot show every label legibly.

Fix: Hide row labels, label selected features, or split the matrix into cluster-specific panels.
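
A sketch of the hide-and-select approach, using a hypothetical 300-row matrix of random values (so the clusters themselves mean nothing; only the labeling mechanics matter) and placeholder feature names: turn off all row labels with yticklabels=False, then place ticks only for chosen features using the clustered row order stored in dendrogram_row.reordered_ind.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical large matrix: 300 features x 6 samples of random values
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 6)),
                  index=[f"gene_{i+1}" for i in range(300)],
                  columns=[f"sample_{i+1}" for i in range(6)])
row_z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

grid = sns.clustermap(row_z, method="average", metric="correlation",
                      cmap="vlag", center=0, yticklabels=False)

# Placeholder names; replace with your own features of interest
highlight = ["gene_7", "gene_42", "gene_250"]

# Position of each feature in the clustered row order;
# heatmap cells are centered at integer + 0.5
ordered_names = row_z.index[grid.dendrogram_row.reordered_ind]
positions = [ordered_names.get_loc(name) + 0.5 for name in highlight]
grid.ax_heatmap.set_yticks(positions)
grid.ax_heatmap.set_yticklabels(highlight, fontsize=8)
```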

Changing linkage changes the conclusion

Why: Cluster structure is exploratory and sensitive to the scaling, distance metric, and linkage method.

Fix: Report the metric and linkage method, and check whether the major clusters persist under reasonable alternatives, for example other linkage methods or resampled data.
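
One way to run such a check, sketched with the simulated matrix from this page: rebuild the row tree with several linkage methods, report each tree's cophenetic correlation (how well the tree's merge heights preserve the original pairwise distances) with scipy's cophenet, and cross-tabulate the flat clusters from two methods. Ward is left out because it assumes Euclidean distances, and the three-cluster cut is illustrative. Bootstrap resampling of samples is a more thorough alternative not shown here.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster, cophenet
from scipy.spatial.distance import pdist

# Same simulated matrix and row z-scores as above
np.random.seed(4)
matrix = np.vstack([
    np.random.normal(1.2, 0.35, (8, 6)),
    np.random.normal(-0.8, 0.35, (8, 6)),
    np.random.normal(0.2, 0.35, (8, 6)),
])
matrix[:, 3:] += np.repeat([0.7, -0.4, 0.2], 8)[:, None]
df = pd.DataFrame(matrix, index=[f"gene_{i+1}" for i in range(24)],
                  columns=[f"sample_{i+1}" for i in range(6)])
row_z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

dists = pdist(row_z.values, metric="correlation")
assignments = {}
for method in ["average", "complete", "single"]:
    tree = linkage(dists, method=method)
    # Correlation between original distances and the tree's cophenetic distances
    coph_corr, _ = cophenet(tree, dists)
    assignments[method] = fcluster(tree, t=3, criterion="maxclust")
    print(f"{method}: cophenetic correlation = {coph_corr:.3f}")

# Cross-tabulate flat clusters from two linkages
print(pd.crosstab(assignments["average"], assignments["complete"],
                  rownames=["average"], colnames=["complete"]))
```

Cluster numbers are arbitrary, so look for a table in which each row has one dominant column rather than for a diagonal; a near one-to-one table suggests the major clusters do not depend on the linkage choice.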

Apply Hierarchical Clustering Heatmap in Python to Your Data

Upload your dataset and Plotivy generates the Python code, runs the analysis, and produces a publication-ready figure.

Python Libraries