Hierarchical Clustering Heatmap in Python
Technique overview
Cluster rows and columns to reveal structure in gene expression, similarity matrices, and other multivariate datasets.
Hierarchical clustering heatmaps reorder rows and columns so structure in a matrix becomes visible. They are common in gene expression, proteomics, metabolomics, similarity matrices, and high-dimensional assay screens. A useful clustered heatmap depends on choices made before plotting: row scaling, distance metric, linkage method, and annotations can all change the apparent clusters. The figure or its caption should state those choices explicitly.
Mathematical Foundation
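Pairwise distances between rows (and, separately, between columns) are merged step by step into a linkage tree. The leaf order of each tree sets the order in which rows and columns are drawn, so similar profiles end up adjacent.
Equation
distance(row_i, row_j) -> linkage tree -> leaf order -> reordered heatmap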
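The pipeline above can be sketched with SciPy on a small random matrix. This is a minimal illustration, not the only way to do it: the helper name `cluster_order`, the random data, and the choice of Euclidean distance with average linkage are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

# Hypothetical toy data: 10 features (rows) x 5 samples (columns).
rng = np.random.default_rng(0)
matrix = rng.normal(size=(10, 5))


def cluster_order(data, metric="euclidean", method="average"):
    """Return the dendrogram leaf order for the rows of `data`."""
    distances = pdist(data, metric=metric)   # condensed pairwise distances
    tree = linkage(distances, method=method)  # agglomerative linkage tree
    return leaves_list(tree)                  # left-to-right leaf order


row_order = cluster_order(matrix)      # order for rows
col_order = cluster_order(matrix.T)    # order for columns (rows of the transpose)

# Reordered matrix: the values a clustered heatmap would draw.
reordered = matrix[np.ix_(row_order, col_order)]
print(row_order, col_order)
```

Rows and columns are clustered independently, so each axis can use its own metric and linkage method if the data call for it.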
When to use this technique
Use clustered heatmaps for exploratory pattern discovery in matrices where both samples and features may have meaningful groupings.
Implementation Code
The core data processing logic. Copy this block and replace the sample data with your measurements.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

# Sample data: 24 genes x 6 samples, built from three blocks of 8 genes.
np.random.seed(4)
matrix = np.vstack([
    np.random.normal(1.2, 0.35, (8, 6)),
    np.random.normal(-0.8, 0.35, (8, 6)),
    np.random.normal(0.2, 0.35, (8, 6)),
])
# Shift the last three samples by a block-specific offset.
matrix[:, 3:] += np.repeat([0.7, -0.4, 0.2], 8)[:, None]
df = pd.DataFrame(
    matrix,
    index=[f"gene_{i+1}" for i in range(24)],
    columns=[f"sample_{i+1}" for i in range(6)],
)

# Row z-score: center and scale each gene across samples.
row_z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

# Average linkage on the correlation distance between rows.
row_linkage = linkage(pdist(row_z.values, metric="correlation"), method="average")
# The leaf order of the tree is the row order of the clustered heatmap.
row_order = leaves_list(row_linkage)
print(row_z.index[row_order].tolist()[:5])

Visualization Code
Complete seaborn and matplotlib code for a publication-ready figure. Copy, paste into your notebook, and adjust labels to match your data.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Same sample data as the implementation block: 24 genes x 6 samples.
np.random.seed(4)
matrix = np.vstack([
    np.random.normal(1.2, 0.35, (8, 6)),
    np.random.normal(-0.8, 0.35, (8, 6)),
    np.random.normal(0.2, 0.35, (8, 6)),
])
matrix[:, 3:] += np.repeat([0.7, -0.4, 0.2], 8)[:, None]
df = pd.DataFrame(
    matrix,
    index=[f"gene_{i+1}" for i in range(24)],
    columns=[f"sample_{i+1}" for i in range(6)],
)
row_z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)

# clustermap clusters rows and columns and draws both dendrograms.
cluster = sns.clustermap(
    row_z,
    method="average",
    metric="correlation",
    cmap="vlag",
    center=0,
    figsize=(7, 8),
    dendrogram_ratio=(0.18, 0.12),
    cbar_kws={"label": "Row z-score"},
)
# .figure is the current accessor; older seaborn releases expose the same object as .fig.
cluster.figure.suptitle("Hierarchical Clustering Heatmap", y=1.02)
cluster.savefig("hierarchical_clustering_heatmap.png", dpi=300, bbox_inches="tight")
plt.show()

Add Sample Group Color Annotations
Column color bars make experimental groups visible without crowding the heatmap labels.
# Assumes pd, sns, df, and row_z from the visualization code above.
sample_group = pd.Series(["control", "control", "control", "treated", "treated", "treated"], index=df.columns)
palette = {"control": "#888888", "treated": "#9240ff"}
col_colors = sample_group.map(palette)
sns.clustermap(row_z, method="average", metric="correlation",
               col_colors=col_colors, cmap="vlag", center=0)

Common Errors and How to Fix Them
Clusters are driven by absolute abundance only
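Why: With magnitude-sensitive metrics such as Euclidean distance, unscaled rows with large values dominate the distance calculation.
Fix: Use row z-score normalization, or a scale-invariant metric such as correlation distance, when the goal is pattern clustering across samples.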
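A small constructed example may make the effect concrete. The four rows below are hypothetical: two share an "up" pattern and two share a "down" pattern, each at a low and a high abundance level. Euclidean distance with average linkage is an assumption chosen because it is sensitive to magnitude.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical profiles: same two patterns at two very different abundance levels.
up = np.array([0, 0, 0, 1, 1, 1], dtype=float)
down = 1.0 - up
df = pd.DataFrame(
    [1 + up, 1 + down, 100 + 10 * up, 100 + 10 * down],
    index=["low_up", "low_down", "high_up", "high_down"],
    columns=[f"sample_{i+1}" for i in range(6)],
)

# Row z-score: center and scale each row across samples.
row_z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)


def two_clusters(frame):
    """Cut an average-linkage Euclidean tree into two flat clusters."""
    tree = linkage(frame.values, method="average", metric="euclidean")
    return pd.Series(fcluster(tree, t=2, criterion="maxclust"), index=frame.index)


raw_labels = two_clusters(df)     # groups rows by abundance level
z_labels = two_clusters(row_z)    # groups rows by pattern across samples
print(pd.DataFrame({"raw": raw_labels, "row_z": z_labels}))
```

On the raw values the low-abundance rows cluster together regardless of pattern; after row scaling the "up" rows and the "down" rows pair off instead.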
Too many labels overlap
Why: A heatmap with hundreds of rows cannot show every label legibly.
Fix: Hide row labels, label selected features, or split the matrix into cluster-specific panels.
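One way to label only selected features is to hide the default row labels and then place ticks at the clustered positions of the rows of interest. The sketch below assumes a hypothetical 200-row matrix and a hypothetical `highlight` set; it relies on seaborn drawing heatmap row i centered at y = i + 0.5.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 200 features x 6 samples, far too many rows to label.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(200, 6)),
    index=[f"gene_{i+1}" for i in range(200)],
    columns=[f"sample_{i+1}" for i in range(6)],
)
highlight = {"gene_5", "gene_42", "gene_150"}  # hypothetical features of interest

# Draw the clustered heatmap without any row labels.
grid = sns.clustermap(df, method="average", metric="euclidean",
                      cmap="vlag", center=0, yticklabels=False)

# Row names in the order they are drawn after clustering.
plotted = df.index[grid.dendrogram_row.reordered_ind]

# Tick positions and labels for the highlighted rows only.
positions = [i + 0.5 for i, name in enumerate(plotted) if name in highlight]
labels = [name for name in plotted if name in highlight]

ax = grid.ax_heatmap
ax.set_yticks(positions)
ax.set_yticklabels(labels, rotation=0)
grid.savefig("clustermap_selected_labels.png", dpi=150)
```

The same `reordered_ind` attribute can be used to split the matrix into cluster-specific panels once cluster boundaries are chosen.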
Changing linkage changes the conclusion
Why: Cluster structure is exploratory and sensitive to the choice of distance metric and linkage method.
Fix: Report the metric and linkage method, and test whether major clusters are robust.
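A simple robustness check is to cut the tree into the same number of flat clusters under several linkage methods and compare the resulting partitions. The sketch below uses the plain Rand index (the fraction of row pairs on which two partitions agree); the helper `rand_index`, the synthetic data, the Euclidean metric, and the choice of three clusters are all assumptions for illustration. Ward linkage in SciPy assumes Euclidean distances, which is why that metric is used here.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def rand_index(labels_a, labels_b):
    """Fraction of item pairs that both partitions treat the same way."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    same_a = a[:, None] == a[None, :]   # pairs co-clustered in partition a
    same_b = b[:, None] == b[None, :]   # pairs co-clustered in partition b
    upper = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float(np.mean(same_a[upper] == same_b[upper]))


# Hypothetical data: three well-separated groups of 8 rows across 6 samples.
rng = np.random.default_rng(4)
data = np.vstack([rng.normal(center, 0.3, (8, 6)) for center in (3.0, -3.0, 0.0)])

methods = ["average", "complete", "single", "ward"]
labels = {
    m: fcluster(linkage(data, method=m, metric="euclidean"), t=3, criterion="maxclust")
    for m in methods
}

# Pairwise agreement between linkage methods; values near 1 suggest stable clusters.
agreement = {
    (m1, m2): rand_index(labels[m1], labels[m2])
    for m1, m2 in combinations(methods, 2)
}
for pair, score in agreement.items():
    print(pair, round(score, 3))
```

Low agreement between methods is a signal to report the clusters as tentative rather than to pick the linkage that gives the preferred picture.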
Quick Info
- Domain: Multivariate
- Typical Audience: Bioinformaticians, systems biologists, and data scientists exploring similarity patterns in high-dimensional measurements
