How to Plot a Histogram in Python with Matplotlib (Complete Guide)

Histograms are the fastest way to inspect how values in a dataset are distributed. Before choosing a statistical test, checking for outliers, or deciding whether to log-transform data, a histogram gives you the visual context you need.
Many tutorials stop at plt.hist(data). That is enough for a screenshot, but not enough for technical work where binning choices, scaling, weighting, and reproducibility affect your scientific conclusions. This guide goes from baseline usage to edge cases you will encounter in real data pipelines.
You will get multiple code variants, interpretation guidance, and practical troubleshooting patterns for common failure modes. If your goal is a publication figure or a defensible exploratory analysis, this is the level of histogram control you need.
- Bin count - Too few bins hide structure; too many add noise. Start with bins="auto" and adjust.
- density=True - Normalises the y-axis to probability density, making groups with different sample sizes comparable.
- alpha - Set alpha=0.5–0.6 when overlapping two or more distributions so each remains readable.
1. Minimal histogram
Pass your array to ax.hist(). The most important optional argument is bins - an integer sets the number of equal-width bins. For fast exploratory checks, this is fine, but for reporting or comparing cohorts, you should avoid hardcoding a bin count before inspecting spread, outliers, and sample size.
```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5, scale=1.5, size=300)

fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(data, bins=30, color="#6b21a8", edgecolor="white", linewidth=0.4)
ax.set_xlabel("Value")
ax.set_ylabel("Count")
ax.set_title("Basic Histogram")
plt.tight_layout()
```

2. Choosing bins with intent
Bin selection controls what structure you can see. Too few bins hide multimodality. Too many bins amplify noise and produce unstable tails. Instead of guessing, compare rule-based options on the same dataset and keep your choice explicit in methods sections.
- Sturges ("sturges") - conservative, good for small near-normal datasets.
- Freedman-Diaconis ("fd") - robust to outliers, often strong for skewed experimental data.
- Auto ("auto") - chooses between the Sturges and Freedman-Diaconis rules; good as a baseline, but still inspect visually.
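Because each rule resolves to concrete bin edges, one way to keep the choice explicit in a methods section is to compute those edges before plotting and report the resulting bin count and width. A minimal sketch, assuming NumPy's np.histogram_bin_edges (which accepts the same rule names that ax.hist passes through to NumPy); the describe_bins helper is hypothetical:

```python
import numpy as np

def describe_bins(data, rule):
    """Hypothetical helper: (number of bins, bin width) produced by a NumPy rule."""
    edges = np.histogram_bin_edges(data, bins=rule)
    return len(edges) - 1, float(edges[1] - edges[0])

rng = np.random.default_rng(42)
# Same bimodal mixture used in the bin-rule comparison figure
data = np.r_[rng.normal(0, 1, 700), rng.normal(4, 0.7, 300)]

summary = {rule: describe_bins(data, rule) for rule in ["sturges", "fd", "auto"]}
for rule, (n_bins, width) in summary.items():
    print(f"{rule:>8}: {n_bins} bins, width {width:.3f}")
```

The printed bin count and width are the numbers worth quoting in a caption, since "auto" alone does not tell a reader what was actually drawn.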
```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = np.r_[rng.normal(0, 1, 700), rng.normal(4, 0.7, 300)]

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5), sharey=True)
rules = ["sturges", "fd", "auto"]
for ax, rule in zip(axes, rules):
    ax.hist(data, bins=rule, color="#6b21a8", edgecolor="white", linewidth=0.4)
    ax.set_title(f"bins='{rule}'")
    ax.set_xlabel("Value")
axes[0].set_ylabel("Count")
fig.suptitle("How bin rules change your interpretation")
plt.tight_layout()
```

3. Add a KDE curve on top
Use density=True to normalize the histogram, then overlay a Gaussian KDE for a smooth estimate of the underlying distribution. In reports, describe KDE as a smoothed estimate rather than raw frequency.
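What density=True does can be checked numerically: each bar height is the bin count divided by (total count × bin width), so the bar areas sum to 1. That shared scale is what lets a KDE curve, or a second group of a different size, sit on the same axis. A minimal check with np.histogram, which applies the same normalisation that ax.hist documents for density=True:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5, scale=1.5, size=300)

counts, edges = np.histogram(data, bins=30)
density, _ = np.histogram(data, bins=30, density=True)
widths = np.diff(edges)

# Density = count / (total count * bin width), so the bar areas sum to 1
manual = counts / (counts.sum() * widths)
area = np.sum(density * widths)
print(f"Total area under density histogram: {area:.6f}")
```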
```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
data = rng.normal(loc=5, scale=1.5, size=300)

fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(data, bins=30, density=True, color="#6b21a8", edgecolor="white",
        linewidth=0.4, alpha=0.6, label="Histogram")
xs = np.linspace(data.min(), data.max(), 200)
kde = gaussian_kde(data)
ax.plot(xs, kde(xs), lw=2, color="#f59e0b", label="KDE")
ax.set_xlabel("Value")
ax.set_ylabel("Density")
ax.legend(frameon=False)
plt.tight_layout()
```

4. Overlapping histograms for group comparison
When comparing two groups, plot both histograms on the same axes with alpha set below 1. Use density=True if group sizes differ. Counts can be misleading when one cohort has more observations.
```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
group_a = rng.normal(4, 1.2, 200)
group_b = rng.normal(6, 1.0, 200)

# One shared set of edges so both groups are binned identically;
# bins=25 on each group separately would give two different edge sets
edges = np.histogram_bin_edges(np.concatenate([group_a, group_b]), bins=25)

fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(group_a, bins=edges, density=True, alpha=0.55, color="#6b21a8", label="Group A")
ax.hist(group_b, bins=edges, density=True, alpha=0.55, color="#f59e0b", label="Group B")
ax.set_xlabel("Value")
ax.set_ylabel("Density")
ax.legend(frameon=False)
plt.tight_layout()
```

5. Weighted histograms for non-uniform sampling
In many workflows, samples do not contribute equally. You may have confidence weights from measurement quality, inverse probability weights in surveys, or replicate-level weighting from merged instruments. Use the weights argument so your histogram reflects weighted frequency, not raw row count.
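To see what weights changes, compare a weighted bin height with a manual sum: each bar becomes the sum of the weights of the samples in that bin instead of the number of samples. A small check using np.histogram, which takes the same weights argument as ax.hist; the values and weights are invented for illustration:

```python
import numpy as np

values = np.array([0.2, 0.4, 1.1, 1.3, 1.7, 2.5])
weights = np.array([1.0, 0.5, 2.0, 1.0, 0.5, 1.5])  # hypothetical per-sample weights
edges = np.array([0.0, 1.0, 2.0, 3.0])

raw_counts, _ = np.histogram(values, bins=edges)
weighted, _ = np.histogram(values, bins=edges, weights=weights)

# Manual equivalent: add up the weights of the samples falling in each bin
# (no value sits exactly on the final edge, so digitize matches np.histogram here)
bin_index = np.digitize(values, edges) - 1
manual = np.bincount(bin_index, weights=weights, minlength=len(edges) - 1)
print(raw_counts, weighted)
```

Because the weighted bars sum to the total weight rather than to n, label the y-axis as weighted count, or combine weights with density=True when comparing groups.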
```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
signal = rng.normal(0, 1, 600)
# Simulate acquisition confidence for each sample
weights = np.clip(rng.normal(loc=1.0, scale=0.25, size=signal.size), 0.2, 1.8)

fig, axes = plt.subplots(1, 2, figsize=(11, 4), sharey=True)
axes[0].hist(signal, bins=30, color="#1d4ed8", edgecolor="white", linewidth=0.4)
axes[0].set_title("Unweighted")
axes[0].set_xlabel("Signal")
axes[1].hist(signal, bins=30, weights=weights, color="#b45309", edgecolor="white", linewidth=0.4)
axes[1].set_title("Weighted")
axes[1].set_xlabel("Signal")
axes[0].set_ylabel("Count / weighted count")
plt.tight_layout()
```

6. Skewed data: log bins and log axes
Particle size, latency, concentration, and many biological intensity datasets are right-skewed. With linear bins, most observations crowd into a few bars at the low end while many bins are spent on the sparse right tail. Logarithmic bins plus a log x-axis often produce a more interpretable shape. Keep in mind that bin widths are no longer equal in linear space, so state this explicitly in figure captions.
```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.lognormal(mean=1.6, sigma=0.9, size=1500)

# Build logarithmic bin edges
log_bins = np.logspace(np.log10(data.min()), np.log10(data.max()), 40)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].hist(data, bins=40, color="#6b21a8", edgecolor="white", linewidth=0.4)
axes[0].set_title("Linear bins")
axes[0].set_xlabel("Value")
axes[0].set_ylabel("Count")
axes[1].hist(data, bins=log_bins, color="#6b21a8", edgecolor="white", linewidth=0.4)
axes[1].set_xscale("log")
axes[1].set_title("Log bins + log x-axis")
axes[1].set_xlabel("Value (log scale)")
plt.tight_layout()
```
7. Matplotlib vs Seaborn vs Plotly for histograms
The best library depends on your output target. Matplotlib offers maximum control for journal exports. Seaborn accelerates statistical defaults. Plotly gives interactivity for dashboards and exploratory notebooks.
| Library | Strength | Trade-off |
|---|---|---|
| matplotlib | Precise publication styling and export control | More manual code for polished defaults |
| seaborn | Fast statistical visuals and clean defaults | Less granular control in edge styling cases |
| plotly | Interactive hover, zoom, and web-native output | Static publication output needs extra tuning |
Quick reference: key parameters
| Parameter | What it does |
|---|---|
| bins | Integer count, sequence of edges, or a rule name such as "auto" or "fd" |
| density | Normalise to probability density (default False) |
| alpha | Transparency 0–1; use 0.5–0.6 for overlapping groups |
| edgecolor | Border colour of bars; "white" separates dense bins |
| log | Log-scale y-axis; useful for heavy-tailed or power-law distributions |
| range | (min, max) window for binning; values outside it are ignored, and it has no effect when bins is a sequence |
| weights | Apply per-sample contribution when observations are non-uniform |
| histtype | Bar style: bar, barstacked, step, or stepfilled |
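The range entry is easy to misread as clipping. Values outside the window are not pulled onto the nearest edge; they are left out of the histogram. A quick check with np.histogram, which performs the binning behind ax.hist; the four values are arbitrary:

```python
import numpy as np

values = np.array([0.0, 1.0, 5.0, 10.0])

# Two bins over the window (0, 6): edges are [0, 3, 6]
counts, edges = np.histogram(values, bins=2, range=(0, 6))
print(edges, counts)  # 10.0 lies outside the window and is not counted

kept = int(counts.sum())
dropped = values.size - kept
print(f"kept {kept} of {values.size} values; {dropped} outside the range")
```

If dropped values matter to the story, report how many fell outside the window in the caption.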
8. Common mistakes and robust fixes
Histogram bugs are often data quality bugs in disguise. Before styling, validate numeric types, remove non-finite values, and lock your bin strategy. If your figure changes drastically across reruns, inspect random seeds and preprocessing order.
- Too few bins - large bins hide bimodality or skewness. Use bins="auto" as a starting point.
- Comparing raw counts - if sample sizes differ, normalise with density=True before overlapping.
- No axis label - always label both axes, including the units on the x-axis.
- Missing caption - state the number of observations (n) in the figure caption or on the plot.
- Ignoring NaN or inf values - clean arrays before plotting or you will get unstable bins and hard-to-debug behavior.
- Changing bins between cohorts - when comparing groups, use a shared bin edge definition.
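For the last point, a simple pattern is to derive one set of edges from the pooled data and pass that same array to every hist call. A minimal sketch; the shared_edges helper is hypothetical, and pooling with np.concatenate assumes the cohorts are measured on the same scale:

```python
import numpy as np

def shared_edges(*groups, bins="auto"):
    """Hypothetical helper: one set of bin edges computed from all groups pooled."""
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    pooled = pooled[np.isfinite(pooled)]  # non-finite values would break the range
    return np.histogram_bin_edges(pooled, bins=bins)

rng = np.random.default_rng(42)
control = rng.normal(4, 1.2, 200)
treatment = rng.normal(6, 1.0, 350)

edges = shared_edges(control, treatment, bins="fd")
control_density, _ = np.histogram(control, bins=edges, density=True)
treatment_density, _ = np.histogram(treatment, bins=edges, density=True)

# Both arrays now index the same bins, so they can be compared bin by bin
print(f"{len(edges) - 1} shared bins from {edges[0]:.2f} to {edges[-1]:.2f}")
```

For plotting, pass the same array to each call, for example ax.hist(control, bins=edges, density=True).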
```python
import matplotlib.pyplot as plt
import numpy as np

def clean_numeric(series):
    arr = np.asarray(series, dtype=float)
    arr = arr[np.isfinite(arr)]
    if arr.size == 0:
        raise ValueError("No finite numeric values available for histogram")
    return arr

raw = [1.1, 2.0, np.nan, np.inf, 3.4, 4.1, -np.inf, 2.8]
data = clean_numeric(raw)

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(data, bins="auto", color="#6b21a8", edgecolor="white", linewidth=0.4)
ax.set_xlabel("Value")
ax.set_ylabel("Count")
ax.set_title("Histogram after numeric cleaning")
plt.tight_layout()
```

9. Publication-ready histogram checklist
- Report sample size (n) in caption or panel text.
- Declare bin rule or fixed bin width explicitly.
- Use consistent bins across compared cohorts.
- Label x-axis units and whether y-axis is count or density.
- Document preprocessing: outlier handling, filtering, and transforms.
- Export at target journal dimensions and verify readability at print scale.
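Several checklist items can be handled directly in code: fix the figure size in inches before plotting, print n and the bin rule on the panel, and save at a set resolution. A sketch under stated assumptions: the 3.5-inch width stands in for a single-column size, and the file name, axis units, and 300 dpi are placeholders to replace with your target journal's specifications:

```python
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path

rng = np.random.default_rng(42)
data = rng.normal(loc=5, scale=1.5, size=300)

# Assumed target: single-column figure, 3.5 x 2.5 in (check your journal's guide)
fig, ax = plt.subplots(figsize=(3.5, 2.5))
counts, edges, _ = ax.hist(data, bins="fd", color="#6b21a8",
                           edgecolor="white", linewidth=0.4)
ax.set_xlabel("Value (units)")  # placeholder: use the real quantity and unit
ax.set_ylabel("Count")
# Report sample size and bin rule directly on the panel
ax.text(0.98, 0.95, f"n = {data.size}\nbins: Freedman-Diaconis",
        transform=ax.transAxes, ha="right", va="top", fontsize=8)
fig.tight_layout()

out_path = Path("histogram_single_column.png")  # hypothetical file name
fig.savefig(out_path, dpi=300)
plt.close(fig)
```

Opening the saved file at 100% zoom is a quick way to verify readability at print scale, since on-screen previews are often rescaled.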
10. When to use a histogram vs alternatives
Histograms are excellent for showing distribution shape, but they are not always the best final figure. If your audience needs quartiles and outlier emphasis, a box plot may communicate faster. If you need full density shape comparison with small samples, violin plots can be more stable than aggressively binned histograms.
A practical workflow is: start with a histogram for quick distribution diagnostics, validate assumptions, then choose the final chart form based on the message you need to communicate in the paper or report.
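To make that choice concrete, it can help to draw the same sample three ways before committing to a final figure. A minimal sketch using Matplotlib's own boxplot and violinplot; the small right-skewed sample is simulated:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.lognormal(mean=1.0, sigma=0.6, size=80)  # small, right-skewed sample

fig, axes = plt.subplots(1, 3, figsize=(11, 3.5))

axes[0].hist(data, bins="fd", color="#6b21a8", edgecolor="white", linewidth=0.4)
axes[0].set_title("Histogram")
axes[0].set_xlabel("Value")
axes[0].set_ylabel("Count")

# Box plot: median line, quartile box, whiskers, and flagged outliers
axes[1].boxplot(data)
axes[1].set_title("Box plot")
axes[1].set_ylabel("Value")

# Violin plot: mirrored kernel density estimate with the median marked
axes[2].violinplot(data, showmedians=True)
axes[2].set_title("Violin plot")
axes[2].set_ylabel("Value")

fig.tight_layout()
```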
11. Advanced edge cases you should handle explicitly
Technical datasets rarely follow clean textbook assumptions. If you are working with sensor streams, assay readouts, or merged multi-site data, your histogram pipeline should include edge-case handling as first-class logic. Treat these as part of the analysis method, not cosmetic cleanup.
- Zero-inflated distributions - many biological and reliability datasets include a large spike at zero plus a continuous positive tail. Consider reporting the number of zeros as an annotation and plotting the non-zero values in a second panel, so the spike does not flatten useful variation in the tail.
- Censored measurements - if values below detection limit are imputed, your left tail may be artificial. Mark detection thresholds with vertical reference lines and mention censoring rules in captions.
- Integer count data - for low-range count variables (for example read counts or defect counts), use bin edges aligned to integer boundaries. Arbitrary fractional bins can imply precision that does not exist in the measured variable.
- Streaming windows - in online QC systems, histograms computed on rolling windows can drift due to changing sample volume. Keep a fixed bin edge policy and version your windowing settings for reproducible audits.
- Mixture populations - multimodal peaks may represent true subpopulations rather than noise. Before smoothing them away, verify whether modality aligns with treatment group, batch, instrument, or site.
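Two of these cases translate directly into binning code. For low-range integer counts, edges placed at half-integers centre exactly one bar on each possible value; for zero-inflated data, the zeros can be reported as a number while the positive values get their own histogram. A sketch on simulated data; the Poisson counts and the roughly 60% zero fraction are illustrative assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

# Integer count data: edges at k - 0.5 so each integer value gets its own bar
defects = rng.poisson(lam=3, size=400)
int_edges = np.arange(defects.min(), defects.max() + 2) - 0.5

# Zero-inflated data (assumed ~60% exact zeros) plus a continuous positive tail
is_zero = rng.random(1000) < 0.6
intensity = np.where(is_zero, 0.0, rng.lognormal(mean=0.5, sigma=0.7, size=1000))
n_zero = int(np.sum(intensity == 0))
positive = intensity[intensity > 0]

fig, axes = plt.subplots(1, 2, figsize=(11, 4))
axes[0].hist(defects, bins=int_edges, color="#6b21a8", edgecolor="white", linewidth=0.4)
axes[0].set_xticks(np.arange(defects.min(), defects.max() + 1))
axes[0].set_xlabel("Defects per unit")
axes[0].set_ylabel("Count")

axes[1].hist(positive, bins="fd", color="#b45309", edgecolor="white", linewidth=0.4)
axes[1].set_xlabel("Intensity (non-zero values only)")
axes[1].set_title(f"{n_zero} of {intensity.size} values are exactly zero", fontsize=10)
fig.tight_layout()
```

For censored data, a vertical line at the detection limit (ax.axvline) can be added to either panel in the same way.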
12. Interpreting histogram shapes without overclaiming
Histogram interpretation is useful but easy to overstate. Shape alone does not prove mechanism. Use the plot to generate hypotheses, then test them with domain checks and formal statistics.
| Observed shape | Common interpretation | What to verify next |
|---|---|---|
| Right skew | Rare high values or multiplicative process | Check log transform, outlier provenance, measurement floor |
| Bimodal | Two latent groups or regime change | Color by batch/treatment and inspect subgroup histograms |
| Heavy tails | Extreme events matter to summary metrics | Use robust statistics (median, IQR) and tail diagnostics |
| Sharp central spike | Quantization, rounding, or imputation artifact | Inspect raw instrument precision and preprocessing rules |
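For the right-skew row, one quick follow-up is to compare sample skewness before and after a log transform. A sketch using scipy.stats.skew on simulated lognormal data; a drop toward zero supports trying log bins, but it says nothing about the underlying mechanism:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
data = rng.lognormal(mean=1.6, sigma=0.9, size=1500)  # strictly positive, right-skewed

raw_skew = skew(data)
log_skew = skew(np.log10(data))  # a log transform requires positive values
print(f"skewness raw: {raw_skew:.2f}, after log10: {log_skew:.2f}")
```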
13. Domain-specific implementation notes
The same plotting function behaves differently across domains because data collection constraints differ. These patterns reduce avoidable review comments in technical papers and internal reports.
- Clinical measurements - always annotate units and detection limits. If subgroup sample sizes differ strongly, favor density-normalized overlays plus subgroup-specific sample counts.
- Materials characterization - grain size and particle distributions are typically right-skewed. Use log bins and report the exact edge definition so another lab can reproduce your visual summary.
- A/B experimentation - compare control and treatment with shared bins and explicit cohort sizes. Pair histograms with confidence intervals or bootstrap summaries to avoid visual-only conclusions.
- Sensor engineering - if calibration changes over time, stratify histograms by calibration era. Mixed-calibration histograms can create false multimodality and mislead threshold setting.
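For the A/B note, a bootstrap interval on a summary difference is one way to pair the histograms with an uncertainty estimate. A minimal percentile-bootstrap sketch with NumPy only; the choice of the median as the summary, the 2000 resamples, the hypothetical helper name, and the simulated cohorts are all assumptions to adapt:

```python
import numpy as np

def bootstrap_median_diff(control, treatment, n_boot=2000, ci=95, seed=0):
    """Hypothetical helper: percentile bootstrap CI for median(treatment) - median(control)."""
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each cohort with replacement, keeping its own size
        c = rng.choice(control, size=control.size, replace=True)
        t = rng.choice(treatment, size=treatment.size, replace=True)
        diffs[i] = np.median(t) - np.median(c)
    tail = (100 - ci) / 2
    low, high = np.percentile(diffs, [tail, 100 - tail])
    return np.median(treatment) - np.median(control), low, high

rng = np.random.default_rng(42)
control = rng.normal(10.0, 2.0, 400)
treatment = rng.normal(10.8, 2.0, 380)

point, low, high = bootstrap_median_diff(control, treatment)
print(f"median difference {point:.2f} (95% CI {low:.2f} to {high:.2f}), "
      f"n_control = {control.size}, n_treatment = {treatment.size}")
```

Printing both cohort sizes alongside the interval covers the "explicit cohort sizes" part of the same note.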
14. Common error messages and immediate fixes
If you are building production notebooks or reusable plotting utilities, add guards for these failure cases. Most histogram runtime issues come from invalid data arrays rather than plotting API syntax.
- ValueError: autodetected range of [nan, nan] is not finite - clean non-finite values before binning.
- TypeError with mixed string and numeric arrays - cast to numeric explicitly and log rejected rows.
- Unreadable overlapping fills - lower alpha and use step histograms or faceting.
- Histogram shape changes between runs - control random seed, filtering order, and bin-edge policy.
- Axis labels clipped in export - apply plt.tight_layout() and verify target figure size before saving.
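For unreadable overlapping fills, one alternative is outline-only histograms with histtype="step" on shared edges, so no fill hides another group. A sketch with three simulated groups (the site labels are placeholders):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
groups = {
    "Site A": rng.normal(4.0, 1.0, 300),
    "Site B": rng.normal(5.0, 1.2, 250),
    "Site C": rng.normal(5.5, 0.8, 400),
}

# One set of edges for all groups so the outlines are directly comparable
edges = np.histogram_bin_edges(np.concatenate(list(groups.values())), bins=30)

fig, ax = plt.subplots(figsize=(7, 4))
for label, values in groups.items():
    ax.hist(values, bins=edges, density=True, histtype="step", linewidth=1.8, label=label)
ax.set_xlabel("Value")
ax.set_ylabel("Density")
ax.legend(frameon=False)
fig.tight_layout()
```

With more than three or four groups, faceting (one small panel per group, same edges and shared axes) usually stays more legible than any overlay.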
15. FAQ for technical teams
Should I show counts or density in publications?
If you compare cohorts with unequal sample sizes, density is usually clearer. If absolute volume matters to the claim, keep counts and show sample size for each cohort. In many papers, the strongest approach is counts in supplemental plots and density overlays in the main figure where shape comparison matters.
How many bins are appropriate for n around 100 to 500?
There is no universal answer. Start with rule-based edges such as fd or auto, then compare visual stability across nearby settings. If conclusions change drastically with small bin adjustments, report that sensitivity and avoid overconfident narrative claims.
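A sensitivity check can be made concrete by re-binning at a few nearby settings and tracking a feature you plan to describe, such as the location of the tallest bin. A sketch under stated assumptions: the mode location is only one possible feature, and the 0.75x to 1.25x range around the Freedman-Diaconis count is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.r_[rng.normal(0, 1, 200), rng.normal(4, 0.7, 100)]  # n = 300, bimodal

base = len(np.histogram_bin_edges(data, bins="fd")) - 1
candidates = sorted({max(2, round(base * f)) for f in (0.75, 0.9, 1.0, 1.1, 1.25)})

modes = {}
for n_bins in candidates:
    counts, edges = np.histogram(data, bins=n_bins)
    peak = np.argmax(counts)
    # Centre of the tallest bin for this bin count
    modes[n_bins] = 0.5 * (edges[peak] + edges[peak + 1])

spread = max(modes.values()) - min(modes.values())
for n_bins, mode in modes.items():
    print(f"{n_bins:>3} bins -> tallest bin centred at {mode:.2f}")
print(f"range of mode estimates: {spread:.2f}")
```

If the spread is large relative to the effect you want to describe, that is the sensitivity worth reporting.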
When should I choose KDE alone instead of histogram + KDE?
KDE alone is useful for cleaner visual summaries, especially when many groups overlap. Histogram + KDE is better when readers need to see both empirical bin occupancy and smoothed trend. In regulated or high-stakes contexts, showing the empirical histogram reduces ambiguity about where data points actually lie.
Can I use histograms for hypothesis testing directly?
Histograms are descriptive, not inferential. Use them to check assumptions, detect anomalies, and motivate model choice. Then run appropriate statistical tests on the underlying data. Keep the histogram as diagnostic evidence, not as the sole proof of significance.
Experimental Physicist & Photonics Researcher
Hands-on experience in silicon photonics, semiconductor fabrication (DRIE/ICP-RIE), optical simulation, and data-driven analysis. Built Plotivy to help researchers focus on discoveries instead of data struggles.