Tutorial · 16 min read

How to Make a Box Plot in Python (Matplotlib & Seaborn Guide)

By Francesco Villasmunta · Updated June 12, 2026

A box plot (also called a box-and-whisker plot) is one of the quickest ways to compare the distribution of a numeric variable across groups. It shows the median, the interquartile range (IQR), and outliers in a single compact glyph, which makes it a staple of biology, clinical, and engineering papers. This guide covers everything from a one-line box plot to a fully styled, journal-ready figure in both matplotlib and seaborn.

If you are still deciding whether a box plot is the right chart at all, read our box plot vs violin plot vs bar chart comparison first. Otherwise, let us build one.

What the box shows

The box spans the 25th to 75th percentile (the IQR). The line inside is the median, not the mean.

Whiskers

By default each whisker extends to the most extreme data point that lies within 1.5 × IQR of the box edge. Points beyond that are drawn as individual outliers (fliers).

Small n caveat

With fewer than ~10 points per group, overlay the raw data so reviewers can see the actual sample.
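
To make the three notes above concrete, here is a minimal sketch that computes the same quantities by hand with NumPy. The sample data and variable names are illustrative assumptions, not part of any library API; the quartiles use np.percentile with its default interpolation.

```python
import numpy as np

rng = np.random.default_rng(0)            # illustrative sample, not real data
values = rng.normal(100, 8, 60)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Tukey fences: the limits the whiskers may not cross
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything outside the fences is drawn as a flier
fliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1:.1f}  median={median:.1f}  Q3={q3:.1f}  IQR={iqr:.1f}")
print(f"whiskers: {whisker_low:.1f} to {whisker_high:.1f}, fliers: {fliers.size}")
```

Note that the whiskers end at real observations, so they are usually shorter than the fences themselves.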

1. The minimal box plot

Pass a list of arrays to ax.boxplot(); each array becomes one box. Label the groups with tick_labels on matplotlib 3.9+; older versions call the same argument labels.

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
control = rng.normal(100, 8, 60)
treatment = rng.normal(88, 11, 60)

fig, ax = plt.subplots(figsize=(6, 4.5))
ax.boxplot([control, treatment], tick_labels=["Control", "Treatment"])
ax.set_ylabel("Cell viability (%)")
ax.set_title("Basic box plot")
plt.tight_layout()

2. Color and style the boxes

Filled boxes require patch_artist=True. The boxplot() call returns a dictionary of artists, so you can iterate over bp["boxes"] to set a per-group color. Keep the median line dark for contrast and reduce flier marker size so outliers do not dominate.

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
groups = [rng.normal(loc, 9, 60) for loc in (100, 88, 72)]
labels = ["Control", "Drug A", "Drug B"]
colors = ["#6b7280", "#7c3aed", "#ec4899"]

fig, ax = plt.subplots(figsize=(6.2, 4.6))
bp = ax.boxplot(
    groups,
    tick_labels=labels,
    patch_artist=True,       # required to fill the boxes
    widths=0.55,
    medianprops=dict(color="black", linewidth=1.4),
    flierprops=dict(marker="o", markersize=4, markerfacecolor="none"),
)

for patch, color in zip(bp["boxes"], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.65)

ax.set_ylabel("Cell viability (%)")
ax.spines[["top", "right"]].set_visible(False)
plt.tight_layout()

3. The faster route: seaborn

Seaborn works directly from a tidy (long-format) DataFrame and handles grouping, coloring, and labels for you. Pass the grouping column to both x and hue (with legend=False) to get per-group colors without a redundant legend, which is the current seaborn-recommended pattern.

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "viability": np.concatenate([rng.normal(loc, 9, 60) for loc in (100, 88, 72)]),
    "group": np.repeat(["Control", "Drug A", "Drug B"], 60),
})

fig, ax = plt.subplots(figsize=(6.2, 4.6))
sns.boxplot(data=df, x="group", y="viability", hue="group",
            palette=["#6b7280", "#7c3aed", "#ec4899"], legend=False, ax=ax)
ax.set(xlabel="", ylabel="Cell viability (%)")
sns.despine()
plt.tight_layout()
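
If your measurements arrive in wide format (one column per group), they need reshaping before seaborn can group them. A minimal sketch with pandas melt() follows; the wide table and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical wide table: one column per group, one row per replicate
wide = pd.DataFrame({
    "Control": rng.normal(100, 9, 60),
    "Drug A": rng.normal(88, 9, 60),
    "Drug B": rng.normal(72, 9, 60),
})

# melt() stacks the columns into (group, viability) pairs
long_df = wide.melt(var_name="group", value_name="viability")

print(long_df.head())
```

The resulting long_df has one row per observation and can be passed as data= to sns.boxplot with x="group" and y="viability".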

4. Overlay the raw data points

Many journals now ask you to show individual observations alongside the summary, especially for small samples. Combine sns.boxplot (with fliersize=0 so outliers are not drawn twice) and sns.stripplot. For small samples, this is one of the most reviewer-friendly upgrades you can make to a box plot.

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "value": np.concatenate([rng.normal(loc, 9, 18) for loc in (100, 88, 72)]),
    "group": np.repeat(["Control", "Drug A", "Drug B"], 18),
})

fig, ax = plt.subplots(figsize=(6.2, 4.6))
sns.boxplot(data=df, x="group", y="value", hue="group",
            palette="Greys", legend=False, fliersize=0, ax=ax)
sns.stripplot(data=df, x="group", y="value", color="#7c3aed",
              size=4, jitter=0.18, alpha=0.85, ax=ax)
ax.set(xlabel="", ylabel="Response")
sns.despine()
plt.tight_layout()
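
The same overlay can be sketched in pure matplotlib if you prefer not to depend on seaborn. This version assumes a hand-rolled uniform jitter of ±0.15 around each box position; the jitter width and marker size are arbitrary choices, not library defaults.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)
groups = [rng.normal(loc, 9, 18) for loc in (100, 88, 72)]
names = ["Control", "Drug A", "Drug B"]

fig, ax = plt.subplots(figsize=(6.2, 4.6))
# Hide the fliers: the raw points drawn below already include them
bp = ax.boxplot(groups, showfliers=False, widths=0.5)
ax.set_xticks(range(1, len(names) + 1), names)   # boxes sit at x = 1, 2, 3

for i, values in enumerate(groups, start=1):
    x = i + rng.uniform(-0.15, 0.15, size=values.size)   # horizontal jitter
    ax.scatter(x, values, s=16, color="#7c3aed", alpha=0.85, zorder=3)

ax.set_ylabel("Response")
ax.spines[["top", "right"]].set_visible(False)
plt.tight_layout()
```

Labelling the ticks with set_xticks sidesteps the labels / tick_labels naming difference between matplotlib versions.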

5. Horizontal box plots for long labels

When group names are long (site names, gene IDs, drug names), a horizontal layout keeps them readable. In matplotlib, set vert=False (matplotlib 3.10 added orientation="horizontal" as the replacement and is phasing out vert); in seaborn, swap the x and y arguments.

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
labels = ["Site E", "Site D", "Site C", "Site B", "Site A"]
data = [rng.normal(loc, 6, 50) for loc in (40, 46, 52, 58, 63)]

fig, ax = plt.subplots(figsize=(6.5, 4.2))
ax.boxplot(data, tick_labels=labels, vert=False, patch_artist=True,
           boxprops=dict(facecolor="#7c3aed", alpha=0.6))
ax.set_xlabel("Measurement")
ax.spines[["top", "right"]].set_visible(False)
plt.tight_layout()
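
For the seaborn route mentioned above, a minimal sketch looks like this. The column names measurement and site are assumptions made for the example; seaborn infers the horizontal orientation from the numeric column being on x.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(42)
sites = ["Site A", "Site B", "Site C", "Site D", "Site E"]
df = pd.DataFrame({
    "measurement": np.concatenate(
        [rng.normal(loc, 6, 50) for loc in (63, 58, 52, 46, 40)]
    ),
    "site": np.repeat(sites, 50),
})

fig, ax = plt.subplots(figsize=(6.5, 4.2))
# Numeric column on x, categorical column on y -> horizontal boxes
sns.boxplot(data=df, x="measurement", y="site", color="#7c3aed", ax=ax)
ax.set(xlabel="Measurement", ylabel="")
sns.despine()
plt.tight_layout()
```

Seaborn places the first category at the top, so there is no need to reverse the label order as in the matplotlib version.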

6. Grouped box plots (two categorical variables)

To compare a condition across timepoints, doses, or sites, use hue in seaborn. This produces side-by-side boxes within each x-axis group and is far cleaner than juggling positions manually in matplotlib.

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

rng = np.random.default_rng(42)
n = 40
df = pd.DataFrame({
    "value": np.concatenate([
        rng.normal(50, 6, n), rng.normal(58, 6, n),   # Day 1
        rng.normal(54, 6, n), rng.normal(70, 6, n),   # Day 7
    ]),
    "timepoint": np.repeat(["Day 1", "Day 1", "Day 7", "Day 7"], n),
    "condition": np.tile(np.repeat(["Control", "Treated"], n), 2),
})

fig, ax = plt.subplots(figsize=(6.6, 4.6))
sns.boxplot(data=df, x="timepoint", y="value", hue="condition",
            palette=["#6b7280", "#7c3aed"], ax=ax)
ax.set(xlabel="", ylabel="Expression")
ax.legend(title="", frameon=False)
sns.despine()
plt.tight_layout()

Box plot anatomy: what each line means

Element            Meaning
Box bottom / top   25th percentile (Q1) and 75th percentile (Q3)
Line in box        Median (Q2), not the mean
Box height         Interquartile range (IQR = Q3 − Q1)
Whiskers           Extend to the last point within 1.5 × IQR of the box
Fliers             Individual points beyond the whiskers (potential outliers)
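
If you want the numbers behind each element, for example to report them in a caption, matplotlib exposes the calculation as matplotlib.cbook.boxplot_stats. The sketch below uses illustrative data and prints the dictionary keys that correspond to the rows above.

```python
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(42)
control = rng.normal(100, 8, 60)      # illustrative data

# One dict per dataset; whis=1.5 mirrors the ax.boxplot() default
stats = cbook.boxplot_stats(control, whis=1.5)[0]

print(f"Box bottom / top : {stats['q1']:.1f} / {stats['q3']:.1f}")
print(f"Line in box      : {stats['med']:.1f} (mean is {stats['mean']:.1f})")
print(f"Box height (IQR) : {stats['iqr']:.1f}")
print(f"Whiskers         : {stats['whislo']:.1f} to {stats['whishi']:.1f}")
print(f"Fliers           : {len(stats['fliers'])} point(s)")
```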

7. Export at journal quality

A box plot is only publication-ready once it is exported correctly. Save at 300 DPI minimum, trim whitespace, and use a vector format (PDF or SVG) when the journal accepts it. See our high-resolution export guide for the full checklist.

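import matplotlib.pyplot as plt
import numpy as np

# Placeholder figure so the snippet runs on its own; build your real box plot here
rng = np.random.default_rng(42)
fig, ax = plt.subplots(figsize=(6, 4.5))
ax.boxplot([rng.normal(100, 8, 60), rng.normal(88, 11, 60)])

fig.savefig(
    "boxplot_figure.png",
    dpi=300,                 # 300 DPI minimum for most journals
    bbox_inches="tight",     # trim surrounding whitespace
    facecolor="white",       # avoid transparent backgrounds in print
)
fig.savefig("boxplot_figure.pdf", bbox_inches="tight")  # vector for line art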

Common mistakes and how to fix them

  • Boxes will not fill with color: you forgot patch_artist=True in matplotlib.
  • Outliers appear twice: when overlaying a strip plot, set fliersize=0 (seaborn) or showfliers=False (matplotlib) on the box plot.
  • Reader assumes the line is the mean: state explicitly in the caption that the center line is the median, or add a marker for the mean with showmeans=True.
  • Box plot on tiny samples: with n < 10, quartiles are unstable. Overlay the raw points or use a dot plot instead.
  • labels vs tick_labels error: matplotlib 3.9 renamed labels to tick_labels and deprecated the old name; use whichever your installed version expects.
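
For scripts that must run on both older and newer matplotlib, one possible workaround for the last item is to pick the keyword at runtime. This is a sketch under the assumption that 3.9 is the cut-over version; label_kwarg is a name invented for the example.

```python
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

# Compare only (major, minor); this tolerates suffixes such as "3.10.0.dev0"
major, minor = (int(part) for part in matplotlib.__version__.split(".")[:2])
label_kwarg = "tick_labels" if (major, minor) >= (3, 9) else "labels"

rng = np.random.default_rng(42)
data = [rng.normal(100, 8, 60), rng.normal(88, 11, 60)]

fig, ax = plt.subplots(figsize=(6, 4.5))
ax.boxplot(data, **{label_kwarg: ["Control", "Treatment"]})
ax.set_ylabel("Cell viability (%)")
plt.tight_layout()
```

A simpler alternative is to skip the keyword entirely and call ax.set_xticks([1, 2], ["Control", "Treatment"]) after plotting.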

When to use a box plot vs an alternative

Box plots summarize distribution shape compactly, but they hide multimodality. If the shape of the distribution matters — for example two hidden subpopulations — a violin plot reveals it where a box plot would not. If you mainly need to compare group means with uncertainty, a bar chart with error bars may communicate faster.
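
A quick way to see the multimodality problem is to plot the same bimodal sample both ways. The data below is synthetic and purely illustrative: two subpopulations centred at 40 and 70.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Illustrative bimodal sample: two subpopulations centred at 40 and 70
bimodal = np.concatenate([rng.normal(40, 4, 100), rng.normal(70, 4, 100)])

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(6.5, 4), sharey=True)

ax_box.boxplot(bimodal, widths=0.5)
ax_box.set_title("Box plot")
ax_box.set_xticks([])

ax_violin.violinplot(bimodal, showmedians=True)
ax_violin.set_title("Violin plot")
ax_violin.set_xticks([])

ax_box.set_ylabel("Value")
plt.tight_layout()
```

The box plot shows one wide box with a median in the sparsely populated gap between the two clusters, while the violin shows the two separate bulges.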

Frequently asked questions

How do I show the mean on a box plot in Python?

Pass showmeans=True to ax.boxplot(). You can style the marker with meanprops. Always clarify in the caption that the box line is the median and the marker is the mean, since the two are easy to confuse.
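
A minimal sketch, using right-skewed synthetic data so that the mean and the median visibly differ. The diamond marker styling is just one reasonable choice.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Right-skewed illustrative data, so mean and median visibly differ
groups = [rng.lognormal(mean=m, sigma=0.5, size=60) for m in (3.0, 3.3)]

fig, ax = plt.subplots(figsize=(6, 4.5))
bp = ax.boxplot(
    groups,
    showmeans=True,
    meanprops=dict(marker="D", markerfacecolor="white",
                   markeredgecolor="black", markersize=6),
    medianprops=dict(color="black", linewidth=1.4),
)
ax.set_xticks([1, 2], ["Group 1", "Group 2"])
ax.set_ylabel("Value")
plt.tight_layout()
```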

How do I change what the whiskers represent?

The whis argument controls whisker extent. whis=1.5 is the default (Tukey) rule. Use whis=(5, 95) to set whiskers at the 5th and 95th percentiles, which some fields prefer. Document your choice in the methods section.
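
The two conventions can be compared side by side on the same synthetic sample, as sketched below. The whisker ends are read back from the cap artists that boxplot() returns.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(100, 8, 200)       # illustrative data

fig, axes = plt.subplots(1, 2, figsize=(6.5, 4), sharey=True)
axes[0].boxplot(data, whis=1.5)
axes[0].set_title("whis=1.5 (Tukey)")
bp = axes[1].boxplot(data, whis=(5, 95))
axes[1].set_title("whis=(5, 95)")
for ax in axes:
    ax.set_xticks([])
plt.tight_layout()

# Caps are returned low first, then high; each is a flat line at the whisker end
low = bp["caps"][0].get_ydata()[0]
high = bp["caps"][1].get_ydata()[0]
n_fliers = len(bp["fliers"][0].get_ydata())
print(f"percentile whiskers: {low:.1f} to {high:.1f}, fliers: {n_fliers}")
```

With percentile whiskers, every point outside the 5th to 95th percentile range is still drawn as a flier, so roughly 10% of the sample appears as individual points. Pass showfliers=False if that is not what you want.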

Should I use matplotlib or seaborn for box plots?

Use seaborn when your data is in a tidy DataFrame and you want grouping and color handled automatically. Drop to matplotlib when you need pixel-level control over every artist for a final journal figure. Because seaborn is built on matplotlib, you can mix both: build with seaborn, then fine-tune the returned axes.
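
A sketch of that mixed workflow: seaborn draws the boxes, then ordinary matplotlib calls adjust the Axes it returns. The dashed reference line at 100 is a hypothetical annotation added only to show a matplotlib-level tweak.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "viability": np.concatenate([rng.normal(loc, 9, 60) for loc in (100, 88, 72)]),
    "group": np.repeat(["Control", "Drug A", "Drug B"], 60),
})

# seaborn draws the boxes and returns the matplotlib Axes it used
ax = sns.boxplot(data=df, x="group", y="viability", color="#c4b5fd")

# From here on, everything is plain matplotlib
ax.axhline(100, color="0.4", linestyle="--", linewidth=1)   # hypothetical reference level
ax.set(xlabel="", ylabel="Cell viability (%)")
ax.tick_params(axis="both", labelsize=9)
ax.spines[["top", "right"]].set_visible(False)
ax.figure.set_size_inches(6.2, 4.6)
ax.figure.tight_layout()
```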

How do I add statistical significance brackets?

Draw brackets manually with ax.plot and ax.text, or use the statannotations library, which places p-value annotations between box pairs automatically and supports common tests.
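
The manual route can be sketched as follows. Both add_bracket and permutation_p are hypothetical helpers written for this example; the permutation test is only a dependency-free stand-in, so substitute whichever test your analysis actually calls for.

```python
import matplotlib.pyplot as plt
import numpy as np

def add_bracket(ax, x1, x2, y, height, text):
    """Draw a bracket from x1 to x2 at height y with a centred label (hypothetical helper)."""
    ax.plot([x1, x1, x2, x2], [y, y + height, y + height, y],
            color="black", linewidth=1)
    ax.text((x1 + x2) / 2, y + height, text, ha="center", va="bottom", fontsize=9)

def permutation_p(a, b, rng, n_perm=5000):
    """Two-sided permutation p-value for a difference in means (stand-in test)."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        hits += int(diff >= observed)
    return (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(42)
control, treated = rng.normal(100, 8, 30), rng.normal(88, 11, 30)   # illustrative data
p = permutation_p(control, treated, rng)

fig, ax = plt.subplots(figsize=(6, 4.5))
ax.boxplot([control, treated])
ax.set_xticks([1, 2], ["Control", "Treatment"])

top = max(control.max(), treated.max())
add_bracket(ax, 1, 2, y=top + 2, height=1.5, text=f"p = {p:.3g}")
ax.set_ylim(top=top + 8)      # leave headroom for the label
plt.tight_layout()
```

Boxes sit at x = 1, 2, ... in matplotlib, so the bracket endpoints are simply the positions of the two groups being compared.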

Francesco Villasmunta

Experimental Physicist & Photonics Researcher

Hands-on experience in silicon photonics, semiconductor fabrication (DRIE/ICP-RIE), optical simulation, and data-driven analysis. Built Plotivy to help researchers focus on discoveries instead of data struggles.
