Histogram | strataquest

View

Definition

A histogram shows the distribution of a single measurement across all cells — how many cells have intensity 50, how many have intensity 100, how many have intensity 200. It reveals the shape of the population: a single peak (one population), two peaks (positive and negative), a long tail (heterogeneous expression), or a plateau (uniform distribution). Histograms are the simplest and most fundamental visualization in quantitative tissue analysis — the starting point for setting gates, evaluating staining quality, and understanding population structure.

Distribution Shape

See the population structure

Gate Guidance

Find the valley between populations

Quality Assessment

Evaluate staining and detection

Population Statistics

Mean, median, spread, percentiles

How It Works

The Histogram engine visualizes the distribution of any per-cell measurement:

Measurement selection — Choose any column from the measurement table: marker intensity, area, compactness, derived value.
Binning — The measurement range is divided into bins of equal width. Each bin counts the number of cells with measurement values falling in that range.
Display — Bar height represents the count (or frequency) of cells in each bin. Optional overlays: fitted distributions, gate positions, population coloring by phenotype.
Statistics — Compute and display mean, median, standard deviation, coefficient of variation, percentiles, and modality (number of peaks).
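The binning and statistics steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the engine's actual implementation: the function names (equal_width_histogram, summary_statistics), the example intensities, and the choice of five bins are all assumptions made for the example.

```python
import statistics

def equal_width_histogram(values, n_bins):
    """Count values into n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges

def summary_statistics(values):
    """Mean, median, spread, and two percentiles of a measurement column."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    cuts = statistics.quantiles(values, n=100)  # 99 percentile cut points
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "cv": sd / mean if mean else float("nan"),
        "p5": cuts[4],
        "p95": cuts[94],
    }

# Hypothetical per-cell intensities: a dim group and a bright group.
intensities = [28, 30, 31, 33, 35, 29, 32, 148, 150, 152, 155, 147]
counts, edges = equal_width_histogram(intensities, n_bins=5)
frequencies = [c / len(intensities) for c in counts]  # count -> frequency
stats = summary_statistics(intensities)
print(counts, stats["median"])
```

Dividing each count by the number of cells gives the frequency display mentioned above; the bar shape is identical, only the axis scale changes.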

Simplified

A histogram counts how many cells have each measurement value and displays the distribution as bars. The shape reveals population structure — peaks are cell populations, valleys are natural classification boundaries, and the width shows measurement variability.

Science Behind It

Histogram as probability density (Gonzalez & Woods): The normalized histogram p(r_k) = n_k/N, where n_k is the number of pixels (or cells) in bin k and N is the total count, estimates the probability of observing intensity r_k. This statistical interpretation underpins histogram-based thresholding: Otsu's method, for example, splits the histogram into two classes and picks the threshold that maximizes the between-class variance (equivalently, minimizes the within-class variance). The histogram is therefore not just a visualization tool; it is the empirical estimate of the measurement's statistical distribution.
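As a rough sketch of how a threshold can be derived from the normalized histogram, the following plain-Python code computes p(r_k) = n_k/N and then searches for the bin boundary that maximizes the between-class variance, which is the criterion Otsu's method uses. The function names and the example counts are illustrative assumptions, and the threshold is returned as a bin index rather than an intensity value.

```python
def normalized_histogram(counts):
    """p(r_k) = n_k / N for each bin k."""
    total = sum(counts)
    return [c / total for c in counts]

def otsu_threshold_bin(counts):
    """Return the bin index k that maximizes the between-class variance
    when bins 0..k form one class and bins k+1.. form the other."""
    p = normalized_histogram(counts)
    mu_total = sum(k * pk for k, pk in enumerate(p))
    best_k, best_var = 0, -1.0
    w0 = 0.0       # cumulative probability of the lower class
    mu0_sum = 0.0  # cumulative sum of k * p_k for the lower class
    for k, pk in enumerate(p[:-1]):
        w0 += pk
        mu0_sum += k * pk
        w1 = 1.0 - w0
        if w0 <= 0.0 or w1 <= 0.0:
            continue  # one class is empty, so no valid split here
        mu0 = mu0_sum / w0
        mu1 = (mu_total - mu0_sum) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_k, best_var = k, between
    return best_k

# Hypothetical bin counts with a dim group and a bright group.
counts = [2, 8, 20, 8, 2, 0, 1, 6, 15, 6, 1]
p = normalized_histogram(counts)
k = otsu_threshold_bin(counts)
print(k)
```

Note that this search always returns some threshold, even for a unimodal histogram, so the result is only meaningful when the distribution actually contains two groups.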

Histogram equalization: Gonzalez & Woods describe histogram equalization as a transform that spreads the histogram across the available gray levels. The transform is the scaled cumulative distribution function (CDF): s_k = (L - 1) * CDF(r_k), where L is the number of gray levels. This is relevant to tissue analysis when comparing samples with different staining intensity ranges: equalization normalizes the dynamic range, but because the mapping is nonlinear it also distorts the proportional relationship between intensity and expression.
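A small worked example may make the CDF-based mapping concrete. The sketch below builds the lookup table from bin counts, scaling the CDF by (L - 1) and rounding to the nearest level as in the standard discrete formulation; the function name equalization_map and the eight-level example are assumptions chosen for illustration.

```python
def equalization_map(counts, levels=None):
    """Map each gray level r_k to s_k = round((L - 1) * CDF(r_k))."""
    L = levels if levels is not None else len(counts)
    total = sum(counts)
    mapping, cumulative = [], 0
    for c in counts:
        cumulative += c  # running count gives the unnormalized CDF
        mapping.append(round((L - 1) * cumulative / total))
    return mapping

# Hypothetical 8-level image whose pixels occupy only levels 2 to 4.
counts = [0, 0, 4, 8, 4, 0, 0, 0]
mapping = equalization_map(counts)
print(mapping)  # levels 2, 3, 4 are spread out to 2, 5, 7
```

The occupied levels 2, 3, and 4 are evenly spaced before the transform and unevenly spaced (2, 5, 7) afterward, which is the loss of proportionality between intensity and expression noted above.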

Gray levels and noise (Pawley): The number of meaningful gray levels in a histogram depends on SNR: g = 1 + SNR. For shot-noise-limited detection, SNR is the square root of the photon count, so 100 photons per pixel gives SNR = 10 and ~11 distinguishable levels. At 25 photons/pixel, SNR = 5 and only ~6 levels remain, and fewer still below that. This sets a fundamental floor on the resolution of any histogram-based analysis: if the measurement doesn't have enough precision to fill more than a few bins, the histogram cannot reveal fine population structure.
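The arithmetic can be checked with a short calculation. This sketch assumes shot-noise-limited (Poisson) detection, so that SNR equals the square root of the photon count, which matches the 100-photon figure quoted above; the function name is hypothetical.

```python
import math

def meaningful_gray_levels(photons_per_pixel):
    """g = 1 + SNR, assuming SNR = sqrt(photons) under Poisson shot noise."""
    snr = math.sqrt(photons_per_pixel)
    return 1 + snr

for photons in (25, 100, 400):
    print(photons, meaningful_gray_levels(photons))
```

Because the level count grows only with the square root of the signal, quadrupling the photon count from 100 to 400 raises the number of distinguishable levels from 11 to 21 rather than to 44.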

Bimodality as diagnostic: A clearly bimodal histogram suggests two distinct populations, the basis for reliable binary gating. Hartigan's dip test provides a formal check. Its null hypothesis is that the distribution is unimodal, so a significant result (p < 0.05) is evidence of more than one mode and supports placing a threshold in the valley between them. If unimodality cannot be rejected, forcing a binary gate creates an artificial division of a continuous distribution.
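Hartigan's dip test is too involved to reproduce in a few lines, so the sketch below swaps in a different and much cruder screen: Sarle's bimodality coefficient, in its large-sample form BC = (skewness^2 + 1) / kurtosis. It is a heuristic, not a significance test. Values above roughly 5/9 (the value for a uniform distribution) are commonly read as a hint of bimodality, but a strongly skewed unimodal distribution can also score high. The function name and sample data are assumptions for illustration.

```python
import statistics

def bimodality_coefficient(values):
    """Sarle's bimodality coefficient, large-sample form:
    BC = (skewness**2 + 1) / kurtosis, with non-excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments of order 2, 3, and 4 (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return (skewness ** 2 + 1) / kurtosis

two_groups = [0] * 50 + [1] * 50   # extreme case: two point masses
flat = list(range(1000))           # evenly spread values
bc_two = bimodality_coefficient(two_groups)
bc_flat = bimodality_coefficient(flat)
print(bc_two, bc_flat)
```

A screen like this can flag markers worth a closer look, but a decision about whether a binary gate is justified should rest on a proper test such as the dip test and on inspection of the histogram itself.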

Simplified

The histogram is the empirical probability distribution of your measurement. Otsu's method and all threshold-based analysis work on this distribution — they try to find natural boundaries between populations. The histogram's resolution (how many meaningful bins it has) depends on the measurement's precision, which depends on how many photons were collected. A well-separated bimodal histogram is the best-case scenario for gating; a featureless unimodal histogram means the marker may not divide cells into meaningful categories.

Practical Example

Evaluating Ki-67 staining quality before analysis:

Good staining: Bimodal histogram with clear valley at intensity 80 — negative population peaks at 30, positive population peaks at 150. Gate at 80 cleanly separates the two.
Poor staining: Unimodal histogram peaking at 50 with a long right tail — no clear positive population. Forcing a gate at any point divides a continuous distribution arbitrarily.
Too much background: Bimodal but with the negative peak at 120 (high background) and positive peak at 180 — populations overlap extensively. Background Removal needed before meaningful gating.

The histogram reveals the problem before you invest time in analysis. A quick histogram check is the single best quality control step in any tissue analysis workflow.
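The check described in this example can be approximated in code. The sketch below looks for the two tallest local maxima in a list of bin counts and the lowest bin between them, returning nothing when only one peak exists. It assumes the histogram is already smooth enough that noise does not create spurious peaks; the function name, the 10-unit bin width, and both sets of counts are invented to mirror the good-staining and poor-staining cases above.

```python
def find_peaks_and_valley(counts):
    """Return (left_peak, valley, right_peak) bin indices, or None if the
    histogram has fewer than two local maxima."""
    padded = [-1] + list(counts) + [-1]  # lets edge bins qualify as peaks
    peaks = [
        i for i in range(len(counts))
        if padded[i + 1] > padded[i] and padded[i + 1] >= padded[i + 2]
    ]
    if len(peaks) < 2:
        return None
    # Keep the two tallest peaks, in left-to-right order.
    left, right = sorted(sorted(peaks, key=lambda i: counts[i])[-2:])
    valley = min(range(left, right + 1), key=lambda i: counts[i])
    return left, valley, right

BIN_WIDTH = 10  # assumed: bin i covers intensities [10*i, 10*i + 10)

# Hypothetical "good staining" counts: peaks near 30 and 150, valley near 80.
good = [5, 20, 60, 90, 55, 25, 10, 4, 1, 3, 6, 12, 20, 30, 38, 40, 30, 15, 6, 2]
# Hypothetical "poor staining" counts: one peak with a long right tail.
poor = [10, 40, 80, 100, 70, 45, 30, 20, 12, 8, 5, 3, 2, 1]

result = find_peaks_and_valley(good)
if result is not None:
    left, valley, right = result
    print("suggested gate at intensity", valley * BIN_WIDTH)
print(find_peaks_and_valley(poor))
```

Returning None for the unimodal case is deliberate: it avoids suggesting a gate where the histogram offers no natural boundary.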

Simplified

Check the histogram before analyzing. Two clear peaks with a valley between them? Good staining — set the gate in the valley. One broad peak with no valley? Problem — the marker may not provide useful classification. High background shifting everything bright? Apply Background Removal first. The histogram is your first and most important quality check.

Connected Terms

Scattergram (Category, Related, Learning Path)
Cutoffs (Category, Related, Learning Path)
Gates (Category, Related, Learning Path)
Assign Classes to Objects (Related)
Statistical Operations (Related)
Raw Data (Related)
Phenotypes (Category, Learning Path)