ScientiaLux
Data Access

Raw Data

Per-cell measurement tables for every detected event

Definition
Raw Data provides direct access to the original pixel values underlying each detected object — not just summary statistics like mean intensity, but the actual distribution of intensities across every pixel within each cell. This is the ground truth that all other measurements summarize. When a mean intensity hides important variation (bimodal staining within a cell, partial positivity, heterogeneous expression), Raw Data reveals what the summary statistics conceal.
Pixel-Level Access: Every intensity value per object
Distribution Analysis: Beyond mean values
Histogram Per Object: Intensity distribution within each cell
Quality Control: Verify measurement accuracy

How It Works

Raw Data extracts pixel-level information for each object in a coded image:

  1. Object masking — For each labeled object, identify all pixels belonging to it from the coded image.
  2. Intensity extraction — Read the intensity value at each of these pixels from the specified channels.
  3. Distribution computation — Compute distributional statistics: median, percentiles (25th, 75th, 95th), standard deviation, skewness, and kurtosis.
  4. Export — Raw pixel data can be exported for external analysis, or distributional features can be used as additional measurements in StrataQuest's classification pipeline.
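
The four steps above can be sketched in plain Python. This is a minimal illustration, not StrataQuest's implementation: the function names, the use of label 0 for background, the nested-list image layout, and the CSV column names are all assumptions made for the example.

```python
import csv
import io
import statistics
from collections import defaultdict


def pixels_by_object(labels, intensity):
    """Steps 1-2: collect the intensity at every pixel of each labeled object."""
    groups = defaultdict(list)
    for label_row, intensity_row in zip(labels, intensity):
        for label, value in zip(label_row, intensity_row):
            if label != 0:  # assumption: label 0 marks background
                groups[label].append(value)
    return dict(groups)


def distribution_stats(values):
    """Step 3: distributional statistics for one object (needs >= 2 pixels)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # 99 cut points; index k - 1 holds the k-th percentile.
    pct = statistics.quantiles(values, n=100, method="inclusive")
    if sd > 0:
        # Standardized third and fourth central moments (population form).
        skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
        kurt = sum((v - mean) ** 4 for v in values) / (n * sd ** 4) - 3.0
    else:
        skew = kurt = 0.0  # a perfectly flat object has no shape to describe
    return {
        "n_pixels": n, "mean": mean, "median": statistics.median(values),
        "p25": pct[24], "p75": pct[74], "p95": pct[94],
        "sd": sd, "skewness": skew, "excess_kurtosis": kurt,
    }


def export_csv(stats_by_object):
    """Step 4: serialize the per-object table as CSV text."""
    buffer = io.StringIO()
    fields = ["object_id"] + list(next(iter(stats_by_object.values())))
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for object_id, row in sorted(stats_by_object.items()):
        writer.writerow({"object_id": object_id, **row})
    return buffer.getvalue()


# Toy 3x4 coded image (labels) and a matching single-channel intensity image.
labels = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 2, 2, 2],
]
intensity = [
    [5, 80, 90, 4],
    [6, 85, 85, 20],
    [3, 20, 150, 150],
]

stats = {
    obj: distribution_stats(px)
    for obj, px in pixels_by_object(labels, intensity).items()
}
csv_text = export_csv(stats)
```

In this toy image both objects have a mean of 85, but object 1 has a standard deviation below 4 while object 2 has a standard deviation of 65, which is the kind of difference the distributional statistics are meant to expose.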

Simplified

Raw Data reads the actual pixel values inside each detected cell, not just the average. It computes distributional statistics — median, percentiles, spread — that reveal patterns hidden by summary measures. Two cells with the same mean might have very different distributions.

Science Behind It

Sampling and reconstruction: Hanrahan's signal processing framework reminds us that every pixel is a point sample of an underlying continuous signal. The original fluorescence distribution in a cell is continuous; the digitized image samples it on a regular grid. Mean intensity is one reconstruction of this signal — a zeroth-order summary. But the full set of samples contains much more information. Median intensity is more robust to outliers. The 95th percentile captures peak expression better than the mean. Standard deviation measures heterogeneity.
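
The robustness claim can be checked with a small sketch. The pixel values below are hypothetical: ten pixels near 50, and a second version in which two of them are replaced by saturated outliers at 255.

```python
import statistics

# Hypothetical pixel values for one cell, all close to 50.
clean = [48, 49, 50, 50, 50, 51, 52, 50, 49, 51]
# Same cell with the last two pixels replaced by saturated outliers.
with_outliers = clean[:-2] + [255, 255]


def summarize(values):
    """Mean, median, and spread for one object's pixel samples."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "sd": statistics.pstdev(values),
    }


clean_summary = summarize(clean)
outlier_summary = summarize(with_outliers)
```

Two outlier pixels move the mean from 50 to 91, while the median stays at 50. The standard deviation rises sharply as well, flagging the heterogeneity that the median deliberately ignores.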

Quantization and gray levels: Pawley emphasizes that the number of meaningful gray levels in a fluorescence image depends on photon counts. Photon detection follows Poisson statistics, so a pixel that collects N photons has shot noise of √N and a signal-to-noise ratio of N/√N = √N. With 100 photons per pixel, SNR = 10, giving ~11 distinguishable intensity levels. At 25 photons/pixel, SNR = 5 and only ~5 gray levels remain, so at or below this level the intensity "distribution" within a cell becomes too coarse for meaningful analysis. Raw Data is most informative when images are acquired with enough photons per pixel to support meaningful intensity quantification.
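
The SNR figures follow from Poisson statistics: a pixel collecting N photons has shot noise √N. The sketch below assumes a purely shot-noise-limited detector (no read noise, dark current, or background), which is a simplification. The function name is illustrative.

```python
import math


def shot_noise_snr(photons):
    """SNR of a shot-noise-limited pixel: signal N over noise sqrt(N)."""
    return photons / math.sqrt(photons)


# SNR at a few photon counts per pixel.
snr_by_photons = {n: shot_noise_snr(n) for n in (25, 100, 400)}
```

The number of distinguishable gray levels quoted in the text is of the same order as the SNR, so quadrupling the photon count only doubles the usable intensity resolution.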

When the mean lies: Consider a cell expressing a biomarker in a punctate (dotted) pattern — bright spots in a dim background. The mean intensity might be moderate, suggesting moderate expression. But the true pattern is binary: some regions are strongly positive, others are negative. The per-cell histogram would be bimodal. Standard deviation would be high. The 95th percentile would be much higher than the mean. Raw Data reveals the punctate pattern that the mean obscures.
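
A per-cell histogram makes the punctate pattern visible. The sketch below uses made-up pixel values (80 dim pixels and 20 bright spot pixels) and a hypothetical bin width of 50 intensity units; neither is taken from StrataQuest.

```python
def histogram(values, bin_width=50, max_value=250):
    """Count integer pixel values in fixed-width bins starting at 0."""
    counts = [0] * (max_value // bin_width)
    for v in values:
        # Clamp values at or above max_value into the last bin.
        idx = min(v // bin_width, len(counts) - 1)
        counts[idx] += 1
    return counts


# Hypothetical punctate cell: dim background plus a few bright spots.
punctate = [30] * 80 + [220] * 20
# Hypothetical uniform cell with the same mean intensity.
uniform = [68] * 100

punctate_mean = sum(punctate) / len(punctate)
uniform_mean = sum(uniform) / len(uniform)
punctate_hist = histogram(punctate)
uniform_hist = histogram(uniform)
```

Both cells have a mean of 68, but the punctate histogram has two occupied bins separated by empty ones, while the uniform histogram has a single peak.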

Precision vs. accuracy: Measurement precision (reproducibility) differs from accuracy (correctness). A mean intensity computed from 500 pixels within a nucleus has high precision: repeat the measurement and you get nearly the same value. But its accuracy as a measure of biomarker concentration is limited by the confounding factors Dobrucki warns about (illumination, focus, bleaching). Raw Data gives a fuller description of the measured signal through multiple summary statistics, but it cannot correct those confounds; improving accuracy requires addressing them at the optical level.
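
The distinction can be illustrated with a small simulation. Everything here is assumed for the example: a true signal of 100, a 20% illumination falloff, independent Gaussian pixel noise with a standard deviation of 10, and 200 repeated measurements. Real pixels are spatially correlated, so the actual precision of a 500-pixel mean would be somewhat worse than this idealized case.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_SIGNAL = 100.0      # hypothetical true mean intensity of the nucleus
ILLUMINATION_GAIN = 0.8  # hypothetical 20% illumination falloff
PIXEL_NOISE_SD = 10.0    # hypothetical per-pixel noise
N_PIXELS = 500


def measure_mean():
    """One repeated measurement: the mean of 500 noisy, dimmed pixels."""
    observed = TRUE_SIGNAL * ILLUMINATION_GAIN
    pixels = [random.gauss(observed, PIXEL_NOISE_SD) for _ in range(N_PIXELS)]
    return statistics.fmean(pixels)


repeats = [measure_mean() for _ in range(200)]
spread = statistics.stdev(repeats)               # precision: small
bias = statistics.fmean(repeats) - TRUE_SIGNAL   # accuracy: off by about -20
```

The repeated means agree with each other to well under one intensity unit, yet all of them sit about 20 units below the true value. Averaging more pixels tightens the spread but leaves the bias untouched.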

Simplified

Every pixel is a sample of the cell's fluorescence. The mean is just one summary — the median, percentiles, and spread contain additional information. A cell with punctate (spotted) staining has a bimodal intensity distribution: some pixels are bright, others are dim. The mean would suggest moderate expression, hiding the biologically important pattern. Raw Data reveals these distributions, but the information is only meaningful when enough photons were collected to produce reliable pixel values.

Practical Example

Investigating heterogeneous PD-L1 expression in tumor cells:

  1. Standard Measurements shows mean PD-L1 intensity = 85 for a group of tumor cells
  2. Raw Data reveals two kinds of cell within that group: some have uniform intensity around 85, while others have a bimodal per-cell distribution, with half their pixels near 150 and half near 20
  3. The bimodal cells show punctate PD-L1 staining (membrane patches) rather than uniform membrane expression
  4. This distinction matters: punctate expression may indicate different biology (active PD-L1 clustering vs. uniform low-level expression)

Without Raw Data, both expression patterns produce the same mean intensity and would be classified identically. The distributional analysis distinguishes them.
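
A minimal sketch of how distributional features could separate the two patterns, using the numbers from the example. The coefficient of variation (standard deviation divided by mean) and the 0.5 cutoff are hypothetical choices for illustration, not a StrataQuest classification rule.

```python
import statistics


def describe(pixels):
    """Mean, standard deviation, and coefficient of variation for one cell."""
    mean = statistics.fmean(pixels)
    sd = statistics.pstdev(pixels)
    return {"mean": mean, "sd": sd, "cv": sd / mean}


# Uniform cell: every pixel close to 85.
uniform_cell = [80, 90] * 100
# Patchy cell: half the pixels near 150, half near 20.
patchy_cell = [150] * 100 + [20] * 100

CV_THRESHOLD = 0.5  # hypothetical cutoff; would need tuning on real data


def pattern(pixels):
    """Label a cell by how spread out its pixel intensities are."""
    return "punctate" if describe(pixels)["cv"] > CV_THRESHOLD else "uniform"


uniform_summary = describe(uniform_cell)
patchy_summary = describe(patchy_cell)
uniform_label = pattern(uniform_cell)
patchy_label = pattern(patchy_cell)
```

Both cells report a mean of 85, but the standard deviation is 5 for the uniform cell and 65 for the patchy one, so a spread-based feature separates them where the mean cannot.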

Simplified

Two cells with mean PD-L1 intensity of 85 might be very different: one has uniform staining, the other has bright patches and dark gaps. Raw Data reveals this difference by showing the full intensity distribution within each cell, not just the average.
