Classification as partition: MIT's Statistical Models chapter frames classification as partitioning feature space into decision regions. Each phenotype definition carves out a region of the multi-dimensional marker space: the set of all cells with the specified marker combination. When the definitions are mutually exclusive and, together with a catch-all class for unmatched cells, exhaustive, they partition the marker space, so every cell belongs to exactly one phenotype. Well-designed phenotype panels produce partitions that align with biologically distinct cell populations.
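To make the partition idea concrete, here is a minimal sketch in Python. The phenotype names, marker rules, and the "Other" catch-all are illustrative assumptions, not definitions taken from the text; the point is only that non-overlapping rules plus a catch-all give every cell exactly one label.

```python
from typing import Dict, Mapping

# Hypothetical phenotype definitions: marker -> required status (True = positive).
# A marker that a definition does not list is ignored by that definition.
PHENOTYPES: Dict[str, Dict[str, bool]] = {
    "T helper": {"CD3": True, "CD4": True, "CD8": False, "CD20": False},
    "T cytotoxic": {"CD3": True, "CD4": False, "CD8": True, "CD20": False},
    "B cell": {"CD3": False, "CD20": True},
}


def assign_phenotype(cell: Mapping[str, bool]) -> str:
    """Return the single phenotype whose rule the cell satisfies, or "Other"."""
    matches = [
        name
        for name, rule in PHENOTYPES.items()
        # Markers missing from the cell record are treated as negative.
        if all(cell.get(marker, False) == wanted for marker, wanted in rule.items())
    ]
    if len(matches) > 1:
        # Overlapping definitions would break the partition property.
        raise ValueError(f"Definitions overlap for this cell: {matches}")
    # The catch-all class makes the definitions exhaustive.
    return matches[0] if matches else "Other"
```

A cell that is CD3+ CD4+ CD8- CD20- falls in the "T helper" region, while a CD3+ CD4+ CD8+ cell matches no rule and lands in "Other". Raising an error on overlap is a design choice: it surfaces definitions that fail to be mutually exclusive instead of silently picking one.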
The combinatorial explosion: With n binary markers, there are 2ⁿ possible marker combinations. A 6-marker panel has 64 possible phenotypes; a 10-marker panel has 1,024. In practice, most combinations are either biologically implausible (a cell is generally not expected to be both CD3+ and CD20+ in normal biology) or extremely rare. Effective phenotype design focuses on the 5-15 biologically meaningful combinations rather than trying to characterize every possible one.
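The counting argument can be checked directly. The sketch below enumerates every positive/negative combination for a panel and filters out implausible ones. The six marker names and the single exclusion rule (CD3+ together with CD20+) are assumptions chosen for illustration; a real panel would encode its own rules.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical 6-marker panel, used only to illustrate the counting.
PANEL = ["CD3", "CD4", "CD8", "CD20", "CD56", "CD68"]


def all_combinations(markers: Sequence[str]) -> List[Dict[str, bool]]:
    """Enumerate all 2**n positive/negative assignments for n binary markers."""
    return [
        dict(zip(markers, states))
        for states in product([False, True], repeat=len(markers))
    ]


def is_plausible(combo: Dict[str, bool]) -> bool:
    """Illustrative rule only: reject cells that are both CD3+ and CD20+."""
    return not (combo["CD3"] and combo["CD20"])


combos = all_combinations(PANEL)
plausible = [c for c in combos if is_plausible(c)]
```

Here `len(combos)` is 2**6 = 64, and the one exclusion rule alone removes the 2**4 = 16 combinations in which CD3 and CD20 are both positive. Further biological rules and rarity cutoffs would shrink the list toward the handful of phenotypes worth defining.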
Statistical methods for threshold optimization (Dilbilir): The quality of phenotype assignments depends entirely on the quality of individual marker gates. Each gate threshold determines the boundary between positive and negative for one marker. Dilbilir's statistical framework emphasizes that threshold optimization should minimize classification error across the population — not just separate the most obvious positive from the most obvious negative cells. Cells near the threshold (the "dim positive" or "equivocal" population) are the most error-prone and most important to get right.
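As a rough illustration of minimizing classification error across the population, the sketch below sweeps candidate thresholds for one marker and keeps the one that misclassifies the fewest cells. It assumes a set of reference cells with known positive/negative status is available (for example from controls); that assumption, the function name, and the exhaustive sweep are illustrative choices and are not presented as Dilbilir's actual procedure.

```python
from typing import Sequence, Tuple


def best_threshold(
    intensities: Sequence[float], is_positive: Sequence[bool]
) -> Tuple[float, int]:
    """Return (threshold, error_count) minimizing misclassified reference cells.

    A cell is called positive when its intensity is >= threshold.
    """
    if not intensities or len(intensities) != len(is_positive):
        raise ValueError("Need equally sized, non-empty intensity and label lists")

    values = sorted(set(intensities))
    # Candidates: one below the minimum, midpoints between adjacent distinct
    # values, and one above the maximum. These cover every distinct labeling.
    candidates = (
        [values[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(values, values[1:])]
        + [values[-1] + 1.0]
    )

    best_t, best_errors = candidates[0], len(intensities) + 1
    for t in candidates:
        errors = sum((x >= t) != label for x, label in zip(intensities, is_positive))
        if errors < best_errors:  # strict comparison: ties keep the lowest threshold
            best_t, best_errors = t, errors
    return best_t, best_errors


# Made-up reference data: one bright negative (2.6) overlaps one dim positive (2.4).
ref_intensity = [0.5, 1.0, 1.5, 2.6, 2.4, 3.0, 4.0, 5.0]
ref_label = [False, False, False, False, True, True, True, True]
threshold, n_errors = best_threshold(ref_intensity, ref_label)
```

In this made-up data the clearly negative and clearly positive cells are classified correctly by a wide range of thresholds; the remaining error comes entirely from the overlapping pair near the boundary, which is the equivocal population the paragraph above singles out.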
Phenotypes partition cells into biologically meaningful groups based on marker combinations. A 6-marker panel could theoretically produce 64 phenotypes, but biology restricts the meaningful ones to 5-15 cell types. The accuracy of phenotype assignment depends on how well each individual marker gate separates positive from negative cells, and most classification errors occur among cells near the threshold.