Feature selection criteria (Solomon & Breckon): Good classification features should have two properties: "(i) distribution over classes should be as widely separated as possible and (ii) features should be statistically independent of each other." The first criterion means the feature actually discriminates between classes (e.g., CD3 intensity separates T cells from tumor cells). The second means each feature adds new information: if two features are perfectly correlated, one of them is redundant.
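Both criteria are easy to check numerically. A minimal sketch on synthetic per-cell measurements (the feature names and distributions below are illustrative, not from the text): class separation measured as the mean difference in pooled-standard-deviation units, and redundancy via the correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-cell features (names are illustrative assumptions):
# cd3 separates the two classes; area barely does; perimeter is
# nearly redundant with area (highly correlated).
n = 200
labels = np.repeat([0, 1], n)                     # 0 = tumor, 1 = T cell
cd3 = np.concatenate([rng.normal(1.0, 0.5, n),    # tumor: low CD3
                      rng.normal(4.0, 0.5, n)])   # T cell: high CD3
area = rng.normal(50, 10, 2 * n)                  # uninformative
perimeter = 0.9 * area + rng.normal(0, 1, 2 * n)  # redundant with area

def separation(f, y):
    """Absolute difference of class means in pooled-std units."""
    a, b = f[y == 0], f[y == 1]
    pooled = np.sqrt((a.var() + b.var()) / 2)
    return abs(a.mean() - b.mean()) / pooled

print(f"CD3 separation:   {separation(cd3, labels):.2f}")   # large
print(f"area separation:  {separation(area, labels):.2f}")  # near zero
print(f"corr(area, perimeter): {np.corrcoef(area, perimeter)[0, 1]:.2f}")
```

A feature like `perimeter` here fails criterion (ii): its correlation with `area` is close to 1, so keeping both adds almost nothing.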
The Bayesian foundation: All classification can be framed as Bayesian inference: assign cell x to class C_j if P(C_j|x) is highest, where P(C_j|x) ∝ P(x|C_j) × P(C_j). Naive Bayes applies this rule directly, assuming the features are conditionally independent given the class. Other algorithms make different structural assumptions rather than modeling the posterior explicitly: SVM finds the hyperplane that maximizes the margin between classes, and Random Forest builds many decision trees on random feature subsets and takes a majority vote.
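The Bayes rule above can be implemented in a few lines. A minimal Gaussian naive Bayes sketch on synthetic two-feature data (the class means and variances are invented for illustration): the posterior for each class is the prior times the product of per-feature Gaussian likelihoods, computed in log space for stability.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two classes, two features, Gaussian per class (synthetic stand-in data).
n = 300
X0 = rng.normal([1.0, 2.0], [0.5, 0.5], (n, 2))   # class 0
X1 = rng.normal([3.0, 4.0], [0.5, 0.5], (n, 2))   # class 1
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

# "Fit" = estimate per-class priors, feature means, and feature variances.
classes = np.unique(y)
priors = np.array([(y == c).mean() for c in classes])
means = np.array([X[y == c].mean(axis=0) for c in classes])
vars_ = np.array([X[y == c].var(axis=0) for c in classes])

def predict(x):
    # log P(C_j|x) = log P(C_j) + sum_i log N(x_i; mu_ji, var_ji) + const
    log_post = np.log(priors) - 0.5 * np.sum(
        np.log(2 * np.pi * vars_) + (x - means) ** 2 / vars_, axis=1)
    return classes[np.argmax(log_post)]

acc = np.mean([predict(x) == t for x, t in zip(X, y)])
print(f"training accuracy: {acc:.3f}")
```

The conditional-independence assumption is what lets the joint likelihood factor into a product over features; SVM and Random Forest avoid this assumption at the cost of a less direct probabilistic interpretation.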
Texture features — Haralick and beyond: Solomon & Breckon describe Haralick texture features derived from the Gray-Level Co-occurrence Matrix (GLCM): contrast, correlation, energy, homogeneity. These capture spatial patterns in intensity — a cell with granular staining has high GLCM contrast; a cell with uniform staining has high homogeneity. Texture features are particularly valuable when two cell types have similar mean intensities but different staining patterns (e.g., diffuse cytoplasmic vs. punctate vesicular).
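The GLCM itself is simple to build. A minimal numpy sketch for a single offset (the horizontal right-neighbor, one of several offsets Haralick features average over), with contrast and homogeneity computed from the normalized matrix; the two toy patches stand in for uniform vs. granular staining:

```python
import numpy as np

def glcm(img, levels):
    """Co-occurrence probabilities for the horizontal (0, 1) offset only."""
    m = np.zeros((levels, levels))
    a, b = img[:, :-1].ravel(), img[:, 1:].ravel()
    np.add.at(m, (a, b), 1)          # count gray-level pairs (a, b)
    return m / m.sum()               # normalize to joint probabilities

def contrast(p):
    i, j = np.indices(p.shape)
    return np.sum(p * (i - j) ** 2)  # weights distant gray-level pairs

def homogeneity(p):
    i, j = np.indices(p.shape)
    return np.sum(p / (1 + np.abs(i - j)))  # weights near-diagonal pairs

# Two toy 4-level patches: uniform vs. granular (checkerboard) staining.
uniform = np.full((8, 8), 2, dtype=int)
granular = np.indices((8, 8)).sum(axis=0) % 2 * 3  # alternating 0 and 3

for name, patch in [("uniform", uniform), ("granular", granular)]:
    p = glcm(patch, levels=4)
    print(f"{name}: contrast={contrast(p):.2f}, homogeneity={homogeneity(p):.2f}")
```

The uniform patch scores zero contrast and maximal homogeneity; the checkerboard is the reverse. In practice one would use a library implementation (e.g., scikit-image's `graycomatrix`/`graycoprops`) rather than this sketch.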
The curse of dimensionality: With dozens of features available (6 channels × 3 compartments × 5 statistics = 90 features), the feature space is high-dimensional. Classification accuracy can paradoxically decrease as more features are added if the training set is too small (overfitting). The Fisher Linear Discriminant approach — projecting high-dimensional data onto the axis that maximally separates classes — is the theoretical foundation for feature selection: keep the features that contribute most to class separation.
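A common per-feature form of this idea is the two-class Fisher score, (μ₀ − μ₁)² / (σ₀² + σ₁²): large when class means are far apart relative to within-class spread. A sketch on synthetic data where only 3 of 90 features carry signal (the dimensions mirror the channel × compartment × statistic count above; the data itself is invented):

```python
import numpy as np

rng = np.random.default_rng(2)

# 90 synthetic features; only the first 3 carry class information.
n, d = 100, 90
y = np.repeat([0, 1], n)
X = rng.normal(0, 1, (2 * n, d))
X[y == 1, :3] += 2.0  # shift class-1 means on the 3 informative features

# Two-class Fisher score per feature: (mu0 - mu1)^2 / (var0 + var1).
mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
v0, v1 = X[y == 0].var(0), X[y == 1].var(0)
fisher = (mu0 - mu1) ** 2 / (v0 + v1)

top = np.argsort(fisher)[::-1][:3]
print("top-ranked features:", sorted(top.tolist()))  # the informative ones rank first
```

Ranking by Fisher score and keeping the top handful is a cheap filter-style alternative to training a full Fisher Linear Discriminant, and it directly addresses the small-training-set overfitting risk by shrinking the feature space before the classifier sees it.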
Asymmetric error costs: Solomon & Breckon note: "the cost of a false-negative (abnormal classified as normal) is considerably greater than the other kind of misclassification." In tissue analysis, misclassifying a tumor cell as a lymphocyte (false negative for tumor) may have different consequences than the reverse. Classification thresholds can be adjusted to favor sensitivity (catch all tumor cells) or specificity (don't misclassify any lymphocytes) depending on the clinical context.
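Operationally, this trade-off is made by moving the decision threshold on the classifier's tumor-probability output. A sketch with synthetic scores (the score distributions are invented stand-ins for a real classifier's P(tumor|x)):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic tumor scores: stand-ins for a classifier's P(tumor | x).
n = 500
y = rng.random(n) < 0.3                              # True = tumor cell
scores = np.clip(rng.normal(np.where(y, 0.7, 0.3), 0.15), 0, 1)

def rates(threshold):
    pred = scores >= threshold
    sens = (pred & y).sum() / y.sum()        # fraction of tumor cells caught
    spec = (~pred & ~y).sum() / (~y).sum()   # fraction of lymphocytes kept
    return sens, spec

for t in (0.3, 0.5, 0.7):
    sens, spec = rates(t)
    print(f"threshold {t:.1f}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold buys sensitivity (catch all tumor cells) at the cost of specificity, and vice versa; which end of the sweep to operate at is exactly the clinical-context decision the paragraph describes.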
The classifier learns to distinguish cell types by finding the combination of measurements that best separates them — like learning to distinguish apple varieties by combining color, size, and texture rather than any single feature. Random Forest is often the best starting choice because it handles complex patterns, doesn't require feature scaling, and is robust to irrelevant features. The main risk is overfitting: using too many features with too few training examples makes the model memorize the training data rather than learning generalizable patterns.
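The memorization failure mode is easy to demonstrate. The sketch below uses a 1-nearest-neighbour classifier as a dependency-free stand-in (not Random Forest): with a tiny training set and 90 pure-noise features, it scores perfectly on the data it has seen and near chance on fresh data.

```python
import numpy as np

rng = np.random.default_rng(4)

def make_data(n, d=90):
    X = rng.normal(0, 1, (n, d))   # all 90 features are noise
    y = rng.integers(0, 2, n)      # labels carry no real signal
    return X, y

def knn1_predict(X_train, y_train, X):
    # Each query point takes the label of its nearest training point.
    d2 = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d2.argmin(axis=1)]

X_tr, y_tr = make_data(20)    # tiny training set
X_te, y_te = make_data(200)   # fresh data

train_acc = (knn1_predict(X_tr, y_tr, X_tr) == y_tr).mean()
test_acc = (knn1_predict(X_tr, y_tr, X_te) == y_te).mean()
print(f"train accuracy: {train_acc:.2f}")   # memorised: 1.00
print(f"test accuracy:  {test_acc:.2f}")    # near chance
```

The gap between training and held-out accuracy is the overfitting signal; the same diagnostic (cross-validation) applies unchanged when the classifier is a Random Forest.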