Feature Importance & Evaluation
Predomics provides multiple methods for assessing feature importance and evaluating model performance, from individual feature contributions to population-level prevalence analysis.
Individual Feature Importance
Mean Decrease in Accuracy (MDA)
MDA is a permutation-based importance metric that measures how much each feature contributes to the model’s predictive power:
- Compute the baseline AUC of the model on the test data
- For each feature j in the model:
  a. Randomly shuffle the values of feature j across samples (N permutations, default N=100)
  b. Recompute the AUC on the shuffled data
  c. MDA(j) = mean(baseline_AUC - shuffled_AUC)
- Features with high MDA are critical; removing them destroys predictive accuracy
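The procedure above can be sketched as follows. The function names and defaults are illustrative, not the Predomics internals; the AUC helper is a standard rank-based (Mann-Whitney) formulation:

```python
import numpy as np

def mean_decrease_auc(score_fn, X, y, n_perm=100, rng=None):
    """Permutation importance: mean drop in AUC when one feature is shuffled.

    score_fn(X) must return a decision score per sample (higher = class 1).
    """
    rng = np.random.default_rng(rng)

    def auc(scores):
        # Rank-based AUC (Mann-Whitney U), equal to the area under the ROC.
        order = scores.argsort()
        ranks = np.empty_like(order, dtype=float)
        ranks[order] = np.arange(1, len(scores) + 1)
        n1 = y.sum()
        n0 = len(y) - n1
        return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

    baseline = auc(score_fn(X))
    mda = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_perm):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j's link to y
            drops.append(baseline - auc(score_fn(Xp)))
        mda[j] = np.mean(drops)
    return mda
```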
Interpretation:
- MDA > 0: Feature contributes positively to the model
- MDA ~ 0: Feature is irrelevant or redundant (its information is captured by other features)
- MDA < 0: Feature may be adding noise (rare, but possible)
Limitations:
- MDA evaluates each feature independently – it cannot capture feature interactions
- If two features are highly correlated, shuffling one may not decrease AUC because the other compensates
- Despite these limitations, MDA is used internally for feature pruning: features with zero or negative importance can be automatically removed to simplify the model
SHAP-like Per-Sample Explanations
For each sample, the contribution of each feature to the predicted score is decomposed:
contribution(feature_j, sample_s) = coefficient_j * value(feature_j, sample_s)
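Because the model is additive, the decomposition is literally coefficient times value. A minimal sketch with made-up ternary coefficients and abundances:

```python
import numpy as np

# Ternary coefficients of a hypothetical 3-feature model (+1 / -1).
coefficients = np.array([1, -1, 1])

# Feature abundances for two samples (rows = samples, columns = features).
X = np.array([[0.4, 0.1, 0.0],
              [0.0, 0.3, 0.2]])

# Per-sample, per-feature contribution to the decision score.
contributions = coefficients * X      # broadcasting over rows
scores = contributions.sum(axis=1)    # the model's additive score
```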
This decomposition provides a per-sample, per-feature breakdown of the prediction, enabling:
Beeswarm Plot
- Each dot represents one sample’s SHAP value for one feature
- X-axis: SHAP value (contribution to score)
- Y-axis: Features (sorted by overall importance)
- Color: Feature value (low = blue, high = red)
Reveals the direction of each feature’s effect: if high values (red dots) lie on the positive side, the feature pushes toward disease; if low values (blue dots) lie on the positive side, high values of that feature push toward health.
Force Plot (Waterfall)
- For a selected sample, shows the sequential contribution of each feature
- Starting from a base value, each feature pushes the score up or down
- The final score determines the classification
Useful for explaining individual predictions: “This patient was classified as disease because feature A was high (+0.3), feature B was present (+0.2), despite feature C being low (-0.1).”
Dependence Plot
- X-axis: Feature value
- Y-axis: SHAP value for that feature
- One dot per sample
Reveals non-linear relationships between feature values and their contribution to the model. A flat line means the feature’s effect is constant; a curve means the effect changes with abundance.
Coefficient Direction
A simpler importance view that shows the model’s coefficients directly:
- Positive coefficients (+1, or higher in Pow2): Feature enriched in class 1 (disease)
- Negative coefficients (-1, or lower in Pow2): Feature enriched in class 0 (controls)
The coefficient direction chart provides an immediate visual summary of which features push toward which class.
Population-Level Feature Analysis
Feature Prevalence in the FBM
For each feature, its prevalence across the Family of Best Models measures its consistency as a biomarker:
prevalence(feature_j) = count(FBM models containing feature_j) / |FBM|
Features are ranked by prevalence and displayed as:
- Prevalence bar chart: Horizontal bars showing prevalence per feature
- Population heatmap: Models (rows) x features (columns), colored by coefficient
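The prevalence computation itself is a simple count over model feature sets; a sketch with a made-up FBM of four models (feature names are illustrative):

```python
from collections import Counter

# Feature sets of a hypothetical Family of Best Models (FBM).
fbm = [
    {"Prevotella", "Bacteroides", "Roseburia"},
    {"Prevotella", "Roseburia"},
    {"Prevotella", "Faecalibacterium"},
    {"Prevotella", "Bacteroides"},
]

# prevalence(feature_j) = count(models containing feature_j) / |FBM|
counts = Counter(f for model in fbm for f in model)
prevalence = {f: n / len(fbm) for f, n in counts.items()}
ranked = sorted(prevalence.items(), key=lambda kv: -kv[1])
```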
FBM Feature Categories
| Prevalence | Category | Interpretation |
|---|---|---|
| > 80% | Core biomarker | Almost always selected. Very robust signal. |
| 50-80% | Frequent biomarker | Selected in most models. Reliable but may have alternatives. |
| 20-50% | Accessory feature | Selected in some models. May be context-dependent or interchangeable. |
| < 20% | Rare feature | Only in a few models. Possibly noise or specific to certain k values. |
Feature x Sparsity Heatmap
Shows how feature prevalence varies across model sizes k:
- A feature with high prevalence at k=3 and k=5 but low at k=10 is a core feature that gets diluted at higher sparsity
- A feature with increasing prevalence as k grows is secondary – only included when the model has room for extra features
Evaluation Metrics
Classification Metrics
| Metric | Formula | Range | Interpretation |
|---|---|---|---|
| AUC | Area under ROC | [0.5, 1] | Discrimination ability across all thresholds |
| Accuracy | (TP+TN) / N | [0, 1] | Overall correctness (affected by class imbalance) |
| Sensitivity | TP / (TP+FN) | [0, 1] | Ability to detect positives (recall) |
| Specificity | TN / (TN+FP) | [0, 1] | Ability to detect negatives |
| MCC | (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) | [-1, 1] | Balanced metric, robust to class imbalance |
| F1 Score | 2*Prec*Sens / (Prec+Sens) | [0, 1] | Harmonic mean of precision and sensitivity |
| G-mean | sqrt(Sens * Spec) | [0, 1] | Geometric mean, penalizes imbalanced performance |
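All of these metrics derive from the four confusion-matrix counts; a small sketch (the function name is illustrative):

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the table's metrics from confusion-matrix counts."""
    n = tp + tn + fp + fn
    sens = tp / (tp + fn)          # sensitivity / recall
    spec = tn / (tn + fp)          # specificity
    prec = tp / (tp + fp)          # precision
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": sens,
        "specificity": spec,
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
        "f1": 2 * prec * sens / (prec + sens),
        "g_mean": math.sqrt(sens * spec),
    }
```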
Confusion Matrix
The standard 2x2 confusion matrix is extended with a rejection class when threshold confidence intervals are enabled:
| | Predicted 0 | Predicted 1 | Rejected |
|---|---|---|---|
| Actual 0 | TN | FP | Rejected-0 |
| Actual 1 | FN | TP | Rejected-1 |
Rejected samples have scores falling within the threshold confidence interval.
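The rejection rule can be sketched as follows, assuming a symmetric confidence band of some half-width around the threshold (the half-width stands in for the interval Predomics estimates; names are illustrative):

```python
import numpy as np

def predict_with_rejection(scores, threshold, ci_half_width):
    """Classify by score vs. threshold; reject scores inside the CI band.

    Returns 1 / 0 / -1, where -1 marks a rejected sample.
    """
    scores = np.asarray(scores, dtype=float)
    pred = (scores > threshold).astype(int)
    pred[np.abs(scores - threshold) <= ci_half_width] = -1
    return pred
```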
Cross-Validation Reporting
Performance is reported at multiple levels:
- Per-fold: Train and validation metrics for each outer CV fold
- Aggregated: Mean and standard deviation across folds
- Final: Performance of the combined FBM on full training data
- External test: Performance on held-out test data (if provided)
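The aggregated level is simply a mean and standard deviation over fold metrics; e.g. with made-up validation AUCs from a 5-fold outer CV:

```python
import statistics

# Validation AUCs of a hypothetical 5-fold outer CV.
fold_aucs = [0.81, 0.78, 0.84, 0.80, 0.77]

mean_auc = statistics.mean(fold_aucs)
sd_auc = statistics.stdev(fold_aucs)  # sample standard deviation
```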
Generation Tracking
During GA optimization, train and test AUC are tracked at each generation to detect:
- Convergence: AUC stabilizes (good stopping criterion)
- Overfitting: Train AUC increases but test AUC decreases or stagnates
- Underfitting: Both train and test AUC are low (may need more generations or different parameters)
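A simple heuristic for flagging the overfitting pattern above (not the package's actual stopping rule; the function name and window are illustrative):

```python
def overfitting_onset(train_auc, test_auc, window=5):
    """Return the first generation where test AUC has not improved for
    `window` generations while train AUC kept rising; None otherwise."""
    best_test = test_auc[0]
    stale = 0
    for g in range(1, len(test_auc)):
        if test_auc[g] > best_test:
            best_test = test_auc[g]
            stale = 0
        else:
            stale += 1
        if stale >= window and train_auc[g] > train_auc[g - window]:
            return g
    return None
```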
External Validation
Models trained on one dataset can be validated on an independent cohort:
- Upload external Xtest and Ytest files
- Apply the best model (or jury ensemble) to the new data
- Compare AUC, sensitivity, specificity on the external cohort
This is the gold standard for assessing generalization and is essential before claiming clinical utility.
Multi-Cohort Meta-Analysis
The comparative/meta-analysis view allows side-by-side comparison of multiple jobs:
- Overlaid metrics: AUC, accuracy, sensitivity across different runs
- Feature overlap: Venn diagram of features selected by different runs
- Cross-cohort biomarkers: Features consistently selected across datasets are the strongest candidates for generalization
References
- Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32. (Original MDA formulation)
- Lundberg, S. M. & Lee, S.-I. (2017). A unified approach to interpreting model predictions. NeurIPS.
- Matthews, B. W. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta, 405(2), 442-451.