Family of Best Models (FBM)

Rather than selecting a single “best” model, Predomics identifies the Family of Best Models – a statistically defined population of all models whose performance is not significantly worse than the best.

Definition

Let N models have fitness scores f_1 >= f_2 >= … >= f_N (sorted descending). The FBM is defined using a binomial confidence interval around the best score f_1:

lower_bound = BinomialCI(f_1, N, alpha)
FBM = { model_i : f_i > lower_bound }

where alpha is the significance level (parameter best_models_criterion). A smaller alpha widens the confidence interval and thus produces a larger, more permissive FBM.

If the fitness metric is not in [0, 1], a fallback criterion selects the top 5% of models.
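The selection above can be sketched as follows. This is a minimal illustration, not the exact Predomics implementation: it assumes a normal-approximation (Wald) binomial confidence interval, whereas the package may use a different interval.

```python
import math
from statistics import NormalDist

def fbm_select(scores, alpha=0.05):
    """Select the Family of Best Models via a binomial CI on the best score.

    scores: fitness values assumed to lie in [0, 1] (e.g., accuracy).
    alpha: significance level; smaller alpha -> wider CI -> larger FBM.
    Uses a Wald binomial CI as a stand-in for the interval Predomics computes.
    """
    f = sorted(scores, reverse=True)
    n = len(f)
    best = f[0]
    # Two-sided normal quantile, e.g. 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = best - z * math.sqrt(best * (1 - best) / n)
    # Keep every model whose fitness is above the lower bound.
    return [s for s in f if s > lower]
```

With `fbm_select([0.90, 0.89, 0.88, 0.70, 0.60])`, the lower bound falls around 0.64, so the four best models form the FBM and the 0.60 model is excluded.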

Why a Family, Not a Single Model?

  1. Statistical equivalence: Many models may perform equally well within sampling noise. Selecting only the top model ignores this uncertainty.
  2. Feature robustness: Features appearing in many FBM models (high prevalence) are more reliable biomarkers than features appearing in only the top model.
  3. Biological insight: The diversity of models in the FBM reveals which features are interchangeable and which are essential.
  4. Ensemble prediction: The FBM provides a natural ensemble for jury/voting-based prediction.

FBM Analysis in PredomicsApp

Feature Prevalence

For each feature, its FBM prevalence is the fraction of FBM models that include it:

prevalence(feature_j) = count(models containing feature_j) / |FBM|

Features with prevalence > 80% are likely core biomarkers. Features with prevalence 20-80% may be interchangeable alternatives. Features with prevalence < 20% are accessory or context-dependent.
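Prevalence is a straightforward count over the FBM; a minimal sketch, assuming each model is represented as a set of feature names:

```python
def fbm_prevalence(fbm_models):
    """Fraction of FBM models containing each feature.

    fbm_models: list of sets of feature names, one set per model.
    Returns a dict mapping feature -> prevalence in [0, 1].
    """
    n = len(fbm_models)
    counts = {}
    for feats in fbm_models:
        for f in feats:
            counts[f] = counts.get(f, 0) + 1
    return {f: c / n for f, c in counts.items()}
```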

Population Heatmap

The FBM population is visualized as a heatmap:

  • Rows: Models (sorted by fitness or language)
  • Columns: Features (sorted by prevalence)
  • Color: Coefficient value (+1 blue, -1 red, 0 white)

This reveals which features co-occur, which are mutually exclusive, and whether distinct model families exist.

Co-Presence Analysis

Feature co-presence measures how often two features appear together in FBM models compared to chance:

  1. For each feature pair (A, B), count:
    • Both present: n_AB
    • Only A: n_A - n_AB
    • Only B: n_B - n_AB
    • Neither: N - n_A - n_B + n_AB
  2. Apply the hypergeometric test (Fisher’s exact test) to assess significance
  3. Features that co-occur more than expected are positively associated (potential functional modules)
  4. Features that co-occur less than expected may be functionally redundant (interchangeable)
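The 2x2 test above can be computed with only the standard library. This sketch implements a two-sided Fisher exact p-value from the hypergeometric distribution; the exact tie-breaking convention used by PredomicsApp is an assumption.

```python
from math import comb

def copresence_pvalue(n_ab, n_a, n_b, n):
    """Two-sided Fisher exact p-value for co-presence of features A and B.

    n_ab: FBM models containing both A and B.
    n_a, n_b: models containing A (resp. B); n: total FBM models.
    Sums the hypergeometric probabilities of all 2x2 tables with the same
    margins that are no more likely than the observed table.
    """
    def p_table(k):
        # P(both present = k) under independence, with margins fixed.
        return comb(n_a, k) * comb(n - n_a, n_b - k) / comb(n, n_b)

    k_min = max(0, n_a + n_b - n)
    k_max = min(n_a, n_b)
    p_obs = p_table(n_ab)
    return sum(p_table(k) for k in range(k_min, k_max + 1)
               if p_table(k) <= p_obs + 1e-12)
```

For example, if 5 of 10 models contain A, 5 contain B, and all 5 overlap, the test flags a strong positive association (p = 2/252, about 0.008).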

FBM Z-Score Filtering

The FBM can be further filtered using a z-score criterion to focus on models whose performance is within a statistical threshold of the mean:

z_i = (f_i - mean(FBM)) / std(FBM)

Models with z_i below a configurable threshold are excluded. This produces a tighter “core FBM” while still being more robust than single-model selection.
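A minimal sketch of the filter, assuming a hypothetical default threshold of -1 (the actual default is configurable):

```python
from statistics import mean, stdev

def core_fbm(fbm_scores, z_threshold=-1.0):
    """Keep FBM models whose z-score meets the threshold.

    fbm_scores: fitness values of the FBM models (at least two).
    z_threshold: minimum allowed z-score (assumed default: -1).
    """
    mu, sd = mean(fbm_scores), stdev(fbm_scores)
    return [f for f in fbm_scores if (f - mu) / sd >= z_threshold]
```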

Jury / Ensemble Voting

The FBM naturally produces an ensemble of expert models for prediction:

Majority Voting

Each FBM model independently predicts the class of a new sample. The final prediction is the majority vote, optionally weighted by model fitness:

prediction = argmax_c sum_i ( w_i * 1[vote_i = c] ),  c in {0, 1}

where w_i is the weight of model i (e.g., its AUC).
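The weighted vote can be sketched as below; the uniform default weighting is an assumption, since any per-model score (such as AUC) can be supplied instead.

```python
def jury_predict(votes, weights=None):
    """Weighted majority vote over binary class votes.

    votes: predicted class (0 or 1) from each FBM model.
    weights: per-model weights, e.g. AUC; uniform if None.
    """
    if weights is None:
        weights = [1.0] * len(votes)
    tally = {0: 0.0, 1: 0.0}
    for v, w in zip(votes, weights):
        tally[v] += w  # accumulate each expert's weight on its vote
    return max(tally, key=tally.get)
```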

Consensus Voting

Requires a minimum agreement level (e.g., 70% of experts) to make a prediction. If agreement is below the threshold, the sample is classified as “undecided” (rejection):

if max(agreement_0, agreement_1) < consensus_threshold:
    prediction = "rejected"
else:
    prediction = argmax(agreement_0, agreement_1)
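As a runnable version of the pseudocode above (the 70% default threshold mirrors the example in the text):

```python
def consensus_predict(votes, threshold=0.7):
    """Predict only when expert agreement reaches the threshold.

    votes: predicted class (0 or 1) from each FBM model.
    Returns 0 or 1, or "rejected" when agreement is insufficient.
    """
    n = len(votes)
    agree_1 = sum(votes) / n      # fraction of experts voting class 1
    agree_0 = 1 - agree_1
    if max(agree_0, agree_1) < threshold:
        return "rejected"
    return 1 if agree_1 > agree_0 else 0
```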

Vote Matrix

A detailed per-sample, per-model matrix showing how each expert voted on each sample. This allows identification of:

  • Unanimous samples: All experts agree (high confidence)
  • Controversial samples: Experts disagree (potentially misclassified or atypical)
  • Expert groups: Clusters of models that vote similarly
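The unanimous/controversial split can be derived directly from the vote matrix; a minimal sketch, assuming the matrix maps sample identifiers to lists of binary votes:

```python
def classify_samples(vote_matrix):
    """Label each sample as unanimous or controversial.

    vote_matrix: dict mapping sample id -> list of expert votes (0/1).
    """
    out = {}
    for sample, votes in vote_matrix.items():
        # Agreement = share of experts on the majority side.
        agree = max(sum(votes), len(votes) - sum(votes)) / len(votes)
        out[sample] = "unanimous" if agree == 1.0 else "controversial"
    return out
```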

Concordance Analysis

Measures pairwise agreement between jury experts:

  • Cohen’s kappa between model pairs
  • Overall concordance: Fraction of samples where all experts agree
  • Confusion matrix for the jury ensemble vs. true labels
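Cohen's kappa between a pair of experts corrects observed agreement for chance agreement; a standard computation for binary votes:

```python
def cohens_kappa(votes_a, votes_b):
    """Cohen's kappa between two experts' binary vote vectors."""
    n = len(votes_a)
    # Observed agreement: fraction of samples with identical votes.
    p_o = sum(a == b for a, b in zip(votes_a, votes_b)) / n
    # Chance agreement from each expert's marginal class rates.
    p_a1, p_b1 = sum(votes_a) / n, sum(votes_b) / n
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if p_e == 1.0:
        return 1.0  # degenerate case: both experts always vote the same class
    return (p_o - p_e) / (1 - p_e)
```

Kappa is 1 for identical experts, 0 for chance-level agreement, and negative when experts disagree more than chance would predict.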

References

  • Prifti, E. et al. (2020). Interpretable and accurate prediction scores for metagenomics data with Predomics. GigaScience, 9(3).
  • Cui, S. (2017). Mining sparse statistical learning models. Internship report, Ecole Polytechnique / ICAN.
