Methods

Detailed descriptions of the algorithms and statistical methods behind Predomics.

Overview

Predomics combines compact mathematical model languages with evolutionary optimization to discover interpretable predictive signatures from high-dimensional omics data. The methods span six key areas:

Model Languages & Scoring

The four model languages (Binary, Ternary, Ratio, Pow2), data type transformations (raw, prevalence, log), threshold optimization, confidence intervals, fitness functions and penalties.
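The following Python sketch shows how a sparse coefficient vector might be turned into a score under each language. It assumes the formulations summarized in Prifti et al. (2020): Binary sums the selected features, Ternary subtracts the negatively weighted sum from the positive one, and Ratio divides the two. Pow2 is assumed here to allow signed power-of-two coefficients. The function names, the `EPSILON` pseudo-count, and the accuracy-maximizing threshold rule are hypothetical illustration choices, not the Predomics API.

```python
import math
from typing import Dict, List, Sequence

EPSILON = 1e-5  # hypothetical pseudo-count: avoids log(0) and division by zero

def transform(value: float, data_type: str) -> float:
    """Apply one of the three data-type transformations to a raw abundance."""
    if data_type == "raw":
        return value
    if data_type == "prevalence":
        return 1.0 if value > 0 else 0.0  # presence/absence
    if data_type == "log":
        return math.log(value + EPSILON)
    raise ValueError(f"unknown data type: {data_type}")

def score(sample: Dict[str, float], coefs: Dict[str, int], language: str) -> float:
    """Score one sample with a sparse model given as {feature: coefficient}."""
    pos = sum(sample[f] * c for f, c in coefs.items() if c > 0)
    neg = sum(sample[f] * -c for f, c in coefs.items() if c < 0)
    if language == "binary":
        assert all(c in (0, 1) for c in coefs.values()), "binary allows only 0/1"
        return pos
    if language == "ternary":
        assert all(c in (-1, 0, 1) for c in coefs.values()), "ternary allows only -1/0/1"
        return pos - neg
    if language == "pow2":
        # Assumed: coefficients restricted to signed powers of two (±1, ±2, ±4, ...)
        assert all(c == 0 or (abs(c) & (abs(c) - 1)) == 0 for c in coefs.values())
        return pos - neg
    if language == "ratio":
        return pos / (neg + EPSILON)
    raise ValueError(f"unknown language: {language}")

def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the cut-off t (predict class 1 if score > t) maximizing training accuracy."""
    values = sorted(set(scores))
    # One cut below all scores, midpoints between neighbours, and one at the maximum
    cuts: List[float] = [values[0] - 1.0]
    cuts += [(a + b) / 2 for a, b in zip(values, values[1:])]
    cuts.append(values[-1])

    def accuracy(t: float) -> float:
        return sum((s > t) == (y == 1) for s, y in zip(scores, labels)) / len(labels)

    return max(cuts, key=accuracy)
```

For example, `score({"a": 2.0, "c": 1.0}, {"a": 1, "c": -1}, "ternary")` gives `1.0`. Deriving candidate cut-offs from the observed scores means the same search works regardless of a language's score range.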

Search Algorithms

The three optimization strategies (Genetic Algorithm, Beam Search, MCMC), cross-validation schemes (outer and inner folds, stratification), and feature pre-selection methods (Wilcoxon, t-test, Bayesian Fisher).
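A minimal sketch of stratified fold assignment and its nested (outer/inner) use follows, assuming samples are dealt round-robin within each class after a seeded shuffle. `stratified_folds` and `nested_cv` are hypothetical helpers for illustration; the splitting code in Predomics itself may differ.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(by_class):
        idx = by_class[cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            # Deal round-robin, continuing where the previous class stopped
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds

def nested_cv(labels: Sequence[int], k_outer: int, k_inner: int,
              seed: int = 0) -> Iterator[Tuple[List[int], List[int], List[List[int]]]]:
    """Yield (outer_train, outer_test, inner_folds); inner folds split outer_train only."""
    for f, test in enumerate(stratified_folds(labels, k_outer, seed)):
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        inner = stratified_folds([labels[i] for i in train], k_inner, seed + f + 1)
        # Map inner positions back to original sample indices
        yield train, test, [[train[p] for p in fold] for fold in inner]
```

Continuing the round-robin offset across classes keeps fold sizes within one sample of each other even when class counts are not divisible by k.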

Family of Best Models

The statistical definition of the FBM (binomial CI), feature prevalence analysis, co-presence testing (hypergeometric), z-score filtering, and jury/ensemble voting (majority, consensus, rejection).
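The sketch below illustrates three FBM-related computations under explicit assumptions. The FBM is taken as all models whose accuracy reaches the lower bound of a 95% normal-approximation (Wald) binomial interval around the best model's accuracy, a simplification that may not match the interval Predomics uses. Co-presence of two features across the FBM is tested with a one-sided hypergeometric tail, and the jury uses simple majority voting. All function names are hypothetical; consensus and rejection voting are not shown.

```python
import math
from typing import Dict, List, Sequence

def family_of_best_models(accuracies: Dict[str, float], n_samples: int,
                          z: float = 1.96) -> List[str]:
    """Keep models whose accuracy reaches the lower CI bound of the best model."""
    best = max(accuracies.values())
    # Wald (normal-approximation) interval for a binomial proportion
    lower = best - z * math.sqrt(best * (1 - best) / n_samples)
    return sorted(m for m, acc in accuracies.items() if acc >= lower)

def copresence_pvalue(n_models: int, n_a: int, n_b: int, n_ab: int) -> float:
    """P(X >= n_ab) for X ~ Hypergeometric(N=n_models, K=n_a, draws=n_b).

    n_a, n_b: number of FBM models containing feature a, feature b;
    n_ab: number of models containing both.
    """
    total = math.comb(n_models, n_b)
    upper = min(n_a, n_b)
    hits = sum(math.comb(n_a, i) * math.comb(n_models - n_a, n_b - i)
               for i in range(n_ab, upper + 1))
    return hits / total

def majority_vote(predictions: Sequence[int]) -> int:
    """Jury decision: class 1 if more than half of the members vote 1 (ties go to 0)."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0
```

For instance, with 100 samples and a best accuracy of 0.90, the Wald lower bound is about 0.841, so a model at 0.85 joins the family while one at 0.70 does not.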

Stability Analysis

Three stability indices (Tanimoto, Kuncheva, CW_rel) computed per sparsity level, hierarchical model clustering (Tanimoto distance, average linkage), feature × sparsity heatmaps, and dendrogram visualization. Inspired by the Shasha Cui internship (2017).
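Tanimoto similarity and Kuncheva's (2007) consistency index can be computed per sparsity level as in the following sketch, which groups models (reduced to their feature sets) by size and averages each index over all model pairs. Unlike Tanimoto, Kuncheva's index corrects for the overlap expected by chance between random subsets of size k drawn from n features. CW_rel is omitted and the helper names are hypothetical. The Tanimoto distance (1 minus the similarity) is what an average-linkage clustering, for example SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"`, would take as a condensed distance matrix.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Model = FrozenSet[str]  # a model reduced to its set of selected features

def tanimoto(a: Model, b: Model) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def kuncheva(a: Model, b: Model, n_features: int) -> float:
    """Kuncheva (2007) index for two subsets of equal size k drawn from n features."""
    k = len(a)
    if len(b) != k or k in (0, n_features):
        raise ValueError("requires equal sizes with 0 < k < n_features")
    r = len(a & b)
    # Observed overlap r corrected by its chance expectation k^2 / n
    return (r * n_features - k * k) / (k * (n_features - k))

def mean_pairwise(models: Sequence[Model], index: Callable[[Model, Model], float]) -> float:
    pairs = list(combinations(models, 2))
    return sum(index(a, b) for a, b in pairs) / len(pairs)

def stability_by_sparsity(models: Sequence[Model],
                          n_features: int) -> Dict[int, Tuple[float, float]]:
    """Map sparsity k -> (mean Tanimoto, mean Kuncheva) over models with k features."""
    groups: Dict[int, List[Model]] = defaultdict(list)
    for m in models:
        groups[len(m)].append(m)
    result: Dict[int, Tuple[float, float]] = {}
    for k, group in sorted(groups.items()):
        if len(group) < 2:
            continue  # a single model has no pairs to compare
        result[k] = (
            mean_pairwise(group, tanimoto),
            mean_pairwise(group, lambda a, b: kuncheva(a, b, n_features)),
        )
    return result
```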

Ecosystem Network

Co-abundance network construction (Spearman correlation, prevalence filtering), Louvain community detection, taxonomic coloring (SCAPIS palette), node metrics (degree, betweenness), FBM overlay, and class-specific networks. Inspired by Interpred (2019) and SCAPIS ecosystem work (2024).
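A minimal, dependency-free sketch of the first steps of network construction follows. Features are filtered by prevalence (here, the fraction of samples with non-zero abundance), pairwise Spearman correlations are computed on average ranks, and an edge is kept when |ρ| reaches a cut-off. The thresholds and helper names are hypothetical. Community detection is left out; one option would be to pass the resulting edges to `networkx.community.louvain_communities`, available in recent NetworkX releases.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0  # constant feature: no correlation

def coabundance_edges(abundance: Dict[str, Sequence[float]], min_prevalence: float = 0.1,
                      min_abs_rho: float = 0.5) -> List[Tuple[str, str, float]]:
    """Keep prevalent features, then link pairs whose |rho| reaches the cut-off."""
    kept = {f: v for f, v in abundance.items()
            if sum(x > 0 for x in v) / len(v) >= min_prevalence}
    edges = []
    for a, b in combinations(sorted(kept), 2):
        rho = spearman(kept[a], kept[b])
        if abs(rho) >= min_abs_rho:
            edges.append((a, b, rho))
    return edges

def degree(edges: Sequence[Tuple[str, str, float]]) -> Dict[str, int]:
    """Node degree: number of retained edges touching each feature."""
    deg: Dict[str, int] = {}
    for a, b, _ in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg
```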

Feature Importance & Evaluation

MDA permutation importance, SHAP-like per-sample explanations (beeswarm, force, dependence), population-level feature prevalence, evaluation metrics (AUC, MCC, F1, G-mean), cross-validation reporting, and external validation.
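The evaluation metrics and MDA (mean decrease in accuracy) importance can be sketched as follows. Here AUC is the rank-based (Mann-Whitney) estimate, and MDA is measured as the average drop in AUC after permuting one feature across samples. Using AUC rather than accuracy as the reference metric, the permutation count, and all function names are assumptions for illustration; samples are plain dictionaries of feature values.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, float]

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive scores above random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion(pred: Sequence[int], labels: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary predictions."""
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, labels))
    return tp, fp, tn, fn

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1(tp: int, fp: int, fn: int) -> float:
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def g_mean(tp: int, fp: int, tn: int, fn: int) -> float:
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def mda_importance(score_fn: Callable[[Sample], float], X: List[Sample],
                   labels: Sequence[int], feature: str,
                   n_perm: int = 20, seed: int = 0) -> float:
    """Average AUC drop when `feature` is shuffled across samples."""
    rng = random.Random(seed)
    baseline = auc([score_fn(s) for s in X], labels)
    drops = []
    for _ in range(n_perm):
        column = [s[feature] for s in X]
        rng.shuffle(column)
        permuted = [{**s, feature: v} for s, v in zip(X, column)]
        drops.append(baseline - auc([score_fn(s) for s in permuted], labels))
    return sum(drops) / n_perm
```

A feature the model never reads gets an importance of exactly zero, since permuting it leaves every score unchanged.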

Key References

Prifti, E. et al. (2020). Interpretable and accurate prediction scores for metagenomics data. GigaScience, 9(3). doi:10.1093/gigascience/giaa010
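Kuncheva, L. I. (2007). A stability index for feature selection. Proceedings of the 25th IASTED International Multi-Conference: Artificial Intelligence and Applications, 390-395.
Blondel, V. D. et al. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.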
Efron, B. & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.