Co-Abundance Ecosystem Network
The ecosystem network visualizes microbial species as a co-abundance network, revealing ecological relationships (cooperation, competition, niche sharing) and their connection to disease-associated biomarkers.
The design is inspired by the Interpred approach (Magali Cousin-Thorez internship, 2019) and the SCAPIS ecosystem work (Edi Prifti, 2024).
Network Construction
Step 1: Feature Filtering by Prevalence
Before computing correlations, features are filtered by prevalence – the percentage of samples in which the feature is detected (value > 0):
prevalence(feature_j) = count(samples where F_j > 0) / N_samples * 100
Only features with prevalence >= min_prevalence_pct (default 30%) are retained. This removes rare species whose correlations would be unreliable due to excess zeros.
Rationale: In metagenomic data, many species are present in only a few samples. Correlations between two rarely-detected species are dominated by the (0, 0) pairs and do not reflect true ecological interactions.
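The filtering rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the tool's actual code: the function names, the dictionary layout (feature name mapped to per-sample values), and the toy feature names are all assumptions made for the example.

```python
def prevalence_pct(values):
    """Percentage of samples in which the feature is detected (value > 0)."""
    return 100.0 * sum(1 for v in values if v > 0) / len(values)


def filter_by_prevalence(abundance, min_prevalence_pct=30.0):
    """Keep features whose prevalence is >= min_prevalence_pct.

    `abundance` maps a feature name to its list of per-sample abundances
    (hypothetical layout chosen for this sketch).
    """
    return {
        name: values
        for name, values in abundance.items()
        if prevalence_pct(values) >= min_prevalence_pct
    }


# Toy data: three hypothetical features measured in 10 samples.
abundance = {
    "sp_common": [5, 3, 0, 8, 2, 0, 1, 4, 6, 7],  # detected in 8/10 samples
    "sp_border": [0, 0, 1, 0, 2, 0, 0, 3, 0, 0],  # detected in 3/10 samples
    "sp_rare": [0, 0, 0, 0, 9, 0, 0, 0, 0, 0],    # detected in 1/10 samples
}
kept = filter_by_prevalence(abundance)
print(sorted(kept))  # ['sp_border', 'sp_common']
```

With the default threshold of 30%, the feature detected in exactly 3 of 10 samples is retained because the comparison is inclusive.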
Step 2: Spearman Rank Correlation
Pairwise Spearman rank correlations are computed between all retained features:
rho(F_i, F_j) = Pearson correlation of rank(F_i), rank(F_j)
Spearman is preferred over Pearson because:
- Robust to outliers: Rank-based, not affected by extreme values
- Captures monotonic relationships: Not limited to linear associations
- Handles zeros: Common in sparse metagenomic data
For feature sets with at most 1000 features, the full correlation matrix is computed in a single call to scipy.stats.spearmanr. For larger sets, correlations are computed pair by pair.
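To make the definition concrete, the sketch below computes Spearman's rho directly from the formula above: rank both vectors (tied values share the mean of their ranks), then take the Pearson correlation of the ranks. It uses only the standard library and is meant as an illustration of the formula, not as a replacement for scipy.stats.spearmanr; the helper names are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / (sx * sy)


def spearman(x, y):
    """rho(x, y) = Pearson correlation of rank(x) and rank(y)."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship still gives rho close to 1.
abund_a = [1, 2, 3, 4]
abund_b = [1, 8, 27, 64]
rho = spearman(abund_a, abund_b)
```

Zeros are handled through the tie rule: in `[0, 0, 1, 4, 9]` the two zeros both receive rank 1.5.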
Step 3: Edge Filtering
Only edges where |rho| >= correlation_threshold (default 0.3) are retained. This produces a sparse network focused on the strongest associations.
- Positive correlations (rho > 0): Species that increase together (co-abundance, potential cooperation or shared niche)
- Negative correlations (rho < 0): Species that show inverse abundance patterns (competition, different niches, or mutual exclusion)
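As a rough sketch of this step, the function below walks the upper triangle of a precomputed correlation matrix and keeps the pairs that pass the threshold. The function name, the tuple layout of an edge, and the toy matrix are assumptions made for illustration.

```python
def filter_edges(features, rho, correlation_threshold=0.3):
    """Return (feature_i, feature_j, rho, sign) for pairs with |rho| >= threshold.

    `rho` is a symmetric matrix (list of lists) aligned with `features`.
    Only the upper triangle is visited, so each pair appears once.
    """
    edges = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            r = rho[i][j]
            if abs(r) >= correlation_threshold:
                sign = "positive" if r > 0 else "negative"
                edges.append((features[i], features[j], r, sign))
    return edges


# Toy correlation matrix for four hypothetical features.
features = ["A", "B", "C", "D"]
rho = [
    [1.00, 0.62, -0.10, 0.05],
    [0.62, 1.00, -0.45, 0.30],
    [-0.10, -0.45, 1.00, 0.29],
    [0.05, 0.30, 0.29, 1.00],
]
edges = filter_edges(features, rho)
for edge in edges:
    print(edge)
```

Here the C-D pair (rho = 0.29) falls just below the default threshold and is dropped, while B-C is kept as a negative edge.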
Step 4: Class-Specific Networks
Networks can be computed for:
- All samples: The overall ecological structure
- Class 0 only (e.g., healthy controls): The healthy ecosystem
- Class 1 only (e.g., patients): The disease-associated ecosystem
Comparing class-specific networks reveals how disease reorganizes the microbial ecosystem – which interactions are preserved, which are disrupted, and which emerge.
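One simple way to make this comparison concrete is to treat each network as a set of undirected edges and take set differences. The sketch below assumes class 0 is the reference, so "disrupted" means present only in class 0 and "emerging" means present only in class 1; that mapping, the function name, and the toy edges are assumptions, and changes in correlation sign are ignored here.

```python
def compare_networks(edges_class0, edges_class1):
    """Classify edges as preserved, disrupted, or emerging.

    Each edge is a (u, v) pair; frozenset makes (u, v) and (v, u) equal.
    """
    e0 = {frozenset(edge) for edge in edges_class0}
    e1 = {frozenset(edge) for edge in edges_class1}
    return {
        "preserved": e0 & e1,  # present in both classes
        "disrupted": e0 - e1,  # present in class 0 only
        "emerging": e1 - e0,   # present in class 1 only
    }


# Toy edge lists for two hypothetical class-specific networks.
healthy = [("A", "B"), ("B", "C"), ("C", "D")]
disease = [("B", "A"), ("C", "D"), ("D", "E")]
diff = compare_networks(healthy, disease)
for status, edge_set in diff.items():
    print(status, sorted(sorted(edge) for edge in edge_set))
```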
Community Detection (Louvain)
The network is partitioned into modules (communities) using the Louvain algorithm:
- Start with each node in its own community
- For each node, evaluate the modularity gain of moving it to each neighbor’s community
- Move the node to the community that maximizes modularity gain
- Repeat until no improvement is possible
- Build a new network where nodes are the communities, and repeat
The modularity score Q measures the quality of the partition:
Q = (1/2m) * sum_ij [ A_ij - (k_i * k_j) / (2m) ] * delta(c_i, c_j)
where:
- A_ij = adjacency matrix (weighted by |rho|)
- k_i = degree of node i
- m = total edge weight
- delta(c_i, c_j) = 1 if nodes i and j are in the same community, 0 otherwise
Interpreting Q:
- Q > 0.3: Significant community structure (typical for ecological networks)
- Q > 0.5: Strong modular organization
- Q close to 0: No clear community structure
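The modularity formula and the first Louvain phase (the node-moving loop) can be sketched together in plain Python. This is a deliberately naive illustration: it recomputes Q from scratch for every candidate move instead of using the incremental gain formula of the real algorithm, and it omits the aggregation step that rebuilds the network from communities. Function names and the toy graph are assumptions.

```python
def modularity(edges, community):
    """Q for an undirected weighted graph without self-loops.

    edges: list of (u, v, weight); community: dict node -> community label.
    """
    m = sum(w for _, _, w in edges)
    # Intra-community part of sum_ij A_ij: each undirected edge counts twice.
    intra = sum(2 * w for u, v, w in edges if community[u] == community[v])
    # sum_ij k_i * k_j * delta(c_i, c_j) = sum over communities of (total degree)^2.
    total_degree = {}
    for u, v, w in edges:
        for node in (u, v):
            label = community[node]
            total_degree[label] = total_degree.get(label, 0.0) + w
    expected = sum(t * t for t in total_degree.values()) / (2 * m)
    return (intra - expected) / (2 * m)


def local_moving(edges):
    """First Louvain phase: move nodes to the neighboring community that raises Q most."""
    neighbors = {}
    for u, v, _ in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    community = {node: node for node in sorted(neighbors)}  # one community per node
    improved = True
    while improved:
        improved = False
        for node in community:
            original = community[node]
            best_label, best_q = original, modularity(edges, community)
            candidates = {community[n] for n in neighbors[node]} - {original}
            for label in sorted(candidates):
                community[node] = label
                q = modularity(edges, community)
                if q > best_q + 1e-12:  # require a strict improvement
                    best_label, best_q = label, q
            community[node] = best_label
            if best_label != original:
                improved = True
    return community


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
edges = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0), ("c", "d", 1.0),
         ("d", "e", 1.0), ("d", "f", 1.0), ("e", "f", 1.0)]
partition = local_moving(edges)
q = modularity(edges, partition)
```

On this toy graph the loop recovers the two triangles as separate communities, with Q = 5/14 (about 0.36), whereas placing all nodes in a single community gives Q = 0.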
Ecological Interpretation
Modules represent ecological niches – groups of species that tend to co-occur and may share functional roles:
- Module with many Bacteroidota: Polysaccharide degradation niche
- Module with many Bacillota: Short-chain fatty acid production niche
- Module mixing phyla: Cross-phylum metabolic interactions
- Module disrupted in disease: Potential therapeutic target
Node Metrics
Degree
The number of edges connected to a node. High-degree species are hubs – central to the network and potentially keystone species.
Betweenness Centrality
The fraction of shortest paths between all pairs of nodes that pass through a given node:
BC(v) = sum_{s != v != t} ( sigma_st(v) / sigma_st )
where sigma_st is the total number of shortest paths from s to t, and sigma_st(v) is the number passing through v.
High-betweenness species are bridges between modules – removing them would fragment the network.
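Both node metrics can be illustrated on a small graph. The sketch below computes degree from an adjacency dictionary and betweenness with Brandes' accumulation scheme. It makes several assumptions that the text does not settle: shortest paths are counted in hops (edge weights ignored), each unordered pair (s, t) is counted once, and the result is not normalized.

```python
from collections import deque


def degree(adj):
    """Number of edges attached to each node."""
    return {v: len(adj[v]) for v in adj}


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest nodes toward s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
deg = degree(adj)
bc = betweenness(adj)
```

In this example c and d have the highest degree (3) and are the only nodes with non-zero betweenness, since every path between the two triangles crosses the c-d edge.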
Per-Class Prevalence
For each species, prevalence is computed separately in class 0 and class 1 samples. The enriched class is the one with higher prevalence:
enriched_class = 1 if prevalence_1 > prevalence_0 else 0
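A minimal sketch of the per-class computation, assuming one abundance vector per species and a parallel list of 0/1 class labels (both names are hypothetical). Note that, following the rule above, a tie in prevalence resolves to class 0.

```python
def class_prevalence(values, labels, cls):
    """Prevalence (%) of a feature among the samples labelled `cls`."""
    in_class = [v for v, y in zip(values, labels) if y == cls]
    return 100.0 * sum(1 for v in in_class if v > 0) / len(in_class)


def enriched_class(values, labels):
    """1 if the feature is more prevalent in class 1, otherwise 0 (ties give 0)."""
    prevalence_0 = class_prevalence(values, labels, 0)
    prevalence_1 = class_prevalence(values, labels, 1)
    return 1 if prevalence_1 > prevalence_0 else 0


# Toy data: 4 class-0 samples followed by 4 class-1 samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
values = [0, 0, 3, 0, 2, 5, 0, 1]  # detected in 1/4 of class 0 and 3/4 of class 1
print(class_prevalence(values, labels, 0))  # 25.0
print(class_prevalence(values, labels, 1))  # 75.0
print(enriched_class(values, labels))       # 1
```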
Mean Abundance
Average abundance across all samples (or class-specific samples). Reflects the overall contribution of the species to the community.
Taxonomic Coloring
SCAPIS Phylum Palette
A consistent color scheme assigns base colors to major bacterial phyla:
| Phylum | Color | Hex |
|---|---|---|
| Bacillota (Firmicutes) | Blue | #08519c |
| Bacteroidota | Red | #d73027 |
| Pseudomonadota (Proteobacteria) | Green | #1a9850 |
| Actinomycetota | Purple | #ae017e |
| Verrucomicrobiota | Orange | #f16913 |
| Fusobacteriota | Brown | #8c6d31 |
Family-Level Shading
Within each phylum, families are distinguished by lightening or darkening the phylum base color:
family_color = lighten(phylum_color, amount) // for family 1
family_color = darken(phylum_color, amount) // for family 2
The amount is spaced evenly across families within the phylum, producing a gradient from light to dark. This ensures:
- Each family has a visually distinct color
- Families within the same phylum are perceptually grouped
- The color scheme is consistent across analyses
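The shading idea can be sketched as follows. The text does not define lighten and darken precisely, so this example assumes they blend the base color toward white or black, and that the shift runs evenly from +max_shift (lightest) to -max_shift (darkest) across the families of a phylum. The function names and the max_shift value of 0.4 are illustrative assumptions.

```python
def hex_to_rgb(hex_color):
    """'#d73027' -> (215, 48, 39)."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def shade(hex_color, shift):
    """shift > 0 blends toward white (lighten); shift < 0 toward black (darken)."""
    target = 255 if shift > 0 else 0
    t = abs(shift)
    return rgb_to_hex(tuple(round(c + (target - c) * t) for c in hex_to_rgb(hex_color)))


def family_colors(phylum_hex, families, max_shift=0.4):
    """Assign each family an evenly spaced shade of the phylum base color."""
    n = len(families)
    if n == 1:
        return {families[0]: phylum_hex}
    colors = {}
    for i, family in enumerate(families):
        # Evenly spaced from +max_shift (first family) to -max_shift (last family).
        shift = max_shift - 2 * max_shift * i / (n - 1)
        colors[family] = shade(phylum_hex, shift)
    return colors


# Three Bacteroidota families shaded around the phylum base color #d73027.
colors = family_colors("#d73027", ["Bacteroidaceae", "Prevotellaceae", "Rikenellaceae"])
for family, color in colors.items():
    print(family, color)
```

With an odd number of families, the middle one keeps the unmodified phylum color, and the others fan out symmetrically toward lighter and darker shades.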
Color Modes
The network supports three coloring modes:
- Taxonomy: Nodes colored by family (within phylum gradient)
- Module: Nodes colored by Louvain community (12-color palette)
- Enrichment: Nodes colored by which class they are enriched in
FBM Overlay
When a completed job is selected, the network can be annotated with data from the Family of Best Models:
FBM Prevalence
For each feature in the network, compute the fraction of FBM models that include it:
fbm_prevalence(feature) = count(FBM models containing feature) / |FBM|
Node size or opacity can be scaled by FBM prevalence, highlighting features that are consistently selected as biomarkers.
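As a small illustration of the formula, the sketch below represents each FBM model as the set of feature names it contains. That representation, the function name, and the toy models are assumptions; the actual FBM data structure is not described here.

```python
def fbm_prevalence(fbm_models, features):
    """Fraction of FBM models that include each feature.

    fbm_models: list of sets of feature names (one set per model).
    """
    return {
        feature: sum(1 for model in fbm_models if feature in model) / len(fbm_models)
        for feature in features
    }


# Toy family of four models over hypothetical features.
fbm_models = [
    {"sp_a", "sp_b"},
    {"sp_a"},
    {"sp_a", "sp_b"},
    {"sp_c"},
]
prevalence = fbm_prevalence(fbm_models, ["sp_a", "sp_b", "sp_c", "sp_d"])
print(prevalence)  # {'sp_a': 0.75, 'sp_b': 0.5, 'sp_c': 0.25, 'sp_d': 0.0}
```

A network feature that never appears in the FBM gets prevalence 0, so it would be drawn at the minimum size or opacity.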
Dominant Coefficient
The dominant coefficient is the sign of the summed coefficients across FBM models:
coefficient = sign( sum(coefficients across FBM models) )
- +1: Feature typically contributes positively (enriched in disease)
- -1: Feature typically contributes negatively (enriched in controls)
Node shape encodes this: squares for +1 (disease-associated), circles for -1 (health-associated).
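The sign rule and the shape encoding can be sketched as below, assuming each FBM model is available as a dictionary from feature name to coefficient (a hypothetical layout). The text does not say how a sum of exactly zero is drawn, so this sketch returns None for the shape in that case.

```python
def dominant_coefficient(fbm_models, feature):
    """Sign of the summed coefficient of `feature` across FBM models.

    fbm_models: list of dicts mapping feature name -> coefficient.
    Models that do not contain the feature contribute 0 to the sum.
    """
    total = sum(model.get(feature, 0) for model in fbm_models)
    return (total > 0) - (total < 0)  # +1, -1, or 0 when the sum is exactly zero


def node_shape(direction):
    """Squares for +1 and circles for -1; None when the direction is 0."""
    return {1: "square", -1: "circle"}.get(direction)


# Toy family of four models with hypothetical features and coefficients.
fbm_models = [
    {"sp_a": 1, "sp_b": -1},
    {"sp_a": 1},
    {"sp_a": -1, "sp_b": -1},
    {"sp_c": 1},
]
for feature in ("sp_a", "sp_b", "sp_c"):
    direction = dominant_coefficient(fbm_models, feature)
    print(feature, direction, node_shape(direction))
```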
Bridging Ecological and Predictive Views
The FBM overlay connects two perspectives:
- Ecological: Which species co-occur, which compete, what modules exist
- Predictive: Which species are selected by the algorithms, with what coefficients
This reveals whether biomarkers cluster in specific ecological modules (suggesting niche-level disruption) or are scattered across the network (suggesting diffuse changes).
Layout Algorithms
Four layout algorithms are available for positioning nodes:
| Layout | Algorithm | Best for |
|---|---|---|
| Organic | Fruchterman-Reingold with simulated annealing | General purpose, reveals clusters naturally |
| Force-directed | Spring-electrical model | Large networks, even spacing |
| Circle | Nodes arranged on a circle by module | Module comparison, clean visualization |
| Radial | Hub nodes at center, others radiate outward | Highlighting central species |
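Of the four layouts, the circle layout is the simplest to illustrate. The sketch below sorts nodes by module so that members of the same module sit next to each other, then spaces them evenly on a circle. The function name, the radius parameter, and the within-module ordering (alphabetical) are assumptions; the actual implementation may order or space nodes differently.

```python
from math import cos, pi, sin


def circle_layout(modules, radius=1.0):
    """Place nodes evenly on a circle, grouped by module.

    modules: dict mapping node name -> module id.
    Returns a dict mapping node name -> (x, y).
    """
    ordered = sorted(modules, key=lambda node: (modules[node], node))
    n = len(ordered)
    return {
        node: (radius * cos(2 * pi * i / n), radius * sin(2 * pi * i / n))
        for i, node in enumerate(ordered)
    }


# Toy assignment of six hypothetical nodes to two modules.
modules = {"a": 0, "d": 1, "b": 0, "e": 1, "c": 0, "f": 1}
positions = circle_layout(modules)
for node, (x, y) in positions.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

Because nodes are sorted by module before placement, each module occupies one contiguous arc of the circle.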
Interactive Controls
- Prevalence threshold (10-80%): Filter rare species
- Correlation threshold (0.1-0.8): Filter weak associations
- Class filter: All / Class 0 / Class 1
- Color mode: Taxonomy / Module / Enrichment
- Layout: Organic / Force-directed / Circle / Radial
- FBM overlay toggle: Annotate with model data
- Module click: Highlight all nodes in a module
References
- Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
- Fruchterman, T. M. J. & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience, 21(11), 1129-1164.
- Cousin-Thorez, M. (2019). Interpred: Interpretation of predictive models. Internship report, ICAN.
- Prifti, E. (2024). SCAPIS ecosystem analysis. Internal report.