Co-Abundance Ecosystem Network

The ecosystem network visualizes microbial species as a co-abundance network, revealing ecological relationships (cooperation, competition, niche sharing) and their connection to disease-associated biomarkers.

Inspired by the Interpred approach (Magali Cousin-Thorez internship, 2019) and the SCAPIS ecosystem work (Edi Prifti, 2024).

Network Construction

Step 1: Feature Filtering by Prevalence

Before computing correlations, features are filtered by prevalence – the fraction of samples where the feature is detected (value > 0):

prevalence(feature_j) = count(samples where F_j > 0) / N_samples * 100

Only features with prevalence >= min_prevalence_pct (default 30%) are retained. This removes rare species whose correlations would be unreliable due to excess zeros.

Rationale: In metagenomic data, many species are present in only a few samples. Correlations between two rarely-detected species are dominated by the (0, 0) pairs and do not reflect true ecological interactions.
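The prevalence filter can be sketched in a few lines of plain Python. This is an illustration only: the dictionary layout (feature name mapped to a list of per-sample abundances) and the function names are assumptions made for this example, not the tool's actual data structures.

```python
def prevalence_pct(values):
    """Percentage of samples in which the feature is detected (value > 0)."""
    return 100.0 * sum(1 for v in values if v > 0) / len(values)


def filter_by_prevalence(abundance, min_prevalence_pct=30.0):
    """Keep only features whose prevalence is >= min_prevalence_pct.

    `abundance` maps feature name -> list of per-sample abundances
    (hypothetical layout chosen for this example).
    """
    return {
        name: values
        for name, values in abundance.items()
        if prevalence_pct(values) >= min_prevalence_pct
    }


abundance = {
    "species_A": [0.0, 0.2, 0.1, 0.0, 0.3],  # detected in 3 of 5 samples
    "species_B": [0.0, 0.0, 0.0, 0.0, 0.4],  # detected in 1 of 5 samples
}
kept = filter_by_prevalence(abundance)
print(sorted(kept))
```

With the default 30% threshold, species_A (60% prevalence) is retained and species_B (20%) is dropped.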

Step 2: Spearman Rank Correlation

Pairwise Spearman rank correlations are computed between all retained features:

rho(F_i, F_j) = Pearson correlation of rank(F_i), rank(F_j)

Spearman is preferred over Pearson because:

Robust to outliers: Rank-based, not affected by extreme values
Captures monotonic relationships: Not limited to linear associations
Tolerates zeros: The zero values common in sparse metagenomic data are treated as tied ranks

When at most 1000 features are retained, the full correlation matrix is computed in a single call to scipy.stats.spearmanr. For larger feature sets, correlations are computed pair by pair.
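To make the definition concrete, the sketch below computes Spearman's rho directly as the Pearson correlation of ranks, using only the standard library. It is not the scipy.stats.spearmanr call used in practice; the helper names are hypothetical, and tied values (including zeros) are assumed to receive the average of the ranks they span.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


f_i = [0.0, 0.0, 1.0, 2.0, 5.0]
f_j = [0.1, 0.0, 0.5, 0.9, 100.0]  # extreme value only affects its rank
print(average_ranks(f_i))
print(round(spearman(f_i, f_j), 3))
```

Because only ranks enter the calculation, the extreme value 100.0 in f_j has no more influence than any other largest value would.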

Step 3: Edge Filtering

Only edges where |rho| >= correlation_threshold (default 0.3) are retained, so both strong positive and strong negative correlations are kept. This produces a sparse network focused on the strongest associations.

Positive correlations (rho > 0): Species that increase together (co-abundance, potential cooperation or shared niche)
Negative correlations (rho < 0): Species that show inverse abundance patterns (competition, different niches, or mutual exclusion)
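A minimal sketch of the edge filter follows. It assumes the threshold is applied to the absolute value of rho, which is what allows the negative correlations described above to appear in the network; the input layout (a dictionary keyed by feature pairs) and the edge fields are hypothetical.

```python
def filter_edges(correlations, correlation_threshold=0.3):
    """Keep feature pairs whose |rho| reaches the threshold.

    `correlations` maps (feature_i, feature_j) -> rho
    (hypothetical layout chosen for this example).
    """
    edges = []
    for (source, target), rho in correlations.items():
        if abs(rho) >= correlation_threshold:
            edges.append({
                "source": source,
                "target": target,
                "rho": rho,
                # Sign drives the interpretation: co-abundance vs. inverse pattern.
                "sign": "positive" if rho > 0 else "negative",
            })
    return edges


correlations = {
    ("species_A", "species_B"): 0.62,
    ("species_A", "species_C"): -0.45,
    ("species_B", "species_C"): 0.12,
}
edges = filter_edges(correlations)
for edge in edges:
    print(edge["source"], edge["target"], edge["sign"])
```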

Step 4: Class-Specific Networks

Networks can be computed for:

All samples: The overall ecological structure
Class 0 only (e.g., healthy controls): The healthy ecosystem
Class 1 only (e.g., patients): The disease-associated ecosystem

Comparing class-specific networks reveals how disease reorganizes the microbial ecosystem – which interactions are preserved, which are disrupted, and which emerge.
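One simple way to make this comparison is with set operations on the edge lists of the two class-specific networks. The sketch below assumes undirected edges given as feature-name pairs; the function name and the three result labels are illustrative choices, not part of the tool.

```python
def compare_networks(edges_class0, edges_class1):
    """Split edges into preserved, disrupted, and emerging interactions.

    Each edge is a (feature_i, feature_j) pair; frozenset makes the
    comparison independent of the order within a pair.
    """
    e0 = {frozenset(edge) for edge in edges_class0}
    e1 = {frozenset(edge) for edge in edges_class1}
    return {
        "preserved": e0 & e1,  # present in both classes
        "disrupted": e0 - e1,  # present in class 0 only
        "emerging": e1 - e0,   # present in class 1 only
    }


healthy = [("species_A", "species_B"), ("species_B", "species_C")]
disease = [("species_B", "species_A"), ("species_C", "species_D")]
result = compare_networks(healthy, disease)
for label, edge_set in result.items():
    print(label, sorted(sorted(edge) for edge in edge_set))
```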

Community Detection (Louvain)

The network is partitioned into modules (communities) using the Louvain algorithm:

Start with each node in its own community
For each node, evaluate the modularity gain of moving it to each neighbor’s community
Move the node to the community that maximizes modularity gain
Repeat until no improvement is possible
Build a new network where nodes are the communities, and repeat

The modularity score Q measures the quality of the partition:

Q = (1/2m) * sum_ij [ A_ij - (k_i * k_j) / (2m) ] * delta(c_i, c_j)

where:

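A_ij = adjacency matrix entry (edge weight |rho| between nodes i and j, 0 if there is no edge)
k_i = weighted degree of node i (sum of the weights of its edges)
m = total edge weight in the network
delta(c_i, c_j) = 1 if nodes i and j are in the same community, 0 otherwise

As a rough guide to interpreting Q:

Q > 0.3: Significant community structure (typical for ecological networks)
Q > 0.5: Strong modular organization
Q close to 0: No clear community structure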
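The modularity formula can be evaluated directly for a given partition. The sketch below is a plain-Python illustration, assuming an undirected network without self-loops in which each edge is listed once with its weight (for example |rho|); it scores a partition but does not perform the Louvain optimization itself. Function and variable names are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a partition.

    `edges` maps (u, v) -> weight, each undirected edge listed once.
    `community` maps node -> community label.
    """
    m = sum(edges.values())  # total edge weight

    # Weighted degree k_i of every node.
    strength = {}
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w

    # Sum of A_ij over ordered pairs in the same community:
    # each intra-community edge contributes twice (i,j and j,i).
    intra = sum(2.0 * w for (u, v), w in edges.items()
                if community[u] == community[v])

    # Sum of k_i * k_j / (2m) over same-community pairs equals
    # (total degree of the community)^2 / (2m), summed over communities.
    totals = {}
    for node, k in strength.items():
        totals[community[node]] = totals.get(community[node], 0.0) + k
    expected = sum(t * t for t in totals.values()) / (2.0 * m)

    return (intra - expected) / (2.0 * m)


# Two triangles joined by a single bridge edge, all weights 1.
edges = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
    ("c", "d"): 1.0,
}
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = {node: 0 for node in two_modules}
print(round(modularity(edges, two_modules), 3))
print(round(modularity(edges, one_module), 3))
```

Splitting the toy network into its two triangles gives Q = 5/14 (about 0.357), while placing every node in a single community gives Q = 0.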

Ecological Interpretation

Modules represent ecological niches – groups of species that tend to co-occur and may share functional roles:

Module with many Bacteroidota: Polysaccharide degradation niche
Module with many Bacillota: Short-chain fatty acid production niche
Module mixing phyla: Cross-phylum metabolic interactions
Module disrupted in disease: Potential therapeutic target

Node Metrics

Degree

The number of edges connected to a node. High-degree species are hubs – central to the network and potentially keystone species.

Betweenness Centrality

The fraction of shortest paths between all pairs of nodes that pass through a given node:

BC(v) = sum_{s != v != t} ( sigma_st(v) / sigma_st )

where sigma_st is the total number of shortest paths from s to t, and sigma_st(v) is the number passing through v.

High-betweenness species act as bridges between modules: many shortest paths run through them, so removing them can lengthen paths between modules or disconnect parts of the network.
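Betweenness can be computed with Brandes' algorithm, which accumulates the sigma_st(v) / sigma_st terms with one breadth-first search per source node. The sketch below assumes an undirected, unweighted network (edge weights are ignored when measuring path length) and counts each unordered pair (s, t) once; whether the tool normalizes or weights the score is not specified here, so treat these choices as assumptions.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness centrality.

    `adjacency` maps node -> set of neighbouring nodes (undirected).
    """
    bc = {v: 0.0 for v in adjacency}
    for s in adjacency:
        sigma = {v: 0 for v in adjacency}   # number of shortest s-v paths
        dist = {v: -1 for v in adjacency}   # -1 means not yet visited
        preds = {v: [] for v in adjacency}  # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve.
    return {v: score / 2.0 for v, score in bc.items()}


# Path a - b - c - d: the two inner nodes carry all indirect paths.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = betweenness(path)
print(scores)
```

On the path a - b - c - d, nodes b and c each lie on two shortest paths between other pairs, while the end nodes score zero.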

Per-Class Prevalence

For each species, prevalence is computed separately in class 0 and class 1 samples. The enriched class is the one with higher prevalence; when the two are equal, the rule below assigns class 0:

enriched_class = 1 if prevalence_1 > prevalence_0 else 0
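As a small worked example, the rule above can be applied to one species given its abundances and the class label of each sample. The function names and the list-based input are assumptions made for this sketch.

```python
def class_prevalence(values, labels, target_class):
    """Prevalence (%) of a feature among samples of one class."""
    selected = [v for v, y in zip(values, labels) if y == target_class]
    return 100.0 * sum(1 for v in selected if v > 0) / len(selected)


def enriched_class(values, labels):
    """Return 1 if prevalence is strictly higher in class 1, else 0."""
    prevalence_0 = class_prevalence(values, labels, 0)
    prevalence_1 = class_prevalence(values, labels, 1)
    return 1 if prevalence_1 > prevalence_0 else 0


labels = [0, 0, 0, 1, 1, 1]
species_a = [0.0, 0.0, 0.3, 0.2, 0.1, 0.4]  # 1/3 in class 0, 3/3 in class 1
species_b = [0.5, 0.0, 0.0, 0.0, 0.0, 0.2]  # 1/3 in both classes (tie)
print(enriched_class(species_a, labels))
print(enriched_class(species_b, labels))
```

Note that a tie in prevalence falls to class 0 because the comparison is strict.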

Mean Abundance

Average abundance across all samples (or class-specific samples). Reflects the overall contribution of the species to the community.

Taxonomic Coloring

SCAPIS Phylum Palette

A consistent color scheme assigns base colors to major bacterial phyla:

Phylum	Color	Hex
Bacillota (Firmicutes)	Blue	#08519c
Bacteroidota	Red	#d73027
Pseudomonadota (Proteobacteria)	Green	#1a9850
Actinomycetota	Purple	#ae017e
Verrucomicrobiota	Orange	#f16913
Fusobacteriota	Brown	#8c6d31

Family-Level Shading

Within each phylum, families are distinguished by lightening or darkening the phylum base color:

family_color = lighten(phylum_color, amount)   // for family 1
family_color = darken(phylum_color, amount)     // for family 2

The amount is spaced evenly across families within the phylum, producing a gradient from light to dark. This ensures:

Each family has a visually distinct color
Families within the same phylum are perceptually grouped
The color scheme is consistent across analyses
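A possible implementation of the shading is sketched below. It assumes that lightening blends the base color toward white and darkening blends it toward black, and that the shading amount runs evenly from +max_amount to -max_amount across the families of a phylum; the exact blending function and range used by the tool are not documented here, and the helper names are hypothetical.

```python
def hex_to_rgb(hex_color):
    """'#08519c' -> (8, 81, 156)."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """(8, 81, 156) -> '#08519c'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def shade(hex_color, amount):
    """Blend toward white for amount > 0, toward black for amount < 0.

    `amount` is expected in [-1, 1]; 0 returns the color unchanged.
    """
    target = 255 if amount > 0 else 0
    t = abs(amount)
    return rgb_to_hex(tuple(round(c + (target - c) * t)
                            for c in hex_to_rgb(hex_color)))


def family_colors(phylum_color, families, max_amount=0.5):
    """Assign each family an evenly spaced shade, from light to dark."""
    n = len(families)
    if n == 1:
        return {families[0]: phylum_color}
    step = 2.0 * max_amount / (n - 1)
    return {family: shade(phylum_color, max_amount - step * i)
            for i, family in enumerate(families)}


# Bacillota base color from the palette; the family list is illustrative.
colors = family_colors(
    "#08519c", ["Lachnospiraceae", "Oscillospiraceae", "Ruminococcaceae"]
)
for family, color in colors.items():
    print(family, color)
```

With three families, the first is lightened, the middle one keeps the phylum base color, and the last is darkened.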

Color Modes

The network supports three coloring modes:

Taxonomy: Nodes colored by family (within phylum gradient)
Module: Nodes colored by Louvain community (12-color palette)
Enrichment: Nodes colored by which class they are enriched in

FBM Overlay

When a completed job is selected, the network can be annotated with data from the Family of Best Models:

FBM Prevalence

For each feature in the network, compute the fraction of FBM models that include it:

fbm_prevalence(feature) = count(FBM models containing feature) / |FBM|

Node size or opacity can be scaled by FBM prevalence, highlighting features that are consistently selected as biomarkers.
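The FBM prevalence formula translates directly into code. In this sketch each model is represented as a dictionary from feature name to coefficient; that representation, like the function name, is an assumption made for illustration.

```python
def fbm_prevalence(feature, fbm_models):
    """Fraction of FBM models that include the feature."""
    containing = sum(1 for model in fbm_models if feature in model)
    return containing / len(fbm_models)


# Hypothetical family of four models (feature -> coefficient).
fbm_models = [
    {"species_A": 0.8, "species_B": -0.3},
    {"species_A": 0.5},
    {"species_A": 1.1, "species_C": 0.2},
    {"species_A": 0.4, "species_B": -0.6},
]
for feature in ("species_A", "species_B", "species_C", "species_D"):
    print(feature, fbm_prevalence(feature, fbm_models))
```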

Dominant Coefficient

The dominant coefficient direction across FBM models:

coefficient = sign( sum(coefficients across FBM models) )

+1: Feature typically contributes positively (enriched in disease)
-1: Feature typically contributes negatively (enriched in controls)

Node shape encodes this: squares for +1 (disease-associated), circles for -1 (health-associated).
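The sign rule and the shape encoding can be sketched as follows, again assuming each FBM model is a dictionary from feature name to coefficient (a hypothetical representation). The text above does not say what happens when the coefficients sum to exactly zero or the feature appears in no model; this sketch returns 0 and no shape in that case.

```python
def dominant_coefficient(feature, fbm_models):
    """Sign of the summed coefficients of a feature across FBM models."""
    total = sum(model[feature] for model in fbm_models if feature in model)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0  # undefined in the text above; an assumption of this sketch


def node_shape(coefficient):
    """Squares for +1 (disease-associated), circles for -1 (health-associated)."""
    return {1: "square", -1: "circle"}.get(coefficient)


# Hypothetical family of three models (feature -> coefficient).
fbm_models = [
    {"species_A": 0.8, "species_B": -0.3},
    {"species_A": -0.1, "species_B": -0.6},
    {"species_A": 0.5},
]
for feature in ("species_A", "species_B"):
    sign = dominant_coefficient(feature, fbm_models)
    print(feature, sign, node_shape(sign))
```

Here species_A has one negative coefficient but a positive sum, so it is drawn as a square.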

Bridging Ecological and Predictive Views

The FBM overlay connects two perspectives:

Ecological: Which species co-occur, which compete, what modules exist
Predictive: Which species are selected by the algorithms, with what coefficients

This reveals whether biomarkers cluster in specific ecological modules (suggesting niche-level disruption) or are scattered across the network (suggesting diffuse changes).

Layout Algorithms

Four layout algorithms are available for positioning nodes:

Layout	Algorithm	Best for
Organic	Fruchterman-Reingold with simulated annealing	General purpose, reveals clusters naturally
Force-directed	Spring-electrical model	Large networks, even spacing
Circle	Nodes arranged on a circle by module	Module comparison, clean visualization
Radial	Hub nodes at center, others radiate outward	Highlighting central species
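Of the four layouts, the circle layout is the simplest to illustrate. The sketch below places nodes at equal angles on a circle after sorting them by module, so that members of the same module sit next to each other. The sorting key, starting angle, and function name are assumptions; the tool's own ordering may differ.

```python
import math


def circle_layout(modules, radius=1.0):
    """Return node -> (x, y) positions on a circle, grouped by module.

    `modules` maps node -> module label (for example a Louvain community id).
    """
    # Sort by module first, then by name, so each module forms a contiguous arc.
    nodes = sorted(modules, key=lambda node: (modules[node], node))
    n = len(nodes)
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2.0 * math.pi * i / n
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


modules = {"sp1": 0, "sp2": 1, "sp3": 0, "sp4": 1}
positions = circle_layout(modules)
for node, (x, y) in positions.items():
    print(node, round(x, 3), round(y, 3))
```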

Interactive Controls

Prevalence threshold (10-80%): Filter rare species
Correlation threshold (0.1-0.8): Filter weak associations
Class filter: All / Class 0 / Class 1
Color mode: Taxonomy / Module / Enrichment
Layout: Organic / Force-directed / Circle / Radial
FBM overlay toggle: Annotate with model data
Module click: Highlight all nodes in a module

References

Blondel, V. D. et al. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
Fruchterman, T. M. J. & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience, 21(11), 1129-1164.
Cousin-Thorez, M. (2019). Interpred: Interpretation of predictive models. Internship report, ICAN.
Prifti, E. (2024). SCAPIS ecosystem analysis. Internal report.