Usage
Web Application
Basic Workflow
- Create a project – Click “New project” on the Projects page
- Upload datasets – In the Data tab, upload your training files (Xtrain.tsv, Ytrain.tsv) and optionally test files (Xtest.tsv, Ytest.tsv). You can also pick from the dataset library or load demo datasets.
- Explore data – Use the Data Explorer to inspect feature statistics, prevalence distributions, volcano plots, and barcode visualizations. Apply feature filtering (Wilcoxon, t-test, Bayesian Fisher).
- Configure parameters – In the Parameters tab, choose your algorithm (GA, Beam, MCMC), model language (binary, ternary, ratio), and adjust settings like population size, max epochs, k range, and cross-validation folds. Use templates for preset configurations.
- Launch analysis – Click “Launch Analysis”. Monitor progress in real time via the console panel with a live sparkline chart.
- Explore results – Once completed, browse:
  - Summary: best AUC, k, timing, generation tracking charts
  - Population: feature heatmap, violin plots, prevalence analysis
  - Jury: ensemble voting, confusion matrices, vote matrix, sample predictions
  - Comparative: compare multiple jobs side-by-side
  - Co-presence: feature co-occurrence analysis
  - Ecosystem: co-abundance network with taxonomic coloring and module detection
  - Stability: model stability indices (Kuncheva, Tanimoto, CW_rel), feature × sparsity heatmap, model clustering dendrogram
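The Kuncheva index reported in the Stability view measures how consistently the same features are selected across models. As a rough illustration (a minimal sketch, not the app's internal implementation), the index for two feature subsets of equal size k drawn from n total features is (r − k²/n) / (k − k²/n), where r is the number of shared features:

```python
def kuncheva_index(a, b, n):
    """Kuncheva consistency between two equal-size feature subsets a and b,
    drawn from n total features. Ranges over (-1, 1]; 1.0 means identical."""
    k = len(a)
    assert len(b) == k and 0 < k < n
    r = len(set(a) & set(b))
    expected = k * k / n  # overlap expected by chance
    return (r - expected) / (k - expected)

# Identical subsets score 1.0; partial overlap scores lower.
print(kuncheva_index({"f1", "f2", "f3"}, {"f1", "f2", "f3"}, 100))  # 1.0
```

The chance-overlap correction is what distinguishes this index from a plain Jaccard or Tanimoto similarity: small subsets drawn from many features get little credit for overlaps that could occur randomly.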
Batch Mode
In the Parameters tab, enable “Batch Mode” to sweep across multiple configurations:
- Seeds (e.g., 42, 123, 456)
- Algorithms (GA, Beam, MCMC)
- Languages (bin, ter, ratio)
- Data types (raw, prev)
- Population sizes, max epochs, k_max values
The system generates all combinations and launches them as separate jobs (up to 50 per batch).
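The expansion into separate jobs is a Cartesian product of the swept dimensions. A minimal sketch of the idea (the dimension names here are illustrative, not the system's internal schema):

```python
from itertools import product

seeds = [42, 123, 456]
algorithms = ["ga", "beam", "mcmc"]
languages = ["bin", "ter", "ratio"]

# All combinations of the swept dimensions, capped at 50 jobs per batch.
jobs = [
    {"seed": s, "algorithm": a, "language": l}
    for s, a, l in product(seeds, algorithms, languages)
][:50]

print(len(jobs))  # 27
```

Note how quickly the product grows: adding the two data types (raw, prev) to the sweep above would already yield 54 combinations, exceeding the 50-job cap.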
External Validation
After training, validate your model on an independent cohort:
- In the Results tab, select a completed job
- Click “Validate on New Data”
- Upload the validation X matrix (and optionally Y labels)
- View AUC, accuracy, confusion matrix, and per-sample predictions
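When Y labels are provided, the validation metrics follow from comparing predictions against the true labels. A self-contained sketch of how accuracy and the confusion matrix are derived (illustrative only; the app computes these server-side):

```python
def validation_metrics(y_true, y_pred):
    """Accuracy and 2x2 confusion matrix [[TN, FP], [FN, TP]] for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    return acc, [[tn, fp], [fn, tp]]

acc, cm = validation_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(acc, cm)  # 0.6 [[1, 1], [1, 2]]
```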
Prediction API
Deploy trained models as REST endpoints:
```shell
curl -X POST http://localhost:8001/api/v1/projects/{id}/jobs/{job_id}/predict \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"features": {"species_A": 0.12, "species_B": 0.05}}'
```
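The same request can be issued from Python with the standard library alone. This sketch mirrors the curl call above; the URL, `{id}`/`{job_id}` placeholders, and API key are stand-ins for your own deployment:

```python
import json
import urllib.request

url = "http://localhost:8001/api/v1/projects/{id}/jobs/{job_id}/predict"
payload = json.dumps({"features": {"species_A": 0.12, "species_B": 0.05}})

req = urllib.request.Request(
    url,
    data=payload.encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # uncomment against a live server
print(req.get_method(), req.get_full_url())
```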
Exporting Results
From the Results tab, click “Export” to access:
- PDF biomarker report (publication-ready)
- HTML report (self-contained)
- CSV files (best model, population, generations, jury predictions)
- Python notebook (.ipynb)
- R notebook (.Rmd)
- Full JSON
Python (gpredomicspy)
Basic Analysis
```python
import gpredomicspy

# Load parameters from YAML
param = gpredomicspy.Param()
param.load("params.yaml")

# Run the evolutionary search
experiment = gpredomicspy.fit(param)

# Display results with jury voting
experiment.display_results()
```
Accessing Results
```python
# Best individual from the population
best = experiment.best_population().best()
print(best.get_metrics())   # AUC, accuracy, sensitivity, specificity
print(best.get_features())  # Feature names and coefficients
print(best.get_k())         # Number of features

# Generation tracking
tracking = experiment.generation_tracking()

# Jury results
jury = experiment.jury_results()
```
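The jury aggregates predictions from an ensemble of models by voting. As a rough illustration of the principle (a hypothetical sketch, not the `jury_results()` internals), a simple majority vote over binary model predictions looks like this:

```python
def jury_vote(votes):
    """Majority vote across an ensemble of binary predictions.
    `votes` holds one inner list per model, one column per sample."""
    n_models = len(votes)
    return [1 if sum(col) * 2 > n_models else 0 for col in zip(*votes)]

# Three models voting on four samples.
print(jury_vote([[1, 0, 1, 1],
                 [1, 1, 0, 1],
                 [0, 0, 1, 1]]))  # [1, 0, 1, 1]
```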
Parameter Configuration
```python
param = gpredomicspy.Param()
param.set_algorithm("ga")       # ga, beam, mcmc
param.set_language("ter")       # bin, ter, ratio
param.set_data_type("raw")      # raw, prev
param.set_max_epochs(200)
param.set_population_size(100)
param.set_k_range(3, 15)
param.set_n_folds_outer(5)
param.set_seed(42)
param.set_compute_importance(True)
param.set_voting(True)
```
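The `ter` model language constrains feature coefficients to {−1, 0, +1}, yielding sparse, interpretable signed signatures. A minimal sketch of how such a ternary model scores a sample (illustrative only; the feature names and threshold are assumptions, not part of the gpredomicspy API):

```python
def ternary_score(sample, coefficients, threshold=0.0):
    """Sum of +1/-1 weighted feature abundances; classify against a threshold."""
    score = sum(w * sample.get(f, 0.0) for f, w in coefficients.items())
    return score, int(score > threshold)

coefs = {"species_A": 1, "species_B": -1, "species_C": 1}  # ternary weights
score, label = ternary_score(
    {"species_A": 0.12, "species_B": 0.05, "species_C": 0.02}, coefs
)
print(round(score, 2), label)  # 0.09 1
```

The `bin` language is the special case with weights restricted to {0, +1}, while `ratio` compares the summed abundances of two feature groups instead.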
R Package
```r
library(predomics)

# Load data
data <- loadData("Xtrain.tsv", "Ytrain.tsv")

# Run analysis
result <- mainFunction(data,
                       language = "ter",
                       algorithm = "ga",
                       populationSize = 100,
                       maxEpochs = 200)

# View results
print(result)
plotResults(result)
```