This article was originally published on Medium.
The Problem
mixOmics provides powerful multivariate analysis tools for multi-omics studies, but hyperparameter tuning has been limited to basic grid search with minimal performance metrics. Users faced computational bottlenecks, inadequate parameter exploration, and insufficient model evaluation — leading to suboptimal analyses and questionable scientific conclusions.
In practice, a typical block.splsda model requires tuning ncomp (the number of components) and keepX (the number of variables kept per block), with each candidate configuration evaluated by stratified cross-validation. With 5 components × 5 gene thresholds × 5 miRNA thresholds, an exhaustive grid search evaluates 125 combinations, each requiring a full cross-validation run.
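The size of that search space is easy to verify with base R's expand.grid. This is a minimal sketch using illustrative parameter values (matching the example above), not a call into tuneR itself:

```r
# Enumerate the exhaustive grid described above (base R only; the
# threshold values are illustrative, not tied to a real dataset).
param_grid <- expand.grid(
  ncomp        = 1:5,
  keepX_genes  = c(20, 50, 100, 150, 200),
  keepX_mirnas = c(10, 20, 30, 40, 50)
)
nrow(param_grid)  # 125 candidate configurations, each needing full CV
```

Every row of param_grid is one model to fit and cross-validate, which is why the exhaustive strategy becomes expensive quickly.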
The Solution: tuneR
tuneR is a systematic framework for hyperparameter optimization with two key innovations:
- Random search with comparable accuracy: On the repository benchmark harness, random search tests 50 of 125 combinations, cuts median wall time by 60.5%, and still matches the best observed accuracy on the controlled workload.
- Reproducible benchmark artifacts: The repository now includes a dedicated benchmark harness with raw runs, summary files, and environment metadata for rechecking the claim on the current working tree.
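The random-search idea itself is simple to sketch in base R: draw a fixed-size sample of the exhaustive grid and evaluate only those rows. The code below is a conceptual illustration under the benchmark's stated sizes (50 of 125), not tuneR's internal sampler:

```r
# Sketch: random search evaluates a fixed-size subsample of the grid.
# Grid values mirror the benchmark description; seed is for reproducibility.
set.seed(42)
full_grid <- expand.grid(
  ncomp        = 1:5,
  keepX_genes  = c(20, 50, 100, 150, 200),
  keepX_mirnas = c(10, 20, 30, 40, 50)
)
sampled <- full_grid[sample(nrow(full_grid), 50), ]
nrow(sampled) / nrow(full_grid)  # 0.4: only 40% of the grid is evaluated
```

Evaluating 40% of the configurations bounds the worst-case cost directly; the observed 60.5% median wall-time reduction on the benchmark harness is consistent with that fraction plus per-run overhead.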
Quick Start
library(tuneR)
library(mixOmics)
# Load example multi-omics data
data(breast.tumors)
X_blocks <- list(
genes = breast.tumors$gene,
mirnas = breast.tumors$miRNA
)
Y_treatment <- breast.tumors$sample$treatment
# Grid search tuning
result_grid <- tune(
method = "block.splsda",
data = list(X = X_blocks, Y = Y_treatment),
ncomp = c(1, 2, 3),
test.keepX = list(
genes = c(20, 50, 100),
mirnas = c(10, 20, 30)
),
search_type = "grid",
nfolds = 5,
stratified = TRUE
)
# View results
print(result_grid)
summary(result_grid)
plot(result_grid)
Random Search for Efficiency
result_random <- tune(
method = "block.splsda",
data = list(X = X_blocks, Y = Y_treatment),
ncomp = c(1, 2, 3, 4, 5),
test.keepX = list(
genes = c(20, 50, 100, 150, 200),
mirnas = c(10, 20, 30, 40, 50)
),
search_type = "random",
n_random = 50,
nfolds = 5
)
# Compare the random-search results against the benchmark summary artifacts
plot(result_random, type = "scatter")
Key Features
- Advanced Search Strategies: Both exhaustive grid search and efficient random search algorithms
- Comprehensive Metrics: Accuracy and error-rate summaries alongside saved benchmark artifacts
- Computational Efficiency: Random search cuts median wall time by 60.5% on the repository’s controlled benchmark harness
- Robust Cross-Validation: Stratified sampling with flexible fold configuration
- Rich Visualizations: Parameter landscapes, performance distributions, and optimization paths
- Benchmark Traceability: Raw runs, summary tables, and environment metadata live alongside the code
- Extensible Design: Framework ready for additional mixOmics methods
- Repository-Backed Validation: Controlled benchmark scripts can be rerun against the current working tree
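Stratified cross-validation, mentioned above, allocates folds within each class so that class proportions are preserved in every fold. Here is a minimal base-R sketch of that idea, with a toy label vector; it is not tuneR's implementation and the function name stratified_folds is ours:

```r
# Sketch of stratified k-fold assignment: fold labels are distributed
# within each class, so each fold preserves the class proportions.
stratified_folds <- function(Y, k = 5) {
  folds <- integer(length(Y))
  for (cls in unique(Y)) {
    idx <- which(Y == cls)
    # Spread fold labels 1..k evenly over this class, then shuffle.
    folds[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
  }
  folds
}

set.seed(1)
Y <- factor(rep(c("treated", "control"), times = c(30, 20)))  # toy labels
f <- stratified_folds(Y, k = 5)
table(Y, f)  # each fold gets exactly 6 treated and 4 control samples
```

With 30 treated and 20 control samples, every fold ends up with a 6:4 split, matching the overall 60/40 class balance; unstratified sampling offers no such guarantee on small omics cohorts.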
Why This Matters
In computational biology, the difference between a well-tuned and a poorly tuned model can mean the difference between identifying a real biomarker and reporting noise. tuneR makes rigorous parameter tuning accessible by automating the statistically sound practices that were previously manual and error-prone.
The same pipeline design — systematic exploration, metric-driven optimization, reproducibility — transfers directly to any domain where model selection impacts operational decisions.
Source code: github.com/omar391/tuneR