
tuneR: A Deep Dive into Hyperparameter Optimization with mixOmics


This article was originally published on Medium.

The Problem

mixOmics provides powerful multivariate analysis tools for multi-omics studies, but hyperparameter tuning has been limited to basic grid search with minimal performance metrics. Users faced computational bottlenecks, inadequate parameter exploration, and insufficient model evaluation — leading to suboptimal analyses and questionable scientific conclusions.

In practice, a typical block.splsda model requires tuning ncomp (the number of components) and keepX (the number of variables kept per block), with each candidate evaluated under stratified cross-validation. With 5 components × 5 gene thresholds × 5 miRNA thresholds, an exhaustive grid search evaluates 125 combinations, each requiring a full cross-validation run.
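
To see where the 125 comes from, the candidate values can be crossed with base R's expand.grid; the thresholds below mirror the random-search example later in the post:

```r
# Candidate values for each tuned parameter
ncomp_values <- 1:5
gene_keepX   <- c(20, 50, 100, 150, 200)  # gene thresholds per block
mirna_keepX  <- c(10, 20, 30, 40, 50)     # miRNA thresholds per block

# Every combination the exhaustive grid search must evaluate
grid <- expand.grid(ncomp  = ncomp_values,
                    genes  = gene_keepX,
                    mirnas = mirna_keepX)
nrow(grid)  # 125 combinations, each needing full cross-validation
```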

The Solution: tuneR

tuneR is a systematic framework for hyperparameter optimization with two key innovations:

  1. Random search with comparable accuracy: On the repository benchmark harness, random search tests 50 of 125 combinations, cuts median wall time by 60.5%, and still matches the best observed accuracy on the controlled workload.
  2. Reproducible benchmark artifacts: The repository now includes a dedicated benchmark harness with raw runs, summary files, and environment metadata for rechecking the claim on the current working tree.
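
Conceptually, the random search draws a fixed-size subset of the full grid. A minimal base-R sketch of that idea (not tuneR's internal implementation) is:

```r
set.seed(1)                     # make the draw reproducible
n_total <- 5 * 5 * 5            # 125 grid points in the example above
picked  <- sample(n_total, 50)  # indices of the configurations to evaluate
length(picked)                  # 50 distinct configurations instead of 125
```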

Quick Start

library(tuneR)
library(mixOmics)

# Load example multi-omics data (mixOmics ships breast.TCGA with matched
# mRNA and miRNA blocks; breast.tumors contains only gene expression)
data(breast.TCGA)
X_blocks <- list(
  genes = breast.TCGA$data.train$mrna,
  mirnas = breast.TCGA$data.train$mirna
)
# Tumour subtype labels used as the outcome
Y_treatment <- breast.TCGA$data.train$subtype

# Grid search tuning
result_grid <- tune(
  method = "block.splsda",
  data = list(X = X_blocks, Y = Y_treatment),
  ncomp = c(1, 2, 3),
  test.keepX = list(
    genes = c(20, 50, 100),
    mirnas = c(10, 20, 30)
  ),
  search_type = "grid",
  nfolds = 5,
  stratified = TRUE
)

# View results
print(result_grid)
summary(result_grid)
plot(result_grid)
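
The stratified = TRUE option corresponds to assigning folds within each class so that class proportions are preserved in every fold. A base-R sketch of that idea (an illustration, not tuneR's exact code):

```r
# Assign each sample to one of k folds, balancing folds within each class
stratified_folds <- function(y, k = 5) {
  folds <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # cycle fold labels 1..k within the class, in random order
    folds[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
  }
  folds
}

y <- factor(rep(c("treated", "control"), times = c(30, 20)))
f <- stratified_folds(y, k = 5)
table(y, f)  # every fold holds ~1/5 of each class
```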

Random Search for Efficiency

result_random <- tune(
  method = "block.splsda",
  data = list(X = X_blocks, Y = Y_treatment),
  ncomp = c(1, 2, 3, 4, 5),
  test.keepX = list(
    genes = c(20, 50, 100, 150, 200),
    mirnas = c(10, 20, 30, 40, 50)
  ),
  search_type = "random",
  n_random = 50,
  nfolds = 5
)

# Compare the random-search results against the benchmark summary artifacts
plot(result_random, type = "scatter")
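
To recheck the wall-time claim on your own machine, a small helper (an assumption for illustration, not part of tuneR) can wrap each search in system.time and report the median over a few repeats:

```r
# Median elapsed seconds over several repeats of an arbitrary tuning call.
# Illustrative helper, not part of tuneR.
time_search <- function(run, repeats = 3) {
  median(replicate(repeats, system.time(run())[["elapsed"]]))
}

# Usage sketch, assuming the tune() calls from the examples above:
# t_grid   <- time_search(function() tune(..., search_type = "grid"))
# t_random <- time_search(function() tune(..., search_type = "random", n_random = 50))
# 1 - t_random / t_grid  # fraction of wall time saved by random search
```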

Key Features

  • Advanced Search Strategies: Both exhaustive grid search and efficient random search algorithms
  • Comprehensive Metrics: Accuracy and error-rate summaries alongside saved benchmark artifacts
  • Computational Efficiency: Random search cuts median wall time by 60.5% on the repository’s controlled benchmark harness
  • Robust Cross-Validation: Stratified sampling with flexible fold configuration
  • Rich Visualizations: Parameter landscapes, performance distributions, and optimization paths
  • Benchmark Traceability: Raw runs, summary tables, and environment metadata live alongside the code
  • Extensible Design: Framework ready for additional mixOmics methods
  • Repository-Backed Validation: Controlled benchmark scripts can be rerun against the current working tree

Why This Matters

In computational biology, the gap between a well-tuned and a poorly tuned model can be the difference between identifying a real biomarker and reporting noise. tuneR makes rigorous parameter tuning accessible by automating statistically sound practices that were previously manual and error-prone.

The same pipeline design — systematic exploration, metric-driven optimization, reproducibility — transfers directly to any domain where model selection impacts operational decisions.


Source code: github.com/omar391/tuneR

Thanks for reading. If you found this useful, feel free to DM me on X.