BGPT: Paper Review: Accelerating Drug Discovery with HyperLab: An Easy-to-Use AI-Driven Platform

Quick Explanation

Concise critical summary

HyperLab is a proprietary, web-based AI platform that integrates protein-ligand pose and affinity prediction (Hyper Binding), virtual screening across curated libraries and an ultra-large generative space (Hyper Screening and Hyper Screening X), structure-based design (Hyper Design), ADME/T property prediction (Hyper ADME/T), and an AI assistant to support workflows. The authors report internal benchmarks, including 77% pose prediction accuracy (PoseBuster v2) and Pearson r = 0.70 and 0.53 on two FEP affinity datasets. They also report an internal experimental validation in which 52 top-ranked virtual hits yielded ~11.5% actives and nanomolar leads (IC50 70-600 nM); follow-up designs yielded 3 of 5 synthesized compounds with >75% inhibition at 1 µM, and a lead, H00505, showed pathway activity in ARE-luciferase assays.

All claims below cite the HyperLab technical report (doi: 10.1101/2025.08.31.672525).

Long Explanation

Detailed critical review and analysis

What the paper claims (verbatim extracts and context)

Scope: The report describes an integrated AI-driven structure-based drug design (SBDD) platform comprising Hyper Binding, Hyper Screening, Hyper Screening X (11 trillion library), Hyper Design, Hyper ADME/T, and an AI assistant. The platform accepts PDB entries, uploaded structures, or AlphaFold-predicted structures, and can cofold a complex directly from sequence.
Benchmarks reported: PoseBuster v2 pose prediction accuracy of 77% (with binding site) versus 84% for AlphaFold3 and 78% for Boltz2; binding affinity Pearson r = 0.70 and 0.53 on two FEP datasets versus multiple baselines; cofolding inference time of ~3 minutes per complex versus ~15 minutes for AlphaFold3 on an RTX 3060.
Experimental validation: An internal case study tested 52 top-ranked compounds in vitro, reporting 6 actives (11.5%) and 5 compounds showing >40% inhibition in an initial assay, with IC50 values of 70-600 nM. Hyper Design generated 14 derivatives, of which 10 were judged synthetically accessible and 5 were synthesized; 3 of these showed >75% inhibition at 1 µM (IC50 200-400 nM). H00505 matched or exceeded the pathway activation of a reference compound in ARE-luciferase HepG2 assays.

Each of the above claims is drawn directly from the HyperLab technical report (doi: 10.1101/2025.08.31.672525).

Positive aspects and practical strengths

End-to-end integration: The platform covers pose prediction, ultra-large virtual screening, generative design, ADME/T prediction, and an assistant that aims to lower the technical barrier for experimental researchers. These are practical features for translational teams and align with trends in AI-driven drug discovery (AIDD) platforms (as reported in the paper).
Benchmarking on recognized tasks: The use of PoseBuster v2, FEP-style datasets, and comparisons to docking tools (Vina, Glide SP) gives a baseline for the performance claims (though see the limitations below).
Experimental validation included: The internal case study produced nanomolar hits, and follow-up synthesis and assay work demonstrated conversion of in silico designs into experimentally active molecules, a critical real-world check that many platform papers omit.

Major limitations, blindspots, and risks

Proprietary black box and limited data/code availability: The report is a technical/product report with no public release of training data, model weights, exact training/test splits, or analysis scripts. This constrains reproducibility and independent verification (the paper includes no public code or dataset links).
Internal and small experimental sample sizes: The in vitro case study used 52 top-ranked molecules chosen from the virtual screen. While 6 actives and nanomolar IC50 values are promising, the small sample and lack of independent replication make it plausible that enriched selection or other biases influenced the hit rate. The paper does not describe blinded selection or confirmatory orthogonal assays in detail.
Benchmark selection and potential overfitting: The reported benchmarks are useful but incomplete. The paper compares against multiple methods but does not always specify whether training/test splits were identical or whether any baselines were re-trained or tuned. Without an independent external test set and clearly documented dataset splits, there is a risk of optimistic estimates or hidden data leakage.
Generality across targets: The evidence consists of internal case studies and a set of benchmarks. Claims of broad generalizability across target classes and chemotypes need external community validation on diverse, fully independent datasets, which is not present.
Commercial/test bias: The authors are the HyperLab Team and HITS (the company/institution behind the platform). Promotional framing and the absence of third-party replication increase the risk of sponsor bias; this is expected for company technical reports but important to call out.
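
To make the small-sample concern concrete, the following minimal Python sketch computes a 95% Wilson score interval for the reported 6 actives out of 52 tested compounds. The Wilson interval is an illustrative choice made for this review, not a method used in the HyperLab report, and the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return centre - half_width, centre + half_width


# Counts taken from the review text: 6 actives among 52 tested compounds.
low, high = wilson_interval(6, 52)
print(f"hit rate = {6 / 52:.1%}, 95% Wilson interval = [{low:.1%}, {high:.1%}]")
```

Under these assumptions the interval runs from roughly 5% to 23%, so the reported 11.5% hit rate is compatible with a wide range of true hit rates.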

Technical critiques and reproducibility checklist

The following are specific items a reader or reviewer should request or verify to increase confidence and allow independent reproduction:

Publish model architectures, training hyperparameters, and training/validation/test splits for Hyper Binding and Hyper Screening X (including negative controls) so results can be independently reproduced.
Release or document the exact PoseBuster v2 and FEP dataset splits used, and whether baselines were used as-is or re-trained/tuned on identical data.
Provide raw screening lists, SMILES, predicted scores, and all experimental assay data (raw curve fits, replicates) for the 52 compounds and synthesized derivatives, with clear assay protocols and blinding statements.
Disclose compute budgets and cloud configurations for timing claims and for Hyper Screening X generative workflows so others can estimate cost/throughput tradeoffs.
Share metrics for false positive rates, enrichment factors, and prospective validation across multiple independent targets and chemotypes.
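
The enrichment factor requested above can be computed from any ranked screening list with known actives. The sketch below is a minimal Python illustration, assuming that higher scores mean better predicted binding; the function name and the toy data are hypothetical and do not come from the HyperLab report.

```python
def enrichment_factor(scores, labels, top_fraction=0.01):
    """Enrichment factor: hit rate in the top-ranked fraction / overall hit rate.

    scores: predicted scores (assumption: higher = better).
    labels: 1 for experimentally active, 0 for inactive.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total = len(scores)
    total_actives = sum(labels)
    if total == 0 or total_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Rank compounds by score, best first, and keep the top fraction.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(total * top_fraction))
    top_actives = sum(label for _, label in ranked[:n_top])

    top_hit_rate = top_actives / n_top
    overall_hit_rate = total_actives / total
    return top_hit_rate / overall_hit_rate


# Toy example: 10 compounds, 2 actives, evaluated on the top 20% of the ranking.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
ef = enrichment_factor(toy_scores, toy_labels, top_fraction=0.2)
print(f"EF at top 20% = {ef:.2f}")
```

An enrichment factor of 1 means the ranking does no better than random selection, which is why the metric needs the full screened list and not only the top-ranked hits.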

How to assess the core claims experimentally (falsification plan)

To falsify or validate the main claims (high pose accuracy, improved affinity prediction, and high hit rates after virtual screening), independent groups should run the following:

Benchmark Hyper Binding on publicly available, standard pose and affinity datasets with pre-registered splits, and report per-target-family performance and failure cases.
Run blind prospective screens: have HyperLab (or an independent user) nominate the top 100 virtual hits per target for at least three diverse protein targets, then synthesize and assay them in blinded, orthogonal assays (enzymatic + biophysical) at independent labs to measure true hit rates and potencies.
Compare Hyper Screening X generative outputs against matched-size generative baselines (e.g., standard GFlowNet implementations, diffusion-based molecule generators) using pre-specified multi-objective metrics (synthesizability, novelty, predicted ADME/T), with experimental follow-up on selected molecules.
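
When the affinity benchmarks are re-run, reporting an interval around Pearson r helps readers judge whether differences between methods are meaningful. A minimal Python sketch using the Fisher z-transformation follows. The dataset size n = 50 is a placeholder assumption, because this review does not state how many complexes the FEP datasets contain, and the function name is hypothetical.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for Pearson r via the Fisher z-transformation.

    r: observed correlation, strictly between -1 and 1.
    n: number of paired observations (must be at least 4).
    z_crit: normal quantile (1.96 gives ~95%).
    """
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n < 4:
        raise ValueError("n must be at least 4")
    z = math.atanh(r)                 # transform r to the z scale
    se = 1 / math.sqrt(n - 3)         # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# r values are those quoted in the review; n = 50 is a hypothetical dataset size.
for r in (0.70, 0.53):
    low, high = fisher_ci(r, n=50)
    print(f"r = {r:.2f}, n = 50 (assumed): 95% CI = [{low:.2f}, {high:.2f}]")
```

The interval narrows as n grows, so per-target-family results based on few complexes should be read with correspondingly wide uncertainty.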

Conclusions and confidence

Practical conclusion: HyperLab appears to be a feature-rich, production-oriented AIDD platform with several promising internal benchmarks and limited experimental validation that demonstrates the capability to produce nanomolar hits in focused internal studies. However, the lack of public code and data, the small experimental sample sizes, and single-source authorship reduce independent confidence. Independent replication and public benchmark releases would materially increase confidence and scientific value.

What would change this conclusion: Public release of the model code and datasets, independent third-party reproductions of the reported case studies, and prospective blinded experimental validations across multiple target classes would increase confidence. Discovery that the reported assays or splits were non-blinded or affected by circular data leakage would decrease confidence.

Updated: October 06, 2025