Publications

Research at the intersection of machine learning and therapeutic discovery. My work spans autonomous drug design systems, foundation models for proteins, and scalable ML methods for chemistry and biology.

Full publication list on Google Scholar


Lab-in-the-Loop: Autonomous Therapeutic Antibody Design

Frey, N. C. (co-lead), et al. (2025). “Lab-in-the-loop therapeutic antibody design with deep learning.” bioRxiv. doi: 10.1101/2025.02.19.639050

What it does: Orchestrates generative ML models, multi-task property predictors, active learning, and in vitro experimentation in an iterative optimization loop for semi-autonomous antibody design.
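The loop structure can be sketched in a few lines. Everything here is a stand-in: the toy "antibody" is a short string over four residues, a hypothetical oracle replaces the in vitro assay, and single-point mutation replaces the generative and predictive models.

```python
import random

def lab_in_the_loop(rounds=5, batch=8, seed=0):
    # Toy sketch of an iterative design loop (assumptions: the "antibody"
    # is a 12-character string; the lab assay is a made-up oracle that
    # counts favorable residues; design is single-point mutation).
    rng = random.Random(seed)
    alphabet = "ACDE"
    oracle = lambda s: s.count("A")            # stand-in for lab measurement
    def mutate(s):
        i = rng.randrange(len(s))
        return s[:i] + rng.choice(alphabet) + s[i + 1:]
    lead = "".join(rng.choice(alphabet) for _ in range(12))
    best, best_fit = lead, oracle(lead)
    for _ in range(rounds):                    # design -> test -> select loop
        designs = [mutate(best) for _ in range(batch)]   # "generative" step
        measured = [(oracle(d), d) for d in designs]     # "in vitro" step
        top_fit, top = max(measured)
        if top_fit > best_fit:                 # feed the winner back in
            best, best_fit = top, top_fit
    return best, best_fit
```

The real system closes the same loop, but with trained generative models proposing candidates, multi-task predictors filtering them, and wet-lab assays supplying the measurements.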

Results: Applied to four clinically relevant targets (EGFR, IL-6, HER2, OSM). Designed and tested 1,800+ unique antibody variants from leads obtained through animal immunization and immune repertoire mining.

Impact: End-to-end automation of antibody optimization, from design through property prediction to lab testing. A practical ML-in-the-loop system for therapeutic development.

Target audience: Biotech researchers evaluating AI drug discovery, antibody engineers.

Read paper


Protein Discovery with Discrete Walk-Jump Sampling

Frey, N. C., Berenberg, D., Zadorozhny, K., et al. (2024). “Protein Discovery with Discrete Walk-Jump Sampling.” International Conference on Learning Representations (ICLR 2024). Outstanding Paper Award 🏆

Method: Combines energy-based and score-based models: learns a smoothed energy function, samples from the smoothed data manifold with Langevin MCMC, and projects back to the true data manifold with one-step denoising.
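The "walk" (Langevin sampling of a smoothed density) and the "jump" (one-step denoising back to the data manifold) can be sketched in a toy continuous setting. Everything here is an illustrative assumption: the actual method operates on discrete protein sequences, and the closed-form denoiser below exists only because the toy data is Gaussian.

```python
import math
import random

def walk_jump_sample(n_steps=200, sigma=0.5, step=0.05, seed=0):
    # Toy 1-D analogue of walk-jump sampling (assumption: the real method
    # works on discrete sequences; here the "data" is simply N(0, 1)).
    # Smoothed variable: y = x + N(0, sigma^2), so y ~ N(0, 1 + sigma^2).
    rng = random.Random(seed)
    var_y = 1.0 + sigma ** 2
    denoise = lambda y: y / var_y          # closed-form E[x | y] for this toy
    score = lambda y: -y / var_y           # grad log p(y) of smoothed density
    y = rng.gauss(0.0, math.sqrt(var_y))   # start on the smoothed manifold
    for _ in range(n_steps):               # "walk": Langevin MCMC on p(y)
        y += step * score(y) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return denoise(y)                      # "jump": one-step denoise to x-space
```

Because the jump returns a posterior mean, samples land slightly inside the raw data distribution's spread; sampling in the smoothed space is what makes the walk well-behaved.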

Results: 97-100% of generated samples expressed and purified successfully. 70% of functional designs matched or exceeded the binding affinity of known functional antibodies.

Impact: Resolves training and sampling difficulties in discrete generative models. Top 1.2% of ICLR submissions (Oral). Introduced distributional conformity score to benchmark protein generative models.

Target audience: ML researchers, protein engineers.

Read paper | OpenReview | ICLR talk

Neural Scaling of Deep Chemical Models

Frey, N. C., Soklaski, R., Axelrod, S., et al. (2023). “Neural scaling of deep chemical models.” Nature Machine Intelligence 5(11), 1297-1305.

Study: First systematic investigation of neural scaling in large chemical models. Trained models with 1B+ parameters on datasets up to 10M molecules. Investigated large language models for generative chemistry and graph neural networks for interatomic potentials.

Results: Neural scaling laws hold for chemistry models at massive scale. No plateau in loss improvement. Pre-training loss improved monotonically with dataset size up to 10M molecules and model size up to 1B+ parameters.
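The scaling claim can be made concrete with the standard power-law form L(N) ≈ a·N^(−b): fitting it is just linear regression in log-log space. The constants and data below are made up for illustration, not taken from the paper.

```python
import math

def fit_power_law(ns, losses):
    # Fit loss ≈ a * n^(-b) by ordinary least squares in log-log space:
    # log(loss) = log(a) - b * log(n) is linear in log(n).
    xs = [math.log(n) for n in ns]
    ys = [math.log(l) for l in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - slope * mx)
    return a, -slope                       # b is the negated log-log slope

# Synthetic scaling curve with hypothetical constants: loss = 4.0 * n^(-0.35)
ns = [10 ** i for i in range(3, 8)]
losses = [4.0 * n ** -0.35 for n in ns]
a, b = fit_power_law(ns, losses)
```

A monotonic loss curve like the one reported corresponds to a fitted exponent b > 0 with no flattening term needed over the range studied.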

Impact: Demonstrates that models of chemistry and biology follow ML scaling laws, providing an empirical foundation for building foundation models for drug discovery. Published in Nature Machine Intelligence.

Target audience: AI-for-biology researchers, computational chemists.

Read paper | arXiv | Code

Protein Design with Guided Discrete Diffusion

Gruver, N., Stanton, S., Frey, N. C., Rudner, T. G., et al. (2023). “Protein Design with Guided Discrete Diffusion.” Advances in Neural Information Processing Systems (NeurIPS 2023).

Method: Guidance for discrete diffusion models that follows gradients in the hidden states of the denoising network, enabling design directly in sequence space and bypassing the limitations of structure-based methods.
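A minimal sketch of gradient-style guidance in a relaxed space. Assumptions throughout: a four-letter alphabet, a uniform base model, and a hypothetical linear property scorer `w`; the paper's method instead differentiates a property predictor through the denoiser's hidden states across diffusion steps.

```python
import math
import random

def guided_sample(length=10, lam=2.0, seed=0):
    # Toy guidance sketch (assumption: real guidance backpropagates a
    # value model through denoiser hidden states; here the base model is
    # uniform and a linear scorer w shifts per-position logits directly).
    rng = random.Random(seed)
    alphabet = "ACDE"
    w = {"A": 1.0, "C": 0.0, "D": 0.0, "E": 0.0}  # hypothetical property scorer
    seq = []
    for _ in range(length):
        logits = {t: lam * w[t] for t in alphabet}  # base logit 0 + lam * grad
        z = sum(math.exp(v) for v in logits.values())
        r, acc, chosen = rng.random(), 0.0, alphabet[-1]
        for t in alphabet:                 # sample from the guided softmax
            acc += math.exp(logits[t]) / z
            if r < acc:
                chosen = t
                break
        seq.append(chosen)
    return "".join(seq)
```

With `lam = 0` the sampler reduces to the unguided base model; raising `lam` trades diversity for the guided property, the same knob the guided-diffusion setting exposes.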

Results: Applied to antibody optimization for expression yield and binding affinity. 99% expression rate, 40% binding rate in exploratory in vitro experiments under locality and developability constraints.

Impact: Makes discrete diffusion practical for protein design. Achieves strong performance under tight edit budgets by using novel saliency maps to select which positions to modify.

Target audience: ML researchers, protein designers, biotech researchers building guided generation systems.

Read paper | NeurIPS | Code

Selected Additional Publications

Machine Learning for Materials (2018-2020)

Deep Learning Infrastructure (2021)


† Denotes equal contribution where applicable.