amy x. lu 🌱

I’m a Computer Science PhD student in Pieter Abbeel’s group at UC Berkeley and Berkeley Artificial Intelligence Research, and a part-time member of Genentech’s Prescient Design team.

I’m interested in AI for drug discovery, and more broadly, in designing task-agnostic systems that can reason about the molecular-level world.

amyxlu [at] berkeley [dot] edu


News

2025-04-28 I’m giving my PhD Dissertation Talk at the BAIR seminar series 👩‍🎓.
2025-04-26 We won the seed grant challenge at the ICLR 2025 GEM workshop!
2025-04-23 Excited to be in Singapore for ICLR 2025 to present our work on protein language model likelihoods.

Selected Publications

See Google Scholar for an up-to-date list of publications.
Genome modeling and design across all domains of life with Evo 2
Garyk Brixi, Matthew G. Durrant, Jerome Ku, Michael Poli, Greg Brockman, Daniel Chang, Gabriel A. Gonzalez, Samuel H. King, David B. Li, Aditi T. Merchant, Mohsen Naghipourfar, Eric Nguyen, Chiara Ricci-Tam, David W. Romero, Gwanggyu Sun, Ali Taghibakshi, Anton Vorontsov, Brandon Yang, Myra Deng, Liv Gorton, Nam Nguyen, Nicholas K. Wang, Etowah Adams, Stephen A. Baccus, Steven Dillmann, Stefano Ermon, Daniel Guo, Rajesh Ilango, Ken Janik, Amy X. Lu, Reshma Mehta, Mohammad R.K. Mofrad, Madelena Y. Ng, Jaspreet Pannu, Christopher Ré, Jonathan C. Schmok, John St. John, Jeremy Sullivan, Kevin Zhu, Greg Zynda, Daniel Balsam, Patrick Collison, Anthony B. Costa, Tina Hernandez-Boussard, Eric Ho, Ming-Yu Liu, Thomas McGrath, Kimberly Powell, Dave P. Burke, Hani Goodarzi, Patrick D. Hsu, Brian L. Hie
bioRxiv, 2025
Evo 2 is a 40B parameter genomic foundation model capable of predicting functional impacts of genetic variations, autonomously learning biological features, and generating novel genomic sequences across all domains of life.
All-Atom Protein Generation with Latent Diffusion
Amy X. Lu, Wilson Yan, Sarah A. Robinson, Simon Kelow, Kevin K. Yang, Vladimir Gligorijevic, Kyunghyun Cho, Richard Bonneau, Pieter Abbeel, Nathan Frey
bioRxiv, 2024
PLAID is a multimodal protein generation model that generates all-atom protein structures from function and organism prompts, but requires only sequence training data.
Protein Language Model Fitness Is a Matter of Preference
Cade Gordon, Amy X. Lu, Pieter Abbeel
International Conference on Learning Representations (ICLR), 2025
Using a one-pass pseudolikelihood algorithm and influence functions, we find that pLMs capture artifacts of training data selection rather than the true fitness landscape.
Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure
Amy X. Lu, Wilson Yan, Kevin K. Yang, Vladimir Gligorijevic, Kyunghyun Cho, Pieter Abbeel, Richard Bonneau, Nathan Frey
bioRxiv, 2024
CHEAP is a joint embedding of protein sequence and structure that can be obtained from sequence alone, and it unveils insights into the compressibility, tokenizability, and mechanistic interpretability of protein folding models.
Self-Supervised Contrastive Learning of Protein Representations by Mutual Information Maximization
Amy X. Lu, Haoran Zhang, Marzyeh Ghassemi, Alan Moses
Machine Learning for Computational Biology (MLCB), 2020
CPCProt uses contrastive learning to embed proteins in a parameter-efficient way, and performs competitively with large language models.

Talks

2025-03-03 Baker Lab Journal Club, UW Institute for Protein Design [Slides]
2025-03-02 Latent Labs Journal Club