Literature Notes - ESM2

Published: December 24, 2024

Traditional protein structure prediction methods rely heavily on evolutionary information, often requiring computationally expensive multiple sequence alignment (MSA) inputs.
ESM2, a large protein language model, learns evolutionary patterns directly from single raw protein sequences, removing the need for MSAs and simplifying the computational pipeline.
Achieves up to ~60x faster inference than prior state-of-the-art methods (the largest gains are on shorter sequences), which makes structure prediction across vast metagenomic datasets (e.g., MGnify90) feasible.
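"Learning directly from raw sequences" works by hiding some residues and asking the model to recover them from the surrounding context. The sketch below assumes the standard BERT recipe (select 15% of positions; of those, 80% become a mask token, 10% a random amino acid, 10% stay unchanged) and a hypothetical 20-letter vocabulary. ESM2's actual tokenizer and training code differ in detail.

```python
import random

# Hypothetical vocabulary: the 20 standard amino acids plus one mask token.
# ESM2's real alphabet also has special and non-standard residue tokens.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"


def bert_style_mask(sequence, mask_prob=0.15, seed=None):
    """Corrupt a protein sequence with a BERT-style masking scheme.

    Each residue is selected with probability `mask_prob`. A selected residue
    becomes MASK 80% of the time, a random amino acid 10% of the time, and is
    left unchanged 10% of the time. Returns the corrupted tokens and a dict
    {position: original residue} of the targets the model must predict.
    """
    rng = random.Random(seed)
    tokens = list(sequence)
    targets = {}
    for i, residue in enumerate(sequence):
        if rng.random() >= mask_prob:
            continue  # not selected: no loss at this position
        targets[i] = residue  # the training loss is computed only here
        r = rng.random()
        if r < 0.8:
            tokens[i] = MASK
        elif r < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)
        # else: keep the original residue as-is
    return tokens, targets


if __name__ == "__main__":
    tokens, targets = bert_style_mask("MKTAYIAKQRQISFVKSHFSRQ", seed=0)
    print(tokens)
    print(targets)
```

Keeping some selected residues unchanged or randomized means the model cannot assume that every unmasked token is correct, so it has to build a representation of every position.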

Atomic Resolution

Scales the model from 8M to 15B parameters, trained with a BERT-style masked language modeling objective.
Experiments show that it can predict both low-resolution structure (contact maps) and high-resolution (atomic-level) structure.
Highlights that scaling up the model improves both the structural information captured in its learned representations and the accuracy of its predictions.
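Contact maps can be read out of a trained model's attention. Below is a minimal sketch of the post-processing used for attention-based contact prediction in ESM models: symmetrize each attention map, apply average product correction (APC), then combine the maps with logistic regression. It uses random stand-in attention maps and hypothetical regression weights, not a real model's outputs.

```python
import numpy as np


def symmetrize(x):
    """Make an L x L map symmetric, since residue contacts are undirected."""
    return x + x.T


def apc(x):
    """Average product correction: subtract the background term
    (row sum_i * column sum_j) / total sum from each entry."""
    row_sums = x.sum(axis=1, keepdims=True)  # shape (L, 1)
    col_sums = x.sum(axis=0, keepdims=True)  # shape (1, L)
    return x - (row_sums * col_sums) / x.sum()


def predict_contacts(attention, weights, bias):
    """attention: (C, L, L) stack of maps (C = layers x heads).
    weights: (C,) logistic-regression weights; bias: scalar.
    Returns an (L, L) matrix of contact probabilities."""
    features = np.stack([apc(symmetrize(a)) for a in attention])  # (C, L, L)
    logits = np.tensordot(weights, features, axes=1) + bias       # (L, L)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attention = rng.random((4, 12, 12))  # random stand-in attention maps
    weights = rng.normal(size=4)         # hypothetical fitted weights
    probs = predict_contacts(attention, weights, bias=0.0)
    print(probs.shape)
```

In practice the regression weights are fitted on proteins with known structures. Here they are random, so the output only illustrates shapes and symmetry, not real contacts.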

Speed Improvements

ESMFold, the structure predictor built on ESM2, reduces protein structure prediction time from over 10 minutes to under 1 minute on a single NVIDIA V100 GPU.
Matches or closely approaches the accuracy of AlphaFold2 while running substantially faster.

Graph: Influence on Protein Prediction Evolution

