Blog posts

2024

Literature Notes - Geneformer

2 minute read

Published:

  • Traditional methods for analyzing single-cell transcriptomics rely on cell-type-specific annotations or perturbation-based studies, which can be resource-intensive and dataset-specific.
  • Geneformer, a transformer-based foundation model, learns transcriptional dynamics directly from large-scale unlabelled single-cell transcriptomes, enabling context-aware gene mapping across diverse cellular states and conditions.
  • Pretrained on Genecorpus-30M (29.9 million single-cell transcriptomes), Geneformer provides a robust framework for a wide range of downstream applications, from dosage sensitivity prediction to in silico gene perturbation analysis.
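A core preprocessing idea behind Geneformer is rank value encoding: each cell is represented not by raw counts but by its genes ordered by expression after a corpus-wide normalization, so ubiquitously high housekeeping genes fall in rank relative to cell-state-defining genes. A minimal sketch of that idea (hypothetical function and variable names, not the actual Geneformer preprocessing code):

```python
# Sketch of Geneformer-style rank value encoding (assumed names; the real
# pipeline lives in the Geneformer repository).
def rank_value_encode(cell_counts, norm_factors, max_len=2048):
    """cell_counts, norm_factors: dicts mapping gene -> value.
    Returns genes ordered by normalized expression, highest first."""
    scored = {
        gene: count / norm_factors[gene]
        for gene, count in cell_counts.items()
        if count > 0 and norm_factors.get(gene)
    }
    # The ranked gene list (truncated to the model's context length)
    # is the transformer's input sequence for this cell.
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:max_len]

# Toy example: a high raw count for housekeeping gene GAPDH is outranked by
# genes with low corpus-wide normalization factors.
cell = {"GAPDH": 900.0, "TTN": 12.0, "MYH7": 30.0}
norms = {"GAPDH": 800.0, "TTN": 2.0, "MYH7": 10.0}
print(rank_value_encode(cell, norms))  # ['TTN', 'MYH7', 'GAPDH']
```

This normalization is what makes the encoding comparable across cells and batches without per-dataset annotation.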

Literature Notes - ESM3

1 minute read

Published:

  • ESM3 pushes the boundaries of protein design by integrating sequence, structure, and function into a single unified framework.
  • It scales up to 98 billion parameters and leverages multimodal inputs, enabling generative design and evolutionary insights.
  • It simulates evolutionary processes, generating proteins equivalent to what could take 500 million years of natural evolution, within days.

Literature Notes - ESM2

less than 1 minute read

Published:

  • Traditional protein prediction models rely heavily on evolutionary information, often requiring computationally expensive multiple sequence alignment (MSA) inputs.
  • ESM2, powered by large language models (LLMs), learns evolutionary patterns directly from raw protein sequences, eliminating MSA requirements and simplifying the computational pipeline.
  • Achieves inference roughly 60x faster than prior state-of-the-art methods, facilitating studies of vast metagenomic datasets (e.g., MGnify90).
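ESM2 learns those evolutionary patterns through masked language modeling: a fraction of residues in a raw sequence is hidden, and the model is trained to recover them from context alone, with no MSA. The toy sketch below shows the input preparation step only (toy vocabulary and hypothetical function names, not ESM2's actual tokenizer):

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Toy vocabulary: special tokens followed by the 20 standard residues.
VOCAB = ["<cls>", "<eos>", "<mask>"] + list(AMINO_ACIDS)
TOK2ID = {t: i for i, t in enumerate(VOCAB)}

def tokenize(seq):
    return [TOK2ID["<cls>"]] + [TOK2ID[a] for a in seq] + [TOK2ID["<eos>"]]

def mask_for_mlm(token_ids, mask_prob=0.15, rng=None):
    """Return (masked_ids, target_positions); special tokens are never masked."""
    rng = rng or random.Random(0)  # seeded for reproducibility in this sketch
    masked, targets = list(token_ids), []
    for i, t in enumerate(token_ids):
        if VOCAB[t] in ("<cls>", "<eos>"):
            continue
        if rng.random() < mask_prob:
            masked[i] = TOK2ID["<mask>"]
            targets.append(i)  # the model must predict the original residue here
    return masked, targets

ids = tokenize("MKTAYIAKQR")
masked, targets = mask_for_mlm(ids)
```

Training then minimizes cross-entropy between the model's predictions at `targets` and the original residues, which is how structural signal emerges from sequence alone.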

Literature Notes - Physics-informed machine learning

1 minute read

Published:

The text discusses advances in Physics-Informed Neural Networks (PINNs) driven by their integration with modern machine learning libraries such as TensorFlow and PyTorch, and with specialized frameworks like DeepXDE and SimNet. PINNs embed physical laws, typically as PDE residual terms in the training loss, making them particularly well suited to complex, multidimensional scientific problems.
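The defining ingredient of a PINN is its composite loss: a physics residual evaluated at collocation points plus boundary/initial-condition terms. The self-contained sketch below illustrates this for the ODE u'(x) = -u(x), u(0) = 1; finite differences stand in for the automatic differentiation that TensorFlow, PyTorch, or DeepXDE would provide, and `u` would be a neural network in a real PINN:

```python
import math

def pinn_loss(u, collocation_xs, h=1e-4):
    """Composite PINN-style loss for u'(x) = -u(x), u(0) = 1."""
    # Physics residual: penalize u'(x) + u(x) at each collocation point
    # (central finite difference approximates the derivative here).
    pde = sum(((u(x + h) - u(x - h)) / (2 * h) + u(x)) ** 2
              for x in collocation_xs) / len(collocation_xs)
    # Initial-condition term: u(0) should equal 1.
    ic = (u(0.0) - 1.0) ** 2
    return pde + ic

xs = [0.1 * i for i in range(1, 11)]
print(pinn_loss(lambda x: math.exp(-x), xs))  # exact solution: near-zero loss
print(pinn_loss(math.exp, xs))                # wrong physics: large loss
```

Training a PINN amounts to minimizing this loss over the network's parameters, so solutions that violate the governing equation are penalized even where no data exists.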

Literature Notes - MolFormer

1 minute read

Published:

The study highlights the capabilities of MolFormer, an unsupervised, pretrained molecular language model that excels at predicting molecular properties from SMILES sequences. The model surpasses traditional graph-based models on various benchmarks, uses computational resources efficiently (reducing GPU usage by a factor of ~60), and accurately captures interatomic relationships from sequence alone. The authors recommend further exploration to extend its applicability beyond small organic molecules.
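Chemical language models of this kind consume SMILES strings as token sequences. The sketch below uses a regex tokenizer common in chemical language modeling (an illustrative pattern, not MolFormer's exact vocabulary) to split a SMILES string into atom, bond, ring, and branch symbols before embedding:

```python
import re

# Common SMILES tokenization pattern: bracket atoms, two-letter elements
# (Br, Cl), organic-subset atoms, aromatic atoms, and structural symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    # Sanity check: the tokens must reconstruct the input exactly.
    assert "".join(tokens) == smiles, "unrecognized characters in SMILES"
    return tokens

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin -> 21 tokens
```

Each token is then mapped to an embedding and fed to the transformer, which is how a sequence model can recover interatomic structure without an explicit molecular graph.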