Published on Mar 10, 2025 by Arcadia Science

Paired residue prediction dependencies in ESM2

During a quick analysis of the ESM2 models for masked token prediction, we noticed that the predicted amino acid probability distributions of residues affect each other in a pattern that mirrors a protein’s 3D contact map, though less so for the larger model sizes. Our question to you is: why?


Purpose

This notebook details a quick analysis we performed on the ESM2 (Evolutionary Scale Modeling) models [1] for masked token prediction. We stumbled upon a counterintuitive result related to the effect that masking one residue has on the distribution of another.

Before jumping into our results, let’s first establish what masked token prediction is, how it works, and our consequent motivation for this analysis.
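As a rough illustration of the kind of measurement described above, the sketch below scores how much masking residue i shifts the predicted distribution at a masked residue j. Everything here is an assumption for illustration rather than the notebook's actual implementation: the function names, the `predict` callable (which in practice would wrap an ESM2 forward pass), the choice of Jensen-Shannon divergence as the comparison, and the two-letter toy predictor used to exercise the code.

```python
import numpy as np

MASK = "<mask>"  # placeholder mask token; an assumption for this sketch


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pairwise_dependency(tokens, predict):
    """Return a matrix dep[i, j]: the shift in the distribution at masked
    position j when position i is masked as well.

    `predict(tokens, position)` is a hypothetical callable returning a
    probability vector over the alphabet at `position`.
    """
    n = len(tokens)
    dep = np.zeros((n, n))
    for j in range(n):
        single = list(tokens)
        single[j] = MASK
        p_single = predict(single, j)  # only j masked
        for i in range(n):
            if i == j:
                continue
            double = list(single)
            double[i] = MASK
            p_double = predict(double, j)  # both i and j masked
            dep[i, j] = js_divergence(p_single, p_double)
    return dep


# Toy stand-in for a language model: each position simply favors the
# letter of its left neighbor, or is uniform if that neighbor is masked.
ALPHABET = "AB"


def toy_predict(tokens, position):
    left = tokens[position - 1] if position > 0 else MASK
    if left == MASK:
        return np.array([0.5, 0.5])
    probs = np.full(len(ALPHABET), 0.1)
    probs[ALPHABET.index(left)] = 0.9
    return probs


dep = pairwise_dependency(list("ABAB"), toy_predict)
```

With the toy predictor, `dep` is nonzero only where i is the left neighbor of j, since that is the only dependency the toy model encodes. Under the same assumptions, a real model's matrix could be compared against a structure's contact map.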

View the notebook

The full notebook pub is available here.

The source code to generate it is available in this GitHub repo (DOI: 10.5281/zenodo.15002836).

In the future, we hope to host notebook pubs directly on our publishing platform. Until that’s possible, we’ll create stubs like this with key metadata like the DOI, author roles, citation information, and an external link to the pub itself.


Daniel Burns
Critical Feedback, Formal Analysis
Keith Cheveralls
Critical Feedback, Validation
Evan Kiefl
Conceptualization, Formal Analysis, Investigation, Methodology, Software, Visualization, Writing