An antibody LLM helps to analyse B-cells and design specific antibodies

https://www.cas.cn/syky/202604/t20260422_5107565.shtml

https://academic.oup.com/bib/article/27/2/bbag154/8653665

Researchers at the Hefei Institutes of Physical Science (HFIPS) of the Chinese Academy of Sciences (CAS) have developed BCRInsight, an antibody language model trained with phenotype-aware contrastive learning. Pre-trained by self-supervised learning on large-scale B-cell receptor (BCR) sequence data to decode complex immune signals, the model is reported to reach state-of-the-art performance on tasks such as antibody binding site prediction and B-cell subpopulation analysis.

BCRInsight uses a 12-layer Transformer encoder with approximately 86 million trainable parameters. Unlike conventional language models that rely solely on masked-token prediction, the team introduced a phenotype-aware contrastive learning strategy and pre-trained the model on a dataset of 80 million human BCR sequences. The model employs a joint encoding scheme, analogous to “sentence pairs” in natural language processing, that integrates amino acid sequences with metadata such as gene annotations.
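To make the two ideas in this paragraph concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the BERT-style `[CLS]`/`[SEP]` layout, the function names `encode_pair` and `contrastive_loss`, the temperature value, and the use of a standard supervised contrastive loss (sequences sharing a phenotype label are treated as positives) are all assumptions made for illustration. The gene names and phenotype labels are example values only.

```python
import math


def encode_pair(aa_sequence, metadata_tokens):
    """Lay out a sequence and its metadata like an NLP sentence pair.

    Assumed layout: [CLS] residues [SEP] metadata [SEP], with segment id 0
    for the first part and 1 for the second.
    """
    tokens = ["[CLS]"] + list(aa_sequence) + ["[SEP]"] + list(metadata_tokens) + ["[SEP]"]
    n_first = len(aa_sequence) + 2  # [CLS] + residues + first [SEP]
    segment_ids = [0] * n_first + [1] * (len(tokens) - n_first)
    return tokens, segment_ids


def _normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def contrastive_loss(embeddings, phenotypes, temperature=0.1):
    """Supervised contrastive loss over one batch (illustrative only).

    For each anchor, embeddings with the same phenotype label are pulled
    together and all other embeddings are pushed apart.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and phenotypes[j] == phenotypes[i]]
        if not positives:
            continue  # an anchor without a same-phenotype partner contributes nothing
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[j] - log_denominator for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    tokens, segment_ids = encode_pair("CARDY", ["IGHV3-23", "IGHJ4"])
    print(tokens)
    print(segment_ids)

    # Toy 2-D embeddings: the first two and the last two point the same way.
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(contrastive_loss(batch, ["naive", "naive", "memory", "memory"]))
    print(contrastive_loss(batch, ["naive", "memory", "naive", "memory"]))
```

In this toy batch the loss is lower when the phenotype labels agree with the geometry of the embeddings, which is the pressure a contrastive objective of this kind places on the encoder during pre-training.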

Experimental results indicate that BCRInsight generalizes well and learns informative representations. In B-cell subpopulation analysis, the model can deconvolve the proportions of B-cell subpopulations from highly complex bulk BCR-seq data, with accuracy that surpasses existing models. In antibody binding site prediction tests, the model achieved an AUROC of 0.962, outperforming nine other state-of-the-art methods. Notably, although it was never exposed to 3D structural supervision, the model's self-attention mechanism appears to capture aspects of protein 3D structure, focusing on the HCDR3 loop regions and the structural support sites that determine antigen recognition.
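For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen true binding residue receives a higher score than a randomly chosen non-binding residue. The sketch below computes it directly from that definition. The function name `auroc` and the labels and scores are made up for illustration and are not data from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a binding-site residue, 0 otherwise.
    scores: predicted binding probability per residue.
    Tied positive/negative pairs count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical per-residue labels and scores.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]
    print(round(auroc(labels, scores), 3))
```

A value of 1.0 means every binding residue is ranked above every non-binding one, and 0.5 corresponds to random ranking, so a reported 0.962 indicates that the ranking is correct for the large majority of such pairs.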

This research provides a foundation for moving from “reading” the immune language to “writing” it, and thereby for guiding the rational design and optimization of disease-specific antibodies.
