Table 3 Overview of selected AI-based computational studies: applications and insights

From: Revolutionizing oncology: the role of Artificial Intelligence (AI) as an antibody design, and optimization tools

S. No

Year

Problem Addressed

Algorithms Used

Description

Key Results

Main Conclusion

References

1

1999

Antibody engineering, targeted mutations

X-PLOR and REFMAC for structure optimization, Surface Plasmon Resonance (SPR) for kinetic analysis, Structural modeling using Brookhaven Protein Data Bank (1bj1)

Enhanced the affinity and potency of Fab-12 targeting VEGF for cancer treatment. Targeted mutations in heavy-chain CDRs improved binding affinity, with CDR-H2 and CDR-H3 having the most significant impact

Y0243-1 variant (CDR-H1) showed a threefold affinity increase, while Y0317 (CDR-H3) achieved a 20-fold improvement. Final variant Y0317, with six mutations, resulted in a 100-fold increase in potency. Structural analysis and X-ray crystallography confirmed improved binding interactions

CDR mutations and phage display significantly enhanced Fab-12 affinity for VEGF, demonstrating the role of targeted mutations and computational modeling in antibody engineering

[283]

2

1987

Understanding canonical structures in immunoglobulins

Structural analysis of immunoglobulins, Sequence-structure correlation mapping

Linked immunoglobulin amino acid sequences to 3D structures of antigen-binding sites, identifying key residues that shape hypervariable regions crucial for antigen binding

Identified key residues in hypervariable regions and β-sheet frameworks dictating main-chain conformations. Revealed that many immunoglobulins share predictable structural patterns, categorized as 'canonical structures.' Demonstrated that structural predictions based on sequence data improve antibody modeling and engineering

Canonical structures provide a framework for predicting antibody structures from sequences, enhancing the accuracy of antigen-binding site modeling for therapeutic applications

[312]

3

2000

Antibody modeling

WAM (Web Antibody Modelling) algorithm, CONGEN conformational search, Energy screening via Eureka (a solvent-modified VFF), Clustering using RMS deviation

Developed WAM, an improved antibody modeling algorithm focusing on variable regions (Fv) and CDRs, particularly enhancing CDR-H3 loop modeling using canonical class concepts and energy screening methods

Achieved higher accuracy (1.7–2.0 Å rmsd) compared to traditional methods. Energy screening using a solvent-modified VFF (Eureka) improved structural refinement. Statistical analysis showed rmsd values as low as 1.3 Å for short loops and up to 2.0 Å for longer loops

WAM enhances antibody structure prediction accuracy through knowledge-based screening and energy optimization, offering an advanced modeling approach accessible online

[313]

4

2006

Affinity improvement of antibodies

Structure-based computational design, Side chain repacking, Electrostatic optimization, Energy evaluation techniques

Optimized antibody binding affinity to the I-domain of integrin VLA1 using computational modeling, refining amino acid positioning and electrostatic interactions to enhance affinity. Experimental validation included competition ELISA and KinExA assays

Created a quadruple mutant with tenfold increased affinity to the VLA1 integrin I-domain. Hit rate analysis (12% success from 83 mutants) and EC50 comparisons quantified improvements. Crystal structures confirmed hydrogen bonding's role in high-affinity mutations

Structure-based computational design significantly enhances antibody binding affinity, demonstrating the role of electrostatic optimization and iterative modeling in therapeutic antibody development

[162]

5

2007

Comparison of B-cell epitope prediction methods

Scale-based (DiscoTope, PIER, ProMate, ConSurf), Patch-based (CEP, PPI-PRED, ClusPro (DOT), PatchDock)

Evaluation of eight web servers for B-cell epitope prediction using probability scores and docking-based interface identification

PatchDock achieved the highest AUC (> 0.69) and sensitivity (> 75%). ProMate and PPI-PRED showed moderate performance, with ProMate slightly outperforming PPI-PRED. ConSurf and DiscoTope exhibited poor performance (AUC ~ 0.6), while ClusPro (DOT) and CEP performed near random

PatchDock was the most effective method, while others had limited success

[306]

6

2007

Improving antibody affinity

Hierarchical computational design, A* search and Dead-end elimination, Poisson-Boltzmann electrostatics

Developed a computational method to enhance antibody affinity by optimizing electrostatic interactions, evaluating single mutations systematically, and incorporating experimental feedback for iterative refinement

Achieved a tenfold affinity increase in cetuximab and a 140-fold improvement in D44.1. Identified beneficial mutations in bevacizumab and 4-4-20. Demonstrated a > 60% success rate in predicting beneficial single mutations, with D44.1 improving to 30 pM and cetuximab to 52 pM

Iterative computational design significantly enhances antibody affinity, demonstrating the effectiveness of computational methods in optimizing protein binding affinity for therapeutic applications

[281]

7

2008

Identification of CDRs and B-cell epitope characteristics

Automated structure-based method leveraging antibody-protein complexes

Developed an automated method to accurately identify CDRs and B-cell epitopes by analyzing structural regions from known antibody-protein complexes, overcoming sequence-based tool limitations

CDRs exhibit a highly restricted composition dominated by four amino acids, while aliphatic-hydrophobic residues (A, I, L, V) are underrepresented. Histidine uniquely maintains consistent interactions across different molecular contexts

Structural biology and automation improve CDR identification, enhancing insights into antibody-antigen interactions

[267]

8

2008

Antibody loop replacement study

Rosetta software for structural modeling and hydrogen bond calculations, High-throughput screening, Tailored DNA libraries

Explored CDR loop length modifications to enhance antigen binding and improve affinity. Tested whether longer CDR loops could create new interactions with the VLA1 antigen, expanding antibody design possibilities

Affinity comparisons showed over 100-fold lower binding in modified constructs compared to the wild-type antibody. Replacing the L1 loop in an anti-VLA1 antibody did not improve affinity due to structural instability caused by domain swapping. A second round of modifications (Leu51 to Ser mutation) stabilized the loop and eliminated dimerization

Structural instability limited the success of CDR loop replacements, but targeted mutations stabilized the loop, highlighting the need for careful loop engineering in antibody design

[286]

9

2008

Antibody affinity maturation

In silico modeling, Molecular dynamics, Docking, CDR walking mutagenesis (targeting CDR-H3 and CDR-L3)

Optimized the affinity of human anti-gastrin TA4 scFv for therapeutic use. Combined computational modeling with experimental validation, using phage display technology, CDR walking mutagenesis, and molecular dynamics to assess stability and key contact residues

Achieved a 454-fold affinity improvement (KD = 13.2 nM). Targeted mutagenesis focused on CDR-H3 and CDR-L3 to reduce antigenicity and immunogenicity. Structural modeling guided rational design, demonstrating the effectiveness of computational methods

The integration of computational modeling and experimental approaches effectively accelerates high-affinity therapeutic antibody development, demonstrating the potential of structure-based design

[285]

10

2008

Antibody homology modeling and antibody-antigen docking

RosettaAntibody protocol, Monte Carlo minimization, Rigid-body optimization, Comparative modeling, De novo CDR H3 modeling, Ensemble docking predictions

Developed RosettaAntibody, a high-resolution homology modeling protocol for antibody variable regions (Fv). Combines canonical CDR loop modeling, de novo CDR H3 predictions, and VL-VH orientation optimization to improve structure-based applications

Achieved a median RMSD of 1.5 Å for antigen-binding pockets. Demonstrated moderate to high docking accuracy for 7 of 15 targets. Statistical validation via rmsd benchmarking showed many models aligned within 2.0 Å of crystal structures

RosettaAntibody improves antibody homology modeling and docking accuracy, providing valuable insights for computational docking, therapeutic antibody engineering, and protein design

[152]

11

2010

CDR design

OptCDR (De novo CDR generation), Structural refinement, Mutation prediction, Computational selection for affinity optimization

Developed OptCDR, a computational framework for de novo CDR generation, optimizing binding affinity and specificity. Unlike traditional methods modifying existing antibodies, OptCDR mimics natural evolution to identify favorable mutations, streamlining iterative refinement and accelerating antibody discovery

Tested on three antigens (fluorescein, hepatitis C capsid peptide, and VEGF), demonstrating high-affinity antibody design across diverse targets. Employed statistical scoring functions for binding optimization and computational benchmarking against known binders

OptCDR efficiently generates diverse antibody libraries with enhanced specificity and affinity, proving its value as a computational alternative to time-consuming experimental approaches

[80]

12

2012

Structural analysis of B-cell epitopes

Hobohm2 algorithm for redundancy removal, bootstrapping for statistical validation

Used interaction vectors and amino acid pair frequencies to define antigen–antibody similarity, applying bootstrapping to assess statistical distributions for epitope composition trends

B-cell epitopes are flat, oval-shaped, and align with antibody binding sites at −30° to 60°. Typically consist of ~ 15 residues with a hydrophobic core for stability and charged edges for interaction. Hydrophilic residues dominate the binding interface, while positively charged residues are underrepresented

B-cell epitopes differ from general protein–protein interfaces, necessitating specialized prediction models for improved immunology and vaccine design

[314]

13

2013

Antibody library and optimization of CDR

Predator synthetic antibody library, Targeted randomization, Trinucleotide synthesis, Hydrophilic mutation selection

Developed Predator, a synthetic antibody library designed to improve folding, reduce aggregation, and enhance functional antibody selection. Features a high diversity (6.2 × 10⁷ clones) with optimized CDR design for increased binding efficiency

Predator mimics the human immune response by designing CDR3 compositions based on functional human antibodies. Incorporates targeted randomization at seven positions, omitting suboptimal amino acids while optimizing for stability and solubility. Phage display selection yielded specific antigen binders

Predator is an effective synthetic antibody library built on an aggregation-resistant HEL4 scaffold, allowing modular cloning and affinity maturation for improved therapeutic antibody development

[308]

14

2014

Prediction of antibody-specific B-cell epitopes

Custom computational framework using residue-pairing and interface characteristics

Developed a method utilizing antibody sequences to identify discontinuous epitopes on antigens, emphasizing residue-pairing and interface characteristics

A dataset of 646 Ab-Ag structures from PDB was compiled, and antibody sequences were clustered using BLASTCLUST (≥ 97% identity). Computational predictions refined through Patch-per-Ab and Patch-per-group approaches improved accuracy in D8 antigen analysis. Validation techniques included X-ray crystallography, peptide ELISA, deuterium exchange, site-directed mutagenesis, and cross-blocking

Integrating antibody-specific predictions with cross-blocking experiments enhances precision in identifying overlapping epitopes, confirming residues within conformational B-cell epitopes

[315]

15

2014

Antibody generation and immunogenicity minimization

OptMAVEn (de novo modeling), Mixed-Integer Linear Programming (MILP), Iterative Protein Redesign & Optimization (IPRO), MHC/T-cell epitope quantification (HSC score)

Developed OptMAVEn, a computational method for designing antibody variable regions with optimized affinity and reduced immunogenicity, eliminating reliance on existing antibodies or immunized animals. Enhances binding affinity using biophysical models and a humanization procedure to minimize immunogenicity

OptMAVEn efficiently positioned antigens, rediscovered native structures, and mimicked natural evolution to improve affinity. Validated against targets like influenza hemagglutinin and HIV gp120, ensuring lower immune recognition through a 9-mer approach. Modular Antibody Parts (MAPs) database selected components for high affinity and low immunogenicity

OptMAVEn extends OptCDR for de novo antibody design, integrating humanization strategies to minimize immune response, accelerating therapeutic and vaccine antibody development

[140]

16

2014

Antibody design with elongated CDRs

Scaffold-based antibody engineering, Computational modeling, X-ray crystallography, Flow cytometry, Tag-lite HTRF binding assays

Engineered antibodies with elongated CDRs, particularly CDRH3 from bovine antibodies (BLV1H12), to enhance binding affinity and access hidden ligand sites, expanding therapeutic potential

Modified CXCR4-binding peptides were grafted into CDRH3 and CDRH2, optimizing binding without steric hindrance. Engineered antibodies (bAb-AC1, bAb-AC2, bAb-AC3) bound CXCR4-expressing cells with high affinity (Kd = 2.1–19.8 nM), blocking SDF-1-induced CXCR4 activation and inhibiting cell migration

Scaffold-based antibody engineering using elongated CDRs provides high specificity and therapeutic potential for targeting CXCR4-related diseases, including cancer and HIV

[316]

17

2017

All-Atom Energy Function for Macromolecular Modeling and Design

Rosetta Energy Function (REF15), van der Waals interactions, Electrostatics, Torsional energy terms, PyRosetta

Developed REF15, an enhanced energy function for the Rosetta suite, improving biomolecular modeling accuracy. Originally designed for proteins, it now supports RNA, DNA, carbohydrates, and synthetic macromolecules

REF15 refines structural insights by optimizing energy calculations, improving computational speed, and expanding applications in drug discovery, synthetic biology, and nanomaterial design

REF15 significantly enhances Rosetta's capabilities in biomolecular modeling, strengthening its role in structural biology, vaccine development, and therapeutic engineering

[317]

18

2017

Computational design of stable and functional binding antibodies

AbDesign algorithm, Rosetta design calculations, Statistical modeling based on antibody datasets

Developed a computational framework for designing stable and functional Fvs, addressing structural challenges such as large loops and buried polar networks. Used sequence constraints from natural antibodies to maintain backbone integrity and enhance stability

Engineered Fvs achieved mid-nanomolar affinities and stability comparable to natural antibodies. Iterative design and experimentation refined structures by correcting flaws like unpaired charges and cavities. Crystallography confirmed atomic accuracy of designed Fvs, supporting precise antibody engineering

The study established principles for designing stable, functional antibodies, demonstrating that iterative computational design and sequence constraints enable high-affinity, structurally stable antibody fragments

[284]

19

2019

Protein Engineering using Generative Models

Score Matching, Pseudolikelihood, Loopy Belief Propagation, Stochastic Backpropagation, Generative Adversarial Nets, Deep AutoRegressive Networks, Reweighted Wake-Sleep

Protein design using generative models, improving sequence-structure consistency and predicting novel functional variants for efficient protein engineering

Enhanced structural accuracy and sequence variability, improving protein folding predictions

Generative models significantly improve protein design, paving the way for more reliable synthetic protein development

[318]

20

2019

Generative models in protein design

Graph-based conditional generative models, Inverse protein folding, Structural generalization, Graph encodings

Developed a generative framework for protein sequence design to solve the inverse folding problem, ensuring sequences fold into specific 3D structures. This is critical for biomedicine, energy, and materials science applications

Graph-based models capture long-range dependencies, improving structural generalization. Outperformed Rosetta in sequence accuracy and computational efficiency, reducing trial-and-error in protein design

Graph-based conditional generative models significantly advance targeted biomolecule design, enhancing speed, reliability, and efficiency in protein engineering

[319]

21

2020

Prediction of antigen–antibody binding interfaces

Graph Convolutional Networks (GCNs), Attention Mechanism, Transfer Learning

Developed computational methods to predict antibody-antigen binding interfaces, enhancing drug and vaccine design while addressing limitations of experimental approaches using large datasets

Conv2-layer with Attention Layer achieved the highest performance (AUC-PR = 0.8), while Convolution Alone performed significantly worse (AUC-PR = 0.48). Attention-enhanced models consistently outperformed others, demonstrating the effectiveness of combining graph convolutions with attention mechanisms

A DL framework using graph convolutions, an attention layer, and transfer learning achieved state-of-the-art accuracy, improving antibody-antigen interaction predictions while offering interpretable insights

[297]

22

2020

CDR design

Ens-Grad framework, CNNs, Gradient ascent optimization, Ensemble learning, Argmax transformation

Developed Ens-Grad, an ML-based approach for designing high-affinity CDRs in human IgG antibodies, eliminating off-target effects without requiring detailed target structures. Integrates experimental data to enhance specificity and optimize antibody sequences

Trained on phage display panning data, linking CDR-H3 sequences to binding affinity. Achieved AUROC of 0.960 in predicting ranibizumab enrichment, demonstrating high accuracy in distinguishing binding vs. non-binding sequences. Ensemble learning with voting-thresholding strategy improved diversity and robustness

Ens-Grad provides a modular, ML-driven approach for precise antibody design, demonstrating the effectiveness of ensemble learning in optimizing high-affinity candidates

[295]

23

2020

CDR H3 loop structure prediction

DeepH3 (Deep residual neural network—ResNet), Geometric potential refinement, Distance and orientation-based learning

Developed DeepH3, a DL-based approach to predict CDR H3 loop conformations, converting sequence data into geometric potentials to refine RosettaAntibody-generated models

DeepH3 outperformed RosettaAntibody in 33 of 49 targets, reducing RMSD by 32.1% (1.4 Å improvement). Achieved an average RMSD of 2.2 ± 1.1 Å (top 1) and 1.9 ± 0.9 Å (top 5), demonstrating superior predictive accuracy

DeepH3 significantly enhances CDR H3 loop modeling accuracy using DL-based distance and orientation potential refinements, making it a valuable tool in antibody structural predictions

[148]

24

2020

Denoising Diffusion Probabilistic Model

Diffusion probabilistic models, Sampling algorithm, Progressive lossy compression, Reweighting objective algorithm

Developed a diffusion-based generative model for high-quality image synthesis, leveraging noise-adding and denoising processes inspired by nonequilibrium thermodynamics

Achieved an Inception Score of 9.46 and FID of 3.17 on CIFAR10, outperforming many existing generative models. Demonstrated superior sample fidelity, distinguishing it from GANs and VAEs

Diffusion probabilistic models provide an efficient and effective approach to image synthesis, achieving state-of-the-art results in generative modeling

[320]

25

2021

Paratope-epitope prediction

Para-EPMP (CNNs + GNNs for localized paratope prediction), Epi-EPMP (Graph Attention Networks + Multi-task Learning for scattered epitopes)

Developed EPMP architecture, combining Para-EPMP for paratope prediction and Epi-EPMP for epitope prediction. Para-EPMP models antigen-dependent paratope interactions, while Epi-EPMP handles scattered epitopes using graph-based structural learning

Para-EPMP achieved the highest paratope prediction performance (AUC ROC = 0.966, AUC PR = 0.752), outperforming PECAN and other models. Epi-EPMP led epitope prediction with AUC ROC = 0.710, AUC PR = 0.277, surpassing alternative approaches

EPMP architecture effectively models paratope-epitope interactions by integrating both sequence and structural data, enhancing antibody-antigen interaction predictions and setting new benchmarks in epitope and paratope modeling

[299]

26

2021

Virtual screening of antibodies

DLAB framework (Deep Learning for Antibody Binding), DLAB-VS (CNN-based binding classifier), DLAB-Re (Pose refinement), Ensemble Learning

Introduced DLAB, a DL framework for structure-based antibody screening, enhancing virtual screening without prior binders. DLAB-VS classifies binders/non-binders using structural features, while DLAB-Re refines pose selection based on high-scoring inputs (fnat > 0.7 binders, < 0.1 non-binders). Post-snapshot evaluation ensures generalization to unseen targets, optimizing docking accuracy

DLAB-VS, combined with ZDock outputs, achieved superior AUC scores, enhancing antibody ranking and therapeutic candidate identification. CNNs were used for binding classification, trained on high-quality docking poses with data augmentation to improve generalization

DLAB enhances antibody-antigen binding prediction, improving docking accuracy and virtual screening. By integrating CNN-based binding classification and optimized pose refinement, it generalizes across low-identity targets and antigen variants, streamlining therapeutic antibody development

[296]

27

2021

Antibody design and affinity maturation

Long Short-Term Memory (LSTM) networks, Phage display panning, Next-generation sequencing (NGS), Likelihood scoring (Negative Log-Likelihood—NLL)

Developed an LSTM-based framework for antibody sequence generation and binding affinity prediction, reducing reliance on traditional labor-intensive mutation experiments. Uses NGS data to generate high-affinity variants while avoiding combinatorial complexity

Generated sequences with 1800-fold higher affinity than the parental clone. Likelihood scores (NLL) correlated with actual binding affinity, improving candidate selection. Cross-validation confirmed the robustness of LSTM over traditional frequency-based screening

LSTM-based affinity maturation streamlines antibody discovery by efficiently prioritizing high-affinity candidates, reducing cost and labor while outperforming traditional methods

[226]

28

2021

Antibody design and affinity maturation

Long Short-Term Memory (LSTM) networks, Graph Neural Networks (GNNs), Hag-Net (graph-based network for antibody-antigen interactions), Pairwise prediction strategy

Developed sequence-based DL models to optimize therapeutic antibody leads, predicting antibody-antigen interactions and binding affinities without requiring crystal structures. Pairwise prediction strategy improves affinity assessment by comparing closely related mutants

Hag-Net achieved an AUC > 0.90 in five-fold cross-validation and 0.70 in out-of-distribution tests. Graph Neural Networks improved structural learning, enhancing binding affinity predictions over conventional in silico approaches

DL-based models, including LSTM and GNNs, provide computationally efficient, scalable, and accurate affinity predictions, enabling broader applicability in industrial antibody development

[163]

29

2021

Benchmarking of models used in antibody sequence design

Autoregressive model (AR), Geometric Vector Perceptron (GVP), Fold2Seq (Transformer-based model)

Benchmarked three deep generative models (DGMs) for designing diverse antibody sequences while maintaining structural constraints, addressing challenges in protein design

Fold2Seq achieved the best balance between sequence diversity and structural integrity. Evaluations based on sequence diversity, structural accuracy (TM-score, RMSD), and physicochemical properties confirmed its superior performance

Fold2Seq excels in generating structurally consistent and diverse antibody sequences, highlighting the potential of DGMs in therapeutic antibody design

[254]

30

2021

Score-based generative modeling using SDEs

Stochastic Differential Equations (SDEs), Score-based generative modeling, Predictor–Corrector (PC) framework, Neural ODEs

Developed a generative modeling framework that transforms noise into complex data distributions, enhancing sample generation and solving inverse problems in data science

Achieved state-of-the-art performance on CIFAR-10 with an Inception Score of 9.89 and an FID of 2.20. Enabled conditional generation, image inpainting, and colorization without retraining, expanding practical applications

SDE-based generative modeling with predictor–corrector frameworks enhances sampling efficiency, improves accuracy, and sets new benchmarks in high-fidelity image synthesis

[321]

31

2021

Protein structure prediction

Two-Track Network, Three-Track Network, Attention Mechanisms, End-to-End Learning, SE(3)-Equivariant Transformer, pyRosetta, Crop Training Method

Motivated by DeepMind's accurate predictions at CASP14, the authors developed RoseTTAFold, a three-track network architecture. The network successively transforms and integrates information from the 1D sequence, 2D distance map, and 3D coordinates, which gave the best performance among the architectures explored

The three-track neural network integrates 1D sequences, 2D distance maps, and 3D atomic coordinates for accurate protein structure prediction. It uses attention mechanisms for long-range interactions and SE(3)-equivariant layers for spatial consistency. End-to-end training optimizes predictions, with PyRosetta refining atomic models. RoseTTAFold is benchmarked against AlphaFold2 and trRosetta using TM-score, Cα-RMSD, and RMS error

This method approached the accuracy of DeepMind's CASP14 results, solved challenging modeling problems, provided insights into unknown protein functions, and quickly generated accurate protein–protein complex models from sequences. It is available to the scientific community to advance protein structure predictions and biological research

[322]

32

2021

Affinity Maturation using AntiBERTy Model

AntiBERTy, Multi-head attention pooling, Deep multiple instance learning (MIL)

AntiBERTy uses 558 million antibody sequences to model affinity maturation, revealing evolutionary pathways and key binding residues for therapeutic design

Improved clustering of antibody sequences into evolutionary trajectories, enhancing antigen-specific affinity maturation studies

AntiBERTy provides deep insights into antibody evolution, assisting vaccine and therapeutic antibody development

[241]

33

2021

Molecular Conformation Generation using ConfGF

TorsionNet, MARS (Markov molecular sampling), Conditional Variational Graph Auto-Encoder (CVGAE), GraphAF, Deep Potential Molecular Dynamics (DPMD), SchNet, Neural Message Passing, SE(3)-Transformers

ConfGF enhances molecular conformation generation by predicting stable 3D structures from 2D graphs using Langevin dynamics for more accurate results

Achieves 88.49% coverage and a 0.2673 Å matching score (MAT) on QM9, surpassing existing molecular modeling methods

ConfGF generates diverse and accurate molecular conformations, improving molecular modeling for drug discovery

[323]

34

2021

Categorical Distributions with Argmax Flows and Multinomial Diffusion

Argmax Flows, Multinomial Diffusion, Importance Weighted Autoencoders, Real NVP (Real-valued Non-Volume Preserving), NICE (Non-linear Independent Components Estimation)

Introduces Argmax Flows and Multinomial Diffusion to improve categorical generative models, ensuring better coherence in high-dimensional discrete data

Outperforms traditional dequantization methods in negative log-likelihood, achieving high accuracy in text generation and image segmentation

Argmax Flows and Multinomial Diffusion significantly enhance categorical data modeling, improving text and image generation tasks

[324]

35

2021

Protein Structure Prediction using AlphaFold

AlphaFold, TensorFlow, Sonnet, JAX, Haiku, XLA compiler

AlphaFold revolutionizes protein structure prediction by integrating DL with physical constraints, achieving atomic accuracy on CASP14 challenges

Achieves atomic accuracy, surpassing conventional methods in protein structure modeling

AlphaFold sets new standards for computational biology, making protein structure prediction highly accurate and accessible

[28]

36

2021

Protein Structure Prediction using Deep Residual Networks

RaptorX-Contact, AlphaFold, Residual Neural Networks, PyRosetta, TensorFlow

Deep residual networks enhance protein structure predictions by incorporating inter-residue distances and orientations, outperforming previous methods

TM-score of 0.625, outperforming Robetta and achieving superior predictive accuracy

Inter-residue distance prediction advances protein modeling accuracy, leading to better structure-based drug design

[325]

37

2021

Antibody Structure Prediction using DeepAb

DeepAb, Focal Loss (for calibrating deep neural networks), MUFOLD (for protein 3D structure prediction), Rosetta3 (for macromolecular modeling and design)

DeepAb uses DL to predict antibody structures, outperforming RosettaAntibody. It combines a residual convolutional network and biLSTM-based RNN to learn inter-residue distances and orientations from sequences. The model refines predictions and generates 3D structures using Rosetta, advancing therapeutic antibody development

Improved RMSD accuracy by 14%–18% for heavy chains and 16%–17% for light chains

DeepAb enhances antibody prediction, advancing drug discovery and therapeutic antibody optimization

[88]

38

2021

Protein Design and Variant Prediction

Deep Generative Models, Autoregressive Models, Variational Autoencoders (VAE) (as a reference point for comparison), Probabilistic Models (such as EVmutation)

A deep generative model (DGM) using natural language processing (NLP) was developed to predict and design functional proteins without sequence alignments. It effectively handles highly variable sequences, such as antibodies and nanobodies

The model was tested on a designed nanobody library, yielding 1.5 × higher expression and nearly doubled mean display levels (166,193 vs. 92,183 units). Mutation effect predictions achieved an AUC of 0.90

The alignment-free DGM designs high-expression nanobody libraries and predicts protein sequences accurately. It significantly improves computational protein engineering and therapeutic antibody development

[253]

39

2021

Mutations Effect on Protein Function

ESM-1v, MSA Transformer, EVMutation, DeepSequence, JackHMMer

Zero-shot inference enables efficient protein function prediction without specialized training, reducing data dependency and computation. It accelerates research, uncovers novel functional relationships, and advances scalable protein engineering

Deep mutational scanning uses computational models to predict mutation effects on proteins. Supervised methods (e.g., regression, random forests) offer interpretability, while ensemble models (Revel, CADD) improve accuracy. Language models (UniRep, ESM) capture sequence patterns, and unsupervised tools (SIFT, EVMutation) leverage evolutionary data. Benchmarking ensures accuracy using Spearman correlation (ρ) and cross-validation. Zero-shot inference enhances generalizability, driving advances in protein engineering and functional analysis

Researchers have shown that protein language models can use zero-shot inference to predict the functional effects of sequence variation, without experimental data or additional training. This method leverages evolutionary patterns in protein sequences, providing an efficient and generalized solution for predicting protein functions from sequence data

[199]

40

2021

Denoising Diffusion Models in Discrete State-Spaces

Absorbing-state diffusion, Nearest-neighbor (NN) diffusion, Uniform diffusion, Cosine scheduling, Mutual information scheduling

D3PMs enhance discrete data modeling by introducing multinomial diffusion for structured transition matrices. Benchmarked on CIFAR-10, they outperform traditional dequantization methods in text and image synthesis, improving log-likelihood, sample quality, and generative accuracy

D3PMs achieved an Inception Score (IS) of 11.47 and FID ≤ 2.94 on CIFAR-10, surpassing continuous-space DDPMs in text and image generation. Their structured transition matrices improved categorical data modeling

D3PMs refine generative models for categorical data, enhancing text and image synthesis efficiency while maintaining structured transition matrices

[326]

41

2022

Antibody affinity optimization and developability

AI and ML-driven sequence analysis, Next-generation sequencing (NGS), Bioinformatics-guided CDR design, Computational CDR-FWR shuffling

Leveraged NGS data from human antibody repertoires, including COVID-19 patient antibodies, to optimize CDR and framework selection for antigen recognition. Used AI and ML to improve affinity and developability more efficiently than traditional methods

Framework-CDR shuffling optimized CDR-framework combinations, preserving natural diversity while improving affinity and specificity against SARS-CoV-2 variants. CB79, derived from SARS-CoV-2-neutralizing antibody H4, exhibited a sevenfold affinity increase and 75-fold improved neutralization, effectively blocking viral entry

Computational CDR-FWR shuffling is an efficient strategy for antibody development, enhancing affinity, stability, and specificity for therapeutic applications against SARS-CoV-2 and other diseases

[264]

42

2022

B-cell conformational epitope prediction

Pretrained Protein Language Models (ESM-1v, ESM-IF1), Transfer Learning, Binary Classification, Ensemble Learning

Developed the SEMA model to enhance conformational B-cell epitope prediction, addressing limitations of current methods in vaccine and immunotherapy research. Tested on the SARS-CoV-2 receptor-binding domain (RBD), capturing structural and functional features for improved predictions

SEMA achieved high accuracy (ROC AUC 0.76), outperforming existing tools. Utilized sequence-based (SEMA-1D) and structure-based (SEMA-3D) approaches to improve interpretability and immunogenic potential assessment

SEMA, using transfer learning and pretrained DL models (ESM-1v, ESM-IF1), enhanced conformational B-cell epitope prediction and effectively ranked immunodominant regions in SARS-CoV-2 RBD

[237]

43

2022

Predicting antibody-antigen interactions

AbAgIntPre (Siamese-like CNN architecture), CKSAAP encoding for spatial amino acid relationship representation

Developed AbAgIntPre, a DL method using a Siamese-like CNN architecture to predict antibody-antigen interactions from amino acid sequences. Addresses limitations of traditional methods and 3D structure-dependent computational approaches

Achieved an AUC of 0.82, excelling in SARS-CoV datasets. Outperformed traditional ML models like Random Forest and SVM, demonstrating higher accuracy and reducing false positives

AbAgIntPre accelerates antibody screening and design by predicting antibody-antigen interactions from sequence data, complementing experimental methods with improved accuracy and efficiency

[157]

44

2022

Understanding protein binding in DNA-binding antibodies

Long Short-Term Memory (LSTM) networks, CNNs

Used large datasets to predict binding sites in autoimmune antibodies, advancing vaccine design and synthetic pharmacology for autoimmune disease treatments

CNNs achieved higher accuracy (96.56% for binding, 97.81% for non-binding) compared to LSTMs (87.07% and 88.56%). LSTMs captured long-range dependencies, enhancing interpretability, while CNNs efficiently detected local features

CNNs are more effective for direct binding prediction, while LSTMs provide deeper insights into binding mechanisms. A hybrid model could integrate both strengths for improved predictions in autoimmune antibody research

[227]

45

2022

Protein design

Diffusion models (DDPMs), SE(3)-equivariant frameworks, RFdiffusion, RoseTTAFold, Gaussian noise, Brownian motion

Developed a DL framework for de novo protein design, focusing on binder creation and symmetric architecture design. Improved upon constraints in backbone geometry and sequence-structure complexities

RFdiffusion generated diverse, biochemically precise protein structures, leveraging SE(3)-equivariant modeling and self-conditioning techniques. Experimental tests on hundreds of designs confirmed its reliability

RFdiffusion enhances de novo protein backbone design, excelling in monomers, binders, oligomers, enzyme scaffolds, and therapeutic proteins, advancing drug discovery and synthetic biology

[302]

46

2022

Antibody Backbone and Side-Chain Conformations Prediction

DeepSCAb (DL model), Inter-residue module, Attention mechanisms, Integration with RosettaAntibodyDesign

Developed DeepSCAb, a DL model predicting full Fv antibody structures, including backbone and side-chain conformations, directly from sequences, improving accuracy without prior structural input

DeepSCAb outperforms rotamer repacking methods, handling structural variability in solvent-exposed residues. It maintains accuracy across environments and integrates with antibody design tools for flexible and robust predictions

DeepSCAb enhances antibody modeling by predicting inter-residue geometries and side-chain dihedrals from sequence data, improving structure prediction and aiding therapeutic antibody design

[292]

47

2022

CDR Loop Structure Prediction

ABlooper, E(n)-Equivariant Graph Neural Networks (E(n)-EGNNs), Rotational and Translational Invariance, Parallel EGNNs

Developed ABlooper, a DL model that predicts CDR-H3 loop structures with high accuracy in under five seconds, outperforming traditional methods like ABodyBuilder, making it ideal for large-scale antibody research

ABlooper achieves an average RMSD of 2.49 Å, improving to 2.05 Å for high-confidence predictions, surpassing ABodyBuilder. It leverages five parallel EGNNs with four layers, minimizing RMSD and L1 loss for structural accuracy

ABlooper enhances antibody modeling with rapid and accurate CDR-H3 loop predictions, making it a powerful tool for biotherapeutics, vaccine development, and large-scale antibody research

[147]

48

2022

Antibody Sequence-Structure Co-Design

RefineGNN (Graph-Based Framework), Iterative Refinement, Coarse-Grained Graph Representation, Feedback Loop Optimization

RefineGNN improves antibody CDR design by co-generating sequence and 3D structure, unlike previous models that assume a fixed structure. It iteratively refines structures while adding residues, ensuring structural relevance and functional integrity

RefineGNN outperforms traditional approaches in sequence quality (perplexity, PPL), structural accuracy (RMSD), and antigen-binding precision (amino acid recovery, AAR). It dynamically refines structures through a feedback loop, ensuring compatibility with fixed framework regions

Researchers developed a generative model that designs CDR sequences and 3D structures simultaneously, treating CDRs as graphs. The model showed superior log-likelihood and effectiveness in designing SARS-CoV-2 neutralizing antibodies, making it a powerful tool for computational antibody design

[250]

49

2022

Antibody Design and Optimization

Diffusion-based generative model, side-chain packing, AMBER force field refinement, equivariant neural networks

Introduces a deep generative model integrating sequence-structure co-design, explicitly conditioning CDR generation on antigen 3D structures. It iteratively refines amino acid types, positions, and orientations to enhance antibody-antigen interactions and optimize existing antibodies

For H1 CDRs, amino acid recovery (AAR) increased from 22.85% (RAbD) to 65.75% (DiffAb), RMSD improved from 2.261 Å to 1.188 Å, and interaction model potential (IMP) increased from 43.88% to 53.63%

Diffusion-based generative model significantly enhances CDR design by improving accuracy, binding affinity, and sequence recovery, making it a powerful tool for therapeutic antibody development

[288]

50

2022

Antibody Binders Prediction and Antibody Generation

CNNs, Generative Adversarial Networks (GANs), Keras, TensorFlow, BLOSUM62 Encoding

Used DL to classify antibody binders and generate synthetic antibodies. CNNs extract features from encoded CDR3 sequences, while GANs create novel CDR3 sequences by learning binder patterns. High-throughput methods combined with deep sequencing enhance classification accuracy

Prediction accuracy: 91.2% (CTLA-4), 92.6% (PD-1). AUC values: 0.90 (CTLA-4), 0.94 (PD-1). Matthews correlation coefficient (MCC) used for robust classification assessment. CNNs trained on BLOSUM62-encoded sequences accurately classify binders

DL improves antibody analysis and optimization. CNNs classify PD-1 and CTLA-4 binders, identify key residues, and GANs generate synthetic antibodies, enhancing efficiency and accuracy in therapeutic antibody discovery

[223]

51

2022

Protein Structure and Sequence Generation

Diffusion Models, AdamW Optimizer, Gradient Accumulation, RosettaFixBB, RosettaRelBB

This study introduces a generative model for designing proteins with specific 3D structures and chemical properties using Equivariant Denoising Diffusion Probabilistic Models. It integrates sequence, structure, and constraints, outperforming energy-based methods in generating diverse, functional proteins

This model successfully generated biophysically realistic protein sequences, ensuring proper folding and maintaining hydrogen bonding patterns in helices and beta sheets. The generated structures closely matched natural proteins

Equivariant generative models improve protein engineering by learning structural and functional constraints, enabling large-scale, accurate sequence-structure predictions

[327]

52

2022

Protein Modelling, Cell Development

Predictor–Corrector Schemes, Denoising Score Matching (DSM), Conditional SGM and Schrödinger Bridges, Stereographic SGM

Riemannian Score-Based Generative Models (RSGMs) extend Score-Based Generative Models to curved manifolds, improving robotic path planning, geoscience modeling, and protein structure prediction by handling non-Euclidean data distributions effectively

RSGMs demonstrated superior path planning efficiency in robotics and significantly enhanced geoscience modeling accuracy. They provided accurate earthquake and climate event predictions while improving protein structure modeling

RSGMs improve generative modeling for curved data distributions, offering practical applications in robotics, geoscience, and computational biology

[328]

53

2023

Antibody paratope prediction

Graph Neural Network-based tool (Paragraph), Structure modeling via ABodyBuilder and ABlooper

Developed Paragraph, a paratope prediction tool trained on 1,086 antibody-antigen complexes from SAbDab. It is antigen-agnostic, employs simplified feature vectors, and supports vaccine development, antibody design, and high-throughput screening

Paragraph achieved a PR AUC of 0.696, rising to 0.763 for its most confident models. It is 50× faster than Parapred (0.1 s per prediction) and outperforms PECAN. Its simplified feature vector reduces computational complexity while leveraging structural data for improved accuracy

Paragraph is a structure-based tool for paratope prediction that surpasses existing methods by using simpler feature vectors and no antigen information, enhancing prediction accuracy and efficiency

[298]

54

2023

Protein Structures Design

Genie (Diffusion-based generative model), Denoising Diffusion Probabilistic Models (DDPMs), Equivariant neural networks, Triangular multiplicative update layers, Multidimensional scaling (MDS)

Developed Genie, an advanced diffusion-based model for protein structure generation, leveraging equivariant diffusion to learn 3D residue frame distributions, aiding protein engineering, therapeutics, and materials science

Genie generates highly designable, novel, and diverse protein structures, outperforming ProtDiff and RFdiffusion. Its visualization tools enhance structural analysis, and open-source availability promotes further research

Genie advances protein design by exploring novel fold spaces beyond known proteins, improving structural understanding, and enabling efficient generative modeling for cellular engineering and therapeutic applications

[307]

55

2023

Protein Structure and Sequence Design

Protpardelle (Diffusion-based generative model), Self-consistency metrics (scRMSD, scTM-score), MiniMPNN, ProteinMPNN, Clustering algorithms

Developed Protpardelle, an all-atom generative model that co-designs protein structure and sequence, capturing side-chain interactions for realistic configurations while ensuring functional integrity

Protpardelle generates high-quality and diverse all-atom protein structures, validated using self-consistency metrics, chemical quality evaluations, and secondary structure analysis. It avoids energy function relaxation to maintain accuracy

Protpardelle advances protein engineering by generating realistic protein structures and sequences, capturing natural features without relying on predefined backbones or rotamers

[300]

56

2023

Full-Atom Generation of Antibodies

AbDiffuser (Diffusion model with physics-informed priors), Graph Neural Networks (E(n) Equivariant GNNs, FA-GNN), Aligned Protein Mixer (APMixer), High-throughput screening, Sequence transformers (BERT)

Developed AbDiffuser, a physics-informed diffusion model for generating accurate and efficient full-atom antibody structures and sequences, optimizing therapeutic applications

AbDiffuser achieved high precision in antibody structure and sequence generation. In vitro testing on HER2 binders showed 57.1% strong binding rates, with top candidates performing comparably to Trastuzumab

AbDiffuser integrates diffusion models, physics-based priors, and memory-efficient architectures to advance computational antibody design, improving precision, structural accuracy, and therapeutic potential

[290]

57

2023

Antibody Optimization using AbGAN-LMG

AbGAN-LMG, AbGAN-ESM2-150M, AbGAN-BERT2DAb, AbGAN-AntiBERTy, IgFold, RGN2, DRN-1D2D_Inter, AbGAN-FEGS, AbGAN-No-Guided, ProteinGAN

AbGAN-LMG, a GAN-language model hybrid, optimizes antibody sequences, improving binding affinity and developability, demonstrating superior performance

Generated antibodies demonstrated higher affinity and better developability, with a 70% improvement over traditional optimization methods

AbGAN-LMG enhances antibody sequence design, making AI-assisted therapeutics more efficient and precise

[85]

58

2023

Estimation of Data Distribution Gradients with Langevin Dynamics

Annealed Langevin dynamics, Standard Langevin dynamics

Gradients of the data distribution are estimated and used with Langevin dynamics to generate samples, preventing distribution collapse and achieving state-of-the-art results on generative benchmarks

Inception Score of 8.87, FID of 25.32, surpassing GANs and conventional score-matching models

Langevin dynamics stabilizes generative modeling, improving sample diversity and generative efficiency

[329]

59

2023

Protein Engineering using UniRep

Average Linkage (Euclidean Distance), Average Linkage (Levenshtein Distance), Levenshtein Algorithm, mLSTM (multiplicative Long Short-Term Memory), Softmax Regression

UniRep integrates DL and Gaussian processes to improve protein engineering by learning statistical representations of unlabeled amino acid sequences

Outperforms Rosetta in stability prediction, achieving a Spearman’s correlation of 0.59 vs. 0.42

UniRep streamlines protein engineering, demonstrating state-of-the-art efficiency in protein stability predictions

[304]

60

2023

Ligand Binding Site Prediction

Fpocket, DeepSite, Kalasanty, DeepSurf, GAT, GCN, GCN2, SchNet, DimeNet++, EGNN

EquiPocket improves binding site prediction by using graph-based representations, preserving protein structures without voxelization distortion. It captures detailed surface geometry and integrates chemical and spatial structures, enhancing predictive accuracy

EquiPocket was benchmarked on multiple protein datasets, including scPDB (17,594 structures), COACH420 (2,123 proteins), HOLO4K (4,000 proteins), and PDBbind (3,104 proteins). Performance metrics like DCC (Distance of Closest Contact) and DCA (Distance-based Contact Area) demonstrated superior results compared to CNN-based methods

EquiPocket, an E(3)-equivariant Graph Neural Network, predicts protein binding sites with high accuracy and robustness. Its superior performance in benchmarks highlights its potential as a valuable tool for drug discovery

[249]

61

2023

Antibody Structure Prediction

AlphaFold-Multimer, AlphaFold2, EquiFold, ABodyBuilder, ABlooper, IgFold, RAdam, Cosine annealing scheduler

ImmuneBuilder predicts immune protein structures, including antibodies and nanobodies, using DL. It features specialized models like ABodyBuilder2, NanoBodyBuilder2, and TCRBuilder2

ABodyBuilder2 achieved an RMSD of 2.81 Å for CDR-H3 loops, outperforming AlphaFold2. The tool provides error estimates and a web server for accessibility

ImmuneBuilder is a highly efficient, accurate tool for antibody modeling, improving prediction speed and accuracy for biotherapeutic applications

[291]

62

2023

Antibody Structure Prediction

IgFold, AlphaFold, AlphaFold-Multimer, DeepAb, ABlooper, MMseqs2, and MODELLER

IgFold predicts antibody structures from sequences using graph networks and DL. It enhances hypervariable loop modeling and significantly expands the known structural antibody space

IgFold achieved a median RMSD of 1.95 Å for CDR-H3 loops and improved large-scale structural analysis by 500-fold

A rapid, AI-driven antibody structure prediction model that matches or outperforms AlphaFold, expanding antibody modeling capabilities

[24]

63

2023

Immunoglobulin Language Model, antibody sequence generation

ANARCI, Rosetta, CamSol-Intrinsic, BioPhi, IgLM (Infilling by Language Modeling), GPT-2 Transformer, and ProGen2

This study addresses antibody developability issues by generating full-length antibody sequences with enhanced stability and reduced aggregation. IgLM is a transformer-based model trained on 558 million sequences, optimizing antibody discovery by controlling sequence modifications while maintaining structural integrity

IgLM enables targeted sequence modifications while preserving antibody integrity. It achieved high developability scores, improving CDR loop libraries and therapeutic relevance. Comparative analyses confirmed its superior stability over ProGen2-OAS

IgLM, trained on 558 million sequences, significantly enhances antibody optimization by addressing solubility, aggregation, and immunogenicity challenges, streamlining therapeutic development

[215]

64

2023

Protein Backbone Modelling in 3D

SMCDiff, DDPM (Denoising Diffusion Probabilistic Model), Particle Filtering Algorithm

SMCDiff enables conditional motif-scaffolding, generating longer, diverse scaffolds (up to 80 residues) while maintaining AlphaFold2 accuracy. It overcomes previous limitations of short scaffolds and inefficient generation, advancing vaccine and enzyme design

SMCDiff generates scaffolds up to 80 residues long, significantly extending previous limits (20 residues). Empirical results confirmed its efficiency and structural diversity, making it a valuable tool for protein engineering

SMCDiff efficiently samples scaffold structures, ensuring design flexibility and accuracy, making it a significant advancement for motif-scaffolding applications

[293]

65

2024

Antibody Design

ZeRoShot, generative AI models, grid search, Principal Component Analysis (PCA), support vector machine (SVM)

A generative AI model was used to design HER2-targeting antibodies de novo, eliminating the need for optimization. It generated epitope-specific binders, reducing reliance on traditional screening or immunization

High-throughput screening identified potential binders, with SPR analysis confirming low nanomolar affinity. 71 antibodies showed high binding efficiency, with 11 matching trastuzumab potency

The AI-driven model significantly accelerates therapeutic antibody discovery, achieving high binding success and eliminating the need for extensive optimization

[142]

66

2024

CDR Design

Equivariant Graph Neural Networks, Data Augmentation Strategy, LSTM-based Deep Generative Models

The AbFlex model enhances CDR design using an equivariant graph neural network (EGNN) and CDR augmentation strategies. It improves generalization across numbering schemes, refining antigen binding

Benchmarking results showed an RMSD of 1.568 Å and an AAR of 37.54%, with 80% of designed antibodies exhibiting improved binding energies over wild types

AbFlex optimizes CDR sequences with structural accuracy, improving binding affinity and robustness across different antibody designs

[289]

67

2024

Foundation model, natural and chemical language

T5 Model, Multi-task Learning, Fine-tuning Techniques, NeMo Toolkit, Data Mixing Strategy

This study introduces nach0, an LLM designed for biomedical problem-solving, molecular recognition, synthesis, and chemical property prediction. Nach0 integrates linguistic and chemical knowledge, excelling in biomedical question answering, molecular generation, synthesis planning, and reaction prediction. Trained using the NeMo framework, it offers robust multi-task performance

Nach0 outperformed SciFive and FLAN in molecular tasks, achieving an F1 score of 82.24%. It excelled in multitasking and generated viable JAK3 inhibitors in 45 min compared to Chemistry42’s 72-h runtime

Nach0, a multi-domain LLM, outperforms leading models in scientific literature and molecular task generation, proving effective in biomedical and chemical research applications

[309]

68

2024

Protein structure generation

RFdiffusion, P-SEA, Autoregressive (AR) model, Randomized sampling approach, 31-dimensional Gauss integrals

This study developed FoldingDiff, a diffusion-based model for generating diverse, foldable protein structures. It iteratively refines random conformations into stable structures using angle-based representation, ensuring accurate geometric properties. The model closely matches experimental structures and enables biologically plausible protein design

Of 780 protein backbones generated by FoldingDiff, 177 were viable, achieving a self-consistency TM score ≥ 0.5. Clustering analysis confirmed structural diversity, while Ramachandran plots validated correct folding properties

FoldingDiff simulates natural protein folding, producing viable and diverse structures with validated experimental consistency. It advances computational protein design and structural biology research

[330]

69

2024

Antibody design

IgDiff (Diffusion-based generative model), Deep probabilistic models, Structural consistency validation

Developed IgDiff, a diffusion-based generative framework for designing novel, highly designable antibody structures, particularly optimizing CDRs for enhanced binding. Unlike slow physics-based methods, IgDiff leverages large datasets for rapid and efficient predictions

IgDiff-designed antibodies were experimentally validated, showing high expression yields. It maintained realistic backbone dihedral angles and achieved 93.3% structural self-consistency in light chains. It outperformed RFdiffusion in designing CDR loops and pairing light/heavy chains, demonstrating superior designability and functionality

IgDiff significantly advances computational antibody design by generating highly designable, experimentally viable antibodies with superior CDR loop modeling and structural accuracy, making it a powerful tool for therapeutic development

[294]