- Review
- Open access
Revolutionizing oncology: the role of Artificial Intelligence (AI) as an antibody design, and optimization tools
Biomarker Research volume 13, Article number: 52 (2025)
Abstract
Antibodies play a crucial role in defending the human body against diseases, including life-threatening conditions like cancer. They mediate immune responses against foreign antigens and, in some cases, self-antigens. Over time, antibody-based technologies have evolved from monoclonal antibodies (mAbs) to chimeric antigen receptor T cells (CAR-T cells), significantly impacting biotechnology, diagnostics, and therapeutics. Although these advancements have enhanced therapeutic interventions, the integration of artificial intelligence (AI) is revolutionizing antibody design and optimization. This review explores recent AI advancements, including large language models (LLMs), diffusion models, and generative AI-based applications, which have transformed antibody discovery by accelerating de novo generation, enhancing immune response precision, and optimizing therapeutic efficacy. Through advanced data analysis, AI enables the prediction and design of antibody sequences, 3D structures, complementarity-determining regions (CDRs), paratopes, epitopes, and antigen–antibody interactions. These AI-powered innovations address longstanding challenges in antibody development, significantly improving speed, specificity, and accuracy in therapeutic design. By integrating computational advancements with biomedical applications, AI is driving next-generation cancer therapies, transforming precision medicine, and enhancing patient outcomes.
Introduction
Antibody-based therapies have revolutionized oncology, with monoclonal antibodies (mAbs) becoming essential tools for targeted cancer treatment since their development in the late twentieth century [1]. These therapies selectively target antigens on malignant cells, minimizing damage to healthy tissues and improving treatment outcomes [1, 2]. However, tumor biology presents significant challenges due to cellular diversity, genetic mutations, and adaptive resistance mechanisms driven by both genetic and non-genetic factors [3, 4]. These complexities hinder the development of highly effective antibody-based treatments.
Genetic alterations in oncogenes and tumor suppressor genes drive malignancy, leading to rapid proliferation and resistance to apoptosis. Meanwhile, non-genetic factors, such as modifications in the tumor microenvironment and metabolic shifts, allow cancer cells to evade immune surveillance and adapt to therapy [3, 4]. Resistance mechanisms further complicate treatment; for example, hypoxia can reduce radiation effectiveness, while PI3K/AKT pathway mutations contribute to therapeutic resistance [3]. While targeted therapies, such as HER2 and VEGF inhibitors, have improved clinical outcomes, challenges like antigen heterogeneity, immune evasion, and immunosuppressive tumor microenvironments continue to limit their efficacy [3,4,5,6,7,8].
To address these challenges, researchers have developed advanced monoclonal antibody-based therapies, including bispecific antibodies (bsAbs), antibody–drug conjugates (ADCs), immune checkpoint inhibitors, chimeric antigen receptor (CAR)-T cells, and CAR-NK cells [9,10,11]. Despite their promise, these therapies face limitations such as off-target effects, drug resistance, stability issues, and immunogenicity. Additionally, tumor complexities such as inadequate chemokine trafficking, T-cell suppression, metabolic dysregulation, and high mutational burdens reduce the effectiveness of these treatments [12,13,14,15,16]. Immunosuppressive factors, including regulatory T-cells and myeloid-derived suppressor cells, further hinder immunotherapy, while resistance mechanisms, such as HER2-targeted drug resistance (HTDR), pose additional challenges [14, 17, 18]. Genetic factors, such as RAS mutations, can render therapies ineffective, whereas HER2 overexpression in breast cancer can influence treatment responses [14]. Thus, overcoming cellular complexity, genetic variability, and immune evasion remains critical for advancing antibody-based cancer therapies [14].
Early computational methods for antibody design were constrained by limited structural data and computational power, which hindered the development of reliable models for antibody-antigen interactions [19]. This led to inaccuracies in binding predictions, restricting their utility in guiding experimental design and necessitating extensive in vitro validation. As a result, researchers relied heavily on time-intensive and resource-intensive experimental approaches [19, 20]. However, recent advancements in high-throughput sequencing, the availability of structural data, and improved computational techniques have enabled more precise predictions of antibody-antigen structures and interactions [19, 21,22,23,24,25,26].
Artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), has transformed antibody design by improving the prediction of antibody-antigen structures, interactions, structural dynamics, and molecular stability [25, 26]. Leveraging vast structural databases like the Protein Data Bank (PDB) and advanced tools such as AlphaFold, AI-driven approaches enhance in-silico antibody design with exceptional efficiency and accuracy [27, 28]. With increased computational power and cloud-based platforms, these advancements enable rapid simulations that complement traditional experimental methods, improving antibody affinity, specificity, and therapeutic potential, particularly in cancer treatment [13, 29,30,31,32,33]. AI models analyze complex datasets to predict antibody sequences, 3D structures, complementarity-determining regions (CDRs), paratopes, epitopes, and antigen–antibody interactions with remarkable accuracy. These innovations streamline antibody design, optimization, and testing, reducing time and cost while addressing challenges related to developability and stability [26, 34,35,36,37,38].
This review explores AI’s transformative role in antibody design and optimization. We highlight advancements in CDR development, structural stability, folding efficiency, and CDR H3 conformation prediction—key factors in optimizing epitope-paratope interactions and enhancing therapeutic efficacy.
Evolution of antibody-based therapeutics
Antibody-based therapies offer high specificity in cancer treatment by binding to antigens on cancer cells, triggering immune responses and inhibiting tumor growth with greater precision than traditional treatments [39, 40]. Advances in humanized antibody engineering, tumor biology, and antibody conjugation and selection techniques have further enhanced their efficacy [41]. These developments are closely tied to antibody structure, which consists of two light and two heavy chains forming a Y-shaped molecule. The Fab regions mediate antigen binding, while the Fc region governs effector functions (Fig. 1) [42]. Structural modifications such as single-chain variable fragments (scFvs) and bispecific antibodies (bsAbs) enable targeting of multiple antigens, further improving therapeutic effectiveness [43]. Notably, an antibody's specific binding capability depends on its six hypervariable regions: CDR-H1, H2, and H3 (heavy chain) and CDR-L1, L2, and L3 (light chain) [44, 45].
Overview of antibody structure. (a) The antibody has a Y-shaped structure, comprising two identical light chains (L) and two identical heavy chains (H). These chains give rise to two Fab segments and one Fc segment, comprising a total of 12 domains. The Fab segments are where antigen recognition and binding occur, while the Fc region is primarily responsible for effector functions. (b) Surface representation of the antibody (PDB code: 1IGT). (c) Each Fab's antigen-binding site consists of six hypervariable loops: CDR-L1, CDR-L2, CDR-L3, CDR-H1, CDR-H2, and CDR-H3. These loops are situated on the variable domain pair VL:VH of each Fab, which is also referred to as the variable fragment (Fv)
Currently, over 100 mAb drugs are approved, with many more in clinical trials or at the patent-filing stage [34, 46]. A total of 162 antibody therapies have been approved globally, including 122 in the US, 82 in Japan, 114 in Europe, and 73 in China [47]. Revenue from antibody therapies is projected to reach $479 billion by 2028 (www.marketsandmarkets.com).
First approved in the 1980s, mAbs paved the way for advanced therapies, including bsAbs, ADCs, CAR T-cell and CAR NK-cell therapies, and immune checkpoint inhibitors. Prominent examples of mAbs include rituximab (Rituxan) for non-Hodgkin lymphoma and chronic lymphocytic leukemia, targeting CD20 on B cells [48]; trastuzumab (Herceptin) for HER2-positive breast cancer [49]; cetuximab (Erbitux) for colorectal, head, and neck cancers, targeting EGFR [49]; bevacizumab (Avastin) for various cancers by blocking VEGF and inhibiting tumor angiogenesis [49]; and ipilimumab (Yervoy) for melanoma, targeting CTLA-4 to enhance immune response [50]. Combining these mAbs with chemotherapy has shown improved outcomes [40, 51].
ADCs, also known as "biological missiles," are a breakthrough in targeted cancer therapy, combining the specificity of mAbs with potent cytotoxic drugs [51, 52]. ADCs consist of a tumor-specific mAb, a cytotoxic drug, and a linker, ensuring stability in circulation and targeted drug release within cancer cells (Fig. 2) [51]. This structure enhances precision, sparing healthy tissues and offering higher clinical response rates than unconjugated mAbs targeting the same surface antigens [53]. Advances in linker technology and potent cytotoxic payloads have further improved ADC efficacy and safety [54]. Prominent ADCs include brentuximab vedotin, effective for Hodgkin's lymphoma and systemic anaplastic large cell lymphoma [53], and ado-trastuzumab emtansine, approved by the FDA for HER2-positive breast cancer [55].
bsAbs, such as blinatumomab (for acute lymphoblastic leukemia) and catumaxomab (for malignant ascites), have revolutionized cancer therapy by targeting two antigens simultaneously, offering increased tumor selectivity and potential for improved payload delivery (Fig. 3) [56]. Amivantamab, which targets EGFR and MET (mesenchymal-epithelial transition factor), has also shown promising results in treating NSCLC with EGFR exon 20 insertion mutations [57]. These therapies can target multiple surface receptors or ligands linked to cancer, growth, or inflammation [43]. They can also counter cancer cell escape mechanisms by blocking multiple pathways [58]. Furthermore, bsAbs have shown promise in treating both blood cancers and solid tumors [56], but more research is required to optimize their use and minimize toxicity.
The figure shows various common bispecific antibody (bsAb) formats. The Fc-modified IgG format leverages KIH technology to facilitate the heterodimerization of two distinct heavy chains. To enhance the pairing of homologous heavy and light chains, DuetMab introduces an alternative disulfide bond, replacing the natural bond at one CH1-CL interface. The Duobody format includes specific Fc region mutations, significantly reducing Fc-mediated cytotoxicity. Appended IgG structures integrate an IgG with a single-chain variable fragment (scFv), either through light chain (LC) or heavy chain (HC) connections. Constructs such as scFv-Fc and Fab-scFv-Fc also rely on the KIH method for their assembly. The DART-Fc structure incorporates two distinct antigen-binding domains, stabilized into a diabody-like structure. TriFabs are IgG-derived bsAbs with two standard Fab arms linked to a third Fab-sized unit via flexible peptide linkers. CrossMab, on the other hand, achieves connectivity using domain crossovers involving a shared light chain. Tandem scFv (taFv) represents the most compact bsAb design, closely related to Triplebody constructs. The diabody (db) format employs a short linker to join VH and VL domains of an scFv, forming a noncovalent heterodimer. Dual-Affinity Re-Targeting (DART) molecules pair two Fv segments to generate distinct antigen-binding regions. Tandem single-domain antibodies (dAb/VHH) are derived from the binding regions of heavy-chain-only antibodies. Lastly, the Fab-scFv "bibody" format links an scFv to the Fab's C-terminus, while the Fab-scFv "tribody" format adds a second scFv segment for enhanced functionality
CAR T-cell therapy (Fig. 4) has transformed cancer treatment, particularly in blood cancers, with FDA-approved treatments like tisagenlecleucel for acute lymphoblastic leukemia, axicabtagene ciloleucel for non-Hodgkin lymphoma, and brexucabtagene autoleucel for mantle cell lymphoma [59,60,61]. Other therapies, including idecabtagene vicleucel, lisocabtagene maraleucel, and ciltacabtagene autoleucel, are also used for specific adult cancers [62,63,64,65]. Idecabtagene vicleucel treats relapsed multiple myeloma [63], lisocabtagene maraleucel targets relapsed large B-cell lymphoma [64], and ciltacabtagene autoleucel is used for relapsed/refractory multiple myeloma [65]. This personalized immunotherapy engineers a patient's T cells to attack cancer cells and has shown significant success in leukemia and lymphoma, leading to FDA approval [66]. However, challenges like high costs and severe side effects remain [67]. Further research is needed to extend CAR T-cell therapy to solid tumors and enhance its safety and effectiveness [66, 67].
Schematic representation of the evolution of the CAR structure from the first generation to the fifth. The first-generation CAR contains only a CD3ζ signaling domain and no co-stimulatory molecules (CMs). The second-generation (2G) CAR adds one CM to CD3ζ, enabling dual signaling. The third-generation CAR combines CD3ζ with multiple CMs to enhance signaling. The fourth-generation CAR builds on the 2G CAR and adds an NFAT-responsive cassette that triggers cytokine expression, delivering triple signaling through CD3ζ, the CM, and transgenic proteins. The fifth-generation CAR also builds on the 2G CAR and integrates an IL-2Rβ receptor domain, which activates JAK-STAT signaling for synergistic activation of CD3ζ, CMs, and the JAK-STAT3/5 pathway
CAR-NK cell therapy shows promise in cancer treatment by harnessing natural killer cells' innate immunity [68, 69]. It offers benefits such as reduced risks of cytokine release syndrome, neurotoxicity, and graft-versus-host disease [69]. Early trials, including a Phase I/II study targeting CD19 in B-cell lymphomas and leukemia, have yielded encouraging results [69]. CAR-NK cells also target solid tumor antigens such as HER2, EGFR, and mesothelin, showing effective tumor infiltration and anti-tumor responses [70]. Challenges include ensuring long-term persistence and overcoming the immunosuppressive tumor microenvironment [71]. Despite these hurdles, CAR-NK cell therapy has significant potential as an innovative cancer treatment.
Immune checkpoint inhibitors targeting CTLA-4 and the PD-1/PD-L1 axis have revolutionized cancer therapy by enhancing the immune system's capacity to combat tumors. Inhibitors such as ipilimumab, pembrolizumab, nivolumab, atezolizumab, and durvalumab have achieved significant success in treating melanoma and lung, kidney, and bladder cancers, improving patient survival rates [72,73,74,75]. These inhibitors work by blocking proteins that help cancer cells evade detection, thereby empowering the immune system to target and destroy them. However, not all patients respond, which underscores the need for predictive biomarkers [74]. Although effective, these drugs can cause side effects such as fatigue, rashes, and fever and, in rare cases, serious issues such as cardiotoxicity and inflammation [72].
The choice between these innovative therapies requires a nuanced evaluation of their benefits and limitations, with decisions tailored to each patient's unique clinical needs. Whether through the precision of mAbs and ADCs, the novel mechanisms of bsAbs and CAR-T therapy, or the broad efficacy of immune checkpoint inhibitors, cancer treatment is advancing toward more personalized, effective, and less toxic options.
While these advancements have revolutionized treatment, the development of novel antibodies remains a complex challenge. Conventional antibody design faces significant obstacles, including the vast combinatorial search space of CDR sequences, off-target effects, low binding affinity, stability issues, and developability limitations such as poor expression, solubility, and aggregation [76,77,78,79]. Experimental discovery methods like phage display and rational design, though valuable, are time-intensive, labor-intensive, and impractical for exploring the immense diversity of potential antibody sequences [35]. Biophysical energy-based computational approaches enhance efficiency but remain computationally expensive and susceptible to local optima, limiting their ability to comprehensively explore sequence space [80,81,82,83].
AI, particularly ML and DL, is transforming antibody design by accelerating discovery and optimization while addressing key limitations of traditional methods [35,36,37, 84]. Generative models, such as generative adversarial networks (GANs) and variational autoencoders (VAEs), are now capable of generating diverse CDR sequences while improving binding affinity and stability properties [22, 85]. Reinforcement learning (RL) further optimizes antibody sequences by iteratively refining affinity and developability through feedback-driven improvements [35, 36, 86, 87]. Additionally, DL models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) integrate sequence and structural data to predict how CDR modifications influence binding and stability, enabling more informed design decisions [88]. AI-powered simulations of antibody-antigen interactions further enhance target specificity and reduce off-target effects, addressing concerns related to immunogenicity [35, 36]. Furthermore, AI-driven high-throughput screening is expediting candidate selection, significantly reducing the time and cost associated with experimental validation [35, 36]. By efficiently navigating vast sequence spaces, optimizing antibody structures, and streamlining the development process, AI is playing an increasingly critical role in therapeutic antibody engineering. These innovations have the potential to improve the safety, efficacy, and manufacturability of antibody-based treatments. In the following section, we examine the foundational contributions of early ML approaches to antibody design and optimization, setting the stage for more advanced AI-driven innovations.
Early ML approaches in developability of antibody design and optimization
Antibodies are vital therapeutics, but factors such as solubility, stability, and aggregation, along with developability attributes like viscosity, immunogenicity, and expression yield, play a critical role in ensuring their safety, scalability, and efficacy [9, 89]. Therefore, early-stage assessments using computational methods are essential for optimizing antibody candidates, minimizing risks, and ensuring efficient development [9, 89]. Various algorithms, including Support Vector Machines (SVMs), XGBoost, Random Forest, Gradient Boosting Machine (GBM), and k-nearest neighbors (k-NN), as well as DL models such as RNNs, CNNs, and Feedforward Neural Networks (FNNs), are widely used for predicting physicochemical and developability properties. Additionally, specialized methods like Repeated Edited Nearest Neighbor (RENN) and Instance Hardness Threshold (IHT) contribute to epitope prediction, structure modeling, and antigen–antibody interaction analysis. These approaches are discussed in the following sections.
Developability
Developability is a key factor in advancing antibody candidates into therapeutic products [9, 90]. It involves evaluating biophysical, biochemical, pharmacological, and manufacturability attributes to predict stability, efficacy, and safety. Biophysical properties such as stability, solubility, and aggregation propensity influence formulation stability and storage conditions, ensuring long-term viability [11, 90]. Solubility is particularly important, as poor solubility can hinder formulation and delivery. Biochemical factors, including Fc receptor interactions and post-translational modifications (e.g., glycosylation), directly impact function and pharmacokinetics, affecting antibody performance in vivo [9]. Pharmacological considerations, such as efficacy, safety, and off-target effects, determine therapeutic potential and risk profiles [10, 90]. Additionally, manufacturability aspects, including yield, purity, scalability, and cost-effectiveness, are essential for large-scale production and commercial viability [11].
An important aspect of developability is its relationship to immunogenicity, which influences therapeutic success. While developability focuses on manufacturability, stability, and function, immunogenicity assesses the likelihood of an antibody triggering an immune response. Although not always classified within developability, immunogenicity remains a crucial consideration, as highly immunogenic antibodies may require further optimization. To address these factors, high-throughput screening assays and computational tools play a pivotal role in modern antibody development. Screening assays rapidly evaluate thousands of variants for key properties like binding affinity, stability, and solubility, facilitating the selection of promising candidates [10, 11]. Meanwhile, computational tools leveraging ML and structural modeling predict risks such as aggregation and immunogenicity, allowing for early intervention [11]. Integrating these approaches streamlines development, minimizes failure risks, and enhances therapeutic antibody design efficiency [9, 10].
ML methods such as SVMs and Multilayer Perceptrons have demonstrated high accuracy in early-stage antibody screening and developability optimization, as shown in studies using datasets of up to 2,400 antibodies [91]. Tools such as XGBoost and PyCaret have been used to model IgG developability from biophysical properties, leveraging 250,000 models to predict hydrophobic patch energies and CDR charges [90]. A Random Forest model trained on 64 monoclonal antibodies (mAbs) identified correlations between biophysical properties and pharmacokinetics. The study found that poly-specificity and isoelectric point (pI) contributed to slower clearance rates, whereas hydrophobicity and extreme charges were linked to faster clearance. This approach supports early pharmacokinetic profiling, aiding in antibody design and reducing the risk of clinical trial failures [92]. Additionally, structure-based approaches predict critical properties such as pI, viscosity, and clearance, emphasizing the relationships between charge and stability, as well as surface charge and viscosity, to optimize high-concentration antibody formulations [92, 93]. Hu-mAb, an ML-based tool for antibody humanization, evaluates "humanness" scores and recommends mutations to minimize immunogenicity while maintaining functionality; it was validated using a dataset of 481 antibodies [94]. BioPhi, an open-source DL platform, integrates Sapiens for humanization and OASis for humanness scoring. Tested on 177 antibodies, it demonstrated expert-level accuracy in distinguishing human from non-human sequences. With both web and command-line access, BioPhi simplifies antibody development [95]. These innovations harness natural antibody diversity and ML to develop safer, more effective therapies.
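To make the kind of sequence-derived input used by such developability models more concrete, the following minimal Python sketch computes two simple descriptors for a CDR sequence: a crude net charge and the mean Kyte-Doolittle hydropathy. It is an illustration only and is not the feature pipeline of any study cited above; the function names and the example CDR-H3 sequence are hypothetical, and the charge model is a deliberate simplification that ignores histidine, termini, and pH.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(seq: str) -> int:
    """Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu (assumption: His ignored)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle hydropathy over all residues of the sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def cdr_features(cdrs: dict) -> dict:
    """Map each named CDR sequence to its two descriptors."""
    return {
        name: {"net_charge": net_charge(seq), "mean_hydropathy": mean_hydropathy(seq)}
        for name, seq in cdrs.items()
    }


if __name__ == "__main__":
    # Hypothetical CDR-H3 sequence, used only to exercise the functions.
    print(cdr_features({"CDR-H3": "ARDYYGSSYFDY"}))
```

Descriptors of this kind would typically be one small part of a feature table passed to a model such as Random Forest or XGBoost; structure-based quantities like hydrophobic patch energies require a 3D model and are outside the scope of this sketch.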
Solubility
Solubility is a crucial factor in antibody design, directly influencing formulation, manufacturability, stability, and pharmacokinetics. High solubility is essential to prevent aggregation, ensure consistent dosing, and facilitate large-scale production. Poor solubility complicates purification, storage, and scalability, increasing costs and raising immunogenicity risks. Additionally, solubility affects pharmacokinetics by influencing absorption, distribution, and bioavailability [89, 96]. Key factors affecting solubility include amino acid composition, surface properties, post-translational modifications, and environmental conditions. Hydrophobic residues promote aggregation, whereas charged residues enhance solubility. In addition to sequence composition, glycosylation stabilizes protein folding, while formulation factors such as pH and excipients help maintain solubility [97, 98]. Several approaches improve solubility, including sequence optimization to reduce hydrophobicity, glycoengineering, formulation adjustments with stabilizing excipients, structural modifications, and high-throughput screening to identify candidates with superior solubility profiles [98, 99].
To complement these experimental approaches, AI-driven predictive models have been developed to assess and enhance antibody solubility early in the design process. For instance, SOLpro is a sequence-based tool for predicting protein solubility during overexpression, utilizing 23 feature groups and a two-stage SVM model trained on over 17,000 proteins. It achieves 74% accuracy in tenfold cross-validation, outperforming models like PROSO in detecting insoluble proteins, and supports experimental planning and mutation design for improved solubility in protein engineering [100]. Similarly, CamSol and FoldX predict antibody solubility and stability through phylogenetic filtering. Validated on six antibodies, including nanobodies and scFv fragments, these models improved solubility and stability across 42 designs without compromising binding functionality [89, 96, 101].
Further advancements include PaRSnIP, which employs Gradient Boosting Machine algorithms to predict solubility with 74.11% accuracy, outperforming SOLpro and PROSO II by 9%. By integrating sequence features such as peptide frequencies with structural attributes, it identifies solubility determinants, including exposed residue thresholds and tripeptides like IHH [102]. SOLart, a Random Forest-based model, predicts protein solubility while minimizing aggregation risks, achieving a Pearson correlation of ~0.7 with experimental data [99]. Additionally, solPredict specializes in antibody solubility assessments for high-concentration monoclonal antibody (mAb) formulations. Unlike conventional models, it utilizes ESM1b embeddings without 3D modeling, demonstrating strong correlations with experimental solubility data across 260 antibodies. This tool enhances early-stage candidate selection by identifying solubility risks before experimental validation [103].
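Several of the tools above are benchmarked by the Pearson correlation between predicted and measured solubility. As a reminder of what that statistic measures, the sketch below computes it from two equal-length lists in plain Python. The function name is our own and the numbers in the usage example are synthetic placeholders, not data from any cited study.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_m = math.sqrt(sum((m - mean_m) ** 2 for m in measured))
    if sd_p == 0 or sd_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_p * sd_m)


if __name__ == "__main__":
    # Synthetic placeholder values for illustration only.
    predicted = [0.2, 0.5, 0.4, 0.9, 0.7]
    measured = [0.1, 0.6, 0.3, 0.8, 0.9]
    print(f"Pearson r = {pearson_r(predicted, measured):.2f}")
```

A value near 1 indicates that predictions rank and scale with the measurements, while a value near 0.7 still leaves roughly half of the variance unexplained (r squared is about 0.49).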
Aggregation and viscosity
Aggregation and viscosity are key biophysical properties in antibody design, influencing developability, manufacturability, and therapeutic efficacy [9, 104]. Aggregation occurs when antibodies self-associate due to hydrophobic or electrostatic interactions, environmental stressors, or post-translational modifications. This leads to reduced efficacy, increased immunogenicity, and manufacturing challenges [9, 11, 104, 105]. Additionally, aggregation can increase viscosity, further complicating formulation and administration [9, 10, 106]. Strategies to minimize aggregation include sequence optimization to eliminate aggregation-prone motifs, formulation adjustments with stabilizing excipients, glycoengineering to enhance stability, and high-throughput screening for low-aggregation variants [9, 11, 104, 106].
Viscosity, a critical factor in high-concentration antibody formulations, affects subcutaneous injection, manufacturing, and product stability [9, 104, 106]. It arises from charge-charge or hydrophobic interactions [9]. Mitigation strategies include charge engineering to optimize charge distribution, sequence and structural modifications to reduce intermolecular interactions, and excipients to lower viscosity [9, 11]. Optimizing both aggregation and viscosity enhances bioavailability, ease of administration, and regulatory compliance, ultimately improving clinical success [9, 11, 107]. To facilitate this process, computational tools have been developed to predict and optimize these properties in antibody formulations.
Aggrescan3D (A3D) 2.0 predicts and enhances protein solubility by identifying aggregation-prone regions. It integrates CABS-flex for flexibility simulations, an automated mutation tool for reducing aggregation, and FoldX for stability assessments to maintain structural integrity. This approach improves solubility without compromising functionality and has been validated across various proteins [108]. In therapeutic mAbs, solution viscosity is critical for high-concentration formulations. Studies have identified net charge and Fv amino acid regions as key factors, with hydrophilic profiles linked to high viscosity. The High Viscosity Index (HVI) has been introduced as a rapid screening tool for identifying high-viscosity candidates. ML models, including logistic regression and decision trees, were validated on 27 FDA-approved mAbs, demonstrating high accuracy in viscosity classification [109]. Additionally, a k-nearest neighbors (k-NN) model showed strong correlations (r = 0.89) for predicting aggregation rates and viscosity in high-concentration formulations, using features such as CDRH2 positive charge and hydrophobic surface area [110]. These computational models enable efficient screening and support the development of stable, high-concentration formulations for clinical applications.
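The k-nearest neighbors idea mentioned above can be sketched in a few lines: a new antibody's property is estimated as the average of the values measured for its k most similar antibodies in feature space. The example below is a generic k-NN regressor in plain Python, not the published model; the two feature columns, their values, and the viscosity-like targets are synthetic placeholders, and a real application would scale the features and choose k by cross-validation.

```python
import math


def knn_predict(train_features, train_targets, query, k=3):
    """Predict a target as the mean over the k training points nearest to `query`."""
    if k < 1 or k > len(train_features):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training point, paired with its target.
    by_distance = sorted(
        (math.dist(features, query), target)
        for features, target in zip(train_features, train_targets)
    )
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / k


if __name__ == "__main__":
    # Synthetic toy data: (feature_1, feature_2) per antibody, arbitrary units.
    features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    targets = [10.0, 12.0, 14.0, 40.0]
    print(knn_predict(features, targets, query=(0.1, 0.1), k=3))
```

Because the prediction is a local average, k-NN cannot extrapolate beyond the range of the training targets, which is one reason such models are best suited to screening candidates that resemble previously characterized antibodies.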
Epitope prediction
Epitope prediction is a crucial aspect of antibody design. It helps identify antigen regions targeted by antibodies for vaccine development, therapeutics, and diagnostics [111]. It plays a pivotal role in designing antibodies with high specificity and minimal off-target effects. In vaccine development, it helps identify immunogenic regions that elicit strong immune responses [111, 112].
Epitopes are classified as linear, consisting of contiguous amino acid sequences, or conformational, formed by non-contiguous residues in the antigen’s 3D structure [113]. Recent advancements have shifted from antibody-agnostic to antibody-aware approaches, incorporating structural and physicochemical features to improve prediction accuracy [112, 113]. Additionally, targeting functional or conserved epitopes enhances therapeutic efficacy, while epitope-specific antibodies improve diagnostic precision [112].
Accurate epitope identification relies on a combination of experimental and computational techniques, each offering distinct advantages in antibody and vaccine development [111, 114, 115]. Experimental methods, including X-ray crystallography, NMR spectroscopy, peptide ELISAs, and mass spectrometry, provide direct evidence of antibody binding [111]. Meanwhile, computational approaches such as ML, molecular docking, and bioinformatics tools offer scalable and cost-effective predictions [111]. Peptide scanning and alanine scanning mutagenesis further refine linear epitope identification by mapping critical binding sites with high precision [116, 117].
Computational epitope prediction tools fall into three main categories. Sequence-based methods, such as BepiPred [118] and ABCpred [119], focus on identifying linear epitopes by analyzing contiguous amino acid sequences. Structure-based approaches, including EpiPred [120] and EpiMap [117], predict conformational epitopes by considering the three-dimensional structure of the antigen. Additionally, hybrid models like DiscoTope [121] and ElliPro [122] integrate both sequence and structural data to enhance predictive accuracy, making them more effective in identifying epitopes with complex binding patterns.
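As a simple illustration of the sequence-based idea, the sketch below scores every window of an antigen sequence with the Hopp-Woods hydrophilicity scale and reports the highest-scoring window as a candidate linear epitope. This is a classical propensity-scale approach shown only for intuition; it is not the algorithm used by BepiPred or ABCpred, and the function names, window length, and example sequence are our own assumptions.

```python
# Hopp-Woods hydrophilicity scale (positive = hydrophilic, more likely surface-exposed).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3,
    "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4,
    "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3, "V": -1.5,
    "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def window_scores(seq, window=7):
    """Mean hydrophilicity of each contiguous window; unknown residues score 0."""
    if window < 1 or window > len(seq):
        return []
    return [
        sum(HOPP_WOODS.get(aa, 0.0) for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def top_window(seq, window=7):
    """Return (start index, peptide, score) of the most hydrophilic window."""
    scores = window_scores(seq, window)
    if not scores:
        raise ValueError("sequence is shorter than the window")
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, seq[best:best + window], scores[best]


if __name__ == "__main__":
    # Hypothetical toy sequence: a charged stretch flanked by leucines.
    print(top_window("LLLLKDEKRLLLL", window=5))
```

Such scales capture only surface exposure propensity. They cannot detect conformational epitopes, which is why the structure-based and hybrid tools listed above use 3D information.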
Advanced ML-based tools, including DeepAb [88] and EpitopeVec [123], leverage large datasets to enhance predictive accuracy. Despite these advancements, epitope prediction remains challenging due to factors such as antigen conformational flexibility, limited training data, cross-reactivity, and epitope accessibility [111, 113]. Overcoming these limitations requires integrating experimental validation, refining computational algorithms, and expanding diverse training datasets to improve accuracy and reliability.
The Immune Epitope Database Analysis Resource (IEDB-AR) offers tools for B and T cell epitope prediction, such as TepiTool (for T cell epitope prediction), MHC-NP (for MHC ligand identification), and CD4EpiScore (for evaluating CD4 T cell reactivity) [124]. Several ML approaches significantly enhance epitope prediction, reducing the time and cost of traditional experimental methods [113]. These methods can predict both B-cell and T-cell epitopes, making them crucial for vaccine design and therapeutic antibody development [113]. Linear epitopes, consisting of contiguous amino acids, are predicted using sequence-based models like SVMs and RNNs, while conformational epitopes, formed by spatially close but non-contiguous residues, require models that integrate structural and sequence data to predict 3D configurations [113]. These methods streamline epitope-based peptide vaccine (EBPV) design by efficiently identifying immune-stimulating epitopes [113]. SVMs predict linear epitopes using sequence-based properties, while neural networks like RNNs, CNNs, and FNNs excel in both linear and conformational epitope prediction, with CNNs effectively capturing spatial relationships. Ensemble methods, such as random forests and gradient boosting, enhance accuracy for heterogeneous epitope datasets. Feature engineering further improves ML models by incorporating amino acid properties, hydrophobicity, charge, PCA, and evolutionary alignments to boost performance and precision [113]. ABCpred, an RNN-based model trained on 700 B-cell epitopes, predicts continuous B-cell epitopes with 65.93% accuracy, surpassing traditional methods and assisting vaccine and diagnostic research [119, 125, 126]. A biosupport vector machine (bSVM) achieved 90.31% accuracy in T-cell epitope prediction, outperforming traditional SVMs in identifying immune-relevant epitopes [127].
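As an illustration of the feature-engineering step, the following Python sketch computes two simple per-window descriptors (mean hydropathy and net charge) along an antigen sequence. It uses the Kyte-Doolittle hydropathy scale and a simplified charge model in which histidine is treated as neutral. The window length, function name, and charge model are assumptions chosen for illustration and do not reproduce the feature set of any cited predictor.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH (assumption: His is neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def window_features(sequence, window=7):
    """Return (centre_index, mean_hydropathy, net_charge) for each window.

    Raises KeyError if the sequence contains non-standard residue letters.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    features = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean_hydropathy = sum(KD[aa] for aa in segment) / window
        net_charge = sum(CHARGE.get(aa, 0) for aa in segment)
        features.append((start + window // 2, mean_hydropathy, net_charge))
    return features
```

Feature rows of this kind could then be passed to a classifier such as an SVM or random forest; in practice they would be combined with evolutionary and structural descriptors.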
BCPred, an SVM-based model, improves linear B-cell epitope prediction with an AUC of 0.758, exceeding ABCpred and AAP in accuracy [128]. A logistic regression model using B-factor and RASA enhances discontinuous B-cell epitope prediction, outperforming DiscoTope and BEpro in sensitivity and AUC scores [129]. EPMLR, a multiple linear regression model, achieved 81.8% sensitivity and an AUC of 0.728 for linear B-cell epitope prediction [118, 128, 130]. Bagging meta decision trees (MDTs) integrated classifier outputs to improve conformational epitope predictions, reducing overfitting and surpassing 12 predictors, including SEPPA and DiscoTope [131]. A decision tree model also advanced metalloendopeptidase epitope prediction by identifying features like charged residues and physicochemical properties, validated experimentally to neutralize Atroxlysin-I's hemorrhagic activity [132]. Re-epitoping, a targeted antibody design approach, re-engineered antibodies like an IL-17A antibody for inflammatory diseases. Using ML to predict key binding contacts, it was validated with crystal structures, demonstrating its value in therapeutic antibody development [133]. Epitope3D, which uses graph-based signatures, predicts conformational B-cell epitopes with an MCC of 0.56 and F1 score of 0.57, supporting vaccine design and diagnostics [134].
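Several of the figures above are reported as a Matthews correlation coefficient (MCC). A minimal Python sketch of the standard definition from a binary confusion matrix is given below. The convention of returning 0 when the denominator vanishes is a common choice and is an assumption here, not something stated in the cited studies.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives,
    false negatives (e.g. residues predicted as epitope vs. non-epitope).
    """
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        # Degenerate case (an empty row or column); 0 is a common convention.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Because epitope residues are typically a small minority of the antigen surface, MCC is often preferred over plain accuracy: it stays near 0 for a predictor that labels every residue as non-epitope.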
Additionally, NetMHC, NetMHCpan, NetMHCII, and NetMHCIIpan are neural network–based computational tools that predict peptide binding to Major Histocompatibility Complex (MHC) molecules—an essential step in T-cell epitope identification and immunogenicity [124, 135,136,137]. NetMHC and NetMHCpan (including “pan” for a universal model) focus on MHC class I (CD8⁺ T‐cell) epitopes, while NetMHCII and NetMHCIIpan target MHC class II (CD4⁺ T‐cell) epitopes; each pair uses quantitative binding affinity data, with the “pan” versions able to predict across a broad range of MHC alleles [124, 135,136,137].
Antibody structure predictions and design
Antibody structure prediction and design are integral to the rational development of therapeutic, diagnostic, and research antibodies, enabling the optimization of binding affinity, specificity, stability, and developability [10]. By predicting the 3D structure of antibodies and their antigen complexes, researchers can pinpoint critical residues in CDRs and engineer them to enhance interactions [138]. Structural insights also facilitate strategies to reduce immunogenicity, such as humanizing murine antibodies, and improve stability by addressing issues like aggregation and poor solubility [97, 138]. Additionally, computational tools enable de novo antibody design for novel or challenging targets, accelerating the discovery process by screening large libraries in silico [126,127,128,129]. This approach offers a cost-effective and efficient alternative to traditional methods like hybridoma technology or phage display.
ML and DL approaches complement experimental methods in antibody discovery, design, and optimization by reducing costs and addressing the limitations of traditional structure prediction [37, 112]. Tools such as Rosetta [82, 143, 144], AlphaFold2 and 3 [28, 145, 146], DeepAb [88], ABlooper [147], and DeepH3 [148] have contributed to advances in protein and antibody structure prediction, supporting the use of structural data in antibody design and optimization. These computational tools leverage AI and ML to improve understanding of protein folding and interactions, aiding in the prediction of antibody structures with potential for high affinity. However, challenges remain in fully predicting and optimizing antibody properties, requiring further validation and refinement. Rosetta [82, 143, 144] facilitates macromolecular modeling, aiding in antibody-antigen complex prediction and affinity optimization. AlphaFold2 and AlphaFold3 revolutionized protein folding predictions, providing high-accuracy models that inform experimental designs [28, 145, 146]. DeepAb [88] specializes in antibody structure prediction, while ABlooper [147] focuses on CDRs, refining binding specificity. DeepH3 [148] improves modeling of CDR-H3 loops, which are highly variable yet crucial for specificity and affinity. These tools streamline antibody design by bridging sequence-to-structure predictions, enabling high-throughput screening, and guiding mutagenesis for affinity maturation, significantly enhancing therapeutic antibody development. Despite challenges like model interpretability, data completeness, and VH-VL pairing, these advancements facilitate the study, design, development, and optimization of computationally designed antibodies [34, 149].
SVMs were initially used to predict protein structural classes based on amino acid composition, categorizing proteins into classes such as all-α, all-β, α/β, and α + β using the SCOP database. By employing polynomial and Gaussian RBF kernels, SVMs achieved 100% accuracy in self-consistency tests and 74.5% in jackknife cross-validation, outperforming neural networks in both accuracy and generalization [150]. AbCPE, a multi-label classification algorithm, predicts antibody classes (IgG, IgE, IgA, IgM) binding to specific B-cell epitopes using methods like Binary Relevance and Label Powerset with Random Forest and AdaBoost classifiers. It demonstrated high accuracy, achieving a Hamming Loss of 0.1074 on test data and 0.036 for IgG-binding predictions, excelling in tasks such as SARS-CoV-2 epitope prediction. As the first multi-label approach in this domain, AbCPE marks a significant advancement in immune-informatics, aiding vaccine development, therapeutic antibody design, and diagnostics [151].
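Hamming Loss, the metric quoted for AbCPE, is the fraction of individual label slots that are predicted incorrectly, so lower values are better. The Python sketch below shows the standard definition for multi-label outputs. The label order (IgG, IgE, IgA, IgM) and the example vectors are hypothetical and are not data from the cited study.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots that differ between truth and prediction.

    y_true, y_pred: lists of equal-length 0/1 label vectors, one vector per
    epitope, e.g. [IgG, IgE, IgA, IgM] (label order is an assumption).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of samples")
    total_slots = 0
    wrong_slots = 0
    for truth, pred in zip(y_true, y_pred):
        if len(truth) != len(pred):
            raise ValueError("label vectors must have the same length")
        total_slots += len(truth)
        wrong_slots += sum(1 for t, p in zip(truth, pred) if t != p)
    if total_slots == 0:
        raise ValueError("no labels to compare")
    return wrong_slots / total_slots


if __name__ == "__main__":
    truth = [[1, 0, 0, 0], [1, 0, 1, 0]]
    predicted = [[1, 0, 0, 0], [1, 1, 1, 0]]
    print(hamming_loss(truth, predicted))
```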
Computational methods for antibody structure prediction face challenges in modeling variable regions, particularly the CDRH3 region, due to its length, flexibility, and the impact of VH-VL chain orientation on binding. Tools like Rosetta address these challenges by using template-based approaches for canonical CDR loops and de novo methods for CDRH3 loop prediction, refining VH-VL orientation to improve docking accuracy [82, 143, 144]. RosettaAntibody, a high-resolution homology modeling protocol, achieves a median RMSD of approximately 1.5 Å for antigen-binding pockets in benchmark studies, though its accuracy varies based on antibody complexity. It faces challenges in accurately modeling long CDRH3 loops, a common limitation in homology-based approaches. While its docking accuracy is moderate, it contributes to antibody design by providing structural models that inform stability and binding affinity optimization in combination with other computational and experimental techniques [152]. AbPredict, another Rosetta-based tool, assembles variable domain fragments without relying on homology but faces difficulties with rare loop lengths. New AI models such as DeepAb, DeepH3, and ABlooper offer faster, template-free 3D structure predictions, facilitating high-throughput therapeutic antibody screening [153].
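Structure-prediction accuracy in this area is usually reported as a root-mean-square deviation (RMSD) in Å between predicted and experimental atom positions. The Python sketch below computes RMSD for two coordinate lists that are assumed to be already superimposed and listed in the same atom order. It omits the optimal-superposition step that benchmark protocols normally apply, so it illustrates the metric only, not any tool's evaluation pipeline.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))
```

A loop whose atoms are all displaced by 1.5 Å from the reference gives an RMSD of 1.5 Å, which is the scale of the median value quoted above for RosettaAntibody.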
OptMAVEn-2.0 is an advanced computational tool for designing antibody variable regions targeting specific epitopes. It addresses inefficiencies of its predecessor by using k-means clustering, humanization measures, and Modular Antibody Parts (MAPs) to create high-affinity antibodies, validated through molecular dynamics simulations. Its scalability makes it applicable to diverse antigens, including those from infectious diseases [141]. Gradient Boosting Machine (GBM) models enhance structural cluster prediction for non-H3 CDRs, increasing accuracy from 79% to 88.16% by integrating sequence features and outperforming traditional methods. Despite challenges such as data sparsity and cluster imbalances, synthetic data and semi-unsupervised learning approaches may further enhance these models [154].
SCALOP is a sequence-based tool for predicting CDR canonical forms, achieving 89.47% accuracy and processing 100 sequences in 0.29 s, making it ideal for large-scale antibody repertoire analysis and immunological research [155]. Recent studies highlight the structural significance of the DE loop, traditionally considered a framework region. Variations in the DE loop, driven by germline and somatic mutations, stabilize CDRs and enhance antigen-binding affinity, playing a crucial role in improving binding specificity, particularly in broadly neutralizing antibodies against HIV [156].
AbAgIntPre, a CNN-based tool, predicts antibody-antigen interactions with an AUC of 0.82 on SARS-CoV datasets [157]. A reinforcement learning framework further optimizes CDRH3 sequences for antigen specificity using methods such as Fitness Buffer and Q-Ensemble Stability, outperforming methods like Structured Q-learning and Bayesian Optimization [87]. ABDPO (Antibody Direct Preference Optimization) addresses antibody design as a sequence-structure co-design problem, using pretrained diffusion models and residue-level energy metrics to optimize CDR configurations. It resolves energy conflicts with gradient surgery, surpassing DiffAb and MEAN in reducing structural clashes and accelerating therapeutic antibody development with high functionality and natural structures [158].
Antigen–antibody interactions
Antigen–antibody interactions are essential for the immune response and serve as the foundation for therapeutic and diagnostic antibody design [159]. These interactions involve the binding of antibodies (immunoglobulins) to specific antigens, such as pathogens or toxins, with high specificity, where an antibody’s variable region recognizes a distinct epitope [38, 83, 159]. Binding occurs through non-covalent forces, including hydrogen bonds, ionic bonds, Van der Waals interactions, and hydrophobic interactions, ensuring strong yet reversible attachment [160]. The induced fit model explains how binding can induce conformational changes in both the antibody and antigen, enhancing specificity and affinity [159]. Additionally, these interactions are characterized by affinity (binding strength at a single site), avidity (multivalent binding effects), and structural complementarity, involving shape, hydrogen bonding, electrostatics, and hydrophobicity [161,162,163]. A deep understanding of these factors enables the rational engineering of antibodies with improved specificity, affinity, and stability, optimizing their application in immune defense, diagnostics, and therapy.
Researchers developed an SVM model to predict the distance between antibody interface residues and antigens by using structural data from 37 antibody-antigen complexes in the Protein Data Bank (PDB). The model achieved up to 99% accuracy in predicting distance ranges (e.g., 8, 10, 12 Å) during validation, with improved performance for larger sequence patch sizes. Additionally, it classified antigen types (protein vs. non-protein) based on residue composition, aiding applications in epitope mapping, drug development, and vaccine design [164]. A method utilizing 3D Zernike Descriptors (3DZDs) and SVM classification accurately predicted antigen-binding regions (paratopes) on antibodies by incorporating geometric and physicochemical properties. It outperformed tools such as Paratome, Antibody i-Patch, and Parapred and was validated using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) analyses. This approach optimized paratopes to enhance antigen affinity and specificity while minimizing the need for extensive mutagenesis. It also demonstrated potential for antigen-specific predictions and integration with epitope and docking algorithms, enabling comprehensive interaction modeling [165].
An ML framework was developed for predicting antibody binding properties by incorporating data preprocessing, dimensionality reduction, and classification models. Using data from patent EP2275449B1, preprocessing included amino acid encoding and data balancing with Synthetic Minority Over-sampling Technique (SMOTE). Among six classifiers, Random Forest performed best, achieving 97% accuracy for soluble BLyS and 83% for membrane-bound BLyS, excelling in precision, recall, and F-score [166]. Protein–protein interaction site prediction was improved using the XGBoost algorithm, which addressed imbalanced datasets with Repeated Edited Nearest Neighbor (RENN) and Instance Hardness Threshold (IHT) methods. By leveraging evolutionary conservation-based features, the model achieved 80.7% accuracy and a Matthews Correlation Coefficient (MCC) of 0.614 with IHT, outperforming traditional approaches. This method shows potential for large-scale protein interaction analysis, drug discovery, and cellular function studies [167].
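The core idea behind SMOTE-style balancing is to create synthetic minority-class samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The Python sketch below illustrates that idea with the standard library only. It is a simplified stand-in and not the implementation used in the cited work, and the function name, `k`, and `seed` parameters are illustrative assumptions.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples from minority-class feature vectors.

    minority: list of equal-length numeric tuples (minority-class samples).
    k:        number of nearest minority neighbours to choose from.
    seed:     seed for the random generator, for reproducibility.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the connecting segment.
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

Each synthetic point lies on the line segment between two real minority samples, which is why this family of methods adds variety without simply duplicating rows.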
An ML method predicted antibody-antigen binding directly from sequence data without requiring 3D structural information. Using Weighted k-NN and Random Forest models, it achieved 76% accuracy on a dataset of 600 computationally docked antibodies and 4441 interactions from the CoV-AbDab database. Features such as physicochemical properties and sequence metrics, refined using the BLOSUM62 matrix, improved prediction accuracy. While this approach advances immune repertoire analysis and antibody engineering, it underscores the need for more diverse, experimentally validated datasets [168]. The Antibody Random Forest Classifier (AbRFC) was developed to predict non-deleterious mutations in CDRs, improving antibody affinity. By leveraging structural and physicochemical features, AbRFC outperformed Graph Neural Networks (GNNs) and large language models (LLMs). Experimental validation demonstrated a 1,000-fold increase in SARS-CoV-2 antibody binding affinity against Omicron variants after minimal wet-lab screening [169]. An ML-guided platform incorporating AbRFC optimized antibody design by integrating computational prediction with experimental workflows. This platform enhanced antigen-binding affinity and developability using binary classification and advanced feature engineering, outperforming DL methods on small datasets. The iterative lab-in-a-loop framework achieved a two-order-of-magnitude improvement in binding affinity for anti-SARS-CoV-2 mAbs, demonstrating synergistic neutralization against Omicron variants and offering a scalable solution for therapeutic development against evolving pathogens [170].
Advancing antibody therapeutics with AI
Recent advancements in AI have revolutionized antibody discovery, design, and development. Traditional methods, such as phage or yeast display and animal immunization, are limited by biological and chemical constraints [35, 36, 171]. In the late twentieth century, computational biology introduced techniques like high-throughput virtual screening (HTVS), molecular docking, and molecular dynamics simulations, which were initially constrained by computational power [25, 172]. Improvements in hardware and processing technologies have enabled the development of more efficient algorithms, fostering progress in ML, DL, and AI, which further advanced antibody design [173].
ML and DL, subsets of AI, are powerful tools for analyzing large datasets. ML includes supervised learning (predicting outcomes from labeled data), unsupervised learning (finding patterns in unlabeled data), and reinforcement learning (goal-oriented learning through rewards and penalties) [174, 175]. DL, a subset of ML, utilizes deep neural networks to handle complex tasks but is computationally demanding and less interpretable due to its hidden layers [176, 177]. Both ML and DL have significant potential in drug design, particularly in structure-based small-molecule and antibody design using techniques like CNNs and RNNs [176,177,178,179].
Generative models like GANs, VAEs, and reinforcement learning (RL) are transforming drug design by identifying molecular patterns and enabling multi-objective optimization for properties such as drug-likeness, bioactivity, and pharmacokinetics [36, 180, 181]. These models have demonstrated success in creating molecules with high binding affinity, multi-target activity, and optimal ADMET profiles [36, 180, 181]. Examples include generative tensorial reinforcement learning (GENTRL) and Policy Gradient for Forward Synthesis (PGFS), which identified DDR1 kinase inhibitors optimized for binding affinity and approved for preclinical testing within 46 days [182, 183]. Additionally, deep generative models (DGMs) in poly-pharmacology designed compounds targeting GSK3β and JNK3, enhancing both efficacy and safety [184]. Platforms such as Chemistry42 have advanced AI-driven drug design, exemplified by INS018_055 for idiopathic pulmonary fibrosis, which has progressed to Phase II trials [185, 186]. RNN-based models have also contributed, such as the generation of RIPK1 inhibitor RI-962, which exhibits potent in vitro and in vivo activity against inflammatory diseases [187]. Fragment-based design algorithms like RationaleRL optimize bioactivity for multiple targets, while flow-based models such as MoFlow efficiently map molecular graphs [184, 188]. PaccMann integrates 3D structural data for precise ligand–protein interactions, enhancing molecular generation accuracy and minimizing reconstruction errors [36, 180, 181]. These advancements highlight AI's potential to streamline drug discovery, design, and optimization.
Transformer-based models, leveraging attention mechanisms, have significantly advanced sequential data processing for SMILES prediction, multitask learning, and chemical and antibody structure design and optimization [189,190,191]. Tools like AlphaPanda combine transformer models, 3D CNNs, and diffusion models to generate antibody structures, effectively capturing global sequence and local structural information for improved design [192]. In a case study, transformer and GAN models were used to diversify CDR3 regions, achieving an 87% success rate in identifying high-affinity antibodies [193]. AB-Gen, a generative pre-trained transformer with deep reinforcement learning, was employed to design HER2-targeting antibody libraries, with critical residues validated through simulations [194]. AntiBERTa, a transformer-based language model trained on 57 million human BCR sequences, excels in paratope prediction and outperforms tools such as Parapred [195], ProABC-2 [196], ProtBERT [197] and Sapiens [95]. It supports structure prediction, humanization, and BCR analysis, advancing antibody discovery, diagnostics, and engineering through self-supervised learning [198]. LLMs like ESM-1v predict the impact of mutations on protein function with accuracy comparable to supervised models using pre-trained data [199]. mCSM-AB2, a web server for antibody engineering, predicts mutation effects on antibody-antigen binding affinity using graph-based signatures, evolutionary data and energy-based features. It achieves high accuracy, with Pearson's correlation coefficients of 0.73 in training and 0.77 in blind tests, surpassing FoldX [101]. Based on a dataset of 1,810 mutations from AB-BIND [200], PROXiMATE [201], and SKEMPI2.0 [202], mCSM-AB2 aids in affinity maturation and large-scale mutation analysis, supporting therapeutic antibody development [203]. These advancements highlight the transformative role of transformer models and AI-driven tools in antibody engineering and drug design.
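Pearson's correlation coefficient, the metric quoted for mCSM-AB2, measures how linearly predicted values track experimental ones. The Python sketch below implements the standard formula. Pairing predicted with measured affinity changes is the intended use case here, and any input values supplied by a reader are their own; no data from the cited study are reproduced.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    covariance = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_p * var_m)
```

A value of 1 indicates a perfect increasing linear relationship, 0 no linear relationship, and -1 a perfect decreasing one.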
An ML framework predicts scFv poly-reactivity by combining antibody properties with NLP-based protein descriptors, identifying factors such as CDR2 loop net charge, specific residues, and loop length. Using models like GBM and Random Forest, it achieves high accuracy (AUC 0.840) and integrates aggregation scores and SASA via trRosetta for efficient scFv screening, advancing antibody and enzyme design [204]. Another framework leverages pretrained language models and Bayesian optimization to enhance scFv binding affinities, achieving up to 99% success, including a 28.7-fold improvement for heavy-chain scFvs. With up to 23 mutations, these diverse libraries outperform traditional methods like position-specific scoring matrix (PSSM), enabling advanced antibody engineering and multi-objective optimization [205].
NLP models such as ProtT5 [206], Transformer-XL [207], and BERT [208] are increasingly adapted in bioinformatics to analyze protein sequences without relying on evolutionary data [206]. Trained on large protein databases, these models accurately capture biophysical features like secondary structure and localization [206]. ProtT5, for instance, achieved state-of-the-art accuracy (81–87%) in secondary structure prediction while avoiding costly evolutionary data, enabling large-scale analyses like processing the entire human proteome in under an hour. These advancements demonstrate the potential of pre-trained language models in protein engineering, drug discovery, and functional annotation [206].
Structured Q-learning (SQL), an advanced reinforcement learning method, optimizes combinatorial structures in antibody design. Using Variable Allocation Markov Decision Process (VAMP) and structural exploration operators, SQL efficiently optimizes CDRH3 sequences for antigen binding, outperforming methods like policy gradients and simulated annealing. It generated over 300 unique optimal CDRH3 sequences per target with superior binding energy scores, including for SARS-CoV spike proteins, enhancing antibody diversity and quality while reducing computational demands [209]. AbBERT, trained on over 50 million antibody sequences from the Observed Antibody Space (OAS) dataset, integrates language models with sequence-structure co-design, achieving state-of-the-art accuracy in amino acid recovery (40.35%) and structural prediction (RMSD 1.62) for CDR-H3, enabling antigen-specific antibody design [210].
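Amino acid recovery (AAR), one of the metrics quoted for AbBERT, is commonly defined as the fraction of positions at which a designed CDR sequence matches the native one. The Python sketch below assumes that definition and equal-length, pre-aligned sequences; whether the cited work applies exactly this variant is an assumption, and the example sequences are hypothetical.

```python
def amino_acid_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences cover the same CDR and are aligned position by
    position (no insertions or deletions).
    """
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(1 for d, n in zip(designed, native) if d == n)
    return matches / len(native)


if __name__ == "__main__":
    # Hypothetical CDR-H3 fragments that differ at one of five positions.
    print(amino_acid_recovery("ARDYW", "ARGYW"))
```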
ReprogBERT repurposes a pretrained English BERT model for protein sequence infilling, specifically targeting antibody CDR design. It generates highly diverse CDR sequences, achieving a two-fold increase in diversity over baseline models while maintaining structural integrity and sequence naturalness. The model demonstrates superior performance, with lower perplexity scores and higher diversity in generated sequences. Structural validation using AlphaFold confirms the biological relevance of their outputs. These results suggest ReprogBERT’s potential for on-demand antibody design, making it a valuable tool for therapeutic and diagnostic applications [211]. AbImmPred predicts therapeutic antibody immunogenicity using AntiBERTy, a pre-trained antibody language model that extracts sequence features without 3D structural data [212]. Using AutoGluon, it achieves high accuracy (0.7273) with improved recall (0.9375), precision (0.7500), and F1-score (0.8333) over methods like PITHA [213]. This tool offers a cost-effective solution for early-stage antibody screening in computational immunology and therapeutic development [212].
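The precision, recall, and F1-score quoted for AbImmPred can be checked against each other, because F1 is the harmonic mean of precision and recall. A minimal Python sketch of that arithmetic:

```python
def f1_from_precision_recall(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values reported above: precision 0.7500, recall 0.9375.
    print(round(f1_from_precision_recall(0.7500, 0.9375), 4))
```

With the reported precision and recall, the harmonic mean evaluates to approximately 0.8333, which agrees with the F1-score given in the text.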
Other LLMs and GPT-inspired models such as ProtGPT2 [214], AbGPT [139], IgLM [215], AB-Gen [194], AntiBARTy Diffusion [216], and pAbT5 [217] have further advanced antibody design and engineering (discussed in later sections). These tools streamline therapeutic development by integrating AI-driven sequence and structure optimization, addressing immunological challenges, and enhancing protein engineering.
AI-driven methods have revolutionized protein structure prediction and interaction analysis, significantly advancing protein-based therapeutic development [218]. These algorithms excel at predicting structures, binding sites, and generating novel antibody sequences, achieving binding rates over 10% [34, 142]. By leveraging large datasets, ML, and NLP, AI enhances immunology, diagnostics, and drug discovery, accelerating the development of antibody-based therapeutics [219]. Table 1 summarizes the contributions of ML, DL, and generative AI to antibody design and optimization.
AI-based early drug discoveries and platforms
AI-driven drug discovery has demonstrated significant success, beginning with BenevolentAI, which repurposed baricitinib for COVID-19, significantly reducing mortality rates in hospitalized patients [258, 259]. Following this, collaborations between Sumitomo Dainippon Pharma and Exscientia led to the rapid development of candidates such as DSP-1181 (targeting OCD) and DSP-2342 (for psychiatric diseases) within just 12 months, with Exscientia’s AI-driven platform accelerating synthesis and testing cycles [260, 261]. In 2021, Exscientia introduced an AI-designed A2A receptor antagonist, which entered clinical trials for advanced solid tumors [274]. Another key development was EXS-21546, co-developed with Evotec, targeting cancer by inhibiting the A2A receptor to enhance immune responses. Currently in Phase I/II trials (NCT05920408), it shows promise for treating solid tumors. Exscientia further introduced the adenosine burden score (ABS), an immuno-oncology biomarker predicting patient responses to EXS-21546, particularly in combination with checkpoint inhibitors [262].
AI has also accelerated drug discovery in neurological and inflammatory diseases, with BenevolentAI advancing candidates such as BEN-8744 (Phase I, NCT06118385) for ulcerative colitis, BEN-28010 for glioblastoma multiforme, and BEN-34712 for amyotrophic lateral sclerosis (ALS), which demonstrated significant preclinical efficacy [258, 259]. Beyond small-molecule drug discovery, AI is transforming antibody development, offering efficiency, accuracy, and cost reduction in therapeutic design [38]. AI-driven data analysis and pattern recognition enable precise identification of complex antibody-antigen interactions, enhancing therapeutic antibody design [38, 211, 263,264,265,266,267,268,269].
Several AI-powered platforms play crucial roles in antibody selection, modeling, and optimization. BenchSci (https://www.benchsci.com/) accelerates preclinical research, Atomwise's AtomNet (https://www.atomwise.com/how-we-do-it/) applies DL to molecular discovery, and DeepMind's AlphaFold (https://deepmind.google/) revolutionizes protein structure prediction [270]. Other platforms, such as Causaly (https://www.causaly.com/), Pharos iBio's Chemiverse (https://www.pharosibio.com/en/), Insilico Medicine's InClinico (https://insilico.com/), and Recursion Pharmaceuticals (https://www.recursion.com/), automate protein–ligand interaction analysis, clinical trial outcome predictions, and drug development acceleration. The Binding Site-Augmented DTA model, developed at the University of Central Florida, demonstrates how DL refines drug-target affinity predictions [269]. Generative biology, integrating AI with lab science, is transforming drug development by designing therapies that surpass natural proteins. Companies like Amgen and BigHat Biosciences use AI-driven wet lab experiments to enhance drug effectiveness, while AI-powered platforms such as NVIDIA's BioNeMo (https://www.nvidia.com/en-us/clara/bionemo/), BenchSci's ASCEND (https://knowledge.benchsci.com/home/platform-overview), and Receptor.AI (https://www.receptor.ai/) streamline drug optimization, reducing time and costs. Figure 5 shows the importance of several selected AI-based platforms in drug discovery and design.
AI-driven antibody drug design has led to multiple candidates progressing into clinical trials. Biolojic Design’s AU-007, targeting IL-2 to enhance cancer immune responses, is currently undergoing trials [271]. AbCellera Biologics developed bamlanivimab for COVID-19, which entered Phase II trials. Adagene’s AI-derived antibodies, including ADG-106, 104, 116, 126, BC-006, and Adimab’s PM-1022, are being developed for oncology. Compugen's bapotulimab, COM-701, and COM-902, along with HiFiBiO Therapeutics' antibodies (HFB-301001, HFB-200301, and HFB-30132A for COVID-19), are undergoing Phase I or II trials, demonstrating AI’s transformative role in antibody drug design [272]. NL-201 by Neoleukin Therapeutics, a de novo protein mimicking IL-2 and IL-15, reached Phase I trials for cancer before discontinuation in 2022 [273].
Several industrial collaborations have accelerated AI-driven drug discovery. Recursion Pharmaceuticals and Exscientia partnered to scale AI-driven research, while Absci and AstraZeneca are co-developing an AI-designed oncology antibody. Antiverse and Nxera Pharma (formerly Sosei Heptares) are creating GPCR-targeted antibodies, integrating Antiverse's generative AI with NxWave™, a GPCR validation platform [274, 275]. AION Labs launched DenovAI, a startup focused on de novo antibody design, leveraging AI and biophysics to develop high-affinity antibodies and miniprotein binders, supported by Pfizer, AstraZeneca, Merck, Teva, AWS, and the Israel Biotech Fund [276]. These partnerships demonstrate AI’s growing role in drug discovery, accelerating therapeutic development and improving clinical success rates. As AI continues to evolve, its integration with experimental research is expected to drive unprecedented advancements in precision medicine and personalized therapeutics. AI-assisted drug candidates in clinical investigations are extensively reviewed elsewhere [36].
AI's application in antibody design and optimization
mAbs are essential for immune responses due to their broad functionality and prolonged half-lives [277]. However, stability issues vary across formats; full-sized antibodies often lack thermodynamic stability despite being kinetically stable, whereas single-chain variable fragments (scFvs) typically exhibit poor to moderate stability [278]. Advances in antibody engineering, such as humanization, point mutations, and stable framework grafting, have improved stability, folding efficiency, and reduced immunogenicity [277,278,279]. Modern antibody design integrates computational and experimental techniques for precision and efficiency. Computational tools like protein modeling, molecular docking, and dynamics predict structures, optimize electrostatics, and identify high-affinity mutations, particularly in regions like CDR-H3 [280, 281]. Structure-guided approaches analyze and optimize antibody-antigen interactions using hydrogen bonding, electrostatics, and shape complementarity, while experimental methods like phage display, mutagenesis, and X-ray crystallography validate these predictions [282, 283]. This synergy enhances binding affinity, reduces aggregation, and accelerates therapeutic development, as exemplified by affinity improvements in cetuximab (140-fold) and engineered anti-VEGF antibodies [280, 281, 283]. These methods are essential in modern antibody engineering, focusing on principles like stabilizing framework-CDR interactions and preserving critical amino acids in the Fv backbone for stability and specificity [284]. Such designs achieve high binding specificity and stability, even with significant divergence from natural germlines, showcasing the effectiveness of computational approaches [284]. Computational approaches have also been used to predict beneficial mutations and redesign antibodies by altering a single CDR [80, 285, 286].
For instance, stepwise randomization of residues in CDR-H3 and CDR-L3, guided by computational docking, improved binding affinities by up to 25.7-fold, as seen with P96 mutations in CDR-L3. Additionally, efforts have focused on predicting antibody structures directly from amino acid sequences [28, 82]. These studies highlight the potential of computational methods to enhance antibody-binding properties. A primary goal of AI in this context is to predict and automate the generation of CDR subsequences with properties essential for antigen binding and antibody effectiveness [42, 149]. Designing CDRs that target specific antigens is crucial for therapeutic antibody development but challenging due to the vast combinatorial space of over 20⁶⁰ possible sequences [287]. Testing all CDR combinations experimentally is impractical, necessitating computational approaches. Traditional methods relying on biophysical energy functions are time-consuming and prone to local optima [288]. AI has transformed CDR design by enabling precise modeling and prediction, significantly accelerating the process. Beyond automated CDR prediction, AI also improves antibody stability, folding efficiency, and epitope-paratope interactions, particularly in accurately predicting CDR-H3 conformations.
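To give a sense of scale for the CDR search space, the short calculation below assumes 20 standard amino acids at each of roughly 60 CDR positions, which is the reading implied by a space of 20 to the power 60. The library size used for comparison is a hypothetical round number chosen only for illustration, not a figure from the cited studies.

```python
import math

AMINO_ACIDS = 20      # standard amino acid alphabet
CDR_POSITIONS = 60    # assumption: total CDR length implied by the 20^60 figure


def sequence_space(alphabet: int, length: int) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet ** length


space = sequence_space(AMINO_ACIDS, CDR_POSITIONS)
orders_of_magnitude = math.log10(space)  # roughly 78

# Hypothetical experimental library size, used only for scale.
LIBRARY_SIZE = 10 ** 10
fraction_covered = LIBRARY_SIZE / space

print(f"search space ~ 10^{orders_of_magnitude:.0f}")
print(f"fraction covered by one library ~ {fraction_covered:.1e}")
```

Even under these generous assumptions, a single library samples a vanishingly small fraction of the space, which is why exhaustive experimental testing is impractical.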
Table 2 summarizes tools and models developed using AI, ML, generative AI, and LLM-based algorithms.
AI in B cell epitope prediction, paratope identification, and antibody-antigen interactions
Epitopes are the specific regions on antigens that antibodies recognize and bind; they determine specificity and binding affinity and underpin immune targeting of cancer cells [305]. They may also support long-term immunity through immune recognition. Designing antibodies to target specific epitopes involves challenges such as ensuring accessibility and overcoming tumor variability, making epitopes central to the precision and effectiveness of therapeutic antibodies in cancer treatment [107].
A 2007 benchmark study revealed that no tools for antibody-protein interactions achieved precision over 40% or recall beyond 46% [306]. Since then, a knowledge-based method using consensus structural data for CDRs and B-cell epitopes has been developed and refined [267, 305]. The need for better in silico B-cell epitope prediction tools and the value of combining computational and experimental methods were also emphasized [114]. Recent advances in predicting conformational B-cell epitopes offer potential for vaccines and therapies, but further improvements in accuracy could enhance immunotherapeutic drug development [305].
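Precision and recall in this setting are usually computed per residue: precision is the fraction of predicted epitope residues that are true epitope residues, and recall is the fraction of true epitope residues that were predicted. The sketch below illustrates the calculation; the residue numbers are invented for the example and do not come from the benchmark.

```python
def precision_recall(predicted: set[int], true: set[int]) -> tuple[float, float]:
    """Per-residue precision and recall for an epitope prediction."""
    true_positives = len(predicted & true)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true) if true else 0.0
    return precision, recall


# Hypothetical residue indices on an antigen surface.
true_epitope = {12, 13, 14, 45, 46, 47, 48, 90, 91, 92}
predicted_epitope = {12, 13, 30, 31, 45, 46, 60, 61, 62, 70, 71, 72}

precision, recall = precision_recall(predicted_epitope, true_epitope)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this toy case four of twelve predicted residues are correct, giving a precision of about 0.33 and a recall of 0.40.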
A model named SEMA was developed using a transfer learning approach with pretrained DL models, ESM-1v and ESM-IF1. The SEMA model, fine-tuned to predict antibody-antigen interactions, achieved a ROC AUC of 0.76 on an independent test set, demonstrating competitive performance compared to peer-reviewed tools [237]. Additionally, a DL-based framework using graph convolutions (to capture spatial proximity) and attention layers (to capture antibody-antigen context, utilizing transfer learning from general protein interactions) accurately predicts binding interfaces on antibodies and antigens, enhancing accuracy and providing interpretable insights into antibody-antigen interactions [297]. The structure-based paratope prediction tool, Paragraph, has demonstrated strong performance in paratope prediction, offering an alternative to existing state-of-the-art methods [298]. It was shown that epitope residues are distant and antigen-dependent, leading to the development of Para-EPMP and Epi-EPMP models. Para-EPMP processes antibody features sequentially using a graph structure, while Epi-EPMP incorporates structural features and GNN layers with contextual information from the related antibody [299]. Unlike the complex EPMP approach, Paragraph focuses solely on structural features for accurate paratope prediction. A structure-based framework known as Deep Learning for Antibodies (DLAB) was created to enable virtual screening of antibodies against antigens, even in the absence of known binders. DLAB improves antibody-antigen docking by refining pose ranking and demonstrating strong performance in identifying compatible pairings in validation studies. Case studies highlight DLAB’s effectiveness in detecting binding antibodies, indicating its potential to support and streamline aspects of the antibody drug discovery process [296]. Similarly, AbAgIntPre, a DL-driven approach, was designed to swiftly predict antibody-antigen interactions using amino acid sequences. 
Leveraging a Siamese-like CNN, it achieved an AUC of 0.82 on an independent test set, demonstrating strong predictive accuracy in a SARS-CoV dataset. These models are expected to complement traditional methods and provide computational insights that may improve aspects of antibody design [157]. These studies emphasize the critical role of accurate B-cell epitope prediction, paratope identification, and antibody-antigen interaction analysis in antibody engineering.
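The ROC AUC values quoted for SEMA and AbAgIntPre can be read as the probability that a randomly chosen true interaction receives a higher score than a randomly chosen non-interaction. A minimal sketch of that pairwise definition follows; the scores and labels are made up, and in practice a library routine would normally be used instead.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores and ground-truth interaction labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"ROC AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values of 0.76 and 0.82 indicate useful but imperfect discrimination.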
AI applications in CDR prediction, generation and modeling
Structural Prediction of CDRs
DeepH3, a DL approach, predicts CDR-H3 loop structures using inter-residue distances, orientations, and geometric potentials. It improves structure identification accuracy by 32.1% and achieves a mean RMSD of 2.2 ± 1.1 Å in de novo predictions, outperforming traditional methods [148].
IgFold is a fast DL method for antibody structure prediction; it uses a pre-trained model on 558 million antibody sequences combined with graph networks to estimate backbone atom coordinates [24]. Its predictions are comparable to or, in some cases, surpass those of methods like AlphaFold, with significantly faster processing times, enabling large-scale studies. For instance, IgFold has been applied to predict structures for 1.4 million paired antibody sequences, vastly expanding insights into antibody diversity beyond experimentally resolved structures [24].
ImmuneBuilder is a toolkit for modeling immune-related proteins, featuring ABodyBuilder2 for antibodies, NanoBodyBuilder2 for nanobodies, and TCRBuilder2 for T-cell receptors [291]. ABodyBuilder2 predicts CDR-H3 loops with an RMSD of 2.81 Å, slightly better than AlphaFold-Multimer, while NanoBodyBuilder2 achieves an RMSD of 2.89 Å, surpassing AlphaFold2 by 0.55 Å. ImmuneBuilder provides structure ensembles and error estimates, offering additional insights into prediction confidence [291].
ABlooper, an end-to-end DL tool, predicts CDR loop structures with a focus on the variable CDR-H3 loop. It provides rapid, reliable predictions with confidence estimates, achieving an average RMSD of 2.49 Å for CDR-H3, improving to 2.05 Å for the top 75% of high-confidence predictions on Rosetta Antibody Benchmark models. This tool contributes to advancements in antibody modeling, particularly for the challenging CDR-H3 loop [147].
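The RMSD figures reported for these tools measure the average deviation between predicted and experimental loop atoms. The sketch below assumes the two structures have already been superimposed, a step omitted here, and uses invented coordinates. The confidence-based filtering mirrors the idea of reporting accuracy on the most confident 75% of predictions; the confidence scores are hypothetical and are not ABlooper's actual measure.

```python
import math

Coord = tuple[float, float, float]


def rmsd(predicted: list[Coord], native: list[Coord]) -> float:
    """Root-mean-square deviation between two pre-aligned coordinate lists (Å)."""
    if len(predicted) != len(native) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (px - nx) ** 2 + (py - ny) ** 2 + (pz - nz) ** 2
        for (px, py, pz), (nx, ny, nz) in zip(predicted, native)
    )
    return math.sqrt(squared / len(predicted))


def mean_rmsd_top_fraction(results: list[tuple[float, float]], keep: float) -> float:
    """Mean RMSD over the most confident fraction of (rmsd, confidence) pairs."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep))]
    return sum(r[0] for r in kept) / len(kept)


# Toy loop: every atom is displaced by exactly 1 Å along z.
predicted_loop = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
native_loop = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
loop_rmsd = rmsd(predicted_loop, native_loop)

# Hypothetical (RMSD in Å, confidence) pairs for four predicted loops.
benchmark = [(1.2, 0.9), (2.0, 0.8), (2.8, 0.6), (5.0, 0.2)]
mean_all = mean_rmsd_top_fraction(benchmark, keep=1.0)
mean_top75 = mean_rmsd_top_fraction(benchmark, keep=0.75)
```

Dropping the least confident prediction lowers the mean RMSD in this toy set, which is the same effect a well-calibrated confidence estimate is meant to provide.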
AbFlex, an advanced antibody design model, addresses limitations in structural prediction and amino acid recovery. Over 38% of designed antibodies exhibited binding energies that were lower (indicative of stronger binding) compared to the wild-type in computational evaluations. Utilizing a data-efficient equivariant graph neural network and flexible CDR definitions with novel data augmentation, AbFlex enhances CDR predictions and antibody binding efficiency, advancing therapeutic antibody engineering [289].
AbDiffuser, an equivariant physics-informed diffusion model, was developed to improve antibody 3D structure and sequence generation. It features a novel protein structure representation, an innovative architecture for aligned proteins, and strong diffusion priors, enhancing denoising, handling sequence-length changes, and reducing memory complexity. Numerical experiments validated AbDiffuser’s ability to generate antibodies matching reference sequences and structures. Laboratory tests demonstrated expression for all 16 discovered HER2 antibodies, with 57.1% exhibiting tight binding characteristics [290].
DeepSCAb, a DL framework, predicts inter-residue geometries and side-chain dihedrals of antibody variable fragments using only sequence data. It excels with unknown backbone conformations, leveraging self-attention to detect conserved positions across species. DeepSCAb demonstrates competitive performance in identifying near-native structures and achieves accuracy comparable to rotamer repacking for side-chain prediction. This advancement enhances antibody structure prediction, aiding in antibody-antigen docking and therapeutic antibody design [292].
Graph-based and sequence-based models have been developed to predict antibody-antigen interactions and affinity without relying on crystal structures, facilitating structural inference [163]. Generative models employing backbone modeling and inverse folding techniques focus on predicting CDR structures, with some approaches exploring full-atom modeling [290, 300, 302, 307]. Additionally, diffusion probabilistic models integrated with equivariant neural networks have been introduced to co-design antibody sequences and structures, addressing challenges such as linking CDR sequences to 3D conformations and modeling their distribution within full antibody sequences [250].
CDR sequence optimization for affinity enhancement
Various DL techniques have been employed to enhance CDR sequence affinity. An LSTM-based approach was used to identify binding sites in DNA-binding hydrolytic antibodies (abzymes) from FASTA sequences [227], while comparative studies revealed that CNNs outperformed LSTMs in binding prediction. However, LSTMs provided valuable insights into subsequences correlated with known binding sites, demonstrating their utility in primary sequence analysis [227]. A transformer model, fine-tuned on 100,000 antibody sequences, was used for clustering clones, while a GAN model generated novel sequences, improving diversity and affinity [193]. A study using DGMs for de novo HER2 antibody design identified binders from a library of ~ 10⁶ HCDR variants, achieving binding rates of 10.6% for HCDR3 and 1.8% for HCDR1 [142]. Surface plasmon resonance (SPR) analysis of 421 binders found 71 with low nanomolar affinities, comparable to trastuzumab, and 11 high-affinity binders with equal or superior functionality, enhanced developability, and potency [142]. This approach has the potential to accelerate therapeutic antibody development for diverse targets.
Optimal CDR (OptCDR) generates CDR libraries with high antigen affinity while maintaining compatibility with humanization protocols and natural frameworks, demonstrating binding performance comparable to experimentally evaluated natural antibodies [80]. OptMAVEn enables de novo design of variable regions, enhancing binding affinity and reducing immunogenicity, with engineered CDRs showing high specificity and sensitivity for target epitopes [140]. The quality of the synthetic antibody libraries is vital for isolating effective recombinant antibodies [308]. Ens-Grad, an ML method, designs CDRs for human IgG antibodies and has shown improved target affinities over traditional phage display in experimental benchmarks [295]. By merging models from various experiments, it predicts effective antibody binding from high-throughput data without detailed structural information, enabling novel therapeutic development [295]. A DGM using long short-term memory (LSTM) was introduced for antibody sequence generation and prioritization, enabling the discovery of high-affinity antibody sequences [226]. These models demonstrate the ability to effectively model complex amino acid interactions critical for precise antigen recognition and binding, offering advantages over some conventional algorithms [149]. The LSTM model efficiently generates and prioritizes sequences, correlates likelihood with binding affinity, and monitors sequence enrichment, reducing repetitive mutation experiments and screening costs [149]. By leveraging next-generation sequencing (NGS) data, it optimizes CDR sequences, explores virtual libraries beyond phage display, and identifies key residues from limited data, enhancing high-affinity antibody discovery with fewer iterations [226].
GANs have been applied to generate functional protein sequences, addressing the randomness and low success rates of traditional GANs [85]. To optimize antibodies, a language-model-guided GAN (AbGAN-LMG) was developed, leveraging language models to improve GAN performance. AbGAN-LMG contributed to antibody optimization by generating diverse candidates and improving efficiency in design processes. In evaluations for COVID-19 and MERS, over 50% of sequences generated for AZD-8895 showed better developability than the original, and molecular docking identified 70 antibodies with higher affinity for the SARS-CoV-2 receptor-binding domain [85]. A convolutional neural network encodes antibody light and heavy chain CDR3s as images to distinguish binders from non-binders [223]. It employs in silico mutagenesis to identify critical CDR3 residues and generative adversarial networks to create synthetic antibodies targeting PD-1 and CTLA-4, as well as variable-length CDR3 sequences [223]. This study demonstrates the potential of DL to uncover patterns in antibody sequences, enhancing engineering, optimization, and discovery. IgDiff addresses antibody design challenges by generating highly designable antibodies with novel binding regions and well-aligned backbone dihedral angles to ensure structural integrity. It has shown strong performance in generating CDRs and pairing light and heavy chains, performing competitively with state-of-the-art models in benchmark evaluations [294]. A recent study introduced the Multi-channel Equivariant Attention Network (MEAN), which frames antibody design as a conditional graph translation problem. Using E(3)-equivariant message passing and a novel attention mechanism, MEAN demonstrated a 23% improvement in antigen-binding CDR design and a 34% boost in affinity optimization in benchmark evaluations [263].
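In silico mutagenesis, as mentioned above for CDR3 analysis, can be sketched as enumerating every single-residue substitution and recording how much a model's score changes at each position. In the example below the scoring function is a deliberately trivial, hypothetical stand-in for a trained binder classifier, and the CDR3 sequence is invented.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def single_point_mutants(seq: str):
    """Yield (label, mutant) for every single-residue substitution of seq."""
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]


def position_sensitivity(seq: str, score) -> list[float]:
    """Mean drop in score per position, averaged over its 19 substitutions."""
    base = score(seq)
    drops: list[list[float]] = [[] for _ in seq]
    for label, mutant in single_point_mutants(seq):
        position = int(label[1:-1]) - 1  # label looks like "A1C"
        drops[position].append(base - score(mutant))
    return [sum(d) / len(d) for d in drops]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained classifier: counts W and Y residues."""
    return float(sum(aa in "WY" for aa in seq))


cdr3 = "ARDYW"  # hypothetical CDR3 sequence
mutants = list(single_point_mutants(cdr3))
sensitivity = position_sensitivity(cdr3, toy_score)
```

Positions with the largest mean drop are the ones the model treats as most critical; with a real classifier, the same loop would highlight candidate binding-determining residues.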
Generating CDR libraries
AI-driven generative models have contributed significantly to creating diverse CDR libraries for antibody discovery. DGMs trained on extensive antibody sequence datasets have successfully designed high-affinity, epitope-specific antibodies, with some models demonstrating improved binding properties beyond those observed in training datasets [231]. A study utilizing a FACS-enriched yeast library from an immunized alpaca (Lama pacos) identified 104 sequences, yielding 103 unique single-domain antibodies (sdAbs) via next-generation sequencing [193]. The GAN model contributed by generating a virtual library to enhance CDR sequence diversity, potentially enabling a broader range of affinities and functions. Additionally, a lattice-based simulation framework was employed to evaluate ML-generated antibody sequences by simulating 3D structures from 1D sequences, further validating the feasibility of high-throughput antibody design [231]. OptCDR directly creates diverse CDR libraries tailored for high-affinity binding, broadening the screening space while maintaining strong binding properties in selected candidates [80]. The Immunoglobulin Language Model (IgLM), a DGM trained on 558 million antibody sequences, was recently developed. Using a text-infilling approach with bidirectional context, IgLM generates variable-length antibody sequences, enabling full-length antibody design across species. It creates CDR loop libraries with improved in silico developability, with optimizations aimed at reducing solubility issues, aggregation, and immunogenicity [215]. ReprogBERT, a novel approach using Model Reprogramming, adapts a pretrained English language model for protein sequence infilling. In benchmark evaluations, it generated highly diverse CDR sequences, showing up to twice the diversity of baseline models while maintaining structural integrity and naturalness [211]. 
GAN-based approaches, such as AbGAN-LMG and CNN-based mutagenesis, can expand candidate CDR sets for multiple targets, contributing to the generation of synthetic libraries that support antibody discovery workflows [85]. Computational protein design seeks to create novel, diverse protein sequences for a given structure, but it remains challenging. A recent study benchmarked three DGMs: the autoregressive model (AR), the graph neural network (GVP), and Fold2Seq. Fold2Seq generated diverse antibody sequences while maintaining structural integrity, demonstrating superior performance over other models in benchmark comparisons [254].
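Claims about library diversity depend on how diversity is measured. One common and simple choice, used here purely as an illustration and not necessarily the metric of the cited benchmarks, is the mean pairwise Hamming distance normalized by sequence length. The sequences below are invented and assumed to be of equal length.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_diversity(sequences: list[str]) -> float:
    """Mean normalized Hamming distance over all sequence pairs (0 to 1)."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) / len(a) for a, b in pairs) / len(pairs)


# Hypothetical CDR libraries from a baseline and a generative model.
baseline_library = ["ARDYW", "ARDYW", "ARDFW", "ARDYF"]
generated_library = ["ARDYW", "GKDFW", "ARSYF", "TRDHW"]

baseline_diversity = mean_pairwise_diversity(baseline_library)
generated_diversity = mean_pairwise_diversity(generated_library)
```

A higher value indicates that library members differ at more positions on average; it says nothing by itself about whether the sequences remain foldable or functional, which is why diversity is reported alongside structural integrity.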
Optimizing CDR immunogenicity
To address immunogenicity concerns in antibody design, AI models have been developed to aid in immunogenicity prediction and optimization, demonstrating promising results in various benchmarks. EquiPocket, an E(3)-equivariant geometric graph neural network, was introduced to predict ligand-binding sites, effectively capturing irregular protein structures and surfaces, which can contribute to immunogenicity assessment [249]. Modeling sequence variation effects on protein function is vital for protein design [199]. Evolution encodes functional information in protein sequences, allowing unsupervised models to predict variant effects [199]. Protein language models utilizing zero-shot inference have shown strong performance in predicting the functional impacts of sequence variations, providing insights into immunogenicity assessment without requiring additional training or experimental data [199]. Further advancements in synthetic antibody research, supported by deep sequencing and advanced computational algorithms, have expanded antibody repertoire analysis, facilitated novel sequence prediction, and enabled de novo antibody generation, contributing to progress in immunology and biological therapeutics [23]. OptMAVEn’s de novo design strategy aims to reduce immunogenicity by emphasizing human-like sequences and minimizing T-cell epitopes, while maintaining high specificity and sensitivity for selected antigens [140]. Many generative models, such as IgLM or AbGAN-LMG, incorporate developability filters or scoring metrics to help address immunogenic risks, aiming to design candidate CDRs with reduced reactivity and closer alignment to human germline frameworks [85, 215]. OptCDR’s ‘standard humanization’ approach similarly focuses on mitigating potential immune responses by reducing non-human elements in designed CDR sequences, improving compatibility with human frameworks.
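Zero-shot variant scoring with a protein language model is often formulated as a log-odds ratio: the model's log-probability of the mutant residue at a position minus that of the wild-type residue. The sketch below assumes this formulation and substitutes a hand-written probability table for the model's output at one position, so the numbers are hypothetical.

```python
import math


def zero_shot_score(position_probs: dict[str, float], wild_type: str, mutant: str) -> float:
    """Log-odds of the mutant versus the wild-type residue at one position.

    Negative values mean the model considers the substitution less likely
    than the wild type; zero means equally likely.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


# Hypothetical model probabilities at a single masked position.
position_probs = {"A": 0.60, "G": 0.25, "S": 0.14, "P": 0.01}

conservative = zero_shot_score(position_probs, wild_type="A", mutant="G")
disruptive = zero_shot_score(position_probs, wild_type="A", mutant="P")
```

No labeled variant data enter the calculation, which is what makes the approach zero-shot; whether such scores track immunogenicity for a given antibody still has to be checked experimentally.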
Other antibody engineering methods
Nach0, a multi-domain, multi-task encoder-decoder model pre-trained on datasets like scientific literature, patents, and molecule strings, was fine-tuned with the NeMo framework. It demonstrated competitive performance against state-of-the-art models in single- and cross-domain tasks, producing high-quality molecular and textual outputs [309]. An RNN trained on ~ 24 million UniRef50 sequences improved protein function prediction, addressing the challenge of generalizing predictions to evolutionarily distant sequences. These advances support protein engineering by aiding the identification and prioritization of functional sequences, contributing to efforts in optimizing protein diversity and function [304]. Diffusion probabilistic models and equivariant neural networks, including AbDiffuser, offer alternatives or complements to purely physics-based or purely data-driven approaches by modeling sequence and structure simultaneously, with the potential for improved efficiency [290]. Probabilistic methods, including directed search algorithms, have been applied to de novo antibody design, helping identify sequences with desired traits. Directed searches identify sequences with specific characteristics, while probabilistic approaches estimate site-specific amino acid probabilities to achieve target structures [310]. This approach supports de novo protein design and combinatorial library engineering by converting probabilities into nucleotide distributions, helping DNA synthesizers generate degenerate sequences with improved fidelity [310]. Designing diverse, stable, and well-expressed antibody libraries is challenging, as large synthetic libraries often contain sequences with reduced functionality [310]. To address this, advanced NLP-based computational methods were developed for alignment-free prediction and functional sequence design [253]. 
DGMs predicted missense and indel effects and were applied to design a 10⁵-nanobody library, which demonstrated improved expression compared to larger synthetic libraries, contributing to protein design and biotherapeutic research [253]. A separate study combined phylogenetic and atomistic calculations to optimize protein stability, expressibility, and activity, offering evolutionary insights into enzymes and binders [311]. Table 3 summarizes AI-based computational studies, their applications, and methods, offering insights into addressed challenges and solutions.
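The probabilistic library design idea described above, in which site-specific amino acid probabilities define a combinatorial library, can be sketched at the protein level as follows. The probability tables are invented, and the conversion to nucleotide distributions for DNA synthesis mentioned in the text is not attempted here.

```python
import random
from math import prod


def library_size(site_probs: list[dict[str, float]]) -> int:
    """Number of distinct sequences the site-specific distributions can produce."""
    return prod(len(probs) for probs in site_probs)


def sample_library(site_probs: list[dict[str, float]], n: int, seed: int = 0) -> list[str]:
    """Draw n sequences, sampling each site independently from its distribution."""
    rng = random.Random(seed)
    library = []
    for _ in range(n):
        residues = [
            rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
            for probs in site_probs
        ]
        library.append("".join(residues))
    return library


# Hypothetical site-specific probabilities for a three-residue stretch.
site_probs = [
    {"A": 0.7, "G": 0.3},
    {"R": 0.5, "K": 0.5},
    {"D": 1.0},
]

theoretical_size = library_size(site_probs)
library = sample_library(site_probs, n=100)
```

Sampling sites independently is the simplifying assumption here; models that capture dependencies between positions, such as the autoregressive approaches cited above, relax it.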
Challenges and Future Prospects of Antibody Design
Antibody design faces several challenges, including predictive limitations, computational complexity, data scarcity, and time-intensive processes [35, 37, 38, 331]. Many key physicochemical properties, such as stability, solubility, immunogenicity, and affinity, are challenging to predict solely from sequences and often require experimental validation for confirmation [35, 331]. These properties directly impact efficacy, manufacturability, and patient safety [10, 11, 331]. AI has contributed to advancements in antibody design by improving predictive accuracy, assisting in CDR sequence generation, enhancing homology modeling, and optimizing antibody-antigen interactions [22, 35, 37, 231, 306]. Despite ongoing advancements, AI faces challenges in generalization, dataset limitations, and computational demands, which may affect the seamless translation of in silico designs into real-world therapeutics. However, ongoing improvements in training methodologies and data availability have the potential to enhance its predictive power and applicability [22]. The choice of a machine learning algorithm depends on dataset availability and application objectives. DL typically benefits from large datasets, but for smaller datasets, traditional ML methods may sometimes perform better unless transfer learning or data augmentation techniques are applied [11]. Overcoming these obstacles requires integrating AI with physics-based modeling, multi-objective optimization, and experimental feedback loops to enhance predictive power, manufacturability, and patient safety [22, 184, 290, 294, 332].
Stability challenges
The stability of an antibody is critical for therapeutic efficacy, safety, and manufacturability, directly influencing shelf life, formulation, and patient compliance [13]. However, predicting stability remains a challenge due to the complex interplay between sequence, structure, and environmental conditions, despite advancements in computational modeling [13]. While the primary sequence provides valuable insights, it cannot fully determine stability, as subtle structural variations and intermolecular interactions significantly impact behavior [331]. Computational methods face limitations in accurately predicting these variations, making experimental validation an essential complement. Key challenges include thermal instability, leading to denaturation and efficacy loss; chemical degradation, such as oxidation and deamidation, which affect safety and require extensive testing; and aggregation, influenced by factors like pH and concentration, which can hinder therapeutic function and trigger immune responses [13, 89, 107, 138, 278, 331, 333]. Conformational stability is important for maintaining antibody functionality and reducing the risk of immunogenicity or aggregation [10, 331]. Environmental factors such as temperature, pH, and ionic strength can trigger denaturation, while protein misfolding and intermolecular interactions further impact stability [331]. Post-translational modifications, such as glycosylation, may enhance or destabilize antibody structure [10].
Long-term stability challenges, including temperature fluctuations and light exposure, often necessitate optimization strategies such as lyophilization to improve efficacy and manufacturability [331]. Thus, while predictive models have limitations, experimental validation remains an important tool for reliable stability assessment [331]. AI-based stability prediction methods struggle to fully capture stability variations across different pH, temperature, and formulation conditions [334]. Trade-offs between stability, affinity, and immunogenicity add complexity to multi-objective optimization, and current AI models only partially capture these intricate relationships [38]. Addressing these challenges requires integrating AI with experimental validation and optimization techniques to enhance predictive accuracy and develop stable, effective therapeutics.
Immunogenicity challenges
Immunogenicity is an important consideration in antibody development, influencing efficacy, safety, and regulatory approval [13, 212]. Highly immunogenic antibodies can trigger immune responses, neutralizing their therapeutic effects and increasing the risk of adverse reactions, including hypersensitivity and autoimmunity [13, 98, 212]. AI models face data limitations, as well-annotated datasets of immunogenic and non-immunogenic antibodies remain limited [98, 212, 335]. Additionally, immune system interactions are highly variable, influenced by host genetics, post-translational modifications, and prior exposures, making sequence-based immunogenicity predictions challenging and context-dependent [11, 13, 212]. AI models have limitations in accurately predicting T-cell and B-cell epitopes, which are key determinants of immune reactions [336]. AI-generated antibodies may exhibit low immunogenicity, but without proper constraints, they could deviate from natural antibody structures, potentially affecting expression or functionality [337]. Addressing these challenges requires integrating AI-driven predictions with experimental validation to improve the reliability of immunogenicity assessments.
Solubility challenges
Solubility is essential for antibody stability, efficacy, and manufacturability. Poor solubility can lead to aggregation, reducing function and increasing immunogenicity risks [89, 96, 98, 99]. AI-based solubility prediction faces challenges due to limited high-quality training data and the complexity of solubility-influencing factors such as amino acid composition, surface charge, and hydrophobicity [22, 97, 99, 338]. AI-generated antibodies may include aggregation-prone regions, which could impact manufacturability and therapeutic efficacy if not properly addressed [339]. The design process, even with advanced computational methods, may not always capture the full complexity of amino acid interactions that contribute to protein stability [96, 97, 340]. These aggregation-prone regions can lead to increased aggregation rates, potentially affecting solubility, functional activity, and immunogenicity risk [103, 341]. Enhancing solubility may require mutations that alter charge distribution and hydrophobicity, which could impact affinity or stability, adding complexity to multi-objective optimization [89, 96, 97, 108, 342]. Addressing these challenges requires a balance between computational predictions and experimental validation to develop stable and therapeutically effective antibodies.
Affinity challenges
Antibody affinity is essential for therapeutic efficacy, dose optimization, and stability. High-affinity antibodies improve pathogen neutralization and therapeutic targeting, allowing for lower doses and reducing side effects [162, 205, 264]. However, optimizing affinity remains challenging due to the complexity of antibody-antigen interactions, which involve structural conformations, thermodynamics, and binding kinetics [10, 83, 161, 162, 205, 264, 331]. AI models face challenges due to the limited availability of high-quality structural data and often rely on small, curated datasets, which may introduce biases [169]. Traditional affinity maturation techniques, such as phage display and hybridoma methods, are costly and time-consuming, while AI-generated predictions still require experimental validation to confirm accuracy and functional performance [98]. The vast combinatorial space of CDR sequences adds complexity to optimization, requiring AI to balance high-affinity binder selection with the maintenance of other essential properties. Addressing these challenges involves integrating AI-driven predictions with experimental feedback to improve affinity maturation strategies efficiently.
Broader challenges in AI-based antibody design
AI-based antibody design faces broader challenges beyond optimizing individual properties, which can affect its generalizability in real-world applications. AI models can face challenges with generalization, sometimes generating variations of known antibodies rather than entirely novel candidates, which may affect adaptability to emerging diseases [11, 22, 26, 34]. Overfitting remains a challenge in AI-driven antibody design, potentially limiting the exploration of diverse solutions and requiring advanced regularization techniques to enhance innovation [193].
AI-generated sequences may exhibit limited conformational flexibility in some cases, potentially affecting affinity, accuracy, and functionality, especially when models rely heavily on predefined templates and training data patterns [22]. A key challenge is the limited availability of high-quality data, which can impact AI’s ability to generalize across diverse therapeutic targets. AI models require diverse, unbiased datasets, but data scarcity—particularly for rare or newly emerging pathogens—can impact model accuracy and generalizability [210, 211, 218, 343, 344]. Integrating in vitro and in vivo experimental data remains challenging due to biological variability, which can affect AI’s ability to generalize across different biological systems. Navigating the vast sequence and chemical space of antibodies adds another layer of complexity [345].
Binding affinity and specificity predictions require advanced learning frameworks to model intricate molecular interactions [22, 88, 342]. Many-to-many binding dynamics introduce nonlinear dependencies that traditional models may struggle to capture, making optimization more difficult [346, 347]. Parameter interdependence further complicates AI-driven design, as improving one property—such as affinity—may negatively impact stability, pharmacokinetics, or immunogenicity. Computational inefficiencies remain a challenge in AI-based antibody optimization. Traditional sampling algorithms and statistical energy functions may struggle to efficiently explore the vast search space, sometimes getting trapped in local optima [11, 26, 34, 158, 288]. Antibody structural complexity further complicates predictive modeling, as capturing 3D conformational dynamics and their effects on binding interactions requires high computational resources. Designing 3D ligands, CDRs, and accurately modeling protein interactions remains a challenge, though advances in AI-driven structural modeling are improving predictive accuracy [344, 348].
Integrating multi-omics data improves predictive accuracy by combining genomic, proteomic, and transcriptomic insights but adds computational complexity due to format and context variations. Antibody optimization requires balancing efficacy, safety, and manufacturability, making multi-objective parameter refinement challenging [349, 350]. Optimizing different antibody regions simultaneously demands robust algorithms to navigate trade-offs between binding strength, stability, and immunogenicity [349,350,351]. Overcoming these challenges requires advancements in AI methodologies, improved data curation, and better integration with experimental approaches. Enhancing AI with physics-based modeling, reinforcement learning, and deep generative models will improve antibody discovery, making AI-driven design more effective for next-generation therapeutics.
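The trade-offs between binding strength, stability, and immunogenicity described above are often handled by keeping the set of candidates that no other candidate beats on every objective, known as the Pareto front. The sketch below illustrates this with invented candidates and property values; the property names and units are assumptions made for the example.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    affinity: float        # higher is better (arbitrary units)
    stability: float       # higher is better (for example, melting temperature in °C)
    immunogenicity: float  # lower is better (predicted risk score)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on all objectives and better on one."""
    no_worse = (
        a.affinity >= b.affinity
        and a.stability >= b.stability
        and a.immunogenicity <= b.immunogenicity
    )
    strictly_better = (
        a.affinity > b.affinity
        or a.stability > b.stability
        or a.immunogenicity < b.immunogenicity
    )
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Candidates not dominated by any other candidate, in input order."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical antibody candidates.
candidates = [
    Candidate("Ab1", 9.0, 60.0, 0.30),
    Candidate("Ab2", 8.0, 70.0, 0.20),
    Candidate("Ab3", 7.5, 65.0, 0.25),  # worse than Ab2 on every objective
    Candidate("Ab4", 9.5, 55.0, 0.40),
]
front = pareto_front(candidates)
```

Here Ab3 is discarded because Ab2 is better on all three objectives, while Ab1, Ab2, and Ab4 each represent a different compromise that would need experimental follow-up to choose between.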
Future prospects
The validation of AI-generated antibodies presents significant challenges due to the complexity of biological systems, necessitating extensive in vitro and in vivo testing [35]. These processes are time-consuming, costly, and resource-intensive, requiring specialized equipment and skilled personnel, which further slows development. Iterative feedback loops extend timelines as AI models refine predictions based on experimental results. The limited predictive power of current models and discrepancies between predicted and observed antibody-antigen interactions necessitate further improvements. Additionally, regulatory hurdles add complexity and delay approval processes [22].
Overfitting remains a challenge, as some AI models generate candidates that resemble known antibodies rather than entirely novel structures [85, 290]. High-throughput experimental data can mitigate overfitting by exposing AI models to greater sequence diversity, promoting the creation of novel yet functional antibodies [142]. In some cases, AI-generated antibodies may exhibit limited structural flexibility that affects binding efficacy, which underscores the need for experimental validation to refine AI predictions. Integrating ligand- and structure-based design strategies may also enhance predictive accuracy by capturing atomic-level interactions and improving the reliability of AI-driven antibody discovery [352].
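A simple way to flag generated candidates that merely resemble known antibodies is to measure their highest sequence identity to a reference set. The sketch below is an illustration under stated assumptions rather than a method from the cited works. The sequences in `KNOWN` are hypothetical CDR-like strings, the 0.8 `threshold` is an arbitrary choice, and identity is computed position by position only between sequences of equal length (no alignment).

```python
from typing import List

def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two non-empty, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def max_identity_to_known(candidate: str, known: List[str]) -> float:
    """Highest identity to any same-length reference sequence (0.0 if none)."""
    same_length = [k for k in known if len(k) == len(candidate)]
    return max((identity(candidate, k) for k in same_length), default=0.0)

def is_novel(candidate: str, known: List[str], threshold: float = 0.8) -> bool:
    """Treat a candidate as novel if it stays below the identity threshold."""
    return max_identity_to_known(candidate, known) < threshold

# Hypothetical reference sequences for illustration only.
KNOWN = ["ARDYYGSSYFDY", "ARGGYSSGWYFDV"]

if __name__ == "__main__":
    for seq in ("ARDYYGSSYFDV", "ARWLPQTNHFDY"):
        print(seq, round(max_identity_to_known(seq, KNOWN), 3), is_novel(seq, KNOWN))
```

A real pipeline would more likely use an alignment-based identity over full variable regions; the point here is only that novelty can be quantified before candidates are sent for experimental validation.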
While sequence-based DL methods have successfully generated antibody candidates, some models face challenges in achieving precise antigen specificity due to their limited ability to model atomic-level interactions [19, 163, 257, 260]. This limitation hinders precise antibody design, as these models primarily rely on sequence patterns rather than fully capturing structural and functional constraints. While tools like AlphaFold2 effectively predict per-amino-acid orientations, they face challenges in generating diverse and functionally relevant protein structures [28]. Deep generative diffusion models show promise in addressing these challenges by facilitating the design of CDR structures, novel sequences, and molecular 3D conformations [178, 288, 294, 328, 353, 354]. These models contribute to bridging the gap by translating sequence and multiple sequence alignments (MSAs) into atomic-level 3D structures, potentially enhancing precision for specific antibody targets [293, 327, 330]. AI-driven platforms that combine deep sequencing-based computational methods with advanced data processing further expand antibody exploration. While these advancements show promise, experimental validation remains essential to ensure real-world applicability and functional reliability.
To further improve AI-driven antibody design, several key strategies are needed. Physics-based modeling, including molecular dynamics simulations and free energy calculations, may provide additional mechanistic insights into antibody-antigen interactions, complementing AI-driven sequence and structure predictions. Multi-objective optimization is critical for balancing stability, affinity, immunogenicity, and manufacturability, with reinforcement learning and evolutionary algorithms optimizing multiple properties simultaneously [349, 350]. Additionally, improving data integration and augmentation is necessary, as expanding high-quality, experimentally validated datasets and leveraging deep sequencing-driven computational techniques will enhance model robustness [210, 211, 218]. Data augmentation strategies, such as self-supervised learning and generative modeling, can help overcome data scarcity and improve AI generalization. Explainable AI (XAI) is expected to play a key role in increasing trust in AI-generated predictions by enhancing transparency in decision-making for scientists and regulatory authorities. As AI continues to evolve, integrating advanced computational methods with experimental validation will be crucial for addressing current limitations and advancing next-generation antibody therapeutics and precision medicine.
Conclusion
This review highlights the transformative impact of AI on antibody drug design. Deep generative models show significant potential to complement traditional methods by automating CDR sequence generation, refining antibody-antigen interaction modeling, and improving predictive accuracy. These advances streamline sequence optimization and accelerate drug development. Challenges nevertheless persist, including data quality limitations, limited model interpretability, high computational costs, the vast complexity of antibody sequence space, and difficulties with generalization, real-world applicability, and reliance on high-quality experimental data. Addressing these issues requires integrating ligand- and structure-based design strategies, incorporating explainable AI (XAI) for transparency, and continuously improving data availability and model generalization.
Future AI-driven protein design is likely to focus on scaling models and datasets to improve generative performance, with the aim of enabling more precise and function-specific antibody designs. The integration of AI with natural language models has the potential to simplify design workflows, while laboratory automation may aid in validation and real-world applications. AI-driven approaches, particularly in oncology, offer promising potential for automated de novo antibody design and optimization. Startups such as Atomwise and BenevolentAI are contributing to advancements in AI-driven drug discovery by exploring faster and potentially more cost-effective solutions. With continued advancements in deep learning, physics-based modeling, and experimental validation, AI is likely to play an increasingly important role in accelerating therapeutic antibody development.
Data availability
No datasets were generated or analysed during the current study.
Abbreviations
- 3D: 3 Dimensional
- ADCs: Antibody-Drug Conjugates
- ADMET: Absorption, Distribution, Metabolism, Excretion and Toxicity
- AI: Artificial Intelligence
- AR: Autoregressive Model
- AUC: Area Under the Curve
- bsAbs: Bispecific Antibodies
- CAR: Chimeric Antigen Receptor
- CDR: Complementarity-Determining Region
- CM: Co-stimulatory molecule
- CNNs: Convolutional Neural Networks
- CTLA-4: Cytotoxic T-Lymphocyte Associated Protein 4
- DGMs: Deep Generative Models
- DL: Deep Learning
- DNN: Deep Neural Networks
- EGFR: Epidermal Growth Factor Receptor
- Fab: Fragment antigen-binding
- Fc: Fragment crystallizable
- FDA: Food and Drug Administration
- GAN: Generative Adversarial Networks
- GENTRL: Generative Tensorial Reinforcement Learning
- GPU: Graphics Processing Unit
- HC: Heavy Chain
- LC: Light Chain
- LLM: Large Language Models
- LSTM: Long short-term memory
- mAbs: Monoclonal Antibodies
- MET: Mesenchymal-Epithelial Transition Factor
- ML: Machine Learning
- NFAT: Nuclear Factor of the Activated T cell
- NGS: Next Generation Sequencing
- NK: Natural Killer
- NSCLC: Non-Small Cell Lung Cancer
- OAS: Observed Antibody Space
- PD-1: Programmed Cell Death Protein 1
- PDB: Protein Data Bank
- PGFS: Policy Gradient for Forward Synthesis
- PSSM: Position-Specific Scoring Matrix
- RL: Reinforcement Learning
- RNNs: Recurrent Neural Network
- ROC: Receiver-Operating Characteristic
- SASA: Solvent Accessible Surface Area
- scFv: Single-Chain Variable Fragment
- SMILES: Simplified Molecular Input Line Entry System
- SQL: Structured Q-learning
- TL: Transfer Learning
- VAE: Variational Autoencoders
- VAMP: Variable Allocation Markov Decision Process
- VEGF: Vascular Endothelial Growth Factor
- VH: Heavy Chain Variable
- VL: Light Chain Variable
- XAI: Explainable AI
References
Köhler G, Milstein C. Continuous cultures of fused cells secreting antibody of predefined specificity. Nature. 1975;256:495–7.
Carter PJ. Potent antibody therapeutics by design. Nat Rev Immunol. 2006;6:343–57.
Larson SM, Mariani G, Strauss HW. Tumor biology as a basis for molecular targeting in cancer. Clin Transl Imaging. 2013;1:397–406.
Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144:646–74.
Vinay DS, Ryan EP, Pawelec G, Talib WH, Stagg J, Elkord E, et al. Immune evasion in cancer: Mechanistic basis and therapeutic strategies. Semin Cancer Biol. 2015;35:S185–98.
Kallingal A, Olszewski M, Maciejewska N, Brankiewicz W, Baginski M. Cancer immune escape: the role of antigen presentation machinery. J Cancer Res Clin Oncol. 2023;149:8131–41.
Singh T, Bhattacharya M, Mavi AK, Gulati A, Rakesh, Sharma NK, et al. Immunogenicity of cancer cells: An overview. Cell Signal. 2024;113:110952.
Kerkar SP, Restifo NP. Cellular constituents of immune escape within the tumor microenvironment. Cancer Res. 2012;72:3125–30.
Zhang W, Wang H, Feng N, Li Y, Gu J, Wang Z. Developability assessment at early-stage discovery to enable development of antibody-derived therapeutics. 2023;6:13–29.
Bauer J, Rajagopal N, Gupta P, Gupta P, Nixon AE, Kumar S. How can we discover developable antibody-based biotherapeutics? Front Mol Biosci. 2023;10:1–21.
Khetan R, Curtis R, Deane CM, Hadsund JT, Kar U, Krawczyk K, et al. Current advances in biopharmaceutical informatics: guidelines, impact and challenges in the computational developability assessment of antibody therapeutics. MAbs. 2022;14:2020082.
Pérez AW, Sormanni P, Andersen JS, Sakhnini LI, Rodriguez-leon I, Bjelke JR, et al. In vitro and in silico assessment of the developability of a designed monoclonal antibody library. MAbs. 2019;11:388–400.
Kuroda D, Tsumoto K. Engineering Stability, Viscosity, and Immunogenicity of Antibodies by Computational Design. J Pharm Sci. 2020;109:1631–51.
Hait WN. Anticancer drug development: The grand challenges. Nat Rev Drug Discov. 2010;9:253–4.
Gajewski TF. Failure at the effector phase: Immune barriers at the level of the melanoma tumor microenvironment. Clin Cancer Res. 2007;13:5256–61.
Gatti-Mays ME, Balko JM, Gameiro SR, Bear HD, Prabhakaran S, Fukui J, et al. If we build it they will come: targeting the immune response to breast cancer. npj Breast Cancer. 2019;5:37.
Escors D. Tumour immunogenicity, antigen presentation and immunological barriers in cancer immunotherapy. New J Sci. 2014;2014:1–25.
Martinez VG, O’Neill S, Salimu J, Breslin S, Clayton A, Crown J, et al. Resistance to HER2-targeted anti-cancer drugs is associated with immune evasion in cancer cells and their derived extracellular vesicles. Oncoimmunology. 2017;6:e1362530.
Hummer AM, Abanades B, Deane CM. Advances in computational structure-based antibody design. Curr Opin Struct Biol. 2022;74:102379.
Sormanni P, Aprile FA, Vendruscolo M. Third generation antibody discovery methods: In silico rational design. Chem Soc Rev. 2018;47:9137–57.
Sevy AM, Meiler J. Antibodies: Computer-Aided Prediction of Structure and Design of Function. Microbiol Spectr. 2014;2:173–90.
Meng F, Zhou N, Hu G, Liu R, Zhang Y, Jing M, et al. A comprehensive overview of recent advances in generative models for antibodies. Comput Struct Biotechnol J. 2024;23:2648–60.
Gallo E. The rise of big data: deep sequencing-driven computational methods are transforming the landscape of synthetic antibody design. J Biomed Sci. 2024;31:1–19.
Ruffolo JA, Gray JJ. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Biophys J. 2022;121:155a–6a.
Zheng M, Zhao J, Cui C, Fu Z, Li X, Liu X, et al. Computational chemical biology and drug design: Facilitating protein structure, function, and modulation studies. Med Res Rev. 2018;38:914–50.
Norman RA, Ambrosetti F, Bonvin AMJJ, Colwell LJ, Kelm S, Kumar S, et al. Computational approaches to therapeutic antibody design: Established methods and emerging trends. Brief Bioinform. 2020;21:1549–67.
Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Di Costanzo L, et al. RCSB Protein Data Bank: Biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res. 2019;47:D464–74.
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–9.
Lawrenz M, Shukla D, Pande VS. Cloud computing approaches for prediction of ligand binding poses and pathways. Sci Rep. 2015;5:1–5.
Joshi H, Lewis K, Singharoy A, Ortoleva PJ. Epitope engineering and molecular metrics of immunogenicity: A computational approach to VLP-based vaccine design. Vaccine. 2013;31:4841–7.
Robert PA, Akbar R, Frank R, Pavlović M, Widrich M, Snapkov I, et al. A billion synthetic 3D-antibody-antigen complexes enable unconstrained machine-learning formalized investigation of antibody specificity prediction. bioRxiv. 2021; https://doi.org/10.1101/2021.07.06.451258.
Schneider G. Automating drug discovery. Nat Rev Drug Discov. 2018;17:97–113.
Ren F, Aliper A, Chen J, Zhao H, Rao S, Kuppe C, et al. A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models. Nat Biotechnol. 2025;43:63–75.
Wilman W, Wróbel S, Bielska W, Deszynski P, Dudzic P, Jaszczyszyn I, et al. Machine-designed biotherapeutics: Opportunities, feasibility and advantages of deep learning in computational antibody discovery. Brief Bioinform. 2022;23:1–20.
Kim J, McFee M, Fang Q, Abdin O, Kim PM. Computational and artificial intelligence-based methods for antibody development. Trends Pharmacol Sci. 2023;44:175–89.
Gangwal A, Lavecchia A. Unleashing the power of generative AI in drug discovery. Drug Discov Today. 2024;29:103992.
Cheng J, Liang T, Xie XQ, Feng Z, Meng L. A new era of antibody discovery: an in-depth review of AI-driven approaches. Drug Discov Today. 2024;29:103984.
Bai G, Sun C, Guo Z, Wang Y, Zeng X, Su Y, et al. Accelerating antibody discovery and design with artificial intelligence: Recent advances and prospects. Semin Cancer Biol. 2023;95:13–24.
Weiner LM, Murray JC, Shuptrine CW. Antibody-based immunotherapy of cancer. Cell. 2012;148:1081–4.
Jin S, Sun Y, Liang X, Gu X, Ning J, Xu Y, et al. Emerging new therapeutic antibody derivatives for cancer treatment. Signal Transduct Target Ther. 2022;7:39.
Ayyar BV, Arora S, O’Kennedy R. Coming-of-Age of Antibodies in Cancer Therapeutics. Trends Pharmacol Sci. 2016;37:1009–28.
Presta LG. Antibody engineering. Curr Opin Biotechnol. 1992;3:394–8.
Kontermann RE, Brinkmann U. Bispecific antibodies. Drug Discov Today. 2015;20:838–47.
Chothia C, Lesk AM. Canonical structures for the hypervariable regions of immunoglobulins. J Mol Biol. 1987;196:901–17.
Chothia C, Lesk AM, Tramontano A, Levitt M, Smith-Gill SJ, Air G, et al. Conformations of immunoglobulin hypervariable regions. Nature. 1989;342:877–83.
Kaplon H, Crescioli S, Chenoweth A, Visweswaraiah J, Reichert JM. Antibodies to watch in 2023. MAbs. 2023;15:1–42.
Lyu X, Zhao Q, Hui J, Wang T, Lin M, Wang K, et al. The global landscape of approved antibody therapies. Antib Ther. 2022;5:233–57.
Harrison AM, Thalji NM, Greenberg AJ, Tapia CJ, Windebank AJ. Rituximab for Non-Hodgkin’s Lymphoma: A story of rapid success in translation. Clin Transl Sci. 2014;7:82–6.
L. Y, K. H, R.A. B. Antibody-based therapy for solid tumors. Cancer J. 2008;14:178–83.
Mansh M. Ipilimumab and cancer immunotherapy: A new hope for advanced stage melanoma. Yale J Biol Med. 2011;84:381–9.
Fu Z, Li S, Han S, Shi C, Zhang Y. Antibody drug conjugate: the “biological missile” for targeted cancer therapy. Signal Transduct Target Ther. 2022;7:93.
Ducry L, Stump B. Antibody-drug conjugates: Linking cytotoxic payloads to monoclonal antibodies. Bioconjug Chem. 2010;21:5–13.
Upeslacis J. Antibody-Drug Conjugates for Cancer Therapy. Am J Pharm Educ. 1992;56:464–7.
Diamantis N, Banerji U. Antibody-drug conjugates - An emerging class of cancer treatment. Br J Cancer. 2016;114:362–7.
Lambert JM, Chari RVJ. Ado-trastuzumab Emtansine (T-DM1): an antibody-drug conjugate (ADC) for HER2-positive breast cancer. J Med Chem. 2014;57:6949–64.
Huang S, van Duijnhoven SMJ, Sijts AJAM, van Elsas A. Bispecific antibodies targeting dual tumor-associated antigens in cancer therapy. J Cancer Res Clin Oncol. 2020;146:3111–22.
Chon K, Larkins E, Chatterjee S, Mishra-Kalyani PS, Aungst S, Wearne E, et al. FDA Approval Summary: Amivantamab for the Treatment of Patients with Non-Small Cell Lung Cancer with EGFR Exon 20 Insertion Mutations. Clin Cancer Res. 2023;29:3262–6.
Weidle UH, Kontermann RE, Brinkmann U. Tumor-antigen-binding bispecific antibodies for cancer treatment. Semin Oncol. 2014;41:653–60.
Sun D, Shi X, Li S, Wang X, Yang X, Wan M. CAR-T cell therapy: A breakthrough in traditional cancer treatment strategies (Review). Mol Med Rep. 2024;29:1–9.
Sterner RC, Sterner RM. CAR-T cell therapy: current limitations and potential strategies. Blood Cancer J. 2021;11:69.
Asmamaw Dejenie T, Tiruneh G/Medhin M, Dessie Terefe G, Tadele Admasu F, Wale Tesega W, Chekol Abebe E. Current updates on generations, approvals, and clinical trials of CAR T-cell therapy. Hum Vaccin Immunother. 2022;18:2114254.
Simmons GL, Satta T, Castaneda PO. Clinical experience of CAR T cells for multiple myeloma. Best Pract Res Clin Haematol. 2021;34:101306.
Munshi NC, Anderson LD, Shah N, Madduri D, Berdeja J, Lonial S, et al. Idecabtagene Vicleucel in Relapsed and Refractory Multiple Myeloma. N Engl J Med. 2021;384:705–16.
Riedell PA, Grady C, Nastoupil LJ, Luna A, Ahmed N, Maziarz RT, et al. Lisocabtagene Maraleucel in Relapsed/Refractory Large B-Cell Lymphoma: Real World Analysis from the Cell Therapy Consortium. Blood. 2023;142:617–617.
Madduri D, Berdeja JG, Usmani SZ, Jakubowiak A, Agha M, Cohen AD, et al. CARTITUDE-1: Phase 1b/2 Study of Ciltacabtagene Autoleucel, a B-Cell Maturation Antigen-Directed Chimeric Antigen Receptor T Cell Therapy, in Relapsed/Refractory Multiple Myeloma. Blood. 2020;136:22–5.
Mavi AK, Gaur S, Gaur G, Babita, Kumar N, Kumar U. CAR T-cell therapy: Reprogramming patient’s immune cell to treat cancer. Cell Signal. 2023;105:110638.
Subklewe M, Von Bergwelt-Baildon M, Humpe A. Chimeric Antigen Receptor T Cells: A Race to Revolutionize Cancer Therapy. Transfus Med Hemotherapy. 2019;46:15–24.
Laskowski TJ, Biederstädt A, Rezvani K. Natural killer cells in antitumour adoptive cell immunotherapy. Nat Rev Cancer. 2022;22:557–75.
Liu E, Marin D, Banerjee P, Macapinlac HA, Thompson P, Basar R, et al. Use of CAR-Transduced Natural Killer Cells in CD19-Positive Lymphoid Tumors. N Engl J Med. 2020;382:545–53.
Rezvani K, Rouce R, Liu E, Shpall E. Engineering Natural Killer Cells for Cancer Immunotherapy. Mol Ther. 2017;25:1769–81.
Berrien-Elliott MM, Wagner JA, Fehniger TA. Human Cytokine-Induced Memory-Like Natural Killer Cells. J Innate Immun. 2015;7:563–71.
Alturki NA. Review of the Immune Checkpoint Inhibitors in the Context of Cancer Treatment. J Clin Med. 2023;12:4301.
Bagchi S, Yuan R, Engleman EG. Immune Checkpoint Inhibitors for the Treatment of Cancer: Clinical Impact and Mechanisms of Response and Resistance. Annu Rev Pathol Mech Dis. 2021;16:223–49.
Shiravand Y, Khodadadi F, Kashani SMA, Hosseini-Fard SR, Hosseini S, Sadeghirad H, et al. Immune Checkpoint Inhibitors in Cancer Therapy. Curr Oncol. 2022;29:3044–60.
Marin-Acevedo JA, Kimbrough EMO, Lou Y. Next generation of immune checkpoint inhibitors and beyond. J Hematol Oncol. 2021;14:1–29.
Kourou K, Exarchos KP, Papaloukas C, Sakaloglou P, Exarchos T, Fotiadis DI. Applied machine learning in cancer research: A systematic review for patient diagnosis, classification and prognosis. Comput Struct Biotechnol J. 2021;19:5546–55.
Agrawal S, Agrawal J. Neural network techniques for cancer prediction: A survey. Procedia Comput Sci. 2015;60:769–74.
Zhang B, Shi H, Wang H. Machine Learning and AI in Cancer Prognosis, Prediction, and Treatment Selection: A Critical Approach. J Multidiscip Healthc. 2023;16:1779–91.
Bhinder B, Gilvary C, Madhukar NS, Elemento O. Artificial Intelligence in Cancer Research and Precision Medicine. Cancer Discov. 2021;11:900–15.
Pantazes RJ, Maranas CD. OptCDR: A general computational method for the design of antibody complementarity determining regions for targeted epitope binding. Protein Eng Des Sel. 2010;23:849–58.
Lapidoth GD, Baran D, Pszolla GM, Norn C, Alon A, Tyka MD, et al. AbDesign: An algorithm for combinatorial backbone design guided by natural conformations and sequences. Proteins Struct Funct Bioinforma. 2015;83:1385–406.
Adolf-Bryfogle J, Kalyuzhniy O, Kubitz M, Weitzner BD, Hu X, Adachi Y, et al. RosettaAntibodyDesign (RAbD): A general framework for computational antibody design. PLoS Comput Biol. 2018;14:e1006112.
Warszawski S, Katz AB, Lipsh R, Khmelnitsky L, Nissan G Ben, Javitt G, et al. Optimizing antibody affinity and stability by the automated design of the variable light-heavy chain interfaces. PLoS Comput Biol. 2019;15:1–24.
Shirai H, Prades C, Vita R, Marcatili P, Popovic B, Xu J, et al. Antibody informatics for drug discovery. Biochim Biophys Acta - Proteins Proteomics. 2014;1844:2002–15.
Zhao W, Luo X, Tong F, Zheng X, Li J, Zhao G, et al. Improving antibody optimization ability of generative adversarial network through large language model. Comput Struct Biotechnol J. 2023;21:5839–50.
Franceschelli G, Musolesi M. Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges. arXiv. 2023;1–26; https://doi.org/10.1613/jair.1.15278.
Vogt Y, Naouar M, Kalweit M, Miething CC, Duyster J, Mertelsmann R, et al. Stable Online and Offline Reinforcement Learning for Antibody CDRH3 Design. arXiv. 2023; https://doi.org/10.48550/arXiv.2401.05341.
Ruffolo JA, Sulam J, Gray JJ. Antibody structure prediction using interpretable deep learning. Patterns. 2022;3:100406.
Rosace A, Bennett A, Oeller M, Mortensen MM, Sakhnini L, Lorenzen N, et al. Automated optimisation of solubility and conformational stability of antibodies and proteins. Nat Commun. 2023;14:1–15.
Waight AB, Prihoda D, Shrestha R, Metcalf K, Bailly M, Ancona M, et al. A machine learning strategy for the identification of key in silico descriptors and prediction models for IgG monoclonal antibody developability properties. MAbs. 2023;15:2248671.
Chen X, Dougherty T, Hong C, Schibler R, Zhao YC, Sadeghi R, et al. Predicting Antibody Developability from Sequence using Machine Learning. bioRxiv. 2020; https://doi.org/10.1101/2020.06.18.159798.
Grinshpun B, Thorsteinson N, Pereira JNS, Rippmann F, Nannemann D, Sood VD, et al. Identifying biophysical assays and in silico properties that enrich for slow clearance in clinical-stage therapeutic antibodies. MAbs. 2021;13:1932230.
Thorsteinson N, Gunn JR, Kelly K, Long W, Labute P. Structure-based charge calculations for predicting isoelectric point, viscosity, clearance, and profiling antibody therapeutics. MAbs. 2021;13:1981805.
Marks C, Hummer AM, Chin M, Deane CM. Humanization of antibodies using a machine learning approach on large-scale repertoire data. Bioinformatics. 2021;37:4041–7.
Prihoda D, Maamary J, Waight A, Juan V, Fayadat-Dilman L, Svozil D, et al. BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning. MAbs. 2022;14:2020203.
Sormanni P, Aprile FA, Vendruscolo M. The CamSol method of rational design of protein mutants with enhanced solubility. J Mol Biol. 2015;427:478–90.
Arslan M, Karadağ D, Kalyoncu S. Protein engineering approaches for antibody fragments: Directed evolution and rational design approaches. Turkish J Biol. 2019;43:1–12.
Mason DM, Friedensohn S, Weber CR, Jordi C, Wagner B, Meng SM, et al. Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning. Nat Biomed Eng. 2021;5:600–12.
Hou Q, Kwasigroch JM, Rooman M, Pucci F. SOLart: A structure-based method to predict protein solubility and aggregation. Bioinformatics. 2020;36:1445–52.
Magnan CN, Randall A, Baldi P. SOLpro: Accurate sequence-based prediction of protein solubility. Bioinformatics. 2009;25:2200–7.
Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L. The FoldX web server: An online force field. Nucleic Acids Res. 2005;33:382–8.
Rawi R, Mall R, Kunji K, Shen CH, Kwong PD, Chuang GY. PaRSnIP: Sequence-based protein solubility prediction using gradient boosting machine. Bioinformatics. 2018;34:1092–8.
Feng J, Jiang M, Shih J, Chai Q. Antibody apparent solubility prediction from sequence by transfer learning. iScience. 2022;25:105173.
Geng SB, Wittekind M, Vigil A, Tessier PM. Measurements of Monoclonal Antibody Self-Association Are Correlated with Complex Biophysical Properties. Mol Pharm. 2016;13:1636–45.
Shan L, Mody N, Sormani P, Rosenthal KL, Damschroder MM, Esfandiary R. Developability Assessment of Engineered Monoclonal Antibody Variants with a Complex Self-Association Behavior Using Complementary Analytical and in Silico Tools. Mol Pharm. 2018;15:5697–710.
Tomar DS, Kumar S, Singh SK, Goswami S, Li L. Molecular basis of high viscosity in concentrated antibody solutions: Strategies for high concentration drug product development. MAbs. 2016;8:216–28.
Goulet DR, Atkins WM. Considerations for the Design of Antibody-Based Therapeutics. J Pharm Sci. 2020;109:74–103.
Kuriata A, Iglesias V, Pujols J, Kurcinski M, Kmiecik S, Ventura S. Aggrescan3D (A3D) 2.0: Prediction and engineering of protein solubility. Nucleic Acids Res. 2019;47:W300–7.
Lai PK, Fernando A, Cloutier TK, Gokarn Y, Zhang J, Schwenger W, et al. Machine Learning Applied to Determine the Molecular Descriptors Responsible for the Viscosity Behavior of Concentrated Therapeutic Antibodies. Mol Pharm. 2021;18:1167–75.
Lai PK, Gallegos A, Mody N, Sathish HA, Trout BL. Machine learning prediction of antibody aggregation and viscosity for high concentration formulation development of protein therapeutics. MAbs. 2022;14:1–12.
Sela-Culang I, Ofran Y, Peters B. Antibody specific epitope prediction - Emergence of a new paradigm. Curr Opin Virol. 2015;11:98–102.
Khuat TT, Bassett R, Otte E, Grevis-James A, Gabrys B. Applications of machine learning in antibody discovery, process development, manufacturing and formulation: Current trends, challenges, and opportunities. Comput Chem Eng. 2024;182:108585.
Bukhari SNH, Jain A, Haq E, Mehbodniya A, Webber J. Machine Learning Techniques for the Prediction of B-Cell and T-Cell Epitopes as Potential Vaccine Targets with a Specific Focus on SARS-CoV-2 Pathogen: A Review. Pathogens. 2022;11:1–18.
Sun P, Guo S, Sun J, Tan L, Lu C, Ma Z. Advances in In-silico B-cell Epitope Prediction. Curr Top Med Chem. 2018;19:105–15.
Potocnakova L, Bhide M, Pulzova LB. An Introduction to B-Cell Epitope Mapping and In Silico Epitope Prediction. J Immunol Res. 2016;2016:6760830.
Chen CW, Chang CY. Peptide scanning-assisted identification of a monoclonal antibody-recognized linear B-cell epitope. J Vis Exp. 2017;2017:1–8.
Keen MM, Keith AD, Ortlund EA. Epitope mapping via in vitro deep mutational scanning methods and its applications. J Biol Chem. 2025;301:108072.
Larsen JEP, Lund O, Nielsen M. Improved method for predicting linear B-cell epitopes. Immunome Res. 2006;2:2.
Saha S, Raghava GPS. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins Struct Funct Bioinforma. 2006;65:40–8.
Krawczyk K, Liu X, Baker T, Shi J, Deane CM. Improving B-cell epitope prediction and its application to global antibody-antigen docking. Bioinformatics. 2014;30:2288–94.
Høie MH, Gade FS, Johansen JM, Würtzen C, Winther O, Nielsen M, et al. DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. Front Immunol. 2024;15:1–12.
Ponomarenko J, Bui HH, Li W, Fusseder N, Bourne PE, Sette A, et al. ElliPro: A new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics. 2008;9:1–8.
Bahai A, Asgari E, Mofrad MRK, Kloetgen A, McHardy AC. EpitopeVec: Linear epitope prediction using deep protein sequence embeddings. Bioinformatics. 2021;37:4517–25.
Dhanda SK, Mahajan S, Paul S, Yan Z, Kim H, Jespersen MC, et al. IEDB-AR: immune epitope database - analysis resource in 2019. Nucleic Acids Res. 2019;47:W502–6.
Karplus PA, Schulz GE. Prediction of chain flexibility in proteins - A tool for the selection of peptide antigens. Naturwissenschaften. 1985;72:212–3.
Parker JMR, Guo D, Hodges RS. New Hydrophilicity Scale Derived from High-Performance Liquid Chromatography Peptide Retention Data: Correlation of Predicted Surface Residues with Antigenicity and X-ray-Derived Accessible Sites. Biochemistry. 1986;25:5425–32.
Yang ZR, Johnson FC. Prediction of T-cell epitopes using biosupport vector machines. J Chem Inf Model. 2005;45:1424–8.
El-Manzalawy Y, Dobbs D, Honavar V. Predicting linear B-cell epitopes using string kernels. J Mol Recognit. 2008;21:243–55.
Liu R, Hu J. Prediction of discontinuous B-cell epitopes using logistic regression and structural information. J Proteomics Bioinforma. 2011;4:010–5.
Lian Y, Ge M, Pan XM. EPMLR: Sequence-based linear B-cell epitope prediction method using multiple linear regression. BMC Bioinformatics. 2014;15:1–6.
Hu Y-J, You S-N. A meta decision tree approach for B-cell epitope mining. IEEE Conf Comput Intell Bioinforma Comput Biol. IEEE. 2016;2016:1–5.
Kozlova EEG, Cerf L, Schneider FS, Viart BT, NGuyen C, Steiner BT, et al. Computational B-cell epitope identification and production of neutralizing murine antibodies against Atroxlysin-I. Sci Rep. 2018;8:1–13.
Nimrod G, Fischman S, Austin M, Herman A, Keyes F, Leiderman O, et al. Computational Design of Epitope-Specific Functional Antibodies. Cell Rep. 2018;25:2121-2131.e5.
Da Silva BM, Myung Y, Ascher DB, Pires DEV. Epitope3D: A machine learning method for conformational B-cell epitope prediction. Brief Bioinform. 2022;23:1–8.
Jensen KK, Andreatta M, Marcatili P, Buus S, Greenbaum JA, Yan Z, et al. Improved methods for predicting peptide binding affinity to MHC class II molecules. Immunology. 2018;154:394–406.
Andreatta M, Nielsen M. Gapped sequence alignment using artificial neural networks: Application to the MHC class I system. Bioinformatics. 2016;32:511–7.
Jurtz V, Paul S, Andreatta M, Marcatili P, Peters B, Nielsen M. NetMHCpan-4.0: Improved Peptide–MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. J Immunol. 2017;199:3360–8.
Kang TH, Seong BL. Solubility, Stability, and Avidity of Recombinant Antibody Fragments Expressed in Microorganisms. Front Microbiol. 2020;11:1–10.
Kuan D, Farimani AB. AbGPT: De Novo Antibody Design via Generative Language Modeling. arXiv. 2024;1–29; https://doi.org/10.48550/arXiv.2409.06090.
Li T, Pantazes RJ, Maranas CD. OptMAVEn--a new framework for the de novo design of antibody variable region models targeting specific antigen epitopes. PLoS One. 2014;9:e105954.
Chowdhury R, Allan MF, Maranas CD. OptMAVEn-2.0: De novo design of variable antibody regions against targeted antigen epitopes. Antibodies. 2018;7:1–24.
Shanehsazzadeh A, McPartlon M, Kasun G, Steiger AK, Sutton JM, Yassine E, et al. Unlocking de novo antibody design with generative artificial intelligence. bioRxiv. 2024; https://doi.org/10.1101/2023.01.08.523187.
Weitzner BD, Jeliazkov JR, Lyskov S, Marze N, Kuroda D, Frick R, et al. Modeling and docking of antibody structures with Rosetta. Nat Protoc. 2017;12:401–16.
Schoeder CT, Schmitz S, Adolf-Bryfogle J, Sevy AM, Finn JA, Sauer MF, et al. Modeling Immunity with Rosetta: Methods for Antibody and Antigen Design. Biochemistry. 2021;60:825–46.
Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630:493–500.
Jumper J, Hassabis D. Protein structure predictions to atomic accuracy with AlphaFold. Nat Methods. 2022;19:11–2.
Abanades B, Georges G, Bujotzek A, Deane CM. ABlooper: Fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics. 2022;38:1877–80.
Ruffolo JA, Guerra C, Mahajan SP, Sulam J, Gray JJ. Geometric potentials from deep learning improve prediction of CDR H3 loop structures. Bioinformatics. 2020;36:I268–75.
Akbar R, Bashour H, Rawat P, Robert PA, Smorodina E, Cotet T-S, et al. Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies. MAbs. 2022;14:2008790.
Sun XD, Huang RB. Prediction of protein structural classes using support vector machines. Amino Acids. 2006;30:469–75.
Kadam K, Peerzada N, Karbhal R, Sawant S, Valadi J, Kulkarni-Kale U. Antibody Class(es) Predictor for Epitopes (AbCPE): A Multi-Label Classification Algorithm. Front Bioinforma. 2021;1:1–13.
Sivasubramanian A, Sircar A, Chaudhury S, Gray JJ. Toward high-resolution homology modeling of antibody Fv regions and application to antibody-antigen docking. Proteins Struct Funct Bioinforma. 2009;74:497–514.
Guarra F, Colombo G. Computational Methods in Immunology and Vaccinology: Design and Development of Antibodies and Immunogens. J Chem Theory Comput. 2023;19:5315–33.
Long X, Jeliazkov JR, Gray JJ. Non-H3 CDR template selection in antibody modeling through machine learning. PeerJ. 2019;2019:1–25.
Wong WK, Georges G, Ros F, Kelm S, Lewis AP, Taddese B, et al. SCALOP: Sequence-based antibody canonical loop structure annotation. Bioinformatics. 2019;35:1774–6.
Kelow SP, Adolf-Bryfogle J, Dunbrack RL. Hiding in plain sight: structure and sequence analysis reveals the importance of the antibody DE loop for antibody-antigen binding. MAbs. 2020;12:1840005.
Huang Y, Zhang Z, Zhou Y. AbAgIntPre: A deep learning method for predicting antibody-antigen interactions based on sequence information. Front Immunol. 2022;13:1–10.
Zhou X, Xue D, Chen R, Zheng Z, Wang L, Gu Q. Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization. arXiv. 2024. https://doi.org/10.48550/arXiv.2403.16576.
Rini JM, Schulze-Gahmen U, Wilson IA. Structural evidence for induced fit as a mechanism for antibody-antigen recognition. Science. 1992;255:959–65.
Sundberg EJ, Mariuzza RA. Molecular recognition in antibody-antigen complexes. Adv Protein Chem. 2002;61:119–60.
Wark KL, Hudson PJ. Latest technologies for the enhancement of antibody affinity. Adv Drug Deliv Rev. 2006;58:657–70.
Clark LA, Boriack-Sjodin PA, Eldredge J, Fitch C, Friedman B, Hanf KJM, et al. Affinity enhancement of an in vivo matured therapeutic antibody using structure-based computational design. Protein Sci. 2006;15:949–60.
Kang Y, Leng D, Guo J, Pan L. Sequence-based deep learning antibody design for in silico antibody affinity maturation. arXiv. 2021. https://doi.org/10.48550/arXiv.2103.03724.
Shi Y, Zhang X, Wan J, Wang Y, Yin W, Cao Z, et al. Predicting the distance between antibody’s interface residue and antigen to recognize antigen types by support vector machine. Neural Comput Appl. 2007;16:481–90.
Daberdaku S, Ferrari C. Antibody interface prediction with 3D Zernike descriptors and SVM. Bioinformatics. 2019;35:1870–6.
Egaji OA, Ballard-Smith S, Asghar I, Griffiths M. A Machine Learning Approach for Predicting Antibody Properties. ACM Int Conf Proceeding Ser. 2020;20–4.
Deng A, Zhang H, Wang W, Zhang J, Fan D, Chen P, et al. Developing Computational Model to Predict Protein-Protein Interaction Sites Based on the XGBoost Algorithm. Int J Mol Sci. 2020;21:2274.
Ye C, Hu W, Gaeta B. Machine learning prediction of Antibody-Antigen binding: dataset, method and testing. bioRxiv. 2021. https://doi.org/10.1101/2021.03.19.435772.
Clark T, Subramanian V, Jayaraman A, Fitzpatrick E, Gopal R, Pentakota N, et al. Enhancing antibody affinity through experimental sampling of non-deleterious CDR mutations predicted by machine learning. Commun Chem. 2023;6:1–13.
Clark T, Subramanian V, Jayaraman A, Fitzpatrick E, Gopal R, Pentakota N, et al. Machine Learning-Guided Antibody Engineering That Leverages Domain Knowledge To Overcome The Small Data Problem. bioRxiv. 2023. https://doi.org/10.1101/2023.06.02.543458.
Kandari D, Bhatnagar R. Antibody engineering and its therapeutic applications. Int Rev Immunol. 2023;42:156–83.
Kuhlman B, Bradley P. Advances in protein structure prediction and design. Nat Rev Mol Cell Biol. 2019;20:681–97.
Dean J. A Golden Decade of Deep Learning: Computing Systems & Applications. Daedalus. 2022;151:58–74.
Dara S, Dhamercherla S, Jadav SS, Babu CM, Ahsan MJ. Machine Learning in Drug Discovery: A Review. Artif Intell Rev. 2022;55:1947–99.
Taherdoost H, Madanchian M. AI Advancements: Comparison of Innovative Techniques. AI. 2023;5:38–54.
Lavecchia A. Deep learning in drug discovery: opportunities, challenges and future prospects. Drug Discov Today. 2019;24:2017–32.
Turzo SBA, Hantz ER, Lindert S. Applications of machine learning in computer-aided drug discovery. QRB Discov. 2022;3:e14.
Jing Y, Bian Y, Hu Z, Wang L, Xie X-QS. Deep Learning for Drug Design: an Artificial Intelligence Paradigm for Drug Discovery in the Big Data Era. AAPS J. 2018;20:58.
Ekins S. The Next Era: Deep Learning in Pharmaceutical Research. Pharm Res. 2016;33:2594–603.
Zeng X, Wang F, Luo Y, Kang S, Tang J, Lightstone FC, et al. Deep generative molecular design reshapes drug discovery. Cell Reports Med. 2022;3:100794.
Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model. 2024;64:2174–94.
Zhavoronkov A, Ivanenkov YA, Aliper A, Veselov MS, Aladinskiy VA, Aladinskaya AV, et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat Biotechnol. 2019;37:1038–40.
Gottipati SK, Sattarov B, Niu S, Pathak Y, Wei H, Liu S, et al. Learning to navigate the synthetically accessible chemical space using reinforcement learning. 37th Int Conf Mach Learn ICML 2020. 2020;PartF16814:3626–37.
Jin W, Barzilay R, Jaakkola T. Multi-Objective Molecule Generation using Interpretable Substructures. 37th Int Conf Mach Learn ICML 2020. 2020;PartF16814:4799–809.
Insilico Medicine. First Generative AI Drug Begins Phase II Trials with Patients | Insilico Medicine. Insilico. 2023 [cited 2024 Nov 13]. Available from: https://insilico.com/blog/first_phase2
Ivanenkov YA, Polykovskiy D, Bezrukov D, Zagribelnyy B, Aladinskiy V, Kamya P, et al. Chemistry42: An AI-Driven Platform for Molecular Design and Optimization. J Chem Inf Model. 2023;63:695–701.
Li Y, Zhang L, Wang Y, Zou J, Yang R, Luo X, et al. Generative deep learning enables the discovery of a potent and selective RIPK1 inhibitor. Nat Commun. 2022;13:6891.
Zang C, Wang F. MoFlow: An Invertible Flow Model for Generating Molecular Graphs. Proc ACM SIGKDD Int Conf Knowl Discov Data Min. 2020;617–26.
Zhang S, Huo D, Horne RI, Qi Y, Ojeda SP, Yan A, et al. Sequence-based drug design using transformers. bioRxiv. 2023. https://doi.org/10.1101/2023.11.27.568880.
Bran AM, Schwaller P. Transformers and Large Language Models for Chemistry and Drug Discovery. arXiv. 2023;1–22. https://doi.org/10.48550/arXiv.2310.06083.
Omote Y, Matsushita K, Iwakura T, Tamura A, Ninomiya T. Transformer-based Approach for Predicting Chemical Compound Structures. Proc 1st Conf Asia-Pacific Chapter Assoc Comput Linguist 10th Int Jt Conf Nat Lang Process. 2020;154–62.
Hu Y, Tao F, Lan W, Zhang J. Combining transformer and 3DCNN models to achieve co-design of structures and sequences of antibodies in a diffusional manner. bioRxiv. 2024. https://doi.org/10.1101/2024.04.25.587828.
Zhang H, Lyu X, Zhao Q, Liu B. Generation of novel antibody candidates using transformer and GAN-based deep learning artificial intelligence. Antib Ther. 2023;6:1–2.
Xu X, Xu T, Zhou J, Liao X, Zhang R, Wang Y, et al. AB-Gen: Antibody Library Design with Generative Pre-trained Transformer and Deep Reinforcement Learning. Genomics, Proteomics Bioinforma. 2023;21:1043–53.
Liberis E, Velickovic P, Sormanni P, Vendruscolo M, Lio P. Parapred: Antibody paratope prediction using convolutional and recurrent neural networks. Bioinformatics. 2018;34:2944–50.
Ambrosetti F, Olsen TH, Olimpieri PP, Jiménez-García B, Milanetti E, Marcatili P, et al. ProABC-2: PRediction of AntiBody contacts v2 and its application to information-driven docking. Bioinformatics. 2020;36:5107–8.
Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, et al. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans Pattern Anal Mach Intell. 2022;44:7112–27.
Leem J, Mitchell LS, Farmery JHR, Barton J, Galson JD. Deciphering the language of antibodies using self-supervised learning. Patterns. 2022;3:100513.
Meier J, Rao R, Verkuil R, Liu J, Sercu T, Rives A. Language models enable zero-shot prediction of the effects of mutations on protein function. Adv Neural Inf Process Syst. 2021;35:29287–303.
Sirin S, Apgar JR, Bennett EM, Keating AE. AB-Bind: Antibody binding mutational database for computational affinity predictions. Protein Sci. 2016;25:393–409.
Jemimah S, Yugandhar K, Gromiha MM. PROXiMATE: a database of mutant protein-protein complex thermodynamics and kinetics. Bioinformatics. 2017;33:2787–8.
Jankauskaite J, Jiménez-García B, Dapkunas J, Fernández-Recio J, Moal IH. SKEMPI 2.0: An updated benchmark of changes in protein-protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics. 2019;35:462–9.
Myung Y, Rodrigues CHM, Ascher DB, Pires DEV. MCSM-AB2: Guiding rational antibody design using graph-based signatures. Bioinformatics. 2020;36:1453–9.
Lim H, No KT. Prediction of polyreactive and nonspecific single-chain fragment variables through structural biochemical features and protein language-based descriptors. BMC Bioinformatics. 2022;23:1–19.
Li L, Gupta E, Spaeth J, Shing L, Jaimes R, Engelhart E, et al. Machine learning optimization of candidate antibody yields highly diverse sub-nanomolar affinity antibody libraries. Nat Commun. 2023;14:1–12.
Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, et al. ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Learning. IEEE Trans Pattern Anal Mach Intell. 2021;14:1–29.
Dai Z, Yang Z, Yang Y, Carbonell J, Le QV, Salakhutdinov R. Transformer-XL: Attentive language models beyond a fixed-length context. ACL 2019 - 57th Annu Meet Assoc Comput Linguist Proc Conf. 2020;2978–88.
Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL HLT 2019 - 2019 Conf North Am Chapter Assoc Comput Linguist Hum Lang Technol - Proc Conf. 2019;1:4171–86.
Cowen-Rivers AI, Gorinski PJ, Sootla A, Khan A, Furui L, Wang J, et al. Structured Q-learning For Antibody Design. arXiv. 2022. https://doi.org/10.48550/arXiv.2209.04698.
Gao K, Wu L, Zhu J, Peng T, Xia Y, He L, et al. Incorporating Pre-training Paradigm for Antibody Sequence-Structure Co-design. arXiv. 2022;1–18. https://doi.org/10.48550/arXiv.2211.08406.
Melnyk I, Chenthamarakshan V, Chen PY, Das P, Dhurandhar A, Padhi I, et al. Reprogramming Pretrained Language Models for Antibody Sequence Infilling. Proc Mach Learn Res. 2023;202:24398–419.
Wang H, Hao X, He Y, Fan L. AbImmPred: An immunogenicity prediction method for therapeutic antibodies using AntiBERTy-based sequence features. PLoS ONE. 2024;19:1–15.
Liang S, Zhang C. Prediction of immunogenicity for humanized and full human therapeutic antibodies. PLoS ONE. 2020;15:1–14.
Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun. 2022;13:4348.
Shuai RW, Ruffolo JA, Gray JJ. IgLM: Infilling language modeling for antibody sequence design. Cell Syst. 2023;14:979-989.e4.
Venderley J. AntiBARTy Diffusion for Property Guided Antibody Design. arXiv. 2023;1–7. https://doi.org/10.48550/arXiv.2309.13129.
Hie BL, Shanker VR, Xu D, Bruun TUJ, Weidenbacher PA, Tang S, et al. Efficient evolution of human antibodies from general protein language models. Nat Biotechnol. 2024;42:275–83.
Kim DN, McNaughton AD, Kumar N. Leveraging Artificial Intelligence to Expedite Antibody Design and Enhance Antibody-Antigen Interactions. Bioengineering. 2024;11:185.
Graves J, Byerly J, Priego E, Makkapati N, Parish S, Medellin B, et al. A Review of Deep Learning Methods for Antibodies. Antibodies. 2020;9:12.
Corecco S, Adorni G, Gambardella LM. Proximal Policy Optimization-Based Reinforcement Learning and Hybrid Approaches to Explore the Cross Array Task Optimal Solution. Mach Learn Knowl Extr. 2023;5:1660–79.
Chen C, Zhang Q, Ma Q, Yu B. LightGBM-PPI: Predicting protein-protein interactions through LightGBM with multi-information fusion. Chemom Intell Lab Syst. 2019;191:54–64.
Wallach I, Dzamba M, Heifets A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. arXiv. 2015;1–11. https://doi.org/10.48550/arXiv.1510.02855.
Lim YW, Adler AS, Johnson DS. Predicting antibody binders and generating synthetic antibodies using deep learning. MAbs. 2022;14:2069075.
Lai P-K. DeepSCM: An efficient convolutional neural network surrogate model for the screening of therapeutic antibody viscosity. Comput Struct Biotechnol J. 2022;20:2143–52.
Liu Z, Jin J, Cui Y, Xiong Z, Nasiri A, Zhao Y, et al. DeepSeqPanII: An Interpretable Recurrent Neural Network Model With Attention Mechanism for Peptide-HLA Class II Binding Prediction. IEEE/ACM Trans Comput Biol Bioinforma. 2022;19:2188–96.
Saka K, Kakuzaki T, Metsugi S, Kashiwagi D, Yoshida K, Wada M, et al. Antibody design using LSTM based deep generative model from phage display library for affinity maturation. Sci Rep. 2021;11:1–13.
St. Clair R, Teti M, Pavlovic M, Hahn W, Barenholtz E. Predicting residues involved in anti-DNA autoantibodies with limited neural networks. Med Biol Eng Comput. 2022;60:1279–93.
Syrlybaeva R, Strauch E-M. Deep learning of protein sequence design of protein-protein interactions. Bioinformatics. 2023;39:btac733.
Syrlybaeva R, Strauch E-M. One-sided design of protein-protein interaction motifs using deep learning. bioRxiv. 2022. https://doi.org/10.1101/2022.03.30.486144.
Taylor JA, Rutilio H, Smith J, Van Citters D, Siska CC, Smidt P, et al. Designing Feature-Controlled Humanoid Antibody Discovery Libraries Using Generative Adversarial Networks. bioRxiv. 2020. https://doi.org/10.1101/2020.04.12.024844.
Akbar R, Robert PA, Weber CR, Widrich M, Frank R, Pavlović M, et al. In silico proof of principle of machine learning-based antibody design at unconstrained scale. MAbs. 2022;14:2031482.
Friedensohn S, Neumeier D, Khan TA, Csepregi L, Parola C, Gorter de Vries AR, et al. Convergent selection in antibody repertoires is revealed by deep learning. bioRxiv. 2020. https://doi.org/10.1101/2020.02.25.965673.
Davidsen K, Olson BJ, DeWitt WS, Feng J, Harkins E, Bradley P, et al. Deep generative models for T cell receptor protein sequences. Elife. 2019;8:1–18.
Eguchi RR, Choe CA, Huang PS. Ig-VAE: Generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput Biol. 2022;18:1–18.
Ramon A, Saturnino A, Didi K, Greenig M, Sormanni P. AbNatiV: VQ-VAE-based assessment of antibody and nanobody nativeness for engineering, selection, and computational design. bioRxiv. 2023. https://doi.org/10.1101/2023.04.28.538712.
Hawkins-Hooker A, Depardieu F, Baur S, Couairon G, Chen A, Bikard D. Generating functional protein variants with variational autoencoders. PLoS Comput Biol. 2021;17:1–23.
Shashkova TI, Umerenkov D, Salnikov M, Strashnov PV, Konstantinova AV, Lebed I, et al. SEMA: Antigen B-cell conformational epitope prediction using deep transfer learning. Front Immunol. 2022;13:1–11.
Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci. 2021;118:e2016239118.
Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379:1123–30.
Nijkamp E, Ruffolo JA, Weinstein EN, Naik N, Madani A. ProGen2: Exploring the boundaries of protein language models. Cell Syst. 2023;14:968-978.e3.
Ruffolo JA, Gray JJ, Sulam J. Deciphering antibody affinity maturation with language models and weakly supervised learning. arXiv. 2021. https://doi.org/10.48550/arXiv.2112.07782.
Hadsund JT, Satława T, Janusz B, Shan L, Zhou L, Röttger R, et al. nanoBERT: a deep learning model for gene agnostic navigation of the nanobody mutational space. Bioinforma Adv. 2024;4:vbae033.
Kenlay H, Dreyer FA, Kovaltsuk A, Miketa D, Pires D, Deane CM. Large scale paired antibody language models. arXiv. 2024. https://doi.org/10.48550/arXiv.2403.17889.
McPartlon M, Xu J. Deep Learning for Flexible and Site-Specific Protein Docking and Design. bioRxiv. 2023. https://doi.org/10.1101/2023.04.01.535079.
Shuai RW, Ruffolo JA, Gray JJ. Generative Language Modeling for Antibody Design. bioRxiv. 2021. https://doi.org/10.1101/2021.12.13.472419.
Olsen TH, Moal IH, Deane CM. AbLang: an antibody language model for completing antibody sequences. Bioinforma Adv. 2022;2:1–6.
Kucera T, Togninalli M, Meng-Papaxanthos L. Conditional generative modeling for de novo protein design with hierarchical functions. Bioinformatics. 2022;38:3454–61.
Widatalla T, Rollins Z, Chen M-T, Waight A, Cheng AC. AbPROP: Language and graph deep learning for antibody property prediction. ICML Work Comput Biol. 2023.
Zhang Y, Huang W, Wei Z, Yuan Y, Ding Z. EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction. arXiv. 2023. https://doi.org/10.48550/arXiv.2302.12177.
Jin W, Wohlwend J, Barzilay R, Jaakkola T. Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design. ICLR 2022 - 10th Int Conf Learn Represent. 2021. Available from: https://doi.org/10.48550/arXiv.211.
Zhang J, Du Y, Zhou P, Ding J, Xia S, Wang Q, et al. Predicting unseen antibodies’ neutralizability via adaptive graph neural networks. Nat Mach Intell. 2022;4:964–76.
Lu S, Li Y, Wang F, Nan X, Zhang S. Leveraging Sequential and Spatial Neighbors Information by Using CNNs Linked with GCNs for Paratope Prediction. IEEE/ACM Trans Comput Biol Bioinforma. 2022;19:68–74.
Shin JE, Riesselman AJ, Kollasch AW, McMahon C, Simon E, Sander C, et al. Protein design and variant prediction using autoregressive generative models. Nat Commun. 2021;12:1–11.
Melnyk I, Das P, Chenthamarakshan V, Lozano A. Benchmarking deep generative models for diverse antibody sequence design. arXiv. 2021. https://doi.org/10.48550/arXiv.2111.06801.
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language models are few-shot learners. Conf Neural Inf Process Syst (NeurIPS 2020). 2020;1–25.
Anand N, Eguchi R, Mathews II, Perez CP, Derry A, Altman RB, et al. Protein sequence design with a learned potential. Nat Commun. 2022;13:1–11.
Xu B, Wang Y, Chen W, Shan S. AntibodyFlow: Normalizing Flow Model for Designing Antibody Complementarity-Determining Regions. arXiv. 2024;1–17. https://doi.org/10.48550/arXiv.2406.13162.
Richardson P, Griffin I, Tucker C, Smith D, Oechsle O, Phelan A, et al. Baricitinib as potential treatment for 2019-nCoV acute respiratory disease. Lancet. 2020;395:e30–1.
Stebbing J, Phelan A, Griffin I, Tucker C, Oechsle O, Smith D, et al. COVID-19: combining antiviral and anti-inflammatory treatments. Lancet Infect Dis. 2020;20:400–2.
Burki T. A new paradigm for drug development. Lancet Digit Heal. 2020;2:e226–7.
Farghali H, Canová NK, Arora M. The Potential Applications of Artificial Intelligence in Drug Discovery and Development. Physiol Res. 2021;70:715–22.
Bagane M, Jorgewad DR. From AI Labs to Clinics: A Review of 21st-Century Drug Candidates Powered by Artificial Intelligence. Int J Res Appl Sci Eng Technol. 2024;12:1419–28.
Kong X, Huang W, Liu Y. Conditional Antibody Design as 3D Equivariant Graph Translation. arXiv. 2022;1–21. https://doi.org/10.48550/arXiv.2208.06073.
Gopal R, Fitzpatrick E, Pentakota N, Jayaraman A, Tharakaraman K, Capila I. Optimizing Antibody Affinity and Developability Using a Framework–CDR Shuffling Approach—Application to an Anti-SARS-CoV-2 Antibody. Viruses. 2022;14:2694.
Khalilian S, Nasr Isfahani M, Moti Z, Baloochestani A, Chavosh A, Hemmatian Z. A Deep Dimensionality Reduction method based on Variational Autoencoder for Antibody Complementarity Determining Region Sequences Analysis. Epic Ser Comput. 2022. p. 116–105.
Fu T, Sun J. Antibody Complementarity Determining Regions (CDRs) design using Constrained Energy Model. Proc ACM SIGKDD Int Conf Knowl Discov Data Min. 2022;389–99.
Ofran Y, Schlessinger A, Rost B. Automated Identification of Complementarity Determining Regions (CDRs) Reveals Peculiar Characteristics of CDRs and B Cell Epitopes. J Immunol. 2008;181:6230–5.
Di Rienzo L, Milanetti E, Ruocco G, Lepore R. Quantitative Description of Surface Complementarity of Antibody-Antigen Interfaces. Front Mol Biosci. 2021;8:1–10.
Yousefi N, Yazdani-Jahromi M, Tayebi A, Kolanthai E, Neal CJ, Banerjee T, et al. BindingSite-AugmentedDTA: enabling a next-generation pipeline for interpretable prediction models in drug repurposing. Brief Bioinform. 2023;24:1–13.
Sun Y, Jiao Y, Shi C, Zhang Y. Deep learning-based molecular dynamics simulation for structure-based drug design against SARS-CoV-2. Comput Struct Biotechnol J. 2022;20:5014–27.
Amit I, Levin I, Wyant T, Levitin N, Barak R, Ben-Mayor M, et al. 704 The computationally designed human antibody, AU-007, mediates human immune activation by endogenous IL-2, while uniquely breaking the IL-2 auto-inhibitory loop and preventing Treg expansion. Regul Young Investig Award Abstr. BMJ Publishing Group Ltd; 2021. p. A732–4.
Rodriguez A, Meier C, Serazin E, Gooch J, Aggarwal P, Steene A, et al. Unlocking the potential of AI in drug discovery. Wellcome BCG. 2023;74pp. Available from: https://.wellcome.org/reports/unlocking-potential-ai-drug-discovery and https://www.bcg.com/publications/2023/unlocking-the-potential-of-ai-in-drug-discovery
Walkey C, Swanson R, Ulge U, Silva Manzano DA, Drachman J. 576 NL-201, a de novo IL-2 and IL-15 agonist, demonstrates enhanced in vivo antitumor activity in combination with multiple cancer immunotherapies. Regul Young Investig Award Abstr. BMJ Publishing Group Ltd; 2020. p. A346.1-A346.
Absci. Absci Announces Collaboration with AstraZeneca to Advance AI-Driven Oncology Candidate. 2023 [cited 2024 Nov 28]. Available from: https://www.absci.com/absci-announces-collaboration-with-astrazeneca-to-advance-ai-driven-oncology-candidate/
globenewswire. Nxera Pharma and Antiverse Enter Collaboration To Design. 2024 [cited 2024 Nov 28]. Available from: https://www.globenewswire.com/news-release/2024/11/05/2974601/0/en/Nxera-Pharma-and-Antiverse-Enter-Collaboration-To-Design-Novel-GPCR-Targeted-Antibody-Therapeutics-Using-Generative-AI.html
prnewswire. AION Labs Launches AI Startup for De Novo Antibody Design. 2023 [cited 2024 Nov 28]. Available from: https://www.prnewswire.com/news-releases/aion-labs-launches-ai-startup-for-de-novo-antibody-design-301757775.html
Stone CA, Spiller BW, Smith SA. Engineering therapeutic monoclonal antibodies. J Allergy Clin Immunol. 2024;153:539–48.
Honegger A. Engineering antibodies for stability and efficient folding. Handb Exp Pharmacol. 2008;181:47–68.
Wu H, Nie Y, Huse WD, Watkins JD. Humanization of a murine monoclonal antibody by simultaneous optimization of framework and CDR residues. J Mol Biol. 1999;294:151–62.
Kuroda D, Shirai H, Jacobson MP, Nakamura H. Computer-aided antibody design. Protein Eng Des Sel. 2012;25:507–21.
Lippow SM, Wittrup KD, Tidor B. Computational design of antibody-affinity improvement beyond in vivo maturation. Nat Biotechnol. 2007;25:1171–6.
Caravella JA, Wang D, Glaser SM, Lugovskoy A. Structure-Guided Design of Antibodies. Curr Comput Aided Drug Des. 2010;6:128–38.
Chen Y, Wiesmann C, Fuh G, Li B, Christinger HW, McKay P, et al. Selection and analysis of an optimized Anti-VEGF antibody: Crystal structure of an affinity-matured Fab in complex with antigen. J Mol Biol. 1999;293:865–81.
Baran D, Pszolla MG, Lapidoth GD, Norn C, Dym O, Unger T, et al. Principles for computational design of binding antibodies. Proc Natl Acad Sci U S A. 2017;114:10900–5.
Barderas R, Desmet J, Timmerman P, Meloen R, Casal JI. Affinity maturation of antibodies assisted by in silico modeling. Proc Natl Acad Sci U S A. 2008;105:9029–34.
Clark LA, Boriack-Sjodin PA, Day E, Eldredge J, Fitch C, Jarpe M, et al. An antibody loop replacement design feasibility study and a loop-swapped dimer structure. Protein Eng Des Sel. 2009;22:93–101.
Raybould MIJ, Marks C, Krawczyk K, Taddese B, Nowak J, Lewis AP, et al. Five computational developability guidelines for therapeutic antibody profiling. Proc Natl Acad Sci U S A. 2019;116:4025–30.
Luo S, Su Y, Peng X, Wang S, Peng J, Ma J. Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures. Adv Neural Inf Process Syst. 2022;35:1–14.
Jeon W, Kim D. AbFlex: designing antibody complementarity determining regions with flexible CDR definition. Bioinformatics. 2024;40:btae122.
Martinkus K, Ludwiczak J, Cho K, Liang W-C, Lafrance-Vanasse J, Hotzel I, et al. AbDiffuser: Full-Atom Generation of in vitro Functioning Antibodies. arXiv. 2023;1–31. https://doi.org/10.48550/arXiv.2308.05027.
Abanades B, Wong WK, Boyles F, Georges G, Bujotzek A, Deane CM. ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins. Commun Biol. 2023;6:1–8.
Akpinaroglu D, Ruffolo JA, Mahajan SP, Gray JJ. Simultaneous prediction of antibody backbone and side-chain conformations with deep learning. PLoS ONE. 2022;17:1–14.
Trippe BL, Yim J, Tischer D, Baker D, Broderick T, Barzilay R, et al. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem. arXiv. 2022. https://doi.org/10.48550/arXiv.2206.04119.
Cutting D, Dreyer FA, Errington D, Schneider C, Deane CM. De novo antibody design with SE(3) diffusion. arXiv. 2024;1–20. https://doi.org/10.48550/arXiv.2405.07622.
Liu G, Zeng H, Mueller J, Carter B, Wang Z, Schilz J, et al. Antibody complementarity determining region design using high-capacity machine learning. Bioinformatics. 2020;36:2126–33.
Schneider C, Buchanan A, Taddese B, Deane CM. DLAB: deep learning methods for structure-based virtual screening of antibodies. Bioinformatics. 2022;38:377–83.
Pittala S, Bailey-Kellogg C. Learning context-aware structural representations to predict antigen and antibody binding interfaces. Bioinformatics. 2020;36:3996–4003.
Chinery L, Wahome N, Moal I, Deane CM. Paragraph—antibody paratope prediction using graph neural networks with minimal feature vectors. Bioinformatics. 2023;39:2016–8.
Del Vecchio A, Deac A, Liò P, Veličković P. Neural message passing for joint paratope-epitope prediction. arXiv. 2021. https://doi.org/10.48550/arXiv.2106.00757.
Chu AE, Cheng L, Nesr G El, Xu M, Huang P-S. An all-atom protein generative model. bioRxiv. 2023. https://doi.org/10.1101/2023.05.24.542194.
Jin W, Barzilay R, Jaakkola T. Antibody-Antigen Docking and Design via Hierarchical Structure Refinement. Proc Mach Learn Res. 2022;162:10217–27.
Watson JL, Juergens D, Bennett NR, Trippe BL, Yim J, Eisenach HE, et al. Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models. bioRxiv. 2022. https://doi.org/10.1101/2022.12.09.519842.
Cao Y, Das P, Chenthamarakshan V, Chen PY, Melnyk I, Shen Y. Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design. Proc Mach Learn Res. 2021;139:1261–71.
Alley EC, Khimulya G, Biswas S, AlQuraishi M, Church GM. Unified rational protein engineering with sequence-based deep representation learning. Nat Methods. 2019;16:1315–22.
Zeng X, Bai G, Sun C, Ma B. Recent Progress in Antibody Epitope Prediction. Antibodies. 2023;12:1–14.
Ponomarenko JV, Bourne PE. Antibody-protein interactions: benchmark datasets and prediction tools evaluation. BMC Struct Biol. 2007;7:64.
Lin Y, AlQuraishi M. Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds. Proc Mach Learn Res. 2023;202:20978–1002.
Mandrup OA, Friis NA, Lykkemark S, Just J, Kristensen P. A Novel Heavy Domain Antibody Library with Functionally Optimized Complementarity Determining Regions. PLoS ONE. 2013;8:1–15.
Livne M, Miftahutdinov Z, Tutubalina E, Kuznetsov M, Polykovskiy D, Brundyn A, et al. nach0: multimodal natural and chemical languages foundation model. Chem Sci. 2024;15:8380–9.
Park S, Kono H, Wang W, Boder ET, Saven JG. Progress in the development and application of computational methods for probabilistic protein design. Comput Chem Eng. 2005;29:407–21.
Weinstein J, Khersonsky O, Fleishman SJ. Practically useful protein-design methods combining phylogenetic and atomistic calculations. Curr Opin Struct Biol. 2020;63:58–64.
Al-Lazikani B, Lesk AM, Chothia C. Canonical structures for the hypervariable regions of T cell αβ receptors. J Mol Biol. 2000;295:979–95.
Whitelegg NRJ, Rees AR. WAM: An improved algorithm for modelling antibodies on the WEB. Protein Eng. 2000;13:819–24.
Kringelum JV, Nielsen M, Padkjær SB, Lund O. Structural analysis of B-cell epitopes in antibody:protein complexes. Mol Immunol. 2013;53:24–34.
Sela-Culang I, Benhnia MREI, Matho MH, Kaever T, Maybeno M, Schlossman A, et al. Using a combined computational-experimental approach to predict antibody-specific B cell epitopes. Structure. 2014;22:646–57.
Liu T, Liu Y, Wang Y, Hull M, Schultz PG, Wang F. Rational design of CXCR4 specific antibodies with elongated CDRs. J Am Chem Soc. 2014;136:10557–60.
Alford RF, Leaver-Fay A, Jeliazkov JR, O’Meara MJ, DiMaio FP, Park H, et al. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J Chem Theory Comput. 2017;13:3031–48.
Sohl-Dickstein J, Weiss EA, Maheswaranathan N, Ganguli S. Deep unsupervised learning using nonequilibrium thermodynamics. 32nd Int Conf Mach Learn ICML 2015. 2015;3:2246–55.
Ingraham J, Garg VK, Barzilay R, Jaakkola T. Generative models for graph-based protein design. Adv Neural Inf Process Syst. 2019;32.
Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Adv Neural Inf Process Syst. 2020;2020-Decem:1–12.
Song Y, Sohl-Dickstein J, Kingma DP, Kumar A, Ermon S, Poole B. Score-Based Generative Modeling Through Stochastic Differential Equations. ICLR 2021 - 9th Int Conf Learn Represent. 2021;1–36.
Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373:871–6.
Shi C, Luo S, Xu M, Tang J. Learning Gradient Fields for Molecular Conformation Generation. Proc Mach Learn Res. 2021;139:9558–68.
Hoogeboom E, Nielsen D, Jaini P, Forré P, Welling M. Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions. Adv Neural Inf Process Syst. 2021;34:12454–65.
Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, Baker D. Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci U S A. 2020;117:1496–503.
Austin J, Johnson DD, Ho J, Tarlow D, Van Den Berg R. Structured Denoising Diffusion Models in Discrete State-Spaces. Adv Neural Inf Process Syst. 2021;34:17981–93.
Anand N, Achim T. Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models. arXiv. 2022;1–18. https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.2205.15019.
De Bortoli V, Mathieu E, Hutchinson M, Thornton J, Teh YW, Doucet A. Riemannian Score-Based Generative Modelling. Adv Neural Inf Process Syst. 2022;35:Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.220.
Song Y, Ermon S. Generative Modeling by Estimating Gradients of the Data Distribution. Adv Neural Inf Process Syst. 2019;32:Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.190.
Wu KE, Yang KK, van den Berg R, Alamdari S, Zou JY, Lu AX, et al. Protein structure generation via folding diffusion. Nat Commun. 2024;15:1–12.
Svilenov HL, Arosio P, Menzen T, Tessier P, Sormanni P. Approaches to expand the conventional toolbox for discovery and selection of antibodies with drug-like physicochemical properties. MAbs. 2023;15:2164459.
Khan MK, Raza M, Shahbaz M, Hussain I, Khan MF, Xie Z, et al. The recent advances in the approach of artificial intelligence (AI) towards drug discovery. Front Chem. 2024;12:1–10.
Niedziela-Majka A, Kan E, Weissburg P, Mehra U, Sellers S, Sakowicz R. High-throughput screening of formulations to optimize the thermal stability of a therapeutic monoclonal antibody. J Biomol Screen. 2015;20:552–9.
Chen Z, Wang X, Chen X, Huang J, Wang C, Wang J, et al. Accelerating therapeutic protein design with computational approaches toward the clinical stage. Comput Struct Biotechnol J. 2023;21:2909–26.
Qiu J, Qiu T, Huang Y, Cao Z. Identifying the Epitope Regions of Therapeutic Antibodies Based on Structure Descriptors. Int J Mol Sci. 2017;18:2457.
Nguyen TL, Nguyen TB, Kim H. Computational identification of B and T-cell epitopes for designing a multi-epitope vaccine against SARS-CoV-2 spike glycoprotein. J Struct Biol. 2025;217: 108177.
Parkinson J, Hard R, Wang W. The RESP AI model accelerates the identification of tight-binding antibodies. Nat Commun. 2023;14:454.
Raybould MIJ, Marks C, Lewis AP, Shi J, Bujotzek A, Taddese B, et al. Thera-SAbDab: The Therapeutic Structural Antibody Database. Nucleic Acids Res. 2020;48:D383–8.
Cao L, Coventry B, Goreshnik I, Huang B, Sheffler W, Park JS, et al. Design of protein-binding proteins from the target structure alone. Nature. 2022;605:551–60.
Manning MC, Chou DK, Murphy BM, Payne RW, Katayama DS. Stability of protein pharmaceuticals: An update. Pharm Res. 2010;27:544–75.
Mazurenko S, Prokop Z, Damborsky J. Machine Learning in Enzyme Engineering. ACS Catal. 2020;10:1210–23.
Makowski EK, Kinnunen PC, Huang J, Wu L, Smith MD, Wang T, et al. Co-optimization of therapeutic antibody affinity and specificity using machine learning models that generalize to novel mutational space. Nat Commun. 2022;13:3788.
Libouban P-Y, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci. 2023;24:16120.
Yang J, Shen C, Huang N. Predicting or Pretending: Artificial Intelligence for Protein-Ligand Interactions Lack of Sufficiently Large and Unbiased Datasets. Front Pharmacol. 2020;11:1–9.
Wu F, Li SZ. A Hierarchical Training Paradigm for Antibody Structure-sequence Co-design. Adv Neural Inf Process Syst. 2023;36:Available from: https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.231.
Lambiotte R, Rosvall M, Scholtes I. From networks to optimal higher-order models of complex systems. Nat Phys. 2019;15:313–20.
Mangan NM, Brunton SL, Proctor JL, Kutz JN. Inferring Biological Networks by Sparse Identification of Nonlinear Dynamics. IEEE Trans Mol Biol Multi-Scale Commun. 2016;2:52–63.
Madsen AV, Mejias-Gomez O, Pedersen LE, Preben Morth J, Kristensen P, Jenkins TP, et al. Structural trends in antibody-antigen binding interfaces: a computational analysis of 1833 experimentally determined 3D structures. Comput Struct Biotechnol J. 2024;23:199–211.
Segall MD. Multi-Parameter Optimization: Identifying High Quality Compounds with a Balance of Properties. Curr Drug Metab. 2012;18:1292–310.
Nicolaou CA, Brown N. Multi-objective optimization methods in drug design. Drug Discov Today Technol. 2013;10:e427–35.
Notin P, Rollins N, Gal Y, Sander C, Marks D. Machine learning for functional protein design. Nat Biotechnol. 2024;42:216–28.
Vázquez J, López M, Gibert E, Herrero E, Luque FJ. Merging Ligand-Based and Structure-Based Methods in Drug Discovery: An Overview of Combined Virtual Screening Approaches. Molecules. 2020;25:4723.
Hoogeboom E, Satorras VG, Vignac C, Welling M. Equivariant Diffusion for Molecule Generation in 3D. Proc Mach Learn Res. 2022;162:https://doiorg.publicaciones.saludcastillayleon.es/10.48550/arXiv.2203.17003.
Holzinger A, Keiblinger K, Holub P, Zatloukal K, Müller H. AI for life: Trends in artificial intelligence for biotechnology. N Biotechnol. 2023;74:16–24.
Acknowledgements
We gratefully acknowledge the valuable suggestions provided by the reviewers and the fruitful scientific discussion and suggestions provided by Dr. Y. S. Prabhakar, Chief Scientist, CSIR-Central Drug Research Institute (India).
Funding
This work was supported by the Hallym University Research Fund and the “Research Program for Agricultural Science & Technology Development (Project No. RS-2022-RD010366)”, National Institute of Agricultural Sciences, Rural Development Administration, Republic of Korea.
Author information
Contributions
V.D.: Conceptualization, Investigation, Data Curation, Writing – Original Draft, Writing – Review & Editing. V.K.M.: Validation, Supervision, Data Curation, Writing – Review & Editing. Y.H.K. (Yoo Hee Kim): Data Curation, Writing – Review & Editing. S.T.P.: Funding, Supervision, Data Curation, Writing – Review & Editing. H.S.K.: Funding, Conceptualization, Supervision, Writing – Review & Editing. Y.H.K. (Young Ho Koh): Funding, Conceptualization, Supervision, Writing – Review & Editing. All authors reviewed the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Dewaker, V., Morya, V.K., Kim, Y.H. et al. Revolutionizing oncology: the role of Artificial Intelligence (AI) as an antibody design, and optimization tools. Biomark Res 13, 52 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40364-025-00764-4
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40364-025-00764-4