EGFR Binder Cancer Drug Design
EGFR Binder
Designed Cancer Drug Candidate to Inhibit Protein EGFR
August - Dec 2024
Rosetta, PyMol, AlphaFold, Protein Design, Drug Design, Biochemistry, Biophysics
New York (NY)

ROUND 1
Allon used Rosetta/RosettaScripts to design a 4-helix bundle that binds Domain III of the Epidermal Growth Factor Receptor (EGFR), a protein implicated in several cancers, such as epithelial and lung cancers, when overactive. A schematic of the protein and its 'normal' binding to Epidermal Growth Factor (EGF) can be seen below. The work was part of a public protein design competition run by Adaptyv Bio, in which participants competed to design the best binders to the extracellular domains of EGFR; submissions were screened computationally and then validated experimentally.

Allon designed the binding interface to overlap multiple known antibody and EGF epitopes, including EGFR's D355 and the Domain III central hydrophobic patch, both key to EGF binding. The interface on Domain III was selected to promote EGFR’s open (inactive), pre-dimerization configuration. Domain III is also the domain targeted by some antibodies used to treat lung cancers carrying the notoriously treatment-intractable exon 18-21 mutations.

The design is completely de novo (novel). It took into consideration the hydrophobicity of the core and of the overall design; the charges of the individual helices, the overall bundle, and the target interface; and the amino acid composition of the binder, chosen to promote helical folding and loop rigidification. Several hundred thousand designs were considered in total. Designs submitted to the competition were additionally structure-validated with AlphaFold3 and AlphaFold2.


SEE ROUND 2 DESC. BELOW FOLLOWING FIGURES
Stylized Render of Drug and Active Site of Target

Domain Schematic of EGFR Protein Complex (PDB 6ARU) inspired by
[1] Kathryn M Ferguson, Mitchell B Berger, Jeannine M Mendrola, Hyun-Soo Cho, Daniel J Leahy, Mark A Lemmon. EGF Activates Its Receptor by Removing Interactions that Autoinhibit Ectodomain Dimerization. Molecular Cell. 2003;11(2). 507-17. doi: 10.1016/S1097-2765(03)00047-9.
VIDEO: Molecular Dynamics (MD) Simulation Spanning 1 ns of EGFR Inhibited by Antibody Treatment Cetuximab (PDB: 6ARU) made with Amber
Schematic of the Full Binding Process of EGF to EGFR that the Designed Drug/Binder is Aiming to Disrupt. Dimerized Structure on Right is from PDB 8HGS. Figure inspired by [1].

Top Down View of Domain III of EGFR from PDB 6ARU showing the Properties of the Target Interface
Top Down View of Domain III of EGFR from PDB 6ARU showing the Binding Epitopes of Various Binders Including Existing Antibody Cancer Treatments, Native EGF, and a Designed Binder from this Project
Inspired and Informed by [2] Voigt, M., Braig, F., Göthel, M., Schulte, A., Lamszus, K., Bokemeyer, C., & Binder, M. (2012). Functional dissection of the epidermal growth factor receptor epitopes targeted by panitumumab and cetuximab. Neoplasia (New York, N.Y.), 14(11), 1023–1031. https://doi.org/10.1593/neo.121242
VIDEO: Zoomed View of Candidate Design (Colored) Docked to EGFR Domain III (White)
[hydrophobic atoms colored orange, binding interface light blue, hydrophilic atoms cyan, positive amino acids blue, negative amino acids red, cysteines in yellow, EGFR's D355 in magenta]

Multiple Sequence Alignment of the 10 Submitted De Novo Sequences made with AlignmentViewer.org. The Top 5 Designs were Selected by Adaptyv Bio for their Lab "Affinity Characterization" Binding Assay Tests, as marked with 't' in the Left Column. The Adaptyv Bio-Reported AlphaFold 2 iPAE Score, an (Albeit Imperfect) Computational Binding Affinity Estimation Metric (Low = Tighter), is shown in the Right Column.



Schematic of Overall Rosetta Design and Design Validation Protocol as well as Computational Resource Utilization


Designed vs. predicted structures overlaid
[Rosetta-designed candidate in cyan, AlphaFold3 prediction (RMSD 2.043) in pink, and AlphaFold2 prediction (RMSD 1.927) in green]
VIDEO: Animation of Designed vs. predicted structures overlaid
[Rosetta-designed candidate in cyan, AlphaFold3 prediction (RMSD 2.043) in pink, and AlphaFold2 prediction (RMSD 1.927) in green]
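
For readers unfamiliar with how an RMSD between a designed model and a predicted structure is typically obtained, here is a minimal sketch using the Kabsch superposition. It is not the tool used in this project (that is not stated here); the function name `kabsch_rmsd` and the toy coordinates are hypothetical, and it assumes both structures are given as matched lists of atom coordinates (for example C-alpha atoms in the same order).

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition.

    Hypothetical helper for illustration: P and Q must list the same atoms
    in the same order (e.g. C-alpha atoms of design and prediction).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its SVD give the optimal rotation (Kabsch).
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the remaining deviation.
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


if __name__ == "__main__":
    # Toy data only: a random point cloud and a rotated, shifted copy of it.
    rng = np.random.default_rng(0)
    design = rng.normal(size=(20, 3))
    theta = 0.7
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    prediction = design @ Rz.T + np.array([1.0, -2.0, 3.0])
    print(kabsch_rmsd(design, prediction))
```

Because the superposition removes any rigid-body offset, the value reflects only differences in shape between the two models.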



ROUND 2

Allon revised his Round 1 designs and methods and introduced new design goals after analyzing the data published on the Round 1 submissions. None of his 5 designs selected for testing in Round 1 could be expressed, so designing for expression became a key goal in Round 2.

Round 2 Protocol Flow Chart for the 4-Helix Designs

Allon noticed a large hydrophobic patch near the beginning of his submitted sequences, along with a large number of Q (glutamine) residues, both of which may have made the designs harder to express (see the MSA of Round 1 submitted sequences above). He used Rosetta to redesign his 10 submitted sequences in two steps: first redesigning the surface under a constraint on the percentage of Q in the composition, then redesigning the interface where the hydrophobic patch sat, using revised constraints. The `buried_unsatisfied_penalty` term was turned off at this point because it was found to interfere too strongly with the composition constraints. This yielded designs that were more well-rounded and diverse, likely more plausible, and hopefully more expressible. 100 of these designs were then run through the full AlphaFold2_Multimer_v3 model on the Flatiron Institute cluster to obtain more accurate iPAE/ipTM metrics and to see the approximate binding site. The AF2 results were whittled down to a handful in which the model predicted an interface in the precise docking conformation the 4-helix bundles were designed for. Then, to further improve the metrics Adaptyv Bio used to rank Round 2 submissions and to increase confidence in the predicted fold of the binder, these designs were run dozens of times through Simon Dürr’s ProteinMPNN Gradio web app with the interface held fixed, yielding sequences with better iPAE, ipTM, and ESM2 log-likelihood.
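
The two sequence-level issues described above (a high Q fraction and a hydrophobic stretch near the start of the sequence) can be checked with a few lines of Python. This is only an illustrative post-hoc check, not the Rosetta composition constraint itself; the helper names, the residue set counted as hydrophobic, the window size, and the toy sequence are all assumptions.

```python
# Residues treated as hydrophobic here; this particular set is an assumption.
HYDROPHOBIC = set("AVILMFWY")


def residue_fraction(seq, residue="Q"):
    """Fraction of positions in `seq` occupied by `residue`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return seq.count(residue.upper()) / len(seq)


def most_hydrophobic_window(seq, window=10):
    """Return (start_index, hydrophobic_fraction) of the most hydrophobic window.

    Ties are resolved in favour of the earliest window.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    window = min(window, len(seq))
    best_start, best_frac = 0, -1.0
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        frac = sum(aa in HYDROPHOBIC for aa in chunk) / window
        if frac > best_frac:
            best_start, best_frac = start, frac
    return best_start, best_frac


if __name__ == "__main__":
    # Made-up sequence for demonstration; not one of the submitted designs.
    toy = "MLLVAILFAAQQEEQQKKQQEELLKKAQQ"
    print("Q fraction:", round(residue_fraction(toy, "Q"), 3))
    print("most hydrophobic window:", most_hydrophobic_window(toy, window=8))
```

A window that is both highly hydrophobic and close to the N-terminus would correspond to the kind of patch described above; what counts as "too high" is a judgment call that this sketch does not settle.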

Graph of AlphaFold2-Calculated pLDDT vs iPAE from All Submitted Designs in Round 1, Colored by Expression Rate, with the 7 Identified Novel Binders Denoted According to the Legend

Ridge Plot of iPAE Score Landscapes at each Identified Expression Level



Expression Rates as Fractions per-Bin of iPAE
The Round 1 data Allon used to look for design-relevant trends showed that iPAE did not necessarily correlate with binding affinity (though there were only 200 data points). For Adaptyv Bio’s cell-free pipeline, however, two iPAE thresholds stood out: designs below 18 had a very high expression rate, and designs above 25 had a significant rate of expression failure. Sequence length mattered as well: sequences shorter than 100 amino acids had a much higher rate of medium and high reported expression. ‘Worse’ performance on the Adaptyv Bio Round 2 metrics (computed for the Round 1 data) also correlated with sequence length, especially for ESM2 log-likelihood. This prompted Allon to design a new 3-helix 55-mer using a slightly different method from the one used for the larger 4-helix bundles.
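
The per-bin expression analysis described above can be sketched as follows. The thresholds of 18 and 25 come from the text; everything else (the function names, the bin labels, and especially the example records) is hypothetical and is not Adaptyv Bio's data.

```python
def ipae_bin(ipae, low=18.0, high=25.0):
    """Assign an iPAE value to one of three bins around the two thresholds."""
    if ipae < low:
        return "low"
    if ipae > high:
        return "high"
    return "mid"


def expression_rate_by_bin(records, low=18.0, high=25.0):
    """Fraction of expressed designs per iPAE bin.

    `records` is an iterable of (ipae, expressed) pairs, where `expressed`
    is True if the design expressed at all in the assay.
    """
    counts = {}
    for ipae, expressed in records:
        label = ipae_bin(ipae, low, high)
        total, ok = counts.get(label, (0, 0))
        counts[label] = (total + 1, ok + (1 if expressed else 0))
    return {label: ok / total for label, (total, ok) in counts.items()}


if __name__ == "__main__":
    # Invented records purely to show the call pattern.
    toy_records = [
        (10.2, True), (15.9, True), (17.0, True),
        (19.5, True), (23.1, False),
        (26.4, False), (28.8, False), (27.0, True),
    ]
    for label, rate in sorted(expression_rate_by_bin(toy_records).items()):
        print(label, round(rate, 2))
```

The same grouping could be repeated with sequence length (for example, below versus at or above 100 residues) to examine the length trend.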

Round 2 Protocol Flow Chart for the 3-Helix Designs
Render of the 3-Helix Bundle
The 3-helix bundles were designed in less than a week with the following method. First, Allon used PyMol to cut alpha helices out of one of the 4-helix bundles, stitched 3 of them together, and positioned them on EGFR’s Domain III near the central hydrophobic patch and D355, both key for EGF binding (a target site similar to that of the 4-helix bundles, informed by the literature and by the epitopes of known antibody cancer treatments). The geometry was then run through the ProteinMPNN web app to yield a plausible sequence that folds into the desired geometry at sub-1.00 Å RMSD. That model and sequence were the starting point for a Rosetta protocol that designed first the core (with hydrophobic residues) and then the loops (with prolines for low entropy and high rigidity). The resulting designs were then randomly ‘docked’ about the target site within a small translation and rotation space, and the interface was designed with the Rosetta score function, optimizing the resulting energy. These designs were given a rudimentary evaluation with AlphaFold2 Multimer (through Andrew White’s ‘mimimalaf’-Modal implementation). Finally, 4 low-iPAE designs were refined with the ProteinMPNN web app in the same manner as the 4-helix designs, holding the interface fixed.
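
The random ‘docking’ step above amounts to applying a small random rigid-body move to the binder before interface design. The sketch below shows that idea in plain Python; it is not the Rosetta mover used in the project, and the function names, the 2 Å shift limit, and the 10 degree rotation limit are assumptions chosen only for illustration.

```python
import math
import random


def random_unit_vector(rng):
    """Uniformly distributed unit vector from a normalized Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return [x / norm for x in v]


def rotate_about_axis(p, axis, angle):
    """Rotate point p about a unit axis through the origin (Rodrigues' formula)."""
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(a * b for a, b in zip(axis, p))
    cross = [axis[1] * p[2] - axis[2] * p[1],
             axis[2] * p[0] - axis[0] * p[2],
             axis[0] * p[1] - axis[1] * p[0]]
    return [p[i] * c + cross[i] * s + axis[i] * dot * (1.0 - c) for i in range(3)]


def perturb_pose(coords, max_shift=2.0, max_angle_deg=10.0, seed=None):
    """Apply one small random rotation (about the centroid) and translation.

    coords: list of [x, y, z] binder atom positions. The limits are
    illustrative defaults, not values taken from the project.
    """
    rng = random.Random(seed)
    n = len(coords)
    center = [sum(p[i] for p in coords) / n for i in range(3)]
    axis = random_unit_vector(rng)
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    direction = random_unit_vector(rng)
    distance = rng.uniform(0.0, max_shift)
    shift = [d * distance for d in direction]

    moved = []
    for p in coords:
        rel = [p[i] - center[i] for i in range(3)]
        rot = rotate_about_axis(rel, axis, angle)
        moved.append([rot[i] + center[i] + shift[i] for i in range(3)])
    return moved


if __name__ == "__main__":
    # Toy coordinates standing in for a binder backbone.
    binder = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]]
    for point in perturb_pose(binder, seed=42):
        print([round(x, 3) for x in point])
```

Each call yields one candidate starting pose; in a workflow like the one described, many such poses would each be passed on to interface design and scoring.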
ProteinMPNN often successfully dropped the iPAE (especially for the larger 4-helix designs), improved the ipTM, and brought the ESM2-LL closer to zero.

It is important to note that design ‘4_44_14_refined_IDEAL’ had a particularly low iPAE in the region of the design interacting with EGFR’s Domain III (much lower than that of Domain I). This is exactly the pattern seen in the PAE map of Round 1 winner Martin Pacesa’s ‘martin.pacesa-EGFR_l138_s90285_mpnn2’, which was made with BindCraft, so this design is of key interest.
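
A region-specific comparison like the one above can be made by averaging the PAE matrix over only the binder-versus-domain blocks. The sketch below assumes iPAE is taken as the mean of both off-diagonal (inter-chain) blocks, which is a common convention but an assumption here; the function name, residue index ranges, and matrix values are hypothetical.

```python
def mean_interchain_pae(pae, binder_idx, target_idx):
    """Mean PAE over both off-diagonal blocks between two residue index sets.

    pae: square matrix (list of lists) where pae[i][j] is the predicted
    aligned error for residue j when aligned on residue i.
    """
    values = []
    for i in binder_idx:
        for j in target_idx:
            values.append(pae[i][j])
            values.append(pae[j][i])
    if not values:
        raise ValueError("both index sets must be non-empty")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy 5x5 PAE matrix: residues 0-1 are the binder, residue 2 stands in
    # for one target domain, and residues 3-4 stand in for another domain.
    toy_pae = [
        [0.5, 1.0, 3.0, 20.0, 22.0],
        [1.0, 0.5, 4.0, 21.0, 23.0],
        [3.5, 4.5, 0.5, 8.0, 9.0],
        [19.0, 20.0, 8.0, 0.5, 1.0],
        [21.0, 22.0, 9.0, 1.0, 0.5],
    ]
    binder = range(0, 2)
    print("binder vs first domain:", mean_interchain_pae(toy_pae, binder, [2]))
    print("binder vs second domain:", mean_interchain_pae(toy_pae, binder, [3, 4]))
```

A much lower mean for one domain block than for the other is the numerical counterpart of the pattern visible in the PAE map.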

All sequences remain entirely de novo.
PAE map of Design, '4_44_14_refined_IDEAL'








