A Two-phase Docking Protocol
Once we had selected a representative conformation from the NMR ensemble, we carried out a two-stage docking simulation. The first stage was a preliminary search for potential ERCC1 binders across the full CN library. This search yielded a wide spectrum of binding energies, ranging from −11 kcal/mol to 0 kcal/mol. Based on our experience and on similar studies in the literature, we truncated the hit list at −5 kcal/mol. Combined with the requirement that the largest cluster contain more than 25% of the docked conformations, this energy cutoff produced a set of 2,000 hits ranked by the AutoDock scoring function. The second stage was a more rigorous docking approach that employed the relaxed complex scheme (RCS) [21]. In the RCS approach, all-atom MD simulations (e.g., a 2 ns simulation) are used to explore the conformational space of the target, and docking is then used to rapidly screen drug libraries against an ensemble of receptor conformations. This ensemble is extracted from the simulation at predetermined time intervals (e.g., every 10 ps), which can yield hundreds to thousands of protein conformations. Each conformation is then used as the target of an independent docking experiment. The RCS methodology has been applied successfully in a number of cases. An excellent example is the HIV inhibitor raltegravir, which became the first FDA-approved drug targeting HIV integrase [22], [23].
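The first-stage filter amounts to two conditions checked per ligand: an energy cutoff and a minimum population for the largest docking cluster. A minimal Python sketch of such a filter follows. The `DockingResult` record, its field names, and the choice to rank on each ligand's lowest energy are hypothetical, and the cutoff is passed in as a parameter rather than hard-coded. More negative AutoDock energies are treated as more favorable.

```python
from dataclasses import dataclass


@dataclass
class DockingResult:
    """Hypothetical per-ligand summary of a set of docking runs."""
    ligand_id: str
    best_energy: float         # lowest estimated binding energy, kcal/mol (more negative = better)
    largest_cluster_size: int  # number of runs in the most populated cluster
    total_runs: int            # total number of docking runs for this ligand


def select_hits(results, energy_cutoff, min_cluster_fraction=0.25):
    """Keep ligands at or below the energy cutoff whose largest cluster holds more
    than `min_cluster_fraction` of the runs, ranked from most to least favorable."""
    kept = [
        r for r in results
        if r.best_energy <= energy_cutoff
        and r.largest_cluster_size / r.total_runs > min_cluster_fraction
    ]
    return sorted(kept, key=lambda r: r.best_energy)


# Illustrative usage with made-up records and a hypothetical cutoff of -5.0 kcal/mol.
example = [
    DockingResult("lig-1", -9.2, 40, 100),
    DockingResult("lig-2", -6.1, 10, 100),  # fails the cluster-population test
    DockingResult("lig-3", -3.0, 80, 100),  # fails the energy cutoff
    DockingResult("lig-4", -7.5, 30, 100),
]
print([r.ligand_id for r in select_hits(example, energy_cutoff=-5.0)])  # ['lig-1', 'lig-4']
```

Applying the population test alongside the energy test discards ligands whose favorable score comes from a single, poorly reproduced pose.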
Other successful examples include the identification of novel inhibitors of the acetylcholine binding protein [24], RNA-editing ligase 1 [25], the influenza protein neuraminidase [26], and Trypanosoma brucei uridine diphosphate galactose 4′-epimerase [27]. These applications adopted different solutions to two main problems with the method: reducing the number of extracted target conformations, and deciding how to select the final set of hits once screening is complete. For the first problem, studies have suggested extracting structures at larger intervals of the MD simulation (e.g., every 5 ns or so) [24], condensing the MD-generated structural ensemble using QR factorization [25], or clustering the MD trajectory by root-mean-square deviation (RMSD) [26], [27]. For the second problem, some studies ranked the screened compounds and proposed a final set of top hits using docking predictions alone [24], [25], [26], while others (as in this thesis) used a more accurate scoring method, such as MM/PBSA (Molecular Mechanics/Poisson-Boltzmann Surface Area), to refine the final selection [21]. Like the work presented here, all of these approaches aim to balance a substantial reduction in the number of target structures against retaining the ensemble's capacity to describe the conformational space of the target.
To partially introduce receptor flexibility into the docking, the top 2,000 hits from the initial screening were re-docked against the remaining 9 NMR conformations. As expected, this produced a new ranking of the 2,000 hits. At this stage, the AutoDock scoring function and an adaptive clustering method (see Materials and Methods) were used to produce a preliminary ranking of the 2,000 compounds. Visual inspection combined with this scoring then reduced the 2,000 hits to 200 molecules with an acceptable cluster population size (see below).
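Re-docking every hit against several receptor conformations leaves one energy per ligand per conformation, and the resulting ranking depends on how those energies are combined. The sketch below is a hypothetical helper, not the procedure used in this work: it ranks ligands either by their mean energy over the ensemble or by their best (lowest) energy, which shows how the two choices can reorder the same compounds. The function name and the toy scores are illustrative assumptions.

```python
from statistics import mean


def rank_over_ensemble(scores, mode="mean"):
    """Rank ligands docked against an ensemble of receptor conformations.

    `scores` maps a ligand id to a list of energies (kcal/mol), one per conformation.
    mode="mean" averages over the ensemble; mode="best" keeps the lowest energy.
    Returns (ligand_id, aggregate_energy) pairs, most favorable first.
    """
    if mode == "mean":
        aggregate = mean
    elif mode == "best":
        aggregate = min
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    ranked = [(lig, aggregate(energies)) for lig, energies in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1])  # stable sort keeps ties in input order


# Toy energies for three ligands against three receptor conformations (made up).
scores = {
    "A": [-8.0, -6.0, -7.0],
    "B": [-9.5, -4.0, -4.5],
    "C": [-7.0, -7.0, -7.0],
}
print([lig for lig, _ in rank_over_ensemble(scores, "mean")])  # ['A', 'C', 'B']
print([lig for lig, _ in rank_over_ensemble(scores, "best")])  # ['B', 'A', 'C']
```

Ligand B scores best against a single conformation but worst on average, which is the kind of reordering that re-docking against multiple receptor structures can expose.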
We noticed that most of them fit properly within the ERCC1 pocket. The binding energies of the successfully docked structures (~170 hits) ranged from −12 kcal/mol to −7 kcal/mol. It is worth noting that the binding site of ERCC1 has limited flexibility. Based on our previous investigations [15], the residues that contribute most to ligand binding are Gly109, Asn110, Pro111, Asp129, Phe140, Tyr145, and Arg156 (Figure 2). However, most of the differences between the binding energies obtained from the two docking stages were not statistically significant: the energies were too closely spaced to select hits for experimental testing on the basis of docking results alone. Therefore, we performed MD simulations on the top 170 RCS hits, starting from their minimum-energy conformations within the ERCC1 binding site.
Clustering of Docked Conformations and Extraction of Binding Modes
Docking simulations produce massive numbers of possible solutions, each representing a potential binding mode of the tested ligand within the targeted site. Mining these data sets and extracting the most probable solution for each compound is tricky and requires careful treatment. … clustering metrics (see Materials and Methods). This adaptive approach has been tested on other targets and led to successful outcomes [28], [29], [30]. For MD simulations, starting from the optimal binding mode is the most efficient route to equilibrium. Therefore, by running the clustering protocol on each ligand and filtering the hits by the population of the largest cluster (see Materials and Methods), we prepared a set of 170 distinct hits ranked by their binding energies. The selected hits were subjected to all-atom, explicit-solvent MD simulations.
Figure 1. Selection of an initial ERCC1 target. The root mean square deviation (RMSD) of 9 ERCC1 NMR structures relative to an arbitrary NMR conformation. The centroid of the 9 structures (highlighted in red) was selected as the initial target structure for docking the full set of compounds in the CN library.
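The adaptive clustering metrics themselves are described in Materials and Methods, so the following is not that method. It is a sketch of the conventional fixed-tolerance scheme in the spirit of AutoDock's pose clustering: poses are visited from lowest to highest energy, and each joins the first cluster whose seed (lowest-energy member) lies within an RMSD tolerance, otherwise it founds a new cluster. The largest-cluster fraction it returns is the kind of population measure used above to filter hits. The 2.0 Å tolerance, the function names, and the toy poses are assumptions for illustration.

```python
import math


def pose_rmsd(a, b):
    """RMSD (Å) between two poses of the same ligand, atoms listed in the same order.
    No superposition is applied: docked poses already share the receptor frame."""
    if len(a) != len(b):
        raise ValueError("poses must have the same number of atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(sq / len(a))


def cluster_poses(poses, energies, tolerance=2.0):
    """Greedy energy-ordered clustering; returns lists of pose indices, seed first."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters = []
    for i in order:
        for members in clusters:
            if pose_rmsd(poses[i], poses[members[0]]) <= tolerance:
                members.append(i)
                break
        else:  # no existing seed was close enough: start a new cluster
            clusters.append([i])
    return clusters


def largest_cluster_fraction(clusters):
    """Share of all poses that fall in the most populated cluster."""
    total = sum(len(c) for c in clusters)
    return max(len(c) for c in clusters) / total


# Four toy poses of a two-atom ligand (coordinates in Å) and their energies (kcal/mol).
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
    [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)],
]
energies = [-8.0, -7.5, -9.0, -6.0]
clusters = cluster_poses(poses, energies)
print(clusters)                            # [[2], [0, 1, 3]]
print(largest_cluster_fraction(clusters))  # 0.75
```

Here the single lowest-energy pose is isolated, while three of the four poses agree on one binding mode; a population filter would favor that well-reproduced mode as the starting point for MD.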
MD Simulations on Promising Hits
MD simulations introduced target flexibility into the molecular recognition problem, allowing all protein side chains to move, rotate, and interact with different parts of the ligands. The MD simulations of the complexes were decisive and answered several key questions, in particular: "Was the binding mode stable and realistic? How did the ligand's stability evolve over time? What were the major interactions that made this ligand bind? Were any water-mediated interactions involved?" Approximately half of the docking-predicted hits were stable within the binding site. They made proper interactions with various regions of the target and formed hydrogen bonds either directly with protein side chains or indirectly through water molecules. As an example, Figure 3 shows the RMSD and atomic fluctuations of two selected hits: NERI01 (compound 12 in Figure 4 and Table 1, also known as AB-00026258) and a similar lead structure (compound 2 in Figure 4 and Table 1). The average RMSD for the two compounds was around 6 Å, which is consistent with values obtained in similar studies [31]. The RMSD of NERI01 (Figure 3-A) fluctuated more than that of the other compound (Figure 3-B), indicating higher flexibility. This was also evident in the atomic fluctuation analysis: many parts of NERI01 are flexible (Figure 3-C), including the three nitro groups and the single rotatable bond in the middle of its structure.
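The RMSD and atomic-fluctuation curves discussed above (Figure 3) rest on two standard trajectory measures: the per-frame RMSD against a reference structure, and the per-atom root-mean-square fluctuation (RMSF) about the time-averaged position. The plain-Python sketch below assumes the frames have already been superposed on the reference (the fitting step is omitted) and that coordinates are in Å. The function names and toy frames are hypothetical; a production analysis would normally rely on a trajectory-analysis library.

```python
import math


def rmsd_series(frames, reference):
    """RMSD (Å) of each frame relative to a reference structure.
    Assumes every frame has already been superposed onto the reference."""
    n_atoms = len(reference)
    series = []
    for frame in frames:
        sq = sum((x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
                 for (x, y, z), (rx, ry, rz) in zip(frame, reference))
        series.append(math.sqrt(sq / n_atoms))
    return series


def rmsf(frames):
    """Root-mean-square fluctuation (Å) of each atom about its time-averaged position."""
    n_frames = len(frames)
    result = []
    for a in range(len(frames[0])):
        centre = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(sum((f[a][k] - centre[k]) ** 2 for k in range(3)) for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory: a two-atom fragment whose second atom swings along y.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, -2.0, 0.0)],
]
print([round(v, 3) for v in rmsd_series(frames, frames[0])])  # [0.0, 1.414, 1.414]
print([round(v, 3) for v in rmsf(frames)])                    # [0.0, 1.633]
```

RMSD summarizes how far the whole ligand drifts from its starting pose, while RMSF localizes the motion to individual atoms, which is how flexible groups such as nitro substituents stand out in the fluctuation analysis.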