keeping the most meaningful information to find novel inhibitors during a docking search. The reduced ensemble is called an RFFR model.

An Approach for Clustering MD Trajectory Using Cavity-Based Features

Discussion

We have presented a strategy to generate ensembles of representative MD conformations that are more sensitive to changes in the properties of the substrate-binding cavity than the widely used RMSD approaches. This strategy uses two partitioning clustering methods and four agglomerative hierarchical clustering methods. We use them to compare and analyze the quality of the partitions obtained from the binding cavity data set that we propose and from two other data sets composed of pairwise RMSD distances. To provide the optimal ensemble of representative MD conformations, we obtained the FEB values from docking experiments with 20 known inhibitors of the InhA enzyme. With these values we identified optimal partitions by statistical assessments and calculated their percentage of similarity with the original MD trajectory.

The results for the hierarchical algorithms highlighted their main advantages, i.e. they are more versatile and offer flexibility in finding a proper level of granularity. Comparing the performance of the clustering methods, Fig 5 shows that UPGMA, WPGMA and Complete are good methods for clustering data sets similar to Cavity Attributes and Cavity RMSD, while Ward's method can be considered a good solution for all data sets. However, the ability of Ward's method to group objects that are as homogeneous as possible resulted in partitions with central tendencies considerably far from those found in the full MD trajectory. Further, the highly cohesive clusters generated by the UPGMA and WPGMA methods did not reach low SQD values or a low number of clusters.
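To make the linkage criteria discussed here concrete, the following is a minimal sketch of naive agglomerative clustering with Complete (farthest neighbor) and average (UPGMA) linkage, followed by selection of one medoid per cluster as its representative. It is not the authors' implementation: the function names `pairwise`, `linkage_distance`, `agglomerate` and `medoid` are hypothetical, and the two-dimensional `features` are toy values standing in for per-conformation cavity attributes.

```python
from itertools import combinations
from math import dist


def pairwise(points):
    """Full Euclidean distance matrix between feature vectors."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def linkage_distance(D, a, b, method):
    """Distance between clusters a and b (lists of indices into D)."""
    pair_distances = [D[i][j] for i in a for j in b]
    if method == "complete":   # farthest neighbor
        return max(pair_distances)
    if method == "average":    # UPGMA: unweighted mean over all pairs
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unsupported linkage: {method}")


def agglomerate(D, k, method="complete"):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(D, clusters[p[0]], clusters[p[1]], method),
        )
        # combinations yields a < b, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def medoid(D, cluster):
    """Member with the smallest total distance to the rest of its cluster."""
    return min(cluster, key=lambda i: sum(D[i][j] for j in cluster))


# Toy data (assumption): six "conformations" described by two features each.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]
D = pairwise(features)
clusters = agglomerate(D, k=2, method="complete")
representatives = [medoid(D, c) for c in clusters]
print(clusters, representatives)
```

This version recomputes every inter-cluster distance at each merge, so it is only suitable for small examples. Ward's method and WPGMA are omitted because they need an update rule on the merged clusters rather than a simple maximum or mean over pairs.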
The Complete method uses the maximum distance when merging a new object into a cluster, and is therefore more susceptible to noise and outliers. Recall that the first 500 conformations of the MD trajectory were eliminated because they constitute the equilibration phase. For this reason, the Complete method shows the lowest SQD values and number of clusters for the Cavity Attributes. Hence, we conclude that, owing to the farthest-neighbor method, the representative ensemble of MD conformations is composed of medoids belonging to compact clusters of approximately equal diameters.

The complexity of clustering algorithms is strongly related to the number n of data objects and the number k of clusters. Across all experiments, CLARA was the algorithm that required the longest execution time, considering an experiment in which the number of partitions ranges from 2 to 200. The time increased noticeably because the size of the sample grows proportionally to the number of clusters. This happens because k-medoids is more robust in the presence of noise and outliers. The complexity of computing and selecting a new medoid from the representative objects with the PAM algorithm is O(k(n-k)^2). The hierarchical agglomerative algorithms come second. They are expensive in terms of their computational and storage requirements. Agglomerative methods compute the proximity matrix, which needs O(n^2) time to store and keep track of the clusters. The total time required for these algorithms is O(n^2 log n), where log n is the additional complexity of keeping the data in a sorted list. In contrast to the hierarchical algorithms that have the quadratic asymptotic running time with respect to the number of o


Author: heme -oxygenase