Title: Geometric Invariant based framework for analysis of protein conformational space

Speaker: Ashish V Tendulkar

Designation: Research Scholar

Affiliation: K. R. School of IT, IIT Bombay

Date: February 26, 2008

Abstract: Proteins are versatile macromolecules in living organisms, involved in many processes such as catalysis, metabolism, signaling, and regulation. Proteins are made up of amino acids and fold into a three-dimensional (3D) structure. The 3D structure is determined by the amino acid sequence of the protein. Understanding the principles of the sequence-structure relationship has been a topic of considerable interest to the computational biology community. This talk focuses on visualization of the protein conformational space.

Characterization of the restricted nature of the protein local conformational space has remained a challenge, thereby necessitating a computationally expensive conformational search in protein modeling. Moreover, due to the lack of unilateral structural descriptors, conventional data-mining techniques such as clustering and classification have not been applied in protein structure analysis. We first map the local conformations into a fixed-dimensional space using a carefully selected suite of geometric invariants (GIs) and then reduce the number of dimensions via principal component analysis (PCA). The distribution of the conformations in the space spanned by the first four principal components (PCs) is visualized as a set of conditional bivariate probability distribution plots, in which the peaks correspond to the preferred conformations. The locations of the different canonical structures in the PC-space are interpreted in the context of the weights that the GIs contribute to the first four PCs. Clustering of the available conformations reveals that the number of preferred local conformations is several orders of magnitude smaller than previously suggested. Further, we use these classes to predict the local conformations of a protein solely from its sequence.
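
The pipeline described above (invariant mapping, PCA, bivariate density) can be sketched roughly as follows. This is only an illustrative sketch, not the speaker's implementation: the abstract does not say which geometric invariants are used, so pairwise inter-atom distances stand in for them here, and the fragment length of 8 atoms, the random synthetic coordinates, the 20-bin histogram, and all function names are assumptions made for the example.

```python
import numpy as np


def pairwise_distance_invariants(fragment):
    """Map an (n_atoms, 3) fragment to a fixed-length vector of inter-atom distances.

    Distances are unchanged by rotation and translation, so they serve here as a
    simple stand-in for the (unspecified) geometric invariants of the abstract.
    """
    fragment = np.asarray(fragment, dtype=float)
    i, j = np.triu_indices(len(fragment), k=1)
    return np.linalg.norm(fragment[i] - fragment[j], axis=1)


def project_onto_pcs(features, n_components=4):
    """Centre the feature matrix and project it onto its leading principal components."""
    centred = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    scores = centred @ vt[:n_components].T
    return scores, vt[:n_components], explained[:n_components]


# Synthetic stand-in data: 200 random fragments of 8 atoms each (an assumption;
# real input would be local backbone fragments taken from known structures).
rng = np.random.default_rng(0)
fragments = rng.normal(size=(200, 8, 3))

# Step 1: map every fragment into the same fixed-dimensional invariant space.
features = np.array([pairwise_distance_invariants(f) for f in fragments])

# Step 2: reduce to the first four principal components.
scores, loadings, explained = project_onto_pcs(features, n_components=4)

# Step 3: estimate a bivariate density over PC1 and PC2; peaks in such a
# density would correspond to preferred conformations.
density, pc1_edges, pc2_edges = np.histogram2d(
    scores[:, 0], scores[:, 1], bins=20, density=True
)
```

In this sketch the rows of `loadings` hold the weight of each invariant on each PC, which is the kind of quantity one would inspect to interpret where a structure falls in PC-space. A conditional plot, as mentioned in the abstract, could be approximated by restricting `scores` to a range of PC3 and PC4 values before building the histogram.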