Naeem Aslam

Proteins are important nutrients that play a key role in the growth of the human body. They are the building blocks of body tissue and are also a rich source of fuel. The structure of a protein plays a vital role in determining its function. The Protein Data Bank (PDB) is the most popular repository of three-dimensional structures of protein molecules. Protein structures are used in a large number of experiments, such as virtual screening, the design of site-directed mutagenesis experiments, rationalising the effects of sequence variations, and structure-based discovery of specific inhibitors.

Protein structure alignment is an approach in structural biology that aligns protein residues and determines the 3D superimposition of protein structures. It is very useful for comparing proteins that have low sequence similarity and is therefore a valuable tool for establishing evolutionary relationships between such proteins. Structural alignment can be applied both to pairwise comparison and to the comparison of multiple proteins. Like sequence alignment, structural alignment reveals equivalences between the residues of two proteins. The history of structural alignment begins in 1960, when Perutz et al. (1960) used the approach and showed that the structures of myoglobin and hemoglobin are similar even though their sequences differ. Since then, structural biologists have become increasingly interested in using structural similarity to infer the unknown function of a protein. Structural similarity is conserved more than sequence similarity; therefore, it can be used to trace evolutionary history. Systematic structural alignment started in the 1970s.

Structural alignment is conducted among known structures. In contrast to sequence alignment, which considers the distance between amino acid ‘types’, it is based on the Euclidean distance between the residues being compared. Structural alignment approaches are helpful in organising and classifying known structures and provide a gold standard for sequence alignment. Consequently, a large number of protein structure alignment methods have been developed.
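To make the distance-based comparison concrete, the following is a minimal sketch in Java, not the method used in SuitePST. It assumes that each residue is represented by one 3D coordinate (for example its C-alpha atom), that the residue pairs are already aligned, and that the two structures are already superimposed. The class and method names are hypothetical.

```java
final class ResidueDistanceSketch {

    /** Euclidean distance between two residues, each given as {x, y, z}. */
    static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /**
     * Root-mean-square deviation over equivalent residue pairs.
     * Assumes p[i] and q[i] are already aligned and superimposed;
     * no optimal rotation or translation is computed here.
     */
    static double rmsd(double[][] p, double[][] q) {
        if (p.length == 0 || p.length != q.length) {
            throw new IllegalArgumentException("need two non-empty, equal-length coordinate sets");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < p.length; i++) {
            double d = distance(p[i], q[i]);
            sumOfSquares += d * d;
        }
        return Math.sqrt(sumOfSquares / p.length);
    }

    public static void main(String[] args) {
        // Illustrative coordinates only, not taken from any real structure.
        double[][] first = {{0, 0, 0}, {1, 0, 0}};
        double[][] second = {{0, 0, 0}, {1, 1, 0}};
        System.out.println(rmsd(first, second));
    }
}
```

A lower RMSD indicates that the equivalent residues lie closer together in space, which is the sense in which structural alignment compares positions rather than amino acid types.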

The Protein Data Bank provides protein structures in a simple text format organised into various records and fields. Retrieving data from a text file is time-consuming and complicated. Some attempts have been made to organise and store protein structures in relational form, but they do not provide a systematic way to manipulate all the records of a protein structure. There is therefore a strong need for software that can display, compare and analyze protein molecules and provide a user-friendly view of them. The Institute of Biochemistry & Biotechnology, University of Veterinary and Animal Sciences, Lahore (IBBT-UVAS) took a leading step by developing software that provides such a view and allows the user to analyze the various components of a protein molecule. In view of the importance of protein molecules in various domains of molecular biology, SuitePST was developed as a package of tools for performing various activities relevant to the analysis and comparison of protein structures. One of the most important components of SuitePST is the relational protein databank (RPDB). RPDB is developed in MySQL, an open-source relational database management system. All PDB data has been transformed and stored in RPDB. The proposed database makes it convenient to retrieve and analyze the data of a single protein. For example, the user can obtain the header data, the journal in which the protein structure was published, the atom coordinates, sequence residues, helices, beta sheets, chains and any other information for a given PDB ID. RPDB also allows the user to compare various components of one protein structure with those of another. For example, a user can compare the secondary structure elements, atom coordinates or sequence residues of two given protein structures. Similarly, all these activities can be performed while comparing a single protein structure against multiple structures.
RPDB outperforms other relational databases in several respects: it allows a user to find comprehensive information about a single protein using a number of criteria; it supports searching for protein structures using a large number of records; it supports nested search criteria; it provides a user-friendly interface for applying relational operators to make a search query comprehensive; it allows a given protein structure to be compared with the whole database; and it allows two given protein structures to be compared on various criteria.
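The idea of nested search criteria combined with relational operators can be illustrated with a small sketch. It uses in-memory Java predicates rather than SQL against MySQL, and it does not reflect the actual RPDB schema or interface, which are not described here. The `Entry` fields, the identifiers and the values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class NestedSearchSketch {

    /** Hypothetical, simplified view of one stored structure. */
    static final class Entry {
        final String id;
        final double resolution;
        final int helixCount;
        final int chainCount;

        Entry(String id, double resolution, int helixCount, int chainCount) {
            this.id = id;
            this.resolution = resolution;
            this.helixCount = helixCount;
            this.chainCount = chainCount;
        }
    }

    /** Returns the ids of all entries that satisfy the given criteria, in input order. */
    static List<String> search(List<Entry> entries, Predicate<Entry> criteria) {
        List<String> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (criteria.test(e)) {
                hits.add(e.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up entries for illustration; these are not real PDB records.
        List<Entry> entries = Arrays.asList(
                new Entry("demo1", 1.8, 10, 1),
                new Entry("demo2", 1.5, 2, 4),
                new Entry("demo3", 3.2, 12, 2));

        Predicate<Entry> highResolution = e -> e.resolution <= 2.0;
        Predicate<Entry> manyHelices = e -> e.helixCount >= 8;
        Predicate<Entry> multiChain = e -> e.chainCount > 1;

        // Nested criterion: resolution <= 2.0 AND (helixCount >= 8 OR chainCount > 1)
        Predicate<Entry> nested = highResolution.and(manyHelices.or(multiChain));
        System.out.println(search(entries, nested));
    }
}
```

In a relational setting the same nesting would typically be expressed as a parenthesised `WHERE` clause; the predicate form is used here only to keep the sketch self-contained.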

A web interface to RPDB has also been provided. It allows the user to work with a single protein structure and to perform pairwise and multiple protein structure comparisons. The interface is designed using Hypertext Markup Language (HTML) and Cascading Style Sheets (CSS). The business logic is written in the Java programming language.

SuitePST also provides an implementation of the Efficient and Accurate Protein Structure Alignment (EAPSA) algorithm. The proposed algorithm filters aligned fragment pairs (AFPs) in two stages to maximise accuracy and uses the multithreading support of the Java programming language to maximise efficiency. Results showed that the accuracy of SuitePST was comparable to that of CE and DALI, whereas its efficiency was very close to that of PDBeFold. The application was developed using the Java programming language and BioJava, an open-source library for developing bioinformatics tools. The latest version of NetBeans, an integrated development environment (IDE), was used. SuitePST is available both as a desktop application and as a web application. It supports the generation of more than two pairwise alignments simultaneously and searches for structure neighbors. A divide-and-conquer approach was used to achieve efficiency. SuitePST provides color by deviation, editing and saving of alignments, and visualisation through integration with Jmol. It also supports converting protein structures from one format to another.
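Running several pairwise comparisons at the same time can be sketched with a fixed thread pool from the Java standard library. This is an illustration under stated assumptions, not the EAPSA implementation: the scoring function is a deliberately trivial placeholder (mean distance between same-index coordinates), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelPairwiseSketch {

    /** Placeholder score, NOT EAPSA: mean distance between same-index points. */
    static double placeholderScore(double[][] a, double[][] b) {
        int n = Math.min(a.length, b.length);
        if (n == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = a[i][0] - b[i][0];
            double dy = a[i][1] - b[i][1];
            double dz = a[i][2] - b[i][2];
            sum += Math.sqrt(dx * dx + dy * dy + dz * dz);
        }
        return sum / n;
    }

    /** Scores the query against every target, one task per pair, on a fixed-size pool. */
    static double[] scoreAgainstAll(double[][] query, List<double[][]> targets, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (double[][] target : targets) {
                Callable<Double> task = () -> placeholderScore(query, target);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order so scores[i] belongs to targets.get(i).
            double[] scores = new double[targets.size()];
            for (int i = 0; i < scores.length; i++) {
                scores[i] = futures.get(i).get();
            }
            return scores;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for comparisons", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a pairwise comparison failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each pairwise comparison is independent of the others, the tasks need no shared mutable state, which is what makes this kind of workload straightforward to spread across threads.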