Bioinformatics Miscellaneous


  1. An example of a derived protein structure database is










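    Protein structure databases: the primary database is PDB, which stores experimentally determined structures; secondary (derived) databases such as SCOP and CATH classify those structures.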

    Correct Option: B



  1. Match the entries in Group I with the entries in Group II.
    Group I          Group II
    P. Threading     1. Gene duplication
    Q. FASTA         2. Fold prediction
    R. Profile       3. HMM
    S. Paralogs      4. k-tuple











    Threading (fold recognition) is a method of three-dimensional protein structure prediction in which the query sequence is fitted, or "threaded", onto known structural folds to find the most compatible one. FASTA is a program for rapid alignment of pairs of protein or DNA sequences. Rather than comparing individual residues in the two sequences, FASTA looks for matching sequence patterns or words, called k-tuples, and then attempts to build a local alignment from these word matches. Profile HMMs turn a multiple sequence alignment into a position-specific scoring system suitable for searching databases for remotely homologous sequences, and so complement standard pairwise comparison methods in large-scale sequence analysis. Paralogs are genes related by duplication within a genome. Orthologs typically retain the same function in the course of evolution, whereas paralogs tend to evolve new functions, even if these are related to the original one. The matches are therefore P-2, Q-4, R-3, S-1.

    Correct Option: B

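    The k-tuple idea described for FASTA can be illustrated with a minimal sketch. This is not FASTA's actual implementation; the function names `ktuple_index` and `best_diagonal` and the default word size are assumptions made for illustration. It indexes the k-letter words of one sequence, looks up the words of the other, and counts hits per diagonal, which is where a local alignment would then be built.

```python
from collections import defaultdict


def ktuple_index(seq, k):
    """Map each k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonal(query, target, k=2):
    """Return (diagonal, hit_count) for the diagonal with most shared k-tuples.

    A diagonal is the offset (target position - query position); many word
    hits on one diagonal suggest a region worth aligning in detail.
    """
    index = ktuple_index(query, k)
    hits = defaultdict(int)
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            hits[j - i] += 1
    if not hits:
        return None, 0
    # Prefer the most hits; break ties in favour of the smaller offset.
    diagonal = max(hits, key=lambda d: (hits[d], -abs(d)))
    return diagonal, hits[diagonal]


if __name__ == "__main__":
    print(best_diagonal("ACDEFG", "XXACDEFG", k=2))
```

    Larger values of k make the lookup faster but less sensitive, since fewer exact word matches are found between diverged sequences.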



  1. Determine the correctness or otherwise of the following Assertion (A) and Reason (R).
    Assertion: The UPGMA method produces an ultrametric tree.
    Reason: Sequence alignment is converted into evolutionary distances in the UPGMA method.










    The unweighted pair-group method with arithmetic mean (UPGMA) is a popular distance-based method and one of the simplest for constructing trees. The sequence alignment is first converted into pairwise evolutionary distances, and the closest clusters are then merged step by step. The tree UPGMA produces is ultrametric: all terminal nodes (the sequences or taxa) are equally distant from the root. In molecular terms this amounts to the molecular clock hypothesis, i.e. that mutations accumulate at the same constant rate on all lineages. This is the method's main disadvantage, because in reality the individual branches are very unlikely to share one mutation rate.

    Correct Option: B

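    The ultrametric property can be checked on a small example. The following is a minimal sketch of UPGMA, assuming a symmetric distance matrix given as a nested list and at least two taxa; the function names `upgma` and `leaf_depths` and the toy distances are illustrative assumptions, not taken from any library.

```python
def upgma(names, dist):
    """Build a UPGMA tree from a symmetric distance matrix (nested list).

    Leaves are strings; internal nodes are (left, right, height) tuples,
    where height is the node's distance above the leaves.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)           # closest pair of clusters
        height = d[(a, b)] / 2.0           # new node sits halfway between them
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Size-weighted mean = arithmetic mean over all leaf pairs.
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        clusters[next_id] = ((tree_a, tree_b, height), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


def leaf_depths(node, parent_height, depth=0.0):
    """Sum branch lengths from the root down to every leaf."""
    if isinstance(node, str):
        return {node: depth + parent_height}
    left, right, height = node
    depth += parent_height - height        # branch from parent to this node
    out = leaf_depths(left, height, depth)
    out.update(leaf_depths(right, height, depth))
    return out


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    dist = [[0, 2, 6, 10],
            [2, 0, 6, 10],
            [6, 6, 0, 10],
            [10, 10, 10, 0]]
    tree = upgma(names, dist)
    print(leaf_depths(tree, tree[2]))      # every leaf has the same depth
```

    Because each new node is placed at half the distance between the clusters it joins, every root-to-leaf path sums to the root height, whatever the input distances were.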


  1. The amino acid substitution matrices in decreasing order of stringency for comparing protein sequences are










    Substitution matrices such as PAM100, PAM120 and PAM250 are constructed from the observed frequencies of amino acid replacements in large samples of protein sequences. The score for a given replacement is proportional to the logarithm of the frequency with which that replacement was observed, relative to the frequency expected by chance. One PAM unit is the amount of sequence divergence corresponding to an average of one accepted point mutation per 100 residues (1%). A PAM100 matrix is PAM1 extrapolated to 100 accepted mutations per 100 residues; because some positions change more than once and others not at all, this does not mean that every position differs. Lower-numbered matrices such as PAM100 are suited to closely related sequences, whereas PAM250 is often preferred for database searches, since larger databases tend to contain more distantly related sequences. The decreasing order of stringency of point accepted mutation matrices is therefore PAM100 > PAM120 > PAM250.

    Correct Option: B

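    A toy calculation helps show why a larger PAM number means a less stringent matrix. The sketch below is not the Dayhoff model: it assumes, purely for illustration, that every residue changes with probability 1% per PAM step and that all 19 alternative residues are equally likely. The function name `expected_identity` is hypothetical.

```python
def expected_identity(pam_distance, rate=0.01, alphabet=20):
    """Expected fraction of positions still matching the ancestor.

    Toy model (an assumption, not the Dayhoff matrices): at each PAM step a
    residue changes with probability `rate`, uniformly to any other residue.
    """
    p_same = 1.0
    for _ in range(pam_distance):
        # A changed position can mutate back to the original residue.
        back = (1.0 - p_same) * rate / (alphabet - 1)
        p_same = p_same * (1.0 - rate) + back
    return p_same


if __name__ == "__main__":
    for n in (100, 120, 250):
        print(n, round(expected_identity(n), 3))
```

    Even under this crude model the expected identity falls steadily from PAM100 to PAM250 without reaching zero at 100 PAM, because repeated and reverse changes at the same position are counted as separate mutations.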



  1. In an affine gap penalty model, if the gap opening penalty is -20, gap extension penalty is -4 and gap length is 8, the gap score is ___.










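    Under the convention in which the extension penalty is charged for every gap position:
    Gap score = Gap opening penalty + (Gap extension penalty × Gap length)
    Here, Gap opening penalty = –20,
    Gap extension penalty = –4 and gap length = 8
    Therefore, Gap score = –20 + (–4 × 8) = –52
    (If the opening penalty is taken to cover the first gap position, the extension applies to only 7 positions and the score is –48.)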

    Correct Option: C

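    The affine gap calculation can be sketched in a few lines. The function name `affine_gap_score` and the `open_includes_first` flag are assumptions introduced here to cover the two common conventions: extension charged for every gap position, or the opening penalty covering the first position.

```python
def affine_gap_score(gap_open, gap_extend, length, open_includes_first=False):
    """Score a single gap of `length` positions under an affine penalty.

    open_includes_first=False: open + extend * length
    open_includes_first=True:  open + extend * (length - 1)
    """
    if length <= 0:
        return 0
    extended = length - 1 if open_includes_first else length
    return gap_open + gap_extend * extended


if __name__ == "__main__":
    print(affine_gap_score(-20, -4, 8))                            # -52
    print(affine_gap_score(-20, -4, 8, open_includes_first=True))  # -48
```

    Alignment programs differ on which convention they use, so it is worth checking the documentation of a given tool before comparing scores.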