Bioinformatics Miscellaneous Difficult Questions and Answers

An example of a derived protein structure database is

1. Pfam
1. SCOP
1. GEO
1. Prosite

Correct Option: B

PDB is the primary protein structure database. SCOP and CATH are secondary (derived) structure databases that classify the structures deposited in PDB. Of the other options, Pfam and Prosite are protein family and motif databases, and GEO is a gene expression repository.

Match the entries in Group I with the entries in Group II.

Group I	Group II
P. Threading	1. Gene duplication
Q. FASTA	2. Fold prediction
R. Profile	3. HMM
S. Paralogs	4. k-tuple

1. P-2, Q-1, R-3, S-4
1. P-2, Q-4, R-3, S-1
1. P-3, Q-4, R-2, S-1
1. P-1, Q-4, R-3, S-2

Correct Option: B

P-2: Threading predicts the three-dimensional structure of a protein by fitting ("threading") its sequence onto known structural folds and scoring how well it fits each one, so it is a fold prediction method.
Q-4: FASTA is a program for rapid pairwise alignment of protein or DNA sequences. Rather than comparing individual residues in the two sequences, it first looks for short matching words, called k-tuples, and then attempts to build a local alignment from these word matches.
R-3: Profile HMMs turn a multiple sequence alignment into a position-specific scoring system suitable for searching databases for remotely homologous sequences. Profile HMM analyses complement standard pairwise comparison methods for large-scale sequence analysis.
S-1: Paralogs are genes related by duplication within a genome. Orthologs typically retain the same function in the course of evolution, whereas paralogs tend to evolve new functions, even if these are related to the original one.
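The k-tuple word matching that FASTA starts from can be sketched in a few lines of Python. This is only an illustration of the first (word lookup) stage, not the FASTA program itself: the function name `ktuple_diagonals`, the example sequences and the choice k = 2 are assumptions made here, and the later stages (rescoring with a substitution matrix, joining regions, banded alignment) are left out.

```python
from collections import defaultdict

def ktuple_diagonals(query, subject, k=2):
    """Count shared k-tuples per diagonal (subject offset minus query offset)."""
    # Index every k-tuple of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # Look up every k-tuple of the subject and tally hits per diagonal.
    hits = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            hits[j - i] += 1
    return dict(hits)

# Hypothetical example sequences.
hits = ktuple_diagonals("HEAGAWGHEE", "PAWHEAE", k=2)
best = max(hits, key=hits.get)
print("hits per diagonal:", hits)
print("diagonal with most word matches:", best)
```

Diagonals that collect many word hits are the candidate regions in which a local alignment would then be built.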

Determine the correctness or otherwise of the following Assertion (A) and Reason (R).
Assertion (A): The UPGMA method produces an ultrametric tree.
Reason (R): In the UPGMA method, the sequence alignment is converted into evolutionary distances.

1. Both (A) and (R) are true and (R) is the correct reason for (A)
1. Both (A) and (R) are true and (R) is not the correct reason for (A)
1. (A) is true but (R) is false
1. (A) is false but (R) is true

Correct Option: B

The unweighted pair-group method with arithmetic mean (UPGMA) is a simple and popular distance-based method for constructing trees. It produces an ultrametric tree, meaning that all terminal nodes (i.e. the sequences/taxa) are equidistant from the root, so (A) is true. The method starts by converting the sequence alignment into pairwise evolutionary distances, so (R) is also true. However, that conversion is common to distance-based methods in general and is not what makes the tree ultrametric. UPGMA places each new internal node at half the average distance between the two clusters it joins, which amounts to assuming a molecular clock, i.e. that the rate of mutation is constant over time and the same on all lineages. This assumption is the great disadvantage of UPGMA, because in reality the individual branches are very unlikely to have the same mutation rate.
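The ultrametric property can be checked on a small example. The following is a minimal UPGMA sketch in Python, not a reference implementation: the function names `upgma` and `leaf_depths`, the nested-tuple tree layout and the four-taxon distance table are all assumptions chosen for illustration. The distances are deliberately not clock-like, yet every leaf still ends up at the same distance from the root.

```python
from itertools import combinations

def upgma(names, dist):
    """Build a UPGMA tree. Leaves are names; internal nodes are
    (left_subtree, left_branch_length, right_subtree, right_branch_length)."""
    # cluster key -> (subtree, number of leaves, node height)
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {frozenset((a, b)): float(dist[a][b]) for a, b in combinations(names, 2)}
    while len(clusters) > 1:
        a, b = sorted(min(d, key=d.get))          # closest pair of clusters
        (ta, na, ha), (tb, nb, hb) = clusters.pop(a), clusters.pop(b)
        height = d.pop(frozenset((a, b))) / 2.0   # new node sits halfway
        new = f"({a},{b})"
        # Distance to every other cluster: size-weighted arithmetic mean.
        for c in clusters:
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((new, c))] = (na * dac + nb * dbc) / (na + nb)
        clusters[new] = ((ta, height - ha, tb, height - hb), na + nb, height)
    (tree, _, _), = clusters.values()
    return tree

def leaf_depths(tree, depth=0.0):
    """Return {leaf name: distance from the root}."""
    if isinstance(tree, str):
        return {tree: depth}
    left, left_len, right, right_len = tree
    depths = leaf_depths(left, depth + left_len)
    depths.update(leaf_depths(right, depth + right_len))
    return depths

# Hypothetical pairwise distances (upper triangle only).
names = ["A", "B", "C", "D"]
dist = {"A": {"B": 2, "C": 5, "D": 10}, "B": {"C": 7, "D": 10}, "C": {"D": 9}}
depths = leaf_depths(upgma(names, dist))
print(depths)
```

In the input, A and B differ in their distance to C (5 versus 7), but the averaging step hides that difference, which is how the molecular clock assumption enters the tree.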

The amino acid substitution matrices in decreasing order of stringency for comparing protein sequences are

1. PAM250, PAM120, PAM100
1. PAM100, PAM120, PAM250
1. PAM250, PAM100, PAM120
1. PAM120, PAM250, PAM100

Correct Option: B

Substitution matrices like PAM100, PAM120 and PAM250 are constructed from the observed frequencies of amino acid replacements in alignments of closely related protein sequences. The score for a given replacement is a log-odds value: the logarithm of how often that replacement was observed relative to how often it would be expected by chance. One PAM (point accepted mutation) unit is defined as the amount of sequence divergence corresponding to one accepted replacement per 100 residues. Higher-numbered matrices are extrapolated from PAM1, so PAM100 corresponds to 100 accepted replacements per 100 residues. This does not mean that every position has changed, because some positions are replaced more than once while others are not replaced at all. Low-numbered matrices such as PAM100 are appropriate for closely related sequences and are the most stringent. For most database searches, a PAM250 matrix is often preferred, since larger databases tend to contain sets of more distantly related sequences. So the decreasing order of stringency of the PAM matrices is: PAM100 > PAM120 > PAM250.
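To see why 100 PAM units do not imply that every position has changed, a toy calculation helps. The sketch below assumes a deliberately simplified model in which each PAM step replaces 1% of residues and every one of the other 19 amino acids is equally likely; real PAM matrices use observed, non-uniform replacement frequencies, so the numbers are only indicative. The function name `expected_identity` and its parameters are hypothetical.

```python
def expected_identity(pam_distance, n_residue_types=20, step_change=0.01):
    """Probability that a position shows its original residue after
    `pam_distance` steps of a uniform toy replacement model."""
    p_same = 1.0
    # Chance that a changed position mutates back to the original residue.
    back_rate = step_change / (n_residue_types - 1)
    for _ in range(pam_distance):
        p_same = p_same * (1.0 - step_change) + (1.0 - p_same) * back_rate
    return p_same

for n in (100, 120, 250):
    print(f"PAM{n}: about {expected_identity(n):.0%} of positions unchanged (toy model)")
```

Even in this crude model the expected identity falls steadily from PAM100 to PAM250, which matches the ordering of stringency given in the answer.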

In an affine gap penalty model, if the gap opening penalty is -20, gap extension penalty is -4 and gap length is 8, the gap score is ___.

1. -30 to -5
1. -40 to -40
1. -52 to -52
1. -65 to -40

Correct Option: C

Gap score = Gap opening penalty + (Gap extension penalty × Gap length)
Here, Gap opening penalty = –20, Gap extension penalty = –4 and Gap length = 8
Therefore, Gap score = –20 + (–4 × 8) = –52

This uses the convention in which every gap position, including the first, is charged the extension penalty. Under the other common convention, in which the opening penalty already covers the first gap position, the score would be –20 + (–4 × 7) = –48.
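The affine gap calculation can be written as a small Python helper. This is a sketch under stated assumptions: the function name `affine_gap_score` and the `charge_first_extension` flag are made up here to show the two common conventions for an affine gap (opening penalty plus extension for every gap position, or opening penalty covering the first position).

```python
def affine_gap_score(gap_open, gap_extend, length, charge_first_extension=True):
    """Score of a single gap of `length` positions under an affine model.

    Penalties are passed as negative numbers, as in the question.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0  # no gap, no penalty
    # Number of positions that are charged the extension penalty.
    extended = length if charge_first_extension else length - 1
    return gap_open + gap_extend * extended

print(affine_gap_score(-20, -4, 8))
print(affine_gap_score(-20, -4, 8, charge_first_extension=False))
```

With the values from the question, the default convention gives the –52 expected by the answer key.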
