L10&11 - Bioinformatics Flashcards

1
Q

What is bioinformatics?

A

“Science of storing, retrieving and analysing large amounts of biological data”

Combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data

2
Q

What are protein domains?

A

Domains are distinct functional or structural units within a protein sequence and may contain one or more motifs (short, recurring sequence patterns in a protein)

3
Q

What are protein families?

A

A group of proteins that share a common evolutionary origin, reflected by their related functions and similarities in sequence or structure

4
Q

What is a database?

A

A structured set of data held in a computer, especially one that is accessible in many ways

5
Q

Scoring when comparing sequences

A

Scores:
2 for a match
0 for a mismatch
-1 for an insert (gap)
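
A minimal sketch of how this scheme scores a pair of sequences that have already been aligned. The function name `score_alignment` and the use of "-" as the gap character are assumptions for illustration, not part of the original card.

```python
# Scoring scheme from the card: match 2, mismatch 0, gap -1.
MATCH, MISMATCH, GAP = 2, 0, -1


def score_alignment(a: str, b: str) -> int:
    """Score two already-aligned sequences column by column ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += GAP        # a gap in either sequence
        elif x == y:
            total += MATCH      # identical residues
        else:
            total += MISMATCH   # different residues
    return total


# 5 matches, 1 mismatch, 1 gap: 5*2 + 0 - 1 = 9
print(score_alignment("GATTACA", "GA-TCCA"))
```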

6
Q

Difference between local and global alignments

A

Global = aligns the sequences across their full length, treating each sequence as one unit

Local = finds regions of similarity within parts of the sequences
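
The difference can be sketched with the two classic dynamic-programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local). The cards do not name these algorithms, so treat this as an illustrative assumption; the default scores reuse the match 2 / mismatch 0 / gap -1 scheme from the scoring card, and the function names are hypothetical.

```python
def global_score(a, b, match=2, mismatch=0, gap=-1):
    """Best score for aligning ALL of a against ALL of b (Needleman-Wunsch)."""
    prev = [j * gap for j in range(len(b) + 1)]    # first row: leading gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap]                            # first column: leading gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=2, mismatch=0, gap=-1):
    """Best score over any pair of sub-regions of a and b (Smith-Waterman)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, cur[j - 1] + gap)  # 0 = restart
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


# The shared region ACGT sits at opposite ends of the two sequences.
print(local_score("CCCCACGT", "ACGTGGGG"))   # 8: the four ACGT matches
print(global_score("CCCCACGT", "ACGTGGGG"))  # lower: the flanks must be aligned too
```

The only structural differences are the zero floor in each cell and taking the best cell anywhere in the table, which is what lets a local alignment ignore the dissimilar flanking regions.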

7
Q

What is BLAST?

A

Basic Local Alignment Search Tool

Theory: searches for short, fixed-length segments (words) shared by the query and a database sequence; each hit is then extended in both directions, and extended segment pairs that score above a pre-set threshold are reported

Lots of different programs available
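
A toy sketch of the seed-and-extend idea, not BLAST itself. It assumes exact word matches as seeds, ungapped extension, and a mismatch score of -1 (rather than the 0 on the scoring card) so that extension can stop when the score drops. The function names and the `w`, `drop` and `threshold` values are hypothetical.

```python
def find_seeds(query, subject, w=3):
    """Yield (query_pos, subject_pos) for every exact shared word of length w."""
    index = {}
    for s in range(len(subject) - w + 1):
        index.setdefault(subject[s:s + w], []).append(s)
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            yield q, s


def extend_one_way(query, subject, q, s, step, match, mismatch, drop):
    """Extend without gaps in one direction; return (best gain, length used)."""
    score = best = best_len = length = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > drop:      # fallen too far below the best: stop
            break
        q += step
        s += step
    return best, best_len


def toy_blast(query, subject, w=3, match=2, mismatch=-1, drop=3, threshold=8):
    """Return sorted (score, query_start, query_end, subject_start) hits."""
    hits = set()
    for q, s in find_seeds(query, subject, w):
        right, rlen = extend_one_way(query, subject, q + w, s + w, 1, match, mismatch, drop)
        left, llen = extend_one_way(query, subject, q - 1, s - 1, -1, match, mismatch, drop)
        score = w * match + left + right
        if score >= threshold:         # keep only pairs above the threshold
            hits.add((score, q - llen, q + w + rlen, s - llen))
    return sorted(hits, reverse=True)


print(toy_blast("TTGACGTACC", "AAGACGTAGG"))  # [(12, 2, 8, 2)] -> GACGTA
```

Several seeds fall inside the same shared region here; they all extend to the same segment pair, so the set keeps a single hit.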

8
Q

Programs available in BLAST

A

blastp
blastn
blastx
tblastn
tblastx

9
Q

blastp

A

An amino acid query sequence against a protein database

10
Q

blastn

A

A nucleotide query sequence against a nucleotide sequence database

11
Q

blastx

A

A nucleotide query sequence translated in all reading frames against a protein sequence database
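
"All reading frames" means three frames on the forward strand and three on the reverse complement. A minimal sketch of that translation step, assuming the standard genetic code and an input containing only A, C, G and T; the function names are hypothetical and this is not how BLAST is implemented internally.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the six translations keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)   # frame +1 is MA* (* marks a stop codon)
```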

12
Q

tblastn

A

A protein query sequence against a nucleotide sequence database dynamically translated in all reading frames

13
Q

tblastx

A

The six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
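
The five program cards can be summarised as a lookup from query type and database type to program name. This helper is a hypothetical study aid, not part of any BLAST distribution; the argument names and the `translate_both` flag are assumptions.

```python
# (query type, database type) -> program, as described on the cards above.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",     # query translated
    ("protein", "nucleotide"): "tblastn",    # database translated
}


def choose_blast_program(query_type, db_type, translate_both=False):
    """Pick a program name; translate_both selects tblastx for DNA vs DNA."""
    key = (query_type, db_type)
    if translate_both:
        if key != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"                     # both sides translated
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unknown combination: {key}")
    return BLAST_PROGRAMS[key]


print(choose_blast_program("nucleotide", "protein"))   # blastx
```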

14
Q

What is the expect (E) value in BLAST?

A

A parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size

Similar to a p value
The smaller the E value, the more significant the hit

Manually raising the E value threshold returns more hits, but the extra hits are more likely to be chance alignments of poor quality
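
The dependence on database size can be sketched with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length, n the database length and S the raw alignment score. The formula is not on the original card, and the `lam` and `k` defaults below are placeholder assumptions: real values depend on the scoring matrix and gap penalties.

```python
import math


def expect_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """E = K * m * n * exp(-lambda * S); lam and k are illustrative only."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_from_e(e):
    """Probability of at least one chance hit: P = 1 - exp(-E)."""
    return -math.expm1(-e)


small_db = expect_value(raw_score=50, query_len=200, db_len=1_000_000)
large_db = expect_value(raw_score=50, query_len=200, db_len=2_000_000)

print(large_db / small_db)   # doubling the database doubles E for the same score
print(p_from_e(0.01))        # for small E, P is almost equal to E
```

This is why the same alignment score is less significant in a larger database, and why E and the p value are nearly interchangeable when E is small.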

15
Q

Other parameters in BLAST

A

Low complexity filter - masks regions of low complexity (high-frequency simple repeats), which can be problematic because they produce spurious alignments

Many other settings can be altered for specific searches, which makes BLAST very flexible

16
Q

What are accession numbers?

A

Unique identifier given to a DNA or protein sequence record to allow for tracking of different versions of that sequence record

Format: a prefix (NM, XM, NP or XP), an underscore and a number, then a dot and the version number (e.g. .1)

Aim to work with the most complete sequence

17
Q

What does NM mean in an accession number?

A

mRNA sequence

18
Q

What does XM mean in an accession number?

A

Predicted mRNA (from genome analysis)

19
Q

What does NP mean in an accession number?

A

Protein sequence

20
Q

What does XP mean in an accession number?

A

Predicted protein sequence (from genome analysis)
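
A small sketch that splits an accession into prefix, number and version, using the four prefixes from the cards above. It assumes the RefSeq layout of prefix, underscore, digits and an optional dot-version; the function name and the example accession are illustrative only.

```python
import re

# Prefix meanings taken from the cards above.
PREFIX_MEANING = {
    "NM": "mRNA sequence",
    "XM": "predicted mRNA (from genome analysis)",
    "NP": "protein sequence",
    "XP": "predicted protein sequence (from genome analysis)",
}

# Prefix, underscore, digits, then an optional ".version".
ACCESSION_RE = re.compile(r"^(NM|XM|NP|XP)_(\d+)(?:\.(\d+))?$")


def parse_accession(accession):
    """Return (prefix, number, version or None, meaning) for an accession."""
    m = ACCESSION_RE.match(accession.strip())
    if m is None:
        raise ValueError(f"not a recognised accession: {accession!r}")
    prefix, number, version = m.groups()
    return prefix, number, int(version) if version else None, PREFIX_MEANING[prefix]


print(parse_accession("XP_123456.2"))
# ('XP', '123456', 2, 'predicted protein sequence (from genome analysis)')
```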