Bioinformatics Flashcards

(56 cards)

0
Q

Given a gene, how would you find information about it?

A
Find the sequence (EMBL, DDBJ)
Literature databases
Genomics databases (OMIM)
Gene expression databases (NCBI GEO)
Interaction databases (IntAct, BIND)
Metabolic pathway databases (ENZYME, KEGG, Reactome)
Mutation/polymorphism databases (dbSNP)
1
Q

What is a database?

A

Data collection that is structured, searchable, updatable, cross-linked and publicly available.

2
Q

Why does BLAST work?

A

Similar sequences tend to have similar function

Similar sequences tend to be evolutionarily related

3
Q

How can you be sure your BLAST match is significant?

A

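E-value (expected number of matches this good arising by chance; for small values it is roughly equal to the probability of a chance match)

E = m × n × 2^(-S)
m - number of nucleotides your sequence was compared against
n - number of nucleotides in your sequence
2^(-S) - 2 to the power of minus the match score S (gets smaller as the sequences get more similar)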
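
The formula is easy to try out. A minimal sketch, assuming S is the bit score reported by BLAST; the function name and the example numbers are made up for illustration.

```python
def blast_e_value(m, n, bit_score):
    """Expected number of chance hits: E = m * n * 2^(-S)."""
    return m * n * 2.0 ** (-bit_score)


# Hypothetical search: 100-nucleotide query against 1,000,000 nucleotides.
for s in (20, 40, 60):
    print(s, blast_e_value(m=1_000_000, n=100, bit_score=s))
```

Each extra bit of score halves E, so a higher-scoring (more similar) match is less likely to be a chance hit.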

4
Q

Which BLAST program for amino acid sequences?

Which BLAST program for nucleotide sequences?

A

BLASTp (protein query against a protein database)

BLASTn (nucleotide query against a nucleotide database)

5
Q

What is BLASTx?

A

Translates a nucleotide query in all 6 reading frames and searches the translations against an amino acid sequence database
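
Six-frame translation itself is simple to sketch. This is an illustration of the idea, not BLAST's own code; it assumes the standard genetic code, and the function names are invented for the example. Codons containing anything other than A, C, G, T become "X".

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for label, protein in six_frame_translation("ATGGCCTAA").items():
    print(label, protein)
```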

6
Q

What does tBLASTn do?

A

Searches an amino acid query against a nucleotide database that has been translated in all 6 reading frames

7
Q

What does tBLASTx do?

A

Translates your nucleotide query in all six frames and compares it against database nucleotide sequences that are also translated in all six frames

Good for distantly related sequences.

8
Q

MegaBLAST

A

Faster than BLASTn but less sensitive; intended for highly similar nucleotide sequences

Use it for routine nucleotide searches unless you are looking for distantly related sequences (use tBLASTx for that)

9
Q

PSI-BLAST

A

Very sensitive, iterative BLAST that builds a position-specific scoring matrix from its hits, so it takes into account that some regions are more conserved than others. Takes a long time to run.

10
Q

What is special about multiple sequence alignments?

A

They can reveal subtle conservation of genomic features, because functionally important regions evolve/change more slowly. Alignments of more than 3 sequences can show evolutionary relationships.
E.g. demographic and ecological histories of populations: gene flow, size changes, natural selection, migrations.

11
Q

Local vs global alignments

A

Global - aligns the sequences end to end, over their full length

Local - aligns only the best-matching regions of the sequences
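
The two modes differ in only a few lines of the scoring recurrence. A minimal sketch, assuming illustrative scores (match +1, mismatch -1, gap -2) instead of a real substitution matrix. The global mode follows the Needleman-Wunsch scheme and the local mode the Smith-Waterman scheme. Only the score is returned, not the alignment.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best global (default) or local alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so the borders are not free.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local: a negative running score is reset, restarting the alignment.
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)
    # Global reads the bottom-right cell; local takes the best cell anywhere.
    return best if local else score[-1][-1]


x, y = "TTTGATTACATTT", "CCGATTACACC"
print("global:", alignment_score(x, y))
print("local: ", alignment_score(x, y, local=True))
```

The local score ignores the unrelated flanking bases, which is why local alignment suits finding a shared region inside otherwise dissimilar sequences.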

12
Q

Common mismatch scoring schemes

A
Nucleotide match/mismatch scores
Amino acid substitution matrices (BLOSUM, PAM)
13
Q

How are most multiple alignments done?

A

Multiple alignments are built up from pairwise alignments. Mismatch scores are used to find the best-scoring alignment, which is computed with a technique called dynamic programming.

14
Q

Pairwise alignment methods

A

ClustalW - global alignment, up to 20 kb long
MUSCLE - global and local alignment, up to 100 kb long
Mauve - global alignment, up to 10 Mb long

15
Q

Uses of sequence databases in bioinformatics

A
Retrieve a known gene sequence
Find information on a gene
Compare a sequence to others in the database
Submit a sequence to be stored with the rest
Find out how many genes an organism has
16
Q

Why is gene prediction harder in humans than in bacteria?

A

Bacteria have specific, well-understood promoter sequences (easy to identify). Each protein-coding sequence is one contiguous ORF.

Human promoters are complex and less well understood (harder to identify). Protein-coding sequence is divided into exons, which are spliced variably.

17
Q

Why would you want to know the GC content of a sequence?

A

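Higher GC content is generally associated with longer protein-coding regions.
It affects the melting temperature for PCR (GC-rich DNA melts at a higher temperature).
Different organisms have different GC content.
Useful for mapping exon-rich regions.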
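
GC content is a one-line calculation. A minimal sketch with illustrative function names. The melting temperature uses the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short primers.

```python
def gc_content(seq):
    """Fraction of bases that are G or C (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rough primer melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


primer = "ATGCGCGTTA"  # made-up primer, for illustration only
print(gc_content(primer), wallace_tm(primer))
```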

18
Q

Which genes are more homologous, this one or that one?

A

You cannot quantify homology. It is a conceptual framework for defining the evolutionary relationship between two genes: either they share a common ancestor or they do not. You can quantify similarity. If the genes come from different species you can look at orthology.
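
Similarity, unlike homology, can be put into a number. A minimal sketch of percent identity, assuming two sequences that are already aligned to equal length with "-" for gaps. Columns containing a gap are skipped here, which is one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


print(percent_identity("ACGTAC-T", "ACGAACGT"))
```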

19
Q

Why is bioinformatics needed?

A
Small- and large-scale analysis
New lab techniques
Shift from single genes to whole genomes
Collection/storage of data
Manipulation of data
20
Q

Examples of sequence databases

A

EMBL
DDBJ
GenBank

21
Q

What do genomics databases contain?

A

Info about gene chromosomal location
Nomenclature
Links to sequence databases

E.g. OMIM (Online Mendelian Inheritance in Man)

22
Q

What is an isoform?

A

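An alternative form of a sequence: a different transcript or protein produced from the same gene, e.g. by alternative splicing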

23
Q

Examples of gene expression databases

24
How do you remove vector sequence from a DNA sample sequence?
Search the sequence against a vector sequence database, e.g. UniVec
25
How do you choose the most likely translation result?
Usually the longest ORF
Starting with Met
Ending in a stop codon
No stops within the sequence
Confirm with promoter prediction
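The first four criteria can be sketched in code. This is a simplified illustration with invented function names, assuming ATG as the only start codon and the three standard stop codons. It scans all six frames and does not attempt promoter prediction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_in_strand(seq):
    """Yield each ATG...stop ORF (stop included) in the 3 frames of one strand."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first Met opens the ORF
            elif start is not None and codon in STOP_CODONS:
                yield seq[start:i + 3]  # first in-frame stop closes it
                start = None


def longest_orf(seq):
    """Longest ORF over both strands, or '' if there is none."""
    seq = seq.upper()
    candidates = list(orfs_in_strand(seq))
    candidates += list(orfs_in_strand(reverse_complement(seq)))
    return max(candidates, key=len, default="")


print(longest_orf("CCATGAAATAGCC"))
```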
26
Examples of gene prediction software
GeneMark | GENSCAN
27
Translation and promoter prediction software
NCBI ORF Finder | Promoter 2.0 Prediction Server
28
Protein sequence databases
UniProt
GenPept
RefSeq
29
Database of 3D structures
Protein Data Bank (PDB)
30
Protein domain/family databases are integrated into which site?
InterPro
31
What is a motif?
A sequence of amino acids associated with a certain molecular function
Short = motif
Long = functional domain
32
Short linear motifs
Unrelated proteins that share a functional feature are likely to contain similar motifs
33
Classification of motifs
Modification
Ligand
Targeting
Cleavage
34
What is a regular expression?
A pattern that specifies which amino acids are allowed at each position of a motif
Used by PROSITE
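PROSITE-style patterns map closely onto ordinary regular expressions. A minimal sketch of that mapping, covering only the common syntax (x, [..], {..}, repeat counts, and the < and > anchors). The converter function is invented for illustration. The example pattern is the N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}  -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2) -> x{2}
        element = element.replace("x", ".")                     # x    -> any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)

# Made-up protein fragment, scanned for matches to the pattern.
for hit in re.finditer(regex, "MKNGSALNPTQ"):
    print(hit.start(), hit.group())
```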
35
BioEdit analysis for cloning
Nucleotide composition
Six-frame translation
Determine ORF
Length of insert/DNA
Restriction enzyme (RE) mapping
36
Transition vs transversion
A transition is purine to purine or pyrimidine to pyrimidine (e.g. A to G, T to C). A transversion is the opposite: purine to pyrimidine or pyrimidine to purine (e.g. A to C, G to T). Twice as many transversions are possible, but about twice as many transitions occur.
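The classification is easy to check in code. A minimal sketch with illustrative names; counting all 12 possible single-base changes confirms the "twice as many transversions possible" statement.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Tally every ordered pair of distinct bases.
kinds = [classify_substitution(a, b) for a, b in permutations("ACGT", 2)]
print(kinds.count("transition"), kinds.count("transversion"))
```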
37
Types of sequence formats
FASTA
GenBank
NEXUS
PHYLIP
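FASTA is the simplest of these: a ">" header line followed by one or more sequence lines. A minimal reader sketch, with an invented function name and made-up records; real files are usually handled with an established library instead.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()  # header text without the '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(chunks) for name, chunks in records.items()}


example = ">seq1 demo record\nACGTAC\nGGTT\n>seq2\nTTGACA\n"
for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```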
38
Types of sequence viewers
SeaView
AliView
Mesquite
MEGA
39
What is an open reading frame?
A string of in-frame codons that specifies an amino acid sequence
Starts with ATG (Met), or occasionally an alternative start codon such as GTG (Val)
Ends with a stop codon
40
Gene prediction software
GeneMark GENSCAN microbial Gene Prediction System Glimmer
41
What are promoters?
DNA sequence involved in regulating transcription
42
Types of promoters
- core
- proximal
- distal
43
Functions of promoters
- integrate information about cell conditions and alter the rate of transcription in response
- different components are responsible for different parts of the expression pattern
44
Tasks of bioinformatics
- identify promoter regions
- find TFBS and TFBS modules in a sequence
- discover novel TFBS motifs
- construct TFBS and their motifs
- analysis of expression data
45
How to represent TFBS motifs
- consensus sequence
- position weight matrix
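Both representations can be derived from the same set of aligned binding sites. A minimal sketch, assuming equal-length uppercase sites, a uniform background of 0.25 per base and a pseudocount of 1. The example sites and function names are made up for illustration.

```python
import math


def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """One dict per position, mapping each base to its log2-odds score."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for column in zip(*sites):
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def consensus(pwm):
    """Highest-scoring base at each position."""
    return "".join(max(column, key=column.get) for column in pwm)


def score(pwm, seq):
    """Sum of per-position log-odds scores for a candidate site."""
    return sum(column[base] for column, base in zip(pwm, seq))


sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pwm = position_weight_matrix(sites)
print(consensus(pwm), score(pwm, "TATAAT"), score(pwm, "GGGGGG"))
```

The consensus keeps only the top base per position, while the matrix also records how strongly each alternative is tolerated.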
46
Databases of TFBS motifs
TRANSFAC | JASPAR
47
What is phylogenetic footprinting?
Use of comparative genomics to infer functional genomic regions from conservation
48
What does phylogenetic footprinting require?
- comparison of correctly identified orthologous promoter regions
- conserved function across species
- species sufficiently diverged to reduce passive conservation
49
POSSUM workflow
- set of co-expressed genes
- automated sequence retrieval from Ensembl
- phylogenetic footprinting
- detection of TFBS
- statistical significance of binding sites
50
What are methods of miRNA identification based on?
- target sites tend to be located in the 3' UTR
- some miRNAs are complementary to the target RNA
51
What is a motif?
A sequence of amino acids encoding a particular molecular function
52
What is PROSITE
Library of regular expressions describing protein functional sites, e.g. enzyme active sites
53
Advantages of regular expressions
- memorable to humans
- computationally fast
- standardized in scripting languages
- can describe a motif very well
54
Disadvantages of regular expressions
- over-predict
- motif may vary in other lineages
- do not capture weaker preferences
- easy to make a poor representation
55
Example methods for identifying protein functional domains
Matrix/profile
Hidden Markov model
Sequence clustering