Bioinformatics Flashcards
(23 cards)
Bioinformatics
The use of computers to collect, analyse and
store biological data
Genome
all the genetic material in the chromosomes of a particular organism
Genomics
the study of an organism's complete set of genes (its genome) and their functions
- Sequence Assembly
- DNA sequencers generate short overlapping sequences (~500bp)
- These fragments must be assembled into their correct order
along the chromosome
- This generates a single consensus sequence of DNA
- Finding the genes
- Less than 2% of the human genome encodes proteins
- How do we find the regions (i.e. genes) that do?
- Gene prediction tools look for sequence features that occur at
defined sites within eukaryotic genes and perform specific roles
during transcription/translation. These include:
- Promoter sites
- Translation start sites
- Termination codons
- Intron/Exon splice sites
- Assigning function
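- First translate the DNA into theoretical protein sequences (one per reading frame)
- Then find the open reading frame (ORF): a stretch that runs from a start codon
to a stop codon without interruption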
- Use the translated protein sequence to look for similar
sequences with known function (similarity searching)
- Many protein domains are evolutionarily conserved (i.e. will
look similar even in distantly related species; human vs worm)
- Proteins that are similar in sequence across several species are
likely to have a similar function
- Annotation
- The process of attaching biological information to sequences
- Provides a record of valuable information for scientists trying
to understand gene function
- Can include:
- Location of ORFs
- Gene structure (introns/exons)
- Regulatory elements (e.g. promoters)
- Biochemical function of the encoded protein
- Conserved domains
- Protein interactions
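Worked example (ORF finding and translation): a minimal sketch of the "find the ORF, translate it" step from the Genomics card. It assumes the standard genetic code, scans only the three forward reading frames, and treats ATG as the only start codon. The names `find_orfs`, `translate` and `min_codons` are illustrative and do not come from any particular tool; real gene predictors also use the promoter and splice-site signals listed above.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna, min_codons=3):
    """Return (start, end, protein) for each ATG...stop ORF in the forward frames.

    start/end are 0-based, end-exclusive, and end includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, translate(dna[start:i])))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGGCTTGGAAATAGCC"))
```

The translated protein string is what would then be passed to a similarity search to assign a putative function.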
Dotplot
a simple matrix that gives an overview of the
similarities between two sequences
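Worked example (dotplot): a minimal sketch that builds the matrix as rows of text, marking a cell with `*` when the two sequences match at that pair of positions. The `window` parameter (an assumption added here) requires a run of that many identical characters, which is one common way to reduce noise.

```python
def dotplot(seq_a, seq_b, window=1):
    """Return one text row per position in seq_a; '*' marks a match with seq_b."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            same = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


for line in dotplot("GATTACA", "GATTACA"):
    print(line)
```

Similar regions show up as diagonal runs of `*`; comparing a sequence with itself gives an unbroken main diagonal.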
Global Alignments
compare sequences in their entirety
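Worked example (global alignment): a minimal sketch using Needleman-Wunsch dynamic programming, a standard algorithm for global alignment. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions; real tools use substitution matrices and more elaborate gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: return (score, aligned_a, aligned_b) over full lengths."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a prefix aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))
```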
Local alignments
find the regions of highest similarity between two sequences and extend the alignment outwards
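Worked example (local alignment): a minimal sketch using Smith-Waterman dynamic programming, a standard algorithm for local alignment. Cell scores are never allowed to drop below zero, so the best-scoring region can start and end anywhere in either sequence. The scoring values are illustrative assumptions.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman: return (score, aligned_a, aligned_b) for the best local region."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until the score reaches zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(local_align("TTTACGTTT", "GGACGGG"))
```

Only the shared `ACG` region is reported; the dissimilar flanking residues are left out of the alignment.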
Multiple Sequence Alignment (MSA)
used to compare
three or more sequences
Percentage Sequence Identity
a simple scoring method which
indicates the extent to which two sequences are invariant
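Worked example (percentage identity): a minimal sketch for two sequences that have already been aligned, with `-` marking gaps. It divides by the number of alignment columns; this denominator is an assumption, since other conventions exist (for example, dividing by the length of the shorter sequence).

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns in which both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


print(percent_identity("ACGT", "A-GT"))
```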
BLAST
Basic Local Alignment Search Tool
a search program that finds regions of local similarity
between amino acid or nucleotide sequences
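Worked example (seed and extend): a heavily simplified sketch of the idea behind BLAST-style searching, which is to find short exact word matches (seeds) and then extend them outwards. Here extension continues only while residues are identical; real BLAST scores the extension and stops when the score drops too far, and it also estimates statistical significance. The function names and the word size `k=3` are illustrative assumptions.

```python
from collections import defaultdict


def find_seeds(query, subject, k=3):
    """Return sorted (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = defaultdict(list)
    for s in range(len(subject) - k + 1):
        index[subject[s:s + k]].append(s)
    return sorted(
        (q, s)
        for q in range(len(query) - k + 1)
        for s in index.get(query[q:q + k], [])
    )


def extend_seed(query, subject, q, s, k):
    """Extend a seed left and right while residues match; return (q_start, s_start, length)."""
    q_end, s_end = q + k, s + k
    while q > 0 and s > 0 and query[q - 1] == subject[s - 1]:
        q -= 1
        s -= 1
    while q_end < len(query) and s_end < len(subject) and query[q_end] == subject[s_end]:
        q_end += 1
        s_end += 1
    return (q, s, q_end - q)


def local_hits(query, subject, k=3):
    """Seed, extend, and merge seeds that grow into the same ungapped hit."""
    seeds = find_seeds(query, subject, k)
    return sorted({extend_seed(query, subject, q, s, k) for q, s in seeds})


print(local_hits("ACGTAC", "TTACGTTT"))
```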
Protein Family
a group of proteins that share a common evolutionary origin, reflected by their related functions and similarities in sequence or structure
Domains
distinct functional and/or structural units in a
protein.
- Domains are responsible for a particular function or
interaction, contributing to the overall role of a protein
- Similar domains can be found in proteins with different
functions
Sequence features
small groups of amino acids that confer some biochemical property upon a protein
Active sites
catalytic residues of enzymes
Binding sites
residues that directly bind molecules or ions
Post-translational modification (PTM) sites
chemically modified residues
Sequence Motifs
short conserved regions of amino acid (or DNA) sequence that are important structurally and functionally
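Worked example (motif search): a minimal sketch that scans a protein for the N-glycosylation motif, written in PROSITE style as N-{P}-[ST]-{P} (Asn, any residue except Pro, Ser or Thr, any residue except Pro). The pattern is expressed as a regular expression with a lookahead so that overlapping matches are reported; the function name and the test protein are made up for illustration.

```python
import re

# N-{P}-[ST]-{P} as a regular expression; the lookahead allows overlapping hits.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (0-based position, matched residues) for every occurrence of the motif."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein)]


print(find_motif("MKNGSAANPTL"))
```

The second asparagine in the example is skipped because it is followed by proline, which the motif excludes.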
Sequence Profiles
describe motifs using quantitative information captured in a position specific scoring matrix
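Worked example (position specific scoring matrix): a minimal sketch that builds log-odds scores from a set of aligned motif instances and uses them to score windows of a longer sequence. The uniform background frequencies, the pseudocount of 1, and the example instances are all assumptions made for illustration.

```python
import math


def build_pssm(instances, alphabet="ACGT", pseudocount=1.0):
    """Return one dict per motif position mapping residue -> log2 odds score."""
    background = 1.0 / len(alphabet)  # assumes all residues are equally likely
    pssm = []
    for pos in range(len(instances[0])):
        column = [seq[pos] for seq in instances]
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            ch: math.log2(((column.count(ch) + pseudocount) / total) / background)
            for ch in alphabet
        })
    return pssm


def score_window(pssm, window):
    """Sum the per-position scores for a window the same length as the motif."""
    return sum(column[ch] for column, ch in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return the start position of the highest-scoring window."""
    width = len(pssm)
    return max(
        range(len(sequence) - width + 1),
        key=lambda i: score_window(pssm, sequence[i:i + width]),
    )


pssm = build_pssm(["TATA", "TATA", "TATT", "TACA"])
print(best_hit(pssm, "GGCTATAGGC"))
```

Unlike a fixed pattern, the matrix still gives a partial score to windows that differ from the consensus at a tolerant position.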
Signal Sequences
deliver proteins to specific sites within or outside the cell
Signal Peptides
short sequences that direct proteins for secretion outside the cell