Bioinformatic applications and Genome databases Flashcards

1
Q

What is Bioinformatics?

A

It is the integration of mathematical, computer, statistical and biological sciences to analyze biological “big data”.
It is used for a variety of tasks, the most common of which are listed on the next card.

2
Q

Bioinformatics uses computer/software-based approaches to:

A
  1. Compare related sequences from different species/organisms (alignments)
  2. Assemble genomes
    Sequence reads → contigs → scaffolds → chromosomes → genomes
  3. Annotate genomes
    - Identify genes and regulatory elements (promoters, enhancers, terminators)
    - Identify structural sequences such as telomere regions
    - Identify repetitive sequences (e.g., microsatellites)
  4. Investigate gene expression patterns (transcriptomics)
  5. Translate ORFs into amino acid sequences for protein analysis
    - Predict protein function (domains and motifs)
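A toy sketch of the "sequence reads → contigs" step of genome assembly, assuming error-free reads from a single strand: it repeatedly merges the two sequences with the longest suffix/prefix overlap. The function names and the `min_len` threshold are hypothetical. Real assemblers use overlap or de Bruijn graphs and must also handle sequencing errors, repeats, reads contained in other reads, and both DNA strands.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b (0 if < min_len)."""
    if len(b) < min_len:
        return 0
    start = 0
    while True:
        # Look for b's first min_len bases inside a; scanning left to right
        # means the first hit that works gives the longest overlap.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain; return the contigs."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

For example, the reads `ATGCGTAC`, `GTACGGTT` and `GGTTCAAA` overlap by four bases each and collapse into the single contig `ATGCGTACGGTTCAAA`; reads that share no overlap are returned as separate contigs.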
3
Q

What is GenBank?

A
  • A large repository of digital nucleic acid information and analysis tools – at your fingertips!
  • The largest publicly available database of DNA sequences
4
Q

As sequences are identified and genes are named, each sequence deposited into GenBank is given an accession number that scientists can use to access and retrieve that sequence for analysis. The NCBI is an invaluable source of public-access databases and bioinformatics tools for analyzing genome data.

A
5
Q

GenBank is maintained by

A

the National Center for Biotechnology Information (NCBI), part of the U.S. National Institutes of Health (NIH)

6
Q

Each sequence deposited in GenBank receives an

A

accession number - used to access and retrieve a sequence for analysis
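A minimal sketch of retrieving a sequence by accession number, assuming NCBI's E-utilities `efetch` endpoint with `db=nuccore`, `rettype=fasta` and `retmode=text`. It only builds the request URL and parses the FASTA text such a request returns; the helper names are hypothetical and the accession used in the example is a placeholder, not a real record.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint; the helper functions below are illustrative.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build a URL asking efetch for one nucleotide record in FASTA format."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text: str) -> dict[str, str]:
    """Map each record's ID (first word of its '>' header line) to its sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            words = line[1:].split()
            header = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line)  # sequence lines may be wrapped, so join them
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

To actually download a record, the URL from `efetch_url("PLACEHOLDER")` could be passed to `urllib.request.urlopen`, and the decoded response handed to `parse_fasta`. NCBI's E-utilities usage guidelines (request limits, API keys) should be checked before scripting many requests.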

7
Q

How can gene sequences be identified?

A

Genome projects generate tremendous amounts of DNA-sequence information. These data are simply strings of letters (A, T, G, C) and are of little use until they have been analysed and interpreted.
For example, once we have assembled contigs, how do we know where a gene (with its promoter, regulatory elements, exons and introns) lies within them?
Apart from experimental procedures to determine gene function, bioinformatics approaches can predict function by comparing a sequence with similar sequences that already exist in the databases, using BLAST (Basic Local Alignment Search Tool).
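BLAST looks for *local* alignments: regions where two sequences are similar even if the rest of the sequences differ. BLAST itself uses fast heuristics (short word matches that are then extended) rather than an exhaustive search, so the sketch below is not BLAST. It is the exact Smith-Waterman local-alignment algorithm, shown only to illustrate what a local alignment score means. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Best local alignment of a and b as (score, aligned_a, aligned_b)."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            # The 0 lets an alignment start fresh anywhere: this makes it local.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, aligning the query `GACGT` against `TTGACGTCA` reports only the matching region `GACGT` (score 10 under these assumed values) and ignores the unrelated flanking bases, which is the behaviour that makes local alignment suitable for finding a gene-sized match inside a longer sequence.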

8
Q
A
9
Q

Protein-encoding sequences can be identified as open reading frames (ORFs)
ORFs contain a start codon (ATG) and an in-frame stop codon (TAA, TAG or TGA)
Eukaryotic genes have defined regulatory elements (promoters, terminators, UTRs, polyadenylation signals and CpG islands)
Eukaryotic genes are composed of exons and introns (with defined splice sites between them)
Software can identify all of these elements
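A minimal sketch of ORF detection on the forward strand only, assuming the start (ATG) and stop (TAA, TAG, TGA) codons listed above: each of the three reading frames is scanned for an ATG followed by the first in-frame stop codon. The `min_codons` cutoff is a hypothetical parameter (very short ORFs often arise by chance). A real gene finder would also scan the reverse complement and model promoters, splice sites and the other features listed above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for forward-strand ORFs: ATG up to and including
    the first in-frame stop codon. Positions are 0-based; end is exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon from the start codon to the first stop codon.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this ORF's stop codon
                        break
                else:
                    # No in-frame stop downstream: the ORF is open-ended, so skip it
                    # (any later ATG in this frame would be open-ended too).
                    break
            i += 3
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGGG", min_codons=2)` finds one ORF in the third frame, `ATGAAATTTTAG` (four codons including the stop). With the default cutoff of 30 codons the same sequence yields no ORFs.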

A
10
Q

Translating nucleotide sequence to amino acid sequence

A

Bioinformatic software can also be used to “translate” ORFs into possible polypeptide sequences as a way to predict the protein encoded by a gene.
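A minimal translation sketch, assuming the standard genetic code (NCBI translation table 1). The 64-letter string lists the amino acid for each codon in the order TTT, TTC, TTA, TTG, TCT, ..., GGG, with `*` marking stop codons. Translation halts at the first stop codon unless `to_stop=False`. The function name and the `X` placeholder for unrecognised codons are illustrative choices.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf: str, to_stop: bool = True) -> str:
    """Translate a DNA (or RNA) ORF codon by codon into one-letter amino acids."""
    orf = orf.upper().replace("U", "T")  # accept mRNA as well as DNA
    protein = []
    for i in range(0, len(orf) - 2, 3):  # an incomplete trailing codon is ignored
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # X: codon with ambiguous bases (e.g. N)
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGAAATTTTAG")` gives `MKF` (Met-Lys-Phe), while `translate("ATGAAATTTTAG", to_stop=False)` gives `MKF*`, keeping the stop codon visible.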

11
Q

Slides 10-12, Chapter 21, Part 1

A