Structural Genomics Flashcards

1
Q

What are the two types of databases for genome projects?

A
  • General: GenBank and Genome Browser
  • Specific: Influenza Research Database, PopFly and FlyBase
2
Q

As ______ advances, so does the development of ______, which results in the ____________.

A
  • Computer connectivity
  • Molecular technologies
  • Establishment of public sequence databases
3
Q

What is a biological database?

A
  • A large body of data organized to give access to information. It allows for:
  • Connectivity between databases
  • Tools and servers for analyses
  • Training and tutorials that explain how to do things
4
Q

What is a GenBank accession number?

A

A unique identifier for a sequence record. It is a combination of letters and numbers, and the letters can indicate the type of molecule.

5
Q

What kinds of information can be found in a genome sequence?

A
  • Open Reading Frames (ORFs) and proteins
  • Regions of interest like promoter regions and receptor binding regions
  • Protein domains
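To make "open reading frame" concrete, here is a minimal Python sketch that scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. It assumes the standard genetic code (start ATG; stops TAA, TAG, TGA), ignores the reverse strand, and uses a hypothetical `min_codons` cutoff; real gene-finding tools are far more sophisticated.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for ORFs on the forward strand only.

    start is the 0-based index of the A in ATG; end is the index just past
    the stop codon, so seq[start:end] is the ORF including its stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START:
                # Walk codon by codon until an in-frame stop codon is found.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOPS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, frame))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC")` reports a single ORF, `ATGAAATTTTAG`, in frame 2. ATG codons nested inside an ORF are skipped, so each ORF is reported from its first start codon.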
6
Q

What is an RT-PCR primer?

A
  • A short DNA sequence used in reverse-transcription PCR (RT-PCR); databases collect primer sets from laboratories around the world that are used to detect viruses in biological samples
7
Q

What is structural genomics?

A

The assembly of contiguous stretches of chromosomal DNA

8
Q

List the three types of genomics

A
  • Structural genomics
  • Functional genomics; characterizing the role played by transcripts and proteins
  • Comparative genomics; comparing the genomes of different organisms
9
Q

How are whole genomes generated?

A

By:
- Sequencing and assembly
- Reaching a desired level of quality (the required level varies)
- Annotation to identify function

10
Q

What is the Hierarchical Genome Shotgun (HGS) approach to sequencing the human genome?

A
  • The genome is broken into large fragments that are cloned, and markers are used to organize the clones into a map
  • The minimum set of overlapping clones is sequenced, each with high redundancy, and the sequences of adjacent clones are merged
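Choosing the minimum set of overlapping clones is essentially an interval-cover problem. The Python sketch below treats each mapped clone as a hypothetical `(name, start, end)` interval in map coordinates and greedily picks the fewest clones that cover a region without gaps; real projects also weigh overlap size and clone quality, which this sketch ignores.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Greedily choose the fewest clones covering [region_start, region_end).

    clones: list of (name, start, end) tuples in map coordinates (a hypothetical
    input format). Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting at or before the covered edge, take the one reaching furthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path
```

With invented clones A (0 to 40), B (10 to 50), C (30 to 80), D (45 to 100) and E (70 to 120), the path over 0 to 120 is A, C, E: B and D overlap their neighbours but add no new coverage, so sequencing them would be redundant.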
11
Q

What is the Whole Genome Shotgun (WGS) approach to sequencing the human genome?

A
  • The whole genome is fragmented into pieces, which are then sequenced
  • Overlaps between the sequenced pieces are used to put them in order
  • The pieces are assembled into contigs, which are joined to span each chromosome
  • It is faster and cheaper than HGS, but when two genome regions are similar in sequence, they can be collapsed together, creating gaps
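The "overlaps put the pieces in order" step can be illustrated with a toy greedy assembler: repeatedly merge the two reads with the longest exact suffix-prefix overlap until no overlaps remain. This is a simplified sketch under strong assumptions (error-free reads from one strand, a hypothetical `min_overlap` threshold); real WGS assemblers are far more elaborate, and identical repeats would give ambiguous overlaps here too.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads, min_overlap=3):
    """Merge reads greedily by largest overlap and return the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # no overlaps left: remaining pieces stay separate contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
```

For instance, the invented reads `ATGGCG`, `GCGTAC` and `TACGGA` assemble into the single contig `ATGGCGTACGGA`, while reads with no overlap, such as `AAAA` and `CCCC`, stay as separate contigs with a gap between them.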
12
Q

What is the difference between WGS and HGS?

A
  • In WGS, the entire genome is cut randomly into small fragments that are sequenced and reassembled without first being mapped. In HGS, the genome is first mapped, clones are chosen from the map, and then they are sequenced.
13
Q

What is the problem during the assembly of sequence data from the WGS method?

A

There is no map showing the true order of the sequences, so repetitive DNA and duplications can fit in more than one place.
If they are placed in the wrong location, gaps can form in the sequence.

14
Q

How is the problem pertaining to the WGS method solved?

A

Both ends of each cloned insert in a vector are sequenced. If the two end reads of one clone fall in two different contigs, these paired ends indicate that the contigs belong together physically, so they can be ordered and linked across the gap.

15
Q

How are genomes mapped?

A

By looking at variation in DNA, based on the co-inheritance of molecular markers

16
Q

What techniques are used to obtain fingerprints?

A
  • RFLP: Restriction Fragment Length Polymorphism
  • STS: Sequence-Tagged Sites
17
Q

Explain RFLP

A
  • Overlapping clones of the region being fingerprinted are each cut with restriction enzymes
  • The fragments are separated on a gel, and bands shared between two clones indicate shared (overlapping) DNA sequence
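A restriction fingerprint can be imitated by "digesting" clone sequences in silico: cut at every occurrence of a recognition site and compare the fragment lengths, which stand in for gel bands. The sketch assumes EcoRI's site GAATTC, cut after the G, and uses made-up clone sequences; as on a real gel, a shared band length only suggests overlap and does not prove it.

```python
from collections import Counter

def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths from cutting seq at every occurrence of site.

    cut_offset is the cut position within the site (EcoRI cuts G^AATTC, so 1).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

def shared_bands(clone_a, clone_b, **enzyme):
    """Fragment lengths (bands) that appear in both clones' digests."""
    bands_a = Counter(digest(clone_a, **enzyme))
    bands_b = Counter(digest(clone_b, **enzyme))
    return sorted((bands_a & bands_b).elements())
```

For two hypothetical clones, `AAAGAATTCCCCCGAATTCGG` with an extra `G` before the final `GG` (fragments 4, 11 and 7) and `TTTTTTTGAATTCCCCCGAATTCA` (fragments 8, 11 and 6), the only shared band is the 11-base fragment, which hints that the clones overlap in that region.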
18
Q

Explain STS

A
  • PCR primers are used to amplify a short, unique DNA sequence
  • This produces a simple and reproducible pattern on a gel
  • Each STS marker therefore identifies a single site in the genome
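The reason an STS marks one site can be illustrated with an in-silico PCR: find the forward primer, then the reverse complement of the reverse primer downstream of it, and report the product size. One product of one size suggests a unique site. The sequences and primers are invented, and the sketch only finds exact matches on one strand; real primer searches tolerate mismatches and check both strands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def sts_products(genome, fwd_primer, rev_primer):
    """Sizes of PCR products between the forward primer and the reverse primer's site.

    Exact matches on the given strand only (a simplification).
    """
    rev_site = reverse_complement(rev_primer)
    sizes = []
    start = genome.find(fwd_primer)
    while start != -1:
        end = genome.find(rev_site, start + len(fwd_primer))
        if end != -1:
            sizes.append(end + len(rev_site) - start)
        start = genome.find(fwd_primer, start + 1)
    return sizes
```

In the toy sequence `GGGGACGTACTTTTTTTTGATCCAGGGG`, the primers `ACGTAC` and `TGGATC` give a single 20-base product, so that STS would map to one location.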
19
Q

What is de novo sequencing?

A

Determining the genome sequence of a species for the first time. Because no reference genome exists yet, the sequence must be assembled from scratch.

20
Q

Describe resequencing

A

An existing reference genome (from the same or a closely related species) is used to sequence the genome of interest.
The new sequence is then mapped to the reference genome.

21
Q

What is the recommended approach to sequencing?

A
  • The hybrid approach
  • WGS is used to produce sixfold coverage, and the resulting sequence is mapped
  • A minimum tiling path of clones covering duplicated DNA is then produced for sequencing
22
Q

What are some indicators of genome completeness and quality?

A
  • Per-base coverage (the average number of times a base is sequenced in a genome project)
  • N50 (the length of the shortest contig that must be included, summing contigs from longest to shortest, to cover 50% of the total assembly length)
23
Q

What is coverage?

A
  • The number of sequence reads multiplied by the length of the reads
  • This product is divided by the genome size to estimate the average coverage
  • Higher coverage is generally desired
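The coverage estimate above is a single formula, coverage = (number of reads × read length) / genome size. A minimal Python version, with invented numbers in the example:

```python
def estimate_coverage(num_reads, read_length, genome_size):
    """Average per-base coverage: (number of reads x read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size
```

For example, a hypothetical run of 120 million reads of 150 bases on a 3 billion base genome gives `estimate_coverage(120_000_000, 150, 3_000_000_000)`, which is 6.0, i.e. sixfold coverage. This is an average; individual bases will be covered more or fewer times.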
24
Q

Explain the concept of N50

A
  • A small N50 indicates the genome assembly is fragmented, and a large N50 indicates the assembly has longer contigs
  • Example: 4 contigs of different lengths (5 kb, 4 kb, 2 kb and 1 kb) with a total of 12 kb
  • The N50 for this assembly is 4 kb: summing from longest to shortest, the 5 kb contig alone covers about 42% of the 12 kb, but 5 + 4 = 9 kb covers 75%, which passes the 50% mark (6 kb). The contig that crosses the halfway mark is 4 kb long
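The N50 calculation in the example can be written as a short Python function: sort contig lengths from longest to shortest, keep a running total, and return the length of the contig at which the total first reaches half of the assembly length. This sketch uses "at least half"; some tools phrase the threshold slightly differently.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total (longest first)
    first reaches at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must not be empty")
```

Applied to the example contigs, `n50([5, 4, 2, 1])` returns 4, matching the 4 kb worked out above.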
25
Q

How is genome sequence data displayed?

A
  • Graphics: showing the structure of the genome, with details such as exons and introns
  • GenBank: entries with descriptors
  • FASTA: a universal text format in which a header line starting with '>' describes the sequence and is followed by the sequence itself
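Because FASTA is just a header line beginning with `>` followed by one or more sequence lines, a basic reader is short. This is a minimal sketch that keeps the whole header text as the key; duplicate headers would overwrite each other, and production code would normally use an established library instead.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

For example, the text `">seq1 example record\nATGC\nGGTA\n>seq2\nTTAA\n"` parses to `{"seq1 example record": "ATGCGGTA", "seq2": "TTAA"}`, with the wrapped lines of the first record joined together.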
26
Q

What is genome annotation?

A