Genome Sequencing Methods Flashcards

1
Q

Physical mapping

A

BACs = bacterial artificial chromosomes
- cloning vectors for larger pieces of DNA
- physical mapping provides a path of minimally overlapping clones along a chromosome

2
Q

BAC

A

physical mapping
BACs = bacterial artificial chromosomes
- BAC ends are sequenced to provide landmarks in the genome
- BAC clones can be sequenced individually (clone by clone); this approach is now outdated and has largely been replaced by the whole-genome shotgun method

3
Q

Shotgun sequencing

A

whole genome sequencing

  • fragment the whole genome into 1-3 kb pieces
  • clone fragments into vectors (make a library)
  • sequence clones from both ends (paired-end sequencing)
  • align reads using computational methods
  • individual reads are assembled into contigs, and contigs into scaffolds
  • many gaps often remain
  • can be problematic for large genomes with repetitive DNA and many TEs that are similar to each other
  • followed by genome annotation
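
The read-to-contig step above can be illustrated with a toy sketch. This is a simple greedy overlap merge, not the algorithm used by any real assembler; the reads, the function names, and the minimum overlap of 3 bases are all made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # nothing left to merge
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads: three overlap each other, one overlaps nothing.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA", "GGGAAAC"]
contigs = greedy_assemble(reads)
print(contigs)
```

The three overlapping reads collapse into one contig, while the read with no overlap stays on its own, which is a small picture of why gaps remain. With repetitive DNA, many reads would overlap equally well in several places, and a merge like this can join the wrong pieces.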
4
Q

Contigs

A

contiguous sequences assembled from individual reads that align (overlap) with each other

- contigs are linked together to form scaffolds

5
Q

scaffolds

A

contigs connected together, in order, into a longer sequence that may still contain gaps

6
Q

Issues with shotgun sequencing

A

many gaps remain

large genomes can have repetitive DNA and many TEs that are similar to each other, which causes misalignment and misassembly

7
Q

Genome annotation

A

Done after shotgun sequencing
- gene-finding programs (e.g., FGENESH and GENSCAN)
- repeat masking is done to hide repeated regions
- exon/intron structure is predicted by programs
- transcriptome comparisons find expressed genes and intron/exon junctions
- putative functions are assigned by comparison to related species
- detection and annotation of non-coding RNAs and TEs

8
Q

Next generation sequencing

A

“2nd generation”
Ultra-high throughput
- genome sequencing is much faster
- 454, Illumina, Ion Torrent
- millions of sequence reads obtained per run instead of 96 or 384 with conventional Sanger sequencing
- 454: sequencing by synthesis; no longer used
- Illumina sequencing: higher throughput than 454
  - short reads of 100-150 bp (often paired-end)
  - used for re-sequencing to look for polymorphisms
  - now sometimes (but not often) used for de novo sequencing; assembling short reads is challenging, so it needs to be paired with other technologies
- Ion Torrent: intermediate between 454 and Illumina; read lengths comparable to 454

9
Q

Single molecule sequencing

A

3rd generation
- uses DNA polymerase (in SMRT sequencing)
- does not require amplification of the template
- SMRT sequencing and Oxford Nanopore’s MinION
  - long reads (2400-4000 bp)
  - moderate throughput, similar to 454
- good for de novo sequencing: long reads make assembly easier
- can sometimes be paired with other technologies, such as Illumina, to correct errors

10
Q

Resequencing of genomes

A

multiple lines, cultivars, accessions, or ecotypes of a species with a sequenced reference genome can be sequenced
- used to find variants, etc.

11
Q

Genotyping

A

use of DNA data to analyze relationships within or among populations

  • GBS (genotyping by sequencing) and RAD sequencing
  • use Illumina sequencing of restriction-digested DNA, with barcoding to sequence many samples in one lane
  • used in population and evolutionary genetic studies
  • previously AFLPs and microsatellites were used, but they are not as efficient
  • data allow analysis of SNPs among individuals
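
The barcoding idea above (many samples pooled in one lane, then separated afterwards) can be sketched as follows. This is only an illustration: the barcodes, sample names, and reads are hypothetical, and the matching is exact, whereas real demultiplexing tools usually tolerate sequencing errors in the barcode.

```python
from collections import defaultdict


def demultiplex(reads: list[str], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Assign each read to a sample by its leading barcode and trim the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Hypothetical barcodes (assumed not to be prefixes of one another).
barcodes = {"ACGT": "plant_1", "TTAG": "plant_2"}
reads = ["ACGTTGCAGGATC", "TTAGTGCAGCCTA", "ACGTTGCAGGTTC", "GGGGTGCAGAAAA"]

samples = demultiplex(reads, barcodes)
for sample, seqs in samples.items():
    print(sample, seqs)
```

Once reads are sorted by sample, the same locus can be compared across individuals to look for SNPs.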
12
Q

Main purposes for RNA-seq

A

Transcriptome sequencing
- reference transcriptome: RNA-seq is used to obtain a set of reference transcripts
Expression profiling
- compare gene expression levels in 2 or more samples using RNA-seq data
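
Comparing expression between two samples is often summarized as a log2 fold change. The sketch below is one simple way to compute it, assuming the inputs are already normalized expression values; the function name, the pseudocount of 1, and the example numbers are illustrative assumptions, not part of the flashcards.

```python
import math


def log2_fold_change(control: float, treatment: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of treatment to control; the pseudocount avoids division by zero."""
    return math.log2((treatment + pseudocount) / (control + pseudocount))


# Hypothetical normalized expression values for one gene in two samples.
lfc = log2_fold_change(control=99.0, treatment=399.0)
print(lfc)
```

A positive value means higher expression in the second sample, a negative value means lower, and 0 means no change.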

13
Q

Illumina

A

millions of 100-150 bp reads
sometimes paired-end
useful for expression profiling: high-depth sequencing
can be used for de novo reference transcriptome sequencing, but this is challenging with short reads
- normalize read counts to gene length; more reads means higher mRNA expression
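
One common way to normalize read counts to gene length is RPKM (reads per kilobase of transcript per million mapped reads). The sketch below uses that measure as an example; the card itself does not name a specific method, and the gene names, counts, and lengths are made up.

```python
def rpkm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """RPKM = reads * 1e9 / (gene length in bp * total mapped reads)."""
    total_reads = sum(counts.values())
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_reads)
        for gene in counts
    }


# Hypothetical data: geneA and geneB have the same raw count,
# but geneB is twice as long.
counts = {"geneA": 500, "geneB": 500, "geneC": 1000}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 2000}

values = rpkm(counts, lengths)
for gene, value in values.items():
    print(gene, value)
```

After normalization geneA scores twice as high as geneB even though both have 500 reads, because a longer transcript yields more fragments at the same expression level.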

14
Q

454 and Ion Torrent

A

Longer reads than Illumina, so easier to align together or to a reference genome
very useful for reference transcriptome sequencing
not as widely used as Illumina for expression profiling studies (lower read depth)

15
Q

PacBio SMRT and Oxford Nanopore’s MinION

A

long reads sequence entire transcripts
very useful for reference transcriptome sequencing
allows sequencing of isoforms

16
Q

RNA-seq applications

A

- new gene discovery
- profiling of tissue/organ types, diseased vs. wild type (wt), mutant vs. wt, effects of stress/pathogens, etc.
- discovery of alternative splice sites
- expression profiling and discovery of miRNAs and siRNAs
- analyzing TF binding sites with chromatin immunoprecipitation followed by sequencing (ChIP-seq, which sequences the bound DNA rather than RNA)

17
Q

Bioinformatics

A

study of biological information using concepts and methods from computer science and statistics

  • algorithm and program development
  • genome database development
  • computational analysis of high throughput DNA sequence and expression data to answer biological questions
18
Q

blast searching

A
blastn = nucleotide query vs. nucleotide database
blastx = translated nucleotide query vs. protein database
blastp = protein query vs. protein database
tblastn = protein query vs. nucleotide database translated in all six reading frames
tblastx = translated nucleotide query vs. translated nucleotide database (both in all six reading frames); slowest
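
"Translated into all possible reading frames" means three frames on the forward strand and three on the reverse complement. The sketch below shows that six-frame translation with the standard genetic code; it illustrates the idea only and is not how BLAST itself is implemented. The function names and the example sequence are made up.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X', stops are '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict[str, str]:
    """Translate a DNA sequence in frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

Because tblastx translates both the query and every database sequence this way, it has far more comparisons to make than the other programs, which is why it is the slowest.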
19
Q

E-value

A

expectation value

  • lower (closer to 0) is better; an E-value around 0.1 or higher indicates a weak hit
  • represents significance of each hit
  • defined as number of hits one can expect to find by chance when searching a database of particular size
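
The dependence on database size can be made concrete with the standard relationship between a bit score S' and the E-value, E = m * n * 2^(-S'), where m is the query length and n is the database length. The sketch below is a simplification: it uses raw lengths, whereas BLAST uses adjusted "effective" lengths, and the example numbers are made up.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2^(-S'), using raw (not effective) lengths as a simplification."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 100-residue query with a hit scoring 40 bits.
e_small_db = expected_hits(bit_score=40, query_len=100, db_len=1_000_000)
e_large_db = expected_hits(bit_score=40, query_len=100, db_len=2_000_000)

print(e_small_db)
print(e_large_db)
```

The same alignment gets an E-value twice as large when the database is twice as big, so E-values from searches against different databases are not directly comparable.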
20
Q

a few ways in which genome size affects an organism

A
  • nucleus size, cell size
  • duration of cell cycle
  • cell differentiation rate
  • metabolic rate
  • embryonic developmental rate
  • life history strategy
  • invasiveness
  • extinction rate
21
Q

4 ways TE insertions can negatively impact host

A
  1. energetic costs of TE replication, transcription, and translation
  2. disruption of cellular processes by TE proteins
  3. susceptibility to harmful gain-of-function (GOF) mutations
  4. deleterious rearrangements caused by ectopic recombination
22
Q

RepeatMasker

A

program that detects and masks (filters out) repeated sequences in genomes using sequence similarity to a known set of repetitive sequences
- only as good as the reference set of repeats it searches against
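
A toy version of masking against a known set of repeats is sketched below. It is a deliberate simplification: it uses exact string matching, whereas RepeatMasker searches by sequence similarity, and the genome, repeat set, and function name are made up.

```python
def mask_repeats(genome: str, known_repeats: list[str]) -> str:
    """Replace every exact occurrence of a known repeat with N characters."""
    masked = list(genome)
    for repeat in known_repeats:
        start = genome.find(repeat)
        while start != -1:
            masked[start:start + len(repeat)] = "N" * len(repeat)
            # Step by one so overlapping copies are also found.
            start = genome.find(repeat, start + 1)
    return "".join(masked)


genome = "ATGCACACACAGGT"

masked = mask_repeats(genome, ["CACA"])
unmasked = mask_repeats(genome, ["TTTT"])  # repeat set lacks the real repeat

print(masked)
print(unmasked)
```

The second call leaves the sequence untouched because the repeat is missing from the known set, which is the limitation the card points to.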

23
Q

why is homozygosity important for genome sequencing?

A

facilitates assembly of the genome: only one copy of each region has to be assembled because there are no allelic variants to deal with
- has implications for putting the genome together

24
Q

when assembling a genome, why is the percentage of genes recovered higher than the percentage of the total genome that is assembled?

A

reads overlap because many different types of sequencing are combined, which can lead to overestimation of genes

  • genes are easier to find
  • repetitive regions are hard to assemble