Genome sequencing and assembly 3 Flashcards

1
Q

What are contigs?

A

Contiguous, gapless DNA sequences that have been assembled from overlapping reads

2
Q

What are scaffolds?

A

Larger structures made by linking contigs together in the correct order and orientation, using additional information (e.g. paired-end reads) to span the gaps between them

3
Q

What is gap filling?

A

Filling in the gaps between contigs/scaffolds that remain after shotgun sequencing and assembly

4
Q

How does gap filling work?

A

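If the sequences at the start and end of the gap are known, the gap can be filled in by primer walking: primers designed from the known ends are used to sequence into the gap, and new internal primers are designed all along its length until the whole gap is covered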

5
Q

What is a physical gap?

A

A stretch of the sequence that isn’t present in the clone library

6
Q

Why may physical gaps exist?

A

The gap regions may have been unstable when cloned, so they were lost from the clone library

7
Q

Which types of sequence are difficult to sequence?

A

Repetitive sequences, e.g. transposons, tandem repeats and centromeres

8
Q

Issue with sequencing repetitive elements?

A

A read that lies partly or wholly within a repeat element may be wrongly assigned an overlap with a different part (or a different copy) of the repeat, so the assembly joins regions that are not actually adjacent

9
Q

Frequency of repetitive elements in prokaryotes and eukaryotes?

A

Uncommon in prokaryotes, very common in eukaryotes

10
Q

Challenges in genome assembly?

A

Long reads: high error rate (e.g. Oxford Nanopore)
Short reads: difficult to assemble large genomes from them

11
Q

Examples of genome assembly algorithms?

A

Overlap layout consensus, De Bruijn Graph

12
Q

Overlap layout consensus?

A

Overlap: identifies overlapping regions between fragments (reads)
Layout: the overlaps are used to create a layout that connects the reads in order
Consensus: the most likely overall sequence is determined from the laid-out reads
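
The overlap and layout steps can be illustrated with a toy greedy assembler. This is only a sketch under simplifying assumptions: the reads are error-free, all come from the same strand, and the best-overlapping pair is merged at each round. The read sequences and the function names `overlap` and `greedy_assemble` are made up for illustration, and a real consensus step (voting across many noisy reads) is not shown.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads sampled from ATGGCGTACCTGA
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

With repeats in the genome, the greedy choice can merge the wrong pair, which is the misassembly problem described in the repetitive-elements card.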

13
Q

Pros of Overlap layout consensus?

A

Useful for assembling genomes from long reads

14
Q

De Bruijn Graph?

A

Breaks reads down into much shorter sequences of length k (k-mers)
(k is the number of nucleotides in each k-mer)
Connects the nodes (the k-mers) that overlap by k-1 nucleotides

15
Q

De Bruijn graph example?

A

TGA -> GAC -> ACC -> CCG
(k = 3: each 3-mer overlaps the next by k-1 = 2 nucleotides, spelling out TGACCG)
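
The chain above can be reproduced with a small sketch. It follows the card's description, where the nodes are k-mers joined when they overlap by k-1 bases (many assemblers instead use (k-1)-mers as nodes and k-mers as edges). The two input reads and the function names are hypothetical, and the path-spelling step assumes a simple unbranched, error-free graph.

```python
def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are the distinct k-mers; an edge a -> b exists when the
    last k-1 bases of a equal the first k-1 bases of b."""
    nodes = list(dict.fromkeys(km for read in reads for km in kmers(read, k)))
    return {a: [b for b in nodes if a[1:] == b[:-1]] for a in nodes}


def spell_path(graph: dict[str, list[str]]) -> str:
    """Walk from a node with no incoming edge while the path is unbranched."""
    targets = {b for outs in graph.values() for b in outs}
    starts = [n for n in graph if n not in targets]
    if not starts:
        return ""  # e.g. a purely circular graph; not handled in this sketch
    node = starts[0]
    seq, seen = node, {node}
    while len(graph[node]) == 1 and graph[node][0] not in seen:
        node = graph[node][0]
        seen.add(node)
        seq += node[-1]  # each step adds one new base
    return seq


# Hypothetical reads that together cover TGACCG
graph = build_graph(["TGACC", "ACCG"], k=3)
print(graph)
print(spell_path(graph))
```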

16
Q

Ways to assess genome assembly quality?

A

Completeness, Depth and Coverage, Contiguity

17
Q

Completeness of genome assembly?

A

How much of the genome has been assembled without gaps

18
Q

Depth and coverage of genome assembly?

A

The number of times each base is sequenced (read depth). Higher depth gives more confidence that each base has been called correctly
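
As a rough illustration, per-base depth can be counted from where reads align. This sketch assumes each alignment is given as a hypothetical (start, length) pair on a genome of known length. The "fraction covered" value is an extra summary added here for illustration and is not part of the card's definition.

```python
def per_base_depth(genome_len: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Count how many reads cover each position (0-based start, read length)."""
    depth = [0] * genome_len
    for start, length in alignments:
        for pos in range(start, min(start + length, genome_len)):
            depth[pos] += 1
    return depth


def summarize(depth: list[int]) -> tuple[float, float]:
    """Return (mean depth, fraction of positions covered by at least one read)."""
    mean_depth = sum(depth) / len(depth)
    covered = sum(1 for d in depth if d > 0) / len(depth)
    return mean_depth, covered


# Toy example: a 10-base genome and three reads
depth = per_base_depth(10, [(0, 5), (3, 5), (4, 4)])
print(depth)
print(summarize(depth))
```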

19
Q

Contiguity of genome assembly?

A

How far the genome has been assembled into a small number of long contigs or scaffolds rather than many short fragments

20
Q

What is the N50 score?

A

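The contig/scaffold length such that contigs/scaffolds of that length or longer contain at least half (50%) of the total assembly length. It is a length-weighted median, not the average length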
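
A minimal sketch of the N50 calculation, using made-up contig lengths, shows how it differs from a plain average. The function name `n50` is just for illustration.

```python
def n50(lengths: list[int]) -> int:
    """Smallest length L such that contigs of length >= L hold >= 50% of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths (total = 290)
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))                 # longest contigs 80 + 70 = 150 >= 145
print(sum(contigs) / len(contigs))  # the mean, for comparison
```

Sorting longest-first means a few very long contigs raise the N50 much more than they would raise the mean.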

21
Q

What counts as a good N50 score at present?

22
Q

What does BUSCO stand for?

A

Benchmarking Universal Single-Copy Orthologs

23
Q

What is BUSCO?

A

A method of measuring the completeness of a genome assembly by comparing it against a set of highly conserved orthologous genes
e.g. every organism has an RNA polymerase, so it checks whether that gene (and others like it) is present in the assembly
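
The idea can be sketched as a simple tally. This is not the BUSCO tool or its interface, only a toy version of the concept: the gene names, the counts and the function `completeness_report` are all hypothetical, and the real tool's "fragmented" category is left out.

```python
def completeness_report(expected: list[str], found_counts: dict[str, int]) -> dict[str, float]:
    """Percentage of expected conserved genes found once, more than once, or not at all."""
    n = len(expected)
    single = sum(1 for g in expected if found_counts.get(g, 0) == 1)
    duplicated = sum(1 for g in expected if found_counts.get(g, 0) > 1)
    missing = sum(1 for g in expected if found_counts.get(g, 0) == 0)
    return {
        "complete_single_copy": 100 * single / n,
        "complete_duplicated": 100 * duplicated / n,
        "missing": 100 * missing / n,
    }


# Hypothetical conserved gene set and the copies found in an assembly
expected = ["rna_pol", "gene_b", "gene_c", "gene_d"]
found = {"rna_pol": 1, "gene_b": 1, "gene_c": 2}
print(completeness_report(expected, found))
```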

24
Q

Pangenome?

A

The complete set of genes within a species, encompassing both core genes shared by all individuals and variable genes present in some but not others

25
Q

Core genome?

A

Genes present in all individuals of a species

26
Q

Accessory genome?

A

Genes that vary among individuals (present in some but not others), contributing to diversity within the species

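
The pangenome, core genome and accessory genome map directly onto set operations. The sketch below assumes each strain is represented simply as a set of gene names. The strain and gene labels are invented for illustration.

```python
def pangenome_parts(strains: dict[str, set[str]]) -> tuple[set[str], set[str], set[str]]:
    """Return (pangenome, core genome, accessory genome) as sets of gene names."""
    gene_sets = list(strains.values())
    if not gene_sets:
        return set(), set(), set()
    pan = set().union(*gene_sets)           # every gene seen in any strain
    core = set.intersection(*gene_sets)     # genes shared by all strains
    accessory = pan - core                  # genes present in only some strains
    return pan, core, accessory


# Hypothetical gene content of three strains
strains = {
    "strain_1": {"A", "B", "C"},
    "strain_2": {"A", "B", "D"},
    "strain_3": {"A", "B", "C", "E"},
}
pan, core, accessory = pangenome_parts(strains)
print(sorted(pan), sorted(core), sorted(accessory))
```
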
27
Q

Why may it be difficult to assemble a single reference genome?

A

A species may be highly diverse, so multiple strains would need to be sequenced to capture its full variability

28