final study Flashcards
(64 cards)
What is the primary goal of genome assembly?
Reconstruct the original genome sequence
Genome assembly pieces together millions of sequenced DNA fragments (reads) to form a complete genome.
What are the two main types of genome assembly methods?
Reference-based and De Novo
Reference-based assembly maps reads to a known reference genome, while De Novo assembly constructs a genome without a reference.
What is the first step in the assembly workflow?
Obtain raw reads from genomic libraries
This involves obtaining raw long and/or short reads in FastQ format.
What is the purpose of quality trimming in genome assembly?
To get rid of low-quality sequences and prevent errors
Quality trimming ensures that only accurate sequences contribute to the assembly.
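A minimal sketch of the idea, assuming Phred+33 quality encoding and a hypothetical cutoff of Q20: bases are removed from the 3' end of a read until a base at or above the cutoff is reached. The function name `trim_3prime` and its parameters are illustrative only; dedicated trimming tools use more elaborate strategies such as sliding windows and adapter removal.

```python
def trim_3prime(sequence, quality, min_q=20, offset=33):
    """Trim low-quality bases from the 3' end of a read.

    min_q (cutoff) and offset (Phred+33) are assumptions for this sketch.
    """
    end = len(sequence)
    # Walk backwards while the last remaining base is below the cutoff.
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality[:end]


# 'I' encodes Q40, '#' encodes Q2, '!' encodes Q0 under Phred+33.
seq, qual = trim_3prime("ACGTACGT", "IIIII##!")
print(seq, qual)  # ACGTA IIIII
```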
What is a contig?
A contiguous sequence
Contigs are assembled from reads and represent continuous stretches of DNA.
How are scaffolds formed in genome assembly?
By bridging contigs together
Scaffolds can contain gaps of unknown size and represent larger fragments of DNA.
What is N50 in the context of evaluating assembly quality?
The length of the shortest contig in the set of longest contigs that together contain at least half of the total assembly length
A higher N50 value indicates better contiguity in the assembly.
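As a worked example, N50 can be computed by sorting contig lengths from longest to shortest and adding them up until the running total reaches half of the assembly length; the contig at which that happens gives the N50. The sketch below assumes the contig lengths are already available as a list of integers, and the function name `n50` is illustrative.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the cumulative length
        # covers at least half of the assembly.
        if running * 2 >= total:
            return length
    return 0


# Total length is 280; 100 + 80 = 180 is the first sum >= 140.
print(n50([100, 80, 50, 30, 20]))  # 80
```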
What does BUSCO assess in genome assemblies?
Completeness by evaluating single-copy genes
BUSCO uses a curated set of genes expected to be present in a given clade to benchmark assembly quality.
What is a Hi-C library used for?
Capturing chromosome conformation for scaffolding the genome
Hi-C libraries help determine how different parts of chromosomes are physically close to one another.
What is the significance of paired-end sequencing in Hi-C?
To map reads to the assembly and determine proximal locations of contigs
This step does not construct the genome but helps visualize the spatial organization of the genome.
Fill in the blank: The _______ is a high-throughput method for obtaining conformational information of chromosomes.
Hi-C library
What challenge does the assembly strategy using short reads face?
Repetitive or complex regions longer than the reads
Short reads cannot span these regions, which can lead to a fragmented, low-quality assembly.
What is a De Bruijn graph used for in genome assembly?
To build an assembly graph from the k-mers of short reads
It helps identify overlaps and connections between short sequence fragments.
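A minimal sketch of one common construction, assuming error-free reads and a hypothetical helper named `de_bruijn`: every k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. Real assemblers also handle sequencing errors, reverse complements, and graph simplification, none of which is shown here.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a De Bruijn graph as {prefix: [suffix, ...]}.

    Nodes are (k-1)-mers; each k-mer contributes one edge.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two overlapping reads share the k-mer CGT, so the edge CG -> GT appears twice.
graph = de_bruijn(["ACGT", "CGTA"], k=3)
for node, successors in graph.items():
    print(node, "->", successors)
```

Walking the edges AC -> CG -> GT -> TA spells out ACGTA, the sequence that both reads came from.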
True or False: Hybrid assembly combines short and long reads to improve coverage.
True
What does the diagonal in a Hi-C interaction heatmap represent?
Sequences that are next to each other
The intensity of color in the heatmap indicates the frequency of interactions between regions.
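The heatmap can be thought of as a symmetric matrix of contact counts between genomic bins. The sketch below assumes each read pair has already been mapped and reduced to a pair of bin indices; the function name `contact_matrix` and the input format are hypothetical.

```python
def contact_matrix(pairs, n_bins):
    """Count Hi-C contacts between bins as a symmetric n_bins x n_bins matrix."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for i, j in pairs:
        matrix[i][j] += 1
        if i != j:
            # Contacts have no direction, so mirror off-diagonal counts.
            matrix[j][i] += 1
    return matrix


# Each tuple is (bin of read 1, bin of read 2) for one read pair.
pairs = [(0, 0), (0, 1), (1, 0), (2, 2), (0, 1)]
for row in contact_matrix(pairs, 3):
    print(row)
```

Higher counts correspond to more intense color in the heatmap; entries on the diagonal count contacts within the same bin.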
What is a chromosome-level assembly?
An assembly that represents a full set of chromosomes
This may include phased assembly, capturing genetic information from both parents.
What is the format of the first line in a FastQ file?
header (@)
The header line typically starts with ‘@’ followed by a sequence identifier.
What does the second line in a FastQ file represent?
sequence
This line contains the nucleotide sequence of the DNA.
What is found on the third line of a FastQ file?
separator (+)
This line serves as a separator between the sequence and the quality score.
What information is provided in the fourth line of a FastQ file?
Quality scores
This line encodes the quality score of each base in the sequence, one ASCII character per base.
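The four-line layout covered in the cards above can be sketched as a small parser. This assumes the simple case where every record occupies exactly four lines with no blank lines in between; the function name `read_fastq` is illustrative.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a four-line-per-record FastQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FastQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, quality


example = io.StringIO("@read1\nACGT\n+\nIII?\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, qual)  # read1 ACGT III?
```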
What is the desired quality score for most reads in a FastQ file?
30 or over
A quality score of 30 indicates a 99.9% chance that the base is correct.
What does a quality score of 30 indicate about the base?
0.1% chance (1 in 1,000) it’s the wrong base
This score reflects a high probability of accuracy in base calling.
What character in a FastQ file indicates a quality score of 30?
?
In the Phred+33 encoding, the ‘?’ symbol has ASCII code 63, and 63 - 33 = 30.
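A short worked example of the conversion, assuming the Phred+33 encoding: the score is the character's ASCII code minus 33, and the error probability follows from Q = -10 * log10(P). The function names are illustrative.

```python
def phred_score(char, offset=33):
    """Convert one quality character to its Phred score (Phred+33 assumed)."""
    return ord(char) - offset


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


q = phred_score("?")
print(q)                     # 30
print(error_probability(q))  # 0.001, i.e. 1 wrong base in 1,000
```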
What is the primary purpose of a FASTA file?
Store nucleotide or protein sequences
FASTA files are used to represent nucleotide or protein sequences.
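For comparison with FastQ, a FASTA record is a header line starting with ‘>’ followed by one or more lines of sequence, with no quality scores. The parser below is a minimal sketch; the function name `read_fasta` is illustrative, and taking the first word of the header as the record name is an assumption of this example.

```python
import io


def read_fasta(handle):
    """Return {name: sequence} from a FASTA stream, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


example = io.StringIO(">contig1 example\nACGT\nTTGA\n>contig2\nGGCC\n")
print(read_fasta(example))  # {'contig1': 'ACGTTTGA', 'contig2': 'GGCC'}
```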