final study Flashcards
(77 cards)
What is genome assembly?
Reconstruct the original genome sequence from millions of pieces.
What are the two main types of genome assembly?
- Reference-based
- De Novo
What does the assembly workflow start with?
Genomic libraries with raw long reads and/or raw short reads in fastq format.
What is the purpose of quality trimming in the assembly workflow?
To remove low-quality sequences that would otherwise introduce errors into the assembly.
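A minimal sketch of quality trimming, assuming a simple rule that removes bases from the 3' end while their Phred score is below a threshold. The function name `trim_3prime` and the cutoff of 20 are illustrative assumptions; real trimming tools use more elaborate rules.

```python
def trim_3prime(seq, quals, min_q=20):
    """Drop bases from the 3' end while their Phred score is below min_q.

    seq is the read sequence and quals is a list of Phred scores of the
    same length. min_q=20 is an assumed cutoff chosen for illustration.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


trimmed_seq, trimmed_quals = trim_3prime(
    "ACGTACGT", [38, 37, 36, 35, 30, 12, 8, 2]
)
print(trimmed_seq)  # ACGTA
```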
What is a contig?
Contiguous sequences assembled from reads.
What are scaffolds?
Larger fragments of DNA that can contain gaps of unknown size.
What is the difference between contigs and scaffolds?
Contigs are contiguous sequences, while scaffolds are larger fragments that may have gaps.
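One way to picture the difference: a scaffold can be broken back into its contigs by cutting at the gaps. The sketch below assumes gaps are written as runs of `N`, which is the usual convention in FASTA files; the helper name `split_scaffold` is hypothetical.

```python
import re


def split_scaffold(scaffold):
    """Return the contigs of a scaffold by splitting on runs of N (gaps)."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


# Two contigs joined by a gap of 10 unknown bases.
scaffold = "ACGTTGCA" + "N" * 10 + "GGCATT"
print(split_scaffold(scaffold))  # ['ACGTTGCA', 'GGCATT']
```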
What are chromosomes in the context of genome assembly?
The final assembly level: scaffolds ordered and oriented into a full set of chromosome-length sequences, possibly haplotype-phased.
Name two assembly strategies for genome assembly.
- Short reads only
- Long reads with polishing
What is the challenge associated with short reads in assembly?
Repetitive or complex regions longer than the read length cannot be spanned by a single read, so they do not assemble with high quality.
What is the N50 metric used for?
To measure contiguity in an assembly. With contigs sorted from longest to shortest, N50 is the length of the contig at which the running total first reaches half of the total assembly length.
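N50 can be computed directly from a list of contig lengths. This is a small sketch using made-up lengths: sort from longest to shortest and return the length at which the running total reaches half of the assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the sum to half the total.
        if running * 2 >= total:
            return length
    return 0


# Illustrative lengths: total is 300, and 80 + 70 = 150 reaches half.
print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70
```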
What does BUSCO assess in genome assemblies?
Completeness by checking for a curated set of single-copy genes expected in a clade.
What is the purpose of a Hi-C library?
To capture chromosome conformation for scaffolding the genome.
What does a Hi-C map represent?
A visualization of 3D genome structure showing interaction frequencies.
What are FastQ files used for?
To store raw sequencing data including sequences and quality scores.
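A minimal FASTQ reader, assuming the common four-line record layout (header starting with `@`, sequence, `+` separator, quality string). The function name `read_fastq` and the example record are illustrative; multi-line records are not handled.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # strip the leading '@'


example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, qual)  # read1 ACGT II5!
```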
What is the significance of a quality score of 30 in FastQ files?
Indicates a 99.9% chance that the base is correct (an error probability of 1 in 1,000).
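The Phred scale relates a quality score Q to an error probability P by Q = -10 * log10(P), so Q30 corresponds to P = 0.001. The sketch below assumes the Phred+33 ASCII encoding used by current Illumina FASTQ files; the function names are illustrative.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability the base call is wrong."""
    return 10 ** (-q / 10)


def decode_quality(qual_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_string]


print(f"{phred_to_error_prob(30):.3f}")  # 0.001
print(decode_quality("?5!"))  # [30, 20, 0]
```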
What is a GFF file?
A plain text file that stores annotation information about gene locations in the genome.
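Each feature line in a GFF3 file has nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, and attributes. The parser below is a sketch for a single line; the example record (`chr1`, `gene0001`) is invented for illustration.

```python
def parse_gff_line(line):
    """Parse one GFF3 feature line into a dictionary."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3 feature line must have 9 tab-separated columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # Attributes are 'key=value' pairs separated by semicolons.
    attributes = dict(
        field.split("=", 1) for field in attrs.split(";") if "=" in field
    )
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


# Hypothetical record, not taken from a real annotation.
feature = parse_gff_line("chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo")
print(feature["type"], feature["start"], feature["end"], feature["attributes"]["ID"])
# gene 1000 2000 gene0001
```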
What is the main advantage of Illumina sequencing?
High coverage and cost-effectiveness for reference-based assembly.
What does synteny refer to in genomics?
The co-localization of genes or genomic regions on the same chromosome, often in the same order, which can be conserved across species.
What are inversions in the context of genome rearrangements?
When a segment of a chromosome, which may contain one or more genes, is reversed end to end, potentially affecting phenotypes.
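At the sequence level, an inversion can be modeled as replacing a segment with its reverse complement. This is a toy sketch on a short made-up sequence; the function name `invert_segment` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def invert_segment(seq, start, end):
    """Return seq with seq[start:end] replaced by its reverse complement.

    Coordinates are 0-based and end-exclusive, as in Python slicing.
    """
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


# The middle segment CCG becomes CGG after inversion.
print(invert_segment("AAACCGTTT", 3, 6))  # AAACGGTTT
```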
What is functional annotation?
Attaching biological functions to predicted gene products.
What is the purpose of the BLAST tool?
To compare DNA/Protein sequences against a database to find similarities.
What are the three main categories of Gene Ontology (GO) terms?
- Biological Process (BP)
- Cellular Component (CC)
- Molecular Function (MF)
What is the process of gene expression?
The process by which information encoded in a gene is used to make a functional product, such as RNA or protein.