Final Study Flashcards

(77 cards)

1
Q

What is genome assembly?

A

Reconstructing the original genome sequence from millions of short sequenced fragments (reads).

2
Q

What are the two main types of genome assembly?

A
  • Reference-based
  • De Novo
3
Q

What does the assembly workflow start with?

A

Genomic libraries with raw long reads and/or raw short reads in FASTQ format.

4
Q

What is the purpose of quality trimming in the assembly workflow?

A

To remove low-quality sequences (bases and reads) that would otherwise introduce errors into the assembly.

5
Q

What is a contig?

A

A contiguous, gap-free sequence assembled from overlapping reads.

6
Q

What are scaffolds?

A

Larger fragments made of contigs that have been ordered and oriented; they can contain gaps of unknown size between the contigs.

7
Q

What is the difference between contigs and scaffolds?

A

Contigs are contiguous sequences without gaps, while scaffolds are larger fragments built from ordered contigs that may contain gaps.
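Scaffold sequences are commonly written with runs of `N` standing in for the gaps between ordered contigs. A minimal Python illustration (the function names are hypothetical, the gap lengths passed in are placeholders for estimated or unknown sizes, and contigs are assumed to contain no N):

```python
def build_scaffold(contigs, gap_sizes):
    """Join ordered, oriented contigs into one scaffold sequence.

    gap_sizes[i] is the number of Ns placed between contigs[i] and contigs[i + 1].
    """
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)  # placeholder for a gap of estimated or unknown size
        parts.append(contig)
    return "".join(parts)


def split_contigs(scaffold):
    """Recover the contigs by splitting the scaffold at runs of N."""
    return [piece for piece in scaffold.split("N") if piece]
```

For example, `build_scaffold(["ACGT", "GGCC"], [5])` gives `"ACGTNNNNNGGCC"`, and splitting on the Ns recovers the two contigs.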

8
Q

What are chromosomes in the context of genome assembly?

A

The chromosome-level stage of assembly: scaffolds joined into a full set of chromosomes, possibly haplotype-phased.

9
Q

Name two assembly strategies for genome assembly.

A
  • Short reads only
  • Long reads with polishing
10
Q

What is the challenge associated with short reads in assembly?

A

Repetitive or complex regions longer than the read length cannot be resolved, so they do not yield a high-quality (contiguous) assembly.

11
Q

What is the N50 metric used for?

A

To measure assembly contiguity. N50 is the contig length at which contigs of that length or longer together make up at least half of the total assembly length.
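The N50 definition above can be made concrete with a short calculation. This Python sketch (the function name `n50` is our own, not taken from a specific tool) sorts contig lengths from longest to shortest and returns the length at which the running total first reaches half of the assembly size.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly
```

For contig lengths 100, 80, 50, 30, 20 and 10 (total 290), the running sum passes 145 at the 80 bp contig, so the N50 is 80.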

12
Q

What does BUSCO assess in genome assemblies?

A

Completeness by checking for a curated set of single-copy genes expected in a clade.

13
Q

What is the purpose of a Hi-C library?

A

To capture chromosome conformation for scaffolding the genome.

14
Q

What does a Hi-C map represent?

A

A visualization of 3D genome structure showing interaction frequencies.

15
Q

What are FASTQ files used for?

A

To store raw sequencing data including sequences and quality scores.
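Each FASTQ record normally spans four lines: an `@` header with the read ID, the sequence, a `+` separator line, and a quality string with one character per base. A minimal Python reader, assuming unwrapped four-line records (the function name is illustrative; real pipelines would typically use an established parser such as Biopython's `SeqIO`):

```python
def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) tuples from FASTQ text.

    Assumes the common four-line layout with no line wrapping:
    @id / sequence / + / quality.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of 4 lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        yield header[1:], seq, qual
```

For example, `list(parse_fastq("@read1\nACGT\n+\nII#I\n"))` returns `[("read1", "ACGT", "II#I")]`.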

16
Q

What is the significance of a quality score of 30 in FASTQ files?

A

Indicates a 99.9% chance that the base call is correct (a 1 in 1,000 chance of error).

17
Q

What is a GFF file?

A

A plain-text, tab-delimited file (General Feature Format) that stores annotation information, such as the locations of genes and other features in the genome.
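A GFF3 feature line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), with 1-based, inclusive coordinates. The sketch below parses one such line and uses it to pull the feature's sequence out of a genome held as a `{seqid: sequence}` dictionary, which is how a GFF can be used to extract sequences of interest from a FASTA file. The helper names are hypothetical, and comment lines (starting with `#`) are assumed to be skipped by the caller.

```python
GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_gff_line(line):
    """Split one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GFF3 feature line has exactly 9 tab-separated columns")
    feature = dict(zip(GFF_COLUMNS, fields))
    feature["start"] = int(feature["start"])  # 1-based, inclusive
    feature["end"] = int(feature["end"])      # 1-based, inclusive
    # Attributes are key=value pairs separated by semicolons, e.g. "ID=gene1;Name=demo"
    feature["attributes"] = dict(
        item.split("=", 1) for item in feature["attributes"].split(";") if item
    )
    return feature


def extract_feature(genome, feature):
    """Return the feature's sequence; minus-strand features are reverse-complemented."""
    seq = genome[feature["seqid"]][feature["start"] - 1:feature["end"]]
    if feature["strand"] == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq
```

For example, a plus-strand `gene` spanning positions 3 to 6 of a sequence `chr1 = "AACCGGTT"` yields `"CCGG"`.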

18
Q

What is the main advantage of Illumina sequencing?

A

High coverage and cost-effectiveness for reference-based assembly.

19
Q

What does synteny refer to in genomics?

A

Co-location (linkage) of genes or genomic regions on the same chromosome, which can be conserved across species.

20
Q

What are inversions in the context of genome rearrangements?

A

A rearrangement in which a segment of a chromosome is reversed end to end, potentially affecting phenotypes.
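In sequence terms, an inverted segment reads as the reverse complement of the original on the same strand, because the double-stranded fragment is flipped end to end. A toy Python illustration, using hypothetical function names and 0-based, end-exclusive coordinates:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq, start, end):
    """Simulate an inversion of seq[start:end] (0-based, end-exclusive).

    The segment is flipped end to end; because DNA is double-stranded, the
    strand we read now carries the reverse complement of the original segment.
    """
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]
```

For example, `invert_segment("AAACCGTTT", 3, 6)` gives `"AAACGGTTT"`, and applying the same inversion twice restores the original sequence.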

21
Q

What is functional annotation?

A

Attaching biological functions to predicted gene products.

22
Q

What is the purpose of the BLAST tool?

A

To compare DNA/Protein sequences against a database to find similarities.

23
Q

What are the three main categories of Gene Ontology (GO) terms?

A
  • Biological Process (BP)
  • Cellular Component (CC)
  • Molecular Function (MF)
24
Q

What is the process of gene expression?

A

The process by which information encoded in a gene is used to make a functional product (RNA or protein).

25
What is RNAseq used for?
To capture gene expression levels at a given moment in cells.
26
What is the first step in RNA extraction?
Start with tissue samples of an organism.
27
What is the purpose of library preparation in RNAseq?
To convert unstable mRNA into more stable cDNA (by reverse transcription) for sequencing.
28
What is differential gene expression analysis (DGE)?
Quantifying changes in gene expression levels between different conditions.
29
What does a heatmap of differentially expressed genes represent?
Expression levels of genes across various tissues or conditions.
30
What is the function of the motor protein in Oxford Nanopore sequencing?
Unwinds the double-stranded DNA and controls the speed at which a single strand passes through the nanopore during sequencing.
31
What are the two main types of translocations?
  • Reciprocal translocation
  • Nonreciprocal translocation
32
What is the role of the eggNOG-mapper tool?
To predict gene functions based on pre-computed clusters and phylogenies.
33
What does the term 'high turnover rate' in RNA refer to?
The rapid synthesis and degradation of mRNA molecules, allowing transcript levels to change quickly in response to environmental conditions.
34
What is the significance of the hierarchical structure of Gene Ontology?
It organizes gene functions into categories that describe their biological roles, from broad (parent) terms down to more specific (child) terms.
35
Fill in the blank: The _______ file is used to filter sequences of interest from a FASTA file.
GFF
36
True or False: The De Bruijn Graph is used for assembling long reads.
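False. De Bruijn graphs are typically used for short reads; long reads are usually assembled with overlap-layout-consensus (OLC) methods.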
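A De Bruijn graph breaks reads into k-mers; each distinct k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, so read overlaps are found through shared k-mers instead of all-against-all read comparisons. The toy Python sketch below (hypothetical function names; no handling of sequencing errors, reverse complements, or branches, which real assemblers must deal with) builds such a graph and spells out an unbranched path.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the list of (k-1)-mer nodes it points to."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])  # edge: prefix node -> suffix node
    return dict(graph)


def spell_path(graph, start):
    """Walk from `start` while each node has exactly one successor,
    appending the last base of every new node (stops at branches, dead ends, cycles)."""
    sequence, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:
            break
        seen.add(node)
        sequence += node[-1]
    return sequence
```

For example, the overlapping reads `"ACGTC"` and `"GTCAG"` with k = 3 are reassembled into `"ACGTCAG"` when the walk starts from node `"AC"`.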
37
What is the purpose of using short and long reads in hybrid assembly?
To improve assembly quality when coverage in long read data is insufficient.
38
What kind of information can be gleaned from a Hi-C library?
Physical proximity of chromosomes or genes.
39
What is the impact of structural variations in genomes?
They alter large segments of DNA, which can change genome organization and affect gene function and phenotype.
40
What are differentially expressed genes (DEGs)?
Genes that are either up- or down-regulated in comparison to a control situation or expectation.
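Up- and down-regulation are usually expressed as a log2 fold change (log2FC) between condition and control: positive values mean up-regulation, negative values down-regulation. The minimal sketch below computes log2FC from mean expression values with a pseudocount and applies a hypothetical |log2FC| >= 1 (two-fold) cutoff. Real DGE tools such as DESeq2 or edgeR also normalize for library size and test statistical significance, which this sketch omits; the function names are illustrative.

```python
import math


def log2_fold_change(treatment_mean, control_mean, pseudocount=1.0):
    """log2 of (treatment + pseudocount) / (control + pseudocount).

    The pseudocount avoids dividing by, or taking the log of, zero.
    """
    return math.log2((treatment_mean + pseudocount) / (control_mean + pseudocount))


def classify_gene(lfc, threshold=1.0):
    """Label a gene by fold change alone (no significance test)."""
    if lfc >= threshold:
        return "up-regulated"
    if lfc <= -threshold:
        return "down-regulated"
    return "not differentially expressed"
```

For example, a mean of 63 in the treatment and 15 in the control gives log2((63 + 1) / (15 + 1)) = log2(4) = 2, so the gene is labeled up-regulated.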
41
In a heatmap of differentially expressed genes, what does the y-axis represent?
Individual genes.
42
In a heatmap of differentially expressed genes, what does the x-axis represent?
Different tissues, environments, health conditions.
43
What color represents high-expression/up-regulated genes in a heatmap?
Red.
44
What color represents low-expression/down-regulated genes in a heatmap?
Blue.
45
What is the purpose of a Gene Ontology (GO) term?
To describe a gene function with a unique GO ID and to show its relationships to other terms in the ontology.
46
What does KEGG stand for?
Kyoto Encyclopedia of Genes and Genomes.
47
What is the function of the KEGG PATHWAY database?
Maps representing experimental knowledge on metabolism and other cellular processes.
48
What does Functional Enrichment Analysis determine?
Which functions are over-represented in a subset of genes.
49
List the steps involved in Functional Enrichment Analysis.
  • Define background set for comparison
  • Define list of genes of interest
  • Get GO annotations for all genes
  • Do enrichment analysis
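The final enrichment step is commonly done with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test): given the background set and the gene list, how surprising is it to see this many genes of interest annotated with a given GO term? A minimal standard-library Python sketch for a single term follows. In practice the test is repeated for every term and the p-values are corrected for multiple testing, which is not shown; the function and variable names are illustrative.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric p-value for over-representation of one GO term.

    N: number of genes in the background set
    K: background genes annotated with the term
    n: number of genes of interest
    k: genes of interest annotated with the term
    Returns P(X >= k), the probability of drawing k or more annotated genes
    when n genes are picked at random from the background.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, with 20 background genes, 5 of them annotated with a term, and all 4 genes of interest carrying that term, p = C(5,4) / C(20,4) = 5 / 4845 ≈ 0.001, which suggests the term is enriched.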
50
What are common sources of contamination in genomic studies?
  • Cross-contamination between samples
  • Contaminated reagents
  • Environmental sources
51
What is Horizontal Gene Transfer?
Transfer of genetic material between species, NOT from parent to offspring.
52
What are xenologs?
Genes that originate from horizontal gene transfer.
53
What is Codon Usage Bias?
The unequal use of synonymous codons (different codons for the same amino acid); the preferred codons can differ between species.
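Codon usage can be summarized by counting how often each synonymous codon is used for an amino acid; genes whose usage differs strongly from the rest of the genome are one commonly cited signal of horizontal gene transfer. A minimal Python sketch for leucine, whose six codons in the standard genetic code are listed below (function names are illustrative, and the input is assumed to be an in-frame coding sequence):

```python
from collections import Counter

# The six leucine codons of the standard genetic code
LEUCINE_CODONS = ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"]


def codon_counts(cds):
    """Count codons in an in-frame coding sequence; a trailing partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def synonymous_usage(cds, synonymous_codons):
    """Fraction of each synonymous codon among all codons for one amino acid."""
    counts = codon_counts(cds.upper())
    total = sum(counts[codon] for codon in synonymous_codons)
    return {codon: (counts[codon] / total if total else 0.0)
            for codon in synonymous_codons}
```

For example, in `"CTGCTGCTGTTA"` CTG accounts for 0.75 of the leucine codons and TTA for 0.25, while the other four leucine codons are unused.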
54
What are orthologs?
Genes between species that originated from a common ancestor by a speciation event.
55
What are paralogs?
Genes that originate from a duplication event.
56
What are ohnologs?
Paralogous genes that originated from a whole genome duplication event.
57
What are homologs?
Genes that originate from shared ancestry, including both orthologs and paralogs.
58
What is the significance of genome duplications?
They can lead to redundancy and pseudogenization, gene dosage effects, neo-functionalization, and sub-functionalization.
59
What are non-coding regions of the genome?
  • tRNA / rRNA
  • Introns
  • Regulatory sequences
  • Repetitive sequences
60
What are regulatory elements?
  • UTRs (untranslated regions)
  • Promoters
  • Enhancers
  • Silencers
61
What are transposable elements?
Elements that can move and change positions in the genome.
62
What are the two major classes of transposable elements?
  • Retrotransposons (class 1)
  • DNA transposons (class 2)
63
What is structural annotation?
The process of identifying the locations and structures of genes (e.g., exons and introns) in the whole genome sequence, typically using automated pipelines and databases.
64
What is a pangenome?
The full set of genomic elements in a species, representing all genetic diversity.
65
What are structural variants?
Copy number and chromosomal rearrangements (inversions, insertions, deletions, duplications) in large segments of DNA.
66
What is a monophyletic group?
A clade: a group that includes a common ancestor and all of its descendants.
67
What is a paraphyletic group?
A group that includes an ancestor but not all of its descendants.
68
What is a polyphyletic group?
A group where taxa do not share an immediate common ancestor.
69
What are some challenges in phylogenomics?
  • Amount of missing data
  • Gene tree discordance
  • Long-branch attraction
  • Site heterogeneity
  • Compositional heterogeneity
  • Computational limitations
70
What are epigenetic modifications?
Changes that affect gene expression without altering the DNA sequence.
71
What is DNA methylation?
A chemical modification to DNA that can inhibit gene transcription.
72
What is the purpose of bisulfite sequencing?
To identify methylated sites in a genome.
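Bisulfite treatment converts unmethylated cytosines to uracil, which is read as thymine after PCR, while methylated cytosines are protected and still read as C. Comparing a converted read with the reference therefore reveals methylated sites. A toy single-strand Python simulation (hypothetical function names and 0-based positions; real pipelines align converted reads with dedicated tools):

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment + PCR on one strand: unmethylated C reads as T,
    while Cs at the given 0-based methylated positions stay C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted_read):
    """Return 0-based positions of reference Cs that still read as C after conversion."""
    return [i for i, (ref, obs) in enumerate(zip(reference, converted_read))
            if ref == "C" and obs == "C"]
```

For example, the reference `"ACGTCCGA"` with only position 1 methylated converts to `"ACGTTTGA"`, and comparing the two recovers `[1]`.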
73
What do Blobplots represent?
Taxon-annotated GC-coverage plots used to filter assemblies from contamination.
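The two axes of a blobplot can be computed per contig: GC content on one axis and read coverage on the other. Contaminant contigs often form a separate cluster because their GC content and coverage differ from those of the target genome. The sketch below computes only these coordinates; the taxonomic coloring (e.g., from BLAST hits) is omitted, the coverage values are assumed to come from mapping reads back to the assembly, and the function names are illustrative.

```python
def gc_content(seq):
    """Fraction of G or C among unambiguous A/C/G/T bases (Ns and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def blob_points(contigs, coverage):
    """Pair each contig's GC content with its read coverage: the x/y values of a blobplot."""
    return {name: (gc_content(seq), coverage[name]) for name, seq in contigs.items()}
```

For example, a contig `"GGCCAATT"` with a coverage of 42 maps to the point (0.5, 42.0).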
74
What are Hi-C maps used for?
To illustrate the physical proximity of chromosomes and their connections within the cell.
75
What is the role of KEGG in genomics?
To provide a database for gene annotation and metabolic pathway information.
76
What do synteny plots compare?
Chromosome correspondence and syntenic equivalence across distantly related species.
77
What is indicated by a heat map with red and blue colors?
Red indicates high-expression (up-regulated) and blue indicates low-expression (down-regulated) genes.