Exam Flashcards

(94 cards)

1
Q

What is the primary goal of functional genomics?

A

To describe gene/protein functions and interactions, focusing on dynamic aspects like transcription and protein-protein interactions, rather than static aspects like DNA sequence. Functional genomics uses data from genomic projects.

2
Q

What is the difference between genomics and functional genomics?

A

Genomics studies the structure of genomes (DNA sequence), while functional genomics focuses on the dynamic aspects such as gene transcription, translation and protein-protein interactions.

3
Q

What are the steps in a typical genome sequencing project?

A

Isolate chromosomes.
Fragment DNA into smaller pieces.
Sequence the fragments (reads).
Assemble the reads to reconstruct the genome.
Annotate the genome, identifying genes and other elements.
Validate annotation results using BLASTp.

4
Q

What is the purpose of a gene finder?

A

To identify and label different features in the genomic sequence, such as coding regions (genes), regulatory elements, and other functional elements. Gene finders are more sophisticated than ORF finders. They can detect elements like the TATA box. Examples include GeneMark and Augustus.

5
Q

What is the difference between an ORF and a gene?

A

A gene has a scientifically demonstrated function, while an ORF (Open Reading Frame) is a DNA sequence that might encode a protein but has no proven function.
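To make the distinction concrete, here is a minimal sketch of what a naive ORF finder does. It assumes the forward strand only, ATG as the start codon and the three standard stop codons; the name `find_orfs` and the `min_codons` parameter are illustrative, not taken from any real tool. Everything it reports is only a candidate: nothing in the scan says whether the sequence has a function.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs of forward-strand ORFs (ATG ... stop), end exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen in this frame since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it is long enough (stop codon included)
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

print(find_orfs("CCATGAAATAGGG"))
```

A gene finder goes further than this scan, for example by modelling promoters or codon usage, and experimental evidence is still needed before an ORF is called a gene.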

6
Q

What are the two main approaches to genome assembly?

A

Guided Assembly: Uses an existing reference genome to map reads.
De Novo Assembly: Constructs the genome solely from sequencing data when no reference genome is available.

7
Q

What is hybrid assembly?

A

Hybrid assembly combines long and short reads to create a more accurate and complete genome representation. It uses data from different sequencing technologies like PacBio (long reads) and Illumina (short reads).

8
Q

Why is paired-end sequencing useful?

A

Paired-end sequencing reads both ends of a DNA fragment, producing a pair of mate reads whose relative orientation and approximate distance are known. This effectively increases read length and improves alignment, especially in repetitive regions.

9
Q

What is the difference between vertical and horizontal coverage?

A

Vertical coverage is the number of reads that align to a given position of the genome. High vertical coverage is important for comparing strains.
Horizontal coverage refers to how much of the entire genome has been sequenced. High horizontal coverage is preferred when sequencing an entire genome.
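The two notions can be computed from the same alignment data. The sketch below is only an illustration: it assumes reads are already aligned and given as 0-based `(start, end)` intervals with `end` exclusive, and the function name `coverage_stats` is made up for this example.

```python
def coverage_stats(genome_length, reads):
    """Compute per-position depth, mean depth (vertical) and breadth (horizontal)."""
    depth = [0] * genome_length
    for start, end in reads:
        # Clip each alignment to the genome boundaries
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    mean_depth = sum(depth) / genome_length   # average vertical coverage
    breadth = covered / genome_length         # horizontal coverage as a fraction
    return depth, mean_depth, breadth

depth, mean_depth, breadth = coverage_stats(10, [(0, 4), (2, 6), (3, 5)])
print(depth, mean_depth, breadth)
```

In this toy case position 3 is covered by three reads (high vertical coverage) while the last four positions are never read, so horizontal coverage is only 60%.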

10
Q

What is the main advantage of PacBio sequencing?

A

PacBio produces long reads, which facilitates de novo assembly, and provides good horizontal coverage of the genome. Also, PacBio does not require amplification.

11
Q

Why is Illumina sequencing often used in combination with PacBio?

A

Illumina provides high accuracy and high vertical coverage, while PacBio provides long reads and high horizontal coverage. Combining the two can improve genome assemblies, especially for small genomes. Illumina uses bridge PCR for amplification and reversible termination sequencing.

12
Q

What is a contig?

A

A contig is a contiguous sequence of DNA, assembled from overlapping reads. Contigs represent parts of a genome sequence that are assembled but might not be in their correct order.
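As a rough illustration of how overlapping reads become contigs, here is a toy greedy assembler. It assumes error-free reads from a single strand and merges the pair with the longest suffix-prefix overlap until no overlap of at least `min_len` bases remains. Real assemblers use overlap or de Bruijn graphs and handle errors and repeats; the function names here are illustrative only.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the best-overlapping pair; return the remaining contigs."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]

print(greedy_assemble(["ATGCGT", "CGTTAC", "TACGGA"]))
print(greedy_assemble(["ATGCGT", "CGTTAC", "GGGGGG"]))
```

The first call collapses into a single contig; in the second, the read with no overlap stays as its own contig, which is how gaps in sequencing leave an assembly fragmented.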

13
Q

How can duplicated regions in a genome affect assembly?

A

Duplicated regions in the genome can cause a spike in the vertical coverage and may cause mis-assembly, leading to a higher than expected number of contigs.

14
Q

What is the purpose of karyotyping?

A

Karyotyping allows researchers to determine the number and size of chromosomes in an organism. It can also be used to determine the genome size.

15
Q

What is the purpose of BLASTp?

A

BLASTp is used to validate gene models by comparing predicted protein sequences to known protein sequences in databases. The result with the highest similarity for the highest coverage is usually selected.

16
Q

What is the goal of comparative genomics?

A

Comparative genomics compares the genomes of different species to study evolutionary processes and genetic diversity. It identifies variations in chromosome number and size, and large-scale genomic modifications.

17
Q

How can comparative genomics be used to study phenotypes?

A

By comparing genomes of strains that differ in a particular phenotype, comparative genomics can be used to identify genetic changes responsible for adaptive traits.

18
Q

What are the key steps in transcriptomics analysis?

A
  1. Define experimental conditions.
  2. Extract RNA from cells.
  3. mRNA enrichment or rRNA depletion. mRNA enrichment uses oligo(dT) to capture poly(A) tails (eukaryotes only); rRNA depletion removes rRNA with a kit such as Ribo-Zero.
  4. Convert RNA to cDNA.
  5. Sequence the cDNA.
  6. Quantify gene expression by comparing read coverage.
19
Q

What is the difference between general and specific transcription factors?

A

General TFs form the basic machinery needed for transcription.
Specific TFs regulate the expression of individual genes.

20
Q

What is a microarray and how is it used?

A

A microarray contains immobilized DNA probes designed to bind specific transcripts. Labelled cDNA samples are hybridized to the probes and the signal intensity corresponds to the quantity of transcript present. It is used to quantify gene expression.

21
Q

What are the controls used in microarray experiments?

A

Positive Controls: Samples with known concentrations of the target to validate the results.
Negative Controls: Samples without DNA/RNA or non-specific sequences to determine background noise and non-specific bindings.

22
Q

What is the proteome?

A

The entire set of proteins expressed by an organism, tissue, cell, or organelle at a given time under defined conditions.

23
Q

What are the steps in a typical expression proteomics experiment?

A

Select a biological model and question.
Define experimental conditions.
Collect samples and obtain protein extracts.
Separate proteins using 2D gel electrophoresis.
Visualise the proteins.
Quantify protein abundance using software.
Identify proteins through mass spectrometry.

24
Q

What is the purpose of trypsin in mass spectrometry?

A

Trypsin cleaves proteins at specific amino acid residues (on the C-terminal side of lysine and arginine), generating peptides of different sizes that can be analyzed by mass spectrometry.

25
What is peptide mass fingerprinting (PMF)?
PMF identifies proteins by comparing the mass/charge ratio of trypsin-digested peptides to databases. PMF relies on having an annotated genome.
26
What is tandem mass spectrometry (MS/MS)?
MS/MS involves two rounds of mass spectrometry, separated by a collision-induced fragmentation step. This provides better distinction between peptides and helps identify unique metabolites.
27
What is DIGE (Differential Gel Electrophoresis)?
DIGE labels proteins from different samples with distinct fluorescent dyes, allowing multiple samples to be run on the same gel. This reduces gel-to-gel variation.
28
How can post-translational modifications (PTMs) be studied in proteomics?
PTMs can be studied through mass spectrometry (detects mass shifts), and specific staining techniques for: Phosphorylation (Pro-Q diamond staining). Carbonylation (DNP staining). Thiol groups (radiolabelling or chemical labelling with maleimides). Ubiquitination (antibodies against ubiquitin).
29
What are ICAT, SILAC, and iTRAQ?
These are labelling methods used in quantitative proteomics for peptide quantification by introducing mass differences. ICAT uses isotope-coded affinity tags to label cysteine residues and quantify the proteins. SILAC uses stable isotopes to label amino acids during cell culture and perform quantification during the first step of MS. iTRAQ uses isobaric tags to label peptides and allows for protein quantification through tandem MS during the second step.
30
What is metabolomics?
Metabolomics is the study of small molecules (metabolites) within cells, tissues, or organisms.
31
What are the two main approaches to metabolomics analysis?
Metabolomic footprinting: Analysis of the exometabolome (extracellular metabolites). Metabolomic fingerprinting: Analysis of the endometabolome (intracellular metabolites).
32
What is the difference between NMR and MS in metabolomics?
MS is more sensitive and identifies a higher number of metabolites, better for simple metabolisms. NMR is highly quantitative and reproducible and is better suited for complex systems or when you need information on novel metabolites.
33
What are PCA and PLSDA?
PCA (Principal Component Analysis): An unsupervised method used to reduce the dimensionality of a dataset and check for the relationship between replicates. It is also used for diagnostics by comparing the metabolomics profile of healthy vs disease cells. PLSDA (Partial Least Squares Discriminant Analysis): A supervised method to find the optimal separation between sample groups, by selecting the variables that are most relevant. It is used to identify diagnostic markers, but not for checking for a relationship between replicates.
34
What are the two main scientific approaches?
Hypothesis-testing approach: Testing a proposed hypothesis. Hypothesis-generating approach: Results are used to generate hypotheses. Omics studies take this approach.
35
What is a disruptome analysis?
A disruptome analysis is used to find genes that are required for a particular phenotype. It uses a collection of mutant cell lines where each gene is individually deleted. It compares the growth of mutants vs wild-type under specific conditions.
36
What is a two-hybrid system?
A two-hybrid system detects protein-protein or protein-DNA interactions by reconstituting a functional transcription factor when two interacting proteins bring together its separate domains in a yeast or bacterial cell.
37
What is co-immunoprecipitation (co-IP)?
Co-IP detects protein-protein interactions by using an antibody to pull down a target protein along with its interacting partners from a cell lysate.
38
What is a proteome chip?
A proteome chip has the proteome immobilized on a chip and is used to study protein-protein, protein-nucleic acid, and protein-lipid interactions. The protein of interest is labelled with a fluorophore and hybridized with the chip.
39
What is a gene?
A gene has a scientifically demonstrated associated function. It's not just a coding sequence (like an ORF) but a functional biological unit.
40
41
Explain the difference between genomics and functional genomics.
Genomics focuses on the study of the entire genome, including coding and non-coding regions, whereas functional genomics focuses on the dynamic aspects of gene function, such as transcription, translation, and protein interactions. Functional genomics aims to use the data produced by genomic projects to understand gene/protein functions and interactions.
42
Describe the steps involved in a typical RNA-seq experiment, and why mRNA enrichment is a crucial step.
1. Cultivate treated versus untreated (control) cells. 2. Lyse cells and extract total RNA. 3. mRNA enrichment (oligo(dT), eukaryotes) or rRNA depletion. 4. Convert RNA into cDNA through reverse transcription. 5. Sequence the cDNA with Illumina (paired-end). 6. Map reads to the reference genome and analyze vertical coverage. 7. Normalize the reads and quantify gene expression; read counting identifies which genes are up-regulated. 8. Annotate uncharacterized transcripts (gene finders). mRNA enrichment is crucial because rRNA makes up most of the total RNA and would otherwise dominate the reads.
43
Explain the difference between de novo assembly and guided assembly in the context of genome sequencing.
De novo assembly involves assembling a genome without a reference sequence, using overlapping reads to form contigs. Guided assembly, on the other hand, uses a reference genome to order and orient the contigs, often improving the contiguity and accuracy of the final assembly.
44
What is the role of contigs in genome assembly, and what does it mean if a genome is considered 'incomplete'?
Contigs are assembled sequences of DNA, formed by joining overlapping reads. An incomplete genome means that the entire genomic sequence of an organism has not been fully characterised or assembled. This can be due to technical limitations, gaps in sequencing or the complexity of genomic regions.
45
What is the purpose of using paired-end sequencing, and how does it help in genome assembly?
Paired-end sequencing involves sequencing both ends of a DNA fragment. This provides information about the relative orientation and distance between the reads, which helps to map reads more accurately and improves the contiguity of genome assemblies.
46
Compare and contrast Illumina and PacBio sequencing technologies in terms of read length, accuracy, and applications.
Illumina: short reads (typically 50-300 bp per read; paired-end 2 x 300 bp gives up to about 600 bp per fragment), high accuracy, and high throughput, used for high vertical coverage. PacBio: long reads (10-60 kbp), lower accuracy, used for better horizontal coverage and de novo assembly of large or complex genomes. PacBio reads also do not require library amplification. The two are often used together in a hybrid assembly approach to generate high-quality reference genomes.
47
Describe the principle behind peptide mass fingerprinting (PMF) in proteomics.
PMF involves digesting a protein with trypsin, which cleaves proteins at specific amino acid residues, generating a set of peptides of different sizes. The resulting peptides are analysed by mass spectrometry, which determines their mass-to-charge ratios. The resulting peptide masses are compared to a database of known protein sequences to identify the protein.
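The principle can be sketched in a few lines. This is an illustrative toy, not a real search engine: it assumes the common rule that trypsin cleaves after K or R unless the next residue is P, uses monoisotopic residue masses, compares neutral peptide masses rather than the m/z of charged ions, and scores a protein by how many observed masses it explains. The two-protein "database" and all function names are hypothetical.

```python
# Monoisotopic residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, current = [], ""
    for i, aa in enumerate(protein):
        current += aa
        next_aa = protein[i + 1:i + 2]  # empty string at the end of the protein
        if aa in "KR" and next_aa != "P":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides

def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def pmf_score(observed_masses, protein, tolerance=0.01):
    """Count observed masses that match a theoretical tryptic peptide of the protein."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )

def identify(observed_masses, database, tolerance=0.01):
    """Return the name of the database protein explaining the most observed masses."""
    return max(database, key=lambda name: pmf_score(observed_masses, database[name], tolerance))

database = {"protA": "MKTAYRPGKLLR", "protB": "GASPVKTTCCR"}  # hypothetical sequences
observed = [peptide_mass("MK"), peptide_mass("LLR")]
print(tryptic_digest(database["protA"]), identify(observed, database))
```

The lookup only works because the theoretical peptides can be predicted from known sequences, which is why PMF depends on an annotated genome.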
48
What is the main advantage of using tandem mass spectrometry (MS/MS) compared to single-stage mass spectrometry in proteomics?
Tandem MS (MS/MS) involves two rounds of mass spectrometry, separated by a step of collision-induced fragmentation of the peptides. This fragmentation provides additional information about the amino acid sequences of the peptides, allowing for more accurate protein identification, especially in cases where multiple peptides have similar masses.
49
Explain how SILAC and iTRAQ are used in quantitative proteomics.
SILAC: uses stable isotopes to label proteins. Protein quantification is performed in the first step of MS based on the mass differences introduced by the stable isotopes. iTRAQ: uses isobaric tags, and quantification is achieved in the second step of tandem MS when the reporter group of the tags is released.
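For SILAC, the mass difference seen in the first MS step can be worked out directly. The sketch below assumes a 13C6-lysine label (six carbons replaced by 13C), which is one common choice but is not stated in these cards; the function names and intensities are illustrative.

```python
C13_SHIFT = 1.003355  # mass difference between 13C and 12C, in daltons

def heavy_light_spacing(peptide, charge, labelled_residue="K", carbons_labelled=6):
    """Expected m/z spacing between the light and heavy forms of a peptide."""
    n_labelled = peptide.count(labelled_residue)
    return n_labelled * carbons_labelled * C13_SHIFT / charge

def silac_ratio(light_intensity, heavy_intensity):
    """Relative abundance of the heavy-labelled sample versus the light one."""
    return heavy_intensity / light_intensity

# A doubly charged peptide with one lysine: the peaks sit about 3.01 m/z apart
print(round(heavy_light_spacing("TAYRPGK", charge=2), 4))
# Hypothetical peak intensities: the heavy condition has three times more protein
print(silac_ratio(2000.0, 6000.0))
```

iTRAQ differs in that the tags are isobaric, so labelled peptides coincide in the first MS step and only the reporter ions released after fragmentation carry the quantitative signal.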
50
What is the difference between metabolomic fingerprinting and metabolomic footprinting?
Metabolomic fingerprinting refers to the analysis of the endometabolome (intracellular metabolites). Metabolomic footprinting refers to the analysis of the exometabolome (extracellular metabolites).
51
How does Nuclear Magnetic Resonance (NMR) differ from Mass Spectrometry (MS) in metabolomics, and when would you use one over the other?
MS: more sensitive, good for identifying a higher number of metabolites, and used for organisms with simple metabolism. NMR: has lower sensitivity but provides more structural information, can be used for novel or unusual metabolites, especially from complex systems such as plants, and is not affected by metabolite pKa or hydrophobicity.
52
How can PCA (Principal Component Analysis) be used in the analysis of omics data?
PCA is an unsupervised multivariate statistical analysis tool that is commonly used to evaluate the quality of replicate experiments, by showing how similar the replicates are. It can also be used to identify clusters of samples based on their overall omics profiles.
53
What is a disruptome and how can it be used in functional genomics?
A disruptome is a library of single-deletion mutants for every gene in the genome of an organism. It is used to study the impact of each gene's deletion on the phenotype of the organism, under different conditions, to understand gene function.
54
Explain the concept of synthetic lethality in the context of Synthetic Genetic Array (SGA) analysis.
Synthetic lethality is a type of genetic interaction where the co-occurrence of two genetic events results in cellular death. SGA makes use of synthetic lethality to look for phenotypes in a specific gene by systematically creating double mutants to study gene interactions.
55
How does the two-hybrid system work to identify protein-protein interactions?
The two-hybrid system involves fusing proteins of interest with either a DNA-binding domain (BD) or activation domain (AD) of a transcription factor. If the hybrid proteins interact, the BD and AD are brought together, activating transcription of a reporter gene, thus indicating a protein-protein interaction.
56
Describe the steps involved in using a proteome chip to identify protein-protein interactions.
A proteome chip involves immobilizing a proteome on a chip. Then, the protein whose interaction you want to test is labelled with a fluorophore and hybridized with the chip. If the labelled protein interacts with an immobilized protein, a fluorescent signal will be emitted.
57
What are the key differences between vertical coverage and horizontal coverage in sequencing data?
Vertical coverage refers to the number of aligned reads at a specific position in the sequence, which indicates the confidence in the correctness of that nucleotide. Horizontal coverage refers to the percentage of the genome that was sequenced.
58
What is a non-synonymous SNP and how might it impact a protein?
A non-synonymous SNP is a single nucleotide polymorphism that results in a change in the amino acid sequence of a protein. This can alter protein function, stability, or interaction with other molecules. A higher number of non-synonymous SNPs in a gene may be indicative of adaptive changes.
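Whether a SNP is synonymous can be checked by translating the codon before and after the change. A minimal sketch, assuming the standard genetic code, an uppercase in-frame coding sequence and a 0-based SNP position; `classify_snp` is an illustrative name.

```python
BASES = "TCAG"
# Amino acids of the standard genetic code, ordered by codon with bases in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_snp(cds, position, alt_base):
    """Return 'synonymous' or 'non-synonymous' for a substitution in a coding sequence."""
    codon_start = position - position % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = position - codon_start
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"

cds = "ATGGAAGTT"  # Met-Glu-Val
print(classify_snp(cds, 5, "G"))  # GAA -> GAG, still Glu
print(classify_snp(cds, 4, "T"))  # GAA -> GTA, Glu becomes Val
```

Changes that create a stop codon are also reported as non-synonymous here, since the encoded product changes.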
59
How can RNA-seq data help determine if environmental isolates of C. glabrata are more tolerant to bitertanol?
Identify up-regulated genes in C. glabrata (CBS138) after bitertanol exposure using RNA-seq data. Compare expression levels of these genes in environmental isolates to CBS138. If isolates over-express these genes, it may explain their higher tolerance. ## Footnote You could also analyze SNPs in environmental isolates to see if changes coincide with bitertanol tolerant strains.
60
What biological material is needed to identify all genes essential for L. fermentum survival in the GI tract?
A disruptome, which is a library of single-deletion mutants for every gene in the L. fermentum genome. ## Footnote The experimental approach involves comparing growth of wild-type and mutant strains under GI tract-like conditions.
61
How does Pulsed-Field Gel Electrophoresis (PFGE) differ from traditional gel electrophoresis?
PFGE uses a multidirectional electric field that changes at intervals, improving resolution for large DNA fragments. Traditional gel electrophoresis is ineffective for large fragments. ## Footnote PFGE is particularly useful for studying prokaryotic chromosomes and plasmids.
62
What experiment would verify the claim of high 'horizontal coverage' in sequencing data for Rhodotorula toruloides?
Authors likely compared the sum of contig sizes after assembly with the 'real' genome size determined by karyotyping using PFGE. ## Footnote Horizontal coverage is the percentage of the genome sequenced, while vertical coverage measures read alignment at specific positions.
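The comparison described here reduces to a simple ratio. The numbers below are made up for illustration and do not come from any Rhodotorula toruloides study; `horizontal_coverage` is an illustrative name.

```python
def horizontal_coverage(contig_lengths_bp, genome_size_bp):
    """Fraction of the karyotype-estimated genome size accounted for by the contigs."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return sum(contig_lengths_bp) / genome_size_bp

# Hypothetical assembly and PFGE-based genome size estimate
contigs = [5_000_000, 4_500_000, 3_000_000, 2_500_000]
estimated_genome_size = 20_000_000
print(f"{horizontal_coverage(contigs, estimated_genome_size):.0%}")  # prints 75%
```

A value close to 100% supports the claim of high horizontal coverage; a value well above 100% would instead suggest redundant contigs or an underestimated genome size.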
63
How would you use ChIP to identify targets of ScHSFA4a in heat-shocked jojoba leaves?
Cross-link proteins to DNA, lyse cells, fragment DNA, and use an antibody specific to ScHSFA4a to immunoprecipitate the protein and bound DNA. Remove proteins, reverse cross-links, and purify DNA for sequencing to identify binding sites. ## Footnote This identifies direct target genes of ScHSFA4a.
64
Why use both metabolomic footprinting and fingerprinting in leukocyte response to chemokines?
Footprinting studies extracellular metabolites, revealing the impact of chemokines on the extracellular environment. Fingerprinting studies intracellular metabolites, showing the effects of chemokine signaling within cells. ## Footnote Both approaches provide a comprehensive view of metabolic changes induced by chemokines.
65
What is the key difference between hypothesis-driven and hypothesis-generating approaches in metabolomics?
Hypothesis-driven starts with a pre-defined hypothesis tested experimentally, while hypothesis-generating begins without a prior hypothesis, generating one after analysis. ## Footnote PCA is unsupervised (hypothesis-generating) and PLSDA is supervised (hypothesis-driven).
66
Can high vertical coverage in RNA-seq data immediately indicate gene overexpression?
No, high vertical coverage does not guarantee overexpression. Other factors like gene size and total read normalization can influence coverage. ## Footnote Ensure coverage comparability between patient and control samples.
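One common way to correct for both gene size and total read number is TPM (transcripts per million). It is used here only as an example of such a normalization; the cards do not say which method the course expects, and the gene names and counts are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Normalize raw read counts by gene length, then scale each sample to one million."""
    # Reads per kilobase: removes the advantage of long genes
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rates.values())
    # Scaling by the sample total removes the effect of sequencing depth
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}

counts = {"geneA": 100, "geneB": 100}      # same raw read count
lengths = {"geneA": 1000, "geneB": 2000}   # geneB is twice as long
print(tpm(counts, lengths))
```

Both genes have the same raw count, yet geneA ends up with twice the normalized expression of geneB because its reads come from half the sequence length.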
67
Why is hybrid assembly preferable for complex eukaryotic genomes?
Hybrid assembly combines short-read accuracy with long-read ability to span repetitive regions, improving de novo assembly accuracy and reducing errors. ## Footnote This approach is particularly useful for larger, complex genomes.
68
What are potential sources of false positives and negatives in a two-hybrid system?
False positives can arise from indirect interactions via a third protein; false negatives may occur due to steric hindrance or inability of Gal4 to reach the nucleus. ## Footnote Mitigate issues by confirming interactions with techniques like co-immunoprecipitation or using split-ubiquitin for membrane proteins.
69
What is the primary goal of a shotgun metagenomics project?
To assess the diversity of a microbial community including viruses, prokaryotes, and microeukaryotes in a habitat. It can also be used to study the functional genes of a microbial community
70
Why are shotgun metagenomic reads assembled into contigs?
To enable the reconstruction of genomes from metagenomes and the study of larger, secondary metabolite biosynthesis gene clusters
71
What are MAGs (Metagenome-Assembled Genomes) and what do they provide?
MAGs are genomes assembled from metagenomic data. They link function and taxonomy and provide insights into unculturable organisms, but they typically have lower completeness scores than genomes of isolates.
72
What is the function of RNA Predator?
RNA Predator is a bioinformatics tool that scans prokaryotic genomes to predict sRNA-mRNA interactions based on minimizing the interaction energy and on the occurrence of a seed region.
73
What is coverage in the context of sequencing?
Coverage refers to the number of reads aligned at a given position of the genome or contig, that is, the number of times a particular base in a DNA sequence is read during the sequencing process.
74
What is reversible termination sequencing?
It is the sequencing chemistry used by Illumina, based on nucleotides whose 3'-OH group is blocked. The blocking group can be removed, allowing the next nucleotide to be incorporated in the following cycle of sequencing by synthesis.
75
What is paired-end sequencing and what does it enable?
Paired-end sequencing uses adapters at both the 5' and 3' ends of DNA fragments, allowing sequencing from both ends. It helps the assembly step by effectively increasing the read length.
76
What is guided assembly?
It is the organization of reads into contigs using a previously assembled genome as a reference
77
What is the significance of non-synonymous SNPs in genomic studies?
Non-synonymous SNPs change the chemical properties of the encoded amino acids, and are therefore likely to have a direct impact on protein function. Genes accumulating a higher number of non-synonymous SNPs are often the focus of study
78
Describe the experimental setup for studying the adaptive genomic responses of Candida glabrata strains isolated from different environments, including the bioinformatics steps involved.
This involves sequencing the genomes of C. glabrata isolates from wine musts and garden flowers and comparing them to the reference strain CBS138 (from a human GI tract).
◦ DNA Extraction and Sequencing: Extract DNA from the isolates and use a combination of Illumina (for high accuracy) and PacBio (for long reads).
◦ SNP Identification: Map the reads against the reference genome (CBS138) to identify single nucleotide polymorphisms (SNPs), paying attention to coverage, frequency, and balance. Focus on genes with a higher number of non-synonymous SNPs.
◦ Large-Scale Modifications: Assemble the reads into contigs and compare them to the reference strain contigs using BLAST to identify any large-scale genomic alterations.
79
Explain how transcriptomic analysis can help in understanding the bitertanol tolerance in Candida glabrata.
Conduct a transcriptomic analysis of the C. glabrata CBS138 strain by growing it in the presence and absence of bitertanol.
◦ Experimental Conditions: Grow the CBS138 cells with and without bitertanol.
◦ Transcriptomics Platform: Use RNA-seq to quantify gene expression.
◦ Gene Expression Analysis: Compare the coverage of reads for each gene in the two conditions. Focus on up-regulated genes in the presence of bitertanol, as these may be essential for tolerance. This can lead to the hypothesis that environmental isolates are more tolerant to bitertanol because they naturally overexpress these genes compared to the CBS138 strain.
80
How can you identify all the genes that are determinants of resistance to chemokines using a disruptome analysis?
A full collection of chicken cell lines, in which each gene is individually deleted, must be available.
◦ Compare growth of all mutant cell lines to the wild-type cell line under control conditions and with chemokine exposure.
◦ Look for cell lines that do not grow under chemokine stress when compared to the wild-type cell line.
◦ The gene deleted in that cell line is a determinant of chemokine resistance.
◦ This experiment is almost impossible in practice because of the difficulty of manipulating animal cells.
81
Explain how to use a combination of Illumina and PacBio sequencing to improve a genome assembly.
Illumina provides high vertical coverage for accurate base calling, but short reads can lead to assembly gaps in repetitive regions. PacBio provides long reads that span repetitive regions, improving horizontal coverage and facilitating contig assembly. Combining these techniques provides both accuracy and completeness in genome assembly. Two rounds of Illumina may be performed to increase vertical coverage
82
What are the key steps in a typical RNA-seq experiment, and how is gene expression quantified?
RNA-seq involves the following key steps:
◦ RNA Extraction: Isolate total RNA, followed by enrichment for mRNA to exclude rRNAs.
◦ cDNA Conversion: Convert the mRNA into cDNA.
◦ Sequencing: Perform next-generation sequencing to generate reads.
◦ Read Mapping: Align the reads to a reference genome or transcriptome.
◦ Quantification: Quantify gene expression by comparing the coverage of reads for each gene across different experimental conditions. Increased coverage corresponds to greater mRNA abundance.
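The quantification step can be sketched as a per-gene comparison between two conditions. This illustration assumes one sample per condition, scales counts to counts per million (CPM) to correct for sequencing depth, and adds a pseudocount to avoid dividing by zero. Real analyses use replicates and statistical testing; gene names and counts are hypothetical.

```python
import math

def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return log2(treated / control) per gene after scaling each sample to CPM."""
    control_total = sum(control.values())
    treated_total = sum(treated.values())
    changes = {}
    for gene in control:
        control_cpm = control[gene] / control_total * 1e6
        treated_cpm = treated.get(gene, 0) / treated_total * 1e6
        changes[gene] = math.log2((treated_cpm + pseudocount) / (control_cpm + pseudocount))
    return changes

control = {"gene1": 100, "gene2": 100}
treated = {"gene1": 300, "gene2": 100}
for gene, change in log2_fold_changes(control, treated).items():
    print(gene, round(change, 2))
```

Positive values indicate genes with higher relative abundance in the treated sample. Note that gene2 appears down-regulated even though its raw count is unchanged, because CPM is relative to the sample total.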
83
Describe how you would identify the molecular mechanisms underlying different phenotypes towards inhibitors of an evolved Rhodotorula toruloides strain compared to its non-evolved parent strain.
This involves both genomic and transcriptomic analyses.
◦ Genomic Analysis: Sequence the genomes of both strains using Illumina sequencing. Map the reads of the evolved strain against the non-evolved strain to detect SNPs, paying attention to coverage, frequency, and balance. Focus on genes accumulating a higher number of non-synonymous SNPs.
◦ Transcriptomic Analysis: Grow both strains in the presence and absence of inhibitors. Perform RNA-seq to quantify gene expression. Focus on genes that are over-expressed in the evolved strain compared to the non-evolved strain in the presence of the inhibitors.
84
How can you use a disruptome to identify genes involved in a specific phenotype?
Compare the growth of wild-type cells with single-gene deletion mutant cells in control and test conditions. Identify genes required for a specific phenotype by analyzing the survival rates of disruptome and wild-type cells when exposed to test conditions. In a competition assay, the strains are grown together and the most fit strains will be more abundant. Each deletion mutant strain is uniquely tagged, allowing for their simultaneous screening and quantification.
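The competition assay readout can be sketched as a comparison of barcode abundances. This is a simplified illustration: it assumes one barcode count table per condition, compares relative frequencies with a pseudocount, and flags strains whose log2 ratio falls below an arbitrary threshold. The strain names, counts, threshold and function name are all hypothetical.

```python
import math

def depleted_mutants(control_counts, test_counts, threshold=-1.0, pseudocount=1):
    """Return mutants whose barcode frequency drops in the test condition."""
    control_total = sum(control_counts.values())
    test_total = sum(test_counts.values())
    depleted = {}
    for strain, control in control_counts.items():
        test = test_counts.get(strain, 0)
        control_freq = (control + pseudocount) / control_total
        test_freq = (test + pseudocount) / test_total
        score = math.log2(test_freq / control_freq)
        if score <= threshold:
            depleted[strain] = score  # the deleted gene is a candidate determinant
    return depleted

control = {"delta_geneA": 1000, "delta_geneB": 1000, "delta_geneC": 1000}
test = {"delta_geneA": 1000, "delta_geneB": 50, "delta_geneC": 950}
print(depleted_mutants(control, test))
```

Only the mutant lacking geneB drops out of the pool under the test condition, so geneB is the candidate required for the phenotype.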
85
What is the difference between a hypothesis-generating approach and a hypothesis-testing approach in omics studies?
◦ Hypothesis-Generating Approach: This approach is used in omics studies, such as genomics, transcriptomics, proteomics, and metabolomics, where the goal is to explore changes in the phenotype without any prior assumptions. Hypotheses are formulated after seeing the results.
◦ Hypothesis-Testing Approach: Used in targeted metabolomic analysis, this approach relies on a pre-existing hypothesis, and experiments are designed to validate or refute it.
86
Explain the principles behind the two-hybrid system for studying protein-protein interactions. What are the limitations of this method?
◦ Principle: Two proteins of interest are fused with either a DNA-binding domain (BD) or activation domain (AD) of a transcription factor. If the two proteins interact, the BD and AD are brought together, activating a reporter gene. The reporter gene is often lacZ, which produces a blue color when activated.
◦ Limitations: False positives can occur if proteins interact indirectly through a third protein. False negatives can occur if the domains are sterically hindered or cannot reach the nucleus. It is an in vivo technique and is therefore dependent on environmental conditions. The technique is limited to proteins that can move to the nucleus, and cannot be used for membrane proteins.
87
A researcher is studying Lactobacillus fermentum and wants to understand its global response to a gastric environment. They choose to perform transcriptomics and proteomics analyses. Describe the best experimental approach for both transcriptomics and proteomics, including how to quantify transcripts and proteins in each case.
◦ Transcriptomics:
  ▪ Grow L. fermentum in media mimicking the gastric environment and in a control environment.
  ▪ Extract RNA and deplete rRNA to enrich for mRNA (oligo(dT) enrichment targets poly(A) tails and is generally not applicable to bacterial mRNA).
  ▪ Convert mRNA to cDNA and perform RNA-seq.
  ▪ Quantify gene expression by comparing the coverage of reads for each gene in the two conditions.
◦ Proteomics:
  ▪ Grow L. fermentum in media mimicking the gastric environment and in a control environment.
  ▪ Lyse cells and extract proteins using detergents and reducing agents.
  ▪ For a gel-free approach, use methods like SILAC or iTRAQ to label the proteins.
  ▪ Use tandem mass spectrometry (MS/MS). For SILAC, protein quantification is achieved in the first step of MS/MS. For iTRAQ, it is achieved in the second step of MS/MS.
88
How would you identify all the genes required for the persistence of L. fermentum in the GI tract of a mouse model, including the necessary biological material and experimental approach?
Biological Material: Obtain a disruptome library of L. fermentum mutants, where each gene is individually deleted.
Experimental Approach:
▪ Inoculate mice with both the wild-type and mutant strains of L. fermentum.
▪ Compare the abundance of each mutant in the GI tract of the mouse model at different time points.
▪ Identify genes essential for GI persistence by looking for mutants that are significantly reduced in abundance over time.
▪ Use a barcoded strategy to allow for simultaneous screening and quantification of the mutants.
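The barcode comparison can be illustrated with a small sketch. The mutant names, barcode counts, and the log2 threshold of -2 are all hypothetical assumptions chosen for the example; a real screen would include replicates and statistical testing.

```python
import math

def relative_abundance(counts):
    """Convert barcode read counts to fractions of the pool."""
    total = sum(counts.values())
    return {barcode: n / total for barcode, n in counts.items()}

def depleted_mutants(inoculum, recovered, threshold=-2.0, pseudo=1e-6):
    """Return mutants whose log2 abundance change is at or below threshold.

    `threshold` and `pseudo` are illustrative values, not standard settings.
    """
    before = relative_abundance(inoculum)
    after = relative_abundance(recovered)
    hits = {}
    for barcode in inoculum:
        change = math.log2((after.get(barcode, 0.0) + pseudo)
                           / (before[barcode] + pseudo))
        if change <= threshold:
            hits[barcode] = change
    return hits

# Hypothetical barcode counts in the inoculum and in GI samples at day 7.
inoculum = {"mutA": 1000, "mutB": 1000, "mutC": 1000, "mutD": 1000}
day7 = {"mutA": 1500, "mutB": 50, "mutC": 1450, "mutD": 1000}

hits = depleted_mutants(inoculum, day7)
print(hits)  # mutants that dropped out: candidates for persistence genes
```

Mutants flagged here carry deletions in genes that are candidates for being required for persistence.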
89
A researcher is studying an evolved Rhodotorula toruloides strain (MM15) with increased tolerance to inhibitors compared to the parental strain (IST536). Describe an experimental methodology to identify the molecular mechanisms underlying these different phenotypes, including how to identify the most relevant genes involved.
Genomic Analysis:
▪ Sequence the genomes of both strains using Illumina.
▪ Map reads of the evolved strain against the reference genome of the parental strain.
▪ Identify SNPs focusing on coverage, frequency, and balance.
▪ Focus on genes with a high number of non-synonymous SNPs or genes known to be involved in inhibitor tolerance.
Transcriptomic Analysis:
▪ Grow both strains with and without inhibitors.
▪ Perform RNA-seq to quantify gene expression.
▪ Compare gene expression levels between the strains in both conditions.
▪ Identify genes that are over-expressed in the evolved strain when exposed to the inhibitors.
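The SNP filtering step (coverage, frequency, balance) might look like the sketch below. It assumes that "balance" means support for the variant on both read strands, which the card does not spell out; the `Variant` record, the positions, and all thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    position: int
    depth: int        # total reads covering the position (coverage)
    alt_forward: int  # variant-supporting reads on the forward strand
    alt_reverse: int  # variant-supporting reads on the reverse strand

def passes_filters(v, min_depth=20, min_freq=0.9, min_balance=0.25):
    """Keep a SNP only if coverage, frequency, and strand balance are adequate.

    All three thresholds are illustrative assumptions.
    """
    alt = v.alt_forward + v.alt_reverse
    if v.depth < min_depth or alt == 0:
        return False
    frequency = alt / v.depth
    balance = min(v.alt_forward, v.alt_reverse) / alt
    return frequency >= min_freq and balance >= min_balance

# Hypothetical candidate SNPs from mapping MM15 reads onto IST536.
candidates = [
    Variant(101, 50, 24, 25),  # well supported on both strands
    Variant(202, 8, 4, 4),     # coverage too low
    Variant(303, 60, 58, 0),   # seen on one strand only
    Variant(404, 40, 6, 6),    # low frequency
]

kept = [v.position for v in candidates if passes_filters(v)]
print(kept)
```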
90
You are planning to sequence the genome of a new chicken strain to facilitate mapping of RNA-seq reads. Which sequencing techniques would you use and why? Describe the amplification and sequencing steps for each chosen technique.
Techniques:
▪ Illumina: For high accuracy and high vertical coverage.
▪ PacBio: For long reads, which facilitate assembly into a low number of contigs and give better horizontal coverage.
Amplification and Sequencing:
▪ Illumina:
* Amplification: Bridge PCR on the surface of a glass flow cell.
* Sequencing: Sequencing by synthesis with reversible terminators.
▪ PacBio:
* Amplification: No PCR amplification required.
* Sequencing: Single Molecule, Real-Time (SMRT) sequencing, using a DNA polymerase immobilized in nanowells (zero-mode waveguides).
91
Explain the concept of "coverage" in the context of genome sequencing and why both horizontal and vertical coverage are important.
Coverage: Refers to the number of times a specific nucleotide in a genome is sequenced.
Horizontal Coverage: Represents the fraction of the entire genome that is covered by sequencing reads. This is improved by long reads like those produced by PacBio. Higher horizontal coverage means more of the genome is sequenced, reducing gaps in the assembly.
Vertical Coverage: Represents the number of reads aligned to a specific position in the genome. This is improved by short reads like those produced by Illumina. Higher vertical coverage increases the accuracy of the sequence data and the reliability of SNP identification.
Importance:
▪ High vertical coverage is needed when comparing different strains to identify SNPs.
▪ High horizontal coverage is needed for assembling a complete genome because long reads can span repetitive regions.
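Both measures can be computed from mapped read positions. The sketch below is a toy example on a hypothetical 10-base genome, with reads given as 0-based, half-open (start, end) intervals; real tools work from alignment files instead.

```python
def coverage_stats(genome_length, reads):
    """Return (horizontal coverage, mean vertical coverage, per-base depth).

    `reads` is a list of (start, end) intervals, 0-based and half-open.
    """
    depth = [0] * genome_length
    for start, end in reads:
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    horizontal = covered / genome_length   # fraction of positions with >= 1 read
    vertical = sum(depth) / genome_length  # average number of reads per position
    return horizontal, vertical, depth

# Hypothetical reads mapped onto a 10-base genome.
reads = [(0, 4), (2, 6), (2, 6)]
horizontal, vertical, depth = coverage_stats(10, reads)
print(horizontal, vertical, depth)
```

Here 6 of 10 positions are covered, and positions 2 and 3 are each covered by three reads, which shows how the two measures can diverge.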
92
A researcher wants to study protein-protein interactions with the chemokine LOC100857191. How would they approach this, and what are the limitations of a proteome chip in this case?
Approach:
▪ Use a proteome chip, immobilizing the full proteome of leukocyte cells on the chip.
▪ Label the chemokine LOC100857191 with a fluorophore.
▪ Incubate the labelled chemokine with the proteome chip.
▪ Look for interactions by detecting a fluorescent signal.
Limitations of the Proteome Chip:
▪ The protein of interest, in this case the chemokine, is labelled, not the proteome.
▪ Requires expression and purification of all proteins to be immobilized on the chip.
▪ Can produce false positives because some proteins that bind on the chip may not interact in vivo.
▪ Can miss interactions that occur only in the specific cells and conditions of interest.
93
You have performed a transcriptomic analysis and observed that in a specific region of a gene (region B) there are no reads mapped. Explain what this indicates and why assembly of the reads is needed.
The absence of reads mapped in region B of the gene indicates this region is likely an intron. Introns are removed during RNA processing and are not present in mature mRNA transcripts, which are the target of RNA-seq.
Assembly of the reads is crucial to understand which exons are present in the transcripts of the gene and to identify splicing variations such as mis-splicing of introns/exons, which can be associated with diseases like Duchenne muscular dystrophy.
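Finding a region like region B amounts to scanning the per-base read depth along the gene for stretches with no reads. The sketch below uses a hypothetical depth profile and a hypothetical minimum length; it only flags candidate introns and does not replace transcript assembly.

```python
def zero_coverage_regions(depth, min_length=1):
    """Return (start, end) half-open intervals where no reads are mapped."""
    regions = []
    start = None
    for pos, d in enumerate(depth):
        if d == 0 and start is None:
            start = pos                      # a gap begins
        elif d > 0 and start is not None:
            if pos - start >= min_length:    # gap ends; keep it if long enough
                regions.append((start, pos))
            start = None
    # Handle a gap that runs to the end of the gene.
    if start is not None and len(depth) - start >= min_length:
        regions.append((start, len(depth)))
    return regions

# Hypothetical per-base depth along a gene: covered, uncovered, covered.
depth = [5, 6, 7, 0, 0, 0, 0, 8, 9, 4]
gaps = zero_coverage_regions(depth, min_length=3)
print(gaps)
```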
94
You are analyzing the metabolome of cells exposed to a toxic compound and need to choose between metabolomic footprinting and metabolomic fingerprinting. Which would you prefer, and why?
Metabolomic Fingerprinting: This refers to the analysis of the endometabolome (intracellular metabolites). It's vital for assessing specific metabolic features of cells, as it provides information about the cell's internal metabolic state. It should be chosen when analysing biopsy samples.
Metabolomic Footprinting: This refers to the analysis of the exometabolome (extracellular metabolites). It's suitable for understanding the impact of extracellular factors (such as chemokines) on the extracellular environment and cell-to-cell communication. It would not be easy to obtain a realistic extracellular environment when evaluating biopsy results.
Preference: In this case, metabolomic fingerprinting would be preferred because it is more useful for assessing the specific metabolic features of cells exposed to a toxic compound. However, using both approaches may be valuable for a broader understanding.