4 RNA sequencing Flashcards
(29 cards)
What is the central dogma
DNA to RNA to protein
Information flows in one direction, toward making proteins
What are transcriptomics and proteomics
How is expression of RNA and protein regulated
Transcriptomics: capturing all the RNA species from a population of cells at once and measuring the abundance of each RNA species present in the cells.
Regulated:
- transcription of the gene into mRNA
- mRNA stability (e.g. tags that cause degradation, or chemical modifications)
Proteomics: capturing all the protein species from a population of cells at once and measuring the abundance of each protein species present in the cells.
Regulated:
- translation of the mRNA into protein
- protein stability and post-translational modifications (PTMs)
What are all the types of RNA and why is this important
mRNA
tRNA
siRNA
miRNA
Important because you need to purify RNA with a method specific to the type of RNA you want
What is gene expression
Transcriptome
Proteome
Gene expression:
- the process by which information from a gene is used in synthesis of a functional gene product
Transcriptome:
- the set of all RNA molecules in one cell or a population of cells (mRNA, tRNA, rRNA, ncRNA)
Proteome:
- the set of all protein molecules in one cell or a population of cells
Explain differential expression using microarrays and transcriptome sequencing
Trying to sequence RNA
Have treatment and control conditions; extract RNA from both sets of cells
Microarray:
- add a fluorophore to the ends of the extracted RNA
- it's a comparative, ratiometric analysis because you're analyzing one type of probe against another to quantify whether something is up- or down-regulated in one cell type vs the other
Transcriptome sequencing:
- convert the extracted RNA fragments to cDNA
- genetically barcode the cDNA fragments and fix them to a flow cell
- then do next-generation DNA sequencing
- this is quantitative because it lets you do discrete quantification of RNA transcripts
- gives actual number values (read counts), whereas the microarray only gives relative (ratiometric) values
How do you prepare RNA for DNA sequencing
Turn the RNA into cDNA by reverse transcription, as is done for RT-PCR and RNA-seq
How do you make cDNA for qRT-PCR
- Isolate the RNA
- Random priming: short primers of 6 random nucleotides, called hexamers, anneal at random positions along the RNA
- First strand synthesis: murine leukemia virus reverse transcriptase extends from the hexamer primer (using dNTPs), reverse transcribing the RNA to make the first strand of cDNA
- Second strand synthesis: RNase (RNase H) nicks and partly degrades the initial RNA strand, and the remaining RNA pieces act as primers on the single-stranded cDNA. E. coli DNA Pol I then elongates and fills in to make dsDNA. T4 DNA ligase seals the nicks
- Double stranded cDNA library made
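A minimal sketch of what the two synthesis steps do at the sequence level, assuming a toy RNA and ignoring the enzymes, primers and nicks entirely. The function names and the example sequence are hypothetical, chosen only for illustration.

```python
# Base-pairing rules used by reverse transcriptase (RNA template -> DNA).
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Base-pairing rules used by DNA polymerase (DNA template -> DNA).
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """First-strand cDNA, written 5'->3', for an RNA written 5'->3'."""
    # The new strand is antiparallel to its template, so it is the
    # reverse complement of the RNA.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


def second_strand_cdna(first_strand: str) -> str:
    """Second-strand cDNA, written 5'->3', copied from the first strand."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand.upper()))


toy_rna = "AUGGCCUAA"  # hypothetical transcript fragment
first = first_strand_cdna(toy_rna)
second = second_strand_cdna(first)
print(first)   # TTAGGCCAT
print(second)  # ATGGCCTAA (the RNA sequence with U replaced by T)
```

The second strand carries the same sequence as the original RNA, which is why sequencing the cDNA tells you about the transcript.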
Is RT-PCR useful for RNA-seq
No, because it has no way to remove rRNA from the sample that the cDNA is made from
What does purified bacterial total cellular RNA look like on a gel
What does this mean
A smear, but the most intense bands are 23S rRNA, 16S rRNA and 5S rRNA
This means the most abundant RNA in bacterial cells is rRNA
The same is true for eukaryotes
How does cDNA synthesis work for RNA-seq
Same as for qRT-PCR, but before hexamer priming you do rRNA depletion
This is because if 95% of the sample is rRNA, most of the cDNA made comes from rRNA, which you don't need, and you're just wasting sequencing power on rRNA
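The wasted sequencing power can be put into numbers with a quick calculation. This is only a sketch: the 95% figure is from the card, while the run size of 10 million reads and the 10% residual rRNA after depletion are assumed values for illustration.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Reads left for non-rRNA transcripts, assuming reads are drawn
    in proportion to each RNA class's share of the library."""
    return round(total_reads * (1.0 - rrna_fraction))


total = 10_000_000  # assumed run size, not from the source

without_depletion = informative_reads(total, 0.95)  # 95% rRNA (from the card)
with_depletion = informative_reads(total, 0.10)     # assumed residual rRNA

print(without_depletion)  # 500000
print(with_depletion)     # 9000000
```

Under these assumptions, depletion gives 18 times more reads for the transcripts you actually care about from the same run.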
How is RNA capture and enrichment done
To capture specific RNA from the sample (mainly to remove rRNA and keep mRNA)
TEX (terminator exonuclease):
- degrades RNA with a 5' monophosphate, which is usually rRNA, so it's a way to remove rRNA
RNA immunoprecipitation sequencing (RIP-seq):
- capturing RNA that interacts with a specific protein
- so a tag is used to bind the protein and pull down the target RNA along with it
rRNA capture:
- magnetic beads with specific probe sequences bind the RNA sequences you want to capture
- the probes can bind rRNA, so you can capture it with a magnet and separate it from mRNA
Selective polyadenylation of mRNA:
- the E. coli poly(A) polymerase enzyme selectively polyadenylates mRNA
- then that mRNA can be captured with oligo(dT) probes or reverse transcribed using oligo(dT) primers
What is not so random priming
Use the hexamers to prime the RNA, BUT
predict which of the hexamers in the random mix actually bind rRNA, and remove them:
- so the primer mix is designed without the hexamers that bind rRNA, and rRNA doesn't get turned into cDNA
- depletes 50-80% of the rRNA in the sample
Highly biased
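The filtering idea can be sketched in a few lines. This assumes perfect-match binding only and uses a made-up 11-nt "rRNA"; real designs work from full rRNA sequences and may tolerate mismatches. All names here are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def rrna_binding_primers(rrna: str, k: int = 6) -> set[str]:
    """DNA primers of length k that anneal perfectly somewhere on the rRNA."""
    template = rrna.upper().replace("U", "T")
    # A primer anneals to a site when it is the reverse complement of that site.
    return {
        reverse_complement(template[i:i + k])
        for i in range(len(template) - k + 1)
    }


def not_so_random_primers(rrna: str, k: int = 6) -> set[str]:
    """All 4**k primers minus those predicted to prime on rRNA."""
    every_primer = {"".join(p) for p in product("ACGT", repeat=k)}
    return every_primer - rrna_binding_primers(rrna, k)


TOY_RRNA = "AUGGCCUAAGG"  # made-up sequence, far shorter than any real rRNA
kept = not_so_random_primers(TOY_RRNA)
print(len(kept))  # 4090 of the 4096 possible hexamers remain
```

With real rRNA sequences many more hexamers are removed, which is where the bias comes from: any mRNA site that shares a hexamer with rRNA also loses its primer.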
How are RNA-seq libraries made
Same as DNA-seq libraries, but using cDNA made from RNA instead of DNA
How do we interpret data from RNA seq
U mapping
Cluster analysis
What is the workflow of bioinformatics and statistical analysis of RNA-seq
Ex. Getting sequence reads from an Illumina sequencer
- Base calling: the computer processes the raw image data and makes base calls based on the fluorescence patterns
- Base calling gives you short-read sequences
- These short reads are mapped back to a reference genome
- An algorithm bins the reads into specific genomic intervals, giving integer counts of how many reads align to each region of the genome
- This gives uniquely mapped reads, multiply mapped reads and unmapped reads (which you get rid of, e.g. instrument calibration reads and contaminants)
What is the simple concept of RNA-seq analysis
The algorithm maps reads to positions in the genome and then counts the number of reads that map to specific loci
This can be done manually for individual loci, but not for entire genomes (too much data)
Then you can estimate changes in transcript abundance mathematically
Computer algorithms make this feasible at genome scale
Explain how to analyze the results from RNA seq analysis
In a genome you have genomic intervals: open reading frames (ORFs) and intergenic regions
Each read is assigned to the plus strand or the minus strand, shown in a different colour for each strand
A read that overlaps an ORF counts as 1 count for that ORF on its strand: e.g. 4 plus-strand reads and 1 minus-strand read for the ORF
If a read overlaps more than one region, it's assigned to the interval that its 5' end overlaps
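The counting rule can be sketched as follows, assuming 1-based inclusive coordinates and that the 5' end of a minus-strand read is its highest coordinate. The intervals and reads are made up; they are chosen so that orfA ends up with the 4 plus and 1 minus counts from the card.

```python
from collections import Counter

# Hypothetical genomic intervals: (name, start, end), 1-based inclusive.
INTERVALS = [("orfA", 1, 100), ("igAB", 101, 150), ("orfB", 151, 300)]


def five_prime_position(start: int, end: int, strand: str) -> int:
    """Genomic coordinate of the read's 5' end."""
    return start if strand == "+" else end


def assign_interval(read, intervals):
    """Name of the interval containing the read's 5' end, or None."""
    start, end, strand = read
    position = five_prime_position(start, end, strand)
    for name, low, high in intervals:
        if low <= position <= high:
            return name
    return None


def count_reads(reads, intervals) -> Counter:
    """Integer counts keyed by (interval name, strand)."""
    counts = Counter()
    for read in reads:
        name = assign_interval(read, intervals)
        if name is not None:
            counts[(name, read[2])] += 1
    return counts


# Made-up mapped reads: (start, end, strand).
reads = [
    (10, 60, "+"), (20, 70, "+"), (40, 90, "+"),
    (80, 130, "+"),   # spans orfA and igAB; 5' end is in orfA
    (30, 80, "-"),    # minus strand; 5' end at 80, in orfA
    (120, 170, "+"),  # 5' end in igAB
    (140, 190, "-"),  # minus strand; 5' end at 190, in orfB
]
counts = count_reads(reads, INTERVALS)
print(counts[("orfA", "+")], counts[("orfA", "-")])  # 4 1
```

Real pipelines use dedicated read-counting tools for this step; the sketch only shows the assignment logic.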
Explain the differential expression analysis
Explain why changes would be seen
Ex. Comparing genes in condition A and condition B:
orfB: both cond A and B have 3 plus-strand reads, 3/3 = 1, no change in expression
orfA: sense: 4/4 = 1, no change; antisense: 8/1 = 8, 8-fold change
igBC: cond A has 2 and B has 16, 16/2 = 8, 8-fold change in expression
Infer:
- locus 1: there are sense and antisense transcripts in one ORF; this could block gene expression in this region because the sense and antisense RNA are complementary and can pair
- locus 2, cond B: if the reads in the intergenic region don't overlap the ORF, you can assume the intergenic region mainly encodes non-coding RNA
What is normalization RPKM
Why is it useful
Reads per kilobase per million mapped reads
RPKM: quantifies gene expression by normalizing for genic (or intergenic) length, on a human-friendly scale
The key issue is that read counting is biased towards longer genes (longer genes are over-represented), so this normalizes for length
It also scales the read counts to numbers that are easier to use
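Written out, the definition is RPKM = reads x 10^9 / (interval length in bp x total mapped reads). A minimal sketch with made-up numbers, showing that a long and a short gene with the same underlying expression get the same RPKM even though their raw counts differ 6-fold:

```python
def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of interval per million mapped reads."""
    # Dividing by (length / 1e3) and by (total / 1e6) is the same as
    # multiplying by 1e9 and dividing by (length * total).
    return read_count * 1e9 / (length_bp * total_mapped_reads)


TOTAL_MAPPED = 2_000_000  # hypothetical library size

long_gene = rpkm(read_count=600, length_bp=3000, total_mapped_reads=TOTAL_MAPPED)
short_gene = rpkm(read_count=100, length_bp=500, total_mapped_reads=TOTAL_MAPPED)

print(long_gene, short_gene)  # 100.0 100.0
```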
What can RPKM be used for and how
Can be used to assess the reproducibility of RNA sequencing
It does a goodness-of-fit measurement between repeats of the same sequencing library run on different flow cells
This gives an R squared value (coefficient of determination) showing how closely the repeats agree
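A sketch of the reproducibility check, assuming a straight-line fit between two replicates, in which case R squared equals the squared Pearson correlation. The RPKM values are made up, and taking log10 with a pseudocount of 1 is an assumed convention (it keeps zero-RPKM genes defined), not something stated on the card.

```python
import math


def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination for a simple linear fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def log_scale(values: list[float]) -> list[float]:
    """log10(value + 1); the pseudocount of 1 is an assumption."""
    return [math.log10(v + 1.0) for v in values]


# Made-up RPKM values for the same four genes on two flow cells.
flow_cell_1 = [1.0, 10.0, 100.0, 1000.0]
flow_cell_2 = [1.2, 9.0, 110.0, 950.0]

r2 = r_squared(log_scale(flow_cell_1), log_scale(flow_cell_2))
print(f"R squared = {r2:.4f}")
```

A value close to 1 means the two runs give nearly the same expression estimates.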
How do you carry out a differential expression analysis using rna seq data
What are the key issues of analysis of gene expression
Stats using R:
- most common tools used are edgeR and DESeq
Analysis of gene expression requires:
- an accurate statistical model for variance (gene expression counts follow a negative binomial distribution):
- since the distribution isn't normal, you can't just do a bunch of t-tests to compare different genomic intervals and see what's differentially expressed
- a way to correct for false discovery
- a calculation that’s aware of count depth and normalized between replicates
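One common way to correct for false discovery is the Benjamini-Hochberg procedure. Treating it as the correction meant here is an assumption, and the p-values below are made up. A minimal sketch:

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value can be
    # capped by the ones above it (keeps the adjusted values monotonic).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.60]  # made-up p-values for five genes
adjusted = benjamini_hochberg(raw)

print(sum(p < 0.05 for p in raw))       # 4 genes pass without correction
print(sum(q < 0.05 for q in adjusted))  # 2 genes pass at FDR 0.05
```

Genes whose raw p-values were just under 0.05 drop out after correction, so controlling false positives this way makes false negatives more likely.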
What are false discovery/false negative in analysis of gene expression
The most common error with RNA-seq data is false negatives (type II error)
The output of the R program that finds changes in gene expression between regions of the DNA gives false negatives
A type II error is when it says genes aren't differentially regulated but they are (a type I error is a false positive)
Ex. Filtering for a false discovery rate of 0.05:
For genes that are part of the same operon and regulated by the same transcription factor, it says the first three are differentially expressed in cond B but the last two aren't
You know this is wrong because genes in the same operon should all behave the same way and all be differentially expressed
Also, degradation could lead to high variance, but the false-negative call hides that variance and says the genes are regulated the same
What is special about DESeq data analysis
It's depth aware and variance aware
Can make volcano plots:
- below a certain threshold of mean normalized counts (i.e. when the mean of normalized counts is low) the algorithm won't make a call about whether anything is differentially expressed
This is because of the depth of the counts: just due to random sampling, it's unlikely that what you're seeing at those lower values is actually true (depth aware)
The higher the depth (how much sample you have), the more confidence in saying it's differentially expressed
But if depth is too high, this increases the false discovery rate (type I error, false positives)
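Why low counts are untrustworthy can be illustrated with the negative binomial variance, variance = mean + dispersion x mean^2, which gives a relative noise (coefficient of variation) of sqrt(1/mean + dispersion). This is a sketch of the sampling argument only, not of DESeq's actual filtering, and the dispersion value is an assumption.

```python
import math


def count_cv(mean_count: float, dispersion: float = 0.0) -> float:
    """Coefficient of variation of a negative binomial count.

    With dispersion 0 this reduces to Poisson sampling noise, 1/sqrt(mean).
    """
    variance = mean_count + dispersion * mean_count ** 2
    return math.sqrt(variance) / mean_count


ASSUMED_DISPERSION = 0.01  # hypothetical biological variability

for mean_count in (4, 40, 400):
    sampling_only = count_cv(mean_count)
    with_dispersion = count_cv(mean_count, ASSUMED_DISPERSION)
    print(f"mean {mean_count}: CV {sampling_only:.2f} (sampling only), "
          f"{with_dispersion:.2f} (with dispersion)")
```

At a mean of 4 counts the sampling noise alone is 50% of the signal, so a 2-fold difference is hard to tell from chance; at 400 counts it falls to 5%.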
What else can you capture from RNA seq other than the read counts
The strandedness of the libraries