Lecture 5 - Computational analysis Flashcards

(41 cards)

1
Q

It is now simple to measure the expression levels of thousands of genes

A

simultaneously

2
Q

Methods such as RNA-seq allow for measurement of

A

transcriptome-wide expression levels without a reference genome

3
Q

RNA-seq is useful for

A

high-throughput sequencing of RNA

4
Q

RNA-seq allows for quantification of

A

gene expression and differential expression analyses

5
Q

RNA-seq allows for characterization of

A

alternative splicing

6
Q

de novo means

A

from the beginning

7
Q

de novo transcriptome assembly allows for

A

quantification and exploration of boutique organisms (no genome sequence necessary)

8
Q

RNA-seq steps

A

1. Extraction of mRNA
2. PCR amplification
3. Sequencing (single or paired end)

9
Q

Poly A selection is a method of

A

isolating poly(A)+ transcripts, usually using oligo-dT affinity

10
Q

Ribodepletion depletes

A

ribosomal RNAs using sequence-specific biotin-labeled probes

11
Q

Reads

A

the sequenced portion of cDNA fragments

12
Q

Coverage

A

the average number of times each base is sequenced: (read length × number of reads) / haploid genome length
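As a minimal sketch, the usual coverage estimate multiplies read length by number of reads and divides by haploid genome length. The function name and the example numbers below are illustrative assumptions, not values from the lecture.

```python
def coverage(read_length: int, num_reads: int, genome_length: int) -> float:
    """Estimated average coverage C = (L * N) / G."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return read_length * num_reads / genome_length


# Made-up example: 100 bp reads, 30 million reads, 3 Gb haploid genome
print(coverage(100, 30_000_000, 3_000_000_000))  # 1.0, i.e. about 1x coverage
```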

13
Q

Single-end

A

cDNA fragments are sequenced from only one end (1x100)

14
Q

Paired-end

A

cDNA fragments are sequenced from both ends (2x100)

15
Q

Strand-specific

A

You know whether the read originated from the + or - strand

16
Q

Counts =

A

(Xi) the number of reads that align to a particular feature i (gene, isoform, miRNA, etc.)

17
Q

Library size =

A

(N) number of reads sequenced

18
Q

FPKM =

A

Fragments per kilobase of exon per million mapped reads

19
Q

CPM =

A

Counts per million mapped reads
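To make the CPM and FPKM definitions concrete, here is a small sketch that computes both from a feature's count. The function names and example numbers are assumptions for illustration. Note that both take the number of mapped reads (or fragments) as the library size, as the definitions state.

```python
def cpm(count: int, mapped_reads: int) -> float:
    """Counts per million mapped reads for one feature."""
    return count * 1e6 / mapped_reads


def fpkm(fragments: int, exon_length_bp: int, mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million mapped fragments.

    Dividing by (length in kb) and by (library size in millions)
    is the same as multiplying by 1e9 / (length_bp * library_size).
    """
    return fragments * 1e9 / (exon_length_bp * mapped_fragments)


# Made-up example: 500 fragments on a 2,000 bp gene, 20 million mapped fragments
print(cpm(500, 20_000_000))         # 25.0
print(fpkm(500, 2000, 20_000_000))  # 12.5
```

CPM corrects only for sequencing depth, while FPKM also corrects for feature length, so a longer gene with the same count gets a lower FPKM.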

20
Q

FDR =

A

False discovery rate (the expected proportion of false positives, i.e. Type I errors, among the results called significant)
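The card does not say how the FDR is controlled. One widely used approach is the Benjamini-Hochberg procedure, sketched below as an assumption rather than as the lecture's method. The function name is hypothetical and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four genes
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted value falls below a chosen threshold (for example 0.05) would be called significant at that FDR.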

21
Q

FASTA files are

A

text files with sequences (amino acids or nucleotides)
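A FASTA record is a header line starting with ">" followed by one or more sequence lines. The sketch below reads such text into a dictionary. The function name and the example records are made up for illustration.

```python
def parse_fasta(text: str) -> dict:
    """Map each FASTA header (without '>') to its joined sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


example = ">seq1 demo\nACGT\nTTGA\n>seq2\nMKV\n"
print(parse_fasta(example))  # {'seq1 demo': 'ACGTTTGA', 'seq2': 'MKV'}
```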

22
Q

FASTQ files are

A

text files containing header, sequence, and quality information
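Each FASTQ record spans four lines: an "@" header, the sequence, a "+" separator, and a quality string with one character per base. The sketch below assumes the common Phred+33 quality encoding. The function name and the example read are hypothetical.

```python
def parse_fastq(text: str) -> list:
    """Return (header, sequence, quality_scores) tuples from FASTQ text.

    Assumes four lines per record and Phred+33 encoded qualities.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have four lines each")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        scores = [ord(c) - 33 for c in qual]  # Phred+33 decoding
        records.append((header[1:], seq, scores))
    return records


print(parse_fastq("@read1\nACGT\n+\nIIII\n"))
# [('read1', 'ACGT', [40, 40, 40, 40])]
```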

23
Q

A SAM file is a

A

tab-delimited text file that contains sequence alignment information
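A SAM alignment line has eleven mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL). The sketch below pulls out a few of them and decodes two FLAG bits, including the one that records the strand of the alignment. The function name and the example line are made up.

```python
def parse_sam_line(line: str) -> dict:
    """Parse selected mandatory fields from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    flag = int(fields[1])
    return {
        "qname": fields[0],      # read name
        "flag": flag,            # bitwise flag
        "rname": fields[2],      # reference sequence name
        "pos": int(fields[3]),   # 1-based leftmost mapping position
        "mapq": int(fields[4]),  # mapping quality
        "cigar": fields[5],
        "seq": fields[9],
        "qual": fields[10],
        "is_unmapped": bool(flag & 0x4),   # bit 0x4: read unmapped
        "is_reverse": bool(flag & 0x10),   # bit 0x10: aligned to reverse strand
    }


example = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
record = parse_sam_line(example)
print(record["rname"], record["pos"], record["is_reverse"])  # chr1 100 True
```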

24
Q

BAM files are

A

the compressed binary version of SAM files (smaller, and can be indexed for fast lookup)

25
Q

Compared to single-end RNA-seq, paired-end gives

A

better alignment

26
Q

Paired-end RNA-seq is essential for

A

splicing analyses and de novo assemblies

27
Q

Biological replicates are ______ while technical replicates are ______

A

necessary; not necessary

28
Q

Longer reads =

A

better alignments

29
Q

Implicit internal standards =

A

housekeeping genes

30
Q

Explicit external standards =

A

spike-in RNA

31
Q

Technical replicates control for

A

variation in your procedure

32
Q

Biological replicates control for

A

variation such as growth or environmental effects

33
Q

Most gene expression experiments assume

A

1. Most genes don't change
2. Only a few genes have significant changes in expression

34
Q

RNA and protein expression profiles _______ correlate well

A

do not always

35
Q

Sequence alignment is a way of

A

arranging sequences of DNA, RNA, or protein to identify regions of similarity

36
Q

Two types of sequence alignment

A

1. local
2. global

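The difference between global and local alignment can be sketched with one dynamic-programming scorer. The card names no algorithm, so the approach here (Needleman-Wunsch style for global, Smith-Waterman style for local) and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best alignment score of a and b (global by default, local if requested)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for gaps at the start of either sequence
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    # Global: score of aligning both sequences end to end; local: best cell
    return best if local else score[-1][-1]


print(alignment_score("TTACGTT", "GGACGGG"))              # -1 (global)
print(alignment_score("TTACGTT", "GGACGGG", local=True))  # 3 (shared "ACG")
```

Global alignment is penalized for the mismatched flanks, while local alignment reports only the best-matching region.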
37
Q

NGS read alignment allows us to

A

determine where sequence fragments (reads) came from

38
Q

Differential expression analysis is

A

the assessment of differences in read counts of genes between two or more experimental conditions

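As a rough illustration of comparing read counts between two conditions, the sketch below computes a log2 fold change from depth-normalized counts (CPM) averaged over replicates. This gives an effect size only and is not a statistical test. The function name, the pseudocount, and the numbers are assumptions.

```python
import math


def log2_fold_change(counts_a, libsizes_a, counts_b, libsizes_b,
                     pseudocount: float = 1.0) -> float:
    """log2 of (mean CPM in condition B) / (mean CPM in condition A) for one gene."""

    def mean_cpm(counts, libsizes):
        # Normalize each replicate by its own library size, then average
        return sum(c * 1e6 / n for c, n in zip(counts, libsizes)) / len(counts)

    a = mean_cpm(counts_a, libsizes_a) + pseudocount  # pseudocount avoids log of zero
    b = mean_cpm(counts_b, libsizes_b) + pseudocount
    return math.log2(b / a)


# Made-up counts for one gene, two replicates per condition, 1 million reads each
lfc = log2_fold_change([99, 99], [1_000_000, 1_000_000],
                       [199, 199], [1_000_000, 1_000_000])
print(lfc)  # about 1.0, i.e. roughly twofold higher in condition B
```

A full analysis would also model the variation between biological replicates before calling a gene differentially expressed.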
39
Q

The Gene Ontology (GO) Consortium seeks to

A

provide consistent descriptions of gene products across databases

40
Q

The GO comprises 3 structured ontologies that describe gene products in terms of associated

A

1. Biological processes
2. Cellular components
3. Molecular functions

41
Q

Most commonly used databases for data deposition

A

1. Gene Expression Omnibus (GEO)
2. Sequence Read Archive (SRA)
3. dbGaP