Lecture 15 Flashcards

(58 cards)

1
Q

genome

A

the complete set of genetic material present in a cell or organism

2
Q

genomics

A

the cloning and molecular characterization of entire genomes

3
Q

a haplotype

A

The specific set of SNPs and other genetic variants observed on a single chromosome or part of a chromosome

4
Q

linkage disequilibrium

A

The nonrandom association between genetic variants within a haplotype
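A minimal sketch of how this nonrandom association is commonly quantified, using the standard coefficient D (observed haplotype frequency minus the frequency expected if alleles were independent) and r². The haplotype counts and the function name are hypothetical, chosen only for illustration.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: frequencies of allele A and allele B in the same sample
    """
    # D is zero when the two alleles are associated at random
    d = p_ab - p_a * p_b
    # r^2 rescales D^2 by the allele frequencies so it lies between 0 and 1
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical haplotype counts observed on 100 chromosomes
counts = {"AB": 40, "Ab": 10, "aB": 10, "ab": 40}
total = sum(counts.values())
p_ab = counts["AB"] / total
p_a = (counts["AB"] + counts["Ab"]) / total
p_b = (counts["AB"] + counts["aB"]) / total

d, r2 = linkage_disequilibrium(p_ab, p_a, p_b)
print(f"D = {d:.2f}, r^2 = {r2:.2f}")
```

With these made-up counts, the AB haplotype is more common than independence would predict, so D is positive.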

5
Q

tag-SNPs

A

The few SNPs used to identify a haplotype

6
Q

Genome-wide association studies use

A

numerous SNPs scattered across the genome to find genes of interest

7
Q

Annotating a gene means

A

linking its sequence information to other information about its function and expression, the protein it encodes, and similar genes in other species.

8
Q

Metagenomics is an emerging field in which

A

the genome sequences of an entire group of organisms that inhabit a common environment are sampled and determined (e.g., from environmental DNA, eDNA)

9
Q

Synthetic biology seeks to

A

design organisms that might provide useful functions

10
Q

Functional genomics

A

characterizes what sequences do—their function

11
Q
A
12
Q

Genome content consists of

A

much more than just protein-coding genes

Intergenic sequences → “non-coding” DNA

Repetitive sequences → short and long sequences that repeat in tandem or are interspersed throughout the genome

13
Q

prokaryotic and eukaryotic genomes differ drastically in

A

size & organization

prokaryote - genome lies in the cytosol (no membrane-bound organelles, DNA not in a nucleus)

eukaryote - genome in distinct chromosomes - tightly bound to proteins

14
Q

Anatomy of a prokaryotic genome

A

1) single, circular chromosome

2) Single origin of replication (required for the DNA replication machinery)

3) Genomes are compact
→ ~1-10 million bases (Mb)

4) Most content is genic
→ Minimal intergenic DNA (non-coding)
→ few repetitive sequences
→ No introns

5) Genome size is directly related to gene content
→ larger genomes encode more proteins

15
Q

regulatory consequences of organization of prokaryotic genome

A

Genes in biochemical or signaling pathways often clustered and controlled as operons

Chromosome not sequestered in nucleus

Chromosome not bound by histone proteins
→ No chromatin

16
Q

Eukaryotic genomes

A

1) Genomes divided into multiple linear chromosomes, with telomeres & centromeres

2) DNA complexed with histone proteins (=chromatin) in a nucleus

3) Genome size tends to be much larger, and varies widely, even within a taxonomic group
→ Genes interrupted by introns
→ Copious intergenic DNA
→ Copious repetitive DNA

4) Genomes don’t tend to be compact

5) With rare exceptions, genes not clustered into operons

6) Many genes (most human genes) are interrupted by introns; genes are far apart

17
Q

C-value is the

A

DNA content per haploid cell
→ think of this as genome size (how many bp)

18
Q

G-value is the

A

protein-coding gene number
(the number of DNA sequences that code for proteins)

19
Q

G-value paradox

A

Gene number does not fully correlate with organismal complexity

20
Q

G-value paradox explained by

A

(1) alternative splicing
(2) gene expansion/contraction

21
Q

alternative splicing explanation for G-value paradox

A

Multiple exons from one gene can be spliced in different ways (=alternative splicing) to form distinct mRNAs and proteins

No. of proteins >> no. of protein-coding genes

Explains smaller-than-expected gene count in multicellular spp.
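A rough sketch of why protein number can far exceed gene number. It assumes a deliberately simplified model in which a gene has constitutive first and last exons and each middle exon is an independent "cassette" that may be included or skipped; real splicing is more constrained. The exon names and the function name are hypothetical.

```python
from itertools import product


def splice_isoforms(first_exon, cassette_exons, last_exon):
    """Enumerate mRNAs when each cassette exon is independently included or skipped."""
    isoforms = []
    # One True/False choice per cassette exon gives 2**n combinations
    for choices in product([True, False], repeat=len(cassette_exons)):
        included = [exon for exon, keep in zip(cassette_exons, choices) if keep]
        isoforms.append([first_exon, *included, last_exon])
    return isoforms


# Hypothetical gene with three cassette exons between E1 and E5
isoforms = splice_isoforms("E1", ["E2", "E3", "E4"], "E5")
for isoform in isoforms:
    print("-".join(isoform))
print(len(isoforms), "possible mRNAs from 1 gene")
```

Under this simplified model, n cassette exons allow up to 2^n distinct mRNAs from a single gene.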

22
Q

expansion/contraction explanation for G-value paradox

A

Gene expansion & contraction is frequent, even among closely related spp.

gene duplication
family duplication
entire genome duplicated

23
Q

C-value paradox

A

Genome size doesn’t fully correlate with organismal complexity

24
Q

C-value paradox explanation

A

expansion of non-genic DNA, largely repetitive DNA

Roughly half or more of the human genome is repetitive DNA → mostly interspersed transposable elements → many are non-autonomous, non-coding transposable elements

25
Assembly of eukaryotic genomes is
very challenging
Human genome is 3,200 Mb (million bases; = 3.2 Gb) with large amounts of repetitive DNA
Technology at the time (Sanger sequencing) was not up to the task
26
Draft human genome reference assembly
Sequencing is only the beginning, resulting in many millions of “reads”
Assembly → sequencing reads must be put in order on chromosomes (we are skipping this aspect)
Draft assembly → unfinished, with lots of “gaps”
Reference assembly → the assembly (usually a working model) is used as a framework to guide interpretation of individual genome variation and functional genome analysis
27
HGP brought about
radical technological and conceptual changes in genetics
28
technological changes in genetics brought about by HGP
High throughput, massively parallel, genome-wide data collection and functional assays
Sequencing efficiencies → 1 human genome: from 13 years and $2.7 billion (1990-2003) to about 1 day and under $1,000 in 2019
Concomitant strides in computing power and analysis software
29
conceptual changes in genetics brought about by HGP
Humans are more variable than we thought
Humans have far fewer protein-coding genes than we thought…
…yet, most of the genome is transcribed → a lot of RNA not turned into proteins
Cells are full of noncoding RNAs
30
Initial predictions before the Human Genome Project were ____ Current estimate is ___
~200,000
~20,000 or fewer
Caveat: we need to reconsider how a “gene” is defined, as we will see later in the course
31
Functional Genomics
how to go from DNA sequence to what the sequences do
32
Genome controls phenotype through
transcription
33
We expect that functional elements in the genome should be
1) transcribed or 2) bind proteins that regulate transcription
34
Bioinformatics involves ___ which can ____
using computer technology to collect, store, analyze and disseminate biological data and information
which can increase our understanding of health and disease and, in certain cases, be used as part of medical care
35
Homologous genes
Genes that share a common evolutionary origin. Likely to have conserved sequence and function.
36
Paralogs
Homologous genes in the same species. e.g. alpha and beta hemoglobin in humans.
37
Orthologs
Homologous genes in different species. e.g. mouse and human alpha hemoglobin
38
Predict function from sequence
by how closely a sequence is related to sequences in other genomes (e.g., SARSr-CoV)
39
Comparative genomics
field of genomics that studies similarities and differences in gene content, function, and organization among genomes of different organisms
40
Transcriptome
All RNA molecules transcribed from a genome
41
Transcriptomics
Techniques used to identify and quantify the transcriptome.
42
protein domains
Complex proteins often contain regions, called protein domains, that have specific shapes or functions (e.g., zinc finger)
43
RNA-seq
Transcriptomics
Identifies all transcribed elements
→ extract all cellular RNA → reverse transcribe → cDNA → chop up and add adapters → sequence
Relies on next generation sequencing and bioinformatics
44
Microarrays
Transcriptomics
Can be used to determine relative levels of mRNA (i.e., expression levels) for thousands of genes.
Employ an array of probes that are complementary to mRNA sequences.
45
Proteome
All proteins encoded in a genome.
46
Proteomics
Techniques used to identify and quantify the proteome.
47
Mass spec
Proteomics
A high-throughput method to identify proteins in a cell
→ digest proteins into peptides
→ separate fragments by mass-to-charge ratio
→ match peak profiles to a database of known proteins
48
ChIP-seq
(Chromatin ImmunoPrecipitation followed by sequencing)
Proteomics (affinity capture)
Identifies DNA bound by known DNA-binding proteins → e.g., transcription factors (TFs), RNA pol
Antibodies bind to a specific protein → take genomic DNA → mix with antibody → antibody binds to the protein (that is bound to DNA) → can pull the complex out of solution and sequence the DNA bound by that protein
Requires specific antibodies → need to know what protein you are looking for (and have an antibody for it)
Uses high-throughput sequencing
49
two-dimensional polyacrylamide gel electrophoresis
(2D-PAGE)
Proteomics technique in which proteins are separated in one dimension by charge, separated in a second dimension by mass, and then stained
50
Protein Microarrays Employ ___ Can be used to ____
Proteomics
Employ an array of proteins immobilized on a solid support.
Can be used to identify protein-protein interactions or measure expression of proteins within cells (using immobilized antibodies).
51
Modifications of affinity capture and other techniques can be used to ____ termed the_____
determine the complete set of protein interactions in a cell; the interactome
52
Genome-wide mutagenesis screens
Can be used to search for all genes affecting a particular function or trait. Two methods, random induction of mutations on a genome-wide basis and mapping with molecular markers, are coupled and automated.
53
segmental duplications
Duplicated regions greater than 1000 bp that are almost identical in sequence. Many eukaryotic genomes, especially those of multicellular organisms, are filled with segmental duplications.
54
multigene family is a
group of evolutionarily related genes that arose through repeated duplication and evolution of an ancestral gene.
55
gene deserts
Large chromosomal regions with no protein-encoding genes (studied using genetically engineered mice that were missing these regions)
56
collinearity
many genes are present in the same order in related genomes
57
pangenome
the entire set of genes possessed by all members of a particular species.
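A small sketch of this definition using set operations: the pangenome is the union of the gene sets of all sampled members. The strain and gene names are made up, and "core" and "accessory" are additional terms (genes shared by all members versus only some) that the card itself does not define.

```python
# Hypothetical gene content of three strains of one species
strain_genes = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneC", "geneE"},
}

# Pangenome: every gene found in at least one member
pangenome = set().union(*strain_genes.values())

# Core genome (assumed extra term): genes present in every member
core_genome = set.intersection(*strain_genes.values())

# Accessory genes (assumed extra term): in the pangenome but not shared by all
accessory_genes = pangenome - core_genome

print("pangenome:", sorted(pangenome))
print("core:", sorted(core_genome))
print("accessory:", sorted(accessory_genes))
```

Sampling more members can only keep the pangenome the same size or make it larger, since it is a union.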
58
single-nucleotide polymorphism (SNP)
A site in the genome where individual members of a species differ in a single base pair
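A minimal sketch of locating single-base differences between two already-aligned sequences of equal length. The sequences and the function name are hypothetical. Strictly, a SNP is defined across a population; comparing two individuals only flags candidate sites.

```python
def single_base_differences(seq1, seq2):
    """Return (position, base1, base2) for each site where two aligned sequences differ.

    Positions are 1-based. The sequences are assumed to be aligned and gap-free.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq1, seq2))
        if a != b
    ]


# Hypothetical sequences from two individuals of the same species
individual_1 = "ATGCCGTA"
individual_2 = "ATGCTGTA"

snps = single_base_differences(individual_1, individual_2)
print(snps)
```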