Sudbery Flashcards Preview

MBB6344 - Genomic Science > Sudbery > Flashcards

Flashcards in Sudbery Deck (34)

Why do we care about transcriptomics?

- 98.5% of protein coding seq is the same human to mouse
- 1-2% of the genome is coding (hence why we are so diff from mice)
- metazoan genomes are not selected for size → much is repetitive seq or decaying pseudogenes
- every cell has the same DNA, but cells are diff --> dep on what genes are active/exp levels


What did ENCODE find about 'junk' DNA?

- most of what was thought to be junk has a function in controlling something


How much of the genome did ENCODE claim was functional?

- 80% (CRMs)


What are cis-regulatory modules?

- inc promoters, enhancers, silencers and insulators
- regions of DNA that bind DNA BPs (eg. TFs) and reg gene exp


What seq motifs do DNA BPs bind, and what is the result of this?

- bind degenerate sequence motifs
- binding sites vary, but certain seqs are more likely
- but just because seq is present doesn’t mean it will bind
- eg. 8 mil potential GATA1 binding sites (motif matches) in the genome, but only 0.2% are actually bound by GATA1 (measured by ChIP-seq)
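
A minimal sketch of what scanning for a degenerate motif looks like, and of the arithmetic behind the GATA1 example. The motif string `WGATAR` (W = A or T, R = A or G), the toy sequence and the helper names are assumptions for illustration only; real motif scans normally use position weight matrices and both strands.

```python
import re

# IUPAC degenerate codes mapped to regex character classes (subset only).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Translate a degenerate motif such as 'WGATAR' into a regex pattern."""
    return "".join(IUPAC[base] for base in motif)


def find_motif_sites(sequence, motif):
    """Return 0-based start positions of every match, overlaps included."""
    # The lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, forward strand only.
toy_seq = "CCAGATAAGGTTGATAGCCTGATATT"
sites = find_motif_sites(toy_seq, "WGATAR")

# Arithmetic from the card: 0.2% of about 8 million motif matches are bound.
bound_sites = round(8_000_000 * 0.2 / 100)

print(sites)        # positions where the sequence matches the motif
print(bound_sites)  # number of motif matches that are actually bound
```

The point of the sketch is that a motif match only says binding is possible; whether a site is bound has to be measured, eg. by ChIP-seq.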


Does all DNA exist as heterochromatin or euchromatin?

- no, sliding scale


What is hetero and euchromatin?

- heterochromatin = tightly packed
- euchromatin = loosely packed


What regions tend to be nucleosome free, or have v few nucleosomes?

- CRMs


How tightly packed is chromatin in transcribing genes?

- intermediate


How can nucleosome free regions of genome be mapped?

- map w/ DNase-seq
- DNase only cuts where there are no nucleosomes, so can be used to build up a genome-wide pic of where nucleosome free regions are


What did ENCODE measure and how?

- RNA expression --> RNA-seq, CAGE-seq and RNA-PET
- DNA/protein interactions --> ChIP-seq
- chromatin accessibility --> DNase-seq and FAIRE-seq
- 3D structure --> ChIA-PET and 5C
- methylation --> RRBS


How does ChIP-seq work?

- prots bound to DNA are crosslinked to it, fixing them where they bind
- DNA is chopped up into fragments
- an Ab against the prot of interest selects the DNA fragments crosslinked to that prot
- this DNA is separated and sequenced to work out where in the genome the prot binds


What assays were carried out on what cells in ENCODE?

- tier 1 = all assays
- tier 2 = a selected subset of assays
- tier 3 = everything else, eg. a specific assay or combination


What did ENCODE prod?

- lots of data sets and continues to gen new data


What did ENCODE claim?

- vast majority (80.4%) of human genome participates in at least 1 biochemical RNA and/or chromatin-assoc event in at least 1 cell type
- 19.4% covered by at least 1 DHS or TF ChIP-seq peak across all cell lines


What was wrong w/ ENCODE's claim that 80% of the genome is functional?

- 100% of genome participates in replication, so taking part in a biochemical event is not enough on its own to show function
- about 60% of this 80% is transcription, and about half of that is introns; introns are NOT coding and are much bigger than exons, so they account for a signif proportion of the genome, and we would not necessarily say introns have a function


Was ENCODE's claim that 19.4% of the genome is covered by at least 1 DHS or TF ChIP-seq peak more sensible, and why?

- yes
- assuming half the elements from TF and cell-type diversity sampled, could estimate a min of 20% of genome participates in these specific functions, w/ the likely figure signif higher


What is the question at the centre of the debate over ENCODE's finding?

- what do we mean by functional?


What are the definitions of function?

- causal role = a seq has a function if it causes an effect (ie. it does something)
- selected effect = a seq has a function if that seq exists because of this function
- also the genetic role = a seq has a function if it is req for that function (doesn’t have to be visible to natural selection)


How is biological function created?

- evolution and selection for that function


What definition of function did ENCODE use, and what is the problem w/ this?

- causal definition
- but surely if a seq is important it will be selected for and if it has no effect on function then it is not important as far as evolution is concerned
- but ENCODE estimates of functionality were way above the amount of DNA known to be selected (at time best estimate was around 5% under -ve selection)


How can the amount of the genome under selection be investigated?

- compare seqs from 2 species and find regions w/ fewer diffs than expected
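
A rough sketch of the idea, assuming two already-aligned sequences of equal length with no gaps. The function names, the window size and the `expected_rate` value are all hypothetical; choosing what counts as "expected" is the hard part in practice.

```python
def window_differences(seq_a, seq_b, window):
    """Count mismatches between two aligned sequences in non-overlapping
    windows. Returns a list of (window_start, mismatch_count) tuples."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = []
    for start in range(0, len(seq_a) - window + 1, window):
        chunk_a = seq_a[start:start + window]
        chunk_b = seq_b[start:start + window]
        diffs = sum(1 for a, b in zip(chunk_a, chunk_b) if a != b)
        counts.append((start, diffs))
    return counts


def conserved_windows(seq_a, seq_b, window, expected_rate):
    """Return start positions of windows with fewer differences than
    expected_rate * window (the assumed neutral expectation)."""
    expected = expected_rate * window
    return [start
            for start, diffs in window_differences(seq_a, seq_b, window)
            if diffs < expected]


# Hypothetical toy alignment: first 10 bases identical, last 10 diverged.
species_a = "ACGTACGTAC" + "GGGTTTCCCA"
species_b = "ACGTACGTAC" + "GAGTCTCGCA"

print(window_differences(species_a, species_b, window=10))
print(conserved_windows(species_a, species_b, window=10, expected_rate=0.2))
```

Here the first window has no differences and is flagged as conserved, while the second has more differences than the assumed expectation and is not.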


What is the problem w/ comparing seqs of 2 species to see how much of the genome is under selection?

- what is expected?
- if distant species used, only find function conserved across long time
- if close species used, not enough mutations to find conserved regions


How can problems in investigating how much of the genome is under selection by comparing seqs be overcome?

- use indels --> compare 2 species, look at the gaps in alignment, are there regions w/ fewer gaps than expected?
- aim is to work out how much seq is conserved now in humans: estimate the conserved amount for each comparison (human-mouse, human-horse, human-chimp etc.), plot these, then fit a line and extrapolate to 0
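
The fit-and-extrapolate step can be sketched as an ordinary least-squares line whose intercept is the estimate at 0. The x-axis (divergence from human) and every number below are made up purely to show the mechanics; they are not results from any study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# HYPOTHETICAL values: one point per pairwise comparison with human.
divergence = [0.1, 0.3, 0.5]            # assumed distance from human
constrained_fraction = [0.08, 0.06, 0.04]  # assumed conserved fraction

slope, intercept = fit_line(divergence, constrained_fraction)

# The intercept is the fitted value at divergence 0, ie. the extrapolated
# estimate for humans now.
print(slope, intercept)
```

A straight line is itself an assumption here; the real relationship need not be linear.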


If just look at -ve selection to see how much of the genome is under selection, then what's missing?

- +ve selection
- non coding seqs
- compensatory evolution (applies to non-coding seq)


How does +ve selection affect genome selection?

- seq changes because of new function
- coding seq: dN/dS (comp synonymous to nonsynonymous changes and if lots of nonsynonymous then prob selecting for new function)
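
A minimal sketch of the dN/dS comparison, assuming the substitution and site counts have already been obtained from an alignment. The function names, the counts and the `tolerance` threshold are hypothetical, and the sketch skips the multiple-hit corrections that real methods apply.

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Ratio of nonsynonymous changes per nonsynonymous site (dN)
    to synonymous changes per synonymous site (dS)."""
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds == 0:
        raise ValueError("dS is zero, ratio is undefined")
    return dn / ds


def interpret(ratio, tolerance=0.1):
    """Rough reading of the ratio; the tolerance band is an assumption."""
    if ratio > 1 + tolerance:
        return "positive selection"
    if ratio < 1 - tolerance:
        return "negative (purifying) selection"
    return "approximately neutral"


# Hypothetical counts for two genes.
fast_gene = dn_ds(nonsyn_subs=30, nonsyn_sites=300, syn_subs=10, syn_sites=200)
slow_gene = dn_ds(nonsyn_subs=3, nonsyn_sites=300, syn_subs=10, syn_sites=200)

print(fast_gene, interpret(fast_gene))
print(slow_gene, interpret(slow_gene))
```

Normalising by the number of sites matters because there are many more possible nonsynonymous changes than synonymous ones in a typical coding seq.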


How can selection of noncoding seqs in genome be investigated?

- intra-species diversity comp to inter-species diversity
- using 1000 genomes, approx 4%


What is the effect of compensatory evolution on genome selection?

- TF sites control exp of gene, if lose a binding site then fitness reduced from 1 to eg. 0.8 (80% exp)
- isn’t fatal so allows time to get another mutation to gain binding site and fitness again increased to 1
- over years of evolution get many diff seqs that perform same function and give same fitness, but look quite diff


What evidence is there for compensatory evolution?

- mainly anecdotal, eg. from fly embryos
- systematic evidence --> took 4 species of yeast, counted amount of TF binding in regulatory regions of genes, and looked at how much seq changed between 2 species (T is like evolutionary time); get many more changes in seq than in binding energy, so conservation of function w/o conservation of sequence


Are ENCODE elements conserved?

- some signal of selection, but quite weak
- melanocyte DHSs are depleted in somatic mutations in whole cancer genomes --> didn't find somatic mutations in ENCODE elements
- cancer function = unselected function?
- so cancer needs seq but body doesn't, does this make it functional?


What are some eg.s of function w/o conservation?

- eye colour genes --> not under selection, but is genetic
- disease causing mutations such as AD --> no evolutionary advantage to stopping these mutations, as affects after reproductive age


What did an experiment looking at enhancer and promoter evolution do and find?

- experiment took livers from 20 mammals and mapped where enhancers and promoters are
- promoters generally in conserved location
- enhancers move between species, not v conserved between species
- are they under seq constraint? promoters and enhancers have some conservation but much less than exons


What evidence is there for function of promoters and enhancers?

- 98% of DHSs are linked to a promoter in ChIA-PET experiments
- genes closer to predicted enhancers tend to have higher expression levels in correct cell type
- ENCODE tested a no. of elements in enhancer reporter assays, w/ “over half of the elements showing activity, often in the corresponding tissue type”
- 65% of predicted human heart enhancers drove heart expression in mice


What is the significance of evidence for function of enhancers and promoters being TF binding sites?

- does not imply TF binding
- does not imply enhancer state
- does not imply contact w/ promoter
- does not imply regulation of a promoter
- does not imply phenotypic consequence for the cell
- does not imply phenotypic consequence for the organism