Lecture 2 Flashcards

1
Q

The human genome

A
  • 22 pairs of autosomes and 1 pair of sex chromosomes (XX/XY)

About 25,000 genes in the genome. Chromatin comes in two types:

  • Euchromatin is loosely packaged and transcriptionally active
  • Heterochromatin is densely packed and not transcriptionally active (its genes aren’t supposed to be turned on)
2
Q

How are males and females different?

A
  • Y chromosomes are only paternally inherited (there is nothing for them to pair up and recombine with)
  • Mitochondrial DNA (mtDNA) is only maternally inherited, although it is found in both males and females. This makes it useful when reconstructing the evolutionary history of a population
  • Y and mtDNA also differ from the autosomes in that they do not recombine
  • Autosomal regions do recombine
3
Q

Structure of DNA and RNA

A
  • DNA has 4 nucleotides (bases): A, C, G and T. They are similar in structure; where they differ is in the base. A and G are purines, C and T are pyrimidines.
  • Each base is attached to a sugar (deoxyribose) and a phosphate group
  • RNA has a different sugar (ribose) and also a different base: uracil instead of thymine
  • The genetic code is always read in the 5’ to 3’ direction
  • G always pairs with C
  • A always pairs with T
  • DNA is double stranded, with G paired to C and T paired to A.
    • GC content: the % of positions that are G or C
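
A minimal sketch of the pairing rules and the GC content calculation described above. The function names and the example sequence are made up for illustration.

```python
# Watson-Crick pairing: G with C, A with T.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base that pairs with each position of seq."""
    return "".join(PAIRS[base] for base in seq.upper())


def gc_content(seq: str) -> float:
    """Return the percentage of positions in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


example = "ATGCGC"  # hypothetical sequence, not from the lecture
print(complement(example))            # TACGCG
print(round(gc_content(example), 1))  # 66.7
```

Because G always pairs with C, both strands of a double-stranded molecule have the same GC content.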
4
Q

Typical structure of a gene

A

No two genes are the same, but there are some common features. The coding parts are the exons, and in between the exons are introns. Upstream there are promoters and enhancers (involved in gene expression). Following transcription, the introns are spliced out to give the mature messenger RNA.

5
Q

The genetic code

A

Key point: some substitutions change the amino acid (nonsynonymous).

Others leave the amino acid unchanged (synonymous); e.g. CCA and CCG are both proline.

This is only relevant to the coding region.

The genetic code is redundant: each codon encodes a single amino acid, but some amino acids are encoded by more than one codon.

This means the position at which a mutation happens determines whether it changes the amino acid.
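
As a rough illustration, the sketch below builds the standard genetic code table and uses it to label a codon change as synonymous or nonsynonymous. The function name `classify_substitution` is invented here; the compact 64-letter string is the standard code written out in TCAG order.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change by whether the encoded amino acid changes."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


# CCA and CCG are both proline (P), so the change is synonymous.
print(classify_substitution("CCA", "CCG"))  # synonymous

# Redundancy: number of codons per amino acid ("*" marks a stop codon).
codons_per_aa = Counter(CODON_TABLE.values())
print(codons_per_aa["L"], codons_per_aa["M"])  # 6 1
```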

6
Q

Genetic Variation

A
  • Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation
    • One nucleotide is changed for another
  • Transitions (purine to purine, or pyrimidine to pyrimidine) are more common than transversions (purine to pyrimidine, or vice versa) (see Lecture 10)
  • Evolution relies on genetic variation to happen
  • Occasionally we get indels (insertions/deletions). These are disruptive in the coding region because they can result in a different set of codons.
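
A small sketch of how a single-base change can be labelled as a transition or a transversion, using the purine/pyrimidine classes. The function name `classify_snp` is an assumption made for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt are the same base")
    if {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected bases from A, C, G, T")
    # A transition stays within one chemical class; a transversion swaps class.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_snp("A", "G"))  # transition
print(classify_snp("C", "T"))  # transition
print(classify_snp("A", "C"))  # transversion
```

Of the 12 possible single-base changes, only 4 are transitions, so the fact that transitions are seen more often is not just a matter of there being more ways to make one.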
7
Q

SNPs are fundamental to the entire module

A
  • SNPs are the building blocks of all genetic variation
  • There are (usually) two alleles: the major (more common) and the minor (rarer) allele. In the example, A is the minor allele and G is the major allele.
8
Q

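What’s a minor allele frequency?

A

The minor allele frequency is the frequency of the rarer allele in the population. Individuals can be either homozygous (e.g. GG or AA) or heterozygous (GA), so each individual contributes two alleles to the count.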
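
A minimal sketch of the calculation: count every allele across all genotypes (two per individual) and take the frequency of the rarer one. The function name and the genotype list are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return (minor allele, frequency) for a two-allele SNP.

    genotypes is a list of two-letter strings such as "GG", "GA" or "AA".
    """
    counts = Counter(allele for g in genotypes for allele in g.upper())
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles")
    total = sum(counts.values())
    minor_allele, minor_count = min(counts.items(), key=lambda kv: kv[1])
    return minor_allele, minor_count / total


# Hypothetical sample of 10 individuals (20 alleles in total).
sample = ["GG", "GG", "GA", "GG", "AA", "GA", "GG", "GG", "GA", "GG"]
print(minor_allele_frequency(sample))  # ('A', 0.25)
```

Here A appears 5 times out of 20 alleles (three heterozygotes plus one AA homozygote), giving a minor allele frequency of 0.25.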

9
Q

How can SNPs have a greater impact on some proteins than others?

A
  • SNPs that radically change a protein are unlikely to be neutral
  • A change from GCG to GCA still codes for alanine, so it is silent.
  • GCG to GTG gives valine. This is a conservative substitution, as the two amino acids are biochemically quite similar (similar charges and pHs), so it won’t affect the protein that much. The protein may fold and function in the same way
  • Non-conservative amino acid substitution: GCA to CTA changes the amino acid to leucine, which has a more profound effect on the protein. It changes the electronic charge or pH.
  • Deletions and insertions: if GAA is deleted, this might have a serious consequence, but everything downstream stays the same because the number of bases removed is divisible by 3
  • Addition or deletion of 1 or 2 bases affects the amino acid and everything downstream, e.g. adding a G changes all the amino acids downstream and adds a stop codon early on. This is likely to be harmful rather than positive or neutral
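
The difference between an in-frame deletion and a frameshift can be sketched by translating a short sequence before and after each change. The reference sequence below is invented for illustration (it is not the lecture's example), and `translate` is a simplified helper that stops at the first stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


reference = "ATGGCTGAACTGAAATAA"               # ATG GCT GAA CTG AAA TAA
in_frame = reference[:6] + reference[9:]        # delete GAA (3 bases)
frameshift = reference[:3] + "G" + reference[3:]  # insert one G

print(translate(reference))   # MAELK
print(translate(in_frame))    # MALK  (one amino acid lost, rest unchanged)
print(translate(frameshift))  # MG    (downstream codons change, early stop)
```

Removing 3 bases drops one amino acid but keeps the reading frame, whereas adding a single base shifts every downstream codon and, in this made-up sequence, creates a premature TGA stop.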
10
Q

How do we screen genetic variation? Capillary sequencing

A
  • Also known as Sanger sequencing and chain termination sequencing
  • The main idea is that incorporating a dideoxynucleotide (rather than a deoxynucleotide) terminates DNA synthesis during PCR.
  • Dideoxynucleotides are at low concentration in the reaction mixture, and each base has a different fluorescent dye
  • Products are passed through a capillary sequencer, and the dyes are read by a laser. The combination of fragment size and dye reveals the sequence
  • The laser reads the different wavelengths of the different dyes, which tells you the sequence of letters
11
Q

How do we screen genetic variation? Next-generation sequencing

A

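Note the log scale on the cost graph.

1 Mbp costs about 1/1,000,000 of the price it did in 2001. Being 1 million times cheaper is a huge change and means we can do things a lot more effectively.

Moore’s law describes the steady doubling of computing power (strictly, the number of transistors on a chip) roughly every two years. The cost of DNA sequencing followed Moore’s law until 2007, then fell much faster.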
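
A back-of-the-envelope check of how large a million-fold drop is. The two-year doubling time is an assumption (a common statement of Moore's law), not a figure from the lecture.

```python
import math

FOLD_DROP = 1_000_000      # 1 Mbp now costs ~1/1,000,000 of the 2001 price
MOORE_DOUBLING_YEARS = 2   # assumed doubling time for Moore's law

# Number of times the cost must halve to fall a million-fold.
halvings = math.log2(FOLD_DROP)

# How long that would take if cost only halved at the Moore's law pace.
years_at_moore_pace = halvings * MOORE_DOUBLING_YEARS

print(round(halvings, 1))             # 19.9
print(round(years_at_moore_pace, 1))  # 39.9
```

At that pace a million-fold drop would take about 40 years, so sequencing costs must have fallen much faster than Moore's law for at least part of the period since 2001.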

e.g. Illumina sequencing

12
Q

Illumina sequencing key points

A
  • DNA is fragmented and adaptors are added to the ends.
  • The adaptors are then immobilised on a flow cell, and a process known as bridge amplification generates ~1000 identical sequences in each of millions of locations on the cell
  • Dye-labelled terminating nucleotides are added to the flow cell, read, then washed away. The process is repeated until about 100 bp are read.
  • These short reads are then compared to a reference assembly.
13
Q

Comparison of sequencing methods

A

  • Capillary (Sanger) sequencing
    • Method: ddNTP termination and fluorescent detection
    • Max. read length (bp): 850
    • Run time: 1
    • Gb / run / machine: << 0.001
    • Pros: Accurate; useful for validation of NGS data
    • Cons: Low throughput; expensive

  • Illumina NovaSeq X
    • Method: Polymerase-based sequencing by synthesis
    • Max. read length (bp): 2 x 150 (paired end)
    • Run time: 2
    • Gb / run / machine: 8000
    • Pros: Massive throughput
    • Cons: Short reads make assembly challenging

  • PacBio Revio
    • Method: Single-molecule real-time sequencing
    • Max. read length (bp): 15,000-20,000
    • Run time: 1
    • Gb / run / machine: 90
    • Pros: Very high throughput; very long reads
    • Cons: Less throughput than Illumina

  • Oxford Nanopore PromethION
    • Method: Single-molecule real-time sequencing
    • Max. read length (bp): 10,000-100,000
    • Run time: 3
    • Gb / run / machine: 50-110
    • Pros: Very high throughput; possibly longest reads
    • Cons: Slightly higher error rate; less throughput than Illumina

14
Q

Errors during sequencing?

A
  • Illumina errors can be corrected (and caused!) bioinformatically
  • Sequencing errors are ~0.1% per read; the reliability of the calls is reported in a separate file
    • Q scores give an indication of reliability: Q = -10 × log10(P(error))
    • So, an error rate of 1% is: -10 × log10(0.01) = 20
    • An error rate of 0.1% is: -10 × log10(0.001) = 30
  • With high depth, errors are more obvious, and so can be corrected
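
A short sketch of the Q score formula and of why depth helps. The function names are made up for this example, and the majority vote is a deliberately simple stand-in for real error correction, which would also weigh the quality scores.

```python
import math
from collections import Counter


def phred_quality(p_error: float) -> float:
    """Q = -10 * log10(P(error))."""
    if not 0 < p_error <= 1:
        raise ValueError("p_error must be in (0, 1]")
    return -10 * math.log10(p_error)


def error_probability(q: float) -> float:
    """Invert the Q score: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def consensus_base(pileup: str) -> str:
    """Most common base among the reads covering one position."""
    return Counter(pileup.upper()).most_common(1)[0][0]


print(round(phred_quality(0.01)))   # 20
print(round(phred_quality(0.001)))  # 30

# Ten hypothetical reads cover a position; one carries a sequencing error.
print(consensus_base("GGGGAGGGGG"))  # G
```

With ten reads covering the position, the single A stands out against nine Gs and can be treated as an error; with only one or two reads there would be no way to tell.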
15
Q

Exome sequencing

A
  • The exome is only a few percent of the genome, yet contains the coding regions of genes.
  • Sequencing just the exome is a cost-effective way of analysing a part of the genome that is likely to be important
16
Q

What is sequence capture?

A

Sequence capture works by using ‘baits’ labelled with biotin, which bind the target DNA and can then be attached to magnetic beads. The captured DNA is then stripped from the beads and sequenced.

17
Q

Cataloguing human genetic variation

A
  • Identified > 1 million SNPs from 4 populations
  • Showed that recombination happened at ‘hotspots’
  • Provided impetus for association studies
  • Had profound effect on ability to study selection, evolution and population structure
18
Q

SNP genotyping

A
  • SNP chips use the principle of primer extension and termination (see sequencing).
  • Fragments are annealed to beads on an array
  • The two alleles are labelled with different fluorescent probes
  • The different genotypes are ‘clustered’ which makes genotype calling very quick
  • Some chips type >2 million SNPs for ~£100
  • SNPs are chosen from HapMap data. Tag SNPs are highly correlated with other SNPs close by
19
Q

The 1000 genomes project

A

  • Much more comprehensive description of human variation
  • Estimated mutation rates (for the first time) and detected regions under selection
  • Loss-of-function mutations were detected and shown to be common in all of us

20
Q

The ENCODE Project

A
  • An ambitious (and expensive) attempt to understand the function of the parts of the genome that are non-coding (i.e. most of the genome)
  • Sequences that are conserved across species are more likely to be functional
  • Some scientists have disputed ENCODE’s figure that 80% of the genome is functional, saying the true proportion is much lower