Lecture 2 Flashcards

Question 1

Q

The human genome

Answer

A

22 pairs of autosomes and 1 pair of sex chromosomes (XX/XY)

About 25,000 genes in the genome. Chromatin comes in two types:

Euchromatin is loosely packaged and transcriptionally active
Heterochromatin is densely packed and not transcriptionally active (genes there aren’t supposed to be turned on)

Question 2

Q

How are males and females different

Answer

A

Y chromosomes are only paternally inherited (nothing to pair up and recombine with)
Mitochondrial DNA (mtDNA) is only maternally inherited, although it is found in both males and females. This is useful if we’re trying to reconstruct the evolutionary history of a population.
The Y chromosome and mtDNA do not recombine
Autosomal regions do recombine

Question 3

Q

Structure of DNA and RNA

Answer

A

DNA has 4 nucleotides (bases): A, C, G and T, which are similar in structure. Where they differ is in the base. A and G are purines, C and T are pyrimidines.
Attached to a sugar (deoxyribose) and a phosphate group
RNA has a different sugar (ribose) and also a different base; uracil instead of thymine
The genetic code is always read in the 5’ to 3’ direction
G always pairs with C
A always pairs with T
DNA is normally double stranded, with G paired to C and T paired to A.
- GC content: % of positions that are G or C
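
The pairing rules and GC content above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this card, and the input is assumed to contain only the letters A, C, G and T.

```python
# Base-pairing rules from the card: G pairs with C, A pairs with T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq: str) -> float:
    """Percentage of positions in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def reverse_complement(seq: str) -> str:
    """The opposite strand, written in the 5' to 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def transcribe(seq: str) -> str:
    """RNA version of a DNA sequence: uracil (U) replaces thymine (T)."""
    return seq.upper().replace("T", "U")


print(gc_content("ATGC"))          # 50.0
print(reverse_complement("ATGC"))  # GCAT
print(transcribe("ATGC"))          # AUGC
```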

Question 4

Q

Typical structure of a gene

Answer

A

No two genes are the same, but there are some common features

The coding parts are the exons; in between the exons are introns. Upstream there are promoters and enhancers (involved in gene expression). During transcription… introns get spliced out, leaving the mature messenger RNA

Question 5

Q

The genetic code

Answer

A

Key point – some substitutions change the amino acid (nonsynonymous)

Others result in the same (synonymous) amino acid; e.g. CCA and CCG are both proline

Only relevant to the coding region

Redundancy of the genetic code: each amino acid is encoded by a codon, and some amino acids are encoded by more than one codon.

This means the place in the codon where a mutation happens determines whether it changes the amino acid.
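
A minimal sketch of the synonymous / nonsynonymous distinction, assuming the standard genetic code and DNA (T rather than U) codons. The table is built from the usual TCAG ordering; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as synonymous or nonsynonymous."""
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"


# Redundancy: how many codons encode each amino acid.
codons_per_amino_acid = Counter(CODON_TABLE.values())

print(classify_substitution("CCA", "CCG"))  # synonymous (both proline)
print(classify_substitution("CCA", "CAA"))  # nonsynonymous (proline -> glutamine)
print(codons_per_amino_acid["P"])           # 4
```

In this table proline has four codons that differ only at the third position, which is why CCA to CCG is silent while a change at the second position is not.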

Question 6

Q

Genetic Variation

Answer

A

Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation
- Change one nucleotide for another. Transitions are more common than transversions (see Lecture 10)
Evolution relies on genetic variation to happen
Occasionally we get indels (insertions/deletions). These are disruptive in a coding region because they can result in a different set of codons.
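
As a rough illustration of the transition / transversion distinction: a transition swaps a purine for a purine or a pyrimidine for a pyrimidine, while a transversion swaps between the two classes. The function name below is made up for this sketch.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    if ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("no substitution: ref and alt are the same base")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Tally all 12 possible single-base changes.
tally = {"transition": 0, "transversion": 0}
for ref in "ACGT":
    for alt in "ACGT":
        if ref != alt:
            tally[substitution_type(ref, alt)] += 1

print(tally)  # {'transition': 4, 'transversion': 8}
```

Only 4 of the 12 possible changes are transitions, so their being more common in real data is not just a matter of there being more ways to make one.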

Question 7

Q

SNPs are fundamental to the entire module

Answer

A

SNPs are the building blocks of all genetic variation
(Usually) two alleles: the major (more common) and minor (rarer) allele, e.g. A is the minor allele and G is the major allele.

Question 8

Q

What’s a minor allele frequency

Answer

A

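Individuals can be either homozygous (e.g. GG or AA) or heterozygous (GA)
The minor allele frequency is the proportion of all allele copies in the population that are the minor allele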
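
A small worked example of counting alleles to get a minor allele frequency. The ten genotypes are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Frequency of each allele, counting two allele copies per individual."""
    counts = Counter("".join(genotypes))
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele (0.0 if only one allele is present)."""
    freqs = allele_frequencies(genotypes)
    if len(freqs) < 2:
        return 0.0
    return min(freqs.values())


# Made-up sample: 6 GG homozygotes, 3 GA heterozygotes, 1 AA homozygote.
sample = ["GG", "GG", "GA", "GG", "AA", "GA", "GG", "GG", "GA", "GG"]

print(allele_frequencies(sample))      # {'G': 0.75, 'A': 0.25}
print(minor_allele_frequency(sample))  # 0.25
```

Here A appears in 5 of the 20 allele copies (3 from heterozygotes, 2 from the AA homozygote), so A is the minor allele with a frequency of 0.25.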

Question 9

Q

How can SNPs have a greater impact on some proteins than others

Answer

A

Ones that radically change a protein are unlikely to be neutral
Change from GCG to GCA: still codes for an alanine, so it is silent.
GCG to GTG gives valine: a conservative substitution, as the two amino acids are biochemically quite similar (similar charge and chemical properties), so it won’t affect the protein that much. The protein may fold and function in the same way
Non-conservative amino acid substitution: GCA to CTA changes the amino acid to leucine, with a more profound effect on the protein. Changes the charge or other chemical properties.
Deletions and insertions: if GAA is deleted, it might have a serious consequence, but everything downstream is still the same because the number of bases removed is divisible by 3
Addition or deletion of 1 or 2 bases affects that amino acid and everything downstream (a frameshift), e.g. adding a G changes all the amino acids downstream and adds a stop codon early on. Likely to be harmful rather than positive or neutral
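
The difference between an in-frame deletion and a frameshift can be seen by translating a short sequence before and after each change. The 18-base sequence below is invented for illustration (it is not the one from the lecture slide), and the code assumes the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate codon by codon from position 0, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence: ATG GCT GAA CTG ACC TAA
original = "ATGGCTGAACTGACCTAA"

# In-frame deletion: remove the 3 bases GAA (the third codon).
in_frame = original[:6] + original[9:]

# Frameshift: insert a single G after the first codon.
frameshift = original[:3] + "G" + original[3:]

print(translate(original))    # MAELT
print(translate(in_frame))    # MALT  (one amino acid lost, rest unchanged)
print(translate(frameshift))  # MG    (new reading frame hits an early stop)
```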

Question 10

Q

How do we screen genetic variation? Capillary sequencing

Answer

A

Also known as Sanger sequencing and chain termination sequencing
Main idea is that a dideoxynucleotide (rather than a deoxynucleotide) results in the termination of DNA synthesis during PCR.
Dideoxynucleotides are at low concentration in the reaction mixture, and each base has a different fluorescent dye
Products are passed through a capillary sequencer, and the dyes are read by a laser. The combination of size and dye reveals the sequence
The laser can read the different wavelengths of the different dyes, which tells you the sequence of letters
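
A toy model of why size plus dye reveals the sequence. It assumes an idealised reaction in which synthesis terminates exactly once at every position; the dye colours are arbitrary placeholders, not the ones used by any real instrument.

```python
import random

# Hypothetical dye colour for each terminating (dideoxy) base.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def terminated_fragments(new_strand: str):
    """One fragment per position, each ending in a dye-labelled ddNTP.

    Each fragment is recorded as (length, dye of its last base).
    """
    return [
        (length, DYE_FOR_BASE[new_strand[length - 1]])
        for length in range(1, len(new_strand) + 1)
    ]


def read_capillary(fragments):
    """Shortest fragments arrive first; read off the dye of each in turn."""
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(BASE_FOR_DYE[dye] for _, dye in ordered)


fragments = terminated_fragments("GATTACA")
random.shuffle(fragments)  # the reaction mixture is unordered
print(read_capillary(fragments))  # GATTACA
```

The capillary does the sorting by size and the laser does the dye lookup, so the order of dyes seen at the detector spells out the newly synthesised strand.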

Question 11

Q

How do we screen genetic variation? Next-generation sequencing

Answer

A

Note the Log scale

1 Mbp costs about 1/1,000,000 of the price it did in 2001

Moore’s law relates to the speed of computer processors. The cost of DNA sequencing followed Moore’s law until 2007, then fell much faster.

Sequencing is now about 1 million times cheaper, a huge change that lets us do things far more effectively

e.g. Illumina sequencing
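
To put the million-fold drop in perspective, the arithmetic below asks how long it would take at a Moore's-law-like pace. The 2-year halving time is an assumption made for this sketch (a commonly quoted figure), not a number from the lecture.

```python
import math

FOLD_DROP = 1_000_000          # from the card: cost is ~1/1,000,000 of 2001
ASSUMED_YEARS_PER_HALVING = 2  # assumption: cost halves every 2 years

# Number of times the cost must halve to fall a million-fold.
halvings = math.log2(FOLD_DROP)

# Time that many halvings would take at the assumed pace.
years_at_moore_pace = halvings * ASSUMED_YEARS_PER_HALVING

print(f"{halvings:.1f} halvings")           # 19.9 halvings
print(f"{years_at_moore_pace:.0f} years")   # 40 years
```

Under that assumption a million-fold drop would take roughly four decades, which is why the post-2007 fall in sequencing cost is described as outpacing Moore's law.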

Question 12

Q

Illumina sequencing key points

Answer

A

DNA is fragmented and adaptors added to the ends.
Fragments are then immobilised on a flow cell via their adaptors, and a process known as bridge amplification generates ~1000 identical sequences in millions of locations on the cell
Dye-labelled terminating nucleotides are added to the flow cell, then washed away. The process is repeated until about 100 bp are read.
These short reads are then compared to a reference assembly.
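
Comparing short reads to a reference can be pictured as sliding each read along the reference and keeping the best-matching position. The brute-force sketch below is for illustration only: real aligners use indexed search, and the reference and reads here are invented.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_alignment(read: str, reference: str):
    """Return (position, mismatches) for the best ungapped placement."""
    positions = range(len(reference) - len(read) + 1)
    best = min(
        positions,
        key=lambda pos: hamming(read, reference[pos:pos + len(read)]),
    )
    return best, hamming(read, reference[best:best + len(read)])


# Hypothetical 20 bp reference and two 7 bp reads.
reference = "ACGTTGCATGCCGATAGCTA"
exact_read = "GCATGCC"    # matches the reference perfectly
variant_read = "CCGTTAG"  # differs from the reference at one base

print(best_alignment(exact_read, reference))    # (5, 0)
print(best_alignment(variant_read, reference))  # (10, 1)
```

A read that maps with a single mismatch is either a sequencing error or a real variant such as a SNP; telling the two apart is where read depth and quality scores come in.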

Question 13

Q

Comparison of sequencing methods

Answer

A

Capillary Sanger sequencing
- Method: ddNTP termination and fluorescent detection
- Max. read length / bp: 850
- Run time: 1
- Gb / run / machine: << 0.001
- Pros: Accurate, useful for validation of NGS data
- Cons: Low throughput, expensive

Illumina NovaSeq X
- Method: Polymerase-based sequencing by synthesis
- Max. read length / bp: 2 x 150 (paired end)
- Run time: 2
- Gb / run / machine: 8000
- Pros: Massive throughput
- Cons: Short reads make assembly challenging

PacBio Revio
- Method: Single-molecule real-time sequencing
- Max. read length / bp: 15,000-20,000
- Run time: 1
- Gb / run / machine: 90
- Pros: Very high throughput; very long reads
- Cons: Less throughput than Illumina

Oxford Nanopore PromethION
- Method: Single-molecule real-time sequencing
- Max. read length / bp: 10,000-100,000
- Run time: 3
- Gb / run / machine: 50-110
- Pros: Very high throughput; possibly longest reads
- Cons: Slightly higher error rate; less throughput than Illumina

Question 14

Q

Errors during sequencing?

Answer

A

Illumina errors can be corrected (and caused!) bioinformatically
Sequencing errors are ~0.1% per read- get another file
- Q scores give an indication of reliability: Q = -10 x log10(P(error))
So, an error rate of 1% is: -10 x log10(0.01) = 20
An error rate of 0.1% is: -10 x log10(0.001) = 30
With high depth, errors are more obvious, and so can be corrected
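
The Q score formula and the idea of correcting errors with depth can be sketched as follows. The majority-vote consensus is a simplification chosen for illustration (real variant callers also weigh quality scores), and the pileup of ten reads is invented.

```python
import math
from collections import Counter


def phred_from_error(p_error: float) -> float:
    """Q = -10 x log10(P(error))."""
    return -10 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """Invert the formula: P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def consensus_base(pileup: str) -> str:
    """Most common base among the reads covering one position."""
    return Counter(pileup).most_common(1)[0][0]


print(round(phred_from_error(0.01)))   # 20
print(round(phred_from_error(0.001)))  # 30
print(error_from_phred(40))            # 0.0001

# Ten reads cover one position; a single read carries an erroneous A.
print(consensus_base("GGGGAGGGGG"))    # G
```

With only one or two reads at a position a wrong base is hard to spot, but at a depth of ten the odd one out is obvious and can be outvoted.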

Question 15

Q

Exome sequencing

Answer

A

The exome is only a few percent of the genome, yet contains the coding regions of genes.
Sequencing just the exome is a cost-effective way of analysing a part of the genome that is likely to be important

Question 16

Q

What is sequence capture

Answer

A

Works by using ‘baits’ labelled with biotin that bind the target DNA; the biotin lets them be attached to magnetic beads. The captured DNA is then stripped from the beads and can be sequenced.

Question 17

Q

Cataloguing human genetic variation

Answer

A

Identified > 1 million SNPs from 4 populations
Showed that recombination happened at ‘hotspots’
Provided impetus for association studies
Had profound effect on ability to study selection, evolution and population structure

Question 18

Q

SNP genotyping

Answer

A

SNP chips use the principle of primer extension and termination (see sequencing).
Fragments are annealed to beads on an array
The two alleles are labelled with different fluorescent probes
The different genotypes are ‘clustered’ which makes genotype calling very quick
Some chips type >2 million SNPs for ~£100
SNPs are chosen from HapMap data. Tag SNPs are highly correlated with other SNPs close by
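
How strongly a tag SNP is correlated with a nearby SNP is often summarised with the r² statistic, computed from haplotype frequencies as r² = D² / (pA(1-pA) pB(1-pB)) with D = pAB - pA pB. The sketch below uses that formula on eight invented two-SNP haplotypes; the function name and data are hypothetical.

```python
def r_squared(haplotypes, allele_a: str, allele_b: str) -> float:
    """r^2 between two SNPs from two-character haplotypes.

    allele_a is an allele at the first SNP, allele_b at the second.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denominator


# Perfectly correlated: A at SNP 1 always travels with G at SNP 2.
perfect = ["AG", "AG", "AG", "AG", "CT", "CT", "CT", "CT"]

# One recombinant haplotype (AT) weakens the correlation.
with_recombinant = ["AG", "AG", "AG", "AG", "CT", "CT", "CT", "AT"]

print(r_squared(perfect, "A", "G"))                     # 1.0
print(round(r_squared(with_recombinant, "A", "G"), 3))  # 0.6
```

When r² is close to 1, genotyping one SNP tells you the other, which is what makes a tag SNP a good stand-in for its neighbours on a chip.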

Question 19

Q

The 1000 genomes project

Answer

A

Much more comprehensive description of human variation
Managed to estimate mutation rates for the first time, and detected regions under selection
Loss-of-function mutants detected and shown to be common in all of us

Question 20

Q

The ENCODE Project

Answer

A

An ambitious (and expensive) attempt to understand the function of the parts of the genome that are non-coding (i.e. most of the genome)
Sequences that are conserved across species are more likely to be functional
Some scientists have disputed ENCODE’s figure that 80% of the genome is functional, saying the true value is much lower

Lecture 2 Flashcards

(20 cards)