Genome organisation and function Flashcards

(22 cards)

1
Q

Genome content

A

Cot curve analysis and DNA reassociation kinetics= early tools for understanding genetic complexity.
Cot value= nt concentration in mol/L (C0) × reassociation time in s (t) × factor based on cation concentration of buffer. Cot1/2 directly proportional to seq “complexity” (i.e., unique seq length). Cot curves of hypothetical eukaryotes contain unique (1 copy), middle repetitive (10-100K copies) and highly repetitive (>100K copies) seq.s. Cot analysis-> hypothesis: evolutionary novelty arises from both individual structural genes and regulatory systems (expansion of repetitive seq.s could provide mechanism for evolutionary transformations).
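A minimal sketch of the curve behind a Cot plot, assuming ideal second-order reassociation of a single kinetic component (fraction still single-stranded = 1/(1 + k·C0t), so Cot1/2 = 1/k). The function names, the reference rate constant and the rule that k scales inversely with complexity are illustrative assumptions, not values from the card.

```python
def fraction_single_stranded(c0t, k):
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * c0t)

def cot_half(k):
    """C0t value at which half of the DNA has reassociated."""
    return 1.0 / k

def rate_constant(complexity_bp, k_ref=1.0, ref_complexity_bp=1.0):
    """Hypothetical scaling: k falls in proportion to unique sequence length."""
    return k_ref * ref_complexity_bp / complexity_bp

k_small = rate_constant(4.6e6)  # a genome of 4.6 Mb unique sequence
k_large = rate_constant(9.2e6)  # twice the complexity

print(cot_half(k_large) / cot_half(k_small))  # ratio of the two Cot1/2 values
```

Under these assumptions, doubling the unique sequence length doubles Cot1/2, which is the proportionality the card states.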

2
Q

Sanger seq

A

Sanger (dideoxy)- 4 reactions, each doped w/ ~1% of a different ddNTP chain terminator. New strand read manually off 4 electrophoresis lanes. Gel “smiling” (curved bands)-> hard to automate. Can label nts w/ different fluorophores and run all 4 together. Automated sequencer (auto reactions, gel loading, reading) outputs peaks of different colours for each nt. Throughput also increased by capillary sequencer- gels in many parallel capillaries, can be reloaded repeatedly, camera scans fluorescence in each capillary as dye peaks pass. Read length increased by better gels, dyes, enzymes. New machines can run 1000x 500-800bp samples/day. Still method of choice for small-scale routine seq.
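A toy sketch of how a sequence is read off four dideoxy lanes, assuming every possible termination product is present and bands resolve perfectly by length. The function names are hypothetical.

```python
def sanger_lanes(new_strand):
    """For each ddNTP lane, list the fragment lengths that end in that base."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read bands from shortest to longest, noting which lane each band is in."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

lanes = sanger_lanes("GATTACA")
sequence = read_gel(lanes)
print(lanes["A"], sequence)
```

Four-colour sequencing replaces the lane identity with dye colour, so the same shortest-to-longest read works on a single trace.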
Phage lambda, SV40, human mt genomes seq’d 1977-82; 95- physical and genetic maps of human and mouse genomes; 97- budding yeast genome; 98- worm genome.

3
Q

Shotgun and hierarchical shotgun seq

A

Shotgun: genomic DNA-> shotgun clones-> seq reads aligned computationally. Fast+ cheap for short genomes, or for first ~90% of large genomes. Quality variable; repeats= problem (duplicated seq.s underestimated)- leave unseq’d/ poor quality gaps.
Hierarchical shotgun: genomic DNA-> BAC library-> organised, mapped large clone contigs-> BACs to be seq’d broken down to shotgun clones-> assemble seq. Slow, multiple passes, but accurate, good for finishing gaps.
00- Drosophila and Arabidopsis drafts; 01- human draft; 04- near-complete human genome; 06- macaque draft.
Gap filling of human genome continued for another 20 yrs, by T2T consortium, largely w/ long-read next gen seq.
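The computational alignment step of shotgun sequencing can be sketched as a greedy overlap merge. This is a toy illustration assuming error-free reads from one strand. Real assemblers use overlap or de Bruijn graphs and handle errors, both strands and repeats, and the function names here are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs

contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)
```

Reads with no detectable overlap stay as separate contigs, which is how gaps appear in a draft. Repeats longer than a read create spurious overlaps, which is the problem the card notes.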

4
Q

2nd generation sequencing

A

Seq by synth avoids need for separate reactions: ID each base as it is added, using labelled reversible chain terminators. Cycle: polymerase adds 1 nt, wash and image w/ 4-colour scanner, cleave fluorophore+ 3’-O-allyl group by Pd-catalysed deallylation, repeat. 2nd Gen:
Massively parallel seq- amplify single uncloned (unlike Sanger) template as polonies (PCR colonies); millions of reactions in parallel on immobilised surface (massively parallel detection-> no need for separate reactions); liquid handling to add/remove reagents, scan all reactions between successive nt additions; paired-end reads from both ends for added accuracy, easier assembly. PCR step optional, makes library less faithful. Commercialised by:
Illumina: forked adapters-> generate template fragments w/ 2 different adapter seqs, one at each end, from library. Polonies amplified around single template on flat surface coated w/ immobilised primers (5’ end on surface, 3’ free for extension): sheared dsDNA ligated to 2 different ds adapters; dsDNA denatured, 5’ end binds surface; 3’ end anneals to an immobilised primer; PCR, denature, repeat 3-5 PCR rounds; seq polonies by synth using free primer binding loose ends of one template strand.
In other words: randomly fragment genomic DNA, ligate adaptors to both ends; bind ss fragments randomly to inside of flow cell channels; add unlabelled nts, enzyme; solid-phase bridge amplification, building dsDNA bridges on solid-phase substrate; denature, leaving ss templates anchored to substrate; several million dense clusters generated in each flowcell channel; initiate cycle w/ 4 labelled reversible terminators, primers, polymerase; laser excitation+ detection of base; repeat; align data, compare to a reference.
Read length 150-200bp, doubled by paired-end reads using primer binding 3’ end of other template strand-> overlapping seqs, or, if they don’t overlap, alignment aided by both reads coming from same DNA fragment. Can run multiple samples together in same flowcell (multiplexing) w/ unique primer sets for each sample to ID origin of each seq. Increase polony density-> increase throughput. Most popular for whole genome seq. Imperfect per-cycle efficiency-> intrinsic limit on read length-> de novo assembly of new genome harder (gaps, repeats). Alignment to existing reference helps.
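Multiplexing, as described above, relies on a sample-specific tag to sort reads after the run. A minimal sketch, assuming the tag is simply the first few nt of each read and must match exactly. The barcode sequences and sample names are made up for illustration, and on real instruments the index is usually read separately.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading index sequence."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])  # strip the tag
                break
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)

barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}  # hypothetical 4-nt indexes
reads = ["ACGTGGATTC", "TGCACCGGAA", "ACGTTTTAGC", "GGGGAAAACC"]
assigned = demultiplex(reads, barcodes)
print(assigned)
```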

5
Q

3rd gen sequencing

A

3rd Gen: long-read seq (LRS) (SMRT/ PacBio)- no repeated nt addition rounds, read lengths up to 150kb (limited by Pol)- good for seq assembly. Stochastic single-molecule events-> high error rate on 1 pass. Real-time detection needs continuously acquiring camera; # of parallel reactions limited by pixel #, chip speed. Prep and load long DNA into SMRT cells w/ zero-mode waveguides (ZMW- 1 molecule per well); Pol at bottom of well, add nts+ record seq in real time. Improve by circular consensus seqs- up to 99.9% accuracy from 10 passes of 15kb circular molecules-> primers on both “SMRTbell” copies, single immobilised molecule of DNAPol. Assembly more accurate than Illumina; T2T consortium filled gaps incl. centromeric satellites+ 115 new protein coding genes.
Nanopore: no PCR (removes PCR biases, allows detection of some mods like Me groups). DNA/RNA blocks current while translocating through membrane channel (nt-specific). Long reads (10s of kb, up to 1Mb), high throughput+ speed, low cost, portable, single molecule. Good discrimination of single nts hard-> imperfect coverage, low accuracy (92% 1D, 97% for 2D w/ reverse reading)- can complement Illumina. Improvements: better pore selectivity/ new pore proteins; solid-state inorganic pores (easier to manufacture); more accurate software; real-time reads.
Template annealed to primer (hairpin, prevent elongation from wrong end)+ blocking oligo; DNA binds MspA channel, pulled through slowly by +ve voltage on other side, current changes; blocking oligo forced off; Polymerase pulls DNA back, current trace mirrors that from before (same nts in reverse). Ideally, current size depends on nt at narrowest point of pore, but actually also depends on run of adjacent nts.
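Both circular consensus reads and 2D nanopore reads improve accuracy by reading the same molecule more than once. A minimal sketch of the idea, assuming the passes are already aligned, equal in length and contain substitution errors only. Real consensus callers align the passes and use probabilistic models, so this majority vote only illustrates the principle.

```python
from collections import Counter

def consensus(passes):
    """Per-position majority vote across equal-length, pre-aligned passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

passes = [
    "ACGTTAGC",
    "ACGATAGC",  # error at position 4
    "ACGTTAGC",
    "TCGTTAGC",  # error at position 1
    "ACGTTAGG",  # error at position 8
]
ccs = consensus(passes)
print(ccs)
```

Because single-pass errors are stochastic, they rarely hit the same position in most passes, so the vote recovers the true base.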

6
Q

Identifying and counting genes

A

Computational prediction of, e.g., ORFs- find seq w/out in-frame stop longer than could be by chance+ w/ evolutionary conservation regardless of reading frame. <100aa ORFs until recently ignored by predictions, but may encode functional peptides- can ID by seq conservation (miniproteome). Non-coding RNAs can’t be ID’d by ORF, conserved 2° structures hard to spot.
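A minimal sketch of the ORF-scanning step, assuming the standard stop codons, ATG starts and the forward strand only. The length cutoff `min_codons` is a hypothetical parameter: a real gene finder would use something like the 100-codon threshold mentioned above and also scan the reverse complement.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ATG..stop ORFs on the forward strand."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

dna = "CCATGGCTGCTAAATAGGG"
orfs = find_orfs(dna)
print(orfs, [dna[s:e] for s, e, _ in orfs])
```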
RNA-seq (of cDNA pop) faster and higher throughput, gives more info than just individual clones.
Also need to account for alternative promoters, alt PolyA sites, and alt splice sites.
E.g.: Drosophila Dscam gene- alt splicing-> 38,016 cell surface proteins (homophilic cell adhesion molecule, prevent narcissistic connections in neurons). Many splice variants also in haemocytes (- pathogen recognition role?)
E.g.: over 100M vertebrate antibodies for under 100K genes. Each Ig locus has 300-1000 V “genes” upstream of a few C genes. 2 light chain loci (kappa and lambda), 1 heavy chain locus. Lymphocyte development- DNA recombination-> VDJ fusion. Nts (aas) can be indel’d at V-D or D-J junctions. In light chain, kappa locus has ~1500 variants and lambda ~1800= ~3300 total L chain V regions. H chain has ~36k variants. Any L+H= ~100M Ig molecules. V point mutations during immune response (initial response- antibodies w/ low affinity for foreign antigen. Point mutations-> high affinity antibodies)
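The combinatorial estimate above can be checked directly. This sketch uses only the approximate counts quoted on the card; the variable names are illustrative.

```python
kappa_v_regions = 1_500    # ~kappa light chain variants
lambda_v_regions = 1_800   # ~lambda light chain variants
heavy_v_regions = 36_000   # ~heavy chain variants

light_v_regions = kappa_v_regions + lambda_v_regions
ig_combinations = light_v_regions * heavy_v_regions  # any L can pair with any H

print(light_v_regions, ig_combinations)
```

The product is on the order of 10^8, matching the ~100M figure before junctional indels and somatic point mutations add further diversity.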

7
Q

Applications of sequencing

A

Bioinformatics; combining PacBio with Illumina for de novo genome seq; ID genetic variation.
Cancer cell genomes- ID changes, monitor progression, tumour ecosystem.
Clinical isolates to ID pathogens, trace spread by nt divergence (nanopore helps speed and portability)
Metagenomics- seq ecosystems like soil, gut contents, sea; single-molecule seq-> metagenome- parasites, viruses, bacteria etc- but sensitive to contamination
Archaeological genomics (neanderthals, mammoth, bubonic plague) + Phylogenomics.
RNA-seq- profiling gene exp (starting to replace microarrays)
Population seq (e.g., UK10K consortium) IDs rare variants in health/disease. Africa has highest genetic diversity. European genomes over-represented.

8
Q

Single nucleotide polymorphisms

A

Single nucleotide polymorphisms: alleles present at freq too high to be transient occurrence due to recent mutation (>1%)- recessive lethals may be rarer (eventually eliminated). New alleles w/ selective advantage grow in freq, eventually fix at 100%; alleles at higher freq likely selectively neutral, freq changes due to genetic drift (can lead to loss/fixation). Extensive SNP catalogues from seq of many individuals- UK Biobank seqs reveal nearly 600M SNPs, ~1/5 nts in genome. ~1/300-> aa substitution, most silent, likely neutral.
Mutation rates: Human pedigree seq-> ~1.3/100M substitutions per bp per generation, ~70 new mutations/gen/genome- continual source of variation. Mutation rate/male germline= 4x female germline, increases by ~2 mutations/year of parents’ combined age, mostly father’s age (1.5 vs 0.37 per year).
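A quick arithmetic check of the rates above. The diploid genome size is an assumption (roughly 3 × 10^9 bp per haploid copy), and the function name is hypothetical; the per-bp rate and per-year increments are the card's figures.

```python
MUTATION_RATE = 1.3e-8     # substitutions per bp per generation (from the card)
DIPLOID_GENOME_BP = 6.0e9  # assumption: ~3e9 bp haploid genome, two copies

new_mutations = MUTATION_RATE * DIPLOID_GENOME_BP

def extra_mutations(father_years, mother_years,
                    per_year_father=1.5, per_year_mother=0.37):
    """Additional de novo mutations expected from older parents."""
    return father_years * per_year_father + mother_years * per_year_mother

extra = extra_mutations(10, 10)  # both parents 10 years older
print(new_mutations, extra)
```

With this genome size the estimate comes out slightly above the ~70 quoted, which is the same order of magnitude. Most of the age effect comes from the father.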
Large-scale SNP detection by hybridisation: label target RNA/DNA (radioactive or fluorescent label), apply sample to microarray, hybridise w/ probes immobilised on surface, wash off unbound sample, charge-coupled device (CCD) camera images fluorescence; intensity proportional to # bound targets (same approach used to analyse gene exp levels). Can screen whole genomes for a known SNP.

9
Q

SNP uses: HapMap, cons of using SNPs, GWAS

A

Uses of SNPs/IDing SNPs with elevated/reduced disease risk: classical genetic mapping IDs rare, high-risk disease alleles. Poor at IDing loci w/ small contribution to diseases w/ heterogeneous+ env causes. Many complex conditions-> likely numerous alleles. Using common SNPs maximises info from limited sample. HapMap project-> mother-father-child trios, ID “haplotypes”- regions w/ neighbouring SNPs showing “linkage disequilibrium” (non-random assoc)- not perfectly maintained in all individuals, decays by recombination over generations. Single SNP from haplotype IDs haplotype present in many individuals, so don’t need to genotype every SNP. Potential to detect many SNPs w/ small disease contributions in GWAS.
e.g., SNP in angiogenin gene (2 alleles, A and G). A in 79% ALS patients (573/728) vs 86% non-ALS (515/598). Compare relative ALS risk conferred by gene allele: Odds ratio= (155/573)/(83/515)=1.7. Chi-squared test shows p<0.001- ALS/G association significant. Can make Manhattan plot of multiple SNPs in diseased and healthy sample comparison. Larger sample sizes detect smaller effects. Groups of significant SNPs/genes affecting same pathways offer mechanistic clues in disease.
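The angiogenin example can be reproduced from the counts given (G counts follow as 728 - 573 = 155 and 598 - 515 = 83). This is a sketch assuming a Pearson chi-squared test without continuity correction on the 2×2 table, with the 1-degree-of-freedom p-value computed from the complementary error function. The function names are illustrative.

```python
import math

def odds_ratio(case_risk, case_ref, ctrl_risk, ctrl_ref):
    """Odds of carrying the risk allele in cases relative to controls."""
    return (case_risk / case_ref) / (ctrl_risk / ctrl_ref)

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail for 1 degree of freedom
    return chi2, p

als_A, als_G = 573, 155    # ALS patients
ctrl_A, ctrl_G = 515, 83   # non-ALS controls

or_g = odds_ratio(als_G, als_A, ctrl_G, ctrl_A)
chi2, p = chi_squared_2x2(als_G, als_A, ctrl_G, ctrl_A)
print(round(or_g, 2), round(chi2, 2), p)
```

The odds ratio rounds to 1.7 and the p-value falls below 0.001, consistent with the card.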
SNP analysis cons: if testing multiple SNPs, even p<0.001 may be artefact (e.g., w/ 1M SNPs, 1000 will have p<0.001, false positives)- data from multiple SNPs in same haplotype can help accuracy; artefacts can also be result of non-random sampling, e.g., diseases associated w/ certain ethnic groups w/ ctrl not matched for ethnicity; most “causative” SNPs non-protein coding, only some affect exp of nearby genes; most not useful in prediction/therapy (low odds ratios).
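The multiple-testing point can be made concrete with the card's numbers. The Bonferroni threshold shown is a common correction added here for illustration and is not taken from the card.

```python
n_snps = 1_000_000
alpha_per_test = 0.001

# Expected number of SNPs passing p < 0.001 by chance alone
expected_false_positives = n_snps * alpha_per_test

# Bonferroni correction: divide the desired family-wise error rate by the test count
family_wise_alpha = 0.05
bonferroni_threshold = family_wise_alpha / n_snps

print(expected_false_positives, bonferroni_threshold)
```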
GWAS: known heritability > summed genetic contributions of ID’d SNPs to traits. “Dark matter” genetic contributions not represented as SNPs on chips: copy # variants, epigenetically inherited traits, ever more genes w/ smaller effects (“omnigenic hypothesis”), rare alleles (easier to ID w/ exome seq, which yields proportionately more severe mutations than whole genome, quicker and cheaper; exon seqs purified by hybridisation-> higher coverage of coding seqs).

10
Q

SNP uses: disease alleles, molecular clocks, evolution and issues

A

Disease alleles: average individual hetero for ~150 LoF variants, most neutral. 10-20 of the 150= rare variants, likely to be selected against/ cause disease if homozygous. Homozygotes for rare disease alleles/ compound heteros (2 different non-wt alleles) for LoF alleles helpful to understand disease mech.s. Exome seq of undiagnosed patients+ their families-> 1/3 to ½ have strong candidate mutation (homozygous LoF/rare disease alleles). Phenotypically unbiased seq of populations w/ higher levels of homozygosity, e.g., populations w/ few founders/ high rates of cousin marriages, e.g. Icelandic and Pakistani populations.
Some individuals found homo for severe Mendelian conditions w/ no symptoms- other factors contribute. Some knockouts (homo LoF) w/out harmful effects, e.g., APOC3 LoF individuals have lowered triglyceride lipoproteins in blood- less susceptible to cardiovascular disease? Potential for new therapeutic targets?
Molecular clocks: Seq divergence between species-> orthologues- genes from common ancestral gene, due to species/ phylogenetic divergence (vs paralogues- duplication within species)- divergence rate-> molecular clock- estimate divergence time, build phylogenetic trees. Clock rate varies among proteins and species, naïve interpretation-> misleading trees!
Evolution: some causative changes should show +ve selection: Higher polymorphism/ rate of change for non-synonymous vs silent nt changes. A selective sweep occurs- new mutation selected for, linked polymorphisms “hitchhike” along, mutations and hitchhikers fixed. E.g., immune proteins in host-parasite arms races, olfactory receptors (human-chimp comparisons), adult persistence of lactase, lighter skin colour (recent, Europe+ Asia).
Issues: under-representation of non-European genome seqs (improving), analysis of variation w/out involvement/ intent to benefit of affected groups (ethical issues incl. consent), eugenic assumptions.

11
Q

Structural variants

A

Copy # variants (CNVs)- duplications, deletions. Polyploidisation (or whole genome duplication)- yeast had an event like this, followed by loss of 90% duplicated genes. Unequal crossovers-> tandem gene families: unequal crossover between short repetitive elements, when repeats in same orientation-> 1 deletion+ 1 duplication; recombination between inverted copies-> inversion; unequal crossing over between tandemly duplicated genes increases/decreases (can remove functional copy, often deleterious) copy number; crossover within a gene swaps N and C termini of affected gene.
Some originally ID’d as phenotypes- thalassemias (globin gene unequal crossovers), colourblindness (red/green unequal crossovers), Prader-Willi syndrome (Chr 15 deletion-> intellectual disability). Seq individuals-> 100s of structural changes relative to reference seq, ~90% of which are indels.

12
Q

Microarrays for structural differences. Evolutionary consequences

A

Microarrays- genome screens of more individuals, kb res- sample+ ctrl labelled in different colours, repeats blocked, samples hybridised to arrays (colours of ctrl and sample switched on a 2nd array as ctrl measure). Detect and quantify signals to produce mirror image from 2 flipped-colour arrays (if non-symmetrical, signal spurious). Duplications show up as peaks in sample colour, deletions-> peaks in reference colour. Otherwise flat.
CNV regions cover ~12% of genome, several tens of CNVs per individual, ~1500 hetero and ~2 homo deletions/individual, most not associated w/ phenotype. Counting relative freq. of reads along genome in massively parallel seq can ID CNVs; local read freq in short-read (Illumina) seq complements long-read, improves resolution, detects more variants. CNVs affect more of genome than SNPs.
Evolutionary consequences:
* Increased gene number (paralogous genes)-> redundancy (allows one to mutate-> pseudogene, non-functional), new/ specialised functions increasing complexity, or increased gene dosage
* Multigene families- ~1000 protein families duplicated and shuffled-> modern proteins, allowing extra complexity, new architectures and functions. Phylogenetic studies show 2/3 of orphan genes (no detectable relatives- high divergence or new gene birth) are new.
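Both the two-colour array and read-depth approaches above come down to comparing sample signal with control signal along the genome. A minimal read-depth sketch, assuming reads have already been counted in fixed windows with non-zero counts. The window counts and the log2 threshold are hypothetical.

```python
import math

def call_cnvs(sample_counts, control_counts, threshold=0.4):
    """Label each window by the log2 ratio of depth-normalised read counts."""
    s_total, c_total = sum(sample_counts), sum(control_counts)
    calls = []
    for s, c in zip(sample_counts, control_counts):
        ratio = math.log2((s / s_total) / (c / c_total))
        if ratio > threshold:
            calls.append("dup")   # peak in the sample signal
        elif ratio < -threshold:
            calls.append("del")   # peak in the reference signal
        else:
            calls.append("flat")
    return calls

control = [100, 100, 100, 100, 100, 100]
sample = [100, 100, 150, 100, 50, 100]  # 3 copies in window 3, 1 copy in window 5
calls = call_cnvs(sample, control)
print(calls)
```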

13
Q

VNTRs

A

VNTRs (variable # tandem repeats)- satellite seqs= tandem repeats, generally heterochromatic, condensed, most in centromeric regions, fewer in telomeric. Repeat size from few to 100s bp, copy # 10K to 1M, 10-40% of genome. Divergence rate typical of non-coding DNA, implies no function depending on seq. Drosophila w/ large satellite deletions survive-> not much needed. Mammalian centromeres= long repeat arrays (unlike yeast)- if 1 deleted, array w/ different seq can substitute. Unequal crossing over shows no bias to gain/loss. Replication slippage (looping out of daughter strand where unpaired end of daughter is in an Okazaki fragment (favoured if 5’ end of Okazaki contains seq that can form non-WC 2° structure more stable than pairing of entire fragment to template)) could give either gain/ loss of repeats; in vivo bias to gain perhaps due to stable 5’ flaps-> hard to remove from Okazaki fragments. Most euk genomes tolerate long repeat arrays, but selection at certain length reduces average repeat #. Pathogenic microsatellites: Huntington’s CAG repeats encoding polyQ stretch in protein, >40-> disease. Most microsatellites in non-coding regions, but excess expansion-> heterochromatic DNA, can inactivate nearby genes.
PCR analysis of VNTRs: high degree of polymorphism and instability in repeat #, w/ bias to increased # over generations, small seq divergence between repeats (minisatellites)- expected since homogenisation of array not instantaneous.

Class           Repeat unit (bp)   Repeats per array   Location
Satellite       2-100              >1000               Centromere, heterochromatin
Minisatellite   9-100              10-100              Subtelomere, dispersed
Microsatellite  1-5                10-100              Dispersed
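Repeat number at a microsatellite such as the Huntington’s CAG tract can be counted directly from sequence. A minimal sketch, assuming perfect uninterrupted repeats; the example allele is made up, and the >40 cutoff is the one quoted on the card.

```python
import re

def longest_repeat_run(seq, unit):
    """Largest number of back-to-back copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)

allele = "GCC" + "CAG" * 42 + "CCGCCA"  # hypothetical expanded allele
n_cag = longest_repeat_run(allele, "CAG")
expanded = n_cag > 40  # disease threshold quoted on the card

print(n_cag, expanded)
```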

14
Q

Transposons and transposition model

A

Originally described by McClintock: “controlling elements” @ chromosome breakage sites. Influence genome size/evolution. Transposition either DNA-based or via RNA intermediate+ reverse transcriptase.
DNA-based transposons: full length usually encode transposase, inverted repeats either side. Insertion generates small duplication of target site @ each end. Cut-paste (non-replicative) or copy-paste (replicative).
Precise excision of P only if homologous chromosome without P at that location, otherwise either no excision (replicative) or imprecise, often leaving a P behind.
Transposition model: P excises, ds gap, DNA can insert elsewhere. If ds gap repaired w/ homologous chromosome (no P), result equiv. to precise excision. If repair w/ sister chromatid (P inserted @ same position as excised), appear replicative. “Sloppy” repair-> imprecise excision, e.g., internal deletions.

15
Q

Identifying transposons

A

IDing transposons: by genetic effects (chromosome breakage caused by Ac in maize, hybrid dysgenesis caused by P in Drosophila). Kidwell-> males of wild-caught Drosophila (“P strains”) x lab-raised females (M strain) gave partially sterile F1- germ-cell degeneration. F2 had many new mutations, often assoc w/ chr breakage. Hybrid dysgenesis due to “P factors” present but repressed in P strains. M strains don’t have P factors, can’t repress. If P repressor activity maternally exp in egg, explains why only M female x P male cross (of the 4 possible M/P male/female combos) produces dysgenic progeny. Dysgenesis seen in germline, but not soma (germline specificity) due to tissue (germline)-specific splicing of intron 3; in soma, stop codon in intron 3-> truncated transposase. If P element lacks intron 3 (engineered+ reintroduced), active in all cells.
For mutation in known gene, clone mutant allele w/ insertion. P strains have 30-50 P copies, but many internally deleted. Many deletions retain P ends but can’t encode transposase- must be supplied in trans from complete P.
Some cytotype repression by dominant negative effect of truncated transposases from P deletions, or piRNAs that include P seq- heritable through female germline due to RNA silencing, chromatin repression.

16
Q

Evolution of mobile DNA

A

All wild fly strains collected before 1950 M, now all carry P. Distribution+ phylogeny of P-like elements across drosophila species don’t match drosophila phylogeny, but consistent with horizontal interspecies transfer. ~2006, P elements start spreading through D simulans, hybrid dysgenesis less severe in populations where P longer established. P-> deleterious mutations, reduce reproductive fitness, drive lab pops to extinction- similar to host-parasite relationship: may limit effect to germline, reducing/postponing harm; spread through individual genomes and to progeny; can spread to all flies in sexually reproducing pop; eventually die out (transpose too often, killing host, or not often enough, losing function by selection against them and random inactivating mutations); escape routes via horizontal transfer/ becoming advantageous to host (symbiosis).
Hosts generate innate immune responses:
* RNAi (higher transposition rates in C elegans lacking RNAi machinery components)
* piRNAs recognising transposons (share mechanisms w/RNAi)
* 100s of KRAB Zn finger proteins in vertebrate genomes- bind specific seqs, recruit H3K9me chromatin repressor complex.
Symbionts:
* Telomerase- telomeres of many insects+ plants atypical, contain multiple LINE element copies, transposition maintains telomere length.
* Bacterial transposons encoding antibiotic resistance. (are plasmids parasites or symbionts?)
* Strong circumstantial evidence: transposase-derived genes conserved (e.g., 58 in humans):
Arc derived from retrovirus RNA-binding gag, repurposed-> transcellular synaptic transport of extracellular vesicles. Syncytins catalyse cell fusion in mammalian placentas, origin- retroviral env genes driving cell fusion during viral infection.
Immunoglobulin V(D)J recombination: excision between V/D/J resembles P element excision; RAG1 endonuclease in same superfamily as P, other transposases. Likely mobile ancestor integrated in ancestral immunoglobulin gene-> advantage in antibody variety. Recently active ProtoRAG transposon using RAG1/2 found in lancelet (invertebrate chordate).
Mobility in humans: ~7 classes of DNA-based transposon seqs; Within each, seqs v. divergent- therefore DNA-based transposons in humans appear as inactive fossils, lacking essential seqs (almost all human retroviral-like repeats, some exceptions)+ no recent copies.

17
Q

Retroelements

A

Retroelements (retroviral-like, RNA-mediated transposition)- can’t catalyse own excision.
* LINEs (long interspersed nuclear elements)- Drosophila I element can cause hybrid dysgenesis. In humans, most common L1- 17% of genome, ~500K copies.
* SINEs- most common in human=Alu, 10% genome, ~1M copies. Not protein-coding, resemble RNAPIII transcripts. Not autonomous (Piggyback w/ reverse transcriptase from L1)
Natural mutagenesis by endogenous retrotransposons- few mobile L/SINEs w/ near identical copies in different locations-> 10s of human diseases (some de novo insertions), 7380 polymorphic insertions found in 180 genomes (recent evolutionary history)- most in non-coding regions, probably will never be frequent enough to detect.
Self-splicing Introns (Genes XII)- introns of euk protein-coding nuclear genes spliced by spliceosome, incl snRNPs (these introns don’t usually encode protein), other intron classes in bacteria, organelles. Some cases, catalyse own splicing, sometimes w/ intron-encoded protein (IEP). Group II self-splicing introns in some plastids/bacteria, similarities to spliceosomal introns-> suggests common splicing mech+ evolutionary origin. Both excised as lariats, have similar active site 2ndary structures (some functionally homologous), U5 snRNP can substitute loss of homologous stem-loop structure of group II intron in trans. In organelles, can transpose to same location that lacks intron, catalysed by intron-encoded proteins (these introns are transposable elements but non-mutagenic as spliced out of final transcript).

18
Q

Engineering genomes: P element mediated transformation, genome-wide insertional analysis, cloning genes

A

P-element mediated transformation: flies (retroviruses for mammalian cells)- inject vector into precellular fly embryos, transposase made by co-injected “wings-clipped” vector, allows stable integration. Seqs between ends of vector immobilised. Use marker, e.g., white+ gene in vector for white- flies to select.
Genome-wide insertional mutagenesis: ~10% P insertions homo lethal, only 1/3 fly genes essential-> ~30% inserts probably cause mutation (10%/ (1/3)), by inserting in coding region (hybrid/truncated protein), in intron (interfere w/ production of full-length transcript or splicing pattern), in 5’ UTR (truncated transcript), in promoter/enhancer, or where it blocks effect of distant enhancer seq. Prefer to insert @ transcriptional ctrl seqs, perhaps bc chromatin accessible- create hot/coldspots. Use of P elements as mutagens easier using stable genomic sources of transposase. Collections of insertions= tools for genetic analysis.
Cloning gene: P insertion= molecular tag in gene, makes mutant easier to clone by inverse PCR, which recovers flanking seqs and so IDs boundary of insertion: restriction digest (fragment contains known P seq+ unknown flank), ligate to circularise, cut in known seq-> end up w/ flanking seq in the middle, known seq either side to make primers, then amplify now-central flanking seq to characterise/seq.
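The rearrangement behind inverse PCR can be sketched with plain strings. This toy model assumes a restriction fragment with the known sequence in the middle and unknown flanks on both sides; lower case marks unknown flank, upper case the known sequence, and all names are hypothetical.

```python
def inverse_pcr_product(left_flank, known, right_flank, cut_at):
    """Circularise left_flank+known+right_flank, then reopen inside `known`."""
    if not 0 < cut_at < len(known):
        raise ValueError("cut_at must fall inside the known sequence")
    circle = left_flank + known + right_flank  # fragment ends joined by ligation
    cut = len(left_flank) + cut_at             # cut site within the known seq
    return circle[cut:] + circle[:cut]         # linear molecule after reopening

product = inverse_pcr_product("aaccgg", "PPPPQQQQ", "ttggcc", cut_at=4)
print(product)
```

After reopening, the two halves of the known sequence sit at the ends and the flanks are joined in the middle, so primers designed from the known sequence amplify across the unknown flanking DNA.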
In drosophila, >60% genes have insertion in/near. Use of 3 transposons w/ different non-random site preferences improves coverage.

19
Q

P element secondary mobilisation and insertional mutagenesis in mouse embryonic stem cells

A

Secondary mobilisation: closest P near but not in gene. Can cross transposase gene to catalyse excision, a few % excisions imprecise, also remove flanking DNA, cause deletion. Alt, mobilise nearby P to recover new insertions- many reinsert within a few kb of original insertion
Other organisms: P only work in flies. Other transposons don’t require host proteins to transpose, work widely. PiggyBac (from moths) and SleepingBeauty (fossil element in fish, now used for some gene therapy, e.g., introduce genes in purified blood cancer cells) work in mammalian germline. Mouse mutagenesis logistically harder.
Insertional mutagenesis in mouse embryonic stem cells-> chimeric adults (see BMB transgenic mice). Random insertional mutagenesis w/ retroviruses, “gene trapping” (like enhancer-trapping, but using vector w/ reporter gene, integrates, transcribed as fusion to endogenous promoter. E.g., LacZ integrated randomly, expressed in same pattern as disrupted gene), targeted knockout. Excess/ random integration eliminated by selecting against thymidine kinase (tk) w/ ganciclovir.
General strategy for targeting mice: ES cultivated from pre-implantation embryos, targeting vector w/ DNA homologous to target gene+ inserted DNA allowing selection, homologous recombination, proliferation of ES.

20
Q

Targeted genome manipulation by ds breaks and NHEJ

A

Targeted manipulation via ds breaks: repair by non-homologous end-joining (NHEJ)-> local mutations, small deletions.
* CRISPR/Cas9 generates targeted ds breaks, Cas9 nuclease can also be replaced by, e.g., transcriptional activator/repressor, allow targeted gene exp instead of mutation- efficient-> easy to screen for change by PCR/ inserting marker (GFP), easy to make targeting constructs by cloning synthetic oligo into gRNA-containing vector (see BMB for CRISPR mechanism).
* Can generate 2 alleles at different sites in same gene- where crosses possible, hetero individuals will show relevant recessive phenotype but not recessive phenotypes from off-target mutants-> exclude potential off-target effects.
* Ds breaks can also be repaired by homologous recombination from custom designed targeting construct to edit endogenous gene (e.g., GFP fusion)
* Base editing: fuse Cas9 nickase to one of 2 base editing enzymes, no ds break needed but only limited changes
* Prime editing: longer gRNA w/ 3’ extension to anneal unpaired strand, including nt change. gRNA extension= template for reverse transcriptase fused to Cas9
* Other anti-phage immunity systems like retrons
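Designing the synthetic oligo for a gRNA starts with finding candidate target sites. A minimal sketch, assuming the commonly used S. pyogenes Cas9 convention of a 20-nt protospacer followed by an NGG PAM, scanning the forward strand only. The example sequence is made up, and real design tools also score off-target matches.

```python
import re

def find_cas9_sites(seq, guide_len=20):
    """Forward-strand protospacers followed by an NGG PAM (SpCas9 convention)."""
    # Lookahead so that overlapping candidate sites are all reported
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq.upper())]

target = "TTGACCTGAAGCTTGGATCCATCGTAGGCTAAT"  # hypothetical target region
sites = find_cas9_sites(target)
print(sites)
```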

21
Q

Risks of gene manipulation and genome-wide studies

A

Risks of gene manipulation: “gene drive” could spread mutation-> every copy in population (mutagenic chain reaction, method for converting hetero to homo mutations). Human ES cells amenable to CRISPR-> ethics of engineering heritable changes
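Why a gene drive can reach every copy in a population follows from a simple recurrence. This sketch assumes random mating, an infinite population, no fitness cost and no resistance alleles. If heterozygotes convert the wild-type copy with probability c, the drive allele frequency changes as p' = p + c·p(1 − p). The function and parameter names are illustrative.

```python
def drive_frequency(p0, generations, conversion=1.0):
    """Drive allele frequency per generation; conversion=0 is Mendelian."""
    p = p0
    history = [p]
    for _ in range(generations):
        p = p + conversion * p * (1 - p)  # heterozygotes converted to homozygotes
        history.append(p)
    return history

mendelian = drive_frequency(0.01, 10, conversion=0.0)
driven = drive_frequency(0.01, 10, conversion=1.0)
print(mendelian[-1], driven[-1])
```

Under these assumptions a neutral Mendelian allele stays at 1%, while a fully efficient drive starting at 1% is close to fixation within about 10 generations.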

Genome-wide approaches aim to ID all molecules involved in a cellular process+ understand their roles. After microscopy-> general understanding, approach:
* Biochemically: purify proteins/ molecules co-localising w/ process of interest. ID function w/ assays, e.g., replication of DNA when crude nuclear extract added, fusion of purified membrane compartments in vitro when cytosolic extract added, change in developmental cell fate when cleavage cell injected w/ cDNA library; fractionate extract that can catalyse process of interest. Clone gene, seq, ID homologies, express for biochemical characterisation, FISH for cellular location, use reverse genetics to ID mutants lacking protein, study mutant/transgenic phenotypes.
* Genetics: screen for mutants (see yeast section). Clock mutants for embryonic patterning in flies.

22
Q

Drosophila and P elements

A

More on P (class II) transposons in drosophila as an experimental tool
Ease of crossing- transposon carrying a dominant marker (white+; Cy (Curly), usually on chr 2; Dr (Drop- reduced eye size), usually on chr 3) easy to follow in recessive populations.
Enhancer trapping: reporter gene encoding DNA-binding domain of transcription factor (e.g., yeast GAL4) fused to minimal promoter. Significant transcription only when element inserts near a cell-type-specific enhancer, so different insertion sites of “enhancer trap” allow labelling of many cell types by GAL4 exp.