Sequencing and Bioinformatics Flashcards

1
Q

dNTP vs ddNTP?
What do these abbreviations stand for?
Similarities and Differences?
Nickname for ddNTP?

A

Both can be incorporated into a newly synthesised DNA strand
Deoxynucleoside triphosphate (dNTP) - the hydroxyl (OH) group on the 3’ C allows chain extension
Dideoxynucleoside triphosphate (ddNTP) - the 3’ OH group is missing, preventing chain extension; nicknamed a terminator nucleotide

2
Q

Explain Sanger Dideoxy sequencing
It is an example of ‘sequencing by _______’?

A

Sequencing by Synthesis

Template DNA is denatured, and a new strand is synthesised from a mixture of dNTPs and ddNTPs
The new DNA strand extends from a primer of known sequence
Each time a nucleotide is added to the chain, there is a chance that it will be a terminator nucleotide (ddNTP)
If this occurs, no more bases can be added
The products are then separated by size on a gel to read off the sequence
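
The chain-termination idea can be illustrated with a toy simulation. This is only a sketch: the sequence, the number of molecules and the termination probability are made-up values, and a real reaction involves a polymerase copying a template rather than a loop over a string.

```python
import random

def sanger_fragments(new_strand, n_molecules=2000, p_terminate=0.05, seed=0):
    """Simulate chain termination: every extension step has a chance of adding a ddNTP."""
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < p_terminate:
                # A ddNTP was incorporated: record fragment length and its terminal base
                fragments.add((length, base))
                break
    return fragments

def read_gel(fragments):
    """Order fragments by length (as on a gel) and read off the terminal bases."""
    return "".join(base for _, base in sorted(fragments))

new_strand = "ATGCGTACGTTAGC"  # hypothetical sequence of the newly synthesised strand
print(read_gel(sanger_fragments(new_strand)))
```

With enough molecules, every position ends up represented by at least one truncated fragment, which is why sorting by length recovers the sequence.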

3
Q

Name of sequence once a ddNTP is added

A

Truncated sequence

4
Q

Alternate ways of carrying out Sanger sequencing?

A

Running 4 separate reactions, each with a different ddNTP (ddATP, ddGTP, ddCTP or ddTTP)

5
Q

Explain Dye-Terminator Sequencing
What are the benefits?

A

Uses fluorescently labelled ddNTPs
- This allows all the products to be run in the same lane in capillary gel electrophoresis
- Each base is identified by the colour/wavelength of its fluorescent tag

The process can be automated and scaled up, industrialising sequencing so that large amounts of DNA can be sequenced

6
Q

How many bp can the ABI 370 sequencer read at a time?

A

800bp

7
Q

Problems with sequencing human genome?

A

3 billion base pairs; Takes a long time if reading 800bp at a time
Piecing together the genome is also a challenge
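
A rough sense of the scale can be had from simple arithmetic. The sketch below uses the figures from this card; the 8x coverage value is an illustrative assumption, not a number from the notes.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate human genome size from the card
READ_LENGTH_BP = 800            # read length from the card

def reads_needed(genome_size, read_length, coverage=1):
    """Minimum number of reads to cover the genome `coverage` times (ceiling division)."""
    return -(-genome_size * coverage // read_length)

print(reads_needed(GENOME_SIZE_BP, READ_LENGTH_BP))              # single pass over the genome
print(reads_needed(GENOME_SIZE_BP, READ_LENGTH_BP, coverage=8))  # hypothetical 8x coverage
```

Even a single pass needs millions of reads, and in practice each base has to be covered several times so that overlapping reads can be pieced together.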

8
Q

What process did the Human Genome Project (IHGSC) strategy use? (hint: BACs)

A
  1. Extract human genomic DNA from multiple people
  2. Fragment the DNA into pieces small enough to be cloned
  3. Size select fragments of 100-200kb
  4. Clone the fragments into Bacterial Artificial Chromosomes (BACs)
9
Q

What are BACs and what do they do?

A

They are plasmids containing sequence elements that trick E. coli into replicating/copying them during the cell cycle (as if they were a native plasmid)

10
Q

Explain how BAC cloning is performed

A

BAC cloning amplifies DNA

BACs are transformed into E. coli cells, which grow into colonies
Each colony contains millions of copies of the same fragment

11
Q

What are genetic and physical maps used for in the sequencing of the genome?

A

They establish a dense set of genetic markers across the human genome, against which clones can be positioned

12
Q

What do genetic maps rely on?
What is the relationship between distance between markers and recombination frequency?

A

They rely on recombination frequency between markers
The further apart markers are from one another, the higher the recombination frequency

13
Q

How do they link BAC clones to the genetic and physical maps?

A

Clones are tested for PCR markers that have known locations on genetic and physical maps

14
Q

How are overlapping clones identified?
What is the likely origin for such clones?

A

The end of the insert in the BAC containing the marker is sequenced
This end sequence is used to design PCR primers, which are used to screen for other BAC clones containing that end sequence
Such BAC clones are likely to come from a neighbouring genetic locus

15
Q

How are BAC clones organised?
Which types of BAC clones are identified?

A

They are ordered relative to the genetic map, with PCR primers
- These PCR primers are complementary to the sequence of a particular marker, allowing us to test and identify BAC clones that overlap the genetic marker
- Once we know that a BAC clone overlaps a marker, we know roughly where to place it on the genetic map

16
Q

How is sequence data from the end of the insert obtained?
What is done with this data?

A

Sequence from the vector backbone into the BAC insert
This sequence data is used to design a new set of primers

17
Q

How are these primers used with the BAC library?
Where are the clones identified likely to be?

A

They are used to screen the BAC library for clones which contain the end sequence but not the original genetic marker
These are likely to be derived from a genomic region adjacent to the source of the original BAC clone

18
Q

How is this BAC insert sequencing process repeated?
What can be done in the other direction?

A

The end of the second BAC clone insert is sequenced, allowing for another set of primers to be designed to identify a third BAC clone with an insert that overlaps the second
The same thing can be done at the other end of the original clone which overlaps the marker, to obtain overlapping BAC clones in the other direction

19
Q

What is this whole approach of identifying and ordering overlapping BAC inserts called?

A

Chromosome walking
Allows a library of clones to be built up in the correct order relative to the genetic map

20
Q

How does shotgun sequencing work?

A

Many fragments are sequenced at random and then assembled

21
Q

Size of BAC inserts used in shotgun sequencing and how are the BAC clones generated?

A

BAC clone DNA is fragmented into smaller 5-10kb fragments and cloned into plasmid vectors

22
Q

How are primers used in shotgun sequencing to derive plasmid insert sequence?

A

The sequence of the plasmid vector is known, so primers can be designed against it to sequence into the insert
This is done many times to derive a consensus insert sequence

23
Q

What did Celera want to do with the human genome data?

A

Patent and commercialise it

24
Q

What was the sequencing approach Celera used?
How is this different to IHGSC?

A

They used a shotgun Whole Genome Sequencing (WGS) approach
Instead of creating BAC clones, Celera fragmented the whole genome directly into much smaller fragments of 2-50kb

25
Q

What did Celera demonstrate with shotgun sequencing?

A

That shotgun sequencing was feasible for even large and repetitive genomes

26
Q

Problems with Sanger sequencing on an industrial scale?
How did IHGSC get around this?

A

Doesn’t scale well, as each capillary can only produce 1 sequence read at a time
The IHGSC had ‘factories’ with hundreds of sequencers

27
Q

What happened to the cost of sequencing in 2007 and why?

A

The price of sequencing a human genome dropped drastically from 2007 onwards due to Next-Generation Sequencing

28
Q

What is massively parallel sequencing and how does it improve on Sanger sequencing?
What became the dominant platform for technologies like this?

A

Lots of molecules are sequenced at the same time as opposed to one; Reduced costs
Illumina is the main platform for this technology

29
Q

How many sequence reads does 1 HiSeq run produce?
Length of sequencing reads?

A

≈8 billion sequence reads
≈100bp sequence reads

30
Q

Illumina uses sequencing by _______?
What is used to detect each base?

A

Sequencing by synthesis
Fluorescently labelled nucleotides are used to detect each base

31
Q

Why and how are the molecules amplified in Illumina sequencing?

A

Optical sensors are not sensitive enough to detect the signal from a single template molecule
PCR amplification of template molecules is done via a process called bridge amplification

32
Q

First step of Illumina sequencing?

A

Fragment DNA and size select fragments of ≈500bp

33
Q

Refer to cluster generation process images and explain the process

A
34
Q

Outcomes of cluster generation?
Overlapping?

A

Generate millions of separate clusters, each with sequence data from a different region of the genome
Clusters are large enough to be detected when they fluoresce
Some clusters overlap, potentially resulting in the loss of a few reads; Doesn’t matter as there are so many clusters

35
Q

What is used in Illumina’s cluster sequencing by synthesis process? (hint: type of terminator nucleotide)

A

Reversible terminator nucleotides; Block chain extension, but the block and dye can be removed

Once the block is removed it acts like a dNTP

36
Q

Refer to cluster sequencing process and explain the process
Also explain how it is repeated on different clusters to generate the second read?

A
37
Q

Size of fragments selected in Illumina library preparation?

A

500bp

38
Q

How far apart are read pairs when doing 2 rounds of sequencing clusters?

A

500bp

39
Q

Advantages of 3rd Gen sequencing over Illumina? (4 advantages)

A

Single molecule sequencing – No amplification required
Real time sequencing – Data is generated during the run
Ultra-long read lengths - Up to 50kb (PacBio) or >2Mb (nanopore)
Can directly identify base modifications such as methylation

40
Q

Disadvantages of 3rd Gen sequencing compared with Illumina?

A

Fewer reads per run than Illumina
More expensive per base
Individual reads have a high error rate, which is inevitable when sequencing a single molecule at a time (although consensus accuracy is good)

41
Q

What are ZMWs in PacBio SMRT cells?
How many?

A

Zero-mode waveguides (ZMWs) are tiny wells in the aluminium surface of the SMRT cell
150,000 wells

42
Q

What is the ds-template DNA bound to and where?

A

DNA is bound to a DNA polymerase and a sequencing primer
Polymerase is immobilised at the bottom of a well; 1 polymerase per well

43
Q

What does ZMW do to light used to excite the fluorescently labelled nucleotides?
What does this allow for?

A

Only allows the light to penetrate a small distance into the well, exciting a very small volume
This allows the signal from a single fluorescently labelled nucleotide to be detected

44
Q

How is the DNA sequenced through this process? (ZMWs)

A

When a fluorescent nucleotide is bound by the polymerase, it remains within the illuminated zone and gives a detectable signal (unbound nucleotides diffuse in and out quickly, so do not give a consistent signal)
The label is then cleaved away and the next nucleotide is incorporated and fluoresces in turn, revealing the sequence

45
Q

What are SMRT-bell adapters?

A

Hairpin loops on either end of a ds-template DNA, which connects them to make a single continuous loop of DNA

46
Q

Refer to the images of SMRT-bell adapters and explain how primers are involved in the process

A
47
Q

Why is it better to use SMRT-bell adapters with a small insert rather than a larger one?
What can be done with smaller insert reads? (hint CCS and CLR)

A

Reads from larger inserts are of poorer quality, with low accuracy

Smaller inserts are sequenced several times as the polymerase goes round the loop
These passes can be combined to reduce the error rate, giving circular consensus sequences (CCS)
CCS reads can be used to correct the long but lower quality reads, giving corrected long reads (CLR)
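
The way repeated passes reduce the error rate can be sketched as a per-position majority vote. This is a simplification: the passes below are made-up, already aligned and of equal length, whereas real consensus calling has to align the passes and handle insertions and deletions.

```python
from collections import Counter

def consensus(passes):
    """Majority vote at each position across equal-length passes of the same insert."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this sketch assumes all passes are aligned to equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

# Hypothetical passes over one insert; three of them carry a single-base error
passes = ["ACGTTGCA", "ACGTAGCA", "ACCTAGCA", "ACGTAGGA", "ACGTAGCA"]
print(consensus(passes))
```

Because errors in single-molecule reads fall at different positions in each pass, the correct base usually wins the vote once there are enough passes.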

48
Q

How long are CLRs typically?
Accuracy?

A

15kb
>99.999% accuracy

49
Q

Advantages of Oxford Nanopore?

A

Read lengths of up to 100kb
Error rates of ≈1%
Direct sequencing of DNA, RNA and protein
No library prep; Sequence directly from biological samples
Small and portable models

50
Q

How does Oxford Nanopore work?

A

Pore proteins are embedded within an electrically insulating artificial membrane
A motor protein feeds a single strand of DNA through the pore, changing the electrical current flowing through the pore

51
Q

How many bases are sequenced at once in Oxford Nanopore?

A

Several bases can pass through the pore at once
Short sequences (e.g. 5 bases) have their own characteristic signal

52
Q

What is done in Whole Genome Shotgun Sequencing?
How is assembly done?
What is this known as? (de novo ____)

A

Genomic DNA is fragmented and paired reads are obtained from either end of each fragment
The original chromosome sequences are computationally reconstructed; This is called de novo assembly
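
A toy version of de novo assembly is greedy overlap merging. The sketch below is illustrative only: the reads are invented and error-free, and real assemblers use more robust approaches, partly because repeats mislead a simple greedy merge.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a in reads:
            for b in reads:
                if a != b:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_a, best_b = n, a, b
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])
    return reads

reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "GTTAGCC"]  # hypothetical overlapping reads
print(greedy_assemble(reads))
```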

53
Q

What is read mapping?
Repetitive regions?

A

Computationally determining the most likely position that each sequence read derives from
Repetitive regions of the genome are a problem for mapping
Once the reads are mapped to the reference genome, positions that differ from the reference can be identified, a process called “variant detection”

54
Q

Why would you want to resequence a genome? (4 reasons)

A

Individuals of a species are not all identical; Resequencing allows us to understand genetic variation within a population

For human populations, this is of particular interest for studying single-gene and complex genetic disorders

Cancers are effectively evolving organisms which are genetically different from the patient; Sequencing allows us to understand the genetic changes which occur as the cancer progresses

Functional genomics technologies (which identify the functions of genomes) such as RNA-seq and ChIP-seq involve resequencing

55
Q

What is the traditional method of assessing gene expression?
Explain this method?
Size of band meaning?

A

Northern blotting
Radiolabelled probes are used to detect the presence of a particular transcript within a whole cell RNA extract
The level of expression can be assessed (semi)quantitatively from the size (intensity) of the band

56
Q

What is RT-(q)PCR?
Explain this method

A

Reverse transcription quantitative PCR
It uses reverse transcriptase to make cDNA from transcripts

The cDNA corresponding to a transcript can be PCR amplified; Fluorescent primers allow the transcript level to be quantified relative to a reference gene

57
Q

What is a microarray?
Differences to northern blot?

A

A glass slide onto which spots are attached, each consisting of many copies of a probe sequence
Unlike a northern blot, it works in parallel; thousands of probes are tested at a time

58
Q

How is a microarray assessed?

A

A microarray scanner detects the average intensity of each spot on the microarray, which is used as a measure of the transcript level associated with each gene

59
Q

What are technical and biological replicates?

A

Technical replicates involve assessing the same biological sample on multiple microarrays
Biological replication requires us to repeat the entire experiment independently

60
Q

Typical relationship between biological and technical variation?
What does this mean for biological replicates?

A

Typically biological variation is larger than technical variation, so it is usually appropriate to perform multiple biological replicates

61
Q

Why is it not possible to have as many replicates for microarrays and RNA-seq?
How do we get around this?

A

They are expensive

By looking at many genes in parallel to estimate the “normal” level of variation between biological replicates; Allows us to identify genes where the difference in expression is greater than would be expected by chance

62
Q

What do microarrays look for?
How is this expressed (calculation)?

A

Differential expression between the experimental sample and control
Expressed as ‘fold change’: level of expression in the experimental sample / level of expression in the control

63
Q

What does logFC mean?

A

Log2 of fold change calculation

64
Q

Explain logFC =
- 0
- +1
- -1

A

0 = Expression remains at the same level in the experimental
+1 = Expression doubles in the experimental
-1 = Expression halves in the experimental
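
The three cases above can be checked with a short calculation. The expression values are arbitrary illustrative numbers.

```python
import math

def log_fc(experimental, control):
    """log2 fold change: log2(experimental / control)."""
    if experimental <= 0 or control <= 0:
        raise ValueError("expression levels must be positive in this sketch")
    return math.log2(experimental / control)

# Hypothetical (experimental, control) expression levels
for exp_level, ctrl_level in [(100, 100), (200, 100), (50, 100)]:
    print(exp_level, ctrl_level, log_fc(exp_level, ctrl_level))
```

Using log2 makes increases and decreases symmetric: a doubling and a halving are the same distance from 0.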

65
Q

Limitations of microarray (4 limitations)

A

Microarrays are a low-resolution sequencing technology
- If we get a signal for a particular probe, we know that the sequence is present in our sample; However, we usually don’t know if that is the exact sequence that is present

We also don’t know if there are any sequences present which are not covered by our microarray probes

There is a limit to how much RNA can hybridise to a particular spot on the microarray; Can limit our ability to distinguish the expression levels of highly expressed genes

66
Q

Process of RNA sequencing (RNA-seq)

A

Fragment input RNA
Reverse transcribe it to cDNA
Attach adaptor molecules to it
Sequence them to produce many sequence reads

67
Q

How is mapping of RNA-seq reads used to measure expression levels?

A

The reads are mapped to a reference genome
The number of reads mapping to each gene is used as a measure of the expression levels of each gene
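
Counting reads per gene can be sketched as follows. The gene coordinates and read positions are invented, and each read is reduced to a single mapped position; real pipelines also have to deal with spliced reads, overlapping genes, and normalisation for gene length and sequencing depth.

```python
from collections import Counter

# Hypothetical gene annotation: name -> (chromosome, start, end), 1-based inclusive
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 800, 1200)}

# Hypothetical mapped reads: (chromosome, leftmost mapped position)
mapped_reads = [("chr1", 120), ("chr1", 450), ("chr1", 900), ("chr1", 650), ("chr1", 130)]

def count_reads(genes, reads):
    """Count how many reads fall inside each annotated gene."""
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in reads:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts

print(dict(count_reads(genes, mapped_reads)))
```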

68
Q

Why is RNA splicing a challenge for RNA-seq data analysis?
Why?

A

mRNA sequence does not exactly correspond to the sequence of the reference genome

Processed mRNAs consist of adjacent exon sequences, but the exons are separated by introns in the genome sequence

69
Q

What happens to reads that overlap exon junctions? (hint: SAA)

A

They are split during mapping using a splice aware aligner (see image on notes)

70
Q

What happens in the absence of a reference genome in RNA-seq assembly?
Key features of this process?

A

De novo assembly of transcripts
- Not all transcripts are present at the same level
- Same gene may produce multiple different transcripts
- Assembled transcripts can be annotated in a similar way to genome sequences

71
Q

RNA-seq vs Microarrays
Which has a larger dynamic range (ability to distinguish different levels of expression)?
Which gathers information for pre-selected regions, and which is genome-wide?
Which allows us to detect differences from the reference genome?
Which can be done without a reference genome?

A

RNA-seq has a larger dynamic range than microarrays (greater ability to distinguish different levels of expression)

Microarrays only give information for pre-selected regions of the genome; RNA-seq is genome-wide, and can detect novel transcripts

RNA-seq allows us to detect differences from the reference genome, such as SNPs in transcribed regions

RNA-seq can be done without a reference genome – de novo assembly of the transcriptome is possible

72
Q

What is alternative splicing?
How does this affect RNA-seq analysis?

A

Alternative splicing allows for a single gene to produce many different transcripts and proteins
Adds to the complexity of RNA-seq analysis

73
Q

What is DNA methylation?
What is this for and an example of?

A

Addition of a methyl group to C5 of cytosine
Acts to downregulate/regulate gene expression and is an example of epigenetics

74
Q

What else is methylation involved in?

A

X chromosome inactivation
Silencing of germline-specific genes and repeat regions
Imprinting (distinguish maternal and paternal alleles)

75
Q

How do bacteria use methylation? (2 ways)

A

To distinguish ‘self’ DNA from ‘non-self’
Non-self DNA can be digested by restriction enzymes, which act as a bacterial immune system

They also use methylation to control DNA replication, limiting it to a single round per cell cycle

76
Q

What are the 3 ‘contexts’ of cytosine methylation?

A

CpG - C followed by G (linked by a phosphate in the backbone)
CHG - C followed by a non-G base (H), then a G
CHH - C followed by 2 non-G bases
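
The three contexts can be told apart by looking at the two bases 3’ of each cytosine. A minimal sketch, using a made-up sequence and considering one strand only:

```python
def cytosine_context(seq, i):
    """Classify the cytosine at index i of `seq` (read 5'->3') as CpG, CHG or CHH."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    nxt = seq[i + 1:i + 3]  # the two bases 3' of the cytosine
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CpG"
    if len(nxt) < 2:
        return None  # too close to the end of the sequence to classify
    return "CHG" if nxt[1] == "G" else "CHH"

seq = "ACGTCAGTCTTA"  # hypothetical sequence
for i, base in enumerate(seq):
    if base == "C":
        print(i, cytosine_context(seq, i))
```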

77
Q

Which methylation persists and which must be re-established after cell division?

A

CpG methylation persists whereas CHG and CHH methylation do not

78
Q

What does 5-methylcytosine deaminate to?

A

Thymine

79
Q

What are clusters of CpG in promoter regions called?
What does it mean when they are unmethylated?

A

CpG islands
The gene is expressed

80
Q

What is Bisulphite conversion?

A

Chemically induced deamination of unmethylated cytosine to uracil
Methylated cytosine does not undergo this change

81
Q

What is BS-seq?
How does it exploit the fact methylated cytosine remains unchanged?

A

Bisulphite Sequencing
A sample is sequenced before and after bisulphite conversion
These can be compared as methylated cytosine will remain as cytosine and unmethylated cytosine will be changed to uracil and read as thymine
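
The before/after comparison can be sketched on a single strand. The sequence and the methylated position are made up, and the converted read is written with T where unmethylated C was (uracil is read as thymine).

```python
def bisulphite_convert(seq, methylated_positions):
    """Unmethylated C is read as T after conversion; methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def call_methylation(before, after):
    """Compare reads before and after conversion at every cytosine."""
    return {
        i: after[i] == "C"  # True = methylated (still C), False = unmethylated (now T)
        for i, base in enumerate(before)
        if base == "C"
    }

original = "ACGTCCGA"  # hypothetical sequence
converted = bisulphite_convert(original, methylated_positions={1})
print(converted)
print(call_methylation(original, converted))
```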

82
Q

What is Reduced Representation Bisulphite Sequencing (RRBS)?
How do they do this?

A

RRBS is a method of targeting BS-seq to regions which are likely to have a high CpG content (e.g. CpG islands)
This allows us to make the most of a sequencing run

RRBS exploits restriction enzymes which have a recognition site containing CpG
By digesting with this enzyme and selecting small fragments, we target regions of high CpG density
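
The digest-and-size-select idea can be sketched in silico. The notes do not name the enzyme, so the CCGG site (cut after the first C, as MspI does) is an assumption here, and the sequence and size window are invented and far shorter than real fragments.

```python
def digest(seq, site="CCGG", cut_offset=1):
    """Cut `seq` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

def size_select(fragments, min_len, max_len):
    """Keep only fragments within the chosen length window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

# Hypothetical sequence: AT-rich start, CpG-dense middle, T-rich end
seq = "AT" * 6 + "CCGG" + "ACGCGT" + "CCGG" + "CGCGA" + "CCGG" + "T" * 20
fragments = digest(seq)
print(fragments)
print(size_select(fragments, min_len=5, max_len=10))
```

Closely spaced sites only occur where CpG is dense, so the short fragments kept by size selection come from CpG-rich regions.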

83
Q

What is PacBio Single Molecule Real-Time (SMRT) sequencing?
How does it work?

A

Allows methylated bases to be distinguished from unmethylated ones (adenine as well as cytosine)

The presence of a methylated base delays the progress of the polymerase; This can be detected by analysis of the polymerase kinetics

84
Q

How does Oxford Nanopore detect cytosine methylation?

A

It detects a disruption in electrical current caused by a base passing through a pore in a membrane
Methylated bases give a distinct signal from unmethylated ones; Allows for direct methylation measuring

85
Q

What is Chromatin Immunoprecipitation (ChIP)?

A

It is a method that can be used to isolate DNA bound by a specific protein

86
Q

Explain ChIP process

A
  1. Proteins are covalently crosslinked to DNA by treating with formaldehyde
  2. Chromatin is sheared by sonication or digested with a nuclease; in ChIP-exo, an exonuclease trims the bound DNA back to the binding site
  3. Immunoprecipitation and purification of bound DNA using an antibody specific to the protein of interest
87
Q

What is ChIP-on-chip?
How does it work?

A

ChIP-on-chip involves identification of the ChIP-purified DNA using a microarray

The purified binding sites are labelled and hybridised to a tiling microarray to determine the genomic regions where the protein is bound

88
Q

What is ChIP-seq?
How does it work?

A

Sequencing of the ChIP-purified binding sites directly using high throughput sequencing platforms (e.g. Illumina)

Reads are mapped to the reference genome, and binding sites are identified as peaks in the signal

There is an offset between reads on the forward and reverse strands, which allows the exact boundaries of the binding site to be determined; this offset arises because the DNA is trimmed to the binding site
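
Finding peaks in the mapped-read signal can be sketched as a pile-up followed by a threshold. The read positions, read length and threshold are invented, and strand information is ignored; real peak callers use statistical models and control samples rather than a fixed cut-off.

```python
def coverage(read_starts, read_length, region_length):
    """Per-base read depth across a region from read start positions (0-based)."""
    depth = [0] * region_length
    for start in read_starts:
        for pos in range(start, min(start + read_length, region_length)):
            depth[pos] += 1
    return depth

def call_peaks(depth, threshold):
    """Return (start, end) intervals, end exclusive, where depth >= threshold."""
    peaks, start = [], None
    for pos, d in enumerate(depth):
        if d >= threshold and start is None:
            start = pos
        elif d < threshold and start is not None:
            peaks.append((start, pos))
            start = None
    if start is not None:
        peaks.append((start, len(depth)))
    return peaks

read_starts = [2, 20, 21, 22, 23, 24, 40]  # hypothetical mapped read positions
depth = coverage(read_starts, read_length=5, region_length=50)
print(call_peaks(depth, threshold=3))
```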

89
Q

What is Chromosome Conformation Capture? (hint: long range interactions)

A

Uses formaldehyde to form cross-links between regions of the genome that interact over long range, so that they can be identified

The cross-linked chromatin is digested, the loose ends ligated, and the cross-link is removed to form a single continuous piece of DNA containing sequence from the 2 interacting regions

90
Q

What are the 4 methods?
How do they differ?
(see image on notes) (hint: the different C’s)

A

3C - Look for specific interaction between 2 known partners

4C - Identify remote regions which interact with region of interest

5C - Discovery of novel interactions

Hi-C - Allows comprehensive genome-wide characterisation of all of the interactions between remote chromosomal regions

91
Q

Explain 3C

A

3C uses 2 specific primers, so is good for targeting interactions between 2 known loci

92
Q

Explain 4C

A

4C introduces a circularisation step, meaning that only one of the interaction partners needs to be pre-selected

93
Q

Explain 5C

A

5C uses primers with a universal “tail” sequence
PCR using primers which recognise this overhanging tail sequence can be used to amplify interactions between many interacting regions

94
Q

Explain Hi-C

A

Biotin is incorporated into the cross-link between interacting loci

The protein streptavidin has a high affinity for biotin, and is used to purify out the biotin-labelled DNA containing interacting loci

This is followed by high-throughput sequencing to get a genome-wide view of all long range chromosomal interactions

95
Q

What are the different functional elements of the human genome? (6 elements) (summary essentially)

A

Long range chromosomal interactions
Regions of open chromatin
DNA methylation sites
Transcription factor binding sites
Enhancers and promoters
Coding and non-coding transcribed regions

96
Q

What is the ENCODE Project?
Aims?
Methods used? (3 methods)
With what method were genes identified?

A

ENCyclopedia Of DNA Elements

Aims to identify all the functional elements in the human genome

RNA-seq, 5C and ChIP-seq were used

Genes were identified with RT-PCR or computational prediction

97
Q

How are CLIP-seq and RIP-seq different to ChIP-seq?

A

They identify RNA-binding proteins, whereas ChIP-seq identifies DNA-binding proteins

98
Q

What do DNase-seq and FAIRE-seq identify?
How are they different?

A

Regions of open chromatin

  • DNase-seq exploits open chromatin’s hypersensitivity to DNase I digestion
  • FAIRE-seq uses formaldehyde crosslinking of DNA to nucleosomes and purifies unbound DNA
99
Q

What is ChIA-PET?
Similarities and differences with 5C?

A

Both identify long range chromosomal interactions
It does this through ChIP-seq analysis of DNA-nucleosome interactions, instead of direct ligations of interacting DNA regions

100
Q

What is methyl450k?

A

methyl450k is a microarray-based method of identifying DNA methylation

101
Q

What did ENCODE’s findings conflict with and how?

A

They were able to assign biochemical functions for 80% of the human genome

This conflicted with the previous view that much of the genome was “junk DNA” with no function

102
Q

Why are exons more conserved than introns between species?

A

Mutations are more likely to cause problems if they are within exons

Functionally important regions of the genome tend to be evolutionarily conserved

103
Q

General principle of evolutionary genomics?

Explain

A

The rate of evolution of the genome is not uniform, and functionally important regions tend to evolve more slowly

Changes in important regions are more likely to be “deleterious”
- Have a negative impact on fitness, which means they tend to be removed from the population through natural selection

104
Q

What is TraDIS?
What is it for?

A

TRansposon Directed Insertion-site Sequencing

It is used to understand bacterial gene function

105
Q

What are transposons?
How do they work?

A

Mobile genetic elements

Transposons can move around the genome through a “cut and paste” mechanism

106
Q

What is the structure of transposons?
How can they be manipulated for mutant screens?

A

They consist of a transposase gene, flanked by inverted repeat sequences that are recognised by the transposase

If the transposase gene is removed, the transposon can still move and be inserted into a bacterial genome if transposase is supplied
Inclusion of an antibiotic resistance gene allows mutants to be selected

107
Q

How are mutant screens involving transposons studied? (Observations and conclusions etc.)

A

If a gene is disrupted by the transposon, it will be inactivated
If the disrupted gene is essential, the mutant will not survive

Genes without insertions are likely to be essential
In transposon mutagenesis we do not see mutants in essential regions of the genome
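
The reasoning above can be sketched by counting mapped insertion sites per gene. The gene coordinates and insertion positions are invented; in practice, gene length and insertion density also matter, because a short gene can lack insertions by chance.

```python
# Hypothetical gene coordinates: name -> (start, end), end exclusive
genes = {"geneA": (0, 300), "geneB": (300, 700), "geneC": (700, 1000)}

# Hypothetical transposon insertion sites recovered by sequencing
insertion_sites = [15, 120, 260, 720, 810, 990]

def insertions_per_gene(genes, sites):
    """Count the insertion sites that fall within each gene."""
    return {
        name: sum(start <= s < end for s in sites)
        for name, (start, end) in genes.items()
    }

counts = insertions_per_gene(genes, insertion_sites)
candidate_essential = [name for name, n in counts.items() if n == 0]
print(counts)
print(candidate_essential)
```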

108
Q

‘TraDIS can in some cases give information at a sub-genic level’: What does this mean?

A

It can identify not just important genes, but important regions of genes

109
Q

How is TraDIS used to identify which genes are essential?

A

Start with an input pool of random transposon mutants
Put them through some form of stress
Compare the input and output pools to see which mutants (and so which disrupted genes) survived

This tells us which genes are essential and which are non-essential under that stress