Sequencing Flashcards

1
Q

Genome Sequencing Methodology

A

5-10 times as many anonymous participants as were needed provided DNA samples
Samples were taken at local sites. DNA was extracted from blood
Sequenced from a composite of the genomes of a fraction of the participants, whose identities are known to nobody

2
Q

BAC libraries

A

Bacterial Artificial Chromosomes
Chromosomes are sorted and DNA is isolated from them
Restriction enzymes cut at specific palindromic sequences
Restriction enzymes cut the isolated DNA into multiple fragments
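
The cutting step can be sketched in a few lines of Python. This is only an illustration: the function names `is_palindromic_site` and `digest` are made up here, and the example uses the EcoRI site GAATTC (cut after the first G) on a single strand, ignoring sticky ends.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_palindromic_site(site: str) -> bool:
    """A site is palindromic when it equals its own reverse complement."""
    return site == "".join(COMPLEMENT[base] for base in reversed(site))

def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is how many bases into the site the cut falls
    (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # whatever remains after the last cut
    return fragments

print(is_palindromic_site("GAATTC"))               # True
print(digest("AAGAATTCGGGAATTCTT", "GAATTC", 1))   # ['AAG', 'AATTCGGG', 'AATTCTT']
```

Joining the fragments back together gives the original sequence, which is a quick sanity check on any digest.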

3
Q

Creation of BAC libraries

A

DNA fragments are inserted into circular DNA and introduced into bacteria (BACs)
Single contiguous sequences are called contigs

4
Q

BAC clones

A

Dilute solution of bacteria can be cultured on agar plate and the colonies produced are clones
Single colony contains clones of DNA sequence
Clones then used for sequencing

5
Q

BAC automation

A

Automated, massively parallel creation of BACs
Copied DNA isolated and sequenced
Computational tools applied to obtain the physical map

6
Q

Production of physical map

A

Select clones for sequencing (overlapping)
Sequence to at least draft coverage
Merge data
Order and orient with mRNA, paired end reads and other data

7
Q

Genetic mapping

A

Produced using a physical map by assessing the location of the genes.
Genes on same chromosome are ‘linked’.
More recently, the position of genes is determined from the frequency with which recombination has occurred between them.

8
Q

FISH mapping

A

Fluorescence in situ hybridization
Attach fluorescent labels to DNA sequences
Process chromosomes on glass so location of specific genes within the chromosome can be identified

9
Q

Sequencing developments

A

Can do 20kb with 99.5% accuracy
Can sequence mRNA directly
Only suitable for a single strand of DNA

10
Q

Current sequencing methods

A

PacBio HiFi - Mid length, Mid accuracy
Illumina - Low length, High accuracy
Oxford Nanopore - High length, Low accuracy

Not available during Human Genome Mapping Project

11
Q

PacBio HiFi

A

Polymerase enzyme, nano-sized hole
Single strand of DNA introduced
Fluorescent nucleotides emit light as they are ‘stitched’ into the complementary double strand
Colour of light emission provides accurate sequencing

12
Q

Illumina Sequencing

A

Individual pieces of DNA attached to glass surface
Sequencing by synthesis
As each complementary nucleotide is attached, fluorescence is produced

13
Q

Oxford Nanopore

A

Double strand of DNA unzipped
Single strand inserted into protein nanopore
Flow of ions through the pore creates an electric current whose size depends on the base passing through
Current as a function of time provides sequence information

14
Q

Linkage distance

A

Distance between genes on the same chromosome (physical distance is in bp; genetic distance is inferred from recombination frequency)

Smaller linkage distance = more likely to be inherited together

15
Q

Make up of Human Genome

A

Only 2% contains exons
26% introns
The role of the remaining sequence (much of it repetitive) has only recently begun to be understood

16
Q

Sequence reassembly - Reducing computational efforts

A

Sequencing a large array of overlapping short fragments (contigs) created from the BACs
Short sequences are called reads
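
To make the idea of overlapping reads concrete, here is a minimal sketch of merging two reads where the end of one matches the start of the other. The function names and the `min_len` threshold are assumptions for illustration; real assemblers also handle sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def merge(a: str, b: str, min_len: int = 3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None

print(overlap("ATGCGT", "CGTTAC"))  # 3
print(merge("ATGCGT", "CGTTAC"))    # ATGCGTTAC
```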

17
Q

Gel electrophoresis

A

Comparing size of fragments/contigs
Fragments migrate in an applied electric field
Shortest move the fastest

18
Q

Digital Trees/Trie

A

Multiway tree often used for storing large sets of words
Trees with a possible branch for every letter of an alphabet
Words end with $
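
A minimal trie sketch in Python, assuming nested dictionaries as nodes and `$` as the end-of-word marker described above. The class and method names are illustrative, not taken from any particular library.

```python
class Trie:
    END = "$"  # marks that a stored word ends at this node

    def __init__(self):
        self.root = {}

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Follow the branch for this letter, creating it if needed.
            node = node.setdefault(ch, {})
        node[self.END] = {}

    def contains(self, word: str) -> bool:
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return self.END in node

trie = Trie()
trie.insert("GATTACA")
trie.insert("GAT")
print(trie.contains("GAT"))  # True
print(trie.contains("GA"))   # False: "GA" is only a prefix, not a stored word
```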

19
Q

Trie usage

A

Implementation of sets
Quick insertion, deletion and find: cost depends on the length of the word, not on how many words are stored
Can be quicker than binary trees and hash tables
Spell checkers, completion algorithms, longest-prefix matching, hyphenation
Search finds longest match between words in set and query
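
One reading of "longest match" is longest-prefix matching: find the longest stored word that is a prefix of the query. The sketch below assumes that reading; `build_trie` and `longest_match` are hypothetical helper names.

```python
END = "$"  # end-of-word marker

def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def longest_match(root, query):
    """Return the longest stored word that is a prefix of `query`, or None."""
    node = root
    best = None
    for i, ch in enumerate(query):
        if END in node:
            best = query[:i]  # a stored word ends here; remember it
        if ch not in node:
            return best
        node = node[ch]
    if END in node:
        best = query
    return best

trie = build_trie(["auto", "autocomplete", "automatic"])
print(longest_match(trie, "autocompletion"))  # auto
print(longest_match(trie, "automatic"))       # automatic
```

The search walks at most one branch per character of the query, so its cost does not grow with the number of stored words.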

20
Q

Sequence analysis - Tries

A

Can store DNA/proteins
Finding next fitting section in DNA reconstruction
Useful for finding errors, only need to search a small sub-tree
For DNA the trie is a 4-way tree, so it is deep but wastes little memory per node
Searching for particular sequence motifs

21
Q

Finding protein coding genes

A
Ab initio
Computer approaches
Finding common sequences (start and end of protein coding genes)
Promoter regions - protein binding
Start codons
Stop codons
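
As a rough illustration of the ab initio idea, the sketch below scans the three forward reading frames of a DNA string for a start codon (ATG) followed by an in-frame stop codon (TAA, TAG or TGA). The function name and the `min_codons` parameter are assumptions; real gene finders also use promoter signals, splice sites and the reverse strand.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end) pairs of open reading frames on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons before the stop codon, start codon included.
    """
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == START_CODON:
                # Walk codon by codon to the first in-frame stop codon.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs

dna = "CCATGAAATTTTAGGG"
print(find_orfs(dna))  # [(2, 14)]
print(dna[2:14])       # ATGAAATTTTAG
```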
22
Q

Regulatory Region

A

Promoter - TATA box - Start of 5’ UTR

23
Q

Transcription and Splicing

A

Removal of introns in transcribed regions

Results in mRNA

24
Q

Regulatory Region Function

A

RNA polymerase binds to this sequence to initiate transcription of the DNA into RNA

25
Q

Promoter Sequence

A

First binds the RNA polymerase
Upstream / 5' end of the transcription initiation site
100-1000 base pairs long
High occurrence of AA, AT, TA and TT dinucleotides (also A+T trinucleotides)
Over-representation of GC, GG, CG, AG, GA, TG downstream of promoter
26
Q

TATA box

A

Found in 30% of human genes
Contains sequence TATAWAW
W = A or T
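
The TATAWAW consensus maps directly onto a regular expression, with W written as the character class `[AT]`. A small sketch, where the function name is an assumption; the lookahead is used so that overlapping matches are all reported.

```python
import re

# W = A or T, so TATAWAW becomes TATA[AT]A[AT].
TATA_BOX = re.compile(r"(?=TATA[AT]A[AT])")

def tata_box_positions(seq: str) -> list[int]:
    """Return the 0-based start index of every TATAWAW match, overlaps included."""
    return [m.start() for m in TATA_BOX.finditer(seq.upper())]

print(tata_box_positions("GGCTATAAATGG"))  # [3]
print(tata_box_positions("TATATATAT"))     # [0, 2]
```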
27
Q

Benefits of sequencing the mRNA

A

Start codons, stop codons and exon sequences can be looked for in both the chromosomal DNA and the mRNA
Can find them with tries
Subsequent codons in mRNA are in groups of three for coding amino acids in sequence
Start codon unique
28
Q

Memory issues with tries/Time issues with tries

A

A regular trie can be used as a suffix tree, but would typically use far too much memory to be useful
Use pointers into the original text instead
Can build a suffix tree using O(n) memory, where n is the length of the text
There is also a linear time O(n) algorithm for suffix tree construction (non-trivial)
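
The memory problem is easy to see by counting nodes in a naive suffix trie, where every suffix of the text is inserted character by character. This is a deliberately naive sketch (the function name is made up) and not the O(n) construction mentioned above.

```python
def suffix_trie_node_count(text: str) -> int:
    """Insert every suffix of `text` into a plain trie and count the nodes created."""
    root = {}
    count = 0
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node:
                node[ch] = {}
                count += 1
            node = node[ch]
    return count

# With no repeated characters, no branches are shared: n(n+1)/2 nodes.
print(suffix_trie_node_count("abcd$"))  # 15
# Highly repetitive text shares almost everything.
print(suffix_trie_node_count("aaaa"))   # 4
```

A suffix tree avoids the quadratic blow-up by labelling each edge with a pair of positions in the original text rather than copying the characters.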
29
Q

When to use suffix trees

A

Efficient when it is likely that you will need to do multiple searches
Exact word matching
Use with dynamic programming for inexact matching (match with smallest edit distance)
Bioinformatics, Advanced ML
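
For the inexact-matching point, the dynamic programme in question is typically an edit distance. Below is a minimal sketch of the standard Levenshtein distance (unit cost for insertion, deletion and substitution), shown on its own without the suffix tree; the function name is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between `a` and `b`, keeping only two rows of the table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("ACGT", "AGT"))        # 1
```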
30
Q

Suffix trees with genome sequences

A

Suffix trees are valuable given the number of repeats present in the genome sequences
With more unique reads in the genome, becomes less efficient
31
Q

Genome Homology

A

Human genomes are 99.9% homologous
32
Q

Variants - Removal of negative mutations

A

100s of new mutations in offspring for each generation
Most mutations are neutral in phenotypic effect or removed by negative selection
Many mutations are corrected by the repair enzyme machinery of the cell
33
Q

Variants - Mutations causing an advantage

A

Occasionally mutations give offspring a survival or reproductive advantage (positive selection)
34
Q

Mutations occurring in the genome

A

Mutations don't occur randomly
Occur in particular regions in the genome known as hotspots
35
Q

Variant definition

A

Permanent change in the DNA sequence which makes up a gene
36
Q

Variant as opposed to gene mutation

A

Such changes do not always cause disease and can be present in non-coding regions
37
Q

Allele

A

Variation of a given gene at the same position (locus) on the chromosome
Can also be present in non-coding regions
Typically multiple alleles at a locus between different individuals in a population
38
Q

Polymorphism

A

Allelic variation determined as the number of alleles present
39
Q

Phenotypic traits

A

Derived from the transmission of genes and alleles to an organism's offspring
40
Q

SNP

A

Single nucleotide polymorphism
Most common variation in human genomic DNA
Single nucleotide differs between members of the population/chromosome pairs
4-5 million in each person's genome
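
As a toy illustration of what a single-nucleotide difference looks like in data, the sketch below compares two already-aligned sequences of equal length and reports the positions where they differ. The function name is hypothetical, and real SNP calling works from many reads with quality scores rather than two clean strings.

```python
def snp_positions(reference: str, sample: str):
    """Return (index, reference_base, sample_base) for every mismatching position."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]

print(snp_positions("ACGTAC", "ACCTAC"))  # [(2, 'G', 'C')]
```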
41
Q

Other genomic polymorphisms

A

Deletions and insertions
42
Q

Chromosome synteny

A

Used to define genes which lie on the same chromosome
More recently the term is used for the conservation of blocks of gene order between two compared chromosomes
43
Q

Repetitive Sequences

A

aka repetitive elements, repeating units, repeats
Make up approximately 50% of the human genome
44
Q

Dispersed repeats

A

Recognized as potential source of genetic variation and regulation
45
Q

Tandem repeat sequences (trinucleotide repeats)

A

Important in several human diseases
Repeats within an exon region cause protein misfolding when present in high numbers (>40 copies for Huntington's disease)
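
Counting a tandem repeat is a simple pattern search. The sketch below finds the longest uninterrupted run of a repeat unit; CAG is used as the default because it is the trinucleotide repeated in Huntington's disease, and the >40 threshold is the one quoted above. The function name is an assumption.

```python
import re

def longest_repeat_run(seq: str, unit: str = "CAG") -> int:
    """Return the number of copies in the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)

sample = "ATG" + "CAG" * 5 + "TTCAGCAG"
print(longest_repeat_run(sample))             # 5
print(longest_repeat_run("CAG" * 45) > 40)    # True
```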
46
Q

CpG islands

A

Sequences containing repeats of CG closer to the 5' end of the gene sequence (promoter)
At least 200 bp long
C+G content >50%
Observed/expected frequency >0.6
47
Q

Expected frequency of CpG islands

A

Human genome has 42% GC content
Expected frequency of a CpG = 0.21 x 0.21 = 0.0441 (about 4.4%)
Actual frequency is 1%
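
The arithmetic, and the three CpG island criteria from the previous card (at least 200 bp, C+G above 50%, observed/expected above 0.6), can be sketched as follows. The function names are illustrative, and the observed/expected ratio uses the common definition (number of CG dinucleotides x sequence length) / (number of C x number of G), which is an assumption about what the card intends.

```python
def expected_cpg_frequency(gc_content: float) -> float:
    """Expected CpG frequency if C and G each make up half of the GC content."""
    c = g = gc_content / 2
    return c * g

def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("C") + seq.count("G")) / len(seq) if seq else 0.0

def cpg_observed_over_expected(seq: str) -> float:
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

def looks_like_cpg_island(seq: str) -> bool:
    return (len(seq) >= 200
            and gc_fraction(seq) > 0.5
            and cpg_observed_over_expected(seq) > 0.6)

print(round(expected_cpg_frequency(0.42), 4))  # 0.0441
print(looks_like_cpg_island("CG" * 100))       # True
print(looks_like_cpg_island("AT" * 100))       # False
```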
48
Q

Location of alleles or genes in chromosomes

A

Defined by bands (historically created by G-stain)
49
Q

BRCA2

A

Breast/Prostate cancer
BRCA1 and BRCA2 are sequenced from blood samples
Can use suffix trees to detect which of the stable mutations are present
Short specific sequence motifs (mutations) within the flanking base pairs can be mined