Genes and Genomes Flashcards

(118 cards)

1
Q

Definition

a method of DNA sequencing first commercialized by Applied Biosystems, based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication

A

Sanger sequencing

2
Q

Definition

the material of which the chromosomes of organisms other than bacteria (i.e. eukaryotes) are composed, consisting of protein, RNA, and DNA

A

Chromatin

3
Q

Define

Methyl-cytosine

A

the normal cytosine nucleotide in DNA that has been modified by the addition of a methyl group to its 5th carbon

4
Q

Definition

non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates

A

SINES

5
Q

What is considered the fifth base in DNA?

A

Methyl-cytosine

6
Q

Definition

a unit made up of linked genes which is thought to regulate other genes responsible for protein synthesis

A

Operon

7
Q

Mobile genetic elements are not usually found in gene exons/introns. Examples are retrotransposons which move via a DNA/RNA intermediate

A

Mobile genetic elements are not usually found in gene exons. Examples are retrotransposons, which move via an RNA intermediate

8
Q

Where are CpG islands usually found?

A

Mainly at the 5’ end of genes

9
Q

How many bases does the human genome contain?

A

3162 million bases

10
Q

Whole genome shotgun (WGS)

A

entails sequencing many overlapping DNA fragments in parallel and then using a computer to assemble the small fragments into larger contigs and, eventually, chromosomes

11
Q

Definition

a functional RNA molecule that is transcribed from DNA but not translated into proteins

A

non-coding RNA/ncRNA

12
Q

What is the Whole Genome Shotgun Method?

A

Genomic DNA is randomly sheared into fragments, which are then sequenced. This is repeated many times to ensure at least 30x read depth coverage. The reads are then reassembled into the genome sequence

13
Q

Definition

an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences

A

BLAST search

14
Q

What makes up junk DNA?

A

Pseudogenes

Mobile genetic elements (i.e. LINES, SINES, incomplete retroviral-like elements and transposon remnants)

15
Q

Definition

Describing a type of messenger RNA that can encode more than one polypeptide separately within the same RNA molecule

A

Multicistronic

16
Q

What is used to sort out the contigs given in de novo assembly?

A

PacBio

17
Q

What is a hypothetical protein?

A

A predicted protein that is not similar to any characterised protein

18
Q

What BLAST program is used for a protein query search in the protein database?

A

BLASTp

19
Q

What are the major characteristics of SINES?

A

They do not encode reverse transcriptase, endonuclease or integrase

20
Q

Definition

a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes

A

FASTA

21
Q

Define

non-coding RNA/ncRNA

A

a functional RNA molecule that is transcribed from DNA but not translated into proteins

22
Q

Define

Draft genome sequence

A

Sequence of genomic DNA having lower accuracy than finished sequence; some segments are missing or in the wrong order or orientation

23
Q

Definition

Elements that are transcribed into RNA, reverse-transcribed into DNA and then inserted into a new site in the genome

A

Retroviral-like elements

24
Q

Definition

a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores

A

FASTQ

25
# Define Genome annotation
the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do
26
# Define Retroviral-like elements
Elements that are transcribed into RNA, reverse-transcribed into DNA and then inserted into a new site in the genome
27
# Define Paralogue
Either of a pair of genes that derives from the same ancestral gene
28
# Define SINES
non-autonomous, non-coding transposable elements (TEs) that are about 100 to 700 base pairs in length. They are a class of retrotransposons, DNA elements that amplify themselves throughout eukaryotic genomes, often through RNA intermediates
29
The ENCODE project is an *editing/annotation* approach that has built a map of functional elements within the human genome, suggesting that over *50%/70%* is biologically active
The ENCODE project is an **annotation** approach that has built a map of functional elements within the human genome, suggesting that over **70%** is biologically active
30
What is the genome data problem?
The ever increasing analysis gap that is occurring because our ability to analyse is not keeping up with the data available
31
# Definition a project that seeks to interpret the sequence of DNA that makes up the human genome
ENCODE project
32
What were the strategies used by HGP and Celera to sequence the human genome?
HGP used an ordered or directed strategy; Celera used a shotgun strategy
33
# Define Pseudogenes
a section of a chromosome that is an imperfect copy of a functional gene
34
What were the key findings of the ENCODE project?
Around 80% of the human genome is associated with at least one biochemical event
35
\_\_\_\_\_\_\_\_\_\_\_ arise by gene duplication followed by gene inactivation - contain introns \_\_\_\_\_\_\_\_\_\_\_\_ are formed by integration of DNA copies of mRNA - do not contain introns
**Classical pseudogenes** arise by gene duplication followed by gene inactivation - contain introns **Processed pseudogenes** are formed by integration of DNA copies of mRNA - do not contain introns
36
# Definition DNA that does not code for a protein, usually occurs in repetitive sequences of nucleotides, and does not seem to serve any useful purpose
Junk DNA
37
True or False: The HGP sequence tells us nothing about the genetic variation between individuals
True
38
What BLAST program is used for a nucleotide query search in the protein database?
BLASTx
39
# Definition Either of a pair of genes that derives from the same ancestral gene
Paralogue
40
Why does the sequence CpG occur at a lower than expected frequency in vertebrates?
During DNA damage, deamination of unmethylated C gives rise to U, which is recognised as a fault by DNA repair machinery. Deamination of methylated C gives rise to T, which is not recognised as an error by DNA repair machinery. Over evolutionary time, methylated Cs have been mutated to T, so CpG is under-represented in vertebrate DNA
41
# Define CpG island
stretches of DNA 500–1500 bp long with a CG: GC ratio of more than 0.6, and they are normally found at promoters and contain the 5′ end of the transcript
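The thresholds in this card can be checked programmatically. The sketch below is only an illustration: it assumes "CG:GC ratio" means the count of CG dinucleotides divided by the count of GC dinucleotides on one strand, and the function names `cg_gc_ratio` and `looks_like_cpg_island` are hypothetical, not part of any published tool.

```python
def cg_gc_ratio(seq: str) -> float:
    """Return (number of CG dinucleotides) / (number of GC dinucleotides).

    Assumption: the card's "CG:GC ratio" refers to these dinucleotide counts.
    """
    seq = seq.upper()
    cg = seq.count("CG")  # a 2-mer of distinct letters cannot overlap itself
    gc = seq.count("GC")
    if gc == 0:
        raise ValueError("sequence contains no GC dinucleotide")
    return cg / gc


def looks_like_cpg_island(seq: str) -> bool:
    """Apply the card's two criteria: 500-1500 bp long and ratio above 0.6."""
    if not 500 <= len(seq) <= 1500:
        return False
    return cg_gc_ratio(seq) > 0.6
```

Real CpG island finders typically use additional criteria (for example overall GC content), so this check should be read as a study aid rather than a detection method.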
42
How do SINES move?
Using enzymes produced by other mobile elements e.g. LINES
43
# Definition a set of overlapping DNA segments that together represent a consensus region of DNA
Contig
44
Zero
45
What is an unbroken consensus sequence called?
Contig
46
True or False: The sequence data found in the HGP is inaccessible by regular people
False. It is publicly available
47
# Definition entails sequencing many overlapping DNA fragments in parallel and then using a computer to assemble the small fragments into larger contigs and, eventually, chromosomes
Whole genome shotgun (WGS)
48
What are the two types of Illumina sequencing? Which is faster?
HiSeq (3 days; 2000 gigabases) and MiSeq (56 hrs; 20 gigabases). MiSeq is faster
49
# Define De novo
starting from the beginning
50
True or False: Only transposon remnants are evident in the human genome
True
51
# Definition a section of a chromosome that is an imperfect copy of a functional gene
Pseudogenes
52
What are the similarities between a draft and a closed genome sequence?
* Both have all the genes
* Both predict the encoded proteins
* Predict function by similarity to characterised proteins
* Overview of the organism's genetic capability
53
In reality, how many contigs do we get per chromosome? Why?
You expect only one, but in reality you get many, although the whole genome sequence is still represented. This is because several copies of the same sequence (repeats) occur in the genome, and the assembler cannot place them unambiguously
54
What symbols indicate Bad and Excellent Phred quality scores?
Bad - !'#$%" Excellent = EFGHIJK
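The link between these symbols and quality can be made concrete by decoding them. The sketch below assumes the common Phred+33 encoding (quality = ASCII code minus 33) and the standard relation P = 10^(-Q/10); the function names are illustrative only.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string into integer Phred scores.

    Assumption: Phred+33 encoding, so '!' (ASCII 33) decodes to 0.
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    for symbol in "!#EK":
        q = phred_scores(symbol)[0]
        print(symbol, q, error_probability(q))
```

Under this encoding the low-ASCII punctuation symbols decode to scores close to 0, while the letters E to K decode to scores of 36 to 42.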
55
# Define BLAST search
an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences
56
# Define LINES
a group of non-LTR (long terminal repeat) retrotransposons which are widespread in the genome of many eukaryotes
57
# Definition a transposon whose sequence shows homology with that of a retrovirus
Retrotransposons
58
What are the major characteristics of non-retroviral retrotransposons (LINES)?
They have a promoter and encode a protein with combined endonuclease and reverse transcriptase activity
59
# Definition a group of non-LTR (long terminal repeat) retrotransposons which are widespread in the genome of many eukaryotes
LINES
60
What form is each entry in the GenBank database in?
A text file containing DNA sequence data and any associated information (annotation)
61
# Define Chromatin
the material of which the chromosomes of organisms other than bacteria (i.e. eukaryotes) are composed, consisting of protein, RNA, and DNA
62
# Definition the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do
Genome annotation
63
Which nucleotides can be methylated?
Cytosine but only when next to a guanine
64
What were the aims of the Human Genome Project?
To determine the entire nucleotide sequence of human DNA, and to identify all the genes within the human genome
65
Why are Illumina and PacBio often used together?
Illumina provides good quality reads whereas PacBio provides good read length
66
On completion of the human genome project it was evident that over *50%/70%/90%* of the genome does not encode *protein/microRNA/tRNA*, consistent with the idea of *waste/garbage/junk* DNA
On completion of the human genome project it was evident that over **90%** of the genome does not encode **protein**, consistent with the idea of **junk** DNA
67
What is the read depth coverage equation?
Depth = (N x L) / G, where N = number of reads, L = length of each read, G = estimated genome size
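A short worked example may help fix the equation in memory. The sketch below is a minimal illustration; the function names and the example numbers (one million 150-base reads, a 5 Mb genome) are hypothetical and chosen only to give a round result.

```python
import math


def coverage_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average read depth: Depth = (N x L) / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads N such that (N x L) / G reaches target_depth."""
    return math.ceil(target_depth * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical run: 1,000,000 reads of 150 bases over a 5,000,000-base genome.
    print(coverage_depth(1_000_000, 150, 5_000_000))  # 30.0
    print(reads_needed(30, 150, 5_000_000))           # 1000000
```

Rearranging the same equation for N, as `reads_needed` does, gives the number of reads required to hit a target such as 30x depth.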
68
Retrotransposons move from one point to another in the genome via what?
RNA intermediates
69
# Define Processed pseudogenes
a type of pseudogene that is copied from messenger RNA and incorporated into the chromosome
70
# Define Junk DNA
DNA that does not code for a protein, usually occurs in repetitive sequences of nucleotides, and does not seem to serve any useful purpose
71
# Define Mobile genetic elements
DNA sequences that can move around the genome, changing their number of copies or simply changing their location, often affecting the activity of nearby genes
72
# Definition a fluorescent chemical compound that can re-emit light upon light excitation
Fluorophores
73
# Define FASTA
a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes
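To make the format concrete: a FASTA record is a header line beginning with `>` followed by one or more lines of single-letter codes. The sketch below is a minimal reader written for illustration only; `parse_fasta` is a hypothetical helper, and the example records are made up.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is taken as the first word after '>' (a simplifying
    assumption); sequence lines are joined without line breaks.
    """
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    example = ">seq1 made-up nucleotide record\nATGC\nGGTA\n>seq2\nMKV\n"
    print(parse_fasta(example))
```

For real data, an established parser such as the one in Biopython is the safer choice; this version ignores edge cases like duplicate identifiers.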
74
# Define Reference genome
a digital nucleic acid sequence database, assembled by scientists as a representative example of a species' set of genes
75
# Define Retrotransposons
a transposon whose sequence shows homology with that of a retrovirus
76
# Define Fluorophores
a fluorescent chemical compound that can re-emit light upon light excitation
77
What are the two classes of retrotransposons?
Retroviral-like and non-retroviral-like
78
# Define Multicistronic
Describing a type of messenger RNA that can encode more than one polypeptide separately within the same RNA molecule
79
# Definition the normal cytosine nucleotide in DNA that has been modified by the addition of a methyl group to its 5th carbon
Methyl-cytosine
80
# Define Contig
a set of overlapping DNA segments that together represent a consensus region of DNA
81
# Definition one of two or more homologous gene sequences found in different species
Orthologue
82
How many genes are in the human genome?
Between 20000 and 25000
83
# Define Orthologue
one of two or more homologous gene sequences found in different species
84
# Definition Sequence of genomic DNA having lower accuracy than finished sequence; some segments are missing or in the wrong order or orientation
Draft genome sequence
85
What percentage of the genome encodes proteins?
2%
86
What does DNA methylation do?
Helps turn genes off by altering chromatin structure
87
# Definition a digital nucleic acid sequence database, assembled by scientists as a representative example of a species' set of genes
Reference genome
88
# Define Amplicon
a piece of DNA or RNA that is the source and/or product of amplification or replication events
89
# Definition stretches of DNA 500–1500 bp long with a CG: GC ratio of more than 0.6, and they are normally found at promoters and contain the 5′ end of the transcript
CpG island
90
# Definition a piece of DNA or RNA that is the source and/or product of amplification or replication events
Amplicon
91
# Definition a chromosomal segment that can undergo transposition, especially a segment of bacterial DNA that can be translocated as a whole between chromosomal, phage, and plasmid DNA in the absence of a complementary sequence in the host DNA
Transposon
92
What is the name of the modified nucleotides used in Sanger Sequencing?
Dideoxy nucleotides
93
# Definition starting from the beginning
De novo
94
# Define ENCODE project
a project that seeks to interpret the sequence of DNA that makes up the human genome
95
True or False: Retroviral-like retrotransposons do not encode coat proteins
True
96
A query protein is 26% identical to a guide protein. What can we say about these two proteins?
They might have similar functions
97
# Define Transposon
a chromosomal segment that can undergo transposition, especially a segment of bacterial DNA that can be translocated as a whole between chromosomal, phage, and plasmid DNA in the absence of a complementary sequence in the host DNA
98
# Define Read coverage depth
the number of unique reads that include a given nucleotide in the reconstructed sequence
99
# Definition a type of pseudogene that is copied from messenger RNA and incorporated into the chromosome
Processed pseudogenes
100
# Definition the number of unique reads that include a given nucleotide in the reconstructed sequence
Read coverage depth
101
# Define Sanger sequencing
a method of DNA sequencing first commercialized by Applied Biosystems, based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication
102
# Definition DNA sequences that can move around the genome, changing their number of copies or simply changing their location, often affecting the activity of nearby genes
Mobile genetic elements
103
What are the components of Illumina sequencing?
'Blocked' nucleotides, oligonucleotide primer, ssDNA template, DNA polymerase
104
What proportion of nucleotides are identical in all people?
99%
105
What can you say about proteins that are over 35% identical to a guide protein?
They probably have a related function
106
The human genome comprises 3 *million/billion/trillion* base pairs encoding approximately *10,000/20,000/50,000* genes. The number, position and order of *introns/exons/genes* is identical between *individuals/proteins/tRNA*
The human genome comprises 3 **billion** base pairs encoding approximately **20,000** genes. The number, position and order of **genes** is identical between **individuals**
107
108
# Define Operon
a unit made up of linked genes which is thought to regulate other genes responsible for protein synthesis
109
# Define FASTQ
a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores
110
Long interspersed nuclear element 1 (LINE 1) mobile genetic elements... Select one: a. are derived from viruses b. encode enzymes essential for their replication c. are found only in A / T rich regions of the genome d. emerged from the genomes of ancient parasitic bacteria
Long interspersed nuclear element 1 (LINE 1) mobile genetic elements... Select one: a. are derived from viruses **b. encode enzymes essential for their replication** c. are found only in A / T rich regions of the genome d. emerged from the genomes of ancient parasitic bacteria
111
Gene annotation is the process of... Select one: a. manually sequencing “difficult” regions of the human genome b. depositing new nucleotide sequence data in a public database c. adding information on biological function to a nucleotide sequence file d. deleting redundant data files
Gene annotation is the process of... Select one: a. manually sequencing “difficult” regions of the human genome b. depositing new nucleotide sequence data in a public database **c. adding information on biological function to a nucleotide sequence file** d. deleting redundant data files
112
When completed in 2003, the Human Genome Project lacked information on the... Select one: a. order of genes in the human genome b. approximate number of genes in the human genome c. approximate number of alleles in the human genome d. percentage of protein-encoding genes in the human genome
When completed in 2003, the Human Genome Project lacked information on the... Select one: a. order of genes in the human genome b. approximate number of genes in the human genome **c. approximate number of alleles in the human genome** d. percentage of protein-encoding genes in the human genome
113
What is the current thinking about junk DNA? Select one: a. It serves no useful purpose b. It consists of non-functional ancestral genes c. It makes up less than 10% of the human genome d. It is largely made up of mobile genetic elements
What is the current thinking about junk DNA? Select one: a. It serves no useful purpose b. It consists of non-functional ancestral genes c. It makes up less than 10% of the human genome **d. It is largely made up of mobile genetic elements**
114
The figure below represents a visual overview of a part of the DNA sequence of a bacterial genome (approximate base range 2400 to 8000). The overview is produced using the Artemis software and shows reading frame (RF) one through to six. The short black vertical lines indicate stop codons. The sequence for each stop codon in the sequence displayed can be... Select one: a. GGG only b. any of ATG CTG GTG c. any of TAG, TAA, TGA d. any of UAG, UAA, UGA
The sequence for each stop codon in the sequence displayed can be... Select one: a. GGG only b. any of ATG CTG GTG **c. any of TAG, TAA, TGA** d. any of UAG, UAA, UGA
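The answer is written with T rather than U because the display shows DNA sequence, not mRNA. The sketch below shows how stop codons could be located in all six reading frames of a DNA string. It is an illustration only and does not reproduce how Artemis itself works; the frame numbering (1 to 3 forward, 4 to 6 on the reverse complement) and the function names are assumptions.

```python
STOP_CODONS = {"TAG", "TAA", "TGA"}  # DNA spelling of UAG, UAA, UGA
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_positions(seq: str) -> dict[int, list[int]]:
    """Map each reading frame (1-6) to the start offsets of its stop codons.

    Offsets are 0-based and measured along the strand being read, so
    frames 4-6 report positions on the reverse-complement strand.
    """
    strands = [seq.upper(), reverse_complement(seq)]
    result: dict[int, list[int]] = {}
    for strand_index, strand in enumerate(strands):
        for offset in range(3):
            frame = strand_index * 3 + offset + 1
            result[frame] = [
                i
                for i in range(offset, len(strand) - 2, 3)
                if strand[i:i + 3] in STOP_CODONS
            ]
    return result


if __name__ == "__main__":
    print(stop_positions("ATGTAACCC"))
```

Long stretches without a stop codon in one frame are what make open reading frames stand out in this kind of view.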
115
BLAST search... Select one: a. predicts protein function from the predicted 3D structure of the query protein sequence b. is widely used to map millions of short DNA sequences onto a reference genome c. finds sequence similar to the query sequence in the subject database d. is a basic global alignment search tool
BLAST search... Select one: a. predicts protein function from the predicted 3D structure of the query protein sequence b. is widely used to map millions of short DNA sequences onto a reference genome **c. finds sequence similar to the query sequence in the subject database** d. is a basic global alignment search tool
116
The Whole Genome Shotgun (WGS) method for genome sequencing... Select one: a. uses long read sequencing technology to produce a single read that spans the whole bacterial chromosome b. is likely to work best when the total number of sequenced bases is the same as the predicted number of bases in the bacterial chromosome c. is rarely used for bacterial genome sequencing d. is an approach based on the sequencing of randomly selected fragments of the genomic DNA, that collectively cover the whole genome
The Whole Genome Shotgun (WGS) method for genome sequencing... Select one: a. uses long read sequencing technology to produce a single read that spans the whole bacterial chromosome b. is likely to work best when the total number of sequenced bases is the same as the predicted number of bases in the bacterial chromosome c. is rarely used for bacterial genome sequencing **d. is an approach based on the sequencing of randomly selected fragments of the genomic DNA, that collectively cover the whole genome**
117
For the final three questions, consider the following information: The genome sequence of the Reference strain was determined using a combination of long-read and short-read sequencing technologies (Assembled genome sequence: one circular chromosome and no plasmids). The genome of the Mutant strain was sequenced using short-read sequencing (Illumina, paired-end, 150 base reads). Table 1 shows all sequence differences between the Reference and Mutant strains. Table 1. Sequence differences between strains. The phenotypic difference is that the Reference strain has a flagellum and the Mutant strain does not. -- The initiation codon for pwpS is located: Select one: a. within 100 bases of position 4,684,444 b. between 100 and 300 bases from position 4,684,444 c. between 301 and 999 bases from position 4,684,444 d. more than 1,000 bases from position 4,684,444
The initiation codon for pwpS is located: Select one: **a. within 100 bases of position 4,684,444** b. between 100 and 300 bases from position 4,684,444 c. between 301 and 999 bases from position 4,684,444 d. more than 1,000 bases from position 4,684,444
118
The genome sequence of the Reference strain was determined using a combination of long-read and short-read sequencing technologies (Assembled genome sequence: one circular chromosome and no plasmids). The genome of the Mutant strain was sequenced using short-read sequencing (Illumina, paired-end, 150 base reads). Table 1 shows all sequence differences between the Reference and Mutant strains. Table 1. Sequence differences between strains. The phenotypic difference is that the Reference strain has a flagellum and the Mutant strain does not. -- The phenotypic difference is likely to be caused by: Select one: a. any one of the differences observed in protein coding regions b. all three differences observed in protein coding regions c. the intergenic difference d. the intergenic difference, the difference in the pwpS gene, or both
The phenotypic difference is likely to be caused by: Select one: a. any one of the differences observed in protein coding regions b. all three differences observed in protein coding regions c. the intergenic difference **d. the intergenic difference, the difference in the pwpS gene, or both**