Lecture 5 DA Flashcards Preview

Flashcards in Lecture 5 DA Deck (31)
1
Q

What are some reasons why we sequenced the human genome (3)?

A
  • Because it's there: it was a bioinformatics challenge.
  • Helps against inherited diseases, including those we don’t know about.
  • Helps us understand the consequences of mutation.
2
Q

What is responsible for the phenotypic diversity among different individual humans?

A

Single nucleotide polymorphisms - SNPs.

3
Q

What is more important, the nucleotide sequence or the protein sequence?

A

Protein sequence.

4
Q

Which chromosome was sequenced first, and why? Which came after?

A

Chromosome 22, because it is one of the shortest. Chromosome 21 came after.

5
Q

Describe the hierarchical approach to sequencing the human genome (5).

A
  • Different groups are each given a chromosome to sequence.
  • The groups generate bacterial artificial chromosome (BAC) clones.
  • The BACs were fragmented, and shotgun sequencing was done on the fragments.
  • High fidelity maps with identifiable motifs allowed them to detect overlapping regions and assemble the sequence.
6
Q

Describe the shotgun approach to sequencing the human genome (6).

A
  • DNA is isolated and chopped into fragments.
  • Fragments are cloned into vectors, and sequenced.
  • Overlapping fragments are combined to assemble the genome into contigs.
  • Scaffolds are generated from the contigs.
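
The overlap step above can be illustrated with a minimal sketch. It assumes error-free reads from a single strand and uses a simple greedy merge; the reads, the function names and the `min_len` threshold are made up for illustration, and real assemblers are considerably more involved.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads: three overlap each other, one stands alone.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "CCCAAAT"]
contigs = greedy_assemble(reads)
print(contigs)
```

Reads that share no overlap stay as separate contigs, which is why a later scaffolding step is needed to order them.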
7
Q

What is Celera sequencing, and how does it compare to hierarchical sequencing?

A

Celera sequencing applies shotgun sequencing to the whole genome at once. It finished faster than the hierarchical approach.

8
Q

At how many locations do SNPs occur?

A

~3 million.

9
Q

How many genes total were found?

A

~51k.

10
Q

How many coding genes were found?

A

~20k.

11
Q

How many non-coding genes were found?

A

~20k

12
Q

What are pseudogenes, and how many were found?

A

Genes that seem to be protein coding, but mutation rendered them non-coding. 18k found.

13
Q

How many genes with variants were found?

A

~20k.

14
Q

How many mRNA genes were found? What does this mean?

A

~98k. On average, about 5 mRNAs can be made from each gene, meaning we technically have ~100k genes.

15
Q

What % of the genome is coding? What % is repeating junk DNA?

A

Coding -

16
Q

What are some issues with being able to sequence the human genome?

A
  • Who owns the information.
  • Who can access it (police, insurers, employers etc)
  • Impact on a person
  • Foetal genetic testing - counselling/accuracy
  • Patenting the sequence - impact on medical discoveries
17
Q

What % of the genome encodes small RNAs? What do they do?

A

8-20%. They are regulatory, and can inhibit mRNA translation.

18
Q

Why is junk DNA believed to be so important?

A

It is believed to be like the operating system of the genome, running the coding genes.

19
Q

What number of RNAs are believed to control how a given protein is switched on or off?

A

About 10 RNAs for every protein. The exact number depends on the cell type and developmental stage.

20
Q

What is the output of classical sequencing methods such as Sanger sequencing?

A

500-1,000 base pairs per read.

21
Q

What is the output of next generation sequencing methods (NGS)?

A

Billions of base pairs.

22
Q

Describe how Sanger sequencing works (5).

A
  • DNA sequence is amplified (PCR).
  • Primers are annealed.
  • dNTPs are used for extension.
  • ddNTPs labelled with fluorescence are used to terminate the sequence one base at a time.
  • Fragments are separated by size using gel/capillary electrophoresis.
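
The termination and separation steps can be mimicked with a toy simulation. This is only a sketch under loud assumptions: it works on the strand being synthesized rather than the template, every position has the same made-up chance `p_stop` of incorporating a labelled ddNTP, and the function names are hypothetical.

```python
import random


def terminated_fragments(strand, copies=2000, p_stop=0.1, seed=0):
    """Simulate many extensions; each stops at a random labelled ddNTP."""
    rng = random.Random(seed)
    fragments = set()
    for _ in range(copies):
        for length, base in enumerate(strand, start=1):
            if rng.random() < p_stop:
                # Fragment of this length ends in a labelled copy of `base`.
                fragments.add((length, base))
                break
    return fragments


def read_by_size(fragments):
    """Electrophoresis stand-in: sort by length, read the terminal labels."""
    return "".join(base for _, base in sorted(fragments))


strand = "ATGGCGTACGTT"  # hypothetical synthesized strand
fragments = terminated_fragments(strand)
print(read_by_size(fragments))
```

With enough copies, every length from 1 to the full strand is represented, so reading the terminal label of each fragment in size order recovers the sequence.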
23
Q

What are the benefits of NGS? What are the limitations?

A

Benefits
-Huge sequencing capacity compared with classical sequencing
-Rapid throughput/output - very quick
-No gel electrophoresis needed
Limitations
-Expensive, only economical when sequencing a large number of base pairs

24
Q

How does NGS work (2)?

A
  • The full genome is immobilised on a chip as fragments about 100 base pairs long.
  • All fragments are sequenced at once, very quickly.

25
Q

What is the sequence quality score in NGS?

A

A prediction of the probability of an error in base calling.
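
As a worked example, the sketch below assumes the widely used Phred convention, Q = -10 · log10(P), where P is the probability that a base call is wrong. The card itself does not name a scale, so the Phred formula, the FASTQ offset of 33 and the function names are all assumptions here.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score for a given error probability."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (assuming Phred+33 ASCII encoding)."""
    return [ord(c) - offset for c in qual]


for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))

print(decode_fastq_quality("I5!"))
```

Under this convention, every 10 points of quality corresponds to a tenfold drop in the chance of a wrong base call.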

26
Q

What is the most common way of genome assembly?

A

De novo assembly.

27
Q

How does de novo assembly work?

A
  • Data from sequencing is partitioned
  • Overlaps are found between datasets, building the genome gradually
  • Forms segments called contigs
  • Contigs are used to form scaffolds
28
Q

Does de novo assembly require a reference sequence?

A

No.

29
Q

Gaps can be found between contigs in de novo assembly. How can they be closed (2)?

A

If a clone happens to span the gap, it can be used to close it.

Otherwise, the gap can be closed using PCR with primers at the ends of the gap.

30
Q

What is annotation in bioinformatics?

A

Determining what the gene does.

31
Q

What data structure is de novo assembly based on?

A

De Bruijn graphs.
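
A minimal sketch of the idea: each k-mer in the reads becomes an edge from its first k-1 bases to its last k-1 bases, and a contig is spelled out by walking the graph. The reads, the choice of k = 5 and the function names are assumptions for illustration. The walk only follows an unbranched path, whereas real assemblers must also handle branches, repeats and sequencing errors.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unbranched(graph):
    """Spell a contig from a node with no incoming edge until a branch or end."""
    targets = {t for successors in graph.values() for t in successors}
    node = next(n for n in graph if n not in targets)
    contig, seen = node, {node}
    while node in graph and len(graph[node]) == 1:
        node = graph[node][0]
        if node in seen:  # guard against cycles
            break
        seen.add(node)
        contig += node[-1]
    return contig


# Hypothetical error-free reads from one strand.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
graph = de_bruijn_graph(reads, k=5)
print(walk_unbranched(graph))
```

With a smaller k such as 4, the repeated "CGT" in these reads would create a branch in the graph, which shows why the choice of k matters.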