Core lecture Flashcards

1
Q

Which format usually stores sequencing data?

A

FASTQ for raw reads with per-base quality scores; FASTA for sequences without quality scores

2
Q

What does “>” mean in FASTA files?

A

Indicates the start of a new sequence entry
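
As a rough illustration of how the “>” header line separates entries, here is a minimal sketch of a FASTA reader. The function name `parse_fasta` and the example records are made up for this card; real projects would normally rely on an established parser.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A ">" line closes the previous entry and opens a new one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-entry example
example = ">seq1 demo\nACGT\nACGT\n>seq2\nTTGA\n"
records = list(parse_fasta(example))
print(records)
```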

3
Q

What do the ASCII character and the Phred score do?

A

The Phred score expresses the quality of each base call, that is, how reliable each sequenced nucleotide in a run is. The ASCII character encodes that score as a single character.
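
A small sketch of how a quality string can be turned back into Phred scores. It assumes the common Phred+33 encoding (score = ASCII code minus 33); the function name and the example string are illustrative only.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string to Phred scores (offset 33 assumed)."""
    return [ord(ch) - 33 for ch in quality_string]


# Hypothetical quality string for a 5-base read
scores = decode_phred33("II5+!")
print(scores)  # [40, 40, 20, 10, 0]
```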

4
Q

What does Q20 mean?

A

Q20 means a quality value of 20. The error probability can be calculated as P = 10^(-Q/10) = 10^(-20/10) = 1/100 = 0.01, i.e. a 1% chance that the base call is wrong.
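
The same arithmetic can be checked in a few lines of Python. This is only a sketch of the Phred formula P = 10^(-Q/10) and its inverse; the function names are chosen for this card.

```python
import math


def phred_to_error_probability(q):
    """Error probability for a Phred quality score q."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(q, phred_to_error_probability(q))
```

Each step of 10 in Q lowers the error probability by a factor of ten, so Q30 corresponds to 1 error in 1000.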

5
Q

What is the typical purpose of base calling in NGS analysis?

A

To identify the nucleotide sequence from the raw data.

6
Q

What are paired end reads?

A

Sequences read from both ends of a DNA fragment

7
Q

What is the significance of multiplexing?

A

It allows multiple samples to be sequenced together

8
Q

What is the main advantage of storing quality scores as ASCII characters?

A

It reduces the file size by using only 1 byte per score

9
Q

What is the implication of having a “mate-pair”?

A

The two reads of the pair are far apart and face away from each other

10
Q

What is the “Seed and Extend algorithm”?

A

The seed-and-extend algorithm is used to align sequenced DNA reads to a reference genome.

Seed: the algorithm starts by finding a small piece of the sequencing read (the seed) that matches the reference genome. This is usually done quickly using a hash table.

Extend: once a seed match is found, the algorithm tries to extend the match in both directions to align the rest of the read to the genome. It keeps extending until too many mismatches are found.
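
The two steps can be sketched in Python as follows. This is a simplified toy version, not how any particular aligner is implemented: it assumes ungapped extension, exact seeds of length `k`, and a fixed mismatch limit, and all names (`build_kmer_index`, `seed_and_extend`, `max_mismatches`) are invented for the example.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash table mapping every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_extend(read, reference, index, k, max_mismatches=2):
    """Return (reference_start, mismatches) of the best ungapped hit, or None."""
    best = None
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for hit in index.get(seed, []):
            start = hit - offset  # where the full read would start on the reference
            if start < 0 or start + len(read) > len(reference):
                continue
            mismatches = 0
            # Extend left and right of the seed, counting mismatches.
            for i in range(len(read)):
                if offset <= i < offset + k:
                    continue  # seed bases already match exactly
                if read[i] != reference[start + i]:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # too many mismatches, give up on this seed hit
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


# Hypothetical reference and a read with one mismatch
reference = "TTGACCGTAGGCTAACGT"
index = build_kmer_index(reference, 4)
hit = seed_and_extend("CCGTAGCCTA", reference, index, 4)
print(hit)  # (4, 1)
```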

11
Q

What does NGS stand for?

A

Next-generation sequencing

12
Q

Which method is commonly used for sequencing DNA in NGS?

A

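Sequencing by synthesis (for example on Illumina platforms), performed after PCR amplification of the library fragments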

13
Q

What is a primary application of NGS technology?

A

Genetic disorder diagnosis

14
Q

How does NGS differ from traditional Sanger sequencing?

A

NGS allows parallel sequencing of multiple fragments

15
Q

What is a significant advantage of NGS over previous sequencing technologies?

A

Higher throughput

16
Q

In NGS, what is the purpose of using barcodes in sequencing libraries?

A

To track the sample origin
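
A toy sketch of how barcodes let pooled reads be assigned back to their samples (demultiplexing). It assumes the barcode is the first few bases of each read and must match exactly; real workflows often read the barcode separately and tolerate mismatches. The barcodes, sample names, and function name are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using an exact-match barcode at the read start."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        # Reads with an unknown barcode are kept apart rather than dropped.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical barcodes and pooled reads
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGTTT", "TGCAAAACCC", "ACGTCCCAAA", "NNNNGGGGGG"]
result = demultiplex(reads, barcodes, 4)
print(result)
```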

17
Q

What type of biological molecules can be sequenced using NGS?

A

DNA and RNA

18
Q

How has NGS impacted the field of personalized medicine?

A

It has enabled tailored treatments based on genetic makeup

19
Q

What is one of the challenges in handling NGS data?

A

The high volume of data

20
Q

What does the term “read length” refer to in NGS?

A

The number of bases in each sequenced read

21
Q

How has NGS technology influenced cancer research?

A

By identifying genetic mutations associated with cancers

22
Q

What is the role of bioinformatics in NGS?

A

To analyze and interpret the vast amount of sequencing data

23
Q

What does de novo sequencing mean in the context of NGS?

A

Sequencing without a reference genome

24
Q

How does NGS contribute to the study of rare genetic disorders?

A

By enabling the identification of the genetic mutations responsible for them

25
In NGS, what does the term "coverage" refer to?
The number of times a nucleotide is sequenced
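
To make "number of times a nucleotide is sequenced" concrete, here is a minimal sketch that counts per-base depth from aligned read intervals. The intervals are assumed to be 0-based and half-open, and both the function name and the example alignments are made up.

```python
def per_base_depth(reference_length, alignments):
    """Count how many aligned reads cover each reference position.

    alignments: iterable of (start, end) intervals, 0-based, end exclusive.
    """
    depth = [0] * reference_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, reference_length)):
            depth[pos] += 1
    return depth


# Hypothetical: three reads aligned to a 10-base reference
alignments = [(0, 5), (2, 8), (4, 10)]
depth = per_base_depth(10, alignments)
mean_depth = sum(depth) / len(depth)
print(depth)       # [1, 1, 2, 2, 3, 2, 2, 2, 1, 1]
print(mean_depth)  # 1.7
```
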
26
What is the significance of multiplexing in NGS?
Allows sequencing of multiple samples simultaneously
27
Which is an important consideration in NGS data analysis?
The accuracy and integrity of the data
28
How does NGS facilitate the study of microbial communities?
Through metagenomics, sequencing DNA from environmental samples
29
What role does NGS play in agricultural genetics?
Assisting in the development of genetically modified crops
30
What is one of the future directions or potentials of NGS technology?
Enhanced understanding and application in clinical settings
31
What is the main technique used in 1st generation sequencing?
Sanger sequencing
32
Which generation of sequencing is known for introducing massively parallel sequencing?
2nd generation
33
What is a key characteristic of 3rd generation sequencing technologies?
Single-molecule sequencing
34
Which technology is typically associated with 2nd generation sequencing?
Illumina HiSeq
35
What was a major limitation of 1st generation sequencing technologies?
Low throughput
36
Which is a benefit of 3rd generation sequencing?
Real-time sequencing
37
2nd generation sequencing is also known as:
Next-Generation Sequencing
38
Which generation of sequencing first introduced the concept of 'reads'?
1st generation, though the concept was developed further in 2nd generation
39
In which generation does sequencing occur without the need to stop and start the process?
3rd generation (3rd generation technologies like PacBio and Oxford Nanopore offer real-time sequencing without the need to stop and start)
40
Which sequencing technology typically generates the longest read lengths?
Oxford Nanopore
41
What is the first step in assessing NGS data quality?
Generating FastQC reports
42
Why is trimming performed on NGS data?
To remove low-quality bases from the ends of reads
43
When is k-mer correction typically performed in the NGS data preprocessing pipeline?
Before de novo assembly
44
What does a sliding window in quality control processing do?
Identifies and trims low-quality regions in reads
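
A simplified sketch of sliding-window trimming: scan the read from the 5' end and cut it at the first window whose mean quality drops below a threshold. The window size, threshold, and function name are assumptions for illustration, and real trimming tools differ in detail.

```python
def sliding_window_trim(sequence, qualities, window=4, min_mean_quality=20):
    """Cut the read at the first window whose mean quality is too low."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    last_start = max(len(qualities) - window, 0)
    for start in range(last_start + 1):
        window_scores = qualities[start:start + window]
        if window_scores and sum(window_scores) / len(window_scores) < min_mean_quality:
            # Keep only the bases before the low-quality window.
            return sequence[:start], qualities[:start]
    return sequence, qualities


# Hypothetical read whose quality decays toward the 3' end
seq = "ACGTACGTAC"
quals = [38, 38, 37, 36, 35, 30, 22, 12, 8, 5]
trimmed_seq, trimmed_quals = sliding_window_trim(seq, quals)
print(trimmed_seq)  # ACGTA
```
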
45
Why is it important to remove adapters in NGS data preprocessing?
They interfere with alignment and de novo assembly
46
In the context of NGS, why might sequences be overrepresented?
For several reasons: library preparation bias, sequencing bias, biological factors, or contamination
47
Which file format is often used for storing compressed NGS data?
gzip
48
What is an outcome of merging paired-end reads in NGS data preprocessing?
Error correction for overlapping regions
49
What does it mean if the FastQC report indicates poor quality at the start of reads?
The first few bases may need to be trimmed
50
What is a common method to handle large amounts of NGS data?
Keep data compressed whenever possible and use workflow managers like Snakemake
51
What is a Single Nucleotide Variation (SNV)?
A change in a single nucleotide
52
What does a Polymorphism (SNP) imply?
Presence in more than 1% of the population
53
What is the significance of transition/transversion ratio in human genetics?
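Indicates the balance of mutation types: transitions (purine to purine or pyrimidine to pyrimidine) versus transversions (purine to pyrimidine or the reverse)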
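
A short sketch of how the ratio can be computed from a list of single-nucleotide changes. A transition swaps two purines (A, G) or two pyrimidines (C, T); any other substitution is a transversion. The function names and the example variants are invented for this card.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if ref>alt stays within purines or within pyrimidines."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs):
    """Transition/transversion ratio for (ref, alt) pairs with ref != alt."""
    transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    transversions = len(snvs) - transitions
    if transversions == 0:
        raise ValueError("no transversions, ratio is undefined")
    return transitions / transversions


# Hypothetical variant list: 4 transitions, 2 transversions
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G"), ("T", "C")]
ratio = ts_tv_ratio(snvs)
print(ratio)  # 2.0
```
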
54
What are the consequences of coding mutations?
Can change amino acid sequences
55
What differentiates germline mutations from somatic mutations?
Germline mutations can be passed to offspring
56
What is the role of untranslated regions (UTRs) in gene expression?
Regulating gene expression
57
How do pathogenic mutations affect organisms?
Can lead to diseases or disorders
57
What is a consequence of non-coding mutations?
Can lead to changes in gene expression
58
What is a polygenic risk score used for?
Predicting the risk of complex diseases
58
What is the focus of personalized medicine?
Tailoring medical treatment to individual genetic profiles
59
What is the primary goal of variant calling in NGS data analysis?
Identifying differences from a reference genome
60
Which format is commonly used to store variant calling data?
VCF
61
What does 'hard filtering' in variant calling refer to?
Applying fixed thresholds on variant annotations (for example quality or depth) to separate likely true variants from false ones
62
What is a 'genotype' in the context of variant calling?
The genetic makeup at a particular locus
63
How does 'base quality score recalibration' enhance variant calling?
By recalibrating the probability of base call errors
64
In VCF files, what does the 'INFO' field provide?
Metadata about the variants
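
As an illustration of what that metadata looks like, the sketch below splits the INFO column of one VCF data line into key/value pairs. The helper name `parse_info` is made up, and the record is an example line in standard VCF layout; real pipelines would normally use a dedicated VCF library.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=14;AF=0.5;DB' into a dict."""
    info = {}
    if info_field == ".":
        return info  # "." means no INFO entries
    for entry in info_field.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info[key] = value.split(",")  # values may be comma-separated lists
        else:
            info[entry] = True  # flag entries carry no value
    return info


# Example tab-separated VCF data line (first 8 columns)
line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
chrom, pos, variant_id, ref, alt, qual, filt, info_field = line.split("\t")[:8]
info = parse_info(info_field)
print(info["DP"], info["AF"], info["DB"])
```
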
65
What role do known polymorphic sites play in variant filtration?
Help distinguish between true and false variants
66
What is indicated by a high Phred quality score in variant calling?
High confidence in the variant call
67
Why is 'depth of coverage' important in variant calling?
Indicates how many times a base is sequenced
68
What is the significance of 'allele frequency' in variant analysis?
It indicates the rarity of the allele in the population