Genomics and bioinformatics. Flashcards
What is the definition of bioinformatics?
The science of collecting and analysing complex biological data such as the genetic code.
Most things decrease in cost following Moore’s law, but this was not the case for genome sequencing after the development of NGS. What was the decrease in cost per megabase from 2001 to 2014?
2001: $10,000
2014: $0.10
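To put these two figures in perspective, the short sketch below works out the fold decrease and compares it with a Moore's-law pace. It assumes Moore's law means a halving of cost roughly every two years; that rate is an assumption for illustration and is not stated in the card.

```python
# Figures from the card above (cost per megabase, in US dollars).
cost_2001 = 10_000.0
cost_2014 = 0.10
years = 2014 - 2001

# Observed fold decrease in sequencing cost over the period.
observed_fold = cost_2001 / cost_2014

# Assumption: a Moore's-law pace halves the cost every 2 years.
moore_fold = 2 ** (years / 2)

print(f"Observed: {observed_fold:,.0f}-fold cheaper")
print(f"Moore's-law pace: about {moore_fold:.0f}-fold cheaper")
```

Under that assumption, the observed drop is roughly a thousand times larger than a Moore's-law pace would give over the same 13 years.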
What is the bioinformatics gap?
The problem of storing and analysing all the genetic data produced by NGS.
What two sequencing techniques are often used in health care?
Illumina HiSeq and MiSeq.
What sort of sequencing produces ‘big data’?
NGS.
Does Illumina HiSeq have a better read length or read depth?
Read depth: it produces 8 billion reads of 150 bp in high-output mode.
What method of sequencing now takes up roughly 80% of the market?
Illumina.
What format does a biological file have to be in?
A plain-text (.txt) file, NOT a word-processor document.
What is the most common file format for biological data?
FASTA.
What is found on the title line of a FASTA file?
A > character, followed by an identifier and a description.
How many characters can be on each line in a FASTA file?
60.
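The FASTA layout described in the last few cards (a > title line, then sequence lines of 60 characters) can be sketched as below. This is a minimal illustration, not a full FASTA implementation; the function names `format_fasta` and `parse_fasta` and the example record are hypothetical.

```python
def format_fasta(identifier, description, sequence, width=60):
    """Return a FASTA record: a '>' title line, then the sequence wrapped at `width`."""
    title = f">{identifier} {description}".rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([title] + lines)


def parse_fasta(text):
    """Parse FASTA text into a list of (identifier, description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new title line closes the previous record, if there was one.
            if header is not None:
                records.append((*header, "".join(chunks)))
            identifier, _, description = line[1:].partition(" ")
            header, chunks = (identifier, description), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((*header, "".join(chunks)))
    return records


# Hypothetical 160-base sequence, wrapped into lines of 60, 60 and 40 characters.
record = format_fasta("seq1", "hypothetical example sequence", "ACGT" * 40)
print(record)
print(parse_fasta(record))
```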
When sequences are compared what can they be described to have?
A % identity.
What is a padding character?
A - that is inserted to form a ‘gap’; this maximises the sequence identity.
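As a rough illustration of how padding raises the % identity, the sketch below counts matching columns in two already-aligned sequences. It assumes identity is defined as matches divided by the total number of alignment columns (definitions vary between tools); the function name `percent_identity` and the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns where both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    # A column counts as a match only if the characters agree and are not gaps.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


# Without padding: 5 of 8 columns match.
print(percent_identity("ACGTTGCA", "ACGTGCAA"))

# With one padding character in each sequence: 7 of 9 columns match.
print(percent_identity("ACGTTGCA-", "ACGT-GCAA"))
```

Here the same two sequences go from 62.5% to about 77.8% identity once the gaps are placed.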
What is sequence alignment used for?
Used to identify homology between sequences with a common ancestor.
When using - to fill in gaps (padding character) what do you need to assume?
That each sequence is equivalent.
What can sequence alignment be used to do other than work out a % identity (2 things)?
1. Assemble short reads into contigs.
2. Map reads to a reference genome (resequencing).
What can resequencing allow you to identify?
Sequence variants and transcribed regions of the genome, allowing you to quantify transcripts and the level of transcription through RNA-seq.
Would x-- or -x- be more likely?
x--, as it is caused by a single event. Both are still possible, however.
Scoring systems are used in sequence alignments. Matches are given a (+) score. What two events give a (-) score?
Gaps and mismatches.
How do the penalties given to longer gaps differ slightly from the penalties for shorter gaps?
In a longer gap, the first (-) gets the full gap penalty, while each following (-) can get a slightly smaller penalty because it only extends the existing gap.
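The scoring scheme from the last few cards (a positive score for matches, negative scores for mismatches, and a smaller penalty for extending a gap than for opening one) can be sketched as follows. The numeric values and the function name `score_alignment` are hypothetical, chosen only to show the idea; real tools use their own scoring parameters.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1,
                    gap_open=-2, gap_extend=-1, gap="-"):
    """Score two aligned sequences with separate gap-opening and gap-extension penalties."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    prev_gap_in = None  # which sequence held the gap in the previous column
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            raise ValueError("a column cannot be a gap in both sequences")
        if a == gap or b == gap:
            gap_in = "a" if a == gap else "b"
            # Continuing a gap in the same sequence only pays the extension penalty.
            score += gap_extend if gap_in == prev_gap_in else gap_open
            prev_gap_in = gap_in
        else:
            score += match if a == b else mismatch
            prev_gap_in = None
    return score


# One gap of length two (x--): 1 match, 1 gap opening, 1 gap extension.
print(score_alignment("AAA", "A--"))

# Two separate gaps (-x-): 1 match, 2 gap openings.
print(score_alignment("AAA", "-A-"))
```

With these illustrative values the single longer gap scores -2 and the two separate gaps score -3, which mirrors the card's point that x-- is more likely than -x-.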
What are gaps also called in sequence alignment?
INDELs.
What does calculating identities for sequences allow you to do?
Determine evolutionary relationships.
What do sequence alignments allow you to create?
Contigs.
Sequence alignments can be used to map reads to a reference genome. What is this process also called?
Resequencing.