Genomics Flashcards
What is the adjusted Rand index?
Adjusted Rand index: A measure of the similarity between two data clusterings, adjusted for chance grouping of the elements.
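Worked example (not from the source): a minimal sketch of how the adjusted Rand index can be computed from pair counts, assuming the standard contingency-table formulation. The function name `adjusted_rand_index` and the toy labels are illustrative only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical

    # Pairs of items placed together in both clusterings (contingency cells).
    together_in_both = sum(
        comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values()
    )
    # Pairs placed together within each clustering on its own.
    together_in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Agreement expected from chance grouping alone.
    expected = together_in_a * together_in_b / total_pairs
    maximum = (together_in_a + together_in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in one cluster
    return (together_in_both - expected) / (maximum - expected)


# Same grouping with different label names scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], ["x", "x", "y", "y"]))
```

A value of 1 means the two clusterings agree perfectly, values near 0 mean agreement no better than chance, and negative values mean agreement worse than chance.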
What is CoC analysis?
Cluster of clusters (CoC) analysis: A method of obtaining clusters (e.g., of patient samples) that represent a consensus among the individual data types (in this study, we incorporated DNA methylation, DNA copy number, mRNA expression, and microRNA expression into the analysis).
What is a DM-BER?
Double-minute chromosome–breakpoint-enriched region (DM-BER): As detected by whole-exome and whole-genome sequencing, highly amplified gene regions that are connected by DNA rearrangement breakpoints and allow cancer cells to maintain high levels of oncogene amplification.
Define exon
The portion of a gene that encodes amino acids to form a protein.
Define fusion transcript
A transcript composed of parts of two separate genes joined together by a chromosomal rearrangement, in some cases with functional consequences for oncogenesis, therapy, or both.
What is methylation?
The attachment of methyl groups to DNA at cytosine bases. Methylation is correlated with reduced transcription of the gene immediately downstream of the methylated site.
Define microRNA
A short regulatory form of RNA that binds to a target RNA and generally suppresses its translation by ribosomes.
What is meant by ‘molecular subtype’?
Molecular subtype: Subgroup of a tumor type based on molecular characteristics (rather than, e.g., histologic or clinical features); in this study, a molecular subtype is one of three classes based on IDH mutation and 1p/19q codeletion status.
Define mutation frequency
Mutation frequency: The number of mutations detected per megabase of DNA.
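Worked example (not from the source): the definition above is a simple ratio. The sketch below assumes the denominator is the number of bases that were actually covered by sequencing; the function name and the numbers are hypothetical.

```python
def mutations_per_megabase(n_mutations, covered_bases):
    """Mutation frequency as mutations per megabase (1,000,000 bases)."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)


# Hypothetical tumor: 60 mutations found across 30 Mb of covered sequence.
print(mutations_per_megabase(60, 30_000_000))
```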
What is meant by ‘significantly mutated gene’?
Significantly mutated gene: A gene with a greater number of mutations than expected on the basis of the background mutation rate, which suggests a role in oncogenesis.
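Worked example (not from the source): one simplified way to ask whether a gene has more mutations than expected is a binomial tail probability. This sketch assumes a single uniform background rate per base, which real analyses refine considerably; all names and numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by summing the lower tail."""
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower


# Hypothetical gene: 1,500 coding bases sequenced in 100 tumors,
# background rate of 2 mutations per megabase, 5 mutations observed.
bases_at_risk = 1_500 * 100
background_rate = 2 / 1_000_000
p_value = binomial_upper_tail(5, bases_at_risk, background_rate)
print(p_value)
```

A small tail probability suggests more mutations than the background rate alone would explain. Subtracting from 1 loses precision for extremely small probabilities, so this form is only suitable for illustration.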
Sanger sequencing: how does the classical chain-termination method work?
A method of DNA sequencing based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. The classical chain-termination method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, normal deoxynucleotide triphosphates (dNTPs), and modified dideoxynucleotide triphosphates (ddNTPs), the latter of which terminate DNA strand elongation. These chain-terminating nucleotides lack the 3’-OH group required for the formation of a phosphodiester bond between two nucleotides, so DNA polymerase ceases extension when a ddNTP is incorporated. The ddNTPs may be radioactively or fluorescently labeled for detection in automated sequencing machines. The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP, and dTTP) and the DNA polymerase. To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP). Each deoxynucleotide is present in approximately 100-fold excess over the corresponding dideoxynucleotide (e.g., 0.5 mM dATP : 0.005 mM ddATP), so that enough fragments are produced while the complete sequence is still copied. Following rounds of template DNA extension from the bound primer, the resulting DNA fragments are heat denatured and separated by size using gel electrophoresis. This is frequently performed using a denaturing polyacrylamide-urea gel with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). In the original publication of 1977, the formation of base-paired loops of ssDNA was a cause of serious difficulty in resolving bands at some locations. The DNA bands may then be visualized by autoradiography or UV light, and the DNA sequence can be read directly off the X-ray film or gel image.
On an autoradiograph, X-ray film is exposed to the gel, and the dark bands correspond to DNA fragments of different lengths. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The relative positions of the different bands among the four lanes, from bottom to top, are then used to read the DNA sequence.
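Worked example (not from the source): a toy simulation of the four-lane readout described above. It assumes idealized chemistry in which every possible termination product is present, and it counts fragment length from the first base after the primer; the function names are illustrative.

```python
def chain_termination_lanes(synthesized_strand):
    """Fragment lengths expected in each of the four ddNTP reactions.

    A fragment of length n appears in lane X when the n-th base added
    to the primer is X, because the matching ddNTP can end the chain there.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence from the shortest band (bottom) to the longest (top)."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = chain_termination_lanes("GATTACA")
print(lanes)            # which fragment lengths run in which lane
print(read_gel(lanes))  # sequence recovered by reading bottom to top
```

Shorter fragments migrate farther, so sorting by length reproduces the bottom-to-top reading order.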
In which instances is Sanger sequencing still often useful?
The Sanger method remains in wide use for smaller-scale projects, for validation of next-generation sequencing results, and for obtaining especially long contiguous DNA sequence reads (>500 nucleotides).
What are the limitations of chain-termination (Sanger) sequencing?
Limitations include non-specific binding of the primer to the DNA, affecting accurate read-out of the DNA sequence, and DNA secondary structures affecting the fidelity of the sequence.
Describe two technical variations of chain-termination sequencing.
Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5’ end with a fluorescent dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation. The later development by Leroy Hood and coworkers of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing.
What does CRISPR stand for?
Clustered regularly interspaced short palindromic repeats. Fifty years ago, microbiologists sparked the recombinant-DNA revolution with the discovery that bacteria have innate immune systems based on restriction enzymes. These enzymes bind and cut invading viral genomes at specific short sequences, and scientists rapidly repurposed them to cut and paste DNA in vitro, transforming biologic science and giving rise to the biotechnology industry. Ten years ago, microbiologists discovered that bacteria also harbor adaptive immune systems, and subsequent progress has been breathtakingly rapid. Between 2005 and 2009, microbial genetic studies conducted by the laboratories of Mojica, Jansen, Koonin, Horvath, van der Oost, Sontheimer, Marraffini, and others revealed that bacteria have a programmable mechanism that directs nucleases, such as Cas9, to bind and cut invading DNA that matches “guide RNAs” encoded in specific bacterial genome regions containing clustered regularly interspaced short palindromic repeats (CRISPR).
How might CRISPR technology be applied to HIV?
To treat HIV infection, physicians might edit a patient’s immune cells to delete the CCR5 gene, conferring the resistance to HIV carried by the 1% of the U.S. population lacking functional copies of this gene. To treat progressive blindness caused by dominant forms of retinitis pigmentosa, they might inactivate the mutant allele in retinal cells. To prevent the myocardial infarctions that kill patients with homozygous familial hypercholesterolemia, they might edit liver cells to restore a functional copy of the gene encoding low-density lipoprotein receptors. Editing of blood stem cells might cure sickle cell anemia and hemophilia. These goals will require overcoming serious technical challenges (such as avoiding “off-target” edits elsewhere in the genome, which might give rise to cancer), but they pose no unique ethical issues because they affect only a patient’s own somatic cells.
Describe four central issues with human germline editing using CRISPR-Cas9 technology.
1. Technical issues: whether genome editing can be performed with sufficient precision to permit scientists to responsibly contemplate creating genetically modified babies. Currently, the technology is far from ready: Liang and colleagues recently applied genome editing to human tripronuclear zygotes (abnormal products of in vitro fertilization [IVF] that are incapable of developing in vivo) and documented problems including incomplete editing, inaccurate editing, and off-target mutations. Even with improved accuracy, the process is unlikely to be risk-free. 2. Medical need: do compelling medical needs outweigh the risks both from inaccurate editing and from unanticipated effects of the intended edits? Various potential applications must be considered. 3. Who has the right to decide? Can parents consent for future generations? Some people will argue that parents should have unfettered autonomy: that modifying one’s progeny is akin to using PGD to avoid genetic diseases or choosing sperm donors on the basis of intellectual or athletic prowess. Yet parental autonomy must be weighed against the interests of future generations who cannot consent to the genetic modifications their flesh will be heir to. 4. Morality: what’s right and wrong and how we ought to live as a society. Authorizing scientists to make permanent changes to the DNA of our species is a decision that should require broad societal understanding and consent. It has been only about a decade since we first read the human genome. We should exercise great caution before we begin to rewrite it.
Describe potential applications of germline editing using CRISPR technology and arguments for/against.
i) Preventing devastating monogenic diseases, such as Huntington’s disease. Though avoiding the roughly 3600 rare monogenic disorders caused by known disease genes is a compelling goal, the rationale for embryo editing largely evaporates under careful scrutiny. Genome editing would require making IVF embryos, using preimplantation genetic diagnosis (PGD) to identify those that would have the disease, repairing the gene, and implanting the embryo. Yet it would be easier and safer simply to use PGD to identify and implant the embryos that aren’t at risk: the proportion is high in the typical cases of a parent heterozygous for a dominant disease (50%) or two parents who are carriers for a recessive disease (75%). To reduce the incidence of monogenic disease, what’s needed most is not embryo editing, but routine genetic testing so that the many couples who don’t know they are at risk can avail themselves of PGD. ii) Reducing the risk of common diseases, such as heart disease, cancer, diabetes, and multiple sclerosis. The heritable influence on disease risk is polygenic, shaped by variants in dozens to hundreds of genes. Common variants tend to make only modest contributions (for example, reducing risk from 10% to 9.5%); rare variants sometimes have larger effects, including a few for which heterozygosity provides significant protection against disease. iii) Reshaping the human gene pool by endowing all children with many naturally occurring “protective” variants. However, genetic variants that decrease risk for some diseases can increase risk for others. (For example, the CCR5 mutations that protect against HIV also elevate the risk for West Nile virus, and multiple genes have variants with opposing effects on risk for type 1 diabetes and Crohn’s disease.) The full medical effect of most variants is poorly characterized, let alone the combined effects of many variants. 
Safety studies would be needed to assess effects across various genetic backgrounds and environmental exposures. The situation is particularly dicey for rare protective heterozygous variants: most have never been seen in the homozygous state in humans and might have deleterious effects. Yet heterozygous parents would routinely produce homozygous children (one quarter of the total), unless humans forswore natural reproduction in favor of IVF. iv) Currently, the best arguments might be for eliminating the ε4 variant at the APOE gene (which increases risk for Alzheimer’s disease and cardiovascular disease) and bestowing null alleles at the PCSK9 gene (which reduces the risk of myocardial infarction). Still, our knowledge is incomplete. For example, APOE ε4 has also been reported to be associated with better episodic and working memory in young adults. v) Why limit ourselves to naturally occurring genetic variants? Why not use synthetic biology to write new cellular circuits that, for example, cause cells to commit suicide if they start down the road toward cancer? But such efforts would be reckless, at least for now. We remain terrible at predicting the consequences of even simple genetic modifications in mice. One cautionary tale among many is a genetic modification of the tp53 gene that protected mice against cancer while unexpectedly causing premature aging. We would also need to anticipate the potential interactions among the diverse genetic circuits that creative scientists will cast into the gene pool. Mistakes would be inevitable, and there would be no way to recall novel genes from the human population. vi) Reshape non-medical traits. Height may prove challenging (the hundreds of natural variants have tiny effects), but hair and eye color may be pliable. Disruption of the MC1R gene is associated with bright red hair, although it also heightens the risk of melanoma. 
Sports-minded parents might want to introduce the overactive erythropoietin gene that conferred high oxygen-carrying ability on a seven-time Olympic medalist in cross-country skiing. Nonnatural genetic modifications hold even bolder prospects — and risks.
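Worked example (not from the source): the proportions quoted in this answer (50% of embryos unaffected for a parent heterozygous for a dominant disease, 75% unaffected for two recessive carriers, and one quarter homozygous children from two heterozygous parents) follow from simple Mendelian segregation. The sketch below assumes a single gene with two alleles and equal transmission of each allele; the allele letters are arbitrary.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Expected genotype proportions for a one-gene cross.

    Each parent is a two-letter genotype such as "Dd"; each of the four
    allele combinations is equally likely under Mendelian segregation.
    """
    proportions = {}
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted(allele1 + allele2))
        proportions[genotype] = proportions.get(genotype, 0) + Fraction(1, 4)
    return proportions


# Dominant disease allele D, one heterozygous parent: "dd" embryos are unaffected.
dominant = offspring_genotypes("Dd", "dd")
print(dominant["dd"])

# Recessive disease allele r, two carriers: only "rr" embryos are affected.
recessive = offspring_genotypes("Rr", "Rr")
print(1 - recessive["rr"])

# Two parents heterozygous for a variant v: share of homozygous "vv" children.
print(offspring_genotypes("Vv", "Vv")["vv"])
```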
What is X-chromosome skewing? By what mechanisms can it occur? How common is it? Why is it medically significant?
- Skewed X chromosome inactivation occurs when the inactivation of one X chromosome is favored over the other, leading to an uneven number of cells with each chromosome inactivated. It is usually defined as one allele being found on the active X chromosome in over 75% of cells; extreme skewing is when over 90% of cells have inactivated the same X chromosome. - It can be caused by a) primary nonrandom inactivation, either by chance due to a small cell pool or directed by genes, or b) secondary nonrandom inactivation, which occurs by selection. - Most females have some level of skewing. It is relatively common in adult females: around 35% of women have a skewed ratio over 70:30, and 7% of women have an extremely skewed ratio of over 90:10. - It is medically significant because of the potential for expression of disease genes on the X chromosome whose effects are normally masked by random X inactivation. - Background: X chromosome inactivation occurs in females to provide dosage compensation between the sexes. If females kept both X chromosomes active, they would have twice as many active X-linked genes as males, who have only one copy of the X chromosome. At approximately the time of implantation, one of the two X chromosomes is randomly selected for inactivation. The cell undergoes transcriptional and epigenetic changes to ensure this inactivation is permanent. All progeny of these initial cells maintain the inactivation of the same chromosome, resulting in a mosaic pattern of cells in females.
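Worked example (not from the source): primary skewing "by chance due to a small cell pool" can be illustrated with a binomial calculation. The sketch assumes each precursor cell independently inactivates either X with probability 1/2; the pool sizes are hypothetical and chosen only to show the trend.

```python
from math import comb


def prob_skewed_by_chance(n_cells, threshold=0.75):
    """Probability that more than `threshold` of n_cells inactivate the same X.

    Either X may be the favored one, so the one-sided tail is doubled;
    the two tails cannot overlap because threshold is at least 0.5.
    """
    if not 0.5 <= threshold < 1:
        raise ValueError("threshold must be in [0.5, 1)")
    one_sided = sum(
        comb(n_cells, k) for k in range(n_cells + 1) if k > threshold * n_cells
    )
    return 2 * one_sided / 2**n_cells


for pool_size in (4, 8, 16, 64):
    print(pool_size, prob_skewed_by_chance(pool_size))
```

The probability falls quickly as the pool grows, which is why chance skewing is associated with small cell pools.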
What is a ‘double minute’?
Double minutes are small fragments of extrachromosomal DNA that have been observed in a large number of human tumors, including breast, lung, ovary, colon, and most notably, neuroblastoma. They are a manifestation of gene amplification during the development of tumors, which gives the cells selective advantages for growth and survival. They frequently harbor amplified oncogenes and genes involved in drug resistance. Double minutes, like actual chromosomes, are composed of chromatin and replicate in the nucleus of the cell during cell division. Unlike typical chromosomes, they are composed of circular fragments of DNA, up to only a few million base pairs in size, and contain no centromere or telomere.
What is an ‘amplicon’?
An amplicon is a piece of DNA or RNA that is the source and/or product of natural or artificial amplification or replication events. It can be formed using various methods including PCR, ligase chain reactions (LCR), or natural gene duplication. In this context, “amplification” refers to the production of one or more copies of a genetic fragment or target sequence, specifically the amplicon. As the product of an amplification reaction, amplicon is used interchangeably with common laboratory terms, such as PCR product. Artificial amplification is used in research, forensics, and medicine for purposes that include detection and quantification of infectious agents, identification of human remains, and extracting genotypes from human hair. Natural gene duplication is implicated in several forms of human cancer including primary mediastinal B cell lymphoma and Hodgkin’s lymphoma. Amplicons in this context can refer both to sections of chromosomal DNA that have been excised, amplified, and reinserted elsewhere in the genome, and to extrachromosomal DNA known as double minutes, each of which can be composed of one or more genes. Amplification of the genes encoded by these amplicons generally increases transcription of those genes and ultimately the volume of associated proteins.
Oncogene. 2013 Jan 10;32(2):135-40. doi: 10.1038/onc.2012.48. Epub 2012 Feb 20. Clonal evolution of acute leukemia genomes. Jan M, Majeti R.
In large part, cancer results from the accumulation of multiple mutations in a single cell lineage that are sequentially acquired and subject to an evolutionary process where selection drives the expansion of more fit subclones. Owing to the technical challenge of distinguishing and isolating distinct cancer subclones, many aspects of this clonal evolution are poorly understood, including the diversity of different subclones in an individual cancer, the nature of the subclones contributing to relapse, and the identity of pre-cancerous mutations. These issues are not just important to our understanding of cancer biology, but are also clinically important given the need to understand the nature of subclones responsible for the refractory and relapsed disease that cause significant morbidity and mortality in patients. Recently, advanced genomic techniques have been used to investigate clonal diversity and evolution in acute leukemia. Studies of pediatric acute lymphoblastic leukemia (ALL) demonstrated that in individual patients there are multiple genetic subclones of leukemia-initiating cells, with a complex clonal architecture. Separate studies also investigating pediatric ALL determined that the clonal basis of relapse was variable and complex, with relapse often evolving from a clone ancestral to the predominant de novo leukemia clone. Additional studies in both ALL and acute myeloid leukemia have identified pre-leukemic mutations in some individual cases. This review will highlight these recent reports investigating the clonal evolution of acute leukemia genomes and discuss the implications for clinical therapy.
Nature. 2011 Jan 20;469(7330):356-61. doi: 10.1038/nature09650. Epub 2010 Dec 15. Genetic variegation of clonal architecture and propagating cells in leukaemia. Anderson K, Lutz C, van Delft FW, Bateman CM, Guo Y, Colman SM, Kempski H, Moorman AV, Titley I, Swansbury J, Kearney L, Enver T, Greaves M.
Little is known of the genetic architecture of cancer at the subclonal and single-cell level or in the cells responsible for cancer clone maintenance and propagation. Here we have examined this issue in childhood acute lymphoblastic leukaemia in which the ETV6-RUNX1 gene fusion is an early or initiating genetic lesion followed by a modest number of recurrent or ‘driver’ copy number alterations. By multiplexing fluorescence in situ hybridization probes for these mutations, up to eight genetic abnormalities can be detected in single cells, a genetic signature of subclones identified and a composite picture of subclonal architecture and putative ancestral trees assembled. Subclones in acute lymphoblastic leukaemia have variegated genetics and complex, nonlinear or branching evolutionary histories. Copy number alterations are independently and reiteratively acquired in subclones of individual patients, and in no preferential order. Clonal architecture is dynamic and is subject to change in the lead-up to a diagnosis and in relapse. Leukaemia propagating cells, assayed by serial transplantation in NOD/SCID IL2Rγ(null) mice, are also genetically variegated, mirroring subclonal patterns, and vary in competitive regenerative capacity in vivo. These data have implications for cancer genomics and for the targeted therapy of cancer.
A nucleotide comprises…
A sugar, a phosphate group, and one of four nitrogenous bases (A, T, G, or C in DNA).