Topic C & F: Genetics and Molecular Evolution Flashcards Preview

Flashcards in Topic C & F: Genetics and Molecular Evolution Deck (120):

What is paused RNA Polymerase II and what is its function?

The RNA polymerase II machinery initiates transcription but pauses just downstream of the promoter, waiting until the gene needs to be transcribed. A cell might not need a particular gene expressed all of the time, but the paused RNA polymerase allows the gene to be expressed more rapidly when it is needed.


What is the role of enhancers in transcription and describe a mechanism by which they act?

Enhancers are typically upstream of a gene, but can also lie downstream or within introns, and may be more than 1 kb away. They enhance txn of a gene by binding activator proteins and forming a loop-like structure that brings the enhancer into direct contact with the promoter. They generally act in cis.


How does DNA methylation repress gene transcription?

DNA methylation typically occurs on cytosines in CpG dinucleotides (for example, in CpG islands at promoters) and blocks TFs and other proteins from accessing the DNA.


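Describe two mechanisms that can lead to short telomeres.

1. Loss of capping: telomeres that aren't capped are degraded.
2. Ageing: telomeres shorten with each cell division (the end-replication problem) in cells that lack telomerase activity.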


How does the enzyme telomerase extend the ends?

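Telomerase is a reverse transcriptase that carries its own RNA template. The template base-pairs with the 3' overhang of the telomere, and the enzyme adds telomeric repeats (TTAGGG in humans) to that 3' end, then translocates and repeats. The complementary strand is filled in by the conventional replication machinery.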


Describe an experiment that would allow you to determine whether a long non-coding RNA is critical for cell differentiation. How would you distinguish whether the RNA itself or the act of txn is responsible for differentiation?

Design an shRNA to knock down the lncRNA in cells. As a control, you need cells in which you induce an shRNA that doesn't target your lncRNA of interest. You then determine whether knockdown of the lncRNA affects differentiation. To determine whether it is the RNA itself or the act of txn, you could use homologous recombination in mice to insert a sequence that generates a mutant copy of the lncRNA while leaving txn of the locus intact. If differentiation proceeds as in wild type, it is most likely an effect of the txn. If not, it is most likely an effect of the RNA itself.


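When E. coli is growing in medium containing glucose as its only carbon source, the lac operon txn rate is low but not zero. Why is it likely to be advantageous for E. coli that the lac operon txn rate is not high?

1. The proteins encoded by the lac operon do not provide any function when glucose is the sole carbon source.
2. Transcribing and translating proteins that are not needed wastes energy and resources. (The low basal rate still supplies the few permease and β-galactosidase molecules needed to take up lactose and make the inducer allolactose if lactose appears.)
**There is more to this question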


List three genetic experiments that you would perform in a model organism to determine how a missense allele affects a gene's function.

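1. CRISPR-Cas9: generate a null allele (or knock in the missense change) and compare the phenotypes.
2. RNA knockdown (RNAi): compare the loss-of-function phenotype with the missense phenotype.
3. Transgenic rescue: test whether a wild-type copy of the gene rescues the mutant phenotype.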


What two types of alleles are the most dangerous and why?

Antimorphic (dominant negative) alleles, where the abnormal gene product interferes with the function of the wild-type product and causes a more severe phenotype, and neomorphic alleles, where the gene product gains a new function.

Amorphic (null) alleles knock out the gene and can be lethal. Check answers to problem set.


Briefly discuss why null alleles of a gene are useful to model organism geneticists but also have limitations in terms of understanding gene function.

Null alleles provide a complete knockout of a gene, allowing you to determine the gene's function. However, a complete knockout may be lethal, so you may have to use hypermorphs and hypomorphs to look at a spectrum of functions.


What are ultra conserved elements and why are they of interest?

Ultraconserved elements are genomic regions that are extremely conserved across evolutionarily distant species. Since they are so highly conserved, they are likely to have very important functions.


**What is the most common form of selection in the genome?

Purifying (negative) selection, which removes deleterious mutations.


What is the neutral theory of evolution?

The vast majority of evolutionary changes at the molecular level are caused by random drift of selectively neutral mutants (mutants that do not affect fitness).


What effect will (1) a selective sweep and (2) balancing selection have on the time to the most recent common ancestor and the shape of a genealogy?

1. A selective sweep decreases the time and gives a star-shaped genealogy (similar to a population bottleneck followed by growth).

2. Balancing selection increases the time and gives a genealogy with long internal branches that looks like chicken legs (similar to population subdivision).


What is the effect of purifying selection on rates of nucleotide divergence between species? What about positive selection?

Purifying selection: decreases divergence (you don't want change).

Positive selection: increases divergence (favoring change).


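Do coalescence events take longer in a larger or smaller population?

Larger. The expected time for two lineages to coalesce scales with population size (about 2N generations for a diploid population of size N).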


Assume that an X-linked recessive trait is observed in 2.5% of all males. What are the genotype frequencies among the females of this population if the population is in HWE?

For males, the frequency of affected individuals equals the allele frequency, so q = 0.025 and p = 0.975. The allele frequencies should be the same in males and females. So the female genotype frequencies are p^2 = 0.950625 (homozygous normal), 2pq = 0.04875 (carriers), and q^2 = 0.000625 (affected).
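The arithmetic can be checked with a short sketch. The function name is just an illustrative choice, and the calculation assumes HWE and equal allele frequencies in the two sexes, as the answer does.

```python
def female_genotype_frequencies(q):
    """Female genotype frequencies at an X-linked locus under HWE.

    q is the recessive allele frequency, read directly from the
    fraction of affected males (males carry a single X).
    Returns (homozygous normal, carrier, affected).
    """
    p = 1.0 - q
    return p * p, 2 * p * q, q * q


# 2.5% of males are affected, so q = 0.025
normal, carrier, affected = female_genotype_frequencies(0.025)
print(f"normal={normal:.6f} carrier={carrier:.6f} affected={affected:.6f}")
```

Note that affected females (q^2) are far rarer than affected males (q), which is the usual signature of an X-linked recessive trait.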


What effect will a selective sweep have on linkage disequilibrium?

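It increases levels of linkage disequilibrium around the selected site, because linked variants hitchhike to high frequency along with the favored allele.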


Which demographic events mimic positive directional selection? Balancing selection? How can you discriminate between demographic and selective effects?

Population bottlenecks and growth mimic positive directional selection. Population subdivision mimics balancing selection. Demography affects the whole genome, whereas selection acts on specific loci. The HKA test avoids this problem by comparing patterns of variability at two regions in the same individuals. The McDonald-Kreitman test compares the ratio of nonsynonymous to synonymous SNPs within species to the ratio of nonsynonymous to synonymous fixed differences between species, which should not be affected by the shape of the tree.
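A minimal sketch of the McDonald-Kreitman comparison, summarized as a neutrality index. The counts below are illustrative values chosen for the example, not data from this deck, and the function name is hypothetical.

```python
def neutrality_index(pn, ps, dn, ds):
    """Neutrality index from a McDonald-Kreitman 2x2 table.

    pn, ps: nonsynonymous and synonymous polymorphisms within species
    dn, ds: nonsynonymous and synonymous fixed differences between species
    NI = (pn / ps) / (dn / ds)
    """
    return (pn / ps) / (dn / ds)


# Illustrative counts (assumed for this example)
ni = neutrality_index(pn=2, ps=42, dn=7, ds=17)
alpha = 1 - ni  # rough estimate of the fraction of adaptive substitutions
print(f"NI={ni:.3f} alpha={alpha:.3f}")
```

Under neutrality the two ratios should be equal (NI near 1). NI below 1 indicates an excess of nonsynonymous fixed differences, as expected under positive selection. NI above 1 indicates an excess of nonsynonymous polymorphism.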


Define effective population size.

The size of an idealized population that would show the same amount of genetic drift as the actual population. Roughly, the number of breeding individuals that contribute chromosomes to the next generation.


Studies of global patterns of genetic diversity suggest that (choose one):
a) Genetic variation is highest between populations rather than within population
b) Genetic variation is highest within populations rather than between populations
c) African populations have lowest levels of genetic variation
d) Asian populations have highest levels of genetic variation


b) In general, an average of 85% of genetic variation exists within local populations, 7% is between local populations within the same continent, and 8% of variation occurs between large groups living on different continents.


Diseases may differ in prevalence among ethnic groups due to (choose one):
a) genetic drift
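b) natural selection
c) new mutation
d) all of the above


d) All of the above. Drift, selection, and new mutation can each change allele frequencies differently in different populations.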


Briefly describe Wright's Fst measure of genetic differentiation.

Fst = (Ht - Hs) / Ht
where H represents the average heterozygosity across loci, within populations (Hs) and in the total sample (Ht).

If Fst is close to zero, there is little or no genetic differentiation.

If Fst is close to one, differentiation is very high.

In humans, Fst is low (about 7%).

This means that about 93% of all variability in humans is found within populations.
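A small sketch of the Fst calculation for one biallelic locus. It assumes equally sized subpopulations and expected heterozygosity 2p(1 - p); the function names and allele frequencies are made up for illustration.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def fst(subpop_freqs):
    """Fst = (Ht - Hs) / Ht for equally sized subpopulations."""
    n = len(subpop_freqs)
    hs = sum(heterozygosity(p) for p in subpop_freqs) / n  # mean within-population H
    ht = heterozygosity(sum(subpop_freqs) / n)             # H of the pooled sample
    if ht == 0:
        return 0.0  # locus is monomorphic overall, so there is nothing to partition
    return (ht - hs) / ht


print(fst([0.5, 0.5]))  # identical frequencies: no differentiation
print(fst([0.4, 0.6]))  # mild differentiation
print(fst([0.0, 1.0]))  # fixed for different alleles: complete differentiation
```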


Briefly describe the Transmission Disequilibrium Test (TDT) for family-based association test.

The TDT measures the over-transmission of an allele from heterozygous parents to affected offspring. Looking at n families, count the number of informative (heterozygous) parents that pass on a marker allele to an affected child. If this is significantly different from half of the informative parents by chi-square test, then that marker is associated with the disease.
A distinctive feature of the TDT is that it will detect genetic linkage only in the presence of genetic association. While genetic association can be caused by population structure, genetic linkage will not be affected, which makes the TDT robust to the presence of population structure.

Look at heterozygous parents and see which allele was transmitted and which was not transmitted to the affected children. You are looking for markers linked to your phenotype. The test statistic ends up being chi-square distributed with 1 degree of freedom.
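The TDT statistic is usually written in McNemar form, (b - c)^2 / (b + c). The sketch below assumes that form, with hypothetical transmission counts, and uses the fact that a 1-degree-of-freedom chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math


def tdt(transmitted, not_transmitted):
    """TDT chi-square (1 df) from heterozygous-parent transmissions.

    transmitted: times the allele was passed to an affected child
    not_transmitted: times the other allele was passed instead
    """
    b, c = transmitted, not_transmitted
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical counts: 100 informative parents, allele transmitted 60 times
chi2, p_value = tdt(60, 40)
print(f"chi2={chi2:.2f} p={p_value:.4f}")
```

With equal transmission (50 vs 50) the statistic is zero, which matches the null expectation that each allele is passed on half the time.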


Briefly describe what is measured in codon bias test statistics.

frequency of occurrence of synonymous codons in coding DNA.
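One common summary of this is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons were used equally. A minimal sketch for the six leucine codons, using a toy sequence invented for the example:

```python
from collections import Counter

# The six synonymous codons for leucine in the standard genetic code
LEU_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def rscu(coding_seq, synonymous_codons):
    """Relative synonymous codon usage for one amino acid's codon family."""
    codons = [coding_seq[i:i + 3] for i in range(0, len(coding_seq) - 2, 3)]
    counts = Counter(c for c in codons if c in synonymous_codons)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in synonymous_codons}
    expected = total / len(synonymous_codons)  # count per codon under equal usage
    return {c: counts[c] / expected for c in synonymous_codons}


toy_seq = "CTGCTGCTGTTACTC"  # toy in-frame sequence: Leu x 5
leu_rscu = rscu(toy_seq, LEU_CODONS)
print(leu_rscu)
```

An RSCU of 1 means no bias for that codon. Values above 1 mark preferred codons, here CTG.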


You study two closely linked autosomal SNP loci, A (alleles A1, A2) and B (alleles B1, B2). Assume that the allele frequencies are A1 (0.8), A2 (0.2), B1 (0.3), and B2 (0.7). Knowing nothing else, what do you predict for the frequency of two-locus genotypes in the population? State the assumption you make.
Suppose the frequency of the combination A2 B1 on a chromosome was found to be 0.18. Discuss the implication of this observation. You might wish to include: recombination fraction between A and B, linkage disequilibrium, association, haplotypes, or changes in frequency over generations.

Assume a Hardy-Weinberg population and that the two loci are independent (linkage equilibrium).
Then just multiply the allele frequencies to get each combination: A1B1 = 0.8 x 0.3 = 0.24, A1B2 = 0.8 x 0.7 = 0.56, A2B1 = 0.2 x 0.3 = 0.06, A2B2 = 0.2 x 0.7 = 0.14.

We are observing 3X more A2B1 (0.18) than we would expect if these alleles were independent (0.06), so this implies that the loci are in linkage disequilibrium (LD). The alleles are associated, and changes in allele frequency at the two loci are not independent. A2B1 is a haplotype. The recombination fraction between A and B is low, so the LD will decay only slowly over generations.
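The excess of A2B1 can be quantified with the standard LD measures D, D' and r^2. This is a sketch with an illustrative function name, using only the frequencies given in the question.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D, D', r^2) for alleles a and b with haplotype frequency p_ab."""
    d = p_ab - p_a * p_b  # departure from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# A2 = 0.2, B1 = 0.3, observed A2B1 haplotype = 0.18
d, d_prime, r2 = ld_stats(p_a=0.2, p_b=0.3, p_ab=0.18)
print(f"D={d:.3f} D'={d_prime:.3f} r2={r2:.3f}")
```

D is 0.12 here (0.18 observed minus 0.06 expected). D' scales D by its maximum possible value given the allele frequencies, so it is easier to compare across locus pairs.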


Among 500 men genotyped by your classmate, she found the following numbers of individuals with each allelic combination: 45 A1B1, 55 A1B2, 255 A2B1, and 145 A2B2. Are the alleles at the A and B loci associated? Explain.

Allele frequencies based on counts from the 500 men: A1 = 0.2, A2 = 0.8, B1 = 0.6, B2 = 0.4.
If the loci were independent, the expected counts would be the products of the allele frequencies times 500: 60 A1B1, 40 A1B2, 240 A2B1, and 160 A2B2. The observed counts deviate from these, so the alleles appear to be associated.
To quantify how associated they are, use a chi-square test of the observed against the expected counts, with (2-1)(2-1) = 1 degree of freedom. Here chi-square is about 11.7, which gives P < 0.001 for the hypothesis that they are not associated.
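A sketch of the chi-square test on these counts, with expected counts taken from the products of the marginal allele frequencies. The function name is illustrative; the p-value uses the 1-degree-of-freedom identity P = erfc(sqrt(chi2 / 2)).

```python
import math


def chi_square_2x2(n11, n12, n21, n22):
    """Chi-square test of independence for a 2x2 table of counts (1 df)."""
    observed = ((n11, n12), (n21, n22))
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    total = sum(rows)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total  # count under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Rows: A1, A2; columns: B1, B2
chi2, p_value = chi_square_2x2(45, 55, 255, 145)
print(f"chi2={chi2:.2f} p={p_value:.5f}")
```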


Briefly describe ‘homeobox genes’.

Genes involved in the regulation of patterns of development (morphogenesis). They contain a homeobox, a DNA sequence that when translated becomes the homeodomain, a DNA-binding domain.


What is meant by ‘long-range regulatory determinants’? How have they been discovered? What are they and how do they work? Do they act in ‘cis’ or ‘trans’?

Locus Control Regions: can regulate multiple genes in the region.
They were discovered by enhancer traps, 3C, etc.
They act in cis.
Models for how they work include looping, tracking, mix (looping and tracking), and linking.


What is a ‘complementation group’?

A group of mutations that fail to complement one another because they affect the same locus (gene).


Mobile DNA systems are key players in genome evolution. Name a mobile DNA element, describe its mechanisms of replication and mobilization, and specify an effect on phenotype or genomic change.

Retrotransposons – ‘copy and paste’, transcribed into RNA and reverse transcribed to integrate into the genome in a new location. Can move far away from original location. If it integrates into a gene it could cause a loss of function for that gene. Over evolutionary time these elements are responsible for adding to the length of the genome


The sole active autonomous mobile element in human beings is a member of which class: DNA transposon, SINE, non-LTR retrotransposon, or LTR retrotransposon?

Non-LTR retrotransposons (the LINE-1 element)


The human genome is composed to a considerable degree of sequences derived from genomic parasites. Name two such parasites and describe their replication mechanisms.

DNA transposons: use a cut-and-paste mechanism to move from one place to another.
Non-LTR retrotransposons: LINEs and SINEs, the only active ones in humans, use a copy-and-paste mechanism. The element is first transcribed into RNA, then reverse transcriptase makes a DNA copy of the RNA, which is inserted in a new location.


Briefly explain the terms ‘suppressor screen’, ‘enhancer trap’, and ‘synthetic lethal’.

Suppressor screen: Start with a pool of mutants with some phenotype, mutate them again, and see which are rescued from the phenotype.
Enhancer traps: screens to find enhancers for the expression of a given gene, usually done in a cell-type-specific manner. Random fragments from the genome are used to create reporter vectors to test gene expression activity.
Synthetic lethal: When you are null for two genes and the phenotype is lethal, but each individual knockout is not.


The genomes of almost every organism are infested with genetic parasites, i.e. selfish DNA sequences, that covalently integrate their own sequences into the host cell genome. Name 3 specific examples and briefly describe their replication strategy.

DNA transposons: Cut-and-paste
LTR Retrotransposons: Copy-and-paste, RNA intermediate, makes its own polymerase
Non-LTR Retrotransposons (autonomous): Copy-and-paste, RNA intermediate, makes its own polymerase
Non-LTR Retrotransposons (non-autonomous): Copy-and-paste, RNA intermediate, requires another retrotransposon to replicate


You are getting ready to perform a genetic screen for fly mutants that lack wings. Thus you will want to obtain mutants that are homozygous viable and can be screened for such phenotype. Explain how a balancer chromosome can be used to identify homozygous mutants.

Cross balancer/mut X balancer/mut. The mut/mut progeny will not have the balancer's dominant marker phenotype, and balancer/balancer progeny die because balancers carry recessive lethal mutations.


What are off-target effects of RNAi and how might one minimize such effects?

An siRNA could end up targeting other genes in addition to the intended target gene. To minimize this:
Redundancy: design multiple siRNAs for the same target and see if the phenotype is the same.
Rescue: rescue with an RNAi-resistant transcript (e.g., cDNA) of the target mRNA.


In characterizing an allele of gene X, you find that homozygous mutant embryos obtained from crossing two heterozygous parents appear to develop normally but fail to hatch. Interestingly, an injection of RNAi molecule specific for gene X results in embryos lacking abdomen. Assuming that controls are done and the RNAi result is specific to knocking down gene X, how can you explain the result?

Maternal effect: the protein (or mRNA) of gene X was originally deposited in the egg by the heterozygous mother, so the homozygous mutant embryo can develop normally at the beginning even though it cannot produce the product on its own. RNAi also removes the maternally supplied mRNA, so the embryo cannot develop its abdomen as it can when the mother's gene product is available.


Of the following types of sequences, which makes up the smallest fraction of the human genome: L1 element, exons, introns, or Alu element?

Exons (roughly 1-2% of the genome).


Why are null alleles so important?

Complete removal of gene product is the most rigorous way to assess what processes require that gene (necessity).
However, for genes that play multiple roles in the organism, the null mutant phenotype may reveal only the earliest essential role, and may not allow assessment of later roles. Hypomorphs, conditional alleles or tissue-specific KOs are useful in this case.
Also, for genes that function redundantly with other genes, the null mutant phenotype may not reveal its function. Dominant negative or hypermorphic alleles can be useful in this case.
Finally, null mutations do not reveal sufficiency. For example, does a gene product actually regulate a process vs. being passively required for it?


In a genetically tractable organism like C. elegans or Drosophila, what criteria are typically used to determine if a given allele is likely to be null?

*Recessive phenotype? (lf alleles often recessive, gf alleles usually dominant)
*Severe molecular lesion? (large deletion, early nonsense or frameshift)
*Reduced gene product? (undetectable mRNA and/or protein)
*RNAi mimics the allele
*High frequency of isolating similar alleles (lf alleles more common, gf alleles rarer)
*Phenotype of allele is most severe (penetrant, expressive) in an ‘allelic series’
Look with qPCR for mRNA levels and also look at protein levels with a western.
As the dosage of the mutant allele increases the phenotype doesn’t change, and as the dosage of the wild-type allele increases the phenotype becomes more wild-type.


What are the limitations of null alleles and what other types of alleles might also be useful?

The phenotype may be lethal, which makes it difficult to study. Hypomorphs and hypermorphs would be useful in order to get an ‘allelic series’ and a spectrum of phenotypes of different degrees.


What is the underlying principle of genetic analysis? In other words, how does this approach differ from the approach a biochemist might use to study a biological process?

Biochemical approach: in vitro recreate things in a test tube/ reconstruct desired result/ put things together. (ex. knock-out genes)
Genetic analysis:
• take things apart & see if organism cares
• “Don’t know what you have until it is gone”
• relies on mutations
• start with defect & find the basis of the defect
Advantages of genetic analysis:
• general, can be used for any process, in any organism
• predictable, mutations are stable & inheritable
• reveals gene functions that are necessary in vivo.
• Technically simple, conceptually complex
Genetic analysis can be used in 2 ways:
(1) gene identification, function: start w/ process of interest & search for mutations that affect this process.
• gene identification: saturation screen - all genes that can mutate to a particular phenotype; modification screen (suppressor/enhancer) - mutations that modify another mutation
• gene function: phenotype, interaction w/ another mutant
(2) As tools to study other genes
• markers: change morphology or physiology in a useful way


What is a mosaic?

An organism that has two or more populations of cells with different genotypes that all arise from the same fertilized egg (e.g., the result of trisomy rescue or X-inactivation).


What method can be used to generate mosaics?

Transplantation experiments: cells from a blastula-stage embryo of one genetic background are aspirated out and injected into a blastula-stage embryo of a different genetic background.
Induced recombination (to delete a gene in some cells).
FLP recombinase expressed in a tissue (or the Cre-lox system in a subset of cells).


What does it mean to say that a gene functions “cell non-autonomously” for a given function? What result in a mosaic analysis would lead to this conclusion?

Cell non-autonomous function: genotypically mutant cells cause other cells, regardless of genotype, to exhibit a mutant phenotype.
You would conclude this if, in a mosaic analysis, you see WT-genotype cells next to mutant-genotype cells and they are all exhibiting the mutant phenotype.


Besides mosaic analysis, what other experimental approaches can be used to address the question of where a gene functions?

Reporter constructs (e.g., lacZ) to see where that gene is being expressed.


In a hypothetical genetic pathway, gene 1 knockouts have a fat phenotype while gene 2 knockouts have a thin phenotype. gene 1;gene 2 double knockouts have a fat phenotype. Which gene mutation is epistatic?

Imagine that gene 1 encodes a microRNA. Given that microRNAs often inhibit target gene expression, propose a type of genetic screen that might identify gene 1 targets.

Gene 1 is epistatic to gene 2: the double knockout shows the gene 1 knockout phenotype. In a regulatory pathway this places gene 1 downstream of gene 2:
G2 --| G1 --> thin

Suppressor screen: random mutagenesis of the gene 1 mutant to look for rescue of the mutant phenotype. A loss-of-function mutation in a gene that rescues the mutant phenotype is likely to be in a target of the gene 1 miRNA.


There are about 3000 SVA elements in the human genome. They are composite 2.5-3.0 kb sequences composed of SINE-R – variable number tandem repeat (VNTR) – Alu. These elements have a poly A at their 3’ end and have target site duplications 5’ and 3’ to the entire insertion. You believe that they are mobile elements inserted into the genome by L1 reverse transcriptase. Describe an experiment to determine whether your hypothesis is correct.

Take one of the mobile elements and tag it with a GFP cassette that is “backwards” (antisense to the element and interrupted by an intron), so that GFP can be expressed only after the element has been transcribed, spliced, reverse transcribed, and put back in the genome. Cells that express GFP have undergone retrotransposition. To test the role of L1, compare cells with and without a source of L1 reverse transcriptase.


Identify and discuss five key experiments leading from Mendel’s discovery of “indivisible factors” governing transmission of traits to the chromosomal theory of inheritance and the establishment of DNA as the molecule that encodes genetic information.

Mendel's pea plant crosses; independent rediscovery of his work by Hugo de Vries and Carl Correns; Chargaff's rules of nucleotide frequencies; Franklin's X-ray diffraction images of DNA; Watson and Crick's determination of the DNA structure; R. A. Fisher's combination of Mendelian genetics with Darwin’s theory of natural selection into the modern synthesis of evolutionary biology.


Two general strategies used to search for genes underlying complex diseases are testing candidate genes and carrying out a genome scan for linkage or association. What are the advantages and disadvantages of a candidate approach versus a genome approach?

Advantages of candidate gene testing: multiple testing is not as severe a problem as in GWAS; the collection of samples is easier since you don't need family data (as in linkage studies) or large sample sizes (as in case-control GWAS); you can go straight to functional studies, whereas in genome scans much of the problem is picking the functional genes out of the results.
Disadvantages of candidate gene testing: the selected genes might not be sufficient to explain the underlying phenotype, especially for complex diseases where many genes are causal and each contributes a mild effect; you cannot do an unbiased detection of all genes involved in the disease/pathway of interest.


Discuss the contrasting views of common disease-common variant and common disease-rare variant models of complex disease risk.

"Common disease-common variant" says that diseases shared by many people are due to variants with high allele frequency that are shared among the affected individuals, each typically with a modest effect. GWAS is a good method to discover these disease variants.
"Common disease-rare variant" says that these diseases are due to many different rare variants. Family-based linkage studies are good for discovering such variants.


Hypothesize mechanisms by which a mutation at or near the most significant SNP could affect gene function. What methods would you use to test your hypotheses?

A mutation at or near the most significant SNP could affect gene function by disrupting the gene's enhancer or a transcription factor binding site. To test this hypothesis, evolutionary conservation and transcription factor binding site annotations could be checked in this region. Targeted deep sequencing of the region could reveal the full spectrum of SNPs with significantly different allele frequencies. Expression of genes B and C could be measured with arrays in subjects with different allele states at the most significant SNP, to see which gene is most affected by the SNP variant.


Describe two approaches one can use to determine the function of Genes A, B, and C.

Knock out or knock down genes A, B, or C through homologous recombination or siRNA.
Overexpress genes A, B, or C by transfection of a plasmid with the gene.


List 3 different types of sequence variation in human genome and methods for genotyping those variants. Give 2 examples of how polymorphic variants are used in genetic and genomic studies.

SNPs: SNP array, sequencing
CNVs (insertions, deletions, amplifications, duplications): SNP array, aCGH
Inversions: sequencing
Translocations: sequencing
Used in disease association studies and in population genetics to determine evolutionary history


What are some mechanisms by which CNV in humans might cause disease? Give 2 scenarios.

Gene dosage differences (deletion or duplication of a dosage-sensitive gene), including in imprinted regions
Deletion of a regulatory region for a gene (e.g., an enhancer)


What is a SNP?

A single nucleotide polymorphism: a nucleotide that can vary within a population


What is a haplotype?

The set of alleles across a continuous region of one chromosome that are seen to be inherited together.


Explain the concept of linkage disequilibrium (LD) and its importance for GWAS.

LD: non-random association of alleles at 2 or more loci
LD is important for GWAS because it diminishes the number of loci you need to test: you only need to look at one SNP in each haplotype block. These SNPs act as markers for whatever is causing the phenotype.


The schizo gene maps 20 kb downstream of a de novo chromosome inversion in a child with schizophrenia: how would you investigate if common variants in this gene represent schizophrenia susceptibility alleles? How would you explain the disease phenotype in a child with the inversion breakpoint outside the gene? How would you test the null phenotype of this gene in the mouse?

Association study with cases and controls
Could mess up regulatory elements, like promoters, enhancers, repressors, etc.
Make a knockout by homologous recombination with loxP sites flanking integration site and express cre.


Explain the basic idea for QTL analysis including the experimental designs, identifying the necessary empirical data.

QTL: A chromosomal region (locus) containing a gene(s) that affects a quantitative trait
QTL study in mice: breed 2 different strains with extremes of the phenotype of interest, intercross the F1 to produce an F2 population, genotype the F2 at sparse markers and measure the phenotype, then at each marker use analysis of variance (ANOVA) to determine whether there are significant phenotype differences among the three genotype groups


Discuss the statistical models of QTL analysis.

ANOVA, t and F statistic, LOD score, we test the null hypothesis that the genotype status at a given marker has no impact on phenotype or expression of a given gene.
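A minimal sketch of the single-marker ANOVA: phenotypes are grouped by genotype at one marker and the F statistic compares between-group to within-group variance. The phenotype values are invented for illustration, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over lists of phenotype values."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of individuals around their group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical phenotypes for the three genotype classes at one marker
aa = [10, 12, 11]
ab = [14, 15, 16]
bb = [19, 20, 21]
f_stat = one_way_anova_f([aa, ab, bb])
print(f"F = {f_stat:.1f}")
```

A large F means genotype at the marker explains much of the phenotypic variance, which argues against the null hypothesis of no effect. In a genome scan this is repeated at every marker, so the significance threshold has to account for multiple testing.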


Discuss the potential problems of QTL analysis especially with respect to mapping determinants of gene expression.

Each inferred association needs to be proved and validated to establish causality. Possible ways to prove it: knock out genes and analyze the possible regulatory mechanism. This follow-up work (cell biology, biochemistry) is not scalable.


Define ‘heterosis (overdominance)’. Describe a known example.

Heterosis: the heterozygote has a higher fitness than either homozygote.
Example: sickle cell genotypes, where heterozygotes are protected against malaria without having sickle cell disease.


What is a physical map of a genome? How is it different from a (meiotic) genetic map?

Physical map: the actual base pair location on a chromosome
Genetic map: recombination frequencies (cM), giving the relative positions of mutant loci (phenotypic markers). In humans, 1 cM is roughly 1 Mb on average.


A human genetic linkage study showed conclusively that a disease gene region was flanked by two markers separated by 1 centimorgan. Yet, later it was discovered that both markers localized to the same BAC clone. How can this be?

It’s a region with high recombination frequency


Two pure breeding, phenotypically wild type, lines of flies show different recombination frequencies in the region from curved wings to black body on chromosome 2. In the normal strain, the recombination frequency between these genes is 28%. In the abnormal strain, the recombination frequency is 8%. Although the pure-breeding lines are fertile, the hybrids produced by crossing the two lines are semisterile.
(i) Decide between an inversion and a deletion as the possible cause for the low recombination frequency in the abnormal strain. Describe your rationale.
(ii) Why are the hybrids semi-sterile?

(i) As the pure-breeding lines are fertile and phenotypically wild type, no genes are lost in the abnormal strain, so an inversion is the more likely cause of the low recombination frequency. A deletion that did not harm function would also bring the loci closer together and decrease the recombination frequency, but it would not explain the semisterile hybrids.
(ii) Crossing the two lines produces offspring that are heterozygous for the inversion. When recombination happens within the inverted region in these heterozygotes, the resulting chromosomes are abnormal and deficient for genes, so some of the gametes are inviable.


Define Structural Variation in the context of the Human Genome

Copy number variation
Segmental duplication
Insertion
Deletion
Translocation


What impact did structural variation have on the current build of the “finished” human genome?

The variants differ between individuals, so one version of each variable region had to be picked for the “finished” reference human genome.


What are segmental duplications, and what role do they have in structural variation?

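Segmental duplications are segments of DNA with near-identical sequence that are present at more than one site in the genome.
They promote structural variation through recombination between the duplicated copies, and they have created new primate genes, as reflected in human genetic variation.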


Describe two examples of potential evolutionary consequences of structural variation.

1. Expand gene families by duplication and evolution of complementary functions
2. Alu elements: creating new exons
3. Icelandic population: inversion variant is undergoing positive selection such that carrier females have more children with higher recombination rates than non-carriers.


Describe an example of a human disease associated with structural variation. Do you expect the mechanism(s) by which the variation affects disease to be similar or different from those associated with typical SNP-type variations? Why or why not?

Philadelphia Chromosome: a specific chromosomal abnormality that is associated with chronic myelogenous leukemia. It is the result of a reciprocal translocation between chromosome 9 and 22.
Different. The translocation results in the BCR-ABL fusion oncogene, while SNP-type variation won’t cause a gene fusion.


In the analysis of a new category of tumor, a chromosomal translocation is discovered that fused portions of two proteins to generate a new fusion protein. When the structure of each original protein is worked out, it is discovered that all of the translocation breakpoints occur within single introns of each gene. What are the possible explanations for this breakpoint localization? Consider this problem from the issue of the codon structure of the ORF, the modular organization of functional domains in proteins, the ultimate function of the protein product, as well as the sequence of the two introns.

The introns could contain a fragile site, or the two introns could share homology so that they recombine. In addition, breaks within introns leave the exons intact, so splicing can join exons from the two genes in frame and fuse whole functional domains. Only breakpoints that produce a functional fusion protein are selected for in the tumor.


What is an eQTL?

Expression quantitative trait loci, association of a marker (SNP) to the expression of a given gene region, rather than a quantitative trait phenotype as in the classic QTL.


There are several ways to generate genetic diversity through mating. Explain how natural variation differs from other breeding strategies.

Natural variation involves outbreeding and random mating among diverse populations to maintain many genetic loci with nonzero minor allele frequencies. Other breeding strategies involve mating based on specific extreme traits of interest and inbreeding giving rise to many homozygous genetic loci and allelic homogeneity as in certain mouse lab strains.


Explain how a SNP can generate a cis eQTL.

A cis eQTL is a local SNP associated with the expression level of a nearby gene. It can affect gene expression by changing the promoter region, coding region, or UTR.


Explain how a SNP can generate a trans eQTL.

A trans eQTL is a distant SNP. It could be in an enhancer, miRNA, endogenous siRNA, long non-coding RNA, or the coding sequence of a transcription factor that targets the gene with the associated expression.


Explain how you would prove a given cis eQTL or trans eQTL was causal.

Gene knockout to show more extreme expression change or regulatory mechanism analysis. Another option is to knock in the SNP to a different inbred or outbred mouse strain to see if the expression is similarly impacted.




Describe the difference between forward and reverse genetics.

Forward genetics starts with a phenotype of interest and works back to find the genotype underlying the trait.
Reverse genetics starts with a genotype or gene of interest and investigates the resulting phenotype.


(2011) Pair the model organism with the reverse genetic mechanism:
Drosophila, yeast, mammalian cells
siRNA, P element insertion, homologous recombination
(If a reverse genetics technique is applicable to more than one organism, you can indicate it.)

Drosophila: P element insertion
Yeast: S. cerevisiae: homologous recombination; S. pombe: homologous recombination, siRNA
Mammalian cells: siRNA (not too long, otherwise interferon response), homologous recombination


What’s the rationale for large-scale reverse genetics screens?

The best way to find what a gene does is to perturb its function (miRNA/siRNA) or knock out the gene (homologous recombination) and screen for a loss of function.


How will you determine if both phenotypes (pigmentation defect and circling) are caused by mutations in the same or different genes?

We are trying to distinguish between the following scenarios:
One gene: the mutant is m/m. Two genes: the mutant is m1/m1 m2/m2.
One gene: m/m × +/+ → m/+. Two genes: m1/m1 m2/m2 × +/+ +/+ → m1/+ m2/+.
First cross the mutant to a wild-type mouse, then intercross the F1 progeny. If a single gene causes both phenotypes, 1/4 of the offspring should be m/m and show both phenotypes together, and 3/4 should be WT (m/+ or +/+), assuming a recessive mutation.
If two genes cause the phenotypes, the phenotypes segregate independently and a smaller fraction of offspring (1/16) are double mutants, assuming the genes are not linked or are on different chromosomes.
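The 1/4 and 1/16 fractions can be checked by enumerating gametes. The sketch below assumes the F1 animals are intercrossed, the mutations are recessive, and the genes are unlinked; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def intercross_fraction_homozygous_mutant(n_genes: int) -> Fraction:
    """Fraction of offspring homozygous mutant at all n unlinked genes when
    parents heterozygous at every gene (m/+) are intercrossed."""
    alleles = ("m", "+")
    # Every combination of alleles across the genes is an equally likely gamete.
    gametes = list(product(alleles, repeat=n_genes))
    total = 0
    hits = 0
    for egg, sperm in product(gametes, repeat=2):
        total += 1
        # Homozygous mutant only if both gametes carry m at every gene.
        if all(a == "m" for a in egg) and all(a == "m" for a in sperm):
            hits += 1
    return Fraction(hits, total)


one_gene = intercross_fraction_homozygous_mutant(1)
two_genes = intercross_fraction_homozygous_mutant(2)
print(one_gene, two_genes)  # 1/4 1/16
```

With one gene, every m/m animal shows both phenotypes; with two unlinked genes, animals showing both at once are much rarer.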


You discover that both phenotypes are caused by the same missense mutation in gene X. To further study this gene, you want to make a knockout allele. Describe the steps you will take to accomplish this. In your answer, include a drawing of the targeting construct you will make, information about how you will identify ES cells with the knockout allele, and a breeding scheme for obtaining the knockout mice.

Employ a Cre-lox system to knock out the gene by site-specific homologous recombination. Put the construct under control of an enhancer bound by transcription factors specifically expressed in the embryonic stage, with a GFP reporter downstream on the construct. Transfect the construct into ES cells. Identify ES cells with the knockout allele by screening for GFP expression. Implant the targeted ES cells (injected into a blastocyst) into a female to develop. The newborn mice are chimeras. Cross the chimeras to wild type: some offspring carry one copy of the knockout allele in all cells (heterozygous). Intercross these offspring: those that inherit the knockout allele from both parents are homozygous knockouts.

WT Enhancer cre loxp gene x loxp gfp


How would you create transgenic mice with a null mutation of gene Hyp in the heart but nowhere else?

Use homologous recombination to insert a vector with loxP sites flanking the Hyp gene into the genome.
As a reporter, place GFP downstream of the loxP site so that when Cre is expressed, GFP is expressed and the Hyp gene is knocked out.
Use a heart-specific promoter to drive expression of Cre in the heart only, so that the Hyp gene is excised from the genome there and the heart is null for Hyp.


You have cloned a human disease gene Sick. A BLAST search of the mouse genome with Sick sequence identified three homologous genes on different mouse chromosomes. Suggest a way to identify the most likely functional (mouse) ortholog of the human gene Sick (empirical and/or computational approaches).

See which one folds in the most similar way
Mutate each gene individually and see if one gives a Sick phenotype
Look computationally for conserved domains or motifs that are known to be functional in Sick
See if it interacts with the same set of proteins or networks (co-IP or yeast two-hybrid)


You examined 20 inbred strains of mice for a specific sleep phenotype (fragmented sleep) and found that 4 strains have significantly more fragmented sleep than other inbred strains. Assuming that fragmented sleep is genetically determined, suggest an experimental approach to find genes responsible.

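Cross the strains with fragmented sleep to strains with different polymorphic backgrounds, phenotype the progeny of many such crosses, and map the regions (QTL) that segregate with fragmented sleep.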


What is the difference in the type of alleles that one obtains from a genetic screen using a mutagen like DEB, which causes small 50-100 base pair deletions, versus using EMS, which causes point mutations?

With DEB the deletions may cause more severe phenotypes whereas with the EMS you can get a more complete spectrum of phenotypes (allelic series).


To start on your thesis project to identify new pathways involved in the specification of the Drosophila eye, you are going to carry out an EMS mutagenesis-based screen looking for recessive mutants that have morphological defects in the eye. However, your P.I. made a mistake and fed the flies EMS at 10 times the intended dose. Which of the following (more than one may be correct) are likely outcomes of this mistake:
• The male flies will die at higher frequency than normal.
• The frequency of lethal mutations will increase, thus decreasing the number of homozygous viable lines that are useful for your screen.
• All of the interesting mutant alleles that you isolate are going to have a higher frequency of background mutations.
• In general, the alleles you get from this screen are going to be 10 times stronger than those obtained with the proper dose.
• This will not affect the screen in a detectable way.



Define the difference between forward genetics and reverse genetics.

Forward genetics: starts with a phenotype and looks for mutations that give that phenotype
Maja is looking for mutations in mice that give an autistic phenotype. She performs random mutagenesis with ENU and chooses those mice that give the phenotype of interest. Then she crosses the mice repeatedly to a different background so that she can narrow down the region that is responsible for the phenotype.
Reverse genetics: you do targeted mutagenesis (knockout mouse, siRNA knockdowns)
Hogenesh made a 96-well plate of siRNAs and put yeast in each well to knock down 96 known genes. Then, under the test conditions, he looks at the phenotype in each well and can determine the phenotype associated with each gene knockdown.


In a forward genetic screen, you have identified a new single-gene mutation in the mouse that increases locomotor activity threefold. How would you identify the gene causing the mutant phenotype? How would you prove that the identified DNA alteration is responsible for the observed mutant phenotype?
Suppose your efforts (points a and b) result in the identification of a mutant gene encoding a novel transcription factor. How would you, on a large scale, identify genes that are regulated (downregulated or upregulated) by this transcription factor?

-positional mapping: cross to a different genetic background and keep the progeny with the mutant phenotype; continue crossing so that many recombination events minimize the region that contains the mutation
-rescue with a WT transgene, or introduce the mutant gene by homologous recombination, to show that the specific DNA alteration is causal for the phenotype
-ChIP-chip or ChIP-seq to identify targets, then an expression array in the presence and absence of the transcription factor to see whether those targets are upregulated or downregulated


A woman has cystic fibrosis, which is an autosomal recessive trait (assume that it is very rare in the general population). She is the only person in her family with this disease.
What is the carrier risk for her mother, her father, her daughter, her daughter's son, her brother, and her brother's child?

Carrier risk: probability of being heterozygous
Mother: 1
Father: 1
Daughter: 1
Daughter's son: 1/2
Brother: 2/3
Brother's child: 1/3
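These risks can be derived step by step. The sketch below assumes, as the question states, that the allele is rare, so anyone marrying into the family is treated as a non-carrier; the variable names are illustrative.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Both parents are unaffected but have an affected (a/a) child,
# so each must be heterozygous (obligate carriers).
mother = father = Fraction(1)

# The affected woman is a/a, so every child receives one disease allele.
daughter = Fraction(1)

# A carrier passes the disease allele on half the time
# (assumption: the partner is a non-carrier because the allele is rare).
daughters_son = daughter * half

# Brother: child of Aa x Aa, known to be unaffected, so a/a is excluded.
offspring = [(x, y) for x in "Aa" for y in "Aa"]
unaffected = [g for g in offspring if g != ("a", "a")]
carriers = [g for g in unaffected if "a" in g]
brother = Fraction(len(carriers), len(unaffected))

brothers_child = brother * half

print(daughters_son, brother, brothers_child)  # 1/2 2/3 1/3
```

The brother's 2/3 is a conditional probability: knowing he is unaffected removes the a/a outcome from the four equally likely genotypes.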


Inheritance of an autosomal dominant trait

o “Vertical transmission” with affected individuals in each generation
o Affected children usually have an affected parent
o An affected individual partnered to an unaffected individual has ~equal numbers of affected and unaffected children
o Affects males and females equally
o Can see male-to-male transmission of the trait


Inheritance of an autosomal recessive trait

o Trait skips generations or appears “out of the blue”
o Affected children usually have unaffected parents
o Pedigree may contain consanguinity (e.g. marriage between cousins)
o Affects males and females equally
o An affected individual’s children are all carriers
o If both parents are affected, then all the children are affected


Inheritance of an X-linked trait

o For a recessive, see more affected males than females
o For a dominant, may see more affected females than males
o For a recessive, sons of affected females are always affected, whereas daughters are carriers and are affected only if the father is also affected
o No male-to-male transmission of the trait


Complex or non-genetic?

o Trait does not follow one of the above patterns
o Monozygotic twins are “non-concordant” – one affected, one not


After doing a linkage analysis to identify a disease gene locus for a Mendelian disease, you find a maximum lod-score of 2.8 for a certain marker at a recombination fraction of 0.05. Give a possible interpretation of these findings. What would you do next in your study?

LOD > 3 is considered significant for linkage, so a maximum LOD of 2.8 is only a suggestive signal. We could follow up by recruiting more families to see if the signal can be repeated in an independent replication. We can also investigate the function of the genes near the marker.
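A LOD score is the base-10 log of the odds of linkage at a given recombination fraction versus no linkage (recombination fraction 0.5). The sketch below shows how a LOD value converts to odds and how a LOD is computed in the simplest case, assuming phase-known, fully informative meioses; the counts in the example are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10[L(theta) / L(0.5)] for phase-known meioses, 0 < theta < 1."""
    nonrecombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + nonrecombinants * math.log10(1 - theta))
    log_l_null = meioses * math.log10(0.5)  # no linkage: theta = 0.5
    return log_l_theta - log_l_null


def odds_for_linkage(lod: float) -> float:
    """Convert a LOD score back to an odds ratio."""
    return 10 ** lod


odds_observed = odds_for_linkage(2.8)   # a little over 630:1
odds_threshold = odds_for_linkage(3.0)  # 1000:1, the conventional cutoff

# Hypothetical family data: 1 recombinant in 20 informative meioses.
example_lod = lod_score(recombinants=1, meioses=20, theta=0.05)
print(round(odds_observed), round(odds_threshold), round(example_lod, 2))
```

Because LOD scores from independent families add, recruiting more families is the direct way to push a suggestive 2.8 past the threshold of 3, or to see it fall away.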


It has recently been recognized that human genome can usefully be viewed as organized into ‘haplotype blocks’. Describe what is meant by ‘haplotype blocks’ and specify some of the associated genetic properties.

Haplotype block: a chunk of DNA whose variants are in linkage disequilibrium
Variants in a block tend to be inherited together, because recombination within the block is unlikely
If you know one polymorphism in the block you can infer the others, so in a GWAS a few tag SNPs within each haplotype block are enough


If two mutations ‘a’ and ‘b’ are located 10 map units apart, what proportion of ‘ab’ gametes will be produced by a heterozygote of genotype a + / + b?

5% because 10cM corresponds to 10% recombination, and half of the recombined gametes would be ab, the other half would be ++.
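The arithmetic can be written out for all four gamete classes. This sketch assumes that map distance in cM equals percent recombination, which is reasonable for short distances such as 10 cM; the function name is illustrative.

```python
def gamete_frequencies(map_distance_cm: float) -> dict:
    """Gamete frequencies from an  a + / + b  heterozygote (mutations in repulsion).

    Assumption: recombination fraction = map distance / 100.
    """
    r = map_distance_cm / 100.0
    return {
        "a +": (1 - r) / 2,  # parental
        "+ b": (1 - r) / 2,  # parental
        "a b": r / 2,        # recombinant
        "+ +": r / 2,        # recombinant
    }


freqs = gamete_frequencies(10)
print(freqs)
```

Because the two mutations start on different homologs, ‘ab’ is a recombinant class, so its frequency is half the recombination fraction.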


What is the maximum recombination frequency observable in a genetic linkage experiment? Why?

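1/2. Only an odd number of crossovers between two loci produces an observable recombinant; an even number restores the parental arrangement. Loci that are far apart or on different chromosomes therefore assort independently, giving 50% recombinant gametes.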


After linkage analysis has mapped a mutation to a reasonably small region (e.g. containing 20 genes), what other methods can be used to determine which of the genes is the one affected by the mutation?

Look in the UCSC Genome Browser for genes in the region that are related to the phenotype
Sequence the candidates and look for mutations in the genes
if in mice: try to rescue phenotype with WT transgene


Explain why the rates of evolution in different proteins vary even though single-nucleotide mutations in DNA occur at a constant rate.

Some sites are under purifying selection, so mutations at those sites are weeded out of the population and do not persist through the generations. Other sites are under positive selection, which allows more mutations to fix in those regions.


Do all nonsynonymous base pair substitutions lead to an abnormal phenotype? Give an example.

No. Changing an amino acid in a non-functional part of the protein (meaning not a binding or catalytic site) to another amino acid that is similar can have no phenotypic effect.


Under what condition is genetic drift more influential relative to natural selection on changes in allele frequencies?

Small population size. Random sampling of alleles from one generation to the next causes large changes in allele frequency.
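The effect of population size can be made concrete with the standard Wright-Fisher result that the variance of the one-generation change in allele frequency is p(1 - p)/(2N) for a diploid population of size N. The sketch below just evaluates that formula; the population sizes are arbitrary examples.

```python
import math


def drift_sd(p: float, n_diploid: int) -> float:
    """Standard deviation of the one-generation change in allele frequency
    under Wright-Fisher sampling: sqrt(p(1 - p) / (2N))."""
    return math.sqrt(p * (1 - p) / (2 * n_diploid))


for n in (10, 100, 10_000):
    # Smaller populations show much larger random swings per generation.
    print(n, round(drift_sd(0.5, n), 4))
```

The random swing shrinks with the square root of population size, which is why weak selection can be overwhelmed by drift in small populations but not in large ones.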


What are the necessary and sufficient conditions for natural selection?

1. There must be variation in the population.
2. That variation must be heritable.
3. The variation must affect reproduction or survival.



missense start codon frameshift



TATA-box mutation promoter binding
silent mutation



polyA tail signal (AAUAAA) (mRNA stability?) 3’ UTR (miRNA targeting?)



splicing (although it is AG/GU, where AG is at the end of exon, GU is in the beginning of the intron)



nonsynonymous missense



TAA: nonsense



Could be consensus intron sites?


Explain this lag in terms of viral evolution, immunology, and production technology. Suggest strategies to reduce the production lag.

Need to analyze the virus to see how it evolved, then develop the vaccine, grow it, test its safety, etc.
Predict viral evolution so that vaccine production stays ahead of the flu


What is the dn/ds ratio of a coding sequence?

The ratio of the non-synonymous substitution rate (dN, per non-synonymous site) to the synonymous substitution rate (dS, per synonymous site)


Explain the use of dn/ds ratio to infer selective history of genes.

dn = the number of non-synonymous substitutions (change the amino acid) per non-synonymous site
ds = the number of synonymous substitutions (do not change the amino acid) per synonymous site
Used to infer the selective forces acting on a region of the genome. If the ratio is greater than 1, the region is under positive selection; if the ratio is less than 1, the region is under purifying selection; a ratio near 1 is consistent with neutral evolution.
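A minimal sketch of the calculation, assuming the substitution and site counts have already been obtained from an alignment. It applies no correction for multiple substitutions at the same site, which real estimators do; the function names and counts are hypothetical.

```python
def dn_ds(nonsyn_subs: float, nonsyn_sites: float,
          syn_subs: float, syn_sites: float) -> float:
    """Uncorrected dN/dS from substitution counts and site counts."""
    dn = nonsyn_subs / nonsyn_sites  # non-synonymous substitutions per site
    ds = syn_subs / syn_sites        # synonymous substitutions per site
    return dn / ds


def interpret(omega: float) -> str:
    """Textbook reading of the ratio; real analyses also test significance."""
    if omega > 1:
        return "positive selection"
    if omega < 1:
        return "purifying selection"
    return "neutral"


# Hypothetical gene: 5 changes over 700 non-synonymous sites,
# 20 changes over 300 synonymous sites.
omega = dn_ds(5, 700, 20, 300)
print(round(omega, 3), interpret(omega))
```

Dividing by the number of sites matters because a typical coding sequence has far more non-synonymous than synonymous sites, so raw counts alone would be misleading.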


What are segmental duplications and how do they contribute to mutation and variation in the human genome? How might they be related to genome evolution?

A segmental duplication is a region of the genome (greater than 1 kb) that shares greater than 90% sequence identity with another region and is hypothesized to have arisen via a duplication.
Because of the homology it can lead to non-allelic homologous recombination.
They increase the rate of subsequent mutations – often end up with CNV regions.
They increase the size of the genome and allow for formation of gene families if the duplications get different mutations over time.


Aside from mutations (which in this context includes all the ways that a genome changes, not just point mutations), list the 4 other known processes of evolution that is, the 4 additional processes that can change allele frequencies in a population.

genetic drift
gene exchange
sexual selection
artificial selection
horizontal transfer (genetic materials introduced from one organism to another)


Comparative Genomic Analysis of 20 vertebrate genome sequences encompassing part of an ENCODE region revealed a large Multispecies Conserved Sequence (MCS) in what is thought to be an intergenic region. Give two possible explanations for the MCS and how you would experimentally test them.

The MCS could be an unannotated ncRNA: see if the region is transcribed
The MCS could be a regulatory region (enhancer, insulator, etc.): put it in a GFP reporter construct to see if it acts as an enhancer


Most miRNA have no known function. Comparative expression analysis across tissues, disease states etc., as well as high-throughput genomic screening can help in determining their functions. Give a pro and con for both of these approaches.

Pros: Identifying interesting gene sets.
Cons: Need further validation to establish the underlying mechanism.


A comparative genomic experiment finds a highly conserved region of the human genome in a gene desert, megabases away from the nearest gene. Provide a hypothesis for why this region is conserved and design an experiment to test your hypothesis.

Hypothesis: It could be an enhancer
1. Use tagged 3C or FISH to see if the conserved region is located near a gene in the nucleus.
2. Can also do ChIP for putative enhancer marks (crosslinking followed by sequencing) to see if the conserved region carries the marks of an enhancer.
3. Make a transgenic with the conserved fragment and see if it enhances expression of a reporter gene.


Define a ‘segregating site’.

Segregating site: a position that is polymorphic in a population, i.e. a mutation still present at that site (such sites cluster sequences into clades in a tree)