Topic D & E. DNA Sequencing... Function Genomics Flashcards

Question

Heterochromatin

Answer 1

Closed, inactive DNA that is tightly coiled and not actively transcribed. Classically associated with methylation and methyltransferases.

Answer 2

Structures of eukaryotic chromosomes that serve as the attachment point for the spindle apparatus during mitosis. Highly repetitive; they separate the long arm from the short arm in human chromosomes.

Answer 3

Sequences toward the ends of chromosomes that contain mainly simple repeats and duplicates. They prevent chromosomes from fusing with each other by forming tertiary structures that protect the termini. Interestingly, their ends are not maintained by the standard replicative polymerases but are extended by a dedicated enzyme, telomerase.

Answer 4

The finished genome repaired many, but not all, of the gaps in the draft sequence. Some heterochromatic gaps, gaps at euchromatic boundary regions, and gaps in interior regions remained. The finished genome increased continuity, reflected in a larger N50 contig size. It also corrected the order and orientation of draft contigs and eliminated artifactual sequence duplications.
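
The N50 contig size mentioned above can be computed as in the following minimal sketch. The contig lengths are made-up illustration values, not data from any real assembly, and the function name `n50` is a hypothetical helper.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths (arbitrary units), same total size.
draft = [50, 40, 30, 20, 10, 10]
finished = [90, 60, 10]

print(n50(draft))     # fewer, longer contigs give a larger N50
print(n50(finished))
```

Both toy assemblies have the same total length, so the difference in N50 reflects continuity only.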

Answer 5

Telomeric DNA is more condensed and contains many repeating sequences that are hard to assemble with short reads.

Answer 6

Rare alleles, interactions, environmental effects. Perhaps many variants with small effects act together, rather than one or two variants with large effect sizes. Also make sure population stratification was not underlying your study.

Answer 7

Hierarchical sequencing. Use a technology that allows longer reads and paired-end reads, because the genome is so highly repetitive.

Answer 8

RNA sequencing with rRNA depletion instead of poly-A selection for library prep, because it captures non-polyadenylated RNAs and can measure the relative expression levels of these novel RNAs.

Answer 9

Changes C’s to U’s at unmethylated sites, while C’s at methylated sites are unchanged. The green signals indicate sites where methylation patterns are not significantly different in normal versus tumor cells, as the signals in the bottom two panels are similar. The red regions indicate regions that are significantly more methylated, or repressed, in tumor cells than in normal cells.
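
The conversion rule described above (unmethylated C becomes U, methylated C is unchanged) can be illustrated with a small in-silico sketch. The sequence and the methylated position are invented for illustration. The sketch writes converted bases as T, on the assumption that U is read as T after amplification and sequencing; the helper names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate conversion: unmethylated C -> T (U read as T),
    methylated C stays C. Positions are 0-based indices."""
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylated(reference, converted):
    """Positions where the reference has C and the converted read still has C."""
    return [i for i, (r, c) in enumerate(zip(reference, converted))
            if r == "C" and c == "C"]


reference = "ACGTCCGA"   # hypothetical sequence
methylated = {1}         # hypothetical methylated cytosine
read = bisulfite_convert(reference, methylated)
print(read)
print(call_methylated(reference, read))
```

Comparing the converted read against the reference recovers the methylated position, which is the logic behind reading methylation from such data.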

Answer 10

Look at the ChIP-seq peaks from the ENCODE database for your cell type or tissue of interest and overlay them with RNA-seq data. Check that you are seeing dysregulation of the same genes. Compare ChIP-seq peaks for normal tissue and the tissue of interest to identify significant differences in peaks at transcription factors, indicating up- or down-regulation of a transcription factor that may affect expression of your gene of interest.

Answer 11

Aligning multiple binding information. Look at how prevalent the SNP is across similar tissues or conditions compared to surrounding SNPs. Check whether the SNP is in a functional or nonfunctional region: is it in an exon of a gene? Check whether there are any other nearby regulatory markers that may affect this SNP but not the others.

Answer 12

Chromatin modifications near promoters seem to be similar irrespective of cell type. Chromatin modifications near non-redundant enhancers seem to be more variable and more cell-type specific.

Answer 13

(600,000,000 × 2 × 150 × 0.75) / 1,900,000,000 ≈ 71
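
The arithmetic above can be checked with a short sketch. The card's question is not shown, so the meaning assigned to each factor (number of fragments, two reads per fragment, read length in bases, usable fraction, genome size in bases) is an assumption, as are the parameter names.

```python
def mean_coverage(n_fragments, reads_per_fragment, read_length,
                  usable_fraction, genome_size):
    """Average depth = total usable sequenced bases / genome size."""
    total_bases = n_fragments * reads_per_fragment * read_length * usable_fraction
    return total_bases / genome_size


# Numbers taken from the answer above; the labels are assumed.
cov = mean_coverage(600_000_000, 2, 150, 0.75, 1_900_000_000)
print(round(cov, 2))
```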

Answer 14

1.7 Gb × 10 (10× coverage) / 100,000 (100 kilobases) = 1.7 × 10^5. BAC: up to 200 kb, more commonly used. YAC: really huge inserts, sometimes 1 Mb.
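
The clone-count estimate above is a single division; a minimal sketch using the same numbers follows. The function name `clones_needed` is a hypothetical helper, and the 100 kb insert size is the value used in the answer.

```python
def clones_needed(genome_size_bp, fold_coverage, insert_size_bp):
    """Number of clones so that the inserts sum to fold_coverage x genome."""
    return genome_size_bp * fold_coverage // insert_size_bp


n_clones = clones_needed(1_700_000_000, 10, 100_000)
print(n_clones)
```

Doubling the insert size halves the number of clones needed for the same coverage, which is one reason large-insert vectors are attractive for this step.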

Answer 15

Finished has as few gaps as possible by focused strategies, high quality of base calls q>40 (

Answer 16

Repetitive regions that cannot be resolved by relatively short reads. Heterochromatic regions: hard to sequence. Multigene families that have a lot of structural similarity but polymorphisms between individual gene members. Structural variations: inversions, segmental duplications, insertions, deletions.

Answer 17

Constitutive heterochromatin is composed mainly of high copy number tandem repeats known as satellite repeats, minisatellite and microsatellite repeats, and transposon repeats

Answer 18

The draft has many more gaps, less continuity, more errors in the order and orientation of contigs, more artifactual sequence duplications, and unresolved segmental duplications and structural variations. The finished genome provides a definitive reference for experimentation.

Answer 19

SNP calling | Simple sequence motif matching

Answer 20

Paired-end sequences are derived from opposite ends of the same BAC clone (or general reference sequence). Having sequences from both ends of a BAC clone is important for arranging the relative order of clones or contigs when producing a tiling map, for assembling a high-quality draft genome, and for resolving gaps. These connections aid assembly and allow inference of the full sequence.

Answer 21

Next-gen sequencing reads are very short and thus certain regions of the genome are unlikely to be accurately constructed. This includes repetitive regions (such as pericentromeric or peritelomeric) and regions with high frequency structural variations. To accurately sequence such regions, longer reads are necessary.

Answer 22

Pyrosequencing, single-molecule sequencing...

Answer 23

Aligning the reads even with sequencing errors – determining true variation compared to sequence errors. Reads that map to many regions of the genome.

Answer 24

draft genomes are in scaffolds and contigs 1 - lengths of repeats that may be missed in the draft 2 - actual genetic positions, full chromosomes, etc. 3 - SNPs vs. Sequencing Error.

Answer 25

Create DNA libraries; form a tiling path by aligning overlapping fragments; sequence individual clones along the tiling path; assemble the draft genome. Quality scores, coverage, resequencing,

Answer 26

Reads are too short, making repeat regions especially hard to map. Expensive. Less accurate than Sanger. More prone to error in homopolymer runs (e.g. 5 A’s vs. 6 A’s in a row).

Answer 27

1. No PCR errors 2. No PCR “jackpots” 3. get the haplotype directly

Answer 28

1. We can detect CNVs and segmental duplications from variation in the distance between paired-end reads. 2. Paired-end reads handle assembly problems better than ordinary short next-generation reads, so they can distinguish true SNP variation from mis-mapping. 3. They can also identify haplotypes if both reads of a pair cover SNPs.

Answer 29

Reads are too short, so gaps are hard to close and reads cannot span repetitive regions; these regions therefore remain a challenge to assemble.

Answer 30

1. ChIP 2. PCR 3. Target capture for interesting regions

Answer 31

ABI SOLiD is sequencing by ligation whereas Illumina and 454 are sequencing by synthesis.

Answer 32

Alu is reactivated in humans under positive selection, implying functional significance. We can compare the human and chimp genomes, looking for regions under strong positive selection by calculating the Ka/Ks ratio across the genome or by applying Tajima’s D test.

Answer 33

make new promoters, make new alternative splice sites, etc

Answer 34

We can look for open chromatin and nucleosome shifts on the DNA. We should see an H3K4me3 peak over the promoter, indicating activity, and H3K36me3 over the gene body, showing that it is being actively transcribed. ChIP-seq of RNA pol II.

Answer 35

Project 1: The lab has identified a new human transcription factor Maniac, but nothing is known about its regulatory targets. Your first project is to identify all regulatory targets of this novel transcription regulator.

Answer 36

We can perform ChIP-seq for H3K4me and use MACS for peak calling. Then we compare the binding peaks to test whether they overlap more than expected at random (Fisher’s exact test). Co-IP is another option, though it is a bit more complicated in terms of bench work.

Answer 37

Low cloning efficiency

Answer 38

homologous recombination (HR), double stranded RNA (DS), short hairpin RNA (SH), or small interfering RNA (SI), CRISPR

Answer 39

Fusion – binds receptors on the host cell and releases RNA and proteins upon degradation of the capsid. Reverse transcription – RNA → cDNA. Integration – the cDNA is transported into the nucleus and inserted into the host genome. Transcription from the genome – viral DNA is transcribed using host machinery. Budding – progeny virions pinch off. They reverse transcribe their RNA and integrate as cDNA into the host genome; other viruses don’t integrate into the host genome.

Answer 40

1) Create DNA libraries of genomic fragments of 50-200 kb 2) form a tiling path by aligning overlapping fragments 3) sequence individual clones along the tiling path 4) assemble the draft genome 5) finishing: filling in gaps, increasing the quality of reads, etc. Three ways to test the completeness or accuracy of the assembly: compare against STS and EST information; look at BAC clone fingerprints; check quality scores and resequencing coverage.

Answer 41

The Rhesus monkey serves as an outgroup when comparing this 3-taxon phylogeny. It allows us to determine what was present in the MRCA of the three primates. We can see what are the human specific mutations as well as chimp specific, etc.

Answer 42

We recruit DNA samples from 3000 affected patients and 3000 healthy controls. All of these DNA samples are genotyped on an Illumina 1M SNP microarray, and the genotypes for the 1M SNPs are called with GenomeStudio software. QC is performed to remove low-quality samples and SNPs. After that, we exclude duplicate and related samples and match controls with cases by ethnicity to avoid population stratification. A chi-square test can then be applied to detect SNP association with the disease.

Answer 43

If population structure or substructure has been detected, we can use logistic regression with a few principal components as covariates to identify which SNPs are significantly associated with the disease. If cases and controls are well matched in ethnicity, we can use a chi-square test or Fisher’s exact test to discover disease-associated SNPs. Because of the multiple testing issue, we need to correct the association P-values by FDR control or Bonferroni correction.
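
As a rough sketch of the well-matched case described above, the following computes a 2×2 allelic chi-square test for one SNP and applies a Bonferroni threshold. The allele counts are invented for illustration, and the function names are hypothetical. The p-value uses the fact that a chi-square statistic with one degree of freedom is the square of a standard normal variable, so only the standard library is needed.

```python
import math


def chi_square_2x2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 d.f., no continuity correction) and its p-value."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 d.f., P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that pass alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical allele counts for one SNP.
chi2, p = chi_square_2x2(30, 70, 10, 90)
print(chi2, p)
print(bonferroni_significant([0.00001, 0.04]))
```

With a million SNPs the Bonferroni threshold becomes very small, which is why FDR control is often considered as a less conservative alternative.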

Answer 44

Genomics: DNA sequencing to find polymorphisms. Transcriptomics: RNA-Seq, microarrays. Proteomics: mass spectrometry, yeast two-hybrid. Epigenomics: ChIP-Seq, ChIP-chip to screen for genome-wide epigenetic changes or transcription factor loading.

Answer 45

Single-cell RNA-Seq: study gene expression in individual neurons. Single-cell DNA-Seq: study mutations in individual cancer cells.

Answer 46

Study the organization of chromosomes and interaction of the chromosome regions.

Answer 47

CpG islands are genomic regions that contain a high frequency of CpG sites (>200 bp and GC% > 50%). CpG islands typically occur at or near promoter regions. The gene is repressed when the island is methylated and active when it is unmethylated.
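
The two thresholds quoted above (length over 200 bp, GC content over 50%) can be checked with a few lines. This is only a sketch of those two criteria: published CpG-island definitions typically add further conditions that are not shown here, and the test sequences are artificial.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_count(seq):
    """Number of CpG dinucleotides (C immediately followed by G)."""
    return seq.upper().count("CG")


def passes_basic_cpg_criteria(seq, min_length=200, min_gc=0.5):
    """Apply only the length and GC% thresholds from the answer above."""
    return len(seq) > min_length and gc_fraction(seq) > min_gc


gc_rich = "CG" * 150     # artificial 300 bp sequence, 100% GC
at_rich = "AT" * 150     # artificial 300 bp sequence, 0% GC
print(passes_basic_cpg_criteria(gc_rich), cpg_count(gc_rich))
print(passes_basic_cpg_criteria(at_rich), cpg_count(at_rich))
```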

Answer 48

A systems biology approach differs from a reductionist approach by considering all players in a pathway and all pathways involved in a phenotype or trait of interest, whereas a reductionist approach uses single-gene or single-pathway methods to study biological phenomena.

Answer 49

Possible causes: RNA editing (C→U or A→I(G) by deamination); insertions, deletions, or other structural variations between the sample and the reference; technical error in the base call; incorrect mapping of the sequencing read. Technical error in the base calls: next-generation sequencing has a higher error rate per base call than Sanger sequencing, although this is partially compensated by higher read depth. Typically, targeted Sanger sequencing of the discordant base is done for validation by an independent technology, and some discordant bases turn out to be concordant with the reference genome. Check whether the mismatches tend to fall toward the ends of sequencing reads, where the error rate tends to be higher. Do Sanger sequencing on the same samples where the discordance was observed.

Answer 50

Negative control: the same vector but without any cis elements. Positive control: a known cis transcriptional activator. An empty vector to control for transfection efficiency. A known repressor for comparison of expression levels.

Answer 51

Perform HITS-CLIP sequencing or small RNA sequencing. HITS-CLIP is done by UV cross-linking RNA to protein and immunoprecipitating Argonaute proteins. After purification and reversal of the cross-links, make libraries out of the extracted RNA molecules, which should have come from mRNA-microRNA duplexes. After sequencing these, map them to the human genome and look specifically for matches to the 3' UTR regions of known genes. Look for differential gene expression between the normal and breast tumor tissues. Small RNA sequencing is done by isolating the population of small RNAs in the cell by gel electrophoresis separation and cutting out a band around 18-22 nucleotides. After sequencing, map the reads to the reference human genome and look for differential expression between normal and breast tumor samples.

Answer 52

FPKM or RPKM: Fragments (or Reads) Per Kilobase of transcript per Million fragments/reads mapped. This normalizes the read/fragment counts by the total read/fragment count and by the gene/exon/fragment length. Generally, people recommend 100 million reads.
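
The normalization described above can be written out directly. The counts and gene length in this sketch are invented for illustration, and the function name `fpkm` is a hypothetical helper.

```python
def fpkm(fragments_on_gene, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    per_kilobase = fragments_on_gene / (gene_length_bp / 1_000)
    return per_kilobase / (total_mapped_fragments / 1_000_000)


# Hypothetical gene: 500 fragments on a 2 kb transcript, 20 million mapped.
value = fpkm(500, 2_000, 20_000_000)
print(value)
```

Dividing by length keeps long genes from looking more highly expressed simply because they collect more fragments, and dividing by library size makes samples of different depth comparable.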

Answer 53

To characterize mRNA and long noncoding RNA, isolate the respective populations (by poly-A tail selection and size selection, respectively) and create libraries, then amplify and subject them to high-throughput paired-end sequencing to allow identification of gene fusions. After mapping the paired-end sequences to the human genome, measure and analyze the following. Transcript levels: measure FPKM (number of fragments per kilobase of exon model per million mapped reads) to quantify the expression levels of individual gene transcripts. Gene fusion: look for reads whose paired ends map to different genes to identify possible gene fusions. Alternative splice forms: align reads using a splice-aware aligner such as TopHat to look for reads that span different exon-exon junctions. Alternative promoters: look for peaks or regions of high-density coverage upstream of known promoters to detect alternative promoter usage. Use Cufflinks. Alternative poly-A addition sites: look for extension of the 3' UTR relative to known 3' UTR sites, indicated by reads downstream of known 3' UTR sites. Use Cufflinks.

Answer 54

Look for conserved mutations in all samples. Conserved mutations are likely to be driver mutations. These mutations should have low levels of heterozygosity if they are recessive, and if they are dominant, they should have low minor allele frequencies. Another approach is to evaluate the dn/ds ratio of mutations and look for those under purifying/negative selection (dn/ds

Answer 55

Look for microRNAs that are differentially expressed in the tumor tissue vs. normal samples and then, using public databases, see if they have any known gene targets. If not, leverage mRNA-seq datasets and see if any genes have high sequence complementarity in their 3' UTR regions to the miRNA seed sequence. Further, see if expression of the miRNA is negatively correlated with the mRNA across the 20 samples. Validate the miRNAs by rescue with the WT gene or with a version of the gene resistant to the identified miRNA, and see if the phenotype is rescued. For miRNAs found to be more highly expressed in tumors, test whether transfection of the WT gene prevents metastasis; the genes targeted by the miRNAs in this case are expected to be tumor suppressor genes. For miRNAs found to be more highly expressed in normal samples, see if transfection of the WT gene induces metastasis; the miRNA in this case is expected to be targeting a proto-oncogene.

Answer 56

Annealing temperature; GC content (stability); folding over on itself (palindromic sequences, hairpins); location of primers in the genome (coverage); uniquely mapping to the genome; perfect match and imperfect match (to test for background binding); need multiple spots for each.
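
A few of the checks listed above (GC content, a rough melting temperature, and self-complementarity) can be sketched for a short oligo. The melting temperature here uses the simple Wallace rule, 2 °C per A/T plus 4 °C per G/C, which is only a rough estimate for short oligos and is swapped in for illustration; it is not necessarily the method the original answer has in mind. The example sequence is arbitrary.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C (Wallace rule, short oligos)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def is_self_complementary(seq):
    """True if the oligo equals its own reverse complement (palindromic)."""
    return seq == reverse_complement(seq)


oligo = "GAATTC"  # arbitrary palindromic example
print(gc_content(oligo), wallace_tm(oligo), is_self_complementary(oligo))
```

A palindromic oligo can pair with itself or with another copy, which is the hairpin and dimer concern raised in the list above.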

Answer 57

Top layer: Biological Process, Molecular Function, and Cellular Component. Divides genes into 3 categories. A gene can be in 1-3 categories.

Answer 58

The set of all RNA molecules, including mRNA, rRNA, tRNA, and other non-coding RNA produced in one or a population of cells.

Answer 59

Quantify RNA molecules. Indicate expression level.

Answer 60

“Whole Transcriptome Shotgun Sequencing” refers to the use of high-throughput sequencing technologies to sequence cDNA in order to get information about a sample’s RNA content. Advantages: 1. identifies RNA sequence and can discover novel transcripts; 2. gets the sequence read directly; 3. higher coverage. Disadvantages: 1. expensive; 2. biased toward high-abundance transcripts; 3. reads need to be aligned.

Answer 61

Knock out checkpoint genes and analyze RNA expression in cells that are blocked in a certain phase. Synchronize the cell population? Use FISH for chromosomes, stall the cell cycle in a particular phase, and analyze the gene expression.

Answer 62

PCA. Hierarchical clustering. Pathway calling for the clustered gene list (gene list enrichment for pathways).

Answer 63

Cancer cell diversity. The millions of cells that make up the lump have become distant relatives.

Answer 64

Passenger mutations do not have any effect on the cancer cell, but driver mutations will cause a clonal expansion. A driver mutation is causally implicated in oncogenesis. A passenger mutation has not been selected, has not conferred clonal growth advantage and has therefore not contributed to cancer development. Driver mutations cluster in the subset of genes that are cancer genes whereas passenger mutations are more or less randomly distributed. - “The cancer genome,” Nature 458, 719-724 (9 April 2009)
1. Find the gene sequence that is consistent but random among the tumor population.
2. The driver gene sequence should be consensus within tumor, but across tumor (?)
3. We can distinguish by looking for orthologous genes or genomic regions in tumors from multiple species with the same type of cancer.

Answer 65

Do single-cell analysis. Do a controlled experiment with mixtures of cell populations as well as homogeneous cell populations and run arrays; use the data to computationally predict mixture models for tumor samples.

Answer 66

16S rRNA sequencing with primers on the conserved regions, extending into the variable regions that identify the microbial species. The 454 sequencing platform is preferable to the Illumina platform because of its longer read length. The 16S rRNA gene is highly conserved but also varies among different microbial species, so sequencing 16S rRNA makes it easy to quantify microbial abundance in a single assay.

Answer 67

We can calculate the relative abundance of each by normalizing the read counts by gene length and sequencing depth. The Shannon index can be used to measure the diversity.
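
The diversity measure mentioned above, the Shannon index, is H = −Σ pᵢ ln pᵢ over the relative abundances pᵢ. A minimal sketch with invented read counts follows; the function names are hypothetical, and the counts are assumed to be already normalized for gene length and depth.

```python
import math


def relative_abundance(counts):
    """Convert per-taxon counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p), skipping empty taxa."""
    return -sum(p * math.log(p) for p in relative_abundance(counts) if p > 0)


even_community = [25, 25, 25, 25]    # hypothetical counts, four taxa
skewed_community = [97, 1, 1, 1]
print(shannon_index(even_community))
print(shannon_index(skewed_community))
```

An even community gives the maximum value for its number of taxa (ln 4 here), while a community dominated by one taxon scores much lower.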

Answer 68

We can do whole-genome shotgun sequencing and then assemble the genomes de novo. By comparing to the existing databases, we can identify novel species.

Answer 69

MALDI-TOF: combines a matrix-assisted laser desorption/ionization source with a time-of-flight mass analyzer. LC-MS/MS: similar to gas chromatography MS (GC/MS), liquid chromatography mass spectrometry (LC/MS or LC-MS) separates compounds chromatographically before they are introduced to the ion source and mass spectrometer, followed by tandem mass spectrometry (MS/MS). A mass spectrometer consists of three components: ion source, mass analyzer, and detector. The ionizer converts some portion of the sample into ions. An extraction system removes ions from the sample and gives them a trajectory that allows the mass analyzer to sort the ions by mass-to-charge ratio. The detector measures the value of an indicator quantity and thus provides data for calculating the abundance of each ion present.

Answer 70

IP mass spectrometry pulls down a protein of interest by immunoprecipitation through a bead/column or antibody. Next, mass spectrometry is performed on the pulled-down protein complex, which likely contains proteins interacting with the protein of interest. IP mass spec is more likely to capture true biological binding partners that act in vivo, whereas yeast two-hybrid is subject to false positives arising from artificial interactions that do not happen in vivo, especially if the protein is not endogenously a yeast protein interacting in the nucleus.

Answer 71

The idea of centrality is that there are certain hubs in protein protein interaction networks that have many interacting partners and deletion or knockdown of these central genes is often lethal.
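
One simple way to make the hub idea concrete is degree centrality: the number of interaction partners of a protein divided by the number of other proteins in the network. The sketch below uses a made-up edge list with placeholder protein names; degree is only one of several centrality measures and is chosen here for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to (number of distinct partners) / (nodes - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(partners) / (n - 1) for node, partners in neighbors.items()}


# Hypothetical interaction network.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
centrality = degree_centrality(interactions)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub])
```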

Answer 72

Matrix-assisted laser desorption ionization, time of flight. The matrix protects the protein from direct contact with the laser. The proteins are ionized by transferring energy from the laser to the matrix and then to the protein. The time of flight is the amount of time it takes for an ion to fly from point A to point B (determined by mass/charge).

Answer 73

Yeast two-hybrid with a library of domains. Co-immunoprecipitation and mass spec.

Answer 74

They are modular

Answer 75

Protein profiles give you actual protein expression levels, and nucleotide microarrays give you transcript levels. Advantages of protein profiling: gives you protein levels instead of transcript levels (because they don’t always correspond); can look at PTMs. Disadvantages: you can’t amplify proteins, so you can generally only detect abundant proteins; expensive; you can only look at things with existing antibodies, so you can only probe known proteins.

Answer 76

Many short hairpin RNA molecules that target many genes in the human genome. To ensure specificity, a given gene may have many individual shRNA target sites. In addition, the shRNAs may target different regions of a gene, such as different sequences in the 3' UTR. Includes an efficient delivery system such as a lentiviral vector.

Answer 77

Infect each of the cell lines with the library of shRNA and monitor cell death after introduction of BRAF inhibitor therapy. Cell lines that die are those where after introduction of a shRNA that targets a gene are no longer able to resist BRAF inhibitor therapy. Recover the specific responsible shRNA by testing and narrowing down subpools of the shRNA library or assay each shRNA in a separate well of a plate. To ensure identified genes in the primary screen are real, perform repeated experiments on all 20 cell lines with addition of shRNA that targets the gene in different regions of the transcript. See if the wild type phenotype (or resistance to BRAF inhibitor therapy) can be rescued by transfection of the WT version of the gene without infection of shRNA or transfection of a version of the gene resistant to the shRNA identified.

Answer 78

Try to rescue the WT phenotype by introducing the WT version of the gene or introducing a version of the gene resistant to the shRNA identified. Knockdown the gene through a more stable pathway such as site specific homologous recombination to see if the phenotype identified in the screen is consistent and persistent.

Answer 79

BRAF inhibitor resistance in melanoma involves a central or hub gene that implicates many pathways. Thus, there is a lot of cross talk between different pathways. This also suggests that this pathway is important and that knockdown of too many genes may be lethal. Depending on the specific nature of the pathways, it is critical that the effect is lethal for melanoma specifically and not for normal cells. Target combinations of the 30 target genes and observe which one gives the most effective phenotype of selective death of melanoma but not normal cells. Be careful not to target too many genes in the same pathway or genes that have similar functions in different pathways, as this might induce a synthetic lethal mutation.

Answer 80

Use an siRNA library targeting human genes to screen for the responsible genes. Make sure each gene has several siRNAs targeting it. Transfect the hedgehog reporter cell line with the siRNA library. We also need some controls, such as the reporter cell line only and the reporter cell line with ligand. Find the genes that decrease the fold change after adding the hedgehog ligand.
Positive Controls
PC-S: This is the positive control for silencing. The PC-Ss are siRNAs that induce a high level of gene knockdown, they are NOT involved in the pathway you are studying and should not target genes that affect cell proliferation or survival (e.g., GAPDH or beta-actin). The PC-S will simply provide information on the efficiency of the positive knockdown in the screen and will NOT be used in the statistical analysis of your data.
PC-A: This is the positive control for your assay. The PC-As are siRNAs that should induce your screening phenotype and CAN BE used in the statistical analysis of your data for evaluating hits. The PC-As should target known genes in your pathway and it is very important to test several potential PC-As to find one that produces the desired phenotypic change at the levels you require.
PC-A2: It is often very helpful if a second PC-A (we’ll call it PC-A2) is used that induces a moderate phenotypic change in the assay. This control will not be used in the statistical analysis of the data but can be used as a phenotypic marker to evaluate the results.
Negative Controls
NC-NT: This is the non-targeting negative control and establishes the baseline for your assay. The NC-NT measures the changes siRNA delivery can make on gene expression. In most cases, this should be a nonsense sequence with no complementarity to known genes and should have no effect on your assay results.
NC-NsiRNA: This is the nontransfected negative control and contains only seeded cells with no transfection reagent/siRNAs. The NC-NsiRNA can be used in conjunction with the NC-NT to determine if siRNA delivery affects assay results.
NC-T: This is the treated negative control and is only used in experiments having additional treatments (drugs, chemicals, etc.). The NC-T serves as the baseline for effects of treatment alone on cells.
NC-NC: This is the no-cell negative control. The assay wells are treated with all reagents used during the experiment and measure non-specific signals from these reagents. This control is considered unnecessary for most screens.

Answer 81

Chromatin immunoprecipitation on an array. Antibodies are used to pull down a protein, and the array shows which regions of the genome are enriched, that is, bound to that protein. Often used for identifying transcription factor binding sites, regions with certain chromatin modifications, etc.

Answer 82

One well with no siRNA; use a known knockout phenotype; use an siRNA that does not target anything.

Answer 83

ChIP-chip, look for DNA methylation (bisulfite), mRNA expression array, SNP calling

Answer 84

ChIP-chip, ChIP-seq

Answer 85

cDNA (and promoter) arrays (e.g. Brown/Botstein): pros: prevalent infrastructure, cheap to produce in your own lab, sensitive; cons: variable quality, cross hybridization.
Long oligonucleotide arrays (e.g. Agilent): pros: sensitive, commercially sourced, high density; cons: cross hybridization, density, cost.
Small oligonucleotide arrays (e.g. Affymetrix): pros: extremely high density, multiple independent measures, open-source analysis algorithms, base-level discrimination; cons: cost, sensitivity.
*Exon arrays
The longer the probe, the higher the probability that another gene will have similar sequences that could cross hybridize; think about a genome-length probe, for which almost any sequence would cross hybridize somewhere. The shorter the probe, the fewer base pairs that hybridize and the lower the sensitivity of detection. Affy tries to get around this by having many of them.

Answer 86

siRNA screen for cell line with and without hedgehog

Answer 87

Compare with a background (input) sequencing run. Experiment: whole-genome pull-down by CTCF, digest away the unbound regions, and do next-generation sequencing to identify the bound sequences. *cross-link DNA

Answer 88

No. | Binding by CTCF with some other stimulation, co-activators

Answer 89

1. Cohesin itself regulates some genes by binding the chromatin 2. Combinatorial effect of cohesin 3. Cohesin has a regulatory function together with other transcription factors

Answer 90

High throughput screen of all the protease inhibitors that your company makes

(121 cards)