Exam 3, Lecture 4: The Cancer Genome Atlas Flashcards
What is the biggest data project and what is its purpose?
The Cancer Genome Atlas (TCGA), run by the NIH. Its purpose is to help us understand cancer by comparing samples from many patients.
Is cBioPortal info processed or unprocessed?
Processed, with LOTS of processing
It is a public website, so it can't hold a whole patient genome
What are raw sequencing files and where can you access them?
They require high cybersecurity because they can be used to identify a person. You can access them on the Genomic Data Commons, which requires an application.
Explain this article with Dr. B in it: Big genes are big mutation targets: a connection to cancerous spherical cells.
If a gene is large, it gets a lot of mutations. He compares high and low frequency (large and small genes) and found that large genes had more mutations, but many of these are common mutations that don't actually cause the cancer. Mutagenesis is relatively random. Therefore, large genes (like the ones that encode cytoskeletal proteins) could have lots of mutations.
Article w/Dr. B in it
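To see why a bigger gene is a bigger mutation target, here is a minimal sketch that drops random mutations along two genes in proportion to their length. The gene names, lengths, and mutation count are made up for illustration and are not from the article.

```python
import random

def simulate_random_mutations(gene_lengths, n_mutations, seed=0):
    """Place n_mutations at random, with each gene hit in proportion to its length."""
    rng = random.Random(seed)
    names = list(gene_lengths)
    weights = [gene_lengths[name] for name in names]
    hits = rng.choices(names, weights=weights, k=n_mutations)
    return {name: hits.count(name) for name in names}

# Hypothetical coding-region lengths in base pairs (illustrative only).
gene_lengths = {"large_cytoskeletal_gene": 100_000, "small_gene": 1_000}
counts = simulate_random_mutations(gene_lengths, n_mutations=10_000)
print(counts)  # the large gene collects far more hits than the small one
```

Nothing in the simulation makes the large gene more important to the cancer. It is hit more often only because it is longer, which is the passenger-mutation idea.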
What type of TCGA files were analyzed in the article?
Unprocessed
Article w/Dr. B in it
What principle or process of mutagenesis is revealed by the ratios of silent to amino acid altering mutations in Table 1 of the article?
A silent mutation changes the DNA but not the amino acid, so it is not expected to cause the cancer. The ratio of silent to amino acid altering mutations shows how random the mutagenesis is.
HIGH FREQ = more passenger mutations (higher ratio of silent mutations to amino acid changes)
LOW FREQ = mutations happening in exactly the right spot (mostly the mutations needed to cause cancer)
Article w/Dr. B in it
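A minimal sketch of how a silent to amino acid altering ratio could be computed from codon changes, using the standard genetic code. The codon changes in the example are made up, and this is not the method used in the article.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify(ref_codon, alt_codon):
    """Label a codon change as silent, nonsense, or missense."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"      # DNA changed, amino acid did not
    if alt_aa == "*":
        return "nonsense"    # new stop codon (truncating)
    return "missense"        # any other amino acid change

def silent_to_altering_ratio(codon_changes):
    """Ratio of silent changes to amino acid altering changes."""
    labels = [classify(ref, alt) for ref, alt in codon_changes]
    silent = labels.count("silent")
    altering = len(labels) - silent
    return silent / altering if altering else float("inf")

# Hypothetical codon changes (illustrative only).
changes = [("GAA", "GAG"), ("GAA", "GTA"), ("TGG", "TGA"), ("CTT", "CTC")]
ratio = silent_to_altering_ratio(changes)  # 2 silent / 2 altering = 1.0
```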
What is the potential medical or clinical significance of large coding region mutations in cancer?
Large coding regions are more commonly mutated
Article w/Dr. B in it
Table 1: Ratio of silent mutation to AA changes. What can be concluded?
You can see there is a larger number of “High Frequency Mutations”, but among these there are more silent mutations.
Silent mutation = the codon changes but the amino acid does not, so it did not cause cancer
Article w/Dr. B in it
Table 2: Ratios of high to low frequency mutation groups, for average number of mutated genes in the various gene sets. What can be concluded?
For colon cancer, out of 53, 9.1 were oncoproteins. The oncoprotein mutations are over-represented in the lower frequency groups for colon and lung cancer.
Mutations have a very large random component. When there are lots of mutations, that doesn’t necessarily mean there will be a phenotypic effect (they don’t necessarily cause cancer).
If you have very few mutations, a higher proportion of those mutations will be driving the cancer.
How does one obtain access to information available for the large collection of cancer samples?
Through processed data (cBioPortal, public) and unprocessed data (Genomic Data Commons, controlled access)
What is processed data?
Lots of types of data representing tumor biopsies (no normal tissue) and patient info. You get it from cBioPortal and download it as Excel or text files.
- PUBLIC ACCESS; download the Excel file to look carefully
- Can see displays and recover the processed data
- Only tumor biopsy data is available; not data on normal tissue
- **Can’t get SNPs** out of processed data because the normal tissue is not available
You can't compare the genome of blood vs. the genome of cancer with processed data
What is unprocessed data?
Reads from sequencing machines, representing DNA or RNA; **tumors and normal tissue**. You can find them on the Genomic Data Commons.
- NEED TO APPLY: controlled access; download raw data from NIH
- If looking for a splice variant, would need to look at the unprocessed data
- To see the original sequencing data, you need the unprocessed data
- Need to be able to use or write code to look through unprocessed data
You get the actual reads
Processed (or curated or annotated): What are the types of data?
- Somatic amino acid substitutions
  - ONLY THE CODON, not the surrounding nucleotides (that’s controlled)
- Transcriptome (RNASeq values, NOT number of reads); RNA microarray results
- Whole genome methylation of cytosines (methylome)
- Copy number variation (keep in mind N-myc in neuroblastoma and DHFR)
  - WILL NOT GET SEQUENCE INFORMATION
  - MYC = ONCOPROTEIN
  - DHFR = generating thymine
- microRNA
- Clinical information (whether the tumor came from a person who smoked, age, tx, etc.)
YOU GET FROM http://www.cbioportal.org/
What are the limitations to processed data?
- No processed data from whole genome sequences; can’t see intron mutations (it may be interesting to see if a particular mutation affects splicing)
- Some holding back of information to protect patient security
- NIH does not provide a raw DNA sequence for cancers; however, the amino acid sequences are freely available
- Nothing in the processed data represents an original DNA or RNA sequence because that information could be used to identify a patient
Tumor suppressor genes
MSI = microsatellite instability = VNTR instability (the number of repeats changes, often increasing, because strand slippage is not repaired by MMR) = MMR defect.
MSI is a way of measuring MMR defects
If there is a mismatch repair defect, these repeats are altered by strand slippage
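A minimal sketch of the idea behind MSI: count how many times a repeat unit occurs in a row in normal tissue vs. tumor and flag a difference. The sequences and the "CA" repeat unit are made up, and note that real sequences like these would only be in the unprocessed (controlled) data.

```python
import re

def longest_repeat_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

def repeat_length_changed(normal_seq, tumor_seq, unit):
    """True if the tumor's repeat count differs from the normal tissue's."""
    return longest_repeat_run(normal_seq, unit) != longest_repeat_run(tumor_seq, unit)

# Hypothetical microsatellite region (illustrative only).
normal = "GG" + "CA" * 10 + "TT"
tumor = "GG" + "CA" * 13 + "TT"   # extra repeats left by unrepaired strand slippage

normal_repeats = longest_repeat_run(normal, "CA")  # 10
tumor_repeats = longest_repeat_run(tumor, "CA")    # 13
unstable = repeat_length_changed(normal, tumor, "CA")
```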
What does this figure represent?
A small section of the results from cBioPortal
Each bar: a tumor sample
Red: MSI high = MMR defect
Dark green: missense mutation, putative driver = drives cancer
Light green: putative passenger = DOES NOT cause cancer
What does this figure represent?
A portion of an Excel file from the patient order download, showing the first several patients with MSI (microsatellite instability).
The last number, 407, is the number of different patients with MSI (so with an MMR defect)
What does this figure represent?
A portion of an Excel file from cBioPortal, Colon adenocarcinoma mutations (DFCI). You can see which mutations each patient has, compare them, and form a hypothesis. The patient numbers are on the left side, the genes are across the top, and the mutations are in the middle of the table. You COMPARE ALL THE DATA IN AN EXCEL FILE AND TEST A SPECIFIC HYPOTHESIS.
So MSI = MMR defect, and you can download an Excel file, see what actual mutations it caused, and form a hypothesis. If a large number of people have a given mutation, can it be related to a specific cancer?
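A minimal sketch of comparing the downloaded table in code instead of by eye: patients in rows, genes in columns, mutations in the cells. The table below is made up (sample IDs, gene names, and mutations are placeholders), and a real download may be laid out differently.

```python
import csv
import io

# Hypothetical tab-separated export: one row per patient, one column per gene.
TABLE = """SAMPLE_ID\tGENE_A\tGENE_B\tGENE_C
PT-001\tR213*\t\tG12D
PT-002\t\t\tG12D
PT-003\tK120Nfs*5\tA45T\t
"""

def patients_per_gene(tsv_text, id_column="SAMPLE_ID"):
    """Count how many patients carry at least one mutation in each gene."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = {}
    for row in reader:
        for gene, mutation in row.items():
            if gene == id_column:
                continue
            counts.setdefault(gene, 0)
            if mutation and mutation.strip():  # an empty cell means no mutation
                counts[gene] += 1
    return counts

gene_counts = patients_per_gene(TABLE)
# GENE_A: 2 patients, GENE_B: 1 patient, GENE_C: 2 patients
```

A gene mutated in many patients is a starting point for a hypothesis, not proof that it drives the cancer.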
Hypothesis we want to test about OSTEOGENESIS IMPERFECTA: DO CYTOSKELETAL PROTEINS HAVE MORE TRUNCATING MUTATIONS? If so, do the cell shapes differ greatly from those in mismatch repair patients?
Yes, there are a lot of truncating mutations, and they did cause the cell to have a different shape. If a truncating mutation occurs earlier, so that half of the collagen protein is missing, the phenotypic outcome would be expected to be more deleterious.
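A minimal sketch of why an earlier truncating mutation is expected to be worse: the earlier the stop, the larger the fraction of the protein that is missing. It assumes protein changes are written like "R501*" (nonsense) or "K120Nfs*5" (frameshift); the mutations and the 1000 amino acid protein length are made up.

```python
import re

# Matches e.g. "R501*" (stop gained) or "K120Nfs*5" (frameshift); assumed notation.
_TRUNCATING = re.compile(r"^[A-Z](\d+)(?:\*|[A-Z]?fs.*)$")

def truncation_position(protein_change):
    """Return the affected residue number if the change is truncating, else None."""
    match = _TRUNCATING.match(protein_change)
    return int(match.group(1)) if match else None

def fraction_of_protein_lost(protein_change, protein_length):
    """Fraction of the normal protein missing after a truncating change (0.0 if none)."""
    position = truncation_position(protein_change)
    if position is None:
        return 0.0
    # Residues 1 .. position-1 are still made; everything from `position` on is lost.
    return (protein_length - (position - 1)) / protein_length

# Hypothetical 1000 amino acid protein (illustrative only).
late = fraction_of_protein_lost("R501*", 1000)   # 0.5, half the protein missing
early = fraction_of_protein_lost("R101*", 1000)  # 0.9, most of the protein missing
```

For a frameshift this is only an approximation, since some altered residues are made before the new stop codon.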