Exam3Lec4TheCancerGenomeAtlas Flashcards by Daniella Andre

What is the biggest data project and what is its purpose?

The Cancer Genome Atlas (TCGA), an NIH project. Its purpose is to help us understand cancer by comparing many patient samples.

Is cBioPortal info processed or unprocessed?

Processed, with LOTS of processing.

It is a public website, so it can't hold a whole patient genome.

What are raw sequencing files and where can you access them?

They require high cybersecurity because the data could be used to identify a person. You can access them through the Genomic Data Commons, which requires an application.

Explain this Article with Dr. B in it: Big genes are big mutation targets: a connection to cancerous spherical cells.

When a gene is large, it gets a lot of mutations. He compares high- and low-frequency mutation groups (large and small genes) and found that large genes had more mutations, but a lot of these are common mutations that don't actually cause the cancer. Mutagenesis is relatively random. Therefore, large genes (like the ones that encode cytoskeletal proteins) can carry lots of mutations.
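A minimal sketch of the "big genes are big targets" reasoning, assuming mutations land uniformly at random per nucleotide of coding sequence. The gene names and lengths are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def expected_mutations(coding_lengths, total_mutations):
    """Split a fixed number of random mutations across genes
    in proportion to each gene's coding length."""
    total_length = sum(coding_lengths.values())
    return {
        gene: total_mutations * length / total_length
        for gene, length in coding_lengths.items()
    }


# Hypothetical coding-region lengths in nucleotides (illustrative, not real values).
genes = {
    "large_cytoskeletal_gene": 90_000,
    "medium_gene": 9_000,
    "small_gene": 1_000,
}

expected = expected_mutations(genes, total_mutations=100)
for gene, count in expected.items():
    print(f"{gene}: {count:.1f} expected mutations")
```

Under this assumption the large gene collects most of the mutations simply because it is the biggest target, not because those mutations drive the cancer.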

Article w/Dr. B in it

What type of TCGA files were analyzed in the article?

Unprocessed

Article w/Dr. B in it

What principle or process of mutagenesis is revealed by the ratios of silent to amino acid altering mutations in Table 1 of the article?

Compare how often a mutation changed the amino acid with how often it was silent. A mutation that did not cause the cancer is a passenger.

HIGH FREQ = more passenger mutations (higher ratio of silent to amino acid-altering mutations)
LOW FREQ = mutations happening in exactly the right spot (the mutations needed to cause cancer)
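A rough sketch of how the silent to amino acid-altering ratio could be computed for two mutation groups. The class labels ("silent", "missense", "nonsense") and all counts are hypothetical; they are not the values from Table 1 of the article.

```python
from collections import Counter


def silent_to_altering_ratio(mutation_classes):
    """Ratio of silent mutations to amino acid-altering mutations."""
    counts = Counter(mutation_classes)
    silent = counts["silent"]
    altering = counts["missense"] + counts["nonsense"]
    if altering == 0:
        raise ValueError("no amino acid-altering mutations in this group")
    return silent / altering


# Hypothetical mutation calls for two groups (made-up numbers).
high_frequency_group = ["silent"] * 30 + ["missense"] * 60
low_frequency_group = ["silent"] * 10 + ["missense"] * 35 + ["nonsense"] * 5

print(silent_to_altering_ratio(high_frequency_group))
print(silent_to_altering_ratio(low_frequency_group))
```

In this made-up example the high-frequency group has the higher ratio, which is the pattern the card associates with more passenger mutations.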

Article w/Dr. B in it

What is the potential medical or clinical significance of large coding region mutations in cancer?

Large coding regions are more commonly mutated.

Article w/Dr. B in it

Table 1: Ratio of silent mutation to AA changes. What can be concluded?

There is a larger number of mutations in the “High Frequency Mutations” group, but within that group there are more silent mutations.

Silent mutation = a nucleotide change that does not change the amino acid, so it is not expected to cause cancer.

Article w/Dr. B in it

Table 2: Ratios of high to low frequency mutation groups, for average number of mutated genes in the various gene sets. What can be concluded?

As seen for colon cancer, out of 53, 9.1 were oncoproteins. The oncoprotein mutations are over-represented in the lower-frequency groups for colon and lung cancer.

Mutations have a very large random component. When there are lots of mutations, that doesn’t necessarily mean there will be a phenotypic effect (they don’t necessarily cause cancer).

If you have very few mutations, a higher proportion of those mutations will be driving the cancer.

How does one obtain access to information available for the large collection of cancer samples?

Through processed data (cBioPortal) and unprocessed data (Genomic Data Commons)

What is processed data?

Many types of data representing tumor biopsies (no normal tissue) and patient info. You get it from cBioPortal and download it as Excel or text files.

PUBLIC ACCESS; download the Excel file to look carefully
Can see displays and recover the processed data
Only tumor biopsy data is available; not data on normal tissue
**Can’t get SNPs** out of processed data because the normal tissue is not available

You can't compare the genome of blood vs. the genome of cancer with processed data.

What is unprocessed data?

Reads from sequencing machines, representing DNA or RNA; **tumors and normal tissue**. Available from the Genomic Data Commons.

NEED TO APPLY: controlled access, then download raw data from NIH
If looking for a splice variant, you would need to look at the unprocessed data
To see the original sequencing data, you need the unprocessed data
Need to be able to use or write code to look through unprocessed data

You get the actual reads.

Processed (or curated or annotated): What are the types of data?

Somatic amino acid substitutions
-ONLY THE CODON, not the surrounding nucleotides (that’s controlled)
Transcriptome (RNASeq values, NOT the number of reads); RNA microarray results
Whole genome methylation of cytosines (methylome)
Copy number variation (keep in mind N-myc in neuroblastoma and DHFR)
-WILL NOT GET SEQUENCE INFORMATION
-MYC = ONCOPROTEIN
-DHFR = needed to generate thymine
microRNA
Clinical information (e.g., whether the tumor came from a person who smoked, age, treatment, etc.)
YOU GET IT FROM http://www.cbioportal.org/

What are the limitations to processed data?

No processed data from whole genome sequences; can’t see intron mutations (it may be interesting to see if a particular mutation affects splicing)
Some holding back of information to protect patient security
NIH does not provide a raw DNA sequence for cancers; however, the amino acid sequences are freely available
Nothing in the processed data represents an original DNA or RNA sequence because that information could be used to identify a patient

Tumor suppressor genes

MSI = microsatellite instability = VNTR instability (an altered number of repeats due to strand slippage not being repaired by MMR) = MMR defect.

MSI is a way of measuring MMR defects.

If there is a mismatch repair defect, these repeats are altered by strand slippage.
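A small sketch of what "altered repeats" means in practice: count the longest run of a repeat unit in a normal allele and in a tumor allele and see whether the count changed. The sequences and the "CA" repeat unit are hypothetical, and real MSI testing uses panels of marker loci rather than a single string comparison.

```python
def longest_repeat_run(sequence, unit):
    """Length (in copies) of the longest uninterrupted tandem run of `unit`."""
    best = 0
    step = len(unit)
    for start in range(len(sequence) - step + 1):
        run = 0
        pos = start
        # Walk forward one repeat unit at a time while the unit keeps matching.
        while sequence[pos:pos + step] == unit:
            run += 1
            pos += step
        best = max(best, run)
    return best


def repeat_count_changed(normal, tumor, unit):
    """True if the tumor allele carries a different number of repeats."""
    return longest_repeat_run(normal, unit) != longest_repeat_run(tumor, unit)


# Hypothetical microsatellite locus: flanking sequence plus a CA repeat.
normal_allele = "GGT" + "CA" * 10 + "TTG"
tumor_allele = "GGT" + "CA" * 13 + "TTG"  # extra repeats left by unrepaired slippage

print(longest_repeat_run(normal_allele, "CA"))
print(longest_repeat_run(tumor_allele, "CA"))
print(repeat_count_changed(normal_allele, tumor_allele, "CA"))
```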

What does this figure represent?

A small section of the results from cBioPortal

Each bar: a tumor sample
Red: MSI-high = MMR defect
Dark green: missense mutation, putative driver = drives cancer
Light green: putative passenger = DOES NOT cause cancer

What does this figure represent?

A portion of an Excel file from the patient order download, showing the first several patients with MSI (microsatellite instability).

The last number, 407, is the number of different patients with MSI (so with an MMR defect).

What does this figure represent?

A portion of an Excel file from cBioPortal, Colon adenocarcinoma mutations (DFCI). You can see which mutations each patient has, compare them, and form a hypothesis. The patient number is on the left side, the genes are across the top, and the mutations are in the middle of the table. You COMPARE ALL THE DATA IN AN EXCEL FILE AND TEST A SPECIFIC HYPOTHESIS.

So MSI = MMR defect, and you can download an Excel file, see what actual mutations it caused, and form a hypothesis. For example: a large number of people have this mutation; could it be related to a specific cancer?
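A hedged sketch of testing one such hypothesis on a downloaded table: do MSI patients carry more truncating mutations? The table layout (one row per patient, one column per gene), the column names, the patient IDs and the mutation values are all made up for illustration, and a real cBioPortal download may be laid out differently. The sketch treats a protein change ending in `*` or containing `fs` as truncating, which is an assumption about the notation.

```python
import csv
import io

# Hypothetical tab-delimited export; empty cells mean no reported mutation.
TABLE = (
    "patient\tMSI_status\tGENE_A\tGENE_B\n"
    "P1\tMSI-high\tR213*\tK618Afs*9\n"
    "P2\tMSI-high\tA45T\tQ100*\n"
    "P3\tMSS\tA45T\t\n"
    "P4\tMSS\t\tG12D\n"
)


def is_truncating(protein_change):
    """Assumed convention: nonsense changes end in '*', frameshifts contain 'fs'."""
    return protein_change.endswith("*") or "fs" in protein_change


def truncating_per_patient(table_text, gene_columns):
    """Average number of truncating mutations per patient, grouped by MSI status."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    per_group = {}
    for row in reader:
        count = sum(
            1 for gene in gene_columns
            if row[gene] and is_truncating(row[gene])
        )
        per_group.setdefault(row["MSI_status"], []).append(count)
    return {status: sum(c) / len(c) for status, c in per_group.items()}


result = truncating_per_patient(TABLE, ["GENE_A", "GENE_B"])
print(result)
```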

Hypothesis we want to test about OSTEOGENESIS IMPERFECTA: DO CYTOSKELETAL PROTEINS HAVE MORE TRUNCATING MUTATIONS? If so, do the cell shapes differ greatly from those in mismatch repair patients?

Yes, there are a lot of truncating mutations, and they did cause the cell to be a different shape. If a truncating mutation occurs earlier, so that half of the collagen protein is missing, you would expect the phenotypic outcome to be more deleterious.

For BRAF, melanoma (SKCM TCGA legacy), a ____ tab is available with the results from cBioPortal.

Survival. You can search for skin cutaneous melanoma, see which genes are mutated as well as the survival rates, and download the Excel sheet.

What does this screenshot represent?

cBioPortal “oncoprint”, showing mutations for BRAF only. So 52% of people with this cancer (melanoma) have this mutation. We can then compare their survival rate with the survival rate of everyone who didn't have a BRAF mutation.

Survival curve for BRAF mutations

Patients who have BRAF mutations (red) do better than those who do not.

Why? They have quickly growing tumors, which makes them more susceptible to chemo.
We target rapidly dividing cells, which are more vulnerable to apoptosis.

The same goes for patients who have oncoprotein mutations.
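For intuition about how a survival curve like this is built, here is a minimal Kaplan-Meier sketch in plain Python. Kaplan-Meier is a standard estimator for survival curves; the assumption that the survival tab uses it, and all follow-up times below, are illustrative only and are not TCGA data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with at least one death.

    times[i]  : follow-up time for patient i (e.g., months)
    events[i] : True if the patient died at that time, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, died in zip(times, events) if ti == t and died)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up data for two groups (made-up numbers).
braf_mutated = kaplan_meier([12, 30, 45, 60], [True, False, True, False])
braf_wild_type = kaplan_meier([5, 10, 15, 20], [True, False, True, True])

print(braf_mutated)
print(braf_wild_type)
```

Plotting each list as a step function gives the two curves; a group whose curve stays higher for longer has the better outcome.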

Study with Dr. B: TCGA: increased oncoprotein coding region mutations correlate with greater expression of apoptosis-effector genes and a positive outcome for stomach adenocarcinoma.

The data reported in this article reflect the fact that proliferation genes and apoptosis genes share transactivators (TA).
The more oncoproteins, the more activation of apoptosis genes. Thus, such cancers can be less deadly.

There is a “fail-safe” because the TAs that activate pro-proliferation genes also activate apoptosis genes (as a control). So if a cell recognizes that it is messed up, it will activate the apoptosis genes and kill itself.

When a significant number of mutations accumulate for oncoproteins, one is bound to land on a cell apoptosis gene
Cells that divide more quickly are more sensitive to chemotherapy
Also, when cells carry out too much DNA replication, apoptosis is induced in the cell

The more oncoproteins (more pro-proliferative proteins), the greater the chance the cell will kill itself = the more sensitive to chemo = the greater the patient's chance of survival.

What is included in unprocessed data?

The actual reads that come off of the sequencing machines for tumor specimens: exome, WGS, and RNASeq files.
Whole genome sequences, WGS (there is no processed version of WGS mutations for either tumor or normal tissue; NOT PUBLIC)
All patient normal-tissue sequences, e.g., exomes, WGS
Normal RNASeq (transcriptome), very rare

YOU GET IT FROM
https://portal.gdc.cancer.gov/repository

Blood exomes are used to establish what is normal and what is mutant (compare blood exomes to cancerous tissue).

What does this screenshot represent?

List of a very **few controlled exome (WXS) and RNASeq files, containing the actual reads** that have been generated by the massively parallel sequencing process. Each file represents the results for one tumor sample. Look at the last column (Data Type) to identify either exome or RNASeq. This is the Genomic Data Commons; the data is controlled and the files are large.

Footnote: note the Access column, a reminder that this is controlled data, i.e., only available to scientific investigators after NIH approval. Note also “BRCA” for breast cancer.

Which of the following represents controlled access data?
-Mutant AA in tumor
-RNA seq files with NO seq reads
-Raw seq files for an exome

-Raw seq files for an exome

Why would someone want to work with unprocessed data?

Copy number variation (example)
Mutations in normal tissue

Footnote: did we inherit it, or is it specific to the cancer?

What does this screenshot represent?

Example list of DNA sequencing reads obtained from a “raw” TCGA sequence file, each about **100 nucleotides long**. This is what DNA sequence reads look like after Illumina (massively parallel) sequencing.

If I had a hypothesis that these cancers have 5 times the ordinary number of transcription factor binding sites for a DNA polymerase gene, which could I use to address it?
-cBioPortal
-Genomic Data Commons
-Both
-Neither

Genomic Data Commons, because transcription factor binding sites are NOT just in exons; they are found across the entire genome. Need unprocessed data.

If looking for one transcription factor binding site, ____ would NOT be helpful. We would need the whole genome to visualize it.

cBioPortal

Access to ____ data allows us to access **the number of reads** that are contributing to the variations in copy number; counting the number of reads

Unprocessed

Footnote: We compare the read number from tumor vs. blood (tumor vs. normal tissue).

Explain this pic of copy number assessment of MYC

We are looking at copy number for MYC: we count reads and compare them to the blood sample. In the blood sample MYC was NOT amplified; in the tumor MYC was amplified. Very high ratio! No BRCA1 = lower survival.
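A minimal sketch of the read-counting idea, assuming read counts over the gene are first scaled by each sample's total read count so that sequencing depth does not distort the comparison. That normalization step and all numbers are assumptions for illustration; they are not taken from the figure.

```python
import math


def copy_number_log2_ratio(tumor_reads, tumor_total, normal_reads, normal_total):
    """log2 of (fraction of tumor reads on the gene) / (fraction of blood reads on the gene).

    Around 0 means no copy number change; clearly positive suggests amplification.
    """
    tumor_fraction = tumor_reads / tumor_total
    normal_fraction = normal_reads / normal_total
    return math.log2(tumor_fraction / normal_fraction)


# Hypothetical read counts over MYC in a tumor sample and the matched blood sample.
ratio = copy_number_log2_ratio(
    tumor_reads=800, tumor_total=1_000_000,
    normal_reads=100, normal_total=1_000_000,
)
print(f"log2 ratio: {ratio:.2f}")
```

With these made-up counts the tumor has eight times as many MYC reads as the blood sample, which would read as an amplified gene.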

Explain this screenshot: mutations in normal tissue

If you are looking at a SNP that is systemic, it's important to look at at least 2 different tissues (in this case breast and blood). Mutations in green are found in both the breast and the blood (normal/blood sample and cancer sample). Because the mutation is found in both blood and breast, it is INHERITED.

Footnote: remember, you need unprocessed data to compare tumor to blood (germline).
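The tumor vs. blood comparison can be sketched as a set operation: variants seen in both samples are treated as inherited, and variants seen only in the tumor as tumor-specific. The variant tuples (chromosome, position, reference base, alternate base) are hypothetical placeholders.

```python
def classify_variants(tumor_variants, blood_variants):
    """Split tumor variants into inherited (also in blood) and tumor-specific."""
    tumor = set(tumor_variants)
    blood = set(blood_variants)
    return {
        "inherited": sorted(tumor & blood),
        "tumor_specific": sorted(tumor - blood),
    }


# Hypothetical variant calls: (chromosome, position, reference, alternate).
tumor_calls = [("chr1", 1000, "A", "G"), ("chr2", 2500, "C", "T"), ("chr3", 40, "G", "A")]
blood_calls = [("chr1", 1000, "A", "G"), ("chr3", 40, "G", "A")]

result = classify_variants(tumor_calls, blood_calls)
print(result["inherited"])
print(result["tumor_specific"])
```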

Where has TCGA succeeded?

big data within a given topic, e.g., exome data, somatic mutations; landscape of tumor molecular features.

Where is TCGA lacking?

We can’t link detailed clinical information to molecular information. For example:
* **Linking responses to drugs to molecular information.**
* Can’t compare melanoma to people without melanoma in the case of TCGA; can’t make comparisons in mutations for people who do or do not have cancer, or for a given mutation between cancers, via TCGA

Can only compare within the specific cancer type when using TCGA.

What are Examples of clinical information and molecular traits?

* Melanoma outcomes shown previously (more oncoproteins or BRCA2 mutations)
* Smoking and mutation rates
* MMR defect

A study with Dr. B. showed data of the Average number of mutations in lung cancer samples from persons who smoked and persons who did not. What was found?

It was found that if you smoke, you have more mutations. There is a better survival rate when there are more mutations in oncoproteins: because they are transactivators, the cell can also undergo apoptosis. Survival is better for people with BRCA1 protein mutations because they are more sensitive to chemo.

What does this whole lecture have to do with the practice of healthcare?

1. **Discovery-based approach** to genome “biomarkers” that will predict susceptibility or outcomes. We don’t know everything about the sequences yet, so it’s hard to have a solid hypothesis. You learn more as you go through the data. A **“discovery-based” approach** to new understanding is possible via TCGA: as an example, if 1000 tumor samples are sequenced, it may be “discovered” that 500 have a particular gene mutated, a gene never before considered as relevant to cancer.
2. Biomarkers for monitoring
3. New (unexpected) molecular features of cancer, e.g., SNPs

Footnote: For any number of these diseases, there is a huge collection of blood exomes available (blood is easily accessible); however, tissue for these diseases is not as readily available.

What are some examples of what could be done in the future, in addition to what we are currently doing?

-Alzheimer’s brain genome atlas? (NO PUBLIC ACCESS)
-Future heart disease and stroke genome atlases?
-Future single cell DNA sequencing

What are researchers doing now with Alzheimer's?

Looking at brain tissue and adaptive immune receptors. They are building brain exome and blood exome files, which are only available with NIH access.

Processed TCGA data can be had in convenient and commonly used file formats, such as ____ and represents amino acid substitutions, methylome data, clinical data, ___________, and miRNA data, among other data

excel & text files, transcriptome

Unprocessed data amounts to raw sequencing ____ from which all of the above processed data is generated, but also including whole genome sequences and ________ tissue data.

reads, normal

Processed data is analyzed by building a collection of barcodes (patient samples) and data-types for download from the cBioPortal web site; unprocessed data is obtained from the ____ web page.

genomic data commons

A read from a DNA sequencing process is about ____ nucleotides long. Overlapping reads are used to piece together a more comprehensive picture of the genome.

100
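A toy sketch of how overlapping reads can be pieced together: find the longest suffix of one read that matches the prefix of the next and join them there. The reads are short made-up strings rather than ~100-nucleotide reads, and real pipelines typically align reads to a reference genome instead of merging pairs like this.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads at their overlap, or return None if they do not overlap enough."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Hypothetical short reads that share the bases GTAC.
read_1 = "ATGCGTAC"
read_2 = "GTACCTGA"
print(merge_reads(read_1, read_2))
```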

Colon cancers with microsatellite instability (MSI) have ____ defects that lead to increased numbers of ____ mutations.

DNA, truncating

Cancers can have transcription factor binding sites that occur in____ copy numbers.

different

A positive outcome in melanoma and stomach cancer is associated with more mutations of ____

oncoprotein

Lung cancers from persons who smoke have many more ____ than do lung cancers from nonsmokers.

mutations


(48 cards)