Hard questions from exam Flashcards

Question 1

Q

What is the formula for the cost function?

Answer

A

cost = V(p) = Σ_t (y_t - yhat_t(p))^2 / SEM_t^2

y_t = mean of the data points at time point t
SEM_t = standard error of the mean at time point t: how long the error bars are / how much the data varies / how uncertain the data points are
yhat_t(p) = the model simulation at time point t, which we compare to the data
p = the parameters; we test different values to see which gives the lowest cost

We square the residuals because the sign does not matter; squaring gets rid of negative values.
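As a minimal sketch, the cost function can be written in a few lines of Python. The straight-line model `simulate`, the data values and the candidate parameters are all made up for illustration; only the formula itself comes from the card.

```python
def cost(y_mean, sem, y_hat):
    """Weighted sum of squared residuals: sum_t ((y_t - yhat_t) / SEM_t)^2."""
    return sum(((y - yh) / s) ** 2 for y, yh, s in zip(y_mean, y_hat, sem))


def simulate(p, times):
    # Hypothetical model used only for illustration: yhat_t(p) = p * t.
    return [p * t for t in times]


times = [1.0, 2.0, 3.0]
y_mean = [2.1, 3.9, 6.2]   # made-up mean data points
sem = [0.1, 0.2, 0.2]      # made-up standard errors of the mean

# Test a few parameter values and keep the one with the lowest cost.
candidates = [1.5, 2.0, 2.5]
costs = {p: cost(y_mean, sem, simulate(p, times)) for p in candidates}
best_p = min(costs, key=costs.get)
print(best_p, costs[best_p])
```

Dividing by SEM means that uncertain data points (long error bars) count less than precise ones.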

Question 2

Q

What is hypothesis driven modelling compared to data driven modelling?

Answer

A

Hypothesis-driven modelling is used when only a small amount of data is available, so biological knowledge is also needed. You formulate hypotheses based on the limited data and biological knowledge, and use them to explain the biological mechanisms behind the reactions and to plan future experiments.

Data-driven modeling is often more useful for identifying patterns or generating predictions based on large datasets.

Question 3

Q

Formulate the null hypothesis underlying a likelihood ratio test! (1 point)

Answer

A

H0 = There is no significant difference between the two models: the more complex model does not describe the data better than the simpler one
H1 = One model is better than the other
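A rough sketch of how the test can be carried out, assuming the two models are nested, the cost is the SEM-weighted sum of squared residuals, and the cost difference is compared with a chi-square distribution whose degrees of freedom equal the number of extra parameters. The function name and the example costs are hypothetical; the critical values are standard table values for the 5 % level.

```python
# 95 % critical values of the chi-square distribution, indexed by degrees of freedom.
CHI2_CRIT_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def likelihood_ratio_rejects_h0(cost_simple, cost_complex, extra_params):
    """Return True if H0 (the models are equally good) is rejected at the 5 % level."""
    statistic = cost_simple - cost_complex  # improvement gained by the extra parameters
    return statistic > CHI2_CRIT_95[extra_params]


# Made-up costs: the complex model has 2 more parameters than the simple one.
big_improvement = likelihood_ratio_rejects_h0(30.0, 20.0, extra_params=2)
small_improvement = likelihood_ratio_rejects_h0(22.0, 20.0, extra_params=2)
print(big_improvement, small_improvement)
```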

Question 4

Q

What do you conclude when you cannot reject the null hypothesis in a χ²-test?

Answer

A

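H0 = The residuals are small / there is no significant difference between the model and the data
H1 = The residuals are big / there is a difference between the model and the data

If H0 cannot be rejected, the model is not rejected: it agrees with the data well enough. This does not prove that the model is correct.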
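As a sketch, the test compares the cost (the SEM-weighted sum of squared residuals) with a chi-square threshold. Using the number of data points as degrees of freedom is an assumption here (some courses subtract the number of parameters), and the function name and example costs are made up.

```python
# 95 % critical values of the chi-square distribution, indexed by degrees of freedom.
CHI2_CRIT_95 = {5: 11.070, 10: 18.307, 20: 31.410}


def chi2_test_rejects_model(cost, n_data_points):
    """Return True if the residuals are too big, i.e. H0 is rejected at the 5 % level."""
    return cost > CHI2_CRIT_95[n_data_points]


# Two made-up models fitted to the same 10 data points.
model_a_rejected = chi2_test_rejects_model(12.4, 10)   # cost below the threshold
model_b_rejected = chi2_test_rejects_model(25.0, 10)   # cost above the threshold
print(model_a_rejected, model_b_rejected)
```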

Question 5

Q

What do you conclude when you reject the null hypothesis in a whiteness test?

Answer

A

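H0 = The residuals are not too correlated
H1 = The residuals are too correlated

If H0 is rejected, the residuals are too correlated, meaning the model deviates from the data in a systematic way, so the model is rejected.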
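One simplified way to illustrate the idea is to compute the lag-1 autocorrelation of the residuals and compare it with the approximate 95 % bound 1.96 / sqrt(N) for white noise. This is only a sketch of the principle, not necessarily the exact whiteness test used in the course; the residual values are made up.

```python
import math


def lag1_autocorrelation(residuals):
    """Correlation between each residual and the next one in time."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    num = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    return num / denom


def too_correlated(residuals):
    """Reject whiteness if |r1| exceeds the approximate bound 1.96 / sqrt(N)."""
    return abs(lag1_autocorrelation(residuals)) > 1.96 / math.sqrt(len(residuals))


# Made-up residuals: the model is first too low for a long stretch, then too high.
systematic = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]
# Made-up residuals without a long one-sided stretch.
mixed = [1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1]
print(too_correlated(systematic), too_correlated(mixed))
```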

Question 6

Q

Explain the test cross-validation in relation to modeling in systems biology. What do we
test and what happens if the test leads to a rejection? What is the next step?

Answer

A

Cross-validation is a test to analyze whether we have overfitted the model to the data, by checking how well the model predicts data that were not used to estimate the parameters. H0 means it is not overfitted and H1 means it is overfitted. If H0 is rejected, we can take away a few parameters or data points and do a new cross-validation to see if it is better.
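A minimal sketch of the procedure: estimate the parameter on one part of the data, then compute the cost on data that were held out and compare it with a chi-square threshold. The straight-line model, the data and the choice of degrees of freedom (number of validation points) are assumptions made for illustration.

```python
# 95 % critical values of the chi-square distribution, indexed by degrees of freedom.
CHI2_CRIT_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def cost(y_mean, sem, y_hat):
    return sum(((y - yh) / s) ** 2 for y, yh, s in zip(y_mean, y_hat, sem))


def simulate(p, times):
    # Hypothetical model used only for illustration: yhat_t(p) = p * t.
    return [p * t for t in times]


# Made-up estimation data (used to choose the parameter).
est_t, est_y, est_sem = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2], [0.1, 0.2, 0.2]
# Made-up validation data (never used during estimation).
val_t, val_y, val_sem = [4.0, 5.0], [8.3, 9.8], [0.3, 0.3]

candidates = [1.5, 2.0, 2.5]
best_p = min(candidates, key=lambda p: cost(est_y, est_sem, simulate(p, est_t)))

# Test the fitted model on the held-out data.
val_cost = cost(val_y, val_sem, simulate(best_p, val_t))
overfitted = val_cost > CHI2_CRIT_95[len(val_t)]
print(best_p, val_cost, overfitted)
```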

Question 7

Q

How do you calculate average shortest path for a network?

Answer

A

average shortest path = (sum of shortest path distances for all node pairs) / (total number of node pairs)
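As a sketch, the formula can be computed with a breadth-first search from every node. This assumes an undirected, unweighted and connected network stored as an adjacency dictionary; the small example network is made up.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest path length (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def average_shortest_path(graph):
    """Sum of shortest path distances over all node pairs / number of node pairs."""
    nodes = list(graph)
    total, pairs = 0, 0
    for i, u in enumerate(nodes):
        dist = bfs_distances(graph, u)
        for v in nodes[i + 1:]:      # count each unordered pair once
            total += dist[v]
            pairs += 1
    return total / pairs


# Made-up network: a chain A - B - C - D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
avg = average_shortest_path(chain)
print(avg)
```

For the chain, the six pair distances are 1, 2, 3, 1, 2 and 1, so the average is 10 / 6.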

Question 8

Q

Please compare degree with a more complex measure of centrality. What
pros and cons do the different measures have in the context of identifying the most
important genes?

Answer

A

There are a few different ways of measuring centrality such as:
Degree
Closeness
Eigenvector
Betweenness

The simplest centrality measure is the degree centrality, which is defined by the number of connections attached to each node.

In-degree represents the number of directed connections reaching a node, while out-degree represents the number of directed edges leaving a node.

Closeness centrality is based on the average shortest path distance from the node to all others; it is usually defined as the inverse of that average, so that short distances give high closeness. A central node, with high closeness, should therefore be close to all other nodes in the network in terms of their shortest path distances.

Eigenvector centrality ranks a node by how well it is linked to other important nodes. A node gets a high score if it is connected to many nodes that themselves have high centrality.

Betweenness centrality measures the number of shortest paths going through the node.
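The difference between the measures can be illustrated with a small sketch. The made-up network below is two triangles joined through a bridge node "X": degree ranks "c" and "d" highest, while closeness and betweenness both rank "X" highest. The betweenness function follows Brandes' algorithm for unweighted graphs; the graph is assumed to be undirected and connected.

```python
from collections import deque


def bfs_distances(graph, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def degree_centrality(graph):
    return {v: len(graph[v]) for v in graph}


def closeness_centrality(graph):
    # (n - 1) / sum of shortest path distances to all other nodes
    n = len(graph)
    return {v: (n - 1) / sum(bfs_distances(graph, v).values()) for v in graph}


def betweenness_centrality(graph):
    bc = {v: 0.0 for v in graph}
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):            # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # undirected: each pair was counted twice


graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "X"],
    "X": ["c", "d"],
    "d": ["X", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
between = betweenness_centrality(graph)
print(degree["X"], closeness["X"], between["X"])
```

Degree is cheap to compute but only looks at direct neighbours, so it misses a bridge gene like "X"; betweenness and closeness use the whole network but cost more to compute and depend on the network being reasonably complete.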

Question 9

Q

Please describe how two different articles written by different authors
make use of the disease module hypothesis and how they differ between
each other. Include how they motivate the use of it, what different steps can
make it falsifiable?

Answer

A

Answer this

Question 10

Q

Describe in as much detail as possible the different steps in one algorithm
frequently used to identify disease modules.

Answer

A

MCODE (Molecular Complex Detection) is a clique-based algorithm designed to identify densely connected subgraphs (modules) in protein-protein interaction networks. The algorithm works by scoring each node in the network based on its local connectivity and then recursively expanding highly scored nodes into a dense subgraph.

The algorithm consists of the following steps:

Node Scoring: The algorithm assigns a local score (weight) to each node based on how densely connected its immediate neighborhood is. In the original MCODE formulation, the weight is the level k of the highest k-core found in the node's neighborhood multiplied by the density of that k-core. The higher the score, the more likely the node is to be part of a densely connected module.

Seed Selection: The algorithm selects the highest-scoring node as a seed node and expands it into a module by including all its first neighbors with a score greater than a pre-defined cutoff.

Module Expansion: The algorithm continues to expand the module outward, adding neighbors of the newly included nodes that meet the score cutoff, until no more nodes qualify. The next highest-scoring node that has not yet been used then becomes the seed of the next module.

Module Scoring: The algorithm calculates a score for each module. In the original MCODE formulation, this is the density of the module multiplied by the number of nodes in it.

Output: The algorithm outputs all modules with a score above a predefined cutoff.

MCODE is a powerful algorithm for identifying biologically relevant subgraphs in protein-protein interaction networks, and it has been successfully applied to a variety of biological systems, including cancer and infectious diseases.
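The seed-and-expand idea in the steps above can be sketched as follows. This is a simplified illustration, not the real MCODE: the node score used here (neighborhood density times neighborhood size) stands in for MCODE's k-core based weighting, and the names `cutoff_fraction` and `min_size` as well as the example network are made up.

```python
from itertools import combinations


def density(graph, nodes):
    """Fraction of possible edges among `nodes` that actually exist."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(nodes, 2) if v in graph[u])
    return edges / (n * (n - 1) / 2)


def node_scores(graph):
    # Simplified stand-in for MCODE's weighting: density of the node's
    # neighborhood (node + neighbors) multiplied by the neighborhood size.
    return {v: density(graph, {v} | graph[v]) * (len(graph[v]) + 1) for v in graph}


def find_modules(graph, cutoff_fraction=0.2, min_size=3):
    scores = node_scores(graph)
    seen, modules = set(), []
    # Seed selection: go through the nodes from highest to lowest score.
    for seed in sorted(graph, key=scores.get, reverse=True):
        if seed in seen:
            continue
        threshold = scores[seed] * (1 - cutoff_fraction)
        module, frontier = {seed}, [seed]
        seen.add(seed)
        # Module expansion: keep adding neighbors that meet the score cutoff.
        while frontier:
            node = frontier.pop()
            for nb in graph[node]:
                if nb not in seen and scores[nb] >= threshold:
                    seen.add(nb)
                    module.add(nb)
                    frontier.append(nb)
        # Module scoring and output: density x size, small modules are dropped.
        if len(module) >= min_size:
            modules.append((density(graph, module) * len(module), sorted(module)))
    return sorted(modules, reverse=True)


# Made-up network: a fully connected group a, b, c, d with a tail d - e - f.
graph = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"}, "e": {"d", "f"}, "f": {"e"},
}
modules = find_modules(graph)
print(modules)
```

In this example the four fully connected nodes are returned as one module, while the sparsely connected tail is left out.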

Question 11

Q

Describe in as much detail as possible a network biology procedure for repurposing
an old drug for a new disease. Include the following concepts:
a) How to test whether networks are good tools to use. How do you define b)
disease-module, and c) side-effect module. Also discuss d) potential validation
strategies. (tot 5p)

Answer

A

a) Testing the usefulness of networks:

Network biology is a useful tool for identifying drug targets and repurposing drugs for new diseases. To test the usefulness of networks, researchers can analyze different types of networks, such as protein-protein interaction networks, gene regulatory networks, and metabolic networks. These networks can be analyzed using different methods, such as clustering, network motif analysis, and functional enrichment analysis.

b) Defining disease modules:

To identify potential drug targets and repurpose an old drug for a new disease, researchers need to identify disease modules within the network. Disease modules are clusters of genes or proteins that are implicated in the disease. Researchers can use different methods, such as differential gene expression analysis, genome-wide association studies, and protein-protein interaction analysis, to identify disease modules. Once the disease modules have been identified, researchers can use different computational methods to prioritize potential drug targets.

c) Defining side-effect modules:

To repurpose an old drug for a new disease, researchers also need to identify side-effect modules. Side-effect modules are clusters of genes or proteins that are associated with the side effects of the drug. Researchers can use different methods, such as drug target prediction and gene set enrichment analysis, to identify side-effect modules. Once the side-effect modules have been identified, researchers can use different computational methods to predict potential off-target effects of the drug.

d) Potential validation strategies:

Once potential drug targets and repurposed drugs have been identified using network biology approaches, researchers need to validate these findings using experimental methods. There are different types of experimental validation strategies, such as in vitro assays, animal models, and clinical trials. In vitro assays can be used to test the effects of the drug on specific cells or proteins. Animal models can be used to test the efficacy and safety of the drug in vivo. Clinical trials can be used to test the efficacy and safety of the drug in humans.

Question 12

Q

Describe different random graph models

Answer

A

Typical random graph models include the Barabási-Albert model (or “scale-free”), the Erdős-Rényi model (or “random”), and the Watts-Strogatz model (or “small world”).

The “random” graph model connects each pair of nodes with the same probability, completely at random, so the graph has no particular structure and can look quite dense when that probability is high (see computer lab 6, Graph models, for examples).

The scale-free model adds nodes one at a time and connects each new node preferentially to nodes that already have many connections. It looks the most like a possible protein-protein network, with a few highly connected hubs in the graph.

The small-world model starts from a ring-like structure and rewires some of the edges at random. Most nodes can be reached from every other node in a small number of steps.

A network that does not come from a random process, for example a protein-protein interaction network, is more likely to have a few highly connected proteins (hubs), while most proteins have relatively few connections. This is called a scale-free network, which follows the same principle as the scale-free random network generator.
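The three generators can be sketched in plain Python to show how each one creates its edges. These are minimal versions under stated assumptions: the scale-free generator starts from a small fully connected group of m + 1 nodes (other starting graphs are also used), and the parameter names `p`, `m`, `k` and `beta` follow common convention rather than the course material.

```python
import random
from itertools import combinations


def add_edge(graph, u, v):
    graph[u].add(v)
    graph[v].add(u)


def erdos_renyi(n, p, rng):
    """Random: every pair of nodes is connected with the same probability p."""
    graph = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            add_edge(graph, u, v)
    return graph


def barabasi_albert(n, m, rng):
    """Scale-free: each new node attaches to m existing nodes, preferring high degree."""
    graph = {i: set() for i in range(n)}
    ends = []  # every node appears once per edge end, so choice() is degree-weighted
    for u, v in combinations(range(m + 1), 2):
        add_edge(graph, u, v)
        ends += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            add_edge(graph, new, t)
            ends += [new, t]
    return graph


def watts_strogatz(n, k, beta, rng):
    """Small world: ring where each node links to its k nearest neighbors, then rewiring."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            add_edge(graph, i, (i + j) % n)
    for i in range(n):
        for j in range(1, k // 2 + 1):
            old = (i + j) % n
            if rng.random() < beta:
                choices = [v for v in range(n) if v != i and v not in graph[i]]
                if choices:
                    new = rng.choice(choices)
                    graph[i].discard(old)
                    graph[old].discard(i)
                    add_edge(graph, i, new)
    return graph


def edge_count(graph):
    return sum(len(nbs) for nbs in graph.values()) // 2


rng = random.Random(0)
random_graph = erdos_renyi(50, 0.1, rng)
scale_free = barabasi_albert(50, 2, rng)
small_world = watts_strogatz(20, 4, 0.1, rng)
print(edge_count(random_graph), edge_count(scale_free), edge_count(small_world))
```

Comparing the largest degree in `scale_free` with the largest degree in `random_graph` is a quick way to see the hubs that preferential attachment tends to create.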

Question 13

Q

10c. In the study of one disease of interest what test could be done to possibly falsify the disease module hypothesis in that data? Please motivate your answer.

Answer

A

To potentially falsify the disease module hypothesis in the study of one disease of interest, one possible test is to check whether the disease-associated genes are more closely connected in the network than expected by chance. For example, map the disease genes onto a protein-protein interaction network, measure how connected they are (such as the size of the largest connected component they form, or the average shortest path between them), and compare this with the same measure for many randomly chosen gene sets of the same size.

If the disease module hypothesis is correct, we would expect the disease genes to be significantly more connected than the random gene sets. However, if the disease genes are no more connected than random gene sets, the hypothesis is falsified for that data.
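One network-based way to put the hypothesis at risk is a permutation test: measure how connected the disease genes are in the network and compare that with random gene sets of the same size. The sketch below uses the size of the largest connected component as the measure and samples random genes uniformly, which is a simplification (real analyses often match the node degrees); the ring network and the gene set are made up.

```python
import random


def largest_component_size(graph, genes):
    """Size of the largest connected group formed by `genes` alone in the network."""
    genes = set(genes)
    seen, best = set(), 0
    for start in genes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nb in graph[node]:
                if nb in genes and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def module_test(graph, disease_genes, n_random=1000, seed=0):
    """Return the observed component size and an empirical p-value."""
    rng = random.Random(seed)
    observed = largest_component_size(graph, disease_genes)
    nodes = list(graph)
    hits = sum(
        largest_component_size(graph, rng.sample(nodes, len(disease_genes))) >= observed
        for _ in range(n_random)
    )
    return observed, (hits + 1) / (n_random + 1)


# Made-up network: a ring of 30 genes; the "disease genes" are 5 neighbors in a row.
ring = {i: [(i - 1) % 30, (i + 1) % 30] for i in range(30)}
observed, p_value = module_test(ring, [0, 1, 2, 3, 4])
print(observed, p_value)
```

A small p-value means the disease genes cluster more than random sets, which supports the hypothesis; a large p-value means the hypothesis fails this test for that data.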