Exam August 2020 Flashcards

1
Q
  1. What can be measured in an ODE model?
A

ŷ (yhat), the model output given by the measurement equation.

2
Q

What characterizes model parameters?

A

Model parameters stay constant over time. They consist of the initial conditions x(0), the kinetic rate constants k, and the parameters of the measurement equation yhat.

3
Q

What characterizes model states?

A

States change over time; they are the variables x1, x2, etc.

4
Q

Q1: Use Euler forward to compute the concentration of B after one time step with the step
size Δt = 0.1, i.e. compute B. Assume the following values for the kinetic rate
constants: k1 = 3, k2 = 2, k3 = 1

A

The forward Euler method uses the formula:
x(Δt) = x(0) + dx/dt(0) · Δt
where dx/dt(0) is the derivative of x evaluated at t = 0.
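
A minimal sketch of one forward Euler step. The reaction scheme for this exam question is not reproduced on the card, so the right-hand side below (A converted to B at rate k1, B removed at rate k2) and the initial values A(0) = 1, B(0) = 0 are hypothetical, chosen only to show the arithmetic.

```python
def euler_step(x0, dxdt, dt):
    """One forward Euler step: x(dt) = x(0) + dx/dt(0) * dt."""
    derivs = dxdt(x0)
    return [x + d * dt for x, d in zip(x0, derivs)]

# Hypothetical system (an assumption, not the exam's reaction scheme):
#   dA/dt = -k1 * A
#   dB/dt =  k1 * A - k2 * B
k1, k2 = 3.0, 2.0

def rhs(state):
    a, b = state
    return [-k1 * a, k1 * a - k2 * b]

# Assumed initial values A(0) = 1, B(0) = 0 and step size 0.1.
a1, b1 = euler_step([1.0, 0.0], rhs, dt=0.1)
print(a1, b1)
```

With these assumed values, B(Δt) = 0 + (3 · 1 - 2 · 0) · 0.1 = 0.3.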

5
Q

Answer the questions below
(a) Formulate the null hypothesis underlying a likelihood ratio test! (1 point)

A

H0: Both models describe the data equally well, i.e. the extra parameters of the larger model do not give a significantly better fit.
H1: One model is significantly better than the other.

6
Q

What do you conclude when you cannot reject the null hypothesis in a χ²-test?

A

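H0: The residuals are small, i.e. there is no significant difference between the model and the data.
H1: The residuals are large, i.e. there is a significant difference between the model and the data.
If H0 cannot be rejected, the model cannot be rejected: it agrees with the data within the measurement uncertainty.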
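
The test can be sketched as follows, under stated assumptions. The data, simulated values, and standard errors are hypothetical. The threshold 11.07 is the 95% quantile of the χ² distribution with 5 degrees of freedom; using the number of data points as the degrees of freedom is itself an assumption (some treatments subtract the number of fitted parameters).

```python
def chi2_cost(y, yhat, sigma):
    """Sum of squared residuals, each scaled by its measurement uncertainty."""
    return sum(((yi - yhi) / si) ** 2 for yi, yhi, si in zip(y, yhat, sigma))

# Hypothetical data, model simulation and standard errors (illustration only).
y     = [1.0, 2.1, 2.9, 4.2, 5.0]
yhat  = [1.1, 2.0, 3.0, 4.0, 5.1]
sigma = [0.2, 0.2, 0.2, 0.2, 0.2]

cost = chi2_cost(y, yhat, sigma)
threshold = 11.07  # 95% quantile of chi2 with 5 degrees of freedom
reject_model = cost > threshold
print(cost, reject_model)
```

Here the cost is about 2.0, well below the threshold, so H0 is not rejected and the model is kept.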

7
Q

What do you conclude when you reject the null hypothesis in a whiteness test?

A

H0: The residuals are not too correlated.
H1: The residuals are too correlated.
Rejecting H0 means the residuals are too correlated, so the model is rejected.

8
Q

Give an example of a situation when cross validation is useful in small-scale systems
biology!

A

When we suspect that the model has been overfitted to the data.

9
Q

3e) Which test would you use to reject the model in Figure 1? Motivate your answer! (2
points)

A

χ² test

10
Q

What is hypothesis driven modeling and how does this approach relate to data driven
modeling? Give an example of a question where you would use hypothesis driven modeling
and motivate why hypothesis driven modeling is more useful than data driven modeling for that
question.

A

Overall, while both hypothesis-driven and data-driven modeling approaches have their strengths and limitations, hypothesis-driven modeling is often more useful for testing specific hypotheses about biological mechanisms or for validating experimental results, while data-driven modeling is often more useful for identifying patterns or generating predictions based on large datasets.

11
Q

Describe the steps taken to evaluate if a small-scale mechanistic model is in agreement with
experimental data!

A

We start with a visual inspection and a χ² test, and then perform further statistical tests depending on the situation: for example, cross validation if we have multiple models and want to see which gives the best fit, or a whiteness test to see whether the residuals are correlated.

12
Q

Choose a biological network of choice, define what is a node in this particular
network, what interactions do exist, and what types are the underlying
interactions (motivate your answer). (1p)

A

Nodes: Each node in the network represents a unique protein, which may be involved in a variety of different biological processes. Nodes are typically labeled with the name or identifier of the protein they represent.

Edges: Each edge in the network represents an interaction between two proteins, which may take a variety of forms. For example, an edge may represent a physical binding interaction between two proteins, or it may represent a functional interaction in which one protein regulates the activity of another.

Underlying reactions: The interactions between proteins in the network are often based on underlying biochemical reactions, such as protein-protein binding or enzyme-substrate interactions. These reactions can be represented as edges in the network, with the nodes representing the proteins or other molecules involved in the reaction.

13
Q

7b. Draw the graph of the network defined by the following adjacency matrix (2p)

A

Draw this

14
Q

c. Is the network directed, and/or weighted? (1p)

A

Directed network: If the network is directed, the adjacency matrix is in general asymmetric; a symmetric matrix indicates an undirected network.

Weighted network: If the network is weighted, the adjacency matrix contains values other than 0 and 1, which represent the strength or weight of the connections between nodes.
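
Both checks can be sketched in a few lines. The 3-node matrix below is hypothetical, not the one from the exam, and the function names are illustrative.

```python
def is_directed(adj):
    """Treat the network as directed if the adjacency matrix is asymmetric."""
    n = len(adj)
    return any(adj[i][j] != adj[j][i] for i in range(n) for j in range(n))

def is_weighted(adj):
    """Treat the network as weighted if some entry is neither 0 nor 1."""
    return any(value not in (0, 1) for row in adj for value in row)

# Hypothetical 3-node adjacency matrix (illustration only).
example = [
    [0, 2, 0],
    [0, 0, 1],
    [1, 0, 0],
]
print(is_directed(example), is_weighted(example))
```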

15
Q

7d, Calculate this: What is the average shortest path of this network?

A

average shortest path = (sum of shortest path distances for all node pairs) / (total number of node pairs)
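
A minimal sketch of this formula for an unweighted, undirected, connected graph, using breadth-first search to get the shortest path distances. The path graph 1 - 2 - 3 - 4 is a hypothetical example, not the exam network.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def average_shortest_path(graph):
    """Mean shortest path length over all unordered node pairs."""
    nodes = list(graph)
    total, pairs = 0, 0
    for i, s in enumerate(nodes):
        dist = bfs_distances(graph, s)
        for t in nodes[i + 1:]:
            total += dist[t]
            pairs += 1
    return total / pairs

# Hypothetical path graph 1 - 2 - 3 - 4 (illustration only).
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
avg = average_shortest_path(path)
print(avg)
```

For this path graph the six pairwise distances are 1, 2, 3, 1, 2, 1, so the average is 10 / 6.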

16
Q

7e. What is the clustering coefficient of this network? (1p)

A

The clustering coefficient of a whole network is also known as the global transitivity (global because the entire graph is analyzed).

It is the number of closed triplets divided by the total number of triplets (closed + open).

Each triangle counts as three closed triplets, while an open triplet counts as one.
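
A minimal sketch of the triplet count for an undirected graph stored as a dictionary of neighbor sets. The example graph (a triangle 1-2-3 with an extra node 4 attached to node 3) is hypothetical.

```python
from itertools import combinations

def transitivity(graph):
    """Closed triplets divided by all connected triplets (closed + open)."""
    closed = 0
    total = 0
    for center, neighbors in graph.items():
        for u, v in combinations(neighbors, 2):
            total += 1          # triplet u - center - v
            if v in graph[u]:
                closed += 1     # u and v are also linked: closed triplet
    return closed / total if total else 0.0

# Hypothetical graph: triangle 1-2-3 plus node 4 attached to node 3.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(transitivity(g))
```

The single triangle is counted three times (once per center node) and there are two open triplets centered on node 3, giving 3 / 5.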

17
Q

8 Centrality in network. All questions relate to the network below. (tot 4p)
a. Which node has the highest degree?

A

The simplest centrality measure is the degree centrality, which is defined by the number of connections attached to each node.

In-degree represents the number of directed connections reaching a node, while out-degree represents the number of directed edges leaving a node.
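
As a sketch, in-degree and out-degree can be read off a directed adjacency matrix. The convention that adj[i][j] != 0 means an edge from node i to node j is an assumption, and the matrix is a hypothetical example.

```python
def degrees(adj):
    """Return (in_degree, out_degree) lists for a directed adjacency matrix.

    Assumes adj[i][j] != 0 means an edge from node i to node j.
    """
    n = len(adj)
    out_deg = [sum(1 for j in range(n) if adj[i][j]) for i in range(n)]
    in_deg = [sum(1 for i in range(n) if adj[i][j]) for j in range(n)]
    return in_deg, out_deg

# Hypothetical directed network: 0 -> 1, 0 -> 2, 1 -> 2.
example = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
in_deg, out_deg = degrees(example)
print(in_deg, out_deg)
```

For an undirected network the matrix is symmetric, so both lists coincide and give the plain degree.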

18
Q

8b. Is the network likely to come from a random process? If so describe what random process, if not motivate why.

A

Typical random graph models include the Barabási-Albert model (or “scale-free”), the Erdős-Rényi model (or “random”), and the Watts-Strogatz model (or “small world”).

The “random” graph model creates a graph by placing edges between nodes completely at random, and the result often looks quite dense (see computer lab 6, Graph models, for examples).

The scale-free model adds edges by preferential attachment: new nodes are more likely to connect to nodes that already have many connections. It looks the most like a possible protein-protein network, with a few hubs in the graph.

The small-world model often creates ring-like structures. Most nodes can be reached from every other node in a small number of steps.

If the network does not come from a random process but is, for example, a protein-protein interaction network, it is more likely to show a few highly connected proteins (hubs), while most proteins have relatively few connections. This is called a scale-free network, which follows the same principle as the scale-free random network generator.

19
Q

8c. Which node has the highest clustering coefficient

A

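For a single node, the relevant measure is the local clustering coefficient: the number of edges that exist between the node's neighbors divided by the number of edges that could exist between them, which is k(k-1)/2 for a node with k neighbors.

The global clustering coefficient (transitivity) instead describes the entire graph: closed triplets / (closed + open triplets), where each triangle counts as three closed triplets and an open triplet counts as one.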
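
For a per-node question, the local clustering coefficient is usually what is meant: the fraction of pairs of a node's neighbors that are themselves connected. A minimal sketch on a hypothetical graph (triangle 1-2-3 plus node 4 attached to node 3):

```python
from itertools import combinations

def local_clustering(graph, node):
    """Fraction of pairs of the node's neighbors that are linked to each other."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined cases are reported as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)

# Hypothetical graph (illustration only).
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
coeffs = {n: local_clustering(g, n) for n in g}
print(coeffs)
```

Nodes 1 and 2 have coefficient 1 (their two neighbors are linked), node 3 has 1/3, and node 4 has 0 by the convention used here.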

20
Q

8d. Which node has the highest betweenness centrality?

A

Betweenness centrality ranks a vertex or an edge by the number of shortest paths that go through it. The node that lies on the most shortest paths between other nodes has the highest betweenness centrality.
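
A minimal sketch of node betweenness for a small, undirected, connected graph. It counts, for every pair (s, t), the fraction of shortest paths that pass through each other node v. This is a direct pairwise computation chosen for readability, not an optimized algorithm, and the example graph is hypothetical.

```python
from collections import deque

def bfs_counts(graph, source):
    """Distance and number of shortest paths from source to every node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]
    return dist, sigma

def betweenness(graph):
    """Unnormalized betweenness; assumes an undirected, connected graph."""
    nodes = list(graph)
    info = {n: bfs_counts(graph, n) for n in nodes}
    score = {n: 0.0 for n in nodes}
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            for v in nodes:
                if v in (s, t):
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-t path if the distances add up.
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score

# Hypothetical graph: triangle 1-2-3 plus node 4 attached to node 3.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
scores = betweenness(g)
print(scores)
```

In this example only node 3 lies between other nodes (on the paths 1-4 and 2-4), so it has the highest betweenness.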

21
Q

8e. Which genes form the maximum clique of maximal size in this network

A

One of the multiple methods for the detection of modules in a graph is based on the identification of cliques. A set of nodes forms a clique (or complete subgraph) if all possible connections between the nodes exist. A two-node clique is simply two connected nodes. A three-node clique is also known as a triangle.

Graphs also contain maximal cliques, which are complete subgraphs such that no other node can be added while maintaining completeness. The maximum clique is the largest of the maximal cliques.
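
One way to enumerate maximal cliques is the basic Bron-Kerbosch recursion, sketched below for a graph stored as a dictionary of neighbor sets. The example graph is hypothetical, and this plain version (without pivoting) is only suitable for small networks.

```python
def maximal_cliques(graph):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)  # nothing can be added: maximal clique
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & graph[node],
                   excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return cliques

# Hypothetical graph: triangle 1-2-3 plus node 4 attached to node 3.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = maximal_cliques(g)
maximum = max(cliques, key=len)
print(cliques, maximum)
```

Here the maximal cliques are {1, 2, 3} and {3, 4}; the maximum clique is the triangle {1, 2, 3}.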

22
Q

8f. What is the clustering coefficient of node 8?

A

For a single node, use the local clustering coefficient C = 2e / (k(k-1)), where k is the number of neighbors of node 8 and e is the number of edges between those neighbors.

This differs from the global clustering coefficient (transitivity), which is computed over the entire graph as closed triplets / (closed + open triplets).

23
Q
  1. Consider the human protein-protein interaction network. (tot 5p)
    a. Sketch the degree distribution.
A

The degree distribution of a protein-protein network in humans is expected to follow a power-law distribution, also known as a scale-free distribution. This means that there are a few highly connected proteins (hubs) in the network, while most proteins have relatively few connections.

24
Q

9b. How is this degree distribution different from a degree distribution of a randomly generated network? Where are we expected to find the highest fraction of disease-associated genes, please motivate why this is likely. (2p)

A

In a protein-protein network, the degree distribution is often described as “scale-free”, meaning that it follows a power-law distribution. This means that the majority of nodes in the network have relatively few connections, while a small number of nodes (known as “hubs”) have a very large number of connections. This type of distribution is characteristic of many real-world networks, including social networks and biological networks.

In contrast, a randomly generated network typically has a degree distribution that follows a Poisson distribution. In this type of distribution, the number of nodes with a given degree is expected to follow a bell-shaped curve, with the majority of nodes having a degree close to the average degree of the network.

Degree correlates with lethality, meaning that a node with a high degree is more strongly associated with lethality and with disease. This is because a gene that is used a lot and is involved in many pathways will, if something goes wrong, cause problems in many places, leading to higher lethality and more disease association.
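
The difference between the two degree distributions can be illustrated numerically. The mean degree of 4, the exponent 2.5, and the maximum degree of 1000 are hypothetical values chosen for the sketch, not estimates for the human interactome.

```python
import math

def poisson_pmf(k, mean_degree):
    """Probability of degree k in a random (Erdos-Renyi-like) network."""
    return math.exp(-mean_degree) * mean_degree ** k / math.factorial(k)

def power_law_pmf(k, gamma, k_max):
    """Power law P(k) ~ k^(-gamma), normalized over degrees 1..k_max."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm

# Hypothetical parameters (illustration only).
for k in (1, 4, 16, 64):
    print(k, poisson_pmf(k, 4.0), power_law_pmf(k, 2.5, 1000))
```

Near the mean degree the Poisson probability is the larger of the two, but for high degrees it drops off far faster than the power law, which is why hubs are essentially absent from random networks and present in scale-free ones.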

25
Q

9c. Compare degree with a more complex measure of centrality. What pros and cons do the different measures have in the context of identifying the most important genes?

A

There are a few different ways of measuring centrality such as:
Degree
Closeness
Eigenvector
Betweenness

The simplest centrality measure is the degree centrality, which is defined by the number of connections attached to each node.

In-degree represents the number of directed connections reaching a node, while out-degree represents the number of directed edges leaving a node.

Closeness centrality is based on the average shortest path distance from the node to all others (it is the inverse of that average). A central node, with high closeness, should therefore be close to all other nodes in the network in terms of their shortest path distances.

Eigenvector centrality ranks a node by how well connected it is to other important nodes: a node has high eigenvector centrality if it is linked to many nodes that themselves have high centrality.

Betweenness centrality measures the number of shortest paths going through the node.
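
As one concrete comparison with degree, closeness can be sketched as the inverse of the mean shortest path distance from a node to all others. The sketch assumes an unweighted, connected graph with at least two nodes, and the example graph is hypothetical.

```python
from collections import deque

def closeness(graph, node):
    """Inverse of the mean shortest path distance from node to all others."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nb in graph[current]:
            if nb not in dist:
                dist[nb] = dist[current] + 1
                queue.append(nb)
    return (len(dist) - 1) / sum(dist.values())

# Hypothetical graph: triangle 1-2-3 plus node 4 attached to node 3.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
scores = {n: closeness(g, n) for n in g}
print(scores)
```

Degree only needs a node's direct neighbors, whereas closeness requires shortest paths across the whole network, which makes it more informative about global position but more expensive and more sensitive to missing interactions.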

26
Q
  1. Disease modules (tot 7p)
    a. What is meant by a network community in the context of protein-interaction
    networks, and what biological features could a network community
    correspond to? (2p)
A

A network community is a set of nodes that are densely connected to each other but sparsely connected to the rest of the network. Nodes in the same community are often involved in the same pathways, regulatory mechanisms, or other biological processes. For example, a community in a protein interaction network could correspond to:

Signaling pathways: Proteins can interact with each other to transmit signals within a cell or between cells. These signaling pathways are critical for many cellular processes, including cell growth, differentiation, and apoptosis.

Metabolic pathways: Proteins can also interact with each other to catalyze biochemical reactions that are involved in metabolic pathways. These pathways are responsible for the breakdown and synthesis of molecules that are essential for cell function, such as carbohydrates, lipids, and amino acids.

Transcriptional regulation: Proteins can interact with DNA to regulate gene expression. This can involve direct interactions between transcription factors and DNA, as well as indirect interactions through intermediary proteins.

27
Q

10b. What is meant by the disease module hypothesis?

A

Modules are sets of nodes that are densely connected among each other, but sparsely connected to other nodes outside their community.

The disease module hypothesis states that complex diseases are often not due to the malfunctioning of a single gene but of a disease module, i.e. a group of densely connected nodes. This means that multiple genes and pathways are affected and together cause the disease.

28
Q

10c. In the study of one disease of interest what test could be done to possibly falsify the disease module hypothesis in that data? Please motivate your answer.

A

To potentially falsify the disease module hypothesis for one disease of interest, one possible test is to check whether the genes associated with the disease are more closely connected in the protein-protein interaction network than expected by chance. For example, measure the size of the largest connected component formed by the disease genes (or the average shortest path distance between them) and compare it with the same measure for many randomly drawn gene sets of the same size.

If the disease module hypothesis is correct, the disease genes should be significantly more clustered than the random sets. If they are no more clustered than random gene sets, the hypothesis is falsified for that data.

29
Q

10d. Describe as detailed as possible the different steps in one algorithm frequently used to identify disease module.

A

There are different options; one example of a clique-based algorithm is MCODE.

MCODE (Molecular Complex Detection) is a clique-based algorithm designed to identify densely connected subgraphs (modules) in protein-protein interaction networks. The algorithm works by scoring each node in the network based on its local connectivity and then recursively expanding highly scored nodes into a dense subgraph.

The algorithm consists of the following steps:

Node Scoring: The algorithm assigns a local score (weight) to each node based on the density of its immediate neighborhood. In the original MCODE description, the weight is the density of the highest k-core of the node's neighborhood multiplied by k. The higher the score, the more likely the node is to be part of a densely connected module.

Seed Selection: The algorithm selects the highest-scoring node as a seed node and expands it into a module by including all its first neighbors with a score greater than a pre-defined cutoff.

Module Expansion: The algorithm continues to expand the module by adding neighboring nodes that meet a specified score cutoff until no more nodes can be added without decreasing the overall score of the module.

Module Scoring: The algorithm calculates a score for each module based on the sum of the scores of its nodes.

Output: The algorithm outputs all modules with a score above a predefined cutoff.

MCODE is a powerful algorithm for identifying biologically relevant subgraphs in protein-protein interaction networks, and it has been successfully applied to a variety of biological systems, including cancer and infectious diseases.