AI Flashcards

(51 cards)

1
Q

Network Medicine is based on…

A

network science, physics, applied mathematics and statistics, computer science, biology, and medicine

*
Patients are unique
*
Patients with the same clinical picture do not necessarily share the same disease pathophenotype
*
Networks of molecular interactions (the interactome) are used to identify unknown disease phenotypes and pathogenic events
*
Network of Networks

2
Q

Network Medicine is …?

A

*
Different biological networks capture the complex interactions between genes, proteins, RNA molecules, metabolites and genetic variants in the cells of organisms
*
These networks, also interchangeably known as graphs, are representations in which the complex system components are simplified as nodes that are connected by links (edges)
*
Network medicine is largely discovery driven, rather than hypothesis driven, uncovering previously unknown relationships and leading to the identification of new biomarkers

3
Q

Network-based studies have to primarily identify two things…?

A

*
what are the critical entities in the system under investigation (nodes)
*
what is the nature of the interactions between these entities (edges)

4
Q

What kinds of graphs are used in network medicine?

A

Binary vs Weighted
Directed vs undirected

5
Q

How are disease-associated network components identified within the interactome?

A

*
Consider the topological properties of the nodes and assess the functional role of their hubness, i.e. the property of having a high number of connections
*
Identify new disease genes in the network using “guilt by association”, a principle based not on direct evidence but on association with other disease genes
*
Prioritize candidate disease genes: molecular interaction networks assist in the identification of sub-networks mechanistically linked to disease phenotypes
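
The two ideas above (hubness and guilt by association) can be sketched on a toy graph. This is only an illustration under stated assumptions: the edge list, the node names and the scoring rule (fraction of neighbours that are known disease genes) are hypothetical and not taken from the course material.

```python
from collections import defaultdict

# Hypothetical toy interactome: undirected, binary (unweighted) edges.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def degrees(adjacency):
    """Degree = number of connections; hubs are the highest-degree nodes."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def guilt_by_association(adjacency, known_disease_genes):
    """Score each remaining node by the fraction of its neighbours
    that are already known disease genes (an assumed, simple rule)."""
    scores = {}
    for node, neighbours in adjacency.items():
        if node in known_disease_genes:
            continue
        scores[node] = len(neighbours & known_disease_genes) / len(neighbours)
    return scores


adjacency = build_adjacency(edges)
node_degrees = degrees(adjacency)
hub = max(node_degrees, key=node_degrees.get)          # most connected node
candidate_scores = guilt_by_association(adjacency, {"A", "B"})
```

Nodes with a high score would be prioritized as candidate disease genes; real methods use far richer evidence than this neighbour count.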

6
Q

How does co-expression-based network modeling to identify disease biomarkers work?

A

*
Patterns of transcript abundance are studied in the context of the disease after construction of Gene Co-expression Networks (GCNs)
*
Combination of important seed genes with an organic network of co-expression patterns derived from the gene expression data from the same system
*
GCNs identify the functionally coordinated participation of genes in response to an external stimulus or condition
*
GCNs can be signed or unsigned, weighted or unweighted, and may be constructed using either microarray or RNA-Seq data
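
A gene co-expression network can be sketched by correlating expression profiles and keeping strongly correlated gene pairs as edges. The example below is a minimal illustration, assuming Pearson correlation and a hard threshold of 0.8; the expression values, the gene names and the threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression values (one list of samples per gene).
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.0],   # rises together with gene1
    "gene3": [4.0, 3.0, 2.0, 1.0],   # falls when gene1 rises
    "gene4": [1.0, 3.0, 1.0, 3.0],   # only weakly related to the others
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_network(expr, threshold=0.8, signed=True):
    """Keep gene pairs whose |correlation| reaches the threshold.
    signed=True keeps the sign of the correlation as edge weight,
    signed=False stores the absolute value instead."""
    network = {}
    for gene_a, gene_b in combinations(expr, 2):
        r = pearson(expr[gene_a], expr[gene_b])
        if abs(r) >= threshold:
            network[(gene_a, gene_b)] = r if signed else abs(r)
    return network


signed_network = coexpression_network(expression, signed=True)
unsigned_network = coexpression_network(expression, signed=False)
```

Dropping the weights entirely (keeping only which pairs are connected) would give the unweighted variant.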

7
Q

How are phenotype-specific gene regulatory networks inferred (i.e. deduced from the data)?

A

*
Separate networks can be built for each phenotype which may be case-control, disease-specific, tissue or cell-specific, sex-specific, or for different disease subtypes
*
The network comparison model stems from the axiom of “differential networking” over “differential expression”
*
The comparison of networks helps to uncover the specific rewiring of pathways, such as those induced by disease, pharmacological treatment, or environmental stimuli and more

8
Q

What are the future needs in network medicine (NM)?

A

*
Define the biological heterogeneity as much as possible to increase the precision of risk prediction and the personalization of prevention and intervention strategies
*
Help researchers to better understand human physiological and clinical relevance (to avoid reverse technological processes) and to focus on the relevance for the patients' needs
*
Integrate data of different kinds in a way that rapidly reduces the dimensionality, in order to distill implementable results for drug discovery and healthcare management

9
Q

What is network medicine (NM) used for?

A

Disease Understanding: Network medicine enables researchers to characterize diseases as perturbations in complex biological networks rather than isolated anomalies. By mapping out the interactions among genes, proteins, and other molecular entities, network medicine provides insights into disease mechanisms, progression, and heterogeneity. This holistic approach aids in identifying novel biomarkers and therapeutic targets.

Personalized Medicine: By integrating patient-specific data, such as genomics, transcriptomics, and clinical information, with network-based models, personalized treatment strategies can be devised. Network analysis helps in identifying patient subgroups with similar molecular profiles and predicting individual responses to drugs, allowing for tailored therapeutic interventions.

Drug Discovery and Repurposing: Network medicine facilitates the identification of drug targets and the repurposing of existing drugs for new indications. By analyzing drug-protein interaction networks and their effects on disease-associated pathways, researchers can identify candidate compounds with therapeutic potential and optimize drug combinations for synergistic effects.

Systems Pharmacology: Network medicine provides a systems-level understanding of drug actions and their effects on biological pathways. By integrating pharmacological data with molecular networks, researchers can predict drug efficacy, side effects, and interactions, aiding in the design of safer and more effective treatments.

Biomarker Discovery: Network-based approaches help in the identification of molecular signatures and biomarkers associated with disease diagnosis, prognosis, and treatment response. By analyzing the connectivity and dynamics of biomolecular networks, researchers can uncover diagnostic markers for early disease detection and monitor disease progression.

Biological Network Visualization and Interpretation: Network visualization tools and software platforms allow researchers to visually explore and interpret complex biological networks. By representing molecular interactions as graphical networks, researchers can identify key nodes (e.g., hubs, bottlenecks) and pathways implicated in disease pathogenesis, facilitating hypothesis generation and experimental validation.

10
Q

What are artificial intelligence (AI) and machine learning (ML)?

A

*
AI: the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision making, and translation between languages.
*
ML: the use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and draw inferences from patterns in data.

-> Artificial intelligence simulates intellectual tasks. Machine learning is algorithms trained on data to learn patterns and make predictions.

11
Q

Machine learning use cases in life science?

A

Genomics
*
Variant calling
*
Genetic sequence of a cancer, e.g. druggable targets
*
Functional predictions

OMICS & life science
*
Risk factors (e.g., hypertension)
*
Integration of multiomics
*
Protein structure predictions
*
DDI networks
*
Drug discovery

Diagnostics
*
Images of patients, e.g. eye, skin, hair
*
CT pictures, e.g. of the head, cancer
*
X-ray films
*
Real-time video of a colonoscopy

Healthcare diagnostics
*
Alerts & diagnostics from real-time EHR data
*
Predictive health management
*
Healthcare provider sentiment analysis

12
Q

What is the big difference between deep learning and machine learning?

A

In classical machine learning, feature extraction is done manually, whereas in deep learning we do not give the model the features: it learns them, and how to classify, by itself

13
Q

Can we have both accuracy and interpretability in ML?

A

There is a trade-off between accuracy and interpretability for ML models

14
Q

How does ChatGPT work?

A

ChatGPT splits the text into tokens
It predicts which token comes after the previous ones

Possible token levels
*
Sentence
*
Words
*
Subword
*
Character
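
The four token levels can be illustrated on a short string. This is a sketch, not how ChatGPT tokenizes: real models use a learned subword vocabulary, whereas `naive_subwords` below is a hypothetical stand-in that simply cuts words into fixed-size chunks.

```python
import re

text = "Network medicine is fun. ChatGPT predicts tokens."

# Sentence level: split after sentence-ending punctuation.
sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

# Word level: runs of letters, digits or underscores.
words = re.findall(r"\w+", text)


def naive_subwords(word, size=4):
    """Hypothetical subword splitter: fixed-size chunks of a word.
    (An assumption for illustration; real tokenizers learn their pieces.)"""
    return [word[i:i + size] for i in range(0, len(word), size)]


# Subword level: every word broken into smaller pieces.
subwords = [piece for word in words for piece in naive_subwords(word)]

# Character level: every single character is a token.
characters = list(text)
```

The finer the token level, the smaller the vocabulary but the longer the token sequence for the same text.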

15
Q

How does supervised learning work?

A

In supervised learning we give the algorithm training data that is already categorized (labelled),
so it can then say, for example, whether a new item is good or bad (binary classification)

What if we have more than one input?
With two inputs it can draw a line in two dimensions that separates the categories

16
Q

How does unsupervised learning work?

A

“The data comes only with inputs x but not output labels y, and the algorithm has to find some structure or some pattern or something interesting in the data.”

17
Q

Questions: would you apply a supervised or an unsupervised learning algorithm?

*
Given emails labelled as spam / not spam, learn a spam filter

*
Given a set of published papers found on PubMed, group them into sets of articles about the same research topic

*
Given a database of expression data of patients, automatically discover signals and group patients into different response groups

*
Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not

A

Supervised (spam filter)

Unsupervised (grouping papers)

Unsupervised (response groups)

Supervised (diabetes classification)

18
Q

What is the basic principle of supervised regression learning?

A

Training set → learning algorithm → model f
Feature x → model f → prediction (estimated y)

What is f?
𝑓(𝑥)=𝑤𝑥+𝑏
Linear regression with one variable/feature = univariate linear regression

Needed:

Matrix of features

Matrix of coefficients

18
Q

Principle of machine learning algorithms

A

3-step process

1. Infer / Predict
2. Error / Loss
3. Train / Learn

- Predict: MOVE
- Error: BAD or GOOD
- Learn: "Oh, this was a terrible idea"
- Reinforcement: "Well done, do it again"

Model: decreasing or increasing the weights

19
Q

What does the cost function do?

A

The squared error cost function calculates the distance (mean squared error, MSE) between the predictions and the correct values. Then:

𝑓(𝑥)=𝑤𝑥+𝑏
Optimize w and b to get the lowest mean squared error (sometimes the optimizer ends up in a local minimum, and that is a problem)
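
A minimal sketch of this optimization, assuming plain gradient descent as the optimizer (the card does not say which method is used). The data, learning rate and number of steps are hypothetical choices for illustration.

```python
def predict(x, w, b):
    """The model f(x) = w*x + b."""
    return w * x + b


def mse(xs, ys, w, b):
    """Mean squared error between predictions and correct values."""
    return sum((predict(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly nudge w and b in the direction that lowers the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict(x, w, b) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n  # dMSE/dw
        grad_b = 2 * sum(errors) / n                             # dMSE/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1 without noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
final_cost = mse(xs, ys, w, b)
```

On this data the fitted w and b approach 2 and 1, and the cost approaches 0. For this simple linear model the squared error cost has a single minimum; local minima become a concern with more complex models.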

20
Q

What is overfitting?

A

Overfitting is an undesirable machine learning behavior that occurs when the machine learning model gives accurate predictions for training data but not for new data. When data scientists use machine learning models for making predictions, they first train the model on a known data set. An overfitted model is fitted too closely to that training data, for example a high-degree polynomial such as f(x) = w1x + w2x^2 + w3x^3 + … + b

21
Q

Why is AlphaFold not perfect?

A

Functional predictions of variants:
“AlphaFold has not been validated for predicting the effect of mutations. In particular, AlphaFold is not expected to produce an unfolded protein structure given a sequence containing a destabilising point mutation.”

The best assessment of whether a variant has structural or functional impact also requires contextual knowledge

but

you can predict the effect of missense variants with AlphaMissense

22
Q

Can we predict the CYP2D6 phenotype with machine learning?

A

Yes

and we can do functional assessment of pharmacogenomic variants

When predicting CYP2D6 function with machine learning, we can skip the annotation as star alleles and the allocation of numeric values of 1, 0, and we still get great results

There are also other ways to predict that use star alleles

23
Q

What is an ensemble or metalearner?

A

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained by any of the constituent algorithms alone.

You can use multiple machine learning algorithms and the metalearner decides which ones are better, e.g. the SuperLearner package in R
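
The ensemble idea can be sketched with a simple majority vote over several base learners. This is a swapped-in, simpler technique than the SuperLearner approach named above, and everything in it is hypothetical: the three threshold rules, the data and the labels. The learners are also scored on the same data they are compared on, which a real metalearner would avoid by using cross-validation.

```python
from collections import Counter


# Three hypothetical base learners: each maps one feature value to a label.
def learner_threshold_low(x):
    return "responder" if x > 3 else "non-responder"


def learner_threshold_mid(x):
    return "responder" if x > 5 else "non-responder"


def learner_threshold_high(x):
    return "responder" if x > 7 else "non-responder"


BASE_LEARNERS = [learner_threshold_low, learner_threshold_mid, learner_threshold_high]


def ensemble_predict(x, learners=BASE_LEARNERS):
    """Majority vote over the predictions of all base learners."""
    votes = [learner(x) for learner in learners]
    return Counter(votes).most_common(1)[0][0]


def accuracy(predict, xs, ys):
    """Fraction of correct predictions."""
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical labelled data.
xs = [1, 2, 4, 6, 8, 9]
ys = ["non-responder"] * 3 + ["responder"] * 3

ensemble_accuracy = accuracy(ensemble_predict, xs, ys)
single_accuracies = {f.__name__: accuracy(f, xs, ys) for f in BASE_LEARNERS}
best_learner = max(BASE_LEARNERS, key=lambda f: accuracy(f, xs, ys))
```

Here the vote corrects the mistakes of the two poorly placed thresholds, and `best_learner` shows the alternative of letting the data decide which single learner to trust.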

24
Applications?
* The hemoglobin levels and the amount of blood transfused can be estimated with less error than before because of ML
* Supervised machine learning methods trained using SNPs and total baseline depression scores predicted remission and response at 8 weeks with an area under the receiver operating characteristic curve (AUC) of 0.7 and 70% prediction accuracy
* Assessment of drug-drug interactions in polypharmacy using graph convolutional networks
* AI performs as well as doctors in university tests
25
What is the difference between unsupervised machine learning and deep learning?
Unsupervised Machine Learning: In unsupervised learning, the algorithm is given a dataset without explicit instructions on what to do with it. The algorithm must find patterns, structure, or relationships within the data on its own. Unsupervised learning techniques include clustering, dimensionality reduction, and association rule learning. Clustering algorithms like K-means or hierarchical clustering group similar data points together based on their inherent patterns or similarities. Dimensionality reduction techniques such as Principal Component Analysis (PCA) or t-distributed Stochastic Neighbor Embedding (t-SNE) aim to reduce the number of features in a dataset while preserving its important characteristics. Unsupervised learning is often used for tasks such as anomaly detection, data compression, and exploratory data analysis.
Deep Learning: Deep learning is a subset of machine learning that involves neural networks with multiple layers (deep neural networks). These networks can automatically learn hierarchical representations of data. Deep learning models are typically trained using large amounts of labeled data, and they learn to extract features directly from the raw data without the need for manual feature engineering. Deep learning has shown remarkable success in various tasks such as image recognition, natural language processing, speech recognition, and recommendation systems. Common architectures in deep learning include Convolutional Neural Networks (CNNs) for image-related tasks, Recurrent Neural Networks (RNNs) for sequence data, and Transformers for natural language processing tasks.
While deep learning models can be used for unsupervised learning tasks (e.g., autoencoders for dimensionality reduction or generative adversarial networks for generating synthetic data), they are more commonly associated with supervised learning, where they learn to map inputs to outputs.
26
Key points of machine learning and AI in pharmacogenomics?
Personalized Medicine: Machine learning and AI techniques enable the development of personalized medicine approaches in pharmacogenomics. By analyzing an individual's genetic information, along with other relevant clinical data, these techniques can predict a patient's response to a particular drug or dosage regimen.
Predictive Modeling: Machine learning algorithms can build predictive models that identify genetic markers or signatures associated with drug response or adverse reactions. These models can be used to stratify patient populations and guide treatment decisions, ultimately leading to more effective and safer drug therapies.
Drug Discovery and Development: AI algorithms can accelerate drug discovery and development processes by analyzing vast amounts of genomic and chemical data. These techniques help identify potential drug targets, predict drug-drug interactions, optimize drug candidates, and design more effective clinical trials.
Genomic Data Analysis: Machine learning methods are instrumental in analyzing large-scale genomic datasets, including genome-wide association studies (GWAS) and next-generation sequencing data. These techniques can uncover genetic variants associated with drug metabolism, pharmacokinetics, and pharmacodynamics.
Drug Repurposing: AI-driven approaches facilitate drug repurposing efforts by identifying new therapeutic indications for existing drugs based on their genomic and pharmacological profiles. This approach can expedite the development of novel treatments for various diseases.
Adverse Drug Reaction Prediction: Machine learning models can predict the likelihood of adverse drug reactions based on genetic factors, enabling proactive measures to mitigate risks and improve patient safety.
Clinical Decision Support Systems: AI-powered clinical decision support systems integrate genomic data with electronic health records (EHRs) to provide healthcare professionals with personalized treatment recommendations and dosage adjustments tailored to individual patients.
27
Which supervised machine learning (SML) methods did we learn?
Linear regression, K-nearest neighbours (KNN), Random Forest
28
Which unsupervised machine learning (UML) methods did we learn?
K-means clustering, hierarchical clustering, principal component analysis (PCA)
29
AI, Machine Learning, Neural Network and Deep Learning. What’s the difference?
* Machine learning (ML) is a subfield of AI, or a path to AI
* ML uses algorithms to learn insights and recognise patterns from data
* Deep learning and neural networks are methods of ML
* Deep learning structures algorithms in neural networks, with the aim of teaching them to make decisions
30
Supervised machine learning (SML): how does it work?
In SML, algorithms learn from labelled data
* Regression is used to understand the relationship between dependent and independent variables
* Classification assigns test data to categories based on specific variables
31
Simple linear (and logistic) regression: when can we apply it?
* Used to predict (forecast) the value of the dependent variable based on the independent variable
* Linear regression is applied to continuous dependent variables, whilst logistic regression is applied to discrete ones
32
Simple linear regression: how does it work?
* Residuals can be used to validate the model by making sure that they are independent and normally distributed
* As the number of independent variables increases, multiple linear regression is applied
$y = a + bx + \epsilon$
33
Multiple linear regression: how does it work?
* Builds a model to describe Y in the best way using Xn
* Uses independent variables to predict the dependent variable.
Example: → Total cholesterol = a + b1*BMI + b2*Time exercising + b3*Shoe size + … + ε
* But is shoe size relevant?
$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + \epsilon$
34
Multiple linear regression assumptions?
* Parametric test based on assumptions:
- Linear relationship between Y and X
- The Xi are not highly correlated with each other
- The variance of the residuals is constant
- Independence of observations
- Residuals are normally distributed
35
How can we test a multiple linear regression model?
* The model can be tested with the root mean square error (RMSE), the standard deviation of the residuals: add up all the squared residuals, divide by the sample size, and take the square root of the result
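
The RMSE recipe described above can be written out directly. The observed and predicted values are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean square error: square the residuals, average them,
    then take the square root."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_squared = sum(r ** 2 for r in residuals) / len(residuals)
    return sqrt(mean_squared)


# Hypothetical observed values and model predictions.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

error = rmse(observed, predicted)  # residuals 1, 0, -1, 0 -> sqrt(2 / 4)
```

The result is in the same units as the dependent variable, which makes it easy to judge against the scale of the data.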
36
How to use multiple linear regression for prediction?
1. Create a random 80/20 split of the data, generating training data (80%) and test data (20%)
2. Train a regression model on the training data
3. Apply the model to the test data
4. Calculate the RMSE of the training data (in-sample RMSE) and of the test data (out-of-sample RMSE)
* Compare the two RMSE values. This indicates how well the model performs on new data.
* More complex model → decreasing in-sample RMSE → overfitting
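
The four steps can be sketched as follows. To stay short, the example assumes a single predictor (simple rather than multiple linear regression) fitted by ordinary least squares; the simulated data, the noise level and the random seed are hypothetical.

```python
import random
from math import sqrt


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def rmse(xs, ys, intercept, slope):
    """Root mean square error of the fitted line on the given data."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical data: a linear trend plus Gaussian noise.
rng = random.Random(0)
xs = [float(i) for i in range(50)]
ys = [1.5 * x + 4.0 + rng.gauss(0, 2.0) for x in xs]

# 1. Random 80/20 split into training and test indices.
indices = list(range(len(xs)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
train_x, train_y = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
test_x, test_y = [xs[i] for i in test_idx], [ys[i] for i in test_idx]

# 2. Train on the training data only.
intercept, slope = fit_simple_linear(train_x, train_y)

# 3. and 4. Apply the model to both sets and compare the errors.
in_sample_rmse = rmse(train_x, train_y, intercept, slope)
out_of_sample_rmse = rmse(test_x, test_y, intercept, slope)
```

An out-of-sample RMSE that is much larger than the in-sample RMSE would be a sign of overfitting.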
37
Linear regression models: pros and cons?
Pros:
* Can be used on continuous (linear) and discrete (logistic) data
* Determine the influence of independent variables on the dependent variable
* Identify outliers
Cons:
* No mixed data (continuous & discrete)
* Many assumptions
* Requires complete data with no missing values
38
K-nearest neighbors (KNN): how does it work?
* Non-parametric algorithm, i.e. no strong assumptions
* Often used for classification, predicting the group of a data point
* Applies majority voting based on:
- Distance metrics
- The number of neighbours K
1. Calculate the distances, usually with Euclidean distance
2. Find the nearest neighbours by ranking the distances
3. Majority vote on the predicted class label based on the K nearest neighbours
K is the number of nearest neighbours taken into account
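
The three KNN steps translate almost line by line into code. The sketch below assumes Euclidean distance and K = 3; the training points and the class labels are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points


def knn_predict(train_points, train_labels, query, k=3):
    # 1. Calculate the distance from the query to every training point.
    distances = [(dist(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # 2. Rank the distances and keep the K nearest neighbours.
    nearest = sorted(distances)[:k]
    # 3. Majority vote on the class label among those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two groups in a 2D feature space.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["responder"] * 3 + ["non-responder"] * 3

prediction_a = knn_predict(train_points, train_labels, (2, 2))
prediction_b = knn_predict(train_points, train_labels, (6, 5))
```

There is no training step: the labelled data itself is the model, and all the work happens at prediction time.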
39
KNN pros and cons?
Pros:
* It is easy to implement
* No need to train a model
* Versatile, distance algorithms can handle different types of data
Cons:
* Data should be on the same scale, which can be difficult with large datasets
* Setting K can be challenging
Tips:
* Test different values of K
* K should be an odd number to avoid ties
40
Decision tree and random forest: how do they work?
Random forest is based on decision trees
* Generates many decision trees, which together form the random forest used to classify unlabelled data → a single tree is not accurate
* Can use both categorical and continuous variables
Random forest
1. Create a bootstrapped dataset that is the same size as the original → randomly selected data, where duplicates are allowed
2. Create a decision tree from the bootstrapped data using a random subset of variables
3. Repeat 1 and 2 multiple times
4. Input your unlabelled data and let the random forest's many classifiers label it
5. A majority vote classifies the unlabelled data
Random forest validation with out-of-bag data
* The random forest model can be validated using the out-of-bag error
* The random forest is used to predict the labels of the data not selected for the bootstrapped dataset (these act as a test set)
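
The sampling and voting parts of a random forest can be sketched without building any trees. This is deliberately not a full random forest: it only illustrates bootstrapping with duplicates, the out-of-bag samples left over for validation, and the final majority vote. The sample size, seed and labels are hypothetical.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (duplicates allowed)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def out_of_bag(n_samples, in_bag):
    """Indices that were never drawn; they can serve as a test set."""
    chosen = set(in_bag)
    return [i for i in range(n_samples) if i not in chosen]


def majority_vote(labels):
    """Class predicted by most of the trees."""
    return Counter(labels).most_common(1)[0][0]


rng = random.Random(42)
n_samples = 20

in_bag = bootstrap_indices(n_samples, rng)  # step 1: bootstrapped dataset
oob = out_of_bag(n_samples, in_bag)         # samples left for validation

# Step 5 with hypothetical votes from three trees for one unlabelled sample.
forest_label = majority_vote(["case", "control", "case"])
```

Each tree would get its own bootstrapped dataset, so each tree also has its own out-of-bag samples on which it can be tested.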
41
Random forest pros and cons?
Pros:
* Can be used on many types and mixes of data
* Can be applied to both classification and regression problems
* Can be applied to data with missing values
* Less prone to overfitting and to the curse of dimensionality
Cons:
* Very complex, and you cannot follow the decisions of the individual trees
* Training the model takes time and computing power
42
What are the types of, and reasons to use, unsupervised machine learning (UML), and how does it work?
* In UML, algorithms are used to analyze and cluster unlabelled data
→ Data grouping based on patterns
→ Similarities and differences of the data
* Clustering is applied to raw data and groups it based on similarities and differences between the structure and/or patterns of the data
* Dimensionality reduction can be applied to reduce the complexity of data whilst preserving its structure, in order to reduce “noise” and overfitting in ML algorithms.
43
K-means clustering?
* Not to be confused with KNN
* Groups similar data points into clusters
* K is the number of clusters and of means generated
1. Set the number of clusters K, e.g. with an elbow plot
2. Generate K random centroids
3. Create K clusters by assigning each data point to the closest centroid
4. Calculate new centroids for each cluster
5. Reassign the points using the new centroids
If there are new assignments, repeat from 4
If there are no new assignments, terminate the algorithm
The elbow plot determines the number of clusters K
* The first step of K-means clustering is to set K
* The elbow method is common
* Distortion is the sum of squared distances of the data points from their cluster centers
- Decreases as K increases.
- 0 when K = number of points
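
A minimal sketch of the K-means loop and of the distortion used in the elbow plot. It assumes that the initial centroids are picked at random from the data points themselves (one common choice, not stated in the card); the six 2D points and the seed are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, rng, max_iterations=100):
    centroids = rng.sample(points, k)  # K random starting centroids
    assignments = None
    for _ in range(max_iterations):
        # Assign each data point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:  # no new assignments: stop
            break
        assignments = new_assignments
        # Recalculate each centroid as the mean of its cluster members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids, assignments


def distortion(points, centroids, assignments):
    """Sum of squared distances of the points from their cluster centers."""
    return sum(dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))


# Hypothetical data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
rng = random.Random(0)

centroids, assignments = kmeans(points, 2, rng)
distortion_k2 = distortion(points, centroids, assignments)

# With K equal to the number of points, every point is its own cluster.
centroids_all, assignments_all = kmeans(points, len(points), rng)
distortion_k6 = distortion(points, centroids_all, assignments_all)
```

Plotting the distortion for K = 1, 2, 3, … and looking for the bend in the curve is the elbow method mentioned above.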
44
Hierarchical clustering: how does it work?
* Groups similar data points into clusters
* Defines clusters that are distinct from each other and whose data points are similar to each other
* Creates clusters by ordering clusters:
- Bottom-up (agglomerative)
- Top-down (divisive)
* The length of the branch in the dendrogram shows how similar the data points are. → Long branch = dissimilar, short branch = similar
45
Hierarchical clustering pros and cons?
Pros:
* Easy to use
* The dendrogram gives information about the data structure
* Can be used to set the number of clusters
Cons:
* Sensitive to outliers
* Does not work well with missing data or mixed data
* In complex data, it is difficult to determine the number of relevant clusters
46
Principal component analysis (PCA)?
Common and versatile method used for:
* Analysing the structure of data features
* Pre-processing for other ML algorithms
* Visualisation
Summarises large multi-dimensional datasets into a smaller number of dimensions (ideally 2) that can be visualised
1. Plot the data. Gene 1 & 2 are higher in samples 1 & 2…
2. Calculate the average of gene 1 and 2 (and n) to find the center of the data.
3. Center the data at the origin (0,0)
4. Find the line, through the origin, with the best fit. The best fit is found by projecting each point onto the line and minimizing the distance from the point to the line. The line is called principal component 1 (PC1)
5. The eigenvectors are calculated. A higher loading indicates more influence on the PC, i.e. gene 1 (0.82) has more influence than gene 2 (0.57).
Multiple dimensions and PC n
* PC2 is perpendicular to PC1. PC3 is perpendicular to PC1 and PC2, etc.
* There are as many PCs as genes
* PC1 explains most of the variance in the data, PC2 the second most, etc.
* For a projection in 2D, two PCs are used
* The data points are projected onto the PCs.
* Hopefully, we see some clustering…
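
For two genes the PCA steps can be carried out by hand, because the 2x2 covariance matrix has closed-form eigenvalues and a closed-form principal axis. The sketch below assumes exactly that two-gene case; the six samples are hypothetical and do not reproduce the loadings quoted in the card.

```python
from math import atan2, cos, sin, sqrt


def pca_2d(samples):
    n = len(samples)
    # Find the center of the data and move it to the origin (0, 0).
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    centered = [(x - mean_x, y - mean_y) for x, y in samples]
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Eigenvalues = variance explained along PC1 and PC2.
    half_trace = (sxx + syy) / 2
    gap = sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    eigenvalues = (half_trace + gap, half_trace - gap)
    # Direction of the best-fit line through the origin (PC1);
    # PC2 is perpendicular to it. The components are the loadings.
    angle = 0.5 * atan2(2 * sxy, sxx - syy)
    pc1 = (cos(angle), sin(angle))
    pc2 = (-sin(angle), cos(angle))
    # Project every centered data point onto the two PCs.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1])
              for x, y in centered]
    return pc1, pc2, eigenvalues, scores


# Hypothetical expression of gene 1 and gene 2 in six samples.
samples = [(10.0, 6.0), (11.0, 4.0), (8.0, 5.0), (3.0, 3.0), (2.0, 2.8), (1.0, 1.0)]

pc1, pc2, eigenvalues, scores = pca_2d(samples)
explained_by_pc1 = eigenvalues[0] / sum(eigenvalues)
```

With more than two genes the same steps are done with a general eigendecomposition (or singular value decomposition) instead of these closed-form formulas.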
47
PCA pros and cons?
Pros:
* Can remove noise (correlated features)
* Improves ML algorithms by removing noise → reduces overfitting
* Visualisation
Cons:
* PCA turns the independent variables into PCs, which can be hard to interpret
* Requires standardised data and therefore does not work well on mixed data
t-SNE and UMAP are advancements over PCA, projecting the data better and making clustering easier
(Karolinska Institutet, 03/02/2023)
48
How is ML actually being used in medicine and pharmacology?
* ML algorithms are used together
* Nested in networks or parts of pipelines
* Used as tools, from an ML toolbox
* Important to know when and why to use each one
49
GestaltMatcher and Face2Gene are examples of the use of which ML type?
Supervised classifiers are often used in image analysis, for example when diagnosing rare diseases. Here, KNN is nested into a deep neural network. The data points in the KNN are other patients with known phenotypes
50
What does DESeq2 do?
* Most used method for analysing bulk RNA-sequencing data
* Other methods are limma and edgeR. The common aim is to find differentially expressed genes (proteins, lipids etc.)