4 - Generative modeling Flashcards

(44 cards)

1
Q

What is QSAR modeling?

A

Quantitative Structure-Activity Relationship modeling

2
Q

What is the purpose of virtual screening in drug discovery?

A

To identify potential drug candidates from a large library of compounds

3
Q

What does GNN stand for?

A

Graph Neural Networks

4
Q

What is zero/few-shot learning?

A

A machine learning setting in which a model must handle new tasks or classes from only a few labeled examples (few-shot) or from none at all (zero-shot)

5
Q

Define proteochemometrics.

A

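An extension of QSAR modeling that combines descriptors of compounds (ligands) with descriptors of their protein targets in a single model, so that bioactivity can be predicted across many compound-target pairs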

6
Q

What is contrastive learning?

A

A technique that learns to differentiate between similar and dissimilar data points
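
A minimal Python sketch of one common contrastive objective, the InfoNCE loss, for a single anchor. The deck does not name a specific loss; InfoNCE, cosine similarity, the temperature of 0.1, and the toy 2-D embeddings are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor embedding.

    Small when the anchor is much closer to its positive (a similar view, e.g.
    a second SMILES spelling of the same molecule) than to the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Cross-entropy with the positive pair as the correct "class" (index 0),
    # using the log-sum-exp trick for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


anchor = [1.0, 0.0]
aligned = info_nce(anchor, positive=[0.9, 0.1], negatives=[[-1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce(anchor, positive=[-1.0, 0.0], negatives=[[0.9, 0.1], [0.0, 1.0]])
print(aligned < misaligned)  # True
```

Training pushes embeddings so that the "aligned" situation holds for real positive pairs: similar data points end up close together and dissimilar ones far apart.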

7
Q

What is the estimated number of synthesized molecules in the chemical space of drug-like compounds?

A

Approximately 10^8 synthesized molecules

8
Q

What is the goal of distribution learning in molecule generation?

What model types are used, and how is the process evaluated?

A

Goals:
- To mimic the property distribution of known molecules
- To learn the syntax and produce new, realistic molecules

Models: VAE (Variational Autoencoders), GANs (Generative Adversarial Networks), Diffusion models

Evaluation:
- Validity, uniqueness
- Similarity to reference molecules
- FCD (Fréchet ChemNet Distance)

9
Q

What does goal-directed learning optimize for?
What are typical use cases?

A

A specific objective, such as bioactivity

Typical use cases:
  • Drug candidate optimization
  • Hit-to-lead or lead-to-candidate transitions
  • Finding high-potency molecules with low toxicity

10
Q

What is conditional generation in the context of molecular design?

A

Generating molecules based on explicit input conditions

11
Q

What is the significance of SMILES in molecular representation?

A

A notation that encodes molecular structures in a textual format
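
Because SMILES is plain text, sequence models usually split it into tokens before training. A minimal tokenizer sketch; the regular expression is an assumption modeled on patterns commonly used for SMILES language models and has not been validated against the full OpenSMILES grammar.

```python
import re

# Assumed token pattern: bracket atoms such as [nH] or [C@@H] stay whole,
# two-letter halogens Cl/Br are one token, ring closures %10..%99 are one
# token, and every other symbol is a single character.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    # Sanity check: the tokens must reassemble into the input string exactly.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
print(tokenize("Clc1ccc[nH]1"))
```

Keeping bracket atoms like `[nH]` and halogens like `Cl` as single tokens keeps the vocabulary chemically meaningful, instead of letting the model learn that `C` followed by `l` means chlorine.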

12
Q

What does LSTM stand for?

A

Long Short-Term Memory, a type of recurrent neural network (RNN) designed to handle sequential data.
Thanks to their gates, LSTMs capture long-term dependencies better than plain RNNs (they do not forget important information over time).
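
The gating can be written as the standard LSTM update (one common formulation, without peephole connections). Here x_t is the input at step t (for a SMILES model, the current token's embedding), h_t the hidden state, c_t the cell state, σ the logistic sigmoid, and ⊙ element-wise multiplication; W, U, b are generic weight and bias names.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state (long-term memory)} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state / output}
\end{aligned}
```

When f_t is close to 1 and i_t close to 0, c_t carries c_{t-1} forward almost unchanged, which is how the cell keeps information across long sequences.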

13
Q

True or False: The LSTM can achieve up to 98% correct compounds in SMILES generation.

A

True

14
Q

What is a key challenge in generating molecules using machine learning?

A

Generated molecules may be difficult or impossible to synthesize, because no clear synthesis route is known for them

15
Q

What is mode collapse in goal-directed generative models?

A

The tendency to repeatedly generate the same or very similar molecules

16
Q

What is Pareto ranking in multi-objective optimization?

A

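Ranking solutions by Pareto dominance when objectives conflict: the non-dominated solutions (none can be improved in one objective without worsening another) form the first front and are selected first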
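
A minimal sketch of Pareto ranking (non-dominated sorting) in Python, assuming every objective is to be maximized. The two objectives and the scores are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized here)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(points):
    """Rank 1 = non-dominated front; remove it, rank the next front 2, etc.
    Simple O(n^2) check per front, fine for small candidate sets."""
    ranks = {}
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return [ranks[i] for i in range(len(points))]


# Hypothetical molecules scored on (predicted potency, 1 - predicted toxicity).
scores = [(0.9, 0.2), (0.6, 0.7), (0.5, 0.5), (0.3, 0.9), (0.2, 0.1)]
print(pareto_ranks(scores))  # [1, 1, 2, 1, 3]
```

Molecules 0, 1, and 3 trade potency against safety and form the first front; molecule 2 is dominated by molecule 1, and molecule 4 by every other candidate.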

17
Q

Fill in the blank: The scoring function in continuous optimization can be described as _______.

A

Hill-climbing
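
A toy sketch of a hill-climbing loop: sample candidates, keep the top-scoring ones, refit the generator on them, repeat. Everything here is an assumption for illustration: the "molecules" are strings over C/N/O, the generator is a per-position categorical distribution standing in for e.g. a SMILES LSTM, and `score` is a hypothetical objective.

```python
import random

ALPHABET = "CNO"  # toy "molecules": strings over three atom symbols (hypothetical)
LENGTH = 8


def score(s):
    """Hypothetical scoring function standing in for e.g. a bioactivity model."""
    return s.count("N") - 0.5 * s.count("O")


def sample(probs, rng):
    """Toy generator: an independent categorical distribution per position."""
    return "".join(rng.choices(ALPHABET, weights=p)[0] for p in probs)


def refit(elite, smoothing=0.1):
    """'Fine-tune' the generator on the elite: per-position symbol frequencies,
    mixed with a little uniform mass so no symbol drops to probability 0."""
    probs = []
    for pos in range(LENGTH):
        counts = [sum(s[pos] == a for s in elite) for a in ALPHABET]
        probs.append([(1 - smoothing) * c / len(elite) + smoothing / len(ALPHABET)
                      for c in counts])
    return probs


def hill_climb(n_rounds=10, n_samples=50, top_k=10, seed=0):
    rng = random.Random(seed)
    probs = [[1 / len(ALPHABET)] * len(ALPHABET) for _ in range(LENGTH)]
    best_per_round = []
    for _ in range(n_rounds):
        batch = [sample(probs, rng) for _ in range(n_samples)]
        elite = sorted(batch, key=score, reverse=True)[:top_k]  # keep the top-k
        best_per_round.append(score(elite[0]))
        probs = refit(elite)  # move the generator toward high-scoring samples
    return best_per_round


print(hill_climb())
```

With a real generative model, `refit` would be a few fine-tuning steps on the elite molecules; the best score per round should rise as the generator drifts toward high-scoring regions.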

18
Q

What are some strategies to promote diversity in generated molecules?

A

  • Select diverse high-scoring molecules
  • Mix exploitation with randomized exploration
  • Penalize redundant solutions
  • Expand around key substructures
  • Sample proportionally to reward
  • Reuse diverse SMILES variants during training

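A sketch of the first strategy, greedily selecting diverse high-scoring molecules. Assumptions: fingerprints are Python sets of on-bit indices, similarity is Tanimoto, and the 0.5 cutoff and molecule names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_diverse(candidates, k, max_sim=0.5):
    """Greedily pick up to k high-scoring molecules, skipping any candidate
    that is too similar to one already picked.

    candidates: list of (name, score, fingerprint) tuples.
    """
    picked = []
    for name, score, fp in sorted(candidates, key=lambda c: c[1], reverse=True):
        if len(picked) >= k:
            break
        if all(tanimoto(fp, other_fp) <= max_sim for _, _, other_fp in picked):
            picked.append((name, score, fp))
    return [name for name, _, _ in picked]


# Hypothetical molecules with toy fingerprints (sets of on-bit indices).
pool = [
    ("mol_a", 0.95, {1, 2, 3, 4}),
    ("mol_b", 0.93, {1, 2, 3, 5}),  # near-duplicate of mol_a
    ("mol_c", 0.80, {7, 8, 9}),
    ("mol_d", 0.70, {1, 7, 10, 11}),
]
print(select_diverse(pool, k=3))  # ['mol_a', 'mol_c', 'mol_d']
```

`mol_b` is skipped despite its high score because its Tanimoto similarity to `mol_a` is 0.6.
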
19
Q

What does the acronym VAE stand for?

A

Variational Autoencoder

20
Q

What is the role of scoring functions in drug design?

A

To evaluate and rank generated molecules based on desired properties

21
Q

What is the estimated number of all drug-like molecules?

A

Approximately 10^60

22
Q

What is the significance of using a memory buffer in molecule generation?

A

To remember previously rewarded molecules so that the reward of new molecules similar to them can be penalized, which promotes diversity
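
A minimal sketch of a memory buffer used as a diversity filter. Assumptions: fingerprints are sets of on-bit indices, similarity is Tanimoto, a molecule with similarity of at least 0.7 to any remembered molecule has its reward multiplied by 0, and only unpenalized molecules are stored; `MemoryBuffer` is a hypothetical name.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class MemoryBuffer:
    """Hypothetical diversity filter: remembers molecules that were rewarded
    and scales down the reward of new molecules that look too similar."""

    def __init__(self, sim_threshold=0.7, penalty=0.0):
        self.sim_threshold = sim_threshold
        self.penalty = penalty  # multiplier applied to near-duplicate rewards
        self.memory = []

    def adjust(self, fingerprint, reward):
        if any(tanimoto(fingerprint, fp) >= self.sim_threshold for fp in self.memory):
            return reward * self.penalty
        self.memory.append(fingerprint)  # novel molecule: keep full reward, remember it
        return reward


buffer = MemoryBuffer()
print(buffer.adjust({1, 2, 3, 4}, 0.9))  # 0.9 (nothing similar in memory yet)
print(buffer.adjust({1, 2, 3, 4, 5}, 0.9))  # 0.0 (Tanimoto 0.8 to a stored molecule)
print(buffer.adjust({10, 11, 12}, 0.8))  # 0.8 (dissimilar)
```

Because repeated look-alikes earn nothing, a reinforcement learning agent using this reward is pushed away from regions it has already exploited, which counteracts mode collapse.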

23
Q

What does the term ‘chemical space’ refer to?

A

The vast array of possible molecular structures and properties

24
Q

What is the purpose of hybrid models in molecule generation?

A

To combine different modeling approaches for improved generation

25
Q

What is the primary benefit of using diffusion models in molecular generation?

A

To generate diverse, high-quality molecular structures

26
Q

What is the importance of scoring functions in multi-objective optimization?

A

They help balance trade-offs between conflicting objectives in drug design

27
Q

What is the primary goal of sampling candidates in generative models?

A

To generate diverse and valid molecules that explore the learned chemical space, enabling discovery of novel structures with desired properties

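Sampling from an autoregressive generator is commonly controlled by a temperature on the softmax over next tokens. A minimal sketch; the three-token vocabulary and the logits are hypothetical.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    weights = [math.exp(z - m) for z in scaled]  # unnormalized softmax
    return rng.choices(range(len(logits)), weights=weights)[0]


tokens = ["C", "N", "O"]  # hypothetical vocabulary
logits = [2.0, 1.0, 0.1]  # hypothetical next-token scores from a generator
rng = random.Random(0)
for t in (0.5, 1.0, 2.0):
    draws = [tokens[sample_with_temperature(logits, t, rng)] for _ in range(1000)]
    print(t, {tok: draws.count(tok) for tok in tokens})
```

Lower temperatures concentrate samples on the most likely tokens, which typically favors validity; higher temperatures spread them out, which favors diversity and exploration.
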
28
Q

What is conditional generation in the context of molecular design?

A

Generate molecules that explicitly satisfy user-defined conditions, such as property values, shapes, or target sequences

It involves learning a shared latent space between molecules and conditions.

29
Q

Which model types are commonly used in conditional generation?

A

  • Conditional VAEs
  • Conditional RNNs and Transformers
  • Conditional Diffusion Models

30
Q

What is an advantage of conditional generation?

A

Direct control over properties or constraints via conditioning input

This eliminates the need for a scoring function.

31
Q

What is a disadvantage of conditional generation?

A

Requires paired data (molecules with known properties or targets)

This limits generalization to unseen or rare conditions.

32
Q

What is one of the metrics for evaluating generative models?

A

Chemical/Structural Validity and Non-Redundancy

This includes metrics like the percentage of syntactically correct molecules.

33
Q

What does the term 'diversity of generated molecules' refer to?

A

Historically: internal diversity; fraction of unique compounds/scaffolds
New: #Circles (clustering-based)

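A sketch of the two historical measures. Assumptions: fingerprints are sets of on-bit indices, and internal diversity is computed as 1 minus the mean Tanimoto similarity over distinct pairs (one common variant; some benchmarks also average over self-pairs or use a power mean). #Circles is not shown.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def internal_diversity(fingerprints):
    """1 minus the mean pairwise Tanimoto similarity within one generated set:
    0 means all molecules are identical, values near 1 mean very dissimilar."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return 1 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def fraction_unique(items):
    """Fraction of distinct entries, e.g. canonical SMILES or scaffold strings."""
    return len(set(items)) / len(items) if items else 0.0


fps = [{1, 2, 3}, {1, 2, 3}, {4, 5, 6}]  # hypothetical fingerprints
print(round(internal_diversity(fps), 3))  # 0.667
print(fraction_unique(["CCO", "CCO", "CCN", "c1ccccc1"]))  # 0.75
```
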
34
Q

What does property-based similarity involve?

A

Compare distributions of molecular descriptors, e.g., logP, MW, TPSA

Metrics include Kullback-Leibler (KL) divergence and Kolmogorov-Smirnov (KS) distance.

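One of the named metrics, the two-sample Kolmogorov-Smirnov distance, sketched in plain Python for a single descriptor. The logP values are hypothetical; SciPy's `scipy.stats.ks_2samp` reports the same statistic along with a p-value.

```python
from bisect import bisect_right


def ks_distance(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the empirical CDFs of two samples (0 = identical, 1 = completely separated)."""
    xs, ys = sorted(xs), sorted(ys)

    def ecdf(sample, v):
        # Fraction of the (sorted) sample that is <= v.
        return bisect_right(sample, v) / len(sample)

    # Both ECDFs are step functions, so the maximum gap occurs at an observed value.
    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in xs + ys)


reference_logp = [1.2, 2.5, 3.1, 2.8, 1.9]  # hypothetical descriptor values
generated_logp = [0.4, 0.9, 1.1, 2.0, 2.2]
print(ks_distance(reference_logp, generated_logp))
```
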
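35
Q

What is Fréchet ChemNet Distance used for?

A

To compare generated and reference molecules by how a pretrained neural network (ChemNet) perceives them

The network's activations for each set of molecules are summarized as a multivariate Gaussian (mean and covariance), and the two Gaussians are compared.
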
36
Q

What is the importance of desirability in goal-directed learning?

A

The objective is to produce molecules that satisfy all specified goals, not just valid molecules

Desirable molecules must also be suitable for real drug discovery.

37
Q

Fill in the blank: Generative deep learning methods can readily be used to generate new _______.

A

Molecules

38
Q

What are the three strategies for training generative models?

A

  • Distribution learning
  • Goal-directed learning
  • Conditional generation

39
Q

What is a key aspect of evaluating the quality of generated molecules?

A

Desirability, synthesizability, QED (quantitative estimate of drug-likeness), etc.

This evaluation is not straightforward.

40
Q

What different approaches exist for molecule generation?

A

1. Autoregressive models
   - Sequence-based (e.g., SMILES generation)
   - Graph-based (generate atoms and bonds step by step)
2. Variational Autoencoders (VAEs): encode a molecule into a latent space, sample, decode
3. Generative Adversarial Networks (GANs): a generator competes against a discriminator to create realistic molecules
4. Diffusion models: gradually transform noise into a molecule (high diversity and quality)
5. Rule-based / template-based: apply known reaction templates or synthesis rules
6. Matrix-based: generate adjacency matrices and atom types (less common)

41
Q

Which model types are used for distribution learning?

A

  • Autoregressive models
  • Generative adversarial networks
  • Variational autoencoders
  • Flow-based models
  • Diffusion models
  • Hybrid models

42
Q

Define validity, novelty, and uniqueness (in molecule generation).

A

  • Validity: % of syntactically correct molecules
  • Novelty: % of molecules that were not in the training set
  • Uniqueness: % of unique molecules

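A minimal sketch of the three metrics. It assumes the common benchmark convention that uniqueness is measured among valid molecules and novelty among unique valid ones, and it takes a hypothetical `canonicalize` hook that returns a canonical SMILES or None for invalid input (with RDKit, this could wrap `Chem.MolFromSmiles` and `Chem.MolToSmiles`). The lookup-table canonicalizer is a toy stand-in.

```python
def generation_metrics(generated, training, canonicalize):
    """Validity, uniqueness and novelty of a list of generated SMILES.

    canonicalize(smiles) is a hypothetical hook returning a canonical SMILES
    string, or None when the input does not describe a valid molecule.
    """
    canon = [canonicalize(s) for s in generated]
    valid = [c for c in canon if c is not None]
    unique = set(valid)
    known = {canonicalize(s) for s in training} - {None}
    novel = unique - known
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,  # among valid
        "novelty": len(novel) / len(unique) if unique else 0.0,  # among unique
    }


# Toy stand-in for a real parser: a lookup table of known spellings.
TOY_CANONICAL = {"CCO": "CCO", "OCC": "CCO", "CCN": "CCN", "c1ccccc1": "c1ccccc1"}


def toy_canonicalize(smiles):
    return TOY_CANONICAL.get(smiles)


generated = ["CCO", "OCC", "c1ccccc1", "C1CC", "CCN"]  # "C1CC" leaves a ring unclosed
training = ["CCO"]
print(generation_metrics(generated, training, toy_canonicalize))
```

Canonicalizing first matters: "CCO" and "OCC" are both ethanol, so they count as one unique molecule, and that molecule is not novel because it appears in the training set.
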
43
Q

What models are used in goal-directed learning (molecule generation)?

A

1. Reinforcement learning
2. Genetic algorithms
3. Reward-guided transformers (continuous optimization)
4. Iterative distribution learning

44
Q

What is the Fréchet ChemNet Distance?

A

A metric for how similar two sets of molecules are, typically generated molecules vs. training (reference) molecules.
Method: pass both sets through a pretrained neural network (ChemNet) and check whether the Gaussian distributions of their feature activations look similar.
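
In formula form, FCD is the Fréchet distance between two multivariate Gaussians. Here μ_r, Σ_r are the mean and covariance of ChemNet activations over the reference molecules, μ_g, Σ_g the same for the generated molecules, and Tr is the matrix trace. Lower values mean more similar sets; identical distributions give 0.

```latex
\mathrm{FCD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```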