4 - Generative modeling Flashcards
(44 cards)
What is QSAR modeling?
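Quantitative Structure-Activity Relationship modeling: predicting a molecule's biological activity (or other properties) from descriptors of its chemical structure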
What is the purpose of virtual screening in drug discovery?
To identify potential drug candidates from a large library of compounds
What does GNN stand for?
Graph Neural Networks
What is zero/few-shot learning?
Learning approaches that generalize to a new task from very few labeled examples (few-shot) or from none at all (zero-shot)
Define proteochemometrics.
A field that combines computational chemistry and bioinformatics, modeling bioactivity from descriptors of both the compounds and their protein targets
What is contrastive learning?
A technique that learns representations by pulling similar (positive) pairs of data points together and pushing dissimilar (negative) pairs apart
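One common way to turn this idea into a training objective is the InfoNCE loss. The sketch below is a minimal pure-Python version, assuming embeddings are plain lists of floats, cosine similarity, and a temperature of 0.1 (an illustrative choice, not from the original deck). The loss is small when the anchor is much closer to its positive than to every negative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: the negative log-probability of picking
    the positive among {positive} + negatives, using scaled similarities."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    # Numerically stable log-sum-exp over all candidates
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    # The positive sits at index 0
    return log_denom - logits[0]

anchor = [1.0, 0.0]
close_positive = [0.9, 0.1]
far_positive = [0.0, 1.0]
negatives = [[-1.0, 0.0], [0.0, -1.0]]
print(info_nce(anchor, close_positive, negatives))  # close to 0
print(info_nce(anchor, far_positive, negatives))    # noticeably larger
```

Lowering the temperature sharpens the softmax, so the loss punishes hard negatives more strongly.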
What is the estimated number of synthesized molecules in the chemical space of drug-like compounds?
Approximately 10^8 synthesized molecules
What is the goal of distribution learning in molecule generation?
What model types are used, and how is the process evaluated?
Goals:
- To mimic the property distribution of known molecules
- To learn the syntax (e.g., of SMILES) and produce new, realistic molecules
Models: VAEs (Variational Autoencoders), GANs (Generative Adversarial Networks), diffusion models
Evaluation:
- Validity, uniqueness
- Similarity to reference molecules
- FCD (Fréchet ChemNet Distance)
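The evaluation metrics above can be written out explicitly. This follows one common convention (uniqueness counted among valid molecules); μ and Σ are the mean and covariance of ChemNet penultimate-layer activations for the reference set (r) and the generated set (g). A lower FCD means the generated distribution is closer to the reference distribution.

```latex
\begin{aligned}
\text{validity} &= \frac{N_{\text{valid}}}{N_{\text{generated}}}, \qquad
\text{uniqueness} = \frac{N_{\text{unique, valid}}}{N_{\text{valid}}} \\
\text{FCD} &= \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
\end{aligned}
```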
What does goal-directed learning optimize for?
What are typical use cases?
A specific objective, such as bioactivity
Use cases:
- Drug candidate optimization
- Hit-to-lead or lead-to-candidate transitions
- Finding high-potency molecules with low toxicity
What is conditional generation in the context of molecular design?
Generating molecules based on explicit input conditions
What is the significance of SMILES in molecular representation?
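SMILES (Simplified Molecular Input Line Entry System) is a notation that encodes molecular structures as text strings, which lets sequence models such as RNNs/LSTMs generate molecules token by token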
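Before a sequence model can read or write SMILES, the string is split into tokens. The sketch below uses a widely used atom-level regular expression (two-letter halogens, bracket atoms, bonds, branches, ring-closure digits); the course material may use a different tokenization scheme, so treat this as an assumption.

```python
import re

# Atom-level SMILES tokens: bracket atoms, Br/Cl before single letters,
# aliphatic and aromatic atoms, bonds, branches, ring closures (incl. %nn)
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, failing loudly on unknown characters."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Sanity check: the tokens must reconstruct the input exactly
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize: {smiles}")
    return tokens

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
print(tokenize_smiles("ClCBr"))  # ['Cl', 'C', 'Br']
```

Keeping "Cl" and "Br" as single tokens matters: a character-level split would produce "C" + "l", which the model could then recombine into invalid strings.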
What does LSTM stand for?
Long Short-Term Memory: a type of recurrent neural network (RNN) designed to handle sequential data.
Using gates, LSTMs handle long-term dependencies better than regular RNNs (they don't forget important information over time).
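To see how the gates preserve information, here is a minimal sketch of a single LSTM cell with input and hidden size 1 (real models use vectors and learned weight matrices; the weights below are hand-picked for illustration). A forget gate biased open keeps a stored value alive across many steps, while one biased closed wipes it almost immediately.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.
    w maps each gate name to (input weight, recurrent weight, bias)."""
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b
    f = sigmoid(pre("forget"))   # fraction of the old cell state to keep
    i = sigmoid(pre("input"))    # fraction of the candidate to write
    o = sigmoid(pre("output"))   # fraction of the cell state to expose
    g = math.tanh(pre("cell"))   # candidate new content
    c = f * c_prev + i * g       # additive update lets information persist
    h = o * math.tanh(c)
    return h, c

def run(forget_bias, steps=50):
    """Start with 1.0 stored in the cell and feed zeros for `steps` steps."""
    w = {"forget": (0.0, 0.0, forget_bias), "input": (0.0, 0.0, 0.0),
         "output": (0.0, 0.0, 0.0), "cell": (0.0, 0.0, 0.0)}
    h, c = 0.0, 1.0
    for _ in range(steps):
        h, c = lstm_step(0.0, h, c, w)
    return c

print(run(5.0))   # about 0.71: value largely retained after 50 steps
print(run(-5.0))  # about 0.0: value forgotten
```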
True or False: The LSTM can achieve up to 98% correct compounds in SMILES generation.
True
What is a key challenge in generating molecules using machine learning?
Synthesizability: it is often unclear how (or whether) the generated molecules can actually be synthesized
What is mode collapse in goal-directed generative models?
The tendency to repeatedly generate the same or very similar molecules
What is Pareto ranking in multi-objective optimization?
Ranking candidates by Pareto dominance over multiple conflicting objectives and selecting the non-dominated ones (the Pareto front)
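A minimal sketch of Pareto ranking (non-dominated sorting), assuming every objective is to be maximized. The molecules and their two scores are hypothetical; a toxicity objective is expressed as "1 - toxicity" so that higher is better.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized):
    a is at least as good on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_rank(scores):
    """Rank 0 = non-dominated front; rank 1 = front that becomes
    non-dominated once rank 0 is removed; and so on."""
    remaining = set(range(len(scores)))
    ranks = [None] * len(scores)
    rank = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(scores[j], scores[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

# Hypothetical molecules scored on (potency, 1 - toxicity)
scores = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
print(pareto_rank(scores))  # [0, 0, 0, 1, 2]
```

The first three molecules are all on the front even though none is best at everything: each trades potency against toxicity differently.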
Fill in the blank: Continuous optimization against the scoring function can be described as _______.
Hill-climbing
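A minimal sketch of generic hill-climbing: repeatedly move to the best-scoring neighbor and stop when no neighbor improves the score. Bit strings stand in for molecules here, and the score (number of 1s) is a hypothetical toy objective; in molecule generation the "neighbors" might instead be mutated or newly sampled molecules.

```python
def hill_climb(start, neighbors, score, max_steps=100):
    """Greedy local search: move to the best neighbor while it improves."""
    current, current_score = start, score(start)
    for _ in range(max_steps):
        best = max(neighbors(current), key=score)
        best_score = score(best)
        if best_score <= current_score:
            break  # local optimum: no neighbor improves the score
        current, current_score = best, best_score
    return current, current_score

def single_flips(bits):
    """All bit strings that differ from `bits` in exactly one position."""
    return [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]

best, best_score = hill_climb((0,) * 10, single_flips, score=sum)
print(best_score)  # 10
```

Because it only accepts improvements, hill-climbing can get stuck in local optima and tends to converge on similar solutions, which is one reason the diversity strategies below matter.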
What are some strategies to promote diversity in generated molecules?
- Select diverse high-scoring molecules
- Mix exploitation with randomized exploration
- Penalize redundant solutions
- Expand around key substructures
- Sample proportionally to reward
- Reuse diverse SMILES variants during training
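The first strategy, selecting diverse high-scoring molecules, can be sketched as a greedy filter. Fingerprints are assumed to be sets of on-bit indices (in practice they would come from a cheminformatics toolkit); the molecule names, scores, and the 0.6 similarity cutoff are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def select_diverse(candidates, k, max_sim=0.6):
    """Greedily pick up to k candidates in descending score order, skipping
    any whose similarity to an already selected one exceeds max_sim.
    candidates: list of (name, score, fingerprint) tuples."""
    selected = []
    for name, score, fp in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(tanimoto(fp, s_fp) <= max_sim for _, _, s_fp in selected):
            selected.append((name, score, fp))
        if len(selected) == k:
            break
    return [name for name, _, _ in selected]

candidates = [
    ("mol_A", 0.95, {1, 2, 3, 4, 5}),
    ("mol_B", 0.93, {1, 2, 3, 4, 6}),  # near-duplicate of mol_A
    ("mol_C", 0.80, {10, 11, 12, 13}),
    ("mol_D", 0.70, {20, 21, 22}),
]
print(select_diverse(candidates, k=3))  # ['mol_A', 'mol_C', 'mol_D']
```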
What does the acronym VAE stand for?
Variational Autoencoder
What is the role of scoring functions in drug design?
To evaluate and rank generated molecules based on desired properties
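One possible way to build such a scoring function is to map each property onto a 0-1 "desirability" and combine them with a weighted geometric mean, so that a single unacceptable property zeroes the total score. The property names, ranges, and weights below are hypothetical, and this is only one of many scoring schemes.

```python
def desirability(value, low, high):
    """Linear map to [0, 1]: 0 at `low`, 1 at `high`, clipped outside.
    Pass low > high for properties that should be minimized."""
    if high == low:
        raise ValueError("low and high must differ")
    return min(1.0, max(0.0, (value - low) / (high - low)))

def score(props, spec):
    """Weighted geometric mean of desirabilities.
    spec maps property name -> (low, high, weight)."""
    total_weight = sum(weight for _, _, weight in spec.values())
    product = 1.0
    for name, (low, high, weight) in spec.items():
        product *= desirability(props[name], low, high) ** weight
    return product ** (1.0 / total_weight)

# Hypothetical spec: potency should be high, predicted toxicity low
spec = {"pActivity": (5.0, 8.0, 2.0), "toxicity": (0.8, 0.2, 1.0)}
print(score({"pActivity": 6.5, "toxicity": 0.5}, spec))  # about 0.5
print(score({"pActivity": 6.5, "toxicity": 0.9}, spec))  # 0.0: too toxic
```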
What is the estimated number of possible drug-like molecules?
Approximately 10^60
What is the significance of using a memory buffer in molecule generation?
To penalize the reward of molecules similar to those already generated, promoting diversity
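A simplified, hypothetical memory buffer: it stores fingerprints (as sets of on-bits) of rewarded molecules and multiplies the reward of any new molecule that is too similar to a stored one by a penalty factor. The 0.7 threshold and 0.0 penalty are illustrative assumptions; published methods often bucket molecules by scaffold instead.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

class MemoryBuffer:
    """Remembers fingerprints of rewarded molecules and penalizes
    new molecules that are too similar to any of them."""

    def __init__(self, sim_threshold=0.7, penalty=0.0):
        self.sim_threshold = sim_threshold
        self.penalty = penalty  # multiplier applied to near-duplicate rewards
        self.memory = []

    def adjust(self, fingerprint, reward):
        if any(tanimoto(fingerprint, m) >= self.sim_threshold
               for m in self.memory):
            return reward * self.penalty
        self.memory.append(fingerprint)
        return reward

buffer = MemoryBuffer()
print(buffer.adjust({1, 2, 3, 4}, 0.9))     # 0.9: new molecule, stored
print(buffer.adjust({1, 2, 3, 4, 5}, 0.9))  # 0.0: similarity 0.8 >= 0.7
print(buffer.adjust({7, 8, 9}, 0.8))        # 0.8: dissimilar, stored
```

Because repeated near-duplicates earn no reward, the generator is pushed away from the mode collapse described earlier.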
What does the term ‘chemical space’ refer to?
The vast array of possible molecular structures and properties
What is the purpose of hybrid models in molecule generation?
To combine different modeling approaches for improved generation