Exam Flashcards

Questions for the exam (62 cards)

1
Q

What is the basic structure of a Neuron (Perceptron) in ANN?

A

A Neuron (Perceptron) consists of inputs, weights, a bias, an activation function, and an output.

2
Q

What is the purpose of an activation function in Neural Networks?

A

The activation function determines whether a neuron should be activated or not, introducing non-linearity into the model.

The activation function selectively activates neurons, ensuring that the model does not depend on just a few neurons but rather distributes learning efficiently. It also introduces non-linearity, which is crucial for deep learning models to recognize complex patterns.

3
Q

What are the steps involved in training a Neural Network?

A

The steps include initializing weights, feeding input data, calculating output, computing loss, and updating weights using backpropagation.

4
Q

How can we assess the performance of our model?

A

Performance can be assessed using metrics such as accuracy, precision, recall, F1 score, and loss.

Model performance is assessed using evaluation metrics, loss functions, and validation techniques. The choice of metric depends on whether the task is classification, regression, or clustering.

✅ Classification → Accuracy, Precision, Recall, F1-score.
✅ Regression → MSE, MAE, R² Score.

5
Q

Can you highlight the differences between Batch Gradient Descent and Stochastic Gradient Descent in the context of Machine Learning?

A

Both try to minimize the loss, i.e., find where the gradient of the loss function is 0.

Batch Gradient Descent uses the entire dataset to compute gradients, while Stochastic Gradient Descent updates weights using one sample at a time.

Batch Gradient Descent is computationally costly when working with large datasets.

Stochastic Gradient Descent computes the gradient from a single sample (or, in its mini-batch variant, a small subset) at each iteration, making each update cheap but noisy.
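💡 Example (a minimal NumPy sketch on synthetic data; the learning rate and epoch counts are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=100)
    y = 3.0 * X + rng.normal(scale=0.1, size=100)  # true weight is 3.0

    # Batch GD: one update per epoch, gradient averaged over the full dataset.
    w, lr = 0.0, 0.1
    for _ in range(50):
        grad = -2 * np.mean((y - w * X) * X)
        w -= lr * grad

    # Stochastic GD: one (noisier, cheaper) update per sample.
    w_sgd = 0.0
    for _ in range(5):
        for i in rng.permutation(len(X)):
            grad_i = -2 * (y[i] - w_sgd * X[i]) * X[i]
            w_sgd -= lr * grad_i

    print(w, w_sgd)  # both converge near the true weight 3.0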

6
Q

Which method is commonly used to determine optimal values for parameters like weights and biases in a Neural Network?

A

Gradient Descent is commonly used to determine optimal values for weights and biases.

The Gradient Descent algorithm is commonly used to optimize weights and biases in a neural network by minimizing the loss function.

✅ Key Idea – Adjust weights and biases iteratively to minimize the error between predictions and actual values.
✅ Uses Backpropagation – Computes gradients of the loss function with respect to parameters.

7
Q

What is a loss function, and why is it important?

A

A loss function quantifies how well the model’s predictions match the actual data, guiding the optimization process.

8
Q

What role do hyperparameters play in a Neural Network?

A

Hyperparameters control the learning process, including learning rate, batch size, and number of layers.

We don't know the best hyperparameters beforehand; there is always a trade-off between speed, accuracy, and generalization.
We need to balance these hyperparameters rather than simply maxing everything out.

9
Q

What are the parameters of a Neural Network?

A

Parameters include weights and biases that are learned during training.

10
Q

How should you select the suitable format of a neural network (MLP, RNN, CNN, GNN) for a project?

A

The right neural network architecture depends on the type of data you're working with and the problem you're solving.

Multi-Layer Perceptron (MLP) - Tabular/Structured Data (e.g., Spreadsheets, Financial Data)
Convolutional Neural Network (CNN) - Image Data
Recurrent Neural Network (RNN) - Sequential Data (Text, Time-Series, Audio)
Graph Neural Network (GNN) - Graph-Structured Data (Networks, Social Relationships, Molecules, Knowledge Graphs)

11
Q

How do you select the most suitable setting for the loss function in ANN?

A

Selecting the right loss function for an Artificial Neural Network (ANN) depends on the type of task, output activation function, and data distribution.

✅ Classification → Use Cross-Entropy Loss (Binary or Categorical).
✅ Regression → Use MSE, MAE, or Huber Loss.
✅ Imbalanced Data → Use Weighted Loss Functions.
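💡 Example (a sketch matching task type to a PyTorch loss, assuming a recent PyTorch; the class weights shown are illustrative):

    import torch
    import torch.nn as nn

    bce = nn.BCEWithLogitsLoss()    # binary classification (expects raw logits)
    ce = nn.CrossEntropyLoss()      # multi-class classification (expects raw logits)
    mse = nn.MSELoss()              # regression
    huber = nn.HuberLoss()          # regression, more robust to outliers
    # Imbalanced data: weight the rare class more heavily.
    weighted_ce = nn.CrossEntropyLoss(weight=torch.tensor([0.2, 0.8]))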

12
Q

What exactly is Gradient Descent?

A

Gradient Descent is an optimization algorithm used to minimize the loss function by iteratively adjusting parameters.
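💡 Example (a bare-bones sketch minimizing f(x) = (x - 4)^2, whose gradient is 2(x - 4)):

    x, lr = 0.0, 0.1
    for _ in range(100):
        grad = 2 * (x - 4)   # derivative of the loss at the current x
        x -= lr * grad       # step against the gradient
    print(x)                 # ~4.0, where the gradient is 0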

13
Q

What does Mean Squared Error (MSE) tell us in machine learning?

A

MSE measures the average squared difference between predicted and actual values. It is an indicator of the model's accuracy: lower MSE means predictions are closer to the true values.
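💡 Example (hand-computed on three illustrative predictions):

    y_true = [3.0, 5.0, 2.0]
    y_pred = [2.5, 5.5, 2.0]
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    print(mse)  # (0.25 + 0.25 + 0.0) / 3 = 0.1666...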

14
Q

How does backpropagation work in Neural Networks?

A

Backpropagation is the core algorithm used to train Neural Networks (NNs). It enables the model to learn by adjusting its weights and biases based on how much error it makes in predictions.

Backpropagation calculates gradients of the loss function with respect to each weight by applying the chain rule, allowing weights to be updated.
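💡 Example (a one-weight sketch using PyTorch autograd; the numbers are illustrative):

    import torch

    w = torch.tensor(2.0, requires_grad=True)
    x, y_true = torch.tensor(3.0), torch.tensor(10.0)

    y_pred = w * x                   # forward pass
    loss = (y_pred - y_true) ** 2    # squared error
    loss.backward()                  # backward pass applies the chain rule

    # d(loss)/dw = 2 * (w*x - y_true) * x = 2 * (6 - 10) * 3 = -24
    print(w.grad)                    # tensor(-24.)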

15
Q

Explain forward pass and backward pass in the ANN training process.

A

The forward pass computes the output from inputs, while the backward pass updates weights based on the error calculated from the output.

16
Q

Why is it important to split data into training and testing sets in machine learning?

A

Splitting data helps evaluate model performance on unseen data, preventing overfitting.
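💡 Example (a sketch assuming scikit-learn is installed; the 80/20 split is illustrative):

    from sklearn.model_selection import train_test_split

    X = [[i] for i in range(10)]
    y = [i % 2 for i in range(10)]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    # Fit only on X_train/y_train; report metrics on the held-out X_test/y_test.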

17
Q

What is the difference between binary, multi-class, and multi-label classification? Also, explain which activation function is best suited for each.

A

Binary classification predicts one of two classes, multi-class classification predicts one of several classes, and multi-label classification predicts multiple labels per sample. Sigmoid is best for binary, Softmax for multi-class (typically paired with cross-entropy loss), and Sigmoid for multi-label.

Classification tasks can be divided into three types based on the number and nature of output categories:

✅ Binary Classification → Two possible classes (e.g., Spam vs. Not Spam).
✅ Multi-Class Classification → More than two classes, but only one label per sample (e.g., Dog, Cat, or Bird).
✅ Multi-Label Classification → Each sample can belong to multiple categories (e.g., an image containing both a Dog and a Car).
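💡 Example (a sketch of the matching output activations in PyTorch; the logits are illustrative):

    import torch

    logits = torch.tensor([1.2, -0.4, 2.1])

    torch.sigmoid(logits[:1])     # binary: a single probability
    torch.softmax(logits, dim=0)  # multi-class: probabilities summing to 1
    torch.sigmoid(logits)         # multi-label: one independent probability per label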

18
Q

Which loss function is best suited for regression and classification?

A

Mean Squared Error is best for regression, while Cross-Entropy is best for classification.

19
Q

Which activation function best suits the input layer in an MLP?

A

The ReLU activation function is the usual choice (strictly speaking, the input layer applies no activation; ReLU is the standard pick for the first hidden layer).

20
Q

Which activation function best suits the input layer in a CNN?

A

Depends on the task: Sigmoid, Softmax, or ReLU (ReLU is the most common choice in the convolutional layers).

21
Q

Which activation function best suits the input layer in an RNN?

A

The Tanh activation function is often used in the input layer of an RNN.

22
Q

What is the role of the learning rate in Gradient Descent?

A

The learning rate (α) controls the step size of weight updates in Gradient Descent. It determines how quickly the model learns by adjusting its parameters to minimize the loss function.

The learning rate determines the size of the steps taken towards the minimum of the loss function.

23
Q

Using PyTorch, what is the procedure for constructing a Neural Network encompassing various layers, including input, hidden, and output?

A

The procedure involves defining a class, initializing layers in the constructor, and implementing the forward method to define the forward pass.

In PyTorch, a Neural Network is built using the torch.nn.Module class, which includes input, hidden, and output layers. The key steps are:

✅ Step 1: Import necessary libraries.
✅ Step 2: Define the neural network architecture using torch.nn.Module.
✅ Step 3: Initialize weights and activation functions.
✅ Step 4: Define the forward pass.
✅ Step 5: Instantiate the model and set up loss and optimizer.
✅ Step 6: Train the model using forward and backward propagation.

💡 Example: A simple feedforward neural network for classification can be implemented in PyTorch following these steps.
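One possible sketch (layer sizes, dummy data, and hyperparameters are illustrative):

    import torch
    import torch.nn as nn

    class SimpleNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.hidden = nn.Linear(4, 16)   # input layer -> hidden layer
            self.out = nn.Linear(16, 3)      # hidden layer -> output layer

        def forward(self, x):
            return self.out(torch.relu(self.hidden(x)))  # forward pass

    model = SimpleNet()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # One training step on dummy data:
    x, y = torch.randn(8, 4), torch.randint(0, 3, (8,))
    optimizer.zero_grad()
    loss = criterion(model(x), y)  # forward propagation + loss
    loss.backward()                # backward propagation
    optimizer.step()               # weight update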

24
Q

What strategies can be employed to mitigate the issue of overfitting in a complex neural network?

A

Strategies include using dropout, regularization, early stopping, and data augmentation.
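💡 Example (a sketch combining dropout and L2 weight decay in PyTorch; the rates are illustrative):

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(20, 64),
        nn.ReLU(),
        nn.Dropout(p=0.5),   # randomly zeroes activations during training
        nn.Linear(64, 2))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)  # L2 regularization
    # Early stopping: monitor validation loss each epoch and stop once it stops improving.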

25
Q

Which trained deep-learning model components should be saved for future use?

A

Components such as model architecture, weights, and optimizer state should be saved.

When saving a deep-learning model for future use, the following key components should be stored:

✅ Model Architecture – Defines the structure (layers, neurons, activation functions).
✅ Trained Weights & Biases – The learned parameters from training.
✅ Optimizer State – Saves learning rate and momentum for resuming training.
✅ Training Configuration – Includes hyperparameters like batch size, dropout, and loss function.
✅ Preprocessing Steps – Ensures input data is processed the same way during inference.
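💡 Example (a PyTorch sketch; the model, file name, and config values are illustrative):

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    torch.save({
        "model_state": model.state_dict(),          # trained weights & biases
        "optimizer_state": optimizer.state_dict(),  # lets training resume
        "config": {"batch_size": 32, "lr": 1e-3},   # training configuration
    }, "checkpoint.pt")

    # Later (the architecture must be re-created in code first):
    # checkpoint = torch.load("checkpoint.pt")
    # model.load_state_dict(checkpoint["model_state"])
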
26
Q

How to prevent overfitting in Neural Networks?

A

Overfitting can be prevented through techniques like regularization, dropout, and using more training data.

Overfitting occurs when a Neural Network memorizes training data instead of learning general patterns. To prevent this, use regularization techniques, data augmentation, dropout, and early stopping.

✅ Regularization (L1/L2, Dropout)
✅ More Training Data or Data Augmentation
✅ Early Stopping
✅ Batch Normalization
✅ Smaller Network Architecture (Reduce Complexity)
27
Q

Why can't we use a Multilayer Perceptron (MLP) for sequential data?

A

MLPs do not have memory and cannot capture temporal dependencies in sequential data. A Multilayer Perceptron (MLP) is not well-suited for sequential data because:

❌ No Memory – MLPs treat inputs as independent and cannot retain previous information.
❌ Fixed Input Size – Requires fixed-length inputs, making them ineffective for variable-length sequences.
❌ Ignores Temporal Relationships – Cannot capture dependencies between past and future elements.
28
Q

Why is preparing data so important for RNN models?

A

Data preparation ensures that sequences are properly formatted and normalized, which is crucial for RNN performance.
29
Q

What is a Recurrent Neural Network, and how does it function?

A

An RNN is a type of neural network designed for sequential data, where connections between nodes can create cycles, allowing information to persist.
30
Q

How can RNNs be used for tasks such as time series analysis?

A

RNNs can model temporal dependencies in time series data, making them suitable for forecasting and pattern recognition.

Recurrent Neural Networks (RNNs) are used in time series analysis to process sequential data and make predictions based on past patterns. They remember previous inputs and learn temporal dependencies, making them ideal for forecasting trends and detecting anomalies.
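💡 Example (a sketch of an RNN forecasting the next value from a window of past values; all sizes are illustrative):

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
    head = nn.Linear(16, 1)

    window = torch.randn(8, 10, 1)  # batch of 8 series, 10 past time steps each
    out, _ = rnn(window)            # out: (8, 10, 16), one hidden state per step
    forecast = head(out[:, -1])     # predict the next value from the last hidden state
    print(forecast.shape)           # torch.Size([8, 1])
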
31
Q

What are the common challenges and pitfalls to avoid when working with RNNs?

A

Challenges include vanishing gradients, exploding gradients, and difficulty in capturing long-term dependencies.

One of the biggest challenges with RNNs is the problem of vanishing or exploding gradients. This occurs when the gradients of the loss function with respect to the parameters of the RNN become very small or very large as they propagate through time. This can make it difficult to train the RNN effectively, as the updates to the parameters will be very small or very large, and the network will not learn effectively.
32
Q

What is an LSTM network, and how is it different from traditional RNNs?

A

An LSTM is a type of RNN designed to remember long-term dependencies, using gates to control the flow of information and reduce the vanishing gradient problem.
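💡 Example (a sketch showing that nn.LSTM carries a cell state in addition to the hidden state; sizes are illustrative):

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
    x = torch.randn(4, 25, 1)               # 4 sequences, 25 time steps each
    out, (h_n, c_n) = lstm(x)               # c_n: the cell state carrying long-term memory
    print(out.shape, h_n.shape, c_n.shape)  # (4, 25, 16), (1, 4, 16), (1, 4, 16)
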
33
Q

What are the purposes and benefits of using LSTMs for tasks such as sequence generation?

A

LSTMs are beneficial for tasks requiring memory of past events, such as language modeling and text generation. Long Short-Term Memory (LSTM) networks are ideal for sequence generation because they retain long-term dependencies and effectively model complex temporal patterns.

✅ Purposes:
Generates text, speech, or music in a structured manner.
Captures long-range dependencies in sequences.
Ensures coherent and contextually relevant outputs.

✅ Benefits:
Prevents vanishing gradients (unlike standard RNNs).
Stores past information using cell state & gating mechanisms.
Generates smooth and context-aware sequences.
34
Q

How does an LSTM manage information differently than a traditional RNN?

A

LSTMs use memory cells and gates to selectively remember or forget information, addressing the vanishing gradient problem.

Long Short-Term Memory (LSTM) networks manage information using a memory cell and three gates (Forget, Input, Output), whereas traditional RNNs rely only on hidden states. This design allows LSTMs to handle long-term dependencies better than standard RNNs.
35
Q

What is the vanishing gradient problem, and how does LSTM address it?

A

The vanishing gradient problem occurs when gradients become too small for effective learning. LSTMs mitigate this with their gating mechanisms.

✅ Uses a Cell State – A separate memory mechanism that allows information to flow unchanged over long sequences.
✅ Has Gating Mechanisms – Controls what information to keep, update, or forget.
36
Q

How does an LSTM remember long-term dependencies?

A

LSTMs use a cell state that carries information across time steps, allowing them to remember long-term dependencies. Long Short-Term Memory (LSTM) networks retain long-term dependencies using a memory cell and gating mechanisms that regulate information flow.

✅ Key Feature – LSTMs use Forget, Input, and Output gates to selectively store or discard information.
✅ Prevents Vanishing Gradient Problem – Unlike standard RNNs, LSTMs preserve important signals over long sequences.
37
Q

What are attention mechanisms in neural networks, and why are they useful?

A

Attention mechanisms allow models to focus on specific parts of the input sequence, improving performance on tasks like translation.
38
Q

Explain the main structure of the Transformer model.

A

The Transformer model consists of an encoder-decoder architecture with self-attention mechanisms and feed-forward neural networks.

The encoder and decoder are both made up of multiple layers. Each layer has multi-head self-attention, feedforward neural networks (FFN), and layer normalization with residual connections. The encoder processes the input, and the decoder generates the output while attending to encoder outputs.

1️⃣ Input: Words go in (e.g., "The cat is sleeping.").
2️⃣ Embedding: Converts words into numbers for understanding.
3️⃣ Self-Attention 🪄: Focuses on important words (e.g., "cat" is linked to "sleeping").
4️⃣ Layers: Processes information in multiple steps to refine meaning.
5️⃣ Output: Generates a smart response!
39
Q

Explain the various types of text embeddings (Word and Sentence embeddings)?

A

Text embeddings are numerical vector representations of words or sentences that capture semantic meaning. They transform text into a format that neural networks can process.

✅ Word Embeddings – Represent individual words.
✅ Sentence Embeddings – Represent entire sentences or documents.
40
Q

What's the difference between pre-training and fine-tuning a model?

A

Pre-training involves training a model on a large dataset, while fine-tuning adjusts the model on a smaller, task-specific dataset.

Fine-tuning specifically aims to adapt a pre-trained model to a particular task or domain with less data and fewer resources. Continued pre-training, in contrast, seeks to broaden or deepen the model's knowledge and capabilities by training it further on additional data, potentially improving its generalization.
41
Q

What does the structure of a BERT model look like?

A

A BERT model consists of multiple transformer layers, including attention heads and feed-forward networks, designed for bidirectional context.

BERT (Bidirectional Encoder Representations from Transformers) is an encoder-only Transformer model made up of multiple Transformer encoder layers. It is designed to process text bidirectionally, meaning it considers both left and right context in a sentence.
42
Q

How do Transformer models create text embeddings (explain based on the encoder or decoder part of a transformer model)?

A

Transformers create embeddings by processing input through self-attention layers and encoding contextual information. Depending on whether we use the encoder or decoder, embeddings are processed differently:

✅ Encoder (BERT-like models) – Generates contextualized embeddings for input text.
✅ Decoder (GPT-like models) – Produces autoregressive embeddings for text generation.

💡 Example: In BERT, the word "bank" gets different embeddings based on whether it means "riverbank" or "financial bank".
43
Q

How are word embeddings different from sentence embeddings?

A

Word embeddings represent individual words, while sentence embeddings capture the meaning of entire sentences.

Word embeddings represent individual words as dense numerical vectors in a high-dimensional space. These vectors capture semantic meaning and relationships between words; for example, "king" and "queen" have similar word embeddings. Word embeddings help models understand word-level meaning but not full sentence context.
44
Q

How does the attention mechanism work in neural networks?

A

The attention mechanism allows a neural network to focus on relevant parts of the input while processing information. Instead of treating all input data equally, attention assigns higher weights to important elements and reduces focus on less relevant ones.

✅ Key Benefit – Helps models handle long-range dependencies efficiently.
✅ Common in – Transformers, NLP, Computer Vision, Speech Recognition.
45
Q

What are the output dimensions of embeddings when using BERT and SBERT for the sentence 'I am a data scientist.'?

A

BERT outputs one 768-dimensional vector per token, e.g. (5, 768) for the five words (the exact token count may differ once special tags such as [CLS] and [SEP] are added). SBERT encodes the entire sentence as a single vector: (1, 768).
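💡 Example (a sketch assuming the transformers and sentence-transformers packages are installed; the model names are illustrative choices):

    from transformers import AutoTokenizer, AutoModel
    from sentence_transformers import SentenceTransformer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tok("I am a data scientist.", return_tensors="pt")
    token_vecs = bert(**inputs).last_hidden_state
    print(token_vecs.shape)  # (1, number_of_tokens, 768): one 768-dim vector per token

    sbert = SentenceTransformer("all-mpnet-base-v2")
    print(sbert.encode("I am a data scientist.").shape)  # (768,): one vector per sentence
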
46
Q

Why might one need to adjust a language model for specific tasks?

A

Adjusting a language model can improve its performance on specific tasks by tailoring it to the nuances of the task. Pre-trained language models (like BERT, GPT, or LLaMA) learn general language understanding, but they may not perform optimally on specific tasks.

✅ General vs. Task-Specific Knowledge – A general model may lack domain-specific terminology or formatting.
✅ Improve Accuracy – Fine-tuning aligns the model with task-specific datasets.
✅ Reduce Hallucinations – Ensures the model generates factually correct responses in a specific domain.
47
Q

How can Large Language Models be optimized effectively?

A

LLMs can be optimized using techniques that reduce memory usage, improve inference speed, and enhance training efficiency while maintaining performance. Optimization can involve adjusting hyperparameters, using better training data, and employing techniques like distillation.

✅ Optimization Techniques → Quantization, Pruning, LoRA, Distillation, Efficient Transformers.
✅ Hardware Optimization → GPU parallelization, TPUs, FlashAttention.
✅ Data Efficiency → Fine-tuning, Retrieval-Augmented Generation (RAG), Curriculum Learning.

💡 Example: GPT models use LoRA and Quantization to reduce computational costs.
48
Q

What is Prompt Engineering, and how does it work?

A

Prompt engineering is about designing effective prompts to get the best response from an AI model.

Prompt Engineering is the practice of designing effective inputs (prompts) to guide Large Language Models (LLMs) like GPT, BERT, or Claude to produce accurate and useful responses. It optimizes the model's output without changing its internal parameters.
49
Q

Explain the Parameter-Efficient Fine-Tuning approach for fine-tuning LLMs.

A

Parameter-Efficient Fine-Tuning (PEFT) focuses on adjusting a small subset of model parameters to reduce computational costs.

PEFT is a technique used to adapt Large Language Models (LLMs) to new tasks while modifying only a small subset of model parameters instead of fine-tuning the entire model. This drastically reduces computational cost, memory usage, and training time compared to full fine-tuning.
50
Q

What issues should be considered when fine-tuning large language models, such as catastrophic forgetting?

A

When fine-tuning Large Language Models (LLMs), several key challenges must be addressed to ensure optimal performance and avoid problems like catastrophic forgetting, overfitting, and resource constraints.

Catastrophic Forgetting occurs when a fine-tuned model forgets previously learned general knowledge from pretraining as it learns a new task.
51
Q

How does Retrieval-Augmented Generation (RAG) work in LLMs?

A

RAG combines retrieval of relevant documents with generation, improving the quality and relevance of responses.

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by retrieving relevant external knowledge from a database or search system before generating a response.

✅ Combines:
Retrieval – Fetches relevant documents based on the query.
Generation – Uses an LLM to generate an informed response using retrieved knowledge.
52
Q

How can Prompt Engineering improve the responses of a language model?

A

Effective prompts can guide the model to produce more accurate and contextually relevant responses.
53
Q

How does information move through a Graph Neural Network?

A

Information moves through a GNN by aggregating features from neighboring nodes, allowing for relational learning.

In a Graph Neural Network (GNN), information is passed between nodes using a process called Message Passing or Neighborhood Aggregation. Each node updates its representation by gathering information from its neighbors over multiple iterations (layers).
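💡 Example (a sketch of one round of mean-neighbor aggregation on a toy 3-node graph; sizes are illustrative):

    import torch
    import torch.nn as nn

    adj = torch.tensor([[0., 1., 1.],  # node 0 is connected to nodes 1 and 2
                        [1., 0., 0.],
                        [1., 0., 0.]])
    h = torch.randn(3, 8)              # one 8-dim feature vector per node

    deg = adj.sum(dim=1, keepdim=True)
    messages = (adj @ h) / deg         # aggregate: average of each node's neighbors
    update = nn.Linear(8, 8)
    h_next = torch.relu(update(messages))  # updated representations after one GNN layer
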
54
Q

What are LLM agents, and how do they differ from standard language models?

A

LLM agents are enhanced language models that incorporate memory and reasoning capabilities, enabling more complex interactions.

LLM Agents are AI systems that extend Large Language Models (LLMs) by giving them decision-making, tool usage, and reasoning capabilities. They interact dynamically with their environment, enabling them to complete complex tasks beyond static text generation.
55
Q

Why are LLM agents particularly useful for tasks that require sequential reasoning and memory?

A

LLM agents can maintain context and recall previous interactions, making them suitable for tasks requiring sequential reasoning.

LLM Agents are designed to handle multi-step decision-making and long-term memory storage, making them ideal for tasks that require logical progression and contextual awareness over multiple interactions.
56
Q

How can LLM agents handle complex legal or technical questions that require in-depth analysis and planning?

A

LLM agents can utilize their memory and reasoning capabilities to analyze context and generate informed responses.

LLM Agents use multi-step reasoning, retrieval-augmented generation (RAG), external tool integration, and structured workflows to break down, analyze, and answer complex legal or technical questions accurately.
57
Q

What are the main benefits of using LLM agents in real-world applications?

A

Benefits include improved accuracy, contextual understanding, and the ability to handle complex queries.

LLM Agents offer enhanced reasoning, automation, memory, and integration capabilities, making them ideal for complex, real-world tasks beyond basic text generation.
58
Q

What are the key components of an LLM agent, and how do they contribute to its functionality?

A

Key components include memory, reasoning modules, and external tools, enhancing the agent's ability to interact and respond.

An LLM Agent consists of multiple components that allow it to understand, reason, plan, and interact with its environment. These include a language model core, reasoning module, memory, retrieval tools, and execution mechanisms.
59
Q

How does the memory component (short-term vs. long-term) enhance an LLM agent's performance?

A

Short-term memory allows for immediate context retention, while long-term memory enables recall of past interactions, improving coherence. Memory in LLM agents allows them to retain information over multiple interactions, improving context understanding, personalization, and reasoning.

✅ Short-term memory → Stores temporary session-based information.
✅ Long-term memory → Retains knowledge across interactions for personalized responses.
60
Q

Why is planning essential for LLM agents, and how do techniques like Chain-of-Thought and Tree-of-Thought improve their reasoning capabilities?

A

Planning is essential for LLM Agents because it allows them to break down complex tasks, execute multi-step reasoning, and improve decision-making instead of generating isolated responses. Techniques like Chain of Thought (CoT) and Tree of Thought (ToT) enhance reasoning:

CoT – Guides the model to explain step-by-step reasoning before giving a final answer.
ToT – Expands multiple solution paths like a tree, evaluates them, and selects the best one.
61
Q

What are tools' roles in LLM agents, and how do they enable agents to interact with external environments?

A

Tools allow LLM agents to access external data and perform actions, expanding their capabilities beyond text generation.

Tools in LLM Agents are external functions, APIs, or systems that allow the agent to interact with the real world. They extend the model's capabilities beyond text generation by enabling retrieval, computation, execution, and automation.
62
Q

Explain the Masked Language Modeling (MLM) approach for pre-training.

A

Masked Language Modeling (MLM) is a self-supervised learning approach used in NLP models like BERT to pre-train on large text datasets. It involves randomly masking some words in a sentence and training the model to predict the missing words based on the surrounding context.
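💡 Example (a toy sketch of how an MLM training pair is built; real models mask token IDs rather than strings, and choose positions at random):

    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    masked = tokens.copy()
    labels = [None] * len(tokens)  # unmasked positions are ignored by the loss

    # In practice roughly 15% of positions are chosen at random; here we mask position 1.
    labels[1] = masked[1]   # the model must predict "cat"
    masked[1] = "[MASK]"

    print(masked)  # ['the', '[MASK]', 'sat', 'on', 'the', 'mat']
    print(labels)  # [None, 'cat', None, None, None, None]
    # Training minimizes cross-entropy between the model's prediction at each
    # [MASK] position and the corresponding label.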