Exam Flashcards

Questions for the exam (62 cards)

1
Q

What is the basic structure of a Neuron (Perceptron) in ANN?

A

A Neuron (Perceptron) consists of inputs, weights, a bias, an activation function, and an output.

2
Q

What is the purpose of an activation function in Neural Networks?

A

The activation function determines whether a neuron should be activated or not, introducing non-linearity into the model.

The activation function selectively activates neurons, ensuring that the model does not depend on just a few neurons but rather distributes learning efficiently. It also introduces non-linearity, which is crucial for deep learning models to recognize complex patterns.

3
Q

What are the steps involved in training a Neural Network?

A

The steps include initializing weights, feeding input data, calculating output, computing loss, and updating weights using backpropagation.

4
Q

How can we assess the performance of our model?

A

Performance can be assessed using metrics such as accuracy, precision, recall, F1 score, and loss.

Model performance is assessed using evaluation metrics, loss functions, and validation techniques. The choice of metric depends on whether the task is classification, regression, or clustering.

✅ Classification → Accuracy, Precision, Recall, F1-score.
✅ Regression → MSE, MAE, R² Score.

5
Q

Can you highlight the differences between Batch Gradient Descent and Stochastic Gradient Descent in the context of Machine Learning?

A

Both try to minimize the loss, i.e., find where the gradient of the loss function is 0.

Batch Gradient Descent uses the entire dataset to compute gradients, while Stochastic Gradient Descent updates weights using one sample at a time.

Batch Gradient Descent is computationally costly when working with large datasets.

Stochastic Gradient Descent computes the gradient from a single sample (or, in its mini-batch variant, a small subset) at each iteration, making each update cheap but noisy.
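💡 Example (a minimal NumPy sketch on synthetic data; the learning rate and epoch counts are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=100)
    y = 3.0 * X + rng.normal(scale=0.1, size=100)  # true weight is 3.0

    # Batch GD: one update per epoch, gradient averaged over the full dataset.
    w, lr = 0.0, 0.1
    for _ in range(50):
        grad = -2 * np.mean((y - w * X) * X)
        w -= lr * grad

    # Stochastic GD: one (noisier, cheaper) update per sample.
    w_sgd = 0.0
    for _ in range(5):
        for i in rng.permutation(len(X)):
            grad_i = -2 * (y[i] - w_sgd * X[i]) * X[i]
            w_sgd -= lr * grad_i

    print(w, w_sgd)  # both converge near the true weight 3.0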

6
Q

Which method is commonly used to determine optimal values for parameters like weights and biases in a Neural Network?

A

Gradient Descent is commonly used to determine optimal values for weights and biases.

The Gradient Descent algorithm is commonly used to optimize weights and biases in a neural network by minimizing the loss function.

✅ Key Idea – Adjust weights and biases iteratively to minimize the error between predictions and actual values.
✅ Uses Backpropagation – Computes gradients of the loss function with respect to parameters.

7
Q

What is a loss function, and why is it important?

A

A loss function quantifies how well the model’s predictions match the actual data, guiding the optimization process.

8
Q

What role do hyperparameters play in a Neural Network?

A

Hyperparameters control the learning process, including learning rate, batch size, and number of layers.

We don't know the best hyperparameters beforehand; there is always a trade-off between speed, accuracy, and generalization.
We need to balance these hyperparameters rather than simply maxing everything out.

9
Q

What are the parameters of a Neural Network?

A

Parameters include weights and biases that are learned during training.

10
Q

How should you select the suitable format of a neural network (MLP, RNN, CNN, GNN) for a project?

A

The right neural network architecture depends on the type of data you're working with and the problem you're solving.

Multi-Layer Perceptron (MLP) - Tabular/Structured Data (e.g., Spreadsheets, Financial Data)
Convolutional Neural Network (CNN) - Image Data
Recurrent Neural Network (RNN) - Sequential Data (Text, Time-Series, Audio)
Graph Neural Network (GNN) - Graph-Structured Data (Networks, Social Relationships, Molecules, Knowledge Graphs)

11
Q

How do you select the most suitable setting for the loss function in ANN?

A

Selecting the right loss function for an Artificial Neural Network (ANN) depends on the type of task, output activation function, and data distribution.

✅ Classification → Use Cross-Entropy Loss (Binary or Categorical).
✅ Regression → Use MSE, MAE, or Huber Loss.
✅ Imbalanced Data → Use Weighted Loss Functions.
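💡 Example (a sketch matching task type to a PyTorch loss, assuming a recent PyTorch; the class weights shown are illustrative):

    import torch
    import torch.nn as nn

    bce = nn.BCEWithLogitsLoss()    # binary classification (expects raw logits)
    ce = nn.CrossEntropyLoss()      # multi-class classification (expects raw logits)
    mse = nn.MSELoss()              # regression
    huber = nn.HuberLoss()          # regression, more robust to outliers
    # Imbalanced data: weight the rare class more heavily.
    weighted_ce = nn.CrossEntropyLoss(weight=torch.tensor([0.2, 0.8]))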

12
Q

What exactly is Gradient Descent?

A

Gradient Descent is an optimization algorithm used to minimize the loss function by iteratively adjusting parameters.
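💡 Example (a bare-bones sketch minimizing f(x) = (x - 4)^2, whose gradient is 2(x - 4)):

    x, lr = 0.0, 0.1
    for _ in range(100):
        grad = 2 * (x - 4)   # derivative of the loss at the current x
        x -= lr * grad       # step against the gradient
    print(x)                 # ~4.0, where the gradient is 0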

13
Q

What does Mean Squared Error (MSE) tell us in machine learning?

A

MSE measures the average squared difference between predicted and actual values. It is an indicator of the model's accuracy: lower MSE means predictions are closer to the true values.
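💡 Example (hand-computed on three illustrative predictions):

    y_true = [3.0, 5.0, 2.0]
    y_pred = [2.5, 5.5, 2.0]
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    print(mse)  # (0.25 + 0.25 + 0.0) / 3 = 0.1666...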

14
Q

How does backpropagation work in Neural Networks?

A

Backpropagation is the core algorithm used to train Neural Networks (NNs). It enables the model to learn by adjusting its weights and biases based on how much error it makes in predictions.

Backpropagation calculates gradients of the loss function with respect to each weight by applying the chain rule, allowing weights to be updated.
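💡 Example (a one-weight sketch using PyTorch autograd; the numbers are illustrative):

    import torch

    w = torch.tensor(2.0, requires_grad=True)
    x, y_true = torch.tensor(3.0), torch.tensor(10.0)

    y_pred = w * x                   # forward pass
    loss = (y_pred - y_true) ** 2    # squared error
    loss.backward()                  # backward pass applies the chain rule

    # d(loss)/dw = 2 * (w*x - y_true) * x = 2 * (6 - 10) * 3 = -24
    print(w.grad)                    # tensor(-24.)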

15
Q

Explain forward pass and backward pass in the ANN training process.

A

The forward pass computes the output from inputs, while the backward pass updates weights based on the error calculated from the output.

16
Q

Why is it important to split data into training and testing sets in machine learning?

A

Splitting data helps evaluate model performance on unseen data, preventing overfitting.
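💡 Example (a sketch assuming scikit-learn is installed; the 80/20 split is illustrative):

    from sklearn.model_selection import train_test_split

    X = [[i] for i in range(10)]
    y = [i % 2 for i in range(10)]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    # Fit only on X_train/y_train; report metrics on the held-out X_test/y_test.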

17
Q

What is the difference between binary, multi-class, and multi-label classification? Also, explain which activation function is best suited for each.

A

Binary classification predicts one of two classes, multi-class classification predicts one of several classes, and multi-label classification predicts multiple labels per sample. Sigmoid is best for binary, Softmax for multi-class (typically paired with cross-entropy loss), and Sigmoid for multi-label.

Classification tasks can be divided into three types based on the number and nature of output categories:

✅ Binary Classification → Two possible classes (e.g., Spam vs. Not Spam).
✅ Multi-Class Classification → More than two classes, but only one label per sample (e.g., Dog, Cat, or Bird).
✅ Multi-Label Classification → Each sample can belong to multiple categories (e.g., an image containing both a Dog and a Car).
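💡 Example (a sketch of the matching output activations in PyTorch; the logits are illustrative):

    import torch

    logits = torch.tensor([1.2, -0.4, 2.1])

    torch.sigmoid(logits[:1])     # binary: a single probability
    torch.softmax(logits, dim=0)  # multi-class: probabilities summing to 1
    torch.sigmoid(logits)         # multi-label: one independent probability per label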

18
Q

Which loss function is best suited for regression and classification?

A

Mean Squared Error is best for regression, while Cross-Entropy is best for classification.

19
Q

Which activation function best suits the input layer in an MLP?

A

The ReLU activation function is the usual choice (strictly speaking, the input layer applies no activation; ReLU is the standard pick for the first hidden layer).

20
Q

Which activation function best suits the input layer in a CNN?

A

Depends on the task: Sigmoid, Softmax, or ReLU (ReLU is the most common choice in the convolutional layers).

21
Q

Which activation function best suits the input layer in an RNN?

A

The Tanh activation function is often used in the input layer of an RNN.

22
Q

What is the role of the learning rate in Gradient Descent?

A

The learning rate (α) controls the step size of weight updates in Gradient Descent. It determines how quickly the model learns by adjusting its parameters to minimize the loss function.

The learning rate determines the size of the steps taken towards the minimum of the loss function.

23
Q

Using PyTorch, what is the procedure for constructing a Neural Network encompassing various layers, including input, hidden, and output?

A

The procedure involves defining a class, initializing layers in the constructor, and implementing the forward method to define the forward pass.

In PyTorch, a Neural Network is built using the torch.nn.Module class, which includes input, hidden, and output layers. The key steps are:

✅ Step 1: Import necessary libraries.
✅ Step 2: Define the neural network architecture using torch.nn.Module.
✅ Step 3: Initialize weights and activation functions.
✅ Step 4: Define the forward pass.
✅ Step 5: Instantiate the model and set up loss and optimizer.
✅ Step 6: Train the model using forward and backward propagation.

💡 Example: A simple feedforward neural network for classification can be implemented in PyTorch following these steps.
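One possible sketch (layer sizes, dummy data, and hyperparameters are illustrative):

    import torch
    import torch.nn as nn

    class SimpleNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.hidden = nn.Linear(4, 16)   # input layer -> hidden layer
            self.out = nn.Linear(16, 3)      # hidden layer -> output layer

        def forward(self, x):
            return self.out(torch.relu(self.hidden(x)))  # forward pass

    model = SimpleNet()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # One training step on dummy data:
    x, y = torch.randn(8, 4), torch.randint(0, 3, (8,))
    optimizer.zero_grad()
    loss = criterion(model(x), y)  # forward propagation + loss
    loss.backward()                # backward propagation
    optimizer.step()               # weight update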

24
Q

What strategies can be employed to mitigate the issue of overfitting in a complex neural network?

A

Strategies include using dropout, regularization, early stopping, and data augmentation.
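💡 Example (a sketch combining dropout and L2 weight decay in PyTorch; the rates are illustrative):

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(20, 64),
        nn.ReLU(),
        nn.Dropout(p=0.5),   # randomly zeroes activations during training
        nn.Linear(64, 2))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)  # L2 regularization
    # Early stopping: monitor validation loss each epoch and stop once it stops improving.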

25
Q

Which trained deep-learning model components should be saved for future use?

A

Components such as model architecture, weights, and optimizer state should be saved.

When saving a deep-learning model for future use, the following key components should be stored:

✅ Model Architecture – Defines the structure (layers, neurons, activation functions).
✅ Trained Weights & Biases – The learned parameters from training.
✅ Optimizer State – Saves learning rate and momentum for resuming training.
✅ Training Configuration – Includes hyperparameters like batch size, dropout, and loss function.
✅ Preprocessing Steps – Ensures input data is processed the same way during inference.
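💡 Example (a PyTorch sketch; the model, file name, and config values are illustrative):

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    torch.save({
        "model_state": model.state_dict(),          # trained weights & biases
        "optimizer_state": optimizer.state_dict(),  # lets training resume
        "config": {"batch_size": 32, "lr": 1e-3},   # training configuration
    }, "checkpoint.pt")

    # Later (the architecture must be re-created in code first):
    # checkpoint = torch.load("checkpoint.pt")
    # model.load_state_dict(checkpoint["model_state"])
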
26
Q

How to prevent overfitting in Neural Networks?

A

Overfitting can be prevented through techniques like regularization, dropout, and using more training data.

Overfitting occurs when a Neural Network memorizes training data instead of learning general patterns. To prevent this, use regularization techniques, data augmentation, dropout, and early stopping.

✅ Regularization (L1/L2, Dropout)
✅ More Training Data or Data Augmentation
✅ Early Stopping
✅ Batch Normalization
✅ Smaller Network Architecture (Reduce Complexity)
27
Q

Why can't we use a Multilayer Perceptron (MLP) for sequential data?

A

MLPs do not have memory and cannot capture temporal dependencies in sequential data. A Multilayer Perceptron (MLP) is not well-suited for sequential data because:

❌ No Memory – MLPs treat inputs as independent and cannot retain previous information.
❌ Fixed Input Size – Requires fixed-length inputs, making them ineffective for variable-length sequences.
❌ Ignores Temporal Relationships – Cannot capture dependencies between past and future elements.
28
Q

Why is preparing data so important for RNN models?

A

Data preparation ensures that sequences are properly formatted and normalized, which is crucial for RNN performance.
29
Q

What is a Recurrent Neural Network, and how does it function?

A

An RNN is a type of neural network designed for sequential data, where connections between nodes can create cycles, allowing information to persist.
30
Q

How can RNNs be used for tasks such as time series analysis?

A

RNNs can model temporal dependencies in time series data, making them suitable for forecasting and pattern recognition.

Recurrent Neural Networks (RNNs) are used in time series analysis to process sequential data and make predictions based on past patterns. They remember previous inputs and learn temporal dependencies, making them ideal for forecasting trends and detecting anomalies.
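💡 Example (a sketch of an RNN forecasting the next value from a window of past values; all sizes are illustrative):

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
    head = nn.Linear(16, 1)

    window = torch.randn(8, 10, 1)  # batch of 8 series, 10 past time steps each
    out, _ = rnn(window)            # out: (8, 10, 16), one hidden state per step
    forecast = head(out[:, -1])     # predict the next value from the last hidden state
    print(forecast.shape)           # torch.Size([8, 1])
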
31
Q

What are the common challenges and pitfalls to avoid when working with RNNs?

A

Challenges include vanishing gradients, exploding gradients, and difficulty in capturing long-term dependencies.

One of the biggest challenges with RNNs is the problem of vanishing or exploding gradients. This occurs when the gradients of the loss function with respect to the parameters of the RNN become very small or very large as they propagate through time. This can make it difficult to train the RNN effectively, as the updates to the parameters will be very small or very large, and the network will not learn effectively.
32
Q

What is an LSTM network, and how is it different from traditional RNNs?

A

An LSTM is a type of RNN designed to remember long-term dependencies, using gates to control the flow of information and reduce the vanishing gradient problem.
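💡 Example (a sketch showing that nn.LSTM carries a cell state in addition to the hidden state; sizes are illustrative):

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
    x = torch.randn(4, 25, 1)               # 4 sequences, 25 time steps each
    out, (h_n, c_n) = lstm(x)               # c_n: the cell state carrying long-term memory
    print(out.shape, h_n.shape, c_n.shape)  # (4, 25, 16), (1, 4, 16), (1, 4, 16)
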
33
Q

What are the purposes and benefits of using LSTMs for tasks such as sequence generation?

A

LSTMs are beneficial for tasks requiring memory of past events, such as language modeling and text generation. Long Short-Term Memory (LSTM) networks are ideal for sequence generation because they retain long-term dependencies and effectively model complex temporal patterns.

✅ Purposes:
Generates text, speech, or music in a structured manner.
Captures long-range dependencies in sequences.
Ensures coherent and contextually relevant outputs.

✅ Benefits:
Prevents vanishing gradients (unlike standard RNNs).
Stores past information using cell state & gating mechanisms.
Generates smooth and context-aware sequences.
34
Q

How does an LSTM manage information differently than a traditional RNN?

A

LSTMs use memory cells and gates to selectively remember or forget information, addressing the vanishing gradient problem.

Long Short-Term Memory (LSTM) networks manage information using a memory cell and three gates (Forget, Input, Output), whereas traditional RNNs rely only on hidden states. This design allows LSTMs to handle long-term dependencies better than standard RNNs.
35
Q

What is the vanishing gradient problem, and how does LSTM address it?

A

The vanishing gradient problem occurs when gradients become too small for effective learning. LSTMs mitigate this with their gating mechanisms.

✅ Uses a Cell State – A separate memory mechanism that allows information to flow unchanged over long sequences.
✅ Has Gating Mechanisms – Controls what information to keep, update, or forget.
36
Q

How does an LSTM remember long-term dependencies?

A

LSTMs use a cell state that carries information across time steps, allowing them to remember long-term dependencies. Long Short-Term Memory (LSTM) networks retain long-term dependencies using a memory cell and gating mechanisms that regulate information flow.

✅ Key Feature – LSTMs use Forget, Input, and Output gates to selectively store or discard information.
✅ Prevents Vanishing Gradient Problem – Unlike standard RNNs, LSTMs preserve important signals over long sequences.
37
Q

What are attention mechanisms in neural networks, and why are they useful?

A

Attention mechanisms allow models to focus on specific parts of the input sequence, improving performance on tasks like translation.
38
Q

Explain the main structure of the Transformer model.

A

The Transformer model consists of an encoder-decoder architecture with self-attention mechanisms and feed-forward neural networks.

The encoder and decoder are both made up of multiple layers. Each layer has multi-head self-attention, feedforward neural networks (FFN), and layer normalization with residual connections. The encoder processes the input, and the decoder generates the output while attending to encoder outputs.

1️⃣ Input: Words go in (e.g., "The cat is sleeping.").
2️⃣ Embedding: Converts words into numbers for understanding.
3️⃣ Self-Attention 🪄: Focuses on important words (e.g., "cat" is linked to "sleeping").
4️⃣ Layers: Processes information in multiple steps to refine meaning.
5️⃣ Output: Generates a smart response!
39
Q

Explain the various types of text embeddings (Word and Sentence embeddings)?

A

Text embeddings are numerical vector representations of words or sentences that capture semantic meaning. They transform text into a format that neural networks can process.

✅ Word Embeddings – Represent individual words.
✅ Sentence Embeddings – Represent entire sentences or documents.
40
Q

What's the difference between pre-training and fine-tuning a model?

A

Pre-training involves training a model on a large dataset, while fine-tuning adjusts the model on a smaller, task-specific dataset.

Fine-tuning specifically aims to adapt a pre-trained model to a particular task or domain with less data and fewer resources. Continued pre-training, in contrast, seeks to broaden or deepen the model's knowledge and capabilities by training it further on additional data, potentially improving its generalization.
41
Q

What does the structure of a BERT model look like?

A

A BERT model consists of multiple transformer layers, including attention heads and feed-forward networks, designed for bidirectional context.

BERT (Bidirectional Encoder Representations from Transformers) is an encoder-only Transformer model made up of multiple Transformer encoder layers. It is designed to process text bidirectionally, meaning it considers both left and right context in a sentence.
42
Q

How do Transformer models create text embeddings (explain based on the encoder or decoder part of a transformer model)?

A

Transformers create embeddings by processing input through self-attention layers and encoding contextual information. Depending on whether we use the encoder or decoder, embeddings are processed differently:

✅ Encoder (BERT-like models) – Generates contextualized embeddings for input text.
✅ Decoder (GPT-like models) – Produces autoregressive embeddings for text generation.

💡 Example: In BERT, the word "bank" gets different embeddings based on whether it means "riverbank" or "financial bank".
43
Q

How are word embeddings different from sentence embeddings?

A

Word embeddings represent individual words, while sentence embeddings capture the meaning of entire sentences.

Word embeddings represent individual words as dense numerical vectors in a high-dimensional space. These vectors capture semantic meaning and relationships between words; for example, "king" and "queen" have similar word embeddings. Word embeddings help models understand word-level meaning but not full sentence context.
44
Q

How does the attention mechanism work in neural networks?

A

The attention mechanism allows a neural network to focus on relevant parts of the input while processing information. Instead of treating all input data equally, attention assigns higher weights to important elements and reduces focus on less relevant ones.

✅ Key Benefit – Helps models handle long-range dependencies efficiently.
✅ Common in – Transformers, NLP, Computer Vision, Speech Recognition.
45
Q

What are the output dimensions of embeddings when using BERT and SBERT for the sentence 'I am a data scientist.'?

A

BERT outputs one 768-dimensional vector per token, e.g. (5, 768) for the five words (the exact token count may differ once special tags such as [CLS] and [SEP] are added). SBERT encodes the entire sentence as a single vector: (1, 768).
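💡 Example (a sketch assuming the transformers and sentence-transformers packages are installed; the model names are illustrative choices):

    from transformers import AutoTokenizer, AutoModel
    from sentence_transformers import SentenceTransformer

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tok("I am a data scientist.", return_tensors="pt")
    token_vecs = bert(**inputs).last_hidden_state
    print(token_vecs.shape)  # (1, number_of_tokens, 768): one 768-dim vector per token

    sbert = SentenceTransformer("all-mpnet-base-v2")
    print(sbert.encode("I am a data scientist.").shape)  # (768,): one vector per sentence
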
46
Q

Why might one need to adjust a language model for specific tasks?

A

Adjusting a language model can improve its performance on specific tasks by tailoring it to the nuances of the task. Pre-trained language models (like BERT, GPT, or LLaMA) learn general language understanding, but they may not perform optimally on specific tasks.

✅ General vs. Task-Specific Knowledge – A general model may lack domain-specific terminology or formatting.
✅ Improve Accuracy – Fine-tuning aligns the model with task-specific datasets.
✅ Reduce Hallucinations – Ensures the model generates factually correct responses in a specific domain.
47
Q

How can Large Language Models be optimized effectively?

A

LLMs can be optimized using techniques that reduce memory usage, improve inference speed, and enhance training efficiency while maintaining performance. Optimization can involve adjusting hyperparameters, using better training data, and employing techniques like distillation.

✅ Optimization Techniques → Quantization, Pruning, LoRA, Distillation, Efficient Transformers.
✅ Hardware Optimization → GPU parallelization, TPUs, FlashAttention.
✅ Data Efficiency → Fine-tuning, Retrieval-Augmented Generation (RAG), Curriculum Learning.

💡 Example: GPT models use LoRA and Quantization to reduce computational costs.
48
Q

What is Prompt Engineering, and how does it work?

A

Prompt engineering is about designing effective prompts to get the best response from an AI model.

Prompt Engineering is the practice of designing effective inputs (prompts) to guide Large Language Models (LLMs) like GPT, BERT, or Claude to produce accurate and useful responses. It optimizes the model's output without changing its internal parameters.
49
Q

Explain the Parameter-Efficient Fine-Tuning approach for fine-tuning LLMs.

A

Parameter-Efficient Fine-Tuning (PEFT) focuses on adjusting a small subset of model parameters to reduce computational costs.

PEFT is a technique used to adapt Large Language Models (LLMs) to new tasks while modifying only a small subset of model parameters instead of fine-tuning the entire model. This drastically reduces computational cost, memory usage, and training time compared to full fine-tuning.
50
Q

What issues should be considered when fine-tuning large language models, such as catastrophic forgetting?

A

When fine-tuning Large Language Models (LLMs), several key challenges must be addressed to ensure optimal performance and avoid problems like catastrophic forgetting, overfitting, and resource constraints.

Catastrophic Forgetting occurs when a fine-tuned model forgets previously learned general knowledge from pretraining as it learns a new task.
51
Q

How does Retrieval-Augmented Generation (RAG) work in LLMs?

A

RAG combines retrieval of relevant documents with generation, improving the quality and relevance of responses.

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by retrieving relevant external knowledge from a database or search system before generating a response.

✅ Combines:
Retrieval – Fetches relevant documents based on the query.
Generation – Uses an LLM to generate an informed response using retrieved knowledge.
52
Q

How can Prompt Engineering improve the responses of a language model?

A

Effective prompts can guide the model to produce more accurate and contextually relevant responses.
53
Q

How does information move through a Graph Neural Network?

A

Information moves through a GNN by aggregating features from neighboring nodes, allowing for relational learning.

In a Graph Neural Network (GNN), information is passed between nodes using a process called Message Passing or Neighborhood Aggregation. Each node updates its representation by gathering information from its neighbors over multiple iterations (layers).
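💡 Example (a sketch of one round of mean-neighbor aggregation on a toy 3-node graph; sizes are illustrative):

    import torch
    import torch.nn as nn

    adj = torch.tensor([[0., 1., 1.],  # node 0 is connected to nodes 1 and 2
                        [1., 0., 0.],
                        [1., 0., 0.]])
    h = torch.randn(3, 8)              # one 8-dim feature vector per node

    deg = adj.sum(dim=1, keepdim=True)
    messages = (adj @ h) / deg         # aggregate: average of each node's neighbors
    update = nn.Linear(8, 8)
    h_next = torch.relu(update(messages))  # updated representations after one GNN layer
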
54
Q

What are LLM agents, and how do they differ from standard language models?

A

LLM agents are enhanced language models that incorporate memory and reasoning capabilities, enabling more complex interactions.

LLM Agents are AI systems that extend Large Language Models (LLMs) by giving them decision-making, tool usage, and reasoning capabilities. They interact dynamically with their environment, enabling them to complete complex tasks beyond static text generation.
55
Q

Why are LLM agents particularly useful for tasks that require sequential reasoning and memory?

A

LLM agents can maintain context and recall previous interactions, making them suitable for tasks requiring sequential reasoning.

LLM Agents are designed to handle multi-step decision-making and long-term memory storage, making them ideal for tasks that require logical progression and contextual awareness over multiple interactions.
56
Q

How can LLM agents handle complex legal or technical questions that require in-depth analysis and planning?

A

LLM agents can utilize their memory and reasoning capabilities to analyze context and generate informed responses.

LLM Agents use multi-step reasoning, retrieval-augmented generation (RAG), external tool integration, and structured workflows to break down, analyze, and answer complex legal or technical questions accurately.
57
Q

What are the main benefits of using LLM agents in real-world applications?

A

Benefits include improved accuracy, contextual understanding, and the ability to handle complex queries.

LLM Agents offer enhanced reasoning, automation, memory, and integration capabilities, making them ideal for complex, real-world tasks beyond basic text generation.
58
Q

What are the key components of an LLM agent, and how do they contribute to its functionality?

A

Key components include memory, reasoning modules, and external tools, enhancing the agent's ability to interact and respond.

An LLM Agent consists of multiple components that allow it to understand, reason, plan, and interact with its environment. These include a language model core, reasoning module, memory, retrieval tools, and execution mechanisms.
59
Q

How does the memory component (short-term vs. long-term) enhance an LLM agent's performance?

A

Short-term memory allows for immediate context retention, while long-term memory enables recall of past interactions, improving coherence. Memory in LLM agents allows them to retain information over multiple interactions, improving context understanding, personalization, and reasoning.

✅ Short-term memory → Stores temporary session-based information.
✅ Long-term memory → Retains knowledge across interactions for personalized responses.
60
Q

Why is planning essential for LLM agents, and how do techniques like Chain-of-Thought and Tree-of-Thought improve their reasoning capabilities?

A

Planning is essential for LLM Agents because it allows them to break down complex tasks, execute multi-step reasoning, and improve decision-making instead of generating isolated responses. Techniques like Chain of Thought (CoT) and Tree of Thought (ToT) enhance reasoning:

CoT – Guides the model to explain step-by-step reasoning before giving a final answer.
ToT – Expands multiple solution paths like a tree, evaluates them, and selects the best one.
61
Q

What are tools' roles in LLM agents, and how do they enable agents to interact with external environments?

A

Tools allow LLM agents to access external data and perform actions, expanding their capabilities beyond text generation.

Tools in LLM Agents are external functions, APIs, or systems that allow the agent to interact with the real world. They extend the model's capabilities beyond text generation by enabling retrieval, computation, execution, and automation.
62
Q

Explain the Masked Language Modeling (MLM) approach for pre-training.

A

Masked Language Modeling (MLM) is a self-supervised learning approach used in NLP models like BERT to pre-train on large text datasets. It involves randomly masking some words in a sentence and training the model to predict the missing words based on the surrounding context.
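💡 Example (a toy sketch of how an MLM training pair is built; real models mask token IDs rather than strings, and choose positions at random):

    tokens = ["the", "cat", "sat", "on", "the", "mat"]
    masked = tokens.copy()
    labels = [None] * len(tokens)  # unmasked positions are ignored by the loss

    # In practice roughly 15% of positions are chosen at random; here we mask position 1.
    labels[1] = masked[1]   # the model must predict "cat"
    masked[1] = "[MASK]"

    print(masked)  # ['the', '[MASK]', 'sat', 'on', 'the', 'mat']
    print(labels)  # [None, 'cat', None, None, None, None]
    # Training minimizes cross-entropy between the model's prediction at each
    # [MASK] position and the corresponding label.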