Revise Flashcards

(137 cards)

1
Q

What is Artificial Intelligence (AI)?

A

Computer systems that perform tasks typically requiring human intelligence.

2
Q

What is Machine Learning (ML)?

A

A branch of AI in which systems learn from data patterns and training rather than from explicitly coded instructions.

3
Q

What is Deep Learning?

A

A subset of ML inspired by the human brain, using layers of neural networks to solve complex problems.

4
Q

What is Generative AI?

A

A subset of deep learning focused on creating new content (such as text, images, or music) from learned data.

5
Q

What are Large Language Models (LLMs)?

A

A type of Generative AI focused on understanding and generating human-like text.

6
Q

What is the difference between Gen AI and Traditional AI?

A

Traditional AI's goal is to interpret, analyze, and respond to inputs such as human actions, whereas generative AI focuses on creating new content or data.

7
Q

What is Natural Language Processing (NLP)?

A

A branch of ML that enables machines to understand, interpret, and generate human language, including the context of a corpus (a body of related text).

8
Q

What is Regression?

A

Regression predicts a continuous numerical value.

Example: What is the temperature going to be tomorrow?
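A minimal regression sketch, assuming scikit-learn and made-up weather features (humidity, pressure); it learns a continuous mapping and predicts a number:

```python
# Minimal regression sketch using scikit-learn (synthetic data, illustration only).
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [humidity, pressure]; target: tomorrow's temperature in °C (made-up values).
X = np.array([[0.3, 1012], [0.5, 1008], [0.7, 1003], [0.9, 998]])
y = np.array([22.0, 19.5, 17.0, 14.5])

model = LinearRegression().fit(X, y)   # learn a continuous mapping
print(model.predict([[0.6, 1005]]))    # predict a numerical value
```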

9
Q

What is Classification?

A

Classification predicts categorical outcomes.

Example: Will it be hot or cold tomorrow?

10
Q

What is Clustering?

A

Clustering groups similar data points together.
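A minimal clustering sketch, assuming scikit-learn's k-means and made-up 2-D points:

```python
# Minimal clustering sketch with k-means (scikit-learn); data points are made up.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 2], [1, 4], [1, 0],
                   [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)   # e.g., [1 1 1 0 0 0] — similar points share a cluster label
```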

11
Q

What is Supervised Learning?

A

Training a model on pre-labelled data so the machine can learn from these known results.

12
Q

What is Unsupervised Learning?

A

Training a model on unlabelled data so it can discover patterns and apply its own labels.

13
Q

What is Reinforcement Learning?

A

Model learns through trial and error, receiving feedback in the form of rewards or penalties.

14
Q

What is Semi-supervised Learning?

A

Training data contains very few labelled examples and a large number of unlabelled examples.

15
Q

What is a Neural Network (NN)?

A

Algorithms modelled on the brain: data is fed into neurons and passed through successive layers to produce an output.

16
Q

What is a Perceptron?

A

An algorithm for supervised learning of binary classifiers; the simplest neural network, a single neuron with a step activation.
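A from-scratch sketch of the perceptron update rule, learning the AND function (toy data, not any particular library's API):

```python
# From-scratch perceptron sketch (binary classifier) learning the AND function.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                       # AND labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(10):                              # a few passes over the data
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b > 0)        # step activation
        w += lr * (target - pred) * xi           # perceptron update rule
        b += lr * (target - pred)

print([int(np.dot(w, xi) + b > 0) for xi in X])  # -> [0, 0, 0, 1]
```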

17
Q

What are the types of data?

A
1. Structured data: organized in rows and columns (tables).
2. Semi-structured data: partially organized, e.g., key-value pairs such as JSON.
3. Unstructured data: cannot be stored in a table format (e.g., free text, images).
4. Time-series data: values recorded over time.
18
Q

What are the components of a Machine Learning Model?

A
1. Algorithm: describes the relationship between input and output.
2. Inference code: the software that runs the model to make predictions.
3. Model artifacts: the trained parameters and metadata.
19
Q

What is Inference?

A

The process of using a trained ML model to make predictions on new, unseen data.

20
Q

What are Inference Parameters?

A

Parameters like response length and stop sequences control the output generated by a model during inference.

21
Q

What are the types of Inference?

A
1. Real-time inference: the model is deployed on a persistent endpoint for immediate, low-latency predictions.
2. Batch transform inference: suitable for offline processing of large datasets.
22
Q

What is Overfitting?

A

When a model performs well on training data but poorly on new, real-world data, because it has memorized the training set rather than learning general patterns.

23
Q

What is Underfitting?

A

When a model is too simple to capture meaningful relationships between input and output data, so it performs poorly even on training data.

24
Q

What is Bias and Fairness in ML?

A

Bias arises when a lack of diversity in training data leads to skewed predictions; fairness requires that a model's outputs do not systematically disadvantage particular groups.

25
What is a Foundation Model (FM)?
A general-purpose model trained on vast amounts of data, which can be fine-tuned for specific tasks.
26
What is a Large Language Model (LLM)?
A foundation model, typically built on the transformer architecture, trained on vast text corpora to understand and generate human language.
27
What is Transformer Architecture?
A neural network architecture that processes entire sequences in parallel; its multi-head attention and positional encoding make it especially effective at NLP.
28
What is Tokenization?
Converts text into a sequence of tokens that a model can process.
29
What are Embeddings?
Specialized vectors that represent semantic meaning and relationships.
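A toy sketch of embeddings: the vectors below are invented for illustration (real models learn them), but cosine similarity is the standard way to compare them:

```python
# Toy embedding sketch: similar meanings get nearby vectors (values are made up).
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # high: related meanings
print(cosine(emb["king"], emb["apple"]))  # low: unrelated meanings
```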
30
What is Fine Tuning?
Retraining a pretrained model's weights on a smaller, task-specific dataset.
31
What is the Machine Learning (ML) Pipeline?
A systematic process used to build, train, and deploy machine learning models.
32
What is the first step in the ML Pipeline?
Identify Business Goal: Define success criteria and align stakeholders.
33
What is the second step in the ML Pipeline?
Frame the ML Problem: Define inputs, outputs, and metrics.
34
What is the third step in the ML Pipeline?
Collect Data: Prepare the necessary data for training the model.
35
What is the fourth step in the ML Pipeline?
Pre-Process Data: Clean and prepare the data for training.
36
What is the fifth step in the ML Pipeline?
Engineer Features: Select and engineer features that enhance model performance.
37
What is the sixth step in the ML Pipeline?
Train, Tune, and Evaluate the Model: Train the model and evaluate performance.
38
What are Training Parameters (Hyperparameters)?
Settings that control the training process (e.g., learning rate, batch size, number of epochs); they are set before training rather than learned from the data.
39
What are Inference Parameters?
Settings that control a trained model's output when making predictions (e.g., temperature, Top K, Top P, response length, stop sequences).
40
What are Model Parameters?
The values (weights and biases) learned during training; they are fixed when the trained model is used to make predictions.
41
What does Temperature control in text generation?
Temperature controls the randomness of the generated text.
42
What is Top K in text generation?
Top K selects only the top k most likely tokens for output.
43
What is Top P in text generation?
Top P uses cumulative probability to choose tokens, focusing on the smallest set of tokens with a combined probability of P.
44
When should you use Top-P?
Use Top-P when you want adaptive diversity but want to stay closer to more likely outcomes.
45
When should you use Temperature?
Use Temperature when you need consistent randomness control across the board.
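A sketch of how temperature, Top K, and Top P interact during next-token sampling, using NumPy and toy logits (the vocabulary and numbers are invented):

```python
# How temperature, Top K, and Top P shape next-token sampling (toy logits).
import numpy as np

vocab  = np.array(["cat", "dog", "car", "sun", "sky"])
logits = np.array([2.0, 1.5, 0.5, 0.2, 0.1])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample(logits, temperature=1.0, top_k=None, top_p=None,
           rng=np.random.default_rng(0)):
    probs = softmax(logits / temperature)       # higher T -> flatter, more random
    order = np.argsort(probs)[::-1]             # tokens from most to least likely
    keep = len(probs)
    if top_k is not None:
        keep = min(keep, top_k)                 # keep only the k most likely tokens
    if top_p is not None:
        cum = np.cumsum(probs[order])           # smallest set with mass >= p
        keep = min(keep, int(np.searchsorted(cum, top_p) + 1))
    idx = order[:keep]
    p = probs[idx] / probs[idx].sum()           # renormalize over the kept tokens
    return vocab[rng.choice(idx, p=p)]

print(sample(logits, temperature=0.5, top_k=2))   # conservative: "cat" or "dog"
print(sample(logits, temperature=1.5, top_p=0.9)) # flatter and broader: more diverse
```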
46
What is Beam search?
Beam search explores multiple candidate sequences in parallel, keeping only the most promising partial sequences (beams) at each step.
47
What is Greedy search?
Greedy search selects the most likely token at each step.
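A toy beam-search sketch over an invented next-token table; greedy search is the special case beam_width=1:

```python
# Toy beam search: keep the `beam_width` best partial sequences each step.
import math

# P(next_token | previous_token) — made-up numbers, purely for illustration.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "a":   {"cat": 0.2, "dog": 0.7, "end": 0.1},
    "cat": {"end": 1.0},
    "dog": {"end": 1.0},
}

def beam_search(beam_width=2, steps=3):
    beams = [(["<s>"], 0.0)]                    # (sequence, log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in NEXT.get(seq[-1], {}).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # keep only the highest-scoring partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

for seq, score in beam_search():
    print(" ".join(seq), round(math.exp(score), 3))
```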
48
What does Response length specify?
Response length specifies the maximum length of generated output.
49
What are Penalties in text generation?
Penalties apply to repeated tokens or sequences to encourage variety in the generated text.
50
What are Stop sequences?
Stop sequences define specific sequences where the model will stop generating text.
51
What is MSE (Mean Squared Error)?
MSE is the average of squared differences between predictions and actual values. Lower is better.
52
What is RMSE (Root Mean Squared Error)?
RMSE is the square root of MSE, providing error measurement in the same units as the original data. Lower is better.
53
What does Perplexity measure?
Perplexity measures how well a language model predicts sequences of words/tokens. Lower values indicate better prediction capability.
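Worked examples of MSE, RMSE, and perplexity on made-up values:

```python
# MSE, RMSE, and perplexity computed on tiny made-up values.
import numpy as np

# Regression metrics
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 8.0])
mse  = np.mean((y_true - y_pred) ** 2)   # (0.25 + 0.25 + 1.0) / 3 = 0.5
rmse = np.sqrt(mse)                      # ≈ 0.707, in the same units as y
print(mse, rmse)

# Perplexity = exp(average negative log-likelihood of the observed tokens).
token_probs = np.array([0.5, 0.25, 0.8])          # model's probability for each true token
perplexity = np.exp(-np.mean(np.log(token_probs)))
print(perplexity)                                  # lower means better prediction
```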
54
What is Precision in classification metrics?
Precision is the ratio of true positives to all positive predictions. Higher is better.
55
What is Recall (TPR)?
Recall is the ratio of true positives to all actual positives. Higher is better.
56
What is False Positive Rate (FPR)?
FPR is the ratio of false positives to all actual negatives. Lower is better.
57
What is Specificity (TNR)?
Specificity is the ratio of true negatives to all actual negatives. Higher is better.
58
What is Accuracy in classification metrics?
Accuracy is the ratio of all correct predictions to total predictions. Higher is better.
59
What is F1 Score?
F1 Score is the harmonic mean of precision and recall, balancing both metrics. Higher is better.
60
What does the ROC Curve plot?
The ROC Curve plots TPR against FPR at various thresholds. Higher AUC is better.
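The classification metrics above, computed from raw confusion counts on toy predictions:

```python
# Classification metrics from confusion counts (toy labels and predictions).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives

precision   = tp / (tp + fp)            # TP / all positive predictions
recall      = tp / (tp + fn)            # TPR: TP / all actual positives
fpr         = fp / (fp + tn)            # FP / all actual negatives
specificity = tn / (tn + fp)            # TNR: TN / all actual negatives
accuracy    = (tp + tn) / len(y_true)   # all correct / all predictions
f1          = 2 * precision * recall / (precision + recall)
# The ROC curve plots recall (TPR) against FPR as the decision threshold varies.
print(precision, recall, fpr, specificity, accuracy, f1)
```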
61
What are SageMaker Training Jobs?
SageMaker Training Jobs manage training processes, specifying training data, hyperparameters, and compute resources.
62
What are SageMaker Experiments?
SageMaker Experiments track model runs and hyperparameter tuning.
63
What is Automatic Model Tuning (AMT)?
AMT automatically tunes hyperparameters using the specified metric.
64
What is Real-Time Inference in SageMaker?
Real-Time Inference is for low-latency, sustained traffic predictions with auto-scaling capabilities.
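A sketch of calling a real-time endpoint with boto3; the endpoint name and CSV payload are placeholders for your own deployment:

```python
# Invoking a deployed SageMaker real-time endpoint with boto3.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",      # hypothetical endpoint name
    ContentType="text/csv",          # payload format your model container expects
    Body="0.6,1005",                 # one example row of features
)
print(response["Body"].read().decode("utf-8"))
```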
65
What is Batch Transform in SageMaker?
Batch Transform processes large batches of data asynchronously.
66
What is Asynchronous Inference?
Asynchronous Inference handles long-running inference requests with large payloads without immediate responses.
67
What is Serverless Inference?
Serverless Inference is for intermittent traffic, where the model scales automatically without infrastructure management.
68
What is On-Demand Inference in Bedrock?
On-Demand Inference is pay-per-use based on the number of input/output tokens.
69
What is Provisioned Throughput in Bedrock?
Provisioned Throughput provides guaranteed capacity for consistent, high-throughput inference.
70
What are Bedrock Agents?
Bedrock Agents deploy agents for multi-step workflows, integrating models with tools like Amazon Kendra and AWS Lambda.
71
What is AWS API Gateway?
AWS API Gateway exposes the model as an API endpoint for integration with applications.
72
What is Data Drift?
Data Drift occurs when the input data changes, but the relationship between inputs and outputs remains the same.
73
What is Concept Drift?
Concept Drift occurs when the relationship between inputs and outputs changes, meaning the model's learned patterns no longer apply.
74
What is SageMaker Model Monitor?
SageMaker Model Monitor runs scheduled checks for data drift on deployed models and sends the results to CloudWatch.
75
What is MLOps?
MLOps applies DevOps practices to manage machine learning models throughout their lifecycle.
76
What are SageMaker Pipelines?
SageMaker Pipelines automate and manage the ML workflow end-to-end.
77
What is AWS CodePipeline?
AWS CodePipeline automates the build, test, and deploy phases for models.
78
What is SageMaker Model Registry?
SageMaker Model Registry manages and tracks model versions and metadata.
79
What is Amazon S3 used for?
Amazon S3 is used to store trained model artifacts after training.
80
What is Model Governance?
Model Governance ensures transparency, accountability, and regulatory compliance for ML models.
81
What is SageMaker Clarify?
SageMaker Clarify helps identify and mitigate biases in ML models.
82
What are SageMaker Model Cards?
SageMaker Model Cards create documentation for trained models, including performance metrics and intended use.
83
What is ML Governance from SageMaker?
ML Governance from SageMaker provides tools for tighter control and visibility over ML models.
84
What is SageMaker ML Lineage Tracking?
SageMaker ML Lineage Tracking captures the entire workflow, tracking model lineage for reproducibility.
85
What is Glue DataBrew?
Glue DataBrew simplifies data governance with visual data preparation and quality rules.
86
What is AWS Audit Manager?
AWS Audit Manager automates the auditing of AWS services for continuous compliance.
87
What is AWS Artifact?
AWS Artifact provides on-demand access to compliance reports and agreements.
88
What is AWS Trusted Advisor?
AWS Trusted Advisor provides recommendations for cost and performance improvements.
89
What is SageMaker Managed Spot Training?
SageMaker Managed Spot Training reduces training costs by utilizing spare AWS EC2 capacity.
90
What is SageMaker Profiler?
SageMaker Profiler identifies inefficient resource use during model training.
91
What is Amazon Inspector?
Amazon Inspector automates security assessments of ML applications.
92
What is Continual Learning?
Continual Learning involves continuously retraining models to account for new data and changing conditions.
93
What is Continued-Pretraining?
Continued-Pretraining uses unlabeled data to expand the model's overall knowledge.
94
What is Transfer Learning?
Transfer Learning involves fine-tuning an existing model for a new problem.
95
What is the Least Privilege Principle?
The Least Privilege Principle ensures IAM roles grant only necessary permissions.
96
What are PrivateLink and VPC Endpoints?
PrivateLink and VPC Endpoints lock down SageMaker so that traffic stays within your VPC rather than traversing the public internet.
97
What is Encryption at Rest and in Transit?
SageMaker encrypts data at rest using KMS keys and data in transit using TLS.
98
What are IAM Roles and Policies?
IAM Roles and Policies manage secure access to model data and resources.
99
What is S3 Block Public Access?
S3 Block Public Access prevents model data from being exposed.
100
What is AWS IAM Identity Center?
AWS IAM Identity Center centralizes identity management across AWS accounts.
101
What is AWS Config?
AWS Config continuously monitors and records configuration changes across AWS resources.
102
What is AWS CloudTrail?
AWS CloudTrail logs API calls and tracks user activity for auditing.
103
What is Amazon SageMaker?
Amazon SageMaker is an integrated machine learning service for building, training, and deploying models.
104
What is the typical SageMaker training process?
The typical SageMaker training process involves training data (usually in S3), compute instances, a training container image, job configuration, and an S3 output bucket for the resulting model artifacts.
105
What is Ground Truth?
Ground Truth is a human-powered data labeling service.
106
What is Data Wrangler?
Data Wrangler is a tool for easy data cleaning and transformation.
107
What is Feature Store?
Feature Store is a central repository for ML features.
108
What is SageMaker Studio?
SageMaker Studio is a web-based ML IDE.
109
What is SageMaker JumpStart?
SageMaker JumpStart provides pre-trained models and solutions.
110
What is SageMaker Canvas?
SageMaker Canvas is a no-code visual model building tool.
111
What is SageMaker Autopilot?
SageMaker Autopilot automates model building and tuning.
112
What is MLflow?
MLflow is a tool to track and compare experiments.
113
What is A2I (Augmented AI)?
A2I provides human review for quality assurance.
114
What is Amazon Q?
Amazon Q is a generative AI-powered assistant for tasks like answering questions and generating content.
115
What is Amazon Q Business?
Amazon Q Business helps with tasks by accessing enterprise data sources.
116
What is Amazon Q Developer?
Amazon Q Developer includes features like code generation and security scanning.
117
What is Amazon Q in QuickSight?
Amazon Q in QuickSight allows natural language querying of business intelligence data.
118
What is Amazon Q in Connect?
Amazon Q in Connect improves customer service by answering customer inquiries, automating responses, and managing tickets using natural language AI.
119
What is Amazon Q in AWS Supply Chain?
Amazon Q in AWS Supply Chain assists in optimizing and automating supply chain management by generating insights from supply chain data, streamlining inventory management, and forecasting demand.
120
What is Amazon Bedrock?
Amazon Bedrock is a fully managed, serverless service that provides access to high-performing foundation models (FMs) from leading AI companies through a single API.
121
What do you need to use Amazon Bedrock?
1. Prompt: a specific set of inputs to guide LLMs to generate an appropriate output or completion.
2. Inference parameters: temperature, Top K, Top P, response length, stop sequences.
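A sketch of supplying a prompt and inference parameters through boto3; the request-body schema varies by model, so the Titan-style fields below are an assumption — check your model's documentation:

```python
# Invoking a Bedrock foundation model with a prompt and inference parameters.
import boto3, json

bedrock = boto3.client("bedrock-runtime")

body = {
    "inputText": "Summarize the benefits of serverless inference.",  # the prompt
    "textGenerationConfig": {        # Titan-style schema — an assumption here
        "temperature": 0.5,          # randomness
        "topP": 0.9,                 # nucleus sampling
        "maxTokenCount": 256,        # response length
        "stopSequences": ["\n\n"],   # where generation should stop
    },
}

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",   # example model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read()))
```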
122
What are the features of Amazon Bedrock?
- Model Catalog: AI model library for browsing and selecting foundation models.
- Custom Models: customize foundation models with your own data.
- Foundation Model Evaluation: compare models side by side.
- Playgrounds: experiment with deployed model APIs.
- Bedrock Knowledge Bases: fetch data from private sources.
- Bedrock Agents: create agents for complex tasks.
- Serverless: simplifies deployment and scaling.
- Security and Privacy Guardrails: ensure compliance with policies.
- PartyRock: build generative AI apps without coding.
123
What is AWS Glue?
AWS Glue is a fully managed, cloud-optimized ETL (Extract, Transform, Load) service that helps prepare and load data for analytics and AI models.
124
What are the features of AWS Glue?
- AWS Glue ETL Service: cloud-based ETL service for data preparation.
- AWS Glue Data Catalog: centralized metadata repository for data assets.
- AWS Glue DataBrew: visual tool for data preparation.
- AWS Glue Data Quality: detects anomalies and recommends data quality rules.
125
What are the design considerations for applications using foundation models?
1. Cost: training your own model vs. using a pre-trained one.
2. Latency: inference times of foundation models.
3. Modalities: combining multiple models for different input types.
4. Architecture: model size and complexity aligned to the task.
5. Complexity: more parameters require more resources.
6. Performance and metrics: select appropriate evaluation metrics.
7. Bias and fairness: evaluate model outputs across demographics.
8. Availability and compatibility: verify model availability in your regions.
9. Security and privacy: implement data handling procedures.
10. Scalability: design for varying loads.
126
What is Retrieval Augmented Generation (RAG)?
RAG combines retrieval systems with generative AI models by retrieving relevant information, augmenting the prompt, and generating a response.
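A high-level sketch of the retrieve-augment-generate loop; embed, vector_store, and llm are hypothetical stand-ins for an embedding model, a vector database, and a foundation model:

```python
# High-level RAG sketch. `embed`, `vector_store.search`, and `llm.generate`
# are hypothetical stand-ins, not a specific library's API.
def answer_with_rag(question, vector_store, llm, embed, k=3):
    # 1. Retrieve: find the documents most similar to the question.
    query_vector = embed(question)
    documents = vector_store.search(query_vector, top_k=k)

    # 2. Augment: put the retrieved context into the prompt.
    context = "\n\n".join(doc.text for doc in documents)
    prompt = (
        f"Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate: the model's answer is grounded in the retrieved data.
    return llm.generate(prompt)
```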
127
What are the benefits of RAG?
- Accuracy: responses grounded in specific data.
- Freshness: access to up-to-date information.
- Reduced hallucinations: less likely to generate incorrect information.
- Transparency: citations for information sources.
- Cost efficiency: more practical than fine-tuning models.
128
What are effective prompt engineering techniques?
- Zero-Shot Prompt: no examples provided.
- One-Shot Prompt: one example provided.
- Few-Shot Prompt: a few examples provided.
- Negative Prompting: tells the model what to exclude, e.g., removing unwanted aspects in image generation.
- Prompt Template: a standardized format for prompts.
- Chain-of-Thought Prompting: encourages step-by-step reasoning.
- Prompt Tuning: adjusting prompts for better performance.
129
What is latent space?
A compressed, continuous numerical representation where high-dimensional data is encoded into lower-dimensional vectors.
130
What is a context-window?
The maximum number of tokens an LLM can process at once, which affects tasks such as long-form text generation.
131
What are hallucinations in AI models?
Hallucinations occur when a model generates incorrect information that sounds plausible but is not factual.
132
What are multi-modal models?
Models that work across multiple data types, embedding text, images, or audio into a shared space for richer outputs.
133
What is catastrophic forgetting?
When a model trained on new tasks loses the knowledge it had of previously learned ones, e.g., when fine-tuning overwrites earlier training.
134
What is continuous pre-training?
The process of providing unlabeled data to pre-train a model on new, domain-specific data.
135
What are methods to evaluate foundation model performances?
Assessing performance on benchmark tasks, evaluating fine-tuning for specific applications, testing resilience, analyzing biases, and understanding model interpretability.
136
What is responsible AI?
Responsible AI is characterized by fairness, explainability, robustness, transparency, governance, and privacy/security.
137
What are key activities for security in AI systems?
- Least Privilege Principle: grant only necessary permissions.
- PrivateLink and VPC Endpoints: secure, private access to resources.
- Encryption: protect data at rest and in transit.
- IAM Roles: manage secure access to model data.
- S3 Block Public Access: prevent exposure of model data.