Filler Flashcards
(44 cards)
SageMaker Canvas
No-code solution that brings together data preparation, model selection, and deployment. Uses Data Wrangler for data preparation and Autopilot for data cleansing and ML model selection.
JumpStart
Evaluate, compare, and select foundation models and algorithms. Customizable reference architectures.
If you see a question about scanned PDFs with analyzed embedded images, this service is used
Rekognition
Knowledge cutoff
This is a specific concern of Gen AI
Data collection is imperative for Demand prediction use cases
Instruction dataset fine-tuning
Prompt-response pairs: specific instructions and the responses expected for them
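A minimal sketch of what prompt-response pairs might look like on disk, assuming a JSON Lines layout. The field names `prompt` and `response` and the sample records are made up for illustration; a given fine-tuning service may expect different keys.

```python
import json

# Hypothetical prompt-response pairs; the field names are an assumption.
records = [
    {"prompt": "Summarize: The meeting moved to Friday.",
     "response": "Meeting is now on Friday."},
    {"prompt": "Translate to French: Good morning",
     "response": "Bonjour"},
]

def to_jsonl(rows):
    """Serialize one JSON object per line (the JSON Lines layout)."""
    return "\n".join(json.dumps(row) for row in rows)

jsonl_text = to_jsonl(records)
print(jsonl_text)
```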
Domain adaptation fine-tuning
What you would think: fine-tuning a pre-trained model on domain-specific data so it adapts to that domain
Underfitting is matched with
High bias
MAPE and MAE are good metrics for these use cases
Monthly revenue
Forecasting
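As a rough illustration of MAPE (mean absolute percentage error) on a revenue forecast, here is a small sketch. The revenue figures are invented, and the function assumes no actual value is zero.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (division by zero otherwise).
    """
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)

# Made-up monthly revenue figures, for illustration only.
actual_revenue = [100.0, 200.0, 400.0]
forecast_revenue = [110.0, 180.0, 400.0]

print(mape(actual_revenue, forecast_revenue))  # roughly 6.67 (percent)
```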
Accuracy and F1 are good metrics for
Classification
Which factors can directly influence the latency of a machine learning model’s inference? (Select TWO.)
Length of the generated output sequence
Length of the input data sequence
Chain of Thought prompting
The primary advantage of Chain-of-thought prompting lies in its ability to produce detailed, sequential explanations, making it an effective tool for scenarios requiring deep reasoning and clear communication
Tree-of-thought is a technique that involves organizing information in a hierarchical structure, just like a decision tree.
Tree of Thought helps visualize relationships and pathways rather than breaking down complex problems into sequential, explainable steps.
Directional-stimulus
involves guiding the model’s responses based on specific cues or directions. This technique can influence the direction or focus of the responses but does not specifically enhance the model’s ability to deliver structured, step-by-step explanations
Binary classification
is a supervised machine learning model specifically designed to distinguish between two distinct categories or classes. This model is widely used in various applications, such as sentiment analysis, fraud detection, and medical diagnosis, where the objective is to classify data points into one of two predefined categories.
Multiclass classification model
This option only applies when there are more than two categories to predict
Ensemble learning
combines multiple models to improve overall performance and robustness.
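One simple way to combine models is majority voting. The sketch below uses three hypothetical threshold rules as stand-ins for trained models; it shows the idea only, not any particular library's ensemble API.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the individual predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical "models" (simple threshold rules) for illustration.
models = [
    lambda score: "spam" if score > 0.3 else "ham",
    lambda score: "spam" if score > 0.5 else "ham",
    lambda score: "spam" if score > 0.7 else "ham",
]

def ensemble_predict(score):
    """Ask every model for a label, then take the majority."""
    return majority_vote([model(score) for model in models])

print(ensemble_predict(0.6))  # two of the three models vote "spam"
```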
Root mean squared error (RMSE)
This metric is typically used for regression models, not classification models.
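For reference, RMSE can be computed by hand as in the sketch below. The numbers are made up so that every prediction is off by exactly 2.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square the errors, average them, take the root."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up values; each prediction misses by 2.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 32.0, 38.0]

print(rmse(actual, predicted))
```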
Recall
This metric measures the proportion of actual positive instances that the model correctly identifies: true positives / (true positives + false negatives).
Precision
This metric measures the proportion of predicted positive instances that are actually positive: true positives / (true positives + false positives). Precision is particularly valuable in scenarios where the cost of false positives is high, such as in spam detection or targeted advertising.
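A small sketch of how precision, recall, and F1 relate, computed from raw binary labels (1 = positive, 0 = negative). The label lists are invented for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Precision drops when false positives rise; recall drops when false negatives rise. F1 is the harmonic mean of the two.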
Tokenization vs embeddings
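Tokenization involves breaking down a sequence of text into smaller units called tokens, such as words, subwords, or characters. Embeddings are numerical vectors that represent those tokens (or larger pieces of text) so that similar meanings sit close together.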
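A toy sketch of the difference: tokenization produces text units, embedding maps each unit to a vector. The whitespace tokenizer and the 3-dimensional embedding table are assumptions with made-up values; real systems typically use subword tokenizers and learned vectors with hundreds of dimensions.

```python
def tokenize(text):
    """Naive whitespace tokenizer; real tokenizers usually split into subwords."""
    return text.lower().split()

# Hypothetical embedding table with made-up 3-dimensional vectors.
embedding_table = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.0],
    "purr": [0.7, 0.0, 0.3],
}
UNKNOWN_VECTOR = [0.0, 0.0, 0.0]  # fallback for tokens not in the table

def embed(tokens):
    """Look up one vector per token."""
    return [embedding_table.get(token, UNKNOWN_VECTOR) for token in tokens]

tokens = tokenize("Cats purr")   # step 1: text -> tokens
vectors = embed(tokens)          # step 2: tokens -> vectors
print(tokens)
print(vectors)
```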
Amazon Textract is a fully managed AWS service that uses machine learning to extract written text, handwriting, tables, and other information from scanned documents and photos. It is used to process documents in formats that include PDFs, JPEGs, and PNGs, making it an effective solution for enterprises that manage large amounts of documents. Textract can recognize and extract critical features, including names, dates, amounts, and other structured data from various documents, including contracts, forms, and invoices, making the data machine-readable and suitable for further processing.
Amazon Kendra
This is an intelligent search service designed to help users find information across various data sources. While it can retrieve unstructured data, it does not focus on transforming or structuring data for analysis.
AWS Glue is a fully managed extract, transform, and load (ETL) service that can categorize, clean, and transform unstructured data, like medical records, into a structured format. It simplifies the process of preparing data for analysis, including healthcare research and predictive analytics, by automating schema discovery and code generation.