General Flashcards
(11 cards)
OpenSearch Service
- managed service that can be used as a vector database.
- stores and retrieves vectors as high-dimensional points.
- includes capabilities for efficient and fast lookup of nearest neighbors in the N-dimensional space.
- suitable for storing information for retrieval-augmented generation (RAG).
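The nearest-neighbor lookup described on this card can be illustrated with a brute-force sketch. This is not the OpenSearch API: the function names, the toy vectors, and the choice of cosine similarity are assumptions for illustration, and a production vector store would normally use an approximate index instead of scanning every vector.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, vectors, k=2):
    # Score every stored vector against the query (exhaustive scan),
    # then keep the ids of the k most similar ones.
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical 3-dimensional "embeddings" keyed by document id.
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

result = nearest_neighbors([1.0, 0.0, 0.0], docs, k=2)
print(result)  # ['doc_a', 'doc_b']
```

In a RAG setup, the ids returned by such a lookup would typically be used to fetch the matching text passages before they are passed to the model.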
K-means clustering
is a popular unsupervised machine learning algorithm that partitions a dataset into a predefined number of clusters (k), assigning each point to the cluster with the nearest centroid.
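A minimal sketch of the idea, using plain Python lists. It assumes the first k points are used as the starting centroids so that the result is deterministic (real implementations usually randomize the start, for example with k-means++), and the sample points are made up.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_centroid(point, centroids):
    # Index of the centroid closest to the point.
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, k, iterations=10):
    # Assumption: start from the first k points instead of a random choice.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels

points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```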
Pre-training bias metrics
- Class Imbalance (CI)
- Difference in Proportions of Labels (DPL)
- Kullback-Leibler Divergence (KL)
- Jensen-Shannon Divergence (JS)
- Lp-norm (LP)
- Total Variation Distance (TVD)
- Kolmogorov-Smirnov (KS)
- Conditional Demographic Disparity (CDD)
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-data-bia
Post-training bias metrics
- Difference in Positive Proportions in Predicted Labels (DPPL)
- Disparate Impact (DI)
- Difference in Conditional Acceptance (DCAcc)
- Difference in Conditional Rejection (DCR)
- Specificity Difference (SD)
- Recall Difference (RD)
- Difference in Acceptance Rates (DAR)
- Difference in Rejection Rates (DRR)
- Accuracy Difference (AD)
- Treatment Equality (TE)
- Conditional Demographic Disparity in Predicted Labels (CDDPL)
- Counterfactual Fliptest (FT)
- Generalized Entropy (GE)
https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-post-tra
Partial dependence plots (PDP)
show the dependence of the predicted target response on a set of input features of interest.
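One common way to compute the values behind such a plot is to fix the feature of interest at each grid value for every row, then average the model's predictions. The sketch below assumes a hypothetical toy model and made-up rows; it returns the numbers that would be plotted and does not draw anything.

```python
def partial_dependence(model, rows, feature_index, grid):
    # For each grid value, overwrite the chosen feature in every row
    # and average the resulting predictions.
    averages = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            predictions.append(model(modified))
        averages.append(sum(predictions) / len(predictions))
    return averages

def toy_model(row):
    # Hypothetical model: prediction = 2 * feature_0 + feature_1.
    return 2.0 * row[0] + row[1]

rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
pdp = partial_dependence(toy_model, rows, feature_index=0, grid=[0.0, 1.0, 2.0])
print(pdp)  # [3.0, 5.0, 7.0]
```

The averaged values rise by 2 for each unit step in feature 0, which matches the coefficient of that feature in the toy model.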
Shapley values
- determine the contribution that each feature made to a model's predictions.
- a method (solution concept) from cooperative game theory for fairly distributing the total gains or costs among a group of players who have collaborated.
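The game-theory definition can be sketched by averaging each player's marginal contribution over every possible joining order. The two-player game and its payoffs below are made up for illustration. In the machine learning setting the players would be features and the payoff a model prediction, and practical tools approximate this computation because the number of orderings grows factorially.

```python
from itertools import permutations

def shapley_values(players, value):
    # Average each player's marginal contribution over all orderings.
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}

# Hypothetical payoffs for every coalition of players A and B.
payoffs = {
    frozenset(): 0,
    frozenset({"A"}): 10,
    frozenset({"B"}): 20,
    frozenset({"A", "B"}): 50,
}

result = shapley_values(["A", "B"], lambda coalition: payoffs[coalition])
print(result)  # {'A': 20.0, 'B': 30.0}
```

The two values add up to 50, the payoff of the full coalition, which is the "fair distribution of the total" property the card refers to.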
The difference in proportions of labels (DPL)
compares the proportion of observed outcomes with positive labels for facet d with the proportion of observed outcomes with positive labels for facet a in a training dataset.
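As a worked sketch, DPL can be computed as q_a - q_d, where q_a and q_d are the fractions of positive labels in facet a and facet d. The function name, the facet encoding, and the sample data are assumptions for illustration. This is not the SageMaker Clarify API.

```python
def dpl(labels, facets, facet_a, facet_d, positive=1):
    # Proportion of positive labels within one facet.
    def positive_proportion(facet):
        facet_labels = [l for l, f in zip(labels, facets) if f == facet]
        positives = sum(1 for l in facet_labels if l == positive)
        return positives / len(facet_labels)

    # DPL = q_a - q_d
    return positive_proportion(facet_a) - positive_proportion(facet_d)

# Hypothetical training labels and the facet each row belongs to.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
facets = ["a", "a", "a", "a", "d", "d", "d", "d"]

score = dpl(labels, facets, facet_a="a", facet_d="d")
print(score)  # 0.5  (q_a = 0.75, q_d = 0.25)
```

A value near 0 indicates similar proportions of positive labels in both facets. A positive value means facet a has the higher proportion.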
Weight
Multiplies the input value, controlling its influence on the output.
Bias
Adds a constant term, allowing the model to fit the data better by shifting the activation function.
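The roles of weight and bias can be seen in a single-neuron sketch, output = activation(weight * input + bias). The sigmoid activation and the sample numbers are assumptions chosen for illustration.

```python
import math

def neuron(x, weight, bias):
    # Weighted input plus bias, passed through a sigmoid activation.
    z = weight * x + bias
    return 1.0 / (1.0 + math.exp(-z))

# With zero input only the bias matters: bias 0 puts the output at the sigmoid midpoint.
print(neuron(0.0, weight=2.0, bias=0.0))  # 0.5

# A larger weight increases the influence of the same input on the output.
small_weight = neuron(1.0, weight=1.0, bias=0.0)
large_weight = neuron(1.0, weight=2.0, bias=0.0)

# A positive bias shifts the activation so the output is higher for the same input.
shifted = neuron(0.0, weight=2.0, bias=1.0)
```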
Text embeddings
are vector representations that capture the meaning of unstructured text such as documents, paragraphs, and sentences. You input a body of text and the output is a (1 x n) vector, where n is the embedding dimension. You can use embedding vectors for a wide variety of applications.
Amazon Fraud Detector
is a fully managed service that you can use to detect fraudulent activities. Examples of fraudulent activities include fraudulent transactions or the creation of fake accounts.