General Flashcards

(11 cards)

1
Q

OpenSearch Service

A
  • can serve as a vector database.
  • stores and retrieves vectors as high-dimensional points.
  • includes capabilities for efficient, fast nearest-neighbor lookup in N-dimensional space.
  • suitable for storing information for retrieval-augmented generation (RAG) use cases.
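
To make the nearest-neighbor idea concrete, here is a minimal brute-force sketch in plain Python. It is not the OpenSearch API: the function names, the three-dimensional vectors, and the document ids are all hypothetical, and a real vector database would typically use an approximate index rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, index, k=2):
    """Return the k document ids whose vectors are most similar to the query."""
    ranked = sorted(
        index,
        key=lambda doc_id: cosine_similarity(query, index[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 3-dimensional vectors; real embeddings usually have far more dimensions.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
result = nearest_neighbors([1.0, 0.05, 0.0], index, k=2)
print(result)
```

In a RAG setup, the returned ids would point at the text passages handed to the language model as context.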
2
Q

K-means clustering

A

is a popular unsupervised machine learning algorithm that partitions a dataset into a predefined number of clusters (k), assigning each point to the cluster with the nearest centroid.
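
The two alternating steps (assign points, then update centroids) can be sketched as follows. This is a minimal illustration on 2-D points, assuming the first k points are used as initial centroids; production implementations usually pick starting centroids more carefully.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm on 2-D points.

    Assumption: the first k points serve as the initial centroids.
    """
    centroids = [points[i] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments

# Hypothetical data: two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```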

3
Q

Pre-training bias metrics

A
  • Class Imbalance (CI)
  • Difference in Proportions of Labels (DPL)
  • Kullback-Leibler Divergence (KL)
  • Jensen-Shannon Divergence (JS)
  • Lp-norm (LP)
  • Total Variation Distance (TVD)
  • Kolmogorov-Smirnov (KS)
  • Conditional Demographic Disparity (CDD)

https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-data-bia
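
As a rough illustration of three metrics from the list above, the sketch below computes CI, KL, and TVD using their common textbook definitions. The function names, facet counts, and label distributions are hypothetical, and the exact conventions SageMaker Clarify uses should be checked against the linked documentation.

```python
import math

def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d); 0 means the two facets are equally sized."""
    return (n_a - n_d) / (n_a + n_d)

def kl_divergence(p, q):
    """KL(P || Q) = sum over labels y of P(y) * log(P(y) / Q(y)).

    Assumption: Q(y) > 0 wherever P(y) > 0.
    """
    return sum(p_y * math.log(p_y / q_y) for p_y, q_y in zip(p, q) if p_y > 0)

def total_variation_distance(p, q):
    """TVD = half the L1 distance between the two label distributions."""
    return 0.5 * sum(abs(p_y - q_y) for p_y, q_y in zip(p, q))

# Hypothetical facet sizes and label distributions for facets a and d.
ci = class_imbalance(n_a=800, n_d=200)
p_a = [0.7, 0.3]
p_d = [0.4, 0.6]
kl = kl_divergence(p_a, p_d)
tvd = total_variation_distance(p_a, p_d)
print(ci, kl, tvd)
```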

4
Q

Post-training bias metrics

A
  • Difference in Positive Proportions in Predicted Labels (DPPL)
  • Disparate Impact (DI)
  • Difference in Conditional Acceptance (DCAcc)
  • Difference in Conditional Rejection (DCR)
  • Specificity difference (SD)
  • Recall Difference (RD)
  • Difference in Acceptance Rates (DAR)
  • Difference in Rejection Rates (DRR)
  • Accuracy Difference (AD)
  • Treatment Equality (TE)
  • Conditional Demographic Disparity in Predicted Labels (CDDPL)
  • Counterfactual Fliptest (FT)
  • Generalized entropy (GE)

https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-post-tra
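
Two of the metrics above, DPPL and DI, depend only on the proportion of positive predictions in each facet. The sketch below assumes the usual definitions (DPPL as the difference of the two proportions, DI as their ratio); the function names and prediction lists are hypothetical.

```python
def positive_rate(predictions):
    """Fraction of predictions equal to 1 (the positive label)."""
    return sum(predictions) / len(predictions)

def dppl(pred_a, pred_d):
    """DPPL = q_hat_a - q_hat_d, the gap in predicted positive proportions."""
    return positive_rate(pred_a) - positive_rate(pred_d)

def disparate_impact(pred_a, pred_d):
    """DI = q_hat_d / q_hat_a; assumes facet a has at least one positive prediction."""
    return positive_rate(pred_d) / positive_rate(pred_a)

# Hypothetical model predictions for facet a and facet d.
pred_a = [1, 1, 1, 0]
pred_d = [1, 0, 0, 0]
dppl_value = dppl(pred_a, pred_d)
di_value = disparate_impact(pred_a, pred_d)
print(dppl_value, di_value)
```

A DPPL of 0 and a DI of 1 would both indicate equal predicted positive rates across the two facets.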

5
Q

Partial dependence plots (PDP)

A

show the dependence of the predicted target response on a set of input features of interest, averaging over the values of all other input features.
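
One way to see how such a curve is produced: fix the feature of interest at each grid value in every row, then average the model's predictions. The sketch below does this for a hypothetical toy model; `partial_dependence` and `toy_model` are illustrative names, not a library API.

```python
def partial_dependence(model, rows, feature_index, grid):
    """For each grid value, overwrite the feature in every row and average the predictions."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)          # copy so the original data is untouched
            modified[feature_index] = value
            total += model(modified)
        curve.append(total / len(rows))
    return curve

def toy_model(row):
    # Hypothetical model: prediction = 2 * x0 + x1
    return 2.0 * row[0] + row[1]

rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
pdp_curve = partial_dependence(toy_model, rows, feature_index=0, grid=[0.0, 1.0, 2.0])
print(pdp_curve)
```

Because the toy model is linear in x0, the resulting curve is a straight line with slope 2.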

5
Q

Shapley values

A
  • determine the contribution that each feature made to a model's predictions.
  • a method (solution concept) from cooperative game theory for fairly distributing the total gains or costs among a group of players who have collaborated.
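
For a small number of players, Shapley values can be computed exactly by averaging each player's marginal contribution over every ordering. The sketch below assumes a hypothetical two-feature payoff function; practical explainability tools approximate this because the number of orderings grows factorially.

```python
from itertools import permutations

def shapley_values(players, value):
    """Average each player's marginal contribution over all orderings of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}

def payoff(coalition):
    # Hypothetical payoff: each feature adds a base amount, plus an interaction bonus.
    base = {"age": 10.0, "income": 20.0}
    total = sum(base[p] for p in coalition)
    if {"age", "income"} <= coalition:
        total += 6.0
    return total

phi = shapley_values(["age", "income"], payoff)
print(phi)
```

The values sum to the payoff of the full coalition, and the interaction bonus is split evenly between the two features.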
6
Q

The difference in proportions of labels (DPL)

A

compares the proportion of observed outcomes with positive labels for facet d (the disfavored facet) with the proportion of observed outcomes with positive labels for facet a (the favored facet) in a training dataset.
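
A minimal sketch of the comparison, assuming the common form DPL = q_a - q_d, where q is the fraction of positive observed labels in a facet. The label lists are hypothetical.

```python
def dpl(labels_a, labels_d):
    """DPL = q_a - q_d, where q is the fraction of positive (1) observed labels in a facet."""
    q_a = sum(labels_a) / len(labels_a)
    q_d = sum(labels_d) / len(labels_d)
    return q_a - q_d

# Hypothetical observed labels in the training data for each facet.
labels_a = [1, 1, 1, 0, 0]
labels_d = [1, 0, 0, 0, 0]
dpl_value = dpl(labels_a, labels_d)
print(dpl_value)
```

A value near 0 suggests both facets receive positive labels at similar rates; a positive value means facet a has the higher proportion.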

7
Q

Weight

A

Multiplies the input value, controlling its influence on the output.

8
Q

Bias

A

Adds a constant term to the weighted input, shifting the point at which the activation function responds so the model can fit the data better.
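
The roles of weight and bias can be seen in a single-input neuron. This is a minimal sketch assuming a sigmoid activation; the function names and numeric values are illustrative.

```python
import math

def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, weight, bias):
    """Single-input neuron: output = sigmoid(weight * x + bias)."""
    return sigmoid(weight * x + bias)

# With a zero input the weight has no effect, so only the bias moves the output.
out_no_bias = neuron(0.0, weight=2.0, bias=0.0)
out_with_bias = neuron(0.0, weight=2.0, bias=1.0)
# With a non-zero input, a larger weight gives the input more influence.
out_small_weight = neuron(1.0, weight=0.5, bias=0.0)
out_large_weight = neuron(1.0, weight=2.0, bias=0.0)
print(out_no_bias, out_with_bias, out_small_weight, out_large_weight)
```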

9
Q

Text embeddings

A

are meaningful vector representations of unstructured text such as documents, paragraphs, and sentences. You input a body of text and the output is a 1 × n vector. You can use embedding vectors for a wide variety of applications.
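
To illustrate only the input and output shape (text in, a vector of length n out), here is a toy stand-in that hashes words into buckets. It is not a learned embedding model and captures no semantic meaning; the function name and the choice of n are hypothetical.

```python
import hashlib

def toy_embedding(text, n=8):
    """Hash each word into one of n buckets and count occurrences; returns a list of length n.

    Toy stand-in only: real text embeddings come from trained models.
    """
    vector = [0.0] * n
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % n] += 1.0
    return vector

vec = toy_embedding("vector search finds similar documents", n=8)
print(len(vec), vec)
```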

10
Q

Amazon Fraud Detector

A

is a fully managed service that you can use to detect fraudulent activities. Examples of fraudulent activities include fraudulent transactions or the creation of fake accounts.
