Quiz 2 Flashcards

(35 cards)

1
Q

What is Cluster Analysis?

A

Groups observations so that observations in the same group are similar to each other, based on their observed variables.

Used in market segmentation and identifying outliers.

2
Q

What type of machine learning does Cluster Analysis fall under?

A

Unsupervised Machine Learning

There is no dependent variable to predict.

3
Q

What are some uses of Cluster Analysis?

A
  • Market segmentation
  • Identifying outliers

Identifying outliers supports applications such as fraud detection and finding anomalies in data sets.

4
Q

What is Hierarchical Clustering?

A

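A clustering method (agglomerative) that starts with each observation in its own cluster and iteratively merges the two most similar clusters until all observations belong to a single cluster.

The sequence of merges is displayed as a dendrogram (a tree diagram).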
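
A minimal Python sketch of the agglomerative process, assuming Euclidean distance and single linkage (both chosen for illustration; the helper names `euclidean` and `agglomerative` are hypothetical). The recorded merge history is the information a dendrogram draws.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, num_clusters=1):
    """Hypothetical helper: naive agglomerative clustering with single linkage.

    Starts with every observation in its own cluster and repeatedly merges
    the two closest clusters. Returns the final clusters (lists of point
    indices) and the merge history (what a dendrogram displays).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []  # (cluster_a, cluster_b, distance) for each merge step
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Merge cluster j into cluster i, then drop j (j > i, so i stays valid)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(pts, num_clusters=2)
```

With `num_clusters=2`, the two tight pairs merge first and the isolated point at (10, 0) ends up alone. This naive version recomputes every pairwise cluster distance each round, which is fine for small examples but slow on large datasets.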

5
Q

What are the methods for measuring similarity in Hierarchical Clustering?

A
  • Single Linkage
  • Complete Linkage
  • Group Average Linkage
  • Median Linkage
  • Centroid Linkage

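Each method defines the distance between two clusters differently; e.g., single linkage uses the closest pair of observations, complete linkage the farthest pair, group average linkage the average over all pairs, and centroid linkage the distance between the cluster centroids.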
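
A short sketch of four of these linkage rules applied to two small clusters, assuming Euclidean distance between observations (the function names are hypothetical). Median linkage is left out because its exact definition varies between sources.

```python
import math
from itertools import product

def single_linkage(A, B):
    """Distance between the closest pair (one observation from each cluster)."""
    return min(math.dist(a, b) for a, b in product(A, B))

def complete_linkage(A, B):
    """Distance between the farthest pair."""
    return max(math.dist(a, b) for a, b in product(A, B))

def average_linkage(A, B):
    """Average distance over all cross-cluster pairs."""
    return sum(math.dist(a, b) for a, b in product(A, B)) / (len(A) * len(B))

def centroid(C):
    """Coordinate-wise mean of a cluster's observations."""
    return tuple(sum(coords) / len(C) for coords in zip(*C))

def centroid_linkage(A, B):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(A), centroid(B))

A = [(0, 0), (0, 2)]
B = [(3, 0), (5, 0)]
```

For these clusters, single linkage gives 3, complete linkage gives √29 ≈ 5.39, and centroid linkage gives √17 ≈ 4.12, so the choice of rule changes which clusters look "most similar".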

6
Q

What is K-Means Clustering?

A

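A clustering method that requires specifying the number of clusters k in advance and iteratively assigns observations to these clusters.

Steps: initialize k cluster centers (centroids), assign each observation to its nearest centroid, update each centroid to the mean of its assigned observations, and repeat the assignment and update steps until the assignments stop changing.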
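
A minimal sketch of the k-means loop in Python. As a simplifying assumption, it initializes the centroids with the first k observations; statistical software typically uses random (or k-means++) starting points, often with several restarts. `kmeans` is a hypothetical helper name.

```python
import math

def kmeans(points, k, max_iter=100):
    """Hypothetical helper: minimal k-means (Lloyd's algorithm) sketch.

    Assumption: the first k points serve as the initial centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

pts = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
labels, centroids = kmeans(pts, k=2)
```

Here the three points near (1, 1) form one cluster and the three near (9, 9) form the other; the second pass produces the same assignments, so the loop stops.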

7
Q

What is the difference between Hierarchical Clustering and K-Means Clustering?

A
  • Hierarchical: Better for small datasets (≤500 obs.), forms a dendrogram
  • K-Means: Better for large datasets (>500 obs.), predefined number of clusters

K-Means creates distinct clusters while Hierarchical captures nested clusters.

8
Q

What is Euclidean Distance?

A

Measures straight-line distance between points.

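It is affected by differences in variable scale: a variable measured in large units (e.g., dollars) can dominate one measured in small units (e.g., years).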
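
A small illustration of the scale problem, using hypothetical customer data measured in years (age) and dollars (income):

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical customers described by (age in years, income in dollars)
c1 = (30, 50_000)
c2 = (60, 51_000)   # very different age, similar income
c3 = (31, 60_000)   # almost the same age, different income

d12 = euclidean(c1, c2)
d13 = euclidean(c1, c3)
```

Customer 1 comes out much closer to customer 2 (30 years older) than to customer 3 (1 year older), only because income differences are measured in thousands of dollars.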

9
Q

What is the solution to the scale differences issue in Euclidean Distance?

A

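Standardize each variable by converting it to z-scores (subtract the variable's mean, then divide by its standard deviation) before computing distances.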
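
A sketch of z-score standardization before computing distances, assuming the sample standard deviation (`statistics.stdev`) and hypothetical age/income data; `zscores` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev

def zscores(column):
    """Hypothetical helper: subtract the column mean, divide by its sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

# Hypothetical customers: (age in years, income in dollars)
data = [(30, 50_000), (60, 50_500), (31, 52_000), (45, 80_000)]

# Standardize column by column, then zip back into rows
ages, incomes = zip(*data)
std_rows = list(zip(zscores(ages), zscores(incomes)))

raw_d12, raw_d13 = math.dist(data[0], data[1]), math.dist(data[0], data[2])
z_d12, z_d13 = math.dist(std_rows[0], std_rows[1]), math.dist(std_rows[0], std_rows[2])
```

On the raw values, the first customer appears closer to the second (income gap of $500, age gap of 30 years) than to the third (income gap of $2,000, age gap of 1 year). After standardization the ordering flips, because a 30-year age gap is large relative to the spread of ages.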

10
Q

What is Manhattan Distance?

A

Measures grid-based distance, like navigating city blocks.

More robust to outliers compared to Euclidean Distance.
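
A sketch comparing the two distances on hypothetical points: one point differs slightly on every variable, the other differs a lot on a single variable.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = (0, 0, 0, 0)
b = (1, 1, 1, 1)   # small differences on every variable
c = (0, 0, 0, 4)   # one large (outlying) difference
```

Manhattan distance rates `b` and `c` as equally far from `a` (4 each), while Euclidean distance puts `c` twice as far (4.0 vs 2.0), because squaring lets a single extreme difference dominate.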

11
Q

What are Categorical Data Similarity Measures?

A
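  • Matching Coefficient: Proportion of binary variables on which two observations match (both 0 or both 1)
  • Jaccard’s Coefficient: Proportion of matching 1s, ignoring variables where both observations are 0

Jaccard’s is preferred when a shared 0 (e.g., neither customer bought a product) says little about how similar two observations are.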
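
A sketch of both coefficients on hypothetical 0/1 purchase indicators (1 = bought the product). How to score two all-zero rows with Jaccard's coefficient is a convention; this sketch assumes 0.0.

```python
def matching_coefficient(u, v):
    """Share of binary variables on which two observations agree (1-1 or 0-0)."""
    return sum(a == b for a, b in zip(u, v)) / len(u)

def jaccard_coefficient(u, v):
    """Matching 1s divided by variables where at least one observation has a 1.

    0-0 matches are ignored. Returning 0.0 when both rows are all zeros is an
    assumed convention, not a universal rule.
    """
    both_one = sum(a == 1 and b == 1 for a, b in zip(u, v))
    either_one = sum(a == 1 or b == 1 for a, b in zip(u, v))
    return both_one / either_one if either_one else 0.0

# Hypothetical purchase indicators for two customers across 8 products
u = [1, 0, 0, 0, 0, 0, 0, 1]
v = [1, 0, 0, 0, 0, 0, 0, 0]
```

The two customers share only one purchase, yet the matching coefficient is 0.875 because six of its seven agreements are products neither bought; Jaccard's coefficient ignores those and gives 0.5.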

12
Q

What factors influence the choice between Hierarchical and K-Means Clustering?

A

Dataset size, cluster relationships, and computational resources.

13
Q

What is Probability?

A

A numerical measure of the likelihood of an event occurring.

14
Q

What is a Random Experiment?

A

A process that generates uncertain outcomes.

15
Q

Define Sample Space.

A

The set of all possible outcomes.

16
Q

What is a Random Variable?

A

A function that assigns a numerical value to each outcome of a random experiment; its value is uncertain until the experiment is carried out.

17
Q

What are the types of Random Variables?

A
  • Discrete Random Variable
  • Continuous Random Variable

A discrete random variable takes a countable set of values (e.g., 0, 1, 2, ...); a continuous random variable can take any value in an interval.

18
Q

What does a Discrete Probability Distribution describe?

A

The possible values of a discrete random variable and the probability of each value, given by the probability function f(x).

19
Q

What is the formula for Expected Value?

A

E(X) = μ = Σ x·f(x), the probability-weighted average of the values; it measures the central tendency of a probability distribution.
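
A sketch of the formula in Python, using exact fractions and a fair six-sided die as the example distribution (`expected_value` is a hypothetical helper name):

```python
from fractions import Fraction

def expected_value(dist):
    """Hypothetical helper: E(X) = sum of x * f(x) over all values x."""
    return sum(x * p for x, p in dist.items())

# Fair six-sided die: each face has probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}
```

For the die, E(X) = (1 + 2 + ... + 6)/6 = 3.5.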

20
Q

What is Variance?

A

Var(X) = σ² = Σ (x − μ)²·f(x); it measures how spread out the values are around the mean. The standard deviation σ is its square root.
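
The same die example extended to the variance formula (a sketch; the helper name `variance` is hypothetical):

```python
import math
from fractions import Fraction

def variance(dist):
    """Hypothetical helper: Var(X) = sum of (x - mu)^2 * f(x), where mu = E(X)."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Fair six-sided die: each face has probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}
var = variance(die)
sd = math.sqrt(var)  # standard deviation
```

For the die, Var(X) = 35/12 ≈ 2.92, so σ ≈ 1.71.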

21
Q

What characterizes a Discrete Uniform Distribution?

A

All values in the sample space are equally likely: with n possible values, f(x) = 1/n.

22
Q

What is the Binomial Distribution used for?

A

Models the number of successes in n independent trials, each with two possible outcomes (success/failure) and the same probability of success p.
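
A sketch of the binomial probability function f(x) = C(n, x) p^x (1 − p)^(n − x). The helper `binom_dist` is hypothetical; it only mimics the argument order and `cumulative` flag of Excel's BINOM.DIST.

```python
import math

def binom_dist(x, n, p, cumulative):
    """Hypothetical helper: binomial probability for x successes in n trials.

    cumulative=False gives P(X = x); cumulative=True gives P(X <= x).
    """
    def pmf(k):
        return math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)
```

For 10 fair coin flips, `binom_dist(5, 10, 0.5, False)` gives P(X = 5) ≈ 0.2461 and `binom_dist(5, 10, 0.5, True)` gives P(X ≤ 5) ≈ 0.6230.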

23
Q

What does the Poisson Distribution model?

A

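The number of occurrences of an event in a fixed interval of time or space, assuming occurrences are independent and the average rate is constant.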
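
A sketch of the Poisson probability function f(x) = μ^x e^(−μ) / x!, where μ (λ) is the mean number of occurrences per interval. `poisson_dist` is a hypothetical helper mirroring the argument order of Excel's POISSON.DIST.

```python
import math

def poisson_dist(x, mean, cumulative):
    """Hypothetical helper: probability of x occurrences with average rate `mean`.

    cumulative=False gives P(X = x); cumulative=True gives P(X <= x).
    """
    def pmf(k):
        return mean ** k * math.exp(-mean) / math.factorial(k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)
```

With an assumed average of 3 calls per hour, P(exactly 2 calls) ≈ 0.2240 and P(at most 2 calls) ≈ 0.4232.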

24
Q

What distinguishes Continuous Probability Distributions from Discrete?

A

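Continuous distributions use Probability Density Functions (PDFs): a probability is the area under the curve over an interval, so the probability of any single exact value is 0.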

25
Q

What is the Normal Distribution?

A

A bell-shaped curve defined by mean (μ) and standard deviation (σ).

26
Q

What is the empirical rule for Normal Distribution?

A
  • About 68% of the data falls within 1 standard deviation of the mean
  • About 95% within 2 standard deviations
  • About 99.7% within 3 standard deviations
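
The rule's percentages are rounded values of exact normal probabilities. A sketch that computes them with the error function (`prob_within` is a hypothetical helper name):

```python
import math

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X, via the error function."""
    return math.erf(k / math.sqrt(2))

# Rounded probabilities for 1, 2 and 3 standard deviations
rule = {k: round(prob_within(k), 4) for k in (1, 2, 3)}
```

This gives about 0.6827, 0.9545, and 0.9973 for k = 1, 2, 3.
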

27
Q

What are some applications of Probability Distributions?

A
  • Market Analysis
  • Finance
  • Operations Management
  • Medical Research

Used for predicting demand fluctuations, modeling stock price movements, etc.

28
Q

What is the Excel function for Binomial Distribution?

A

BINOM.DIST(x, n, p, cumulative).

29
Q

What is the Excel function for Poisson Distribution?

A

POISSON.DIST(x, lambda, cumulative).

30
Q

What is the Excel function for Normal Distribution?

A

NORM.DIST(x, mean, std_dev, cumulative).
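
A sketch of the same calculation in Python using the standard library's `statistics.NormalDist`; `norm_dist` is a hypothetical wrapper that mirrors NORM.DIST's argument order, and the test-score numbers are illustrative assumptions.

```python
from statistics import NormalDist

def norm_dist(x, mean, std_dev, cumulative):
    """Hypothetical Python counterpart to Excel's NORM.DIST.

    cumulative=True returns P(X <= x), the area under the curve left of x;
    cumulative=False returns the density (height of the curve) at x, not a probability.
    """
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)

# Hypothetical test scores with mean 70 and standard deviation 10
p_at_most_80 = norm_dist(80, 70, 10, True)
p_between = norm_dist(80, 70, 10, True) - norm_dist(60, 70, 10, True)
```

With mean 70 and standard deviation 10, P(X ≤ 80) ≈ 0.8413, and P(60 ≤ X ≤ 80) ≈ 0.6827, matching the 68% in the empirical rule.
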

31
Q

What is an example use of a Discrete Uniform Distribution?

A

Rolling a fair die.

32
Q

What is an example use of a Binomial Distribution?

A

Success/failure in repeated trials (e.g., pass/fail test results).

33
Q

What is an example use of a Poisson Distribution?

A

Number of calls per hour.

34
Q

What is an example use of a Continuous Uniform Distribution?

A

Randomly generated wait times, equally likely to fall anywhere between a minimum and a maximum value.

35
Q

What is an example use of a Normal Distribution?

A

Heights, test scores, stock returns.