Advanced DA (ML) Flashcards

1
Q

What Is Clustering? List the Main Properties of Clustering Algorithms.

A

Clustering is an unsupervised learning technique that identifies groups of similar data points within a dataset and assigns the data points to those groups, thus creating clusters.

Clustering algorithms have the following properties:

Iterative
Hard or soft
Disjunctive
Flat or hierarchical
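
To make the "hard or soft" property concrete, here is a minimal Python sketch. It assumes clusters are represented by centroids and computes soft memberships as a softmax over negative squared distances, which is what a Gaussian mixture with equal weights and a shared spherical variance would give. The function names hard_assign and soft_assign and the beta parameter are illustrative, not taken from any library.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def hard_assign(point, centroids):
    """Hard clustering: the point belongs to exactly one cluster, the one with the nearest centroid."""
    dists = [sq_dist(point, c) for c in centroids]
    return dists.index(min(dists))

def soft_assign(point, centroids, beta=1.0):
    """Soft clustering: one membership weight per cluster, summing to 1.

    Weights are proportional to exp(-beta * squared distance), i.e. a softmax
    over negative squared distances (beta is an illustrative parameter).
    """
    scores = [-beta * sq_dist(point, c) for c in centroids]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
print(hard_assign(point, centroids))  # 0
print(soft_assign(point, centroids))  # mostly cluster 0, a small weight for cluster 1
```

A point exactly halfway between the two centroids, such as (2.0, 0.0), still gets a single hard label but receives equal soft weights of 0.5 for each cluster.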

2
Q

What Is Logistic Regression?

A

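Logistic regression is a predictive analysis technique used when the dependent variable is dichotomous (binary). It models the probability of the positive outcome by passing a linear combination of the independent variables through the logistic (sigmoid) function.
Examples: whether an e-mail is spam or not, whether a tumor is malignant or not.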
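
A minimal sketch of binary logistic regression in plain Python, assuming a single numeric feature and fitting the model p(y = 1 | x) = 1 / (1 + e^-(w*x + b)) by batch gradient descent on the log-loss (one of several possible solvers). The toy data, the number of "suspicious" words per e-mail labelled spam or not, is hypothetical, as are the learning rate and epoch count.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the linear score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: x = number of "suspicious" words in an e-mail, y = 1 if it is spam.
xs = [0, 1, 1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)

for x in (0, 8):
    p = sigmoid(w * x + b)
    print(f"{x} suspicious words -> P(spam) = {p:.3f} -> {'spam' if p >= 0.5 else 'not spam'}")
```

The 0.5 threshold turns the predicted probability into a yes/no answer; in practice the threshold can be moved to trade false positives against false negatives.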

3
Q

What Is Linear Regression?

A

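Linear regression is a statistical method for modelling how a dependent variable relates to one or more independent variables. It establishes this relationship by fitting a linear equation to the data, usually by least squares, i.e. choosing the coefficients that minimise the sum of squared differences between the observed and predicted values.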
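
For the simple case of one independent variable, the least-squares line y = b0 + b1*x has a closed form: b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and b0 = mean_y - b1 * mean_x. The plain-Python sketch below implements that formula; the function name and the hours-studied data are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line always passes through (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
b0, b1 = fit_simple_linear_regression(hours, scores)
print(f"score = {b0:.2f} + {b1:.2f} * hours")  # score = 46.90 + 4.50 * hours
```

With several independent variables the same idea generalises to solving the normal equations, which libraries handle with numerically stable matrix routines.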

4
Q

Explain K-means Clustering.

A

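Analysts use K-means clustering to partition observations into k non-overlapping subgroups called clusters, with each observation belonging to the cluster whose mean (centroid) is nearest. The algorithm alternates between assigning observations to the nearest centroid and recomputing each centroid as the mean of its assigned observations, until the assignments stop changing. It is a popular technique for cluster analysis in data mining.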
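
A minimal sketch of the standard K-means procedure (Lloyd's algorithm) in plain Python: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. For simplicity it initialises the centroids with the first k points, an assumption made here for reproducibility; real implementations typically use random restarts or k-means++ seeding. The sample points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation: first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Even though both initial centroids start inside the first group, the update step pulls one of them over to the second group within a couple of iterations. With less well-separated data, the result can depend on the initialisation, which is why restarts are common.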

5
Q

What Do You Mean by Hierarchical Clustering?

A

Hierarchical clustering (in its common agglomerative, bottom-up form) is a data analysis method that first treats every data point as its own cluster. It then repeats the following steps to build larger clusters:

Identify the two clusters that are closest to each other.
Merge them into a single cluster.

The process continues until all points belong to one cluster or the desired number of clusters remains; the resulting hierarchy of merges is often drawn as a tree called a dendrogram.
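
The merge loop described above can be sketched in plain Python as follows. How the distance between two clusters is measured (the linkage) is an assumption here: passing min gives single linkage (closest pair of points), while max gives complete linkage. The loop stops once n_clusters clusters remain, and merges records each step, which is the information a dendrogram plots. The function name and points are illustrative; math.dist needs Python 3.8 or later.

```python
import math

def agglomerative(points, n_clusters=1, linkage=min):
    """Bottom-up hierarchical clustering.

    Starts with every point in its own cluster and repeatedly merges the two
    closest clusters until n_clusters remain. linkage=min is single linkage,
    linkage=max is complete linkage. Returns (clusters, merges), where clusters
    hold point indices and merges records (cluster_a, cluster_b, distance).
    """
    clusters = [[i] for i in range(len(points))]
    merges = []

    def cluster_distance(a, b):
        return linkage(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters that are closest to each other.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        # Merge them: replace cluster x by the union and drop cluster y (y > x).
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters, merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 5.5), (10.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=2)
print(clusters)  # [[0, 1, 2, 3], [4]]
for a, b, d in merges:
    print(f"merged {a} and {b} at distance {d:.2f}")
```

This naive version recomputes every pairwise cluster distance on each pass, so it only suits small datasets; library implementations use more efficient update schemes.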

6
Q

Explain Data Warehousing.

A

A data warehouse is a data storage system that collects data from many disparate sources and stores it in a consistent, integrated form that makes it easy to query, report on, and derive business insights from. Data warehousing is the process of identifying heterogeneous data sources, extracting the data, cleaning it, and transforming it into a consistent form before loading it into the data warehouse (often summarised as ETL: extract, transform, load).
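
A toy extract-transform-load sketch in Python, using only the standard library: two hypothetical sources with different field names, casing, and number formats are mapped onto one schema, cleaned, and loaded into an in-memory SQLite table that stands in for the warehouse. The source data, the sales table, and its columns are invented for illustration; a real warehouse would run on a dedicated platform with scheduled pipelines.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a web shop (amounts as strings with a "$" prefix).
SHOP_CSV = """order_id,country,amount
1001,us,$120.50
1002,DE,$80.00
1003,US,
"""

# Hypothetical source 2: point-of-sale records with different field names and casing.
POS_RECORDS = [
    {"id": "P-7", "Country": "de", "total": 45.0},
    {"id": "P-8", "Country": "FR", "total": 60.25},
]

def extract():
    """Pull raw records out of each source."""
    shop = list(csv.DictReader(io.StringIO(SHOP_CSV)))
    return shop, POS_RECORDS

def transform(shop, pos):
    """Map both sources onto one schema, normalise values, drop incomplete rows."""
    rows = []
    for r in shop:
        if not r["amount"]:
            continue  # cleaning: skip orders with a missing amount
        rows.append(("shop", r["order_id"], r["country"].upper(), float(r["amount"].lstrip("$"))))
    for r in pos:
        rows.append(("pos", r["id"], r["Country"].upper(), float(r["total"])))
    return rows

def load(rows):
    """Store the cleaned rows in the (stand-in) warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (source TEXT, source_id TEXT, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load(transform(*extract()))
# Once the data is integrated, a business question becomes a single query.
for country, revenue in conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
):
    print(country, revenue)
```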

7
Q

How Do You Differentiate Between Overfitting and Underfitting?

A

Underfitting and overfitting are both modeling errors.

OVERFITTING:
The model fits the training set very well, often because it has also learned the noise in that data.
Its performance drops considerably on the test set.

UNDERFITTING:
The model neither fits the training data well nor generalizes to new data.
It performs poorly on both the training and the test set.
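
A small plain-Python experiment that shows both failure modes, under stated assumptions: the hypothetical true relationship is y = 3x plus Gaussian noise, a constant predictor (the training mean) stands in for an overly simple model, and a 1-nearest-neighbour lookup that memorises the training points stands in for an overly complex one. The memoriser's error goes from zero on the training set to clearly non-zero on the test set, while the constant model does poorly on both. All function names are illustrative.

```python
import random

def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction) on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_constant(xs, ys):
    """Too simple: always predicts the training mean, so it tends to underfit."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Ordinary least-squares line, which matches the (assumed) linear truth."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Too flexible: returns the y of the nearest training x (1-NN), memorising the noise."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

rng = random.Random(42)  # fixed seed so the run is reproducible

def sample(xs):
    # Hypothetical ground truth: y = 3x plus Gaussian noise (sd = 4).
    return [3.0 * x + rng.gauss(0.0, 4.0) for x in xs]

train_x = [float(x) for x in range(0, 20, 2)]  # even x values for training
test_x = [float(x) for x in range(1, 20, 2)]   # odd x values held out for testing
train_y, test_y = sample(train_x), sample(test_x)

for name, fit in [("underfit (constant)", fit_constant),
                  ("linear", fit_line),
                  ("overfit (1-NN)", fit_memorizer)]:
    model = fit(train_x, train_y)
    print(f"{name:20s} train MSE = {mse(model, train_x, train_y):7.2f}   "
          f"test MSE = {mse(model, test_x, test_y):7.2f}")
```

Comparing the two columns is the practical diagnostic: a large gap between training and test error points to overfitting, while high error in both points to underfitting.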
