ACO 423 Flashcards

(48 cards)

1
Q

Which of the following are examples of programming environments for Python?

A

VSCode and Jupyter Notebook

2
Q

Which of the following is a Python package designed for plotting data?

A

matplotlib

3
Q

According to the syllabus, what percentage is attendance and class participation?

A

15%

4
Q

Which of the following shows an example of a list in Python?

A

x = [3, 1, 5, 2, 4]

5
Q

Which of the following shows an example of a tuple in Python?

A

x = (3, 1, 5, 2, 4)
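Cards 4 and 5 differ only in the brackets, so a quick sketch of the practical difference may help: a list is mutable and a tuple is not. The variable names below are illustrative.

```python
# A list uses square brackets and can be changed in place
x_list = [3, 1, 5, 2, 4]
x_list.append(6)      # lists support append, sort, item assignment, ...
x_list[0] = 10

# A tuple uses parentheses and is immutable
x_tuple = (3, 1, 5, 2, 4)
try:
    x_tuple[0] = 10   # item assignment is not allowed on a tuple
except TypeError as err:
    print("tuple is immutable:", err)

print(type(x_list).__name__, x_list)    # list [10, 1, 5, 2, 4, 6]
print(type(x_tuple).__name__, x_tuple)  # tuple (3, 1, 5, 2, 4)
```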

6
Q

The pandas module can do all of the following except:

A

plot the data (pandas does offer DataFrame.plot, but it is a convenience wrapper that relies on matplotlib)

7
Q

Order matters for positional arguments but not for keyword arguments

A

True
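A minimal sketch of the rule, using a hypothetical function `describe`: positional arguments are bound by position, keyword arguments by name.

```python
def describe(name, age, city):
    """Hypothetical helper: join three arguments into one string."""
    return f"{name}, {age}, {city}"

# Positional: matched by position, so swapping arguments changes the result
a = describe("Ana", 30, "Paris")   # 'Ana, 30, Paris'
b = describe(30, "Ana", "Paris")   # '30, Ana, Paris'

# Keyword: matched by name, so their order does not matter
c = describe(city="Paris", age=30, name="Ana")  # 'Ana, 30, Paris'

print(a == c, a == b)  # True False
```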

8
Q

If v1 = df.col1 and v2 = df[df.col1 > 22], what are their types?

A

v1 is Series and v2 is DataFrame
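A small sketch to check the two types, assuming pandas is installed. The DataFrame and its values are made up; only the column name `col1` comes from the card.

```python
import pandas as pd

# Hypothetical data; col2 is only there to show that filtering keeps all columns
df = pd.DataFrame({"col1": [10, 25, 30], "col2": ["a", "b", "c"]})

v1 = df.col1            # selecting one column returns a Series
v2 = df[df.col1 > 22]   # boolean row filtering returns a DataFrame

print(type(v1).__name__)  # Series
print(type(v2).__name__)  # DataFrame
print(len(v2))            # 2 (the rows where col1 is 25 and 30)
```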

9
Q

In which machine learning algorithms are labels provided to the algorithm?

A

Supervised algorithms

10
Q

Clustering is an example of which type of algorithm?

A

Unsupervised

11
Q

What is the attribute type for values such as a product's brand or a zip code?

A

Nominal (Categorical)

12
Q

Which of the following plots are used for visualizing one variable (univariate)?

A

Histogram and box plot

13
Q

How is the Inter-quartile Range (IQR) calculated?

A

Q3 - Q1
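A sketch computing the quartiles and the IQR with Python's standard `statistics` module on made-up data. Tools interpolate quartiles slightly differently (this uses `method="inclusive"`), so Q1 and Q3 can differ a little from what another library reports. The box in a box plot runs from Q1 to Q3, so its length equals this IQR.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13, 15, 17]  # hypothetical sample

# quantiles(n=4) returns the three cut points Q1, Q2 (the median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 5.0 9.0 13.0
print(iqr)         # 8.0
```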

14
Q

The length of the box in a box plot (its height when the plot is drawn vertically) is equal to the IQR

A

True

15
Q

Which plot shows a positive correlation between two variables x and y?

A

Plot c

16
Q

A data matrix has n rows for n data points and p columns for p attributes

A

True

17
Q

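For nominal attributes, if dissimilarity is (p - m) / p, where m is the number of matching attributes and p is the total number of attributes, what is the formula for similarity?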

A

m / p
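A sketch of simple matching for nominal attributes, with hypothetical objects and a helper name of our own (`nominal_similarity`): m counts the attributes on which two objects agree, p is the total number of attributes, similarity is m / p, and dissimilarity is 1 - m / p = (p - m) / p.

```python
def nominal_similarity(a, b):
    """Simple matching similarity m / p for two objects with nominal attributes."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    p = len(a)                                  # total number of attributes
    m = sum(1 for x, y in zip(a, b) if x == y)  # number of matching attributes
    return m / p

obj1 = ("red", "small", "round", "metal")
obj2 = ("red", "large", "round", "wood")

sim = nominal_similarity(obj1, obj2)
print(sim)      # 0.5 (m = 2 matches out of p = 4)
print(1 - sim)  # 0.5, the dissimilarity (p - m) / p
```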

18
Q

Minkowski distance is a generalization of which distances?

A

Euclidean and Manhattan
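Minkowski distance of order h is (sum of |x_i - y_i|^h)^(1/h); h = 1 gives the Manhattan distance and h = 2 gives the Euclidean distance. A minimal sketch (the function name `minkowski` is our own):

```python
def minkowski(x, y, h):
    """Minkowski distance of order h (h >= 1) between two numeric vectors."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

p1, p2 = (0, 0), (3, 4)
print(minkowski(p1, p2, 1))  # 7.0 -> Manhattan distance (h = 1)
print(minkowski(p1, p2, 2))  # 5.0 -> Euclidean distance (h = 2)
```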

19
Q

A correlation matrix shows the correlation coefficient for each pair of attributes

A

True
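A sketch of how a correlation matrix is filled in: cell (i, j) holds the Pearson correlation coefficient of attributes i and j, which always lies between -1 and 1, with 1 on the diagonal. The data and the helper `pearson` are hypothetical; in pandas, `DataFrame.corr()` computes a Pearson correlation matrix directly.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical attributes (columns of a data matrix)
columns = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],  # rises with A
    "C": [5, 4, 3, 2, 1],   # falls as A rises
}
names = list(columns)

# Cell [i][j] holds the correlation between attribute i and attribute j
corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]

for name, row in zip(names, corr):
    print(name, [round(r, 2) for r in row])
```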

20
Q

What is the range of values in a correlation matrix cell?

A

-1 to 1

21
Q

A bubble chart is a variation of a scatterplot that uses a third variable to determine the size of each bubble

A

True

22
Q

In a tag cloud, the importance of a tag is represented by

A

font size or color

23
Q

What are methods for automatically filling in missing values in a dataset?

A

Use a global constant, the attribute mean, the attribute mean of samples in the same class, or the most probable value
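A sketch of the first three strategies on made-up (class, income) records. The constant -1.0 is an arbitrary placeholder chosen for illustration. Filling in the most probable value usually needs an inference model (for example a Bayesian or decision-tree model), so it is left out of this sketch.

```python
from statistics import mean

# Hypothetical records: (class label, income); None marks a missing income
rows = [("A", 50.0), ("A", None), ("A", 70.0), ("B", 20.0), ("B", None), ("B", 40.0)]
known = [v for _, v in rows if v is not None]

# 1) Global constant: every missing value gets the same placeholder
fill_constant = [(c, -1.0 if v is None else v) for c, v in rows]

# 2) Attribute mean: use the mean of all known values of the attribute
attr_mean = mean(known)
fill_attr_mean = [(c, attr_mean if v is None else v) for c, v in rows]

# 3) Class mean: use the mean of known values from samples of the same class
class_means = {
    cls: mean([v for c, v in rows if c == cls and v is not None])
    for cls in {c for c, _ in rows}
}
fill_class_mean = [(c, class_means[c] if v is None else v) for c, v in rows]

print(attr_mean)                    # 45.0
print(sorted(class_means.items()))  # [('A', 60.0), ('B', 30.0)]
```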

24
Q

A high correlation between attributes A and B may suggest redundancy

A

True

25
Q

What does sampling without replacement mean?

A

Once an object is selected, it is removed from the population

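A sketch contrasting the two kinds of sampling with the standard `random` module: `sample` draws without replacement, `choices` draws with replacement. The population and the seed are arbitrary.

```python
import random

population = list(range(1, 11))
rng = random.Random(42)  # fixed seed only so the run is repeatable

# Without replacement: an object cannot be picked twice
without = rng.sample(population, k=5)

# With replacement: the same object may be picked more than once
with_repl = rng.choices(population, k=5)

print(without)
print(with_repl)
```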
26
Q

What is stratified sampling?

A

Partition the data and sample proportionally from each partition

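A minimal sketch of proportional stratified sampling. The helper `stratified_sample` and the customer data are hypothetical. Each stratum contributes the same fraction, so a small group (here "senior") is still represented in the sample.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction of records from each stratum defined by key."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)          # partition the data into strata
    sample = []
    for group in strata.values():
        k = round(len(group) * fraction)  # proportional share of this stratum
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical data: 80 "young" and 20 "senior" customers
customers = [("young", i) for i in range(80)] + [("senior", i) for i in range(20)]
s = stratified_sample(customers, key=lambda r: r[0], fraction=0.1)

young = sum(1 for r in s if r[0] == "young")
senior = sum(1 for r in s if r[0] == "senior")
print(young, senior)  # 8 2
```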
27
Q

What is lossless data reduction?

A

Compression where the original data can be reconstructed without information loss

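A quick illustration with the standard `zlib` module, which implements lossless DEFLATE compression: decompressing returns exactly the original bytes. The input is made up and deliberately repetitive so the size reduction is visible.

```python
import zlib

original = b"ABABABABABABABABABABABABABABABAB" * 10
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # compressed is much smaller for repetitive data
print(restored == original)            # True: exact reconstruction
```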
28
Q

What is the correct definition of normalization?

A

Scaling data to a specified range such as -1.0 to 1.0

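One common way to normalize is min-max normalization, which maps the minimum value to the new lower bound and the maximum to the new upper bound linearly (z-score normalization and decimal scaling are alternatives). A sketch with hypothetical income values and a helper name of our own:

```python
def min_max_normalize(values, new_min=-1.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant attribute")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

incomes = [12000, 30000, 54000, 98000]
print(min_max_normalize(incomes))            # smallest maps to -1.0, largest to 1.0
print(min_max_normalize(incomes, 0.0, 1.0))  # same data rescaled to [0.0, 1.0]
```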
29
Q

What does equal-depth frequency binning do?

A

Divides the range into N intervals, each containing approximately the same number of samples

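A sketch of equal-depth (equal-frequency) binning on a small sorted price list. Unlike equal-width binning, which uses intervals of the same size, each bin here receives the same number of values. When the count does not divide evenly, this sketch gives the extra values to the first bins, which is one choice among several.

```python
def equal_depth_bins(values, n_bins):
    """Split sorted values into n_bins bins holding (about) the same number of items."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)  # spread any remainder over the first bins
        bins.append(data[start:end])
        start = end
    return bins

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
for b in equal_depth_bins(prices, 3):
    print(b)
# [4, 8, 15]
# [21, 21, 24]
# [25, 28, 34]
```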
30
Q

What kind of learning uncovers patterns in unlabeled data?

A

Unsupervised learning

31
Q

In supervised learning, what is used to label the data?

A

Class labels

32
Q

What are the two main types of supervised learning?

A

Classification and regression

33
Q

What type of learning is fraud detection?

A

Classification

34
Q

What kind of problems is KNN (k-nearest neighbors) used for?

A

Classification

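A minimal k-nearest-neighbors classifier written from scratch to show the idea; a real project would more likely use a library implementation. The "fraud"/"legit" points are invented, and `knn_predict` is a hypothetical helper that uses Euclidean distance and a majority vote.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # train is a list of (feature_vector, label) pairs
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 1.0), "legit"), ((1.2, 0.8), "legit"), ((0.9, 1.1), "legit"),
    ((5.0, 5.0), "fraud"), ((5.2, 4.9), "fraud"), ((4.8, 5.1), "fraud"),
]
print(knn_predict(train, (1.1, 1.0)))  # legit
print(knn_predict(train, (4.9, 5.0)))  # fraud
```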
35
Q

Which sets are used to evaluate a model's performance?

A

Training set and test set (the model is fit on the training set and evaluated on the held-out test set)

36
Q

Logistic regression is used for classification problems

A

True

37
Q

What curve is used to visualize the trade-off between the true positive rate (TPR) and the false positive rate (FPR)?

A

ROC (receiver operating characteristic) curve

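A sketch of how the points of an ROC curve are obtained: sweep the decision threshold over the classifier's scores and record FPR = FP / (FP + TN) against TPR = TP / (TP + FN) at each step. The scores, labels (1 = positive), and the helper `roc_points` are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    pos = sum(labels)            # actual positives (TP + FN)
    neg = len(labels) - pos      # actual negatives (FP + TN)
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```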
38
Q

Why is it called unsupervised learning?

A

Because it finds patterns without using labeled data

39
Q

In KMeans, each cluster is represented by what?

A

The centroid

40
Q

What defines a good clustering?

A

Tight clusters with closely bunched samples

41
Q

What does inertia measure in clustering?

A

The sum of squared distances from each sample to the centroid of its cluster (lower inertia means tighter clusters)

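A sketch computing inertia for a given clustering: the sum of squared distances from each sample to its cluster's centroid. The points, centroids, and assignments are hypothetical; scikit-learn's `KMeans` reports this quantity as its `inertia_` attribute after fitting.

```python
import math

def inertia(samples, centroids, assignments):
    """Sum of squared distances from each sample to the centroid of its cluster."""
    return sum(
        math.dist(x, centroids[c]) ** 2 for x, c in zip(samples, assignments)
    )

samples = [(1, 1), (1, 3), (8, 8), (8, 10)]
centroids = [(1, 2), (8, 9)]  # each cluster is represented by its centroid
assignments = [0, 0, 1, 1]    # cluster index for each sample
print(inertia(samples, centroids, assignments))  # 4.0
```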
42
Q

Principal components in PCA are directions of maximum variance

A

True

43
Q

What are the goals of PCA?

A

Visualizing data, data reduction, and feature extraction

44
Q

What is the first step of PCA called and what does it do?

A

De-correlation: it rotates the samples so they are aligned with the coordinate axes and shifts them so they have mean zero

45
Q

What is intrinsic dimension in PCA?

A

Number of PCA features with significant variance

46
Q

What does the first principal component represent?

A

The direction in which the data varies the most

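A sketch of PCA with NumPy on synthetic, strongly correlated 2-D data (assumes NumPy is installed). It centers the data, rotates it onto the eigenvectors of the covariance matrix (de-correlation), and orders the components by variance. Only the first PCA feature has significant variance here, so the intrinsic dimension of this toy data is about 1.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data in which the second attribute is roughly twice the first
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.3, size=200)])

# De-correlation: shift to mean zero, then rotate onto the principal axes
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
transformed = centered @ eigvecs         # the PCA features

print("first principal component:", eigvecs[:, 0])
print("variance of each PCA feature:", eigvals)
print("correlation after PCA:", np.corrcoef(transformed, rowvar=False)[0, 1])
```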
47
Q

In a word frequency array, what does each column represent and what is its value?

A

Each column is a word, and each value is that word's frequency in the document for that row

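A sketch building a word frequency array from three made-up documents with `collections.Counter`: one row per document, one column per vocabulary word. scikit-learn's `CountVectorizer` produces the same kind of array, returned as a sparse matrix.

```python
from collections import Counter

documents = ["cats chase mice", "dogs chase cats", "mice eat cheese"]
tokenized = [doc.split() for doc in documents]

# One column per distinct word, in alphabetical order
vocabulary = sorted({word for words in tokenized for word in words})

# One row per document; each cell counts how often that column's word appears
counts = [Counter(words) for words in tokenized]
array = [[c[word] for word in vocabulary] for c in counts]

print(vocabulary)
for row in array:
    print(row)
```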
48
Q

What is a csr_matrix and why is it used?

A

A SciPy sparse matrix (compressed sparse row format) that stores only the non-zero values, saving memory when most entries are zero
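A hand-written sketch of the compressed sparse row (CSR) layout, to show why it saves space: only the non-zero values are stored, together with their column indices and a row-pointer array. The helpers `to_csr` and `csr_get` are hypothetical; `scipy.sparse.csr_matrix` exposes the same three arrays as `.data`, `.indices`, and `.indptr`.

```python
def to_csr(dense):
    """Build CSR arrays (data, indices, indptr) from a dense list-of-lists matrix."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)   # the non-zero value itself
                indices.append(col)  # its column index
        indptr.append(len(data))     # where the next row starts in data/indices
    return data, indices, indptr

def csr_get(data, indices, indptr, i, j):
    """Look up entry (i, j); anything not stored is zero."""
    for k in range(indptr[i], indptr[i + 1]):
        if indices[k] == j:
            return data[k]
    return 0

dense = [
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 2, 0, 4],
]
data, indices, indptr = to_csr(dense)
print(data)     # [3, 1, 2, 4]
print(indices)  # [2, 0, 1, 3]
print(indptr)   # [0, 1, 2, 2, 4]
```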