Test 2 Flashcards
(136 cards)
The goal of PCA is to find a high-dimensional representation of the data that maintains as much information as possible.
F
Machine learning algorithms are readily available in our tools, such as Altair AI Studio, Azure ML Studio, Python libraries, etc. As a result, it is no longer important to understand the mathematical principles and assumptions behind the algorithms.
F
Linear algebra, calculus, optimization, and probability theory are the four mathematical fields mentioned in the textbook.
T
A derivative is a measure of the sensitivity of a function to changes in the function’s input(s).
T
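A minimal sketch of the idea, assuming a central-difference approximation (the function names and the step size h are illustrative choices, not from the textbook): nudge the input slightly and measure how much the output moves.

```python
def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of how fast f changes at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    return x * x


# The exact derivative of x^2 is 2x, so the estimate at x = 3 should be near 6.
slope_at_3 = numerical_derivative(square, 3.0)
print(slope_at_3)
```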
Vectors and matrices are the building blocks in linear algebra.
T
The transpose of a matrix is the matrix with its rows and columns interchanged.
T
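As a quick illustration, a transpose can be sketched in plain Python by turning each column into a row (the helper name transpose and the sample matrix are made up for this example):

```python
def transpose(matrix):
    """Return a new matrix whose rows are the columns of the input."""
    return [list(column) for column in zip(*matrix)]


a = [[1, 2, 3],
     [4, 5, 6]]      # 2 rows, 3 columns

a_t = transpose(a)   # 3 rows, 2 columns
print(a_t)           # [[1, 4], [2, 5], [3, 6]]
```

Transposing twice returns the original matrix.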
Master data management does not require data governance.
F
Data classification is a way to define the various levels of confidentiality/security required by the organization.
T
Symmetric encryption is a good way for a bank to authenticate your account credentials.
F
Information security involves protecting data from unauthorized access.
T
Master data is the contextual data about the organization/entity used to increase the informativeness of transaction data.
T
Modeling in data science involves creating representations of real-world phenomena.
T
In information security, defense in depth is the concept that an organization uses multiple layers of security to protect sensitive/valuable data assets.
T
Multiple linear regression improves the power of your analysis by quantifying the cumulative effect of all features.
T
While doing our bivariate analysis, we found that the following attributes had the following r-square values with respect to the label attribute:
age: 0.29
education: 0.14
experience: 0.16
Given this information, we should expect that a regression model using these same three attributes against the same label will have an R-square value of 0.59.
F
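One way to see why bivariate r-square values do not simply add up is a small sketch with two correlated features. The data below is made up, and the combined R-square uses the standard two-predictor identity written in terms of pairwise correlations; this is an illustration with two features, not the three-attribute model from the card.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared_two_features(x1, x2, y):
    """R-square of a least-squares fit of y on x1 and x2 (two-predictor identity)."""
    r1 = pearson_r(x1, y)
    r2 = pearson_r(x2, y)
    r12 = pearson_r(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Hypothetical data: the two features overlap heavily in what they explain.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [a + b for a, b in zip(x1, x2)]

sum_of_bivariate = pearson_r(x1, y) ** 2 + pearson_r(x2, y) ** 2
combined = r_squared_two_features(x1, x2, y)

print(round(sum_of_bivariate, 3))  # 1.829, impossible as an R-square
print(round(combined, 3))          # 1.0
```

Because the features share information, adding their individual r-square values double-counts the overlap.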
R² represents how well the regression model explains the variance in the label value.
T
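A minimal sketch of that definition, assuming the usual formula of one minus the ratio of residual to total sum of squares (the sample values are hypothetical):

```python
def r_squared(actual, predicted):
    """Share of the label's variance explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [2.0, 4.0, 6.0, 8.0]
good_predictions = [2.5, 3.5, 6.5, 7.5]   # close to the actual values
mean_only = [5.0, 5.0, 5.0, 5.0]          # always predicts the mean

print(r_squared(actual, good_predictions))  # about 0.95
print(r_squared(actual, mean_only))         # 0.0
```

A model that only ever predicts the mean explains none of the variance, so it scores 0.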
Multiple linear regression (MLR) is but one of many algorithms used for multivariate modeling.
T
Scaling techniques such as LogNormal, MinMax, Z-score, Tanh, and Logistic are used to adjust the values of numeric variables.
T
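Two of the listed techniques, MinMax and Z-score, can be sketched with the standard library alone. The function names and the sample ages are illustrative, and the Z-score here assumes the population standard deviation; some tools use the sample version instead.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Center values on 0 and express them in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 60]
scaled_min_max = min_max_scale(ages)
scaled_z = z_score_scale(ages)

print(scaled_min_max)  # [0.0, 0.25, 0.5, 1.0]
print(scaled_z)
```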
K-means clustering is more robust to outliers than k-medoids clustering.
F
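The difference shows up even in one dimension. In this sketch (made-up points, hypothetical helper names), the mean used by k-means is dragged toward a single outlier, while the medoid stays on an actual, central data point.

```python
def centroid(points):
    """Cluster center as used by k-means: the arithmetic mean."""
    return sum(points) / len(points)


def medoid(points):
    """Cluster center as used by k-medoids: the member closest to all others."""
    return min(points, key=lambda c: sum(abs(c - p) for p in points))


cluster = [1, 2, 3, 4, 100]   # 100 is an outlier

print(centroid(cluster))  # 22.0, far from most of the points
print(medoid(cluster))    # 3, an actual point near the bulk of the data
```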
Cluster analysis is a form of supervised learning.
F
Lower values of the Calinski-Harabasz criterion indicate a better clustering solution.
F
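A rough sketch of the criterion for one-dimensional data, assuming the usual definition as the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom (the data and the function name are illustrative):

```python
def calinski_harabasz(clusters):
    """Calinski-Harabasz score for a list of 1-D clusters (higher is better)."""
    all_points = [p for cluster in clusters for p in cluster]
    n, k = len(all_points), len(clusters)
    overall_mean = sum(all_points) / n

    between = 0.0  # dispersion of cluster means around the overall mean
    within = 0.0   # dispersion of points around their own cluster mean
    for cluster in clusters:
        cluster_mean = sum(cluster) / len(cluster)
        between += len(cluster) * (cluster_mean - overall_mean) ** 2
        within += sum((p - cluster_mean) ** 2 for p in cluster)

    return (between / (k - 1)) / (within / (n - k))


tight_score = calinski_harabasz([[1, 2, 3], [11, 12, 13]])   # well separated
mixed_score = calinski_harabasz([[1, 2, 11], [3, 12, 13]])   # poorly separated

print(tight_score)  # 150.0
print(mixed_score)  # roughly 1.08
```

The compact, well-separated grouping gets the much higher score.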
K-medoids clustering identifies an actual data point for each cluster that is most centrally located.
T
Clustering requires us to specify a label attribute.
F
A 2-nearest neighbor model is more likely to overfit than a 20-nearest neighbor model.
T
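The same trade-off can be sketched with a tiny hand-rolled classifier. This example uses k = 1 versus k = 5 rather than 2 versus 20 to avoid tied votes, and the training data, including one deliberately mislabeled point, is made up.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to query (1-D)."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Points 1..7 belong to class "A", except a noisy "B" label at x = 4.
train = [(1, "A"), (2, "A"), (3, "A"), (4, "B"), (5, "A"), (6, "A"), (7, "A"),
         (20, "B"), (21, "B"), (22, "B"), (23, "B"), (24, "B")]

small_k = knn_predict(train, 4.1, k=1)  # follows the single noisy neighbor
large_k = knn_predict(train, 4.1, k=5)  # the noise is outvoted

print(small_k)  # B
print(large_k)  # A
```

A small k memorizes individual training points, noise included, which is the overfitting the card refers to.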