Supervised Machine Learning Flashcards

(28 cards)

1
Q

Ordinary Least Squares (OLS)

A

A regression technique that minimizes the sum of squared differences between actual and predicted values.

2
Q

Objective of OLS

A

Minimize the cost function, the sum of squared residuals: $J(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
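
As a rough illustration of the quantity OLS minimizes, the sketch below computes the sum of squared differences between actual and predicted values with NumPy. The function name `sum_squared_residuals` and the sample numbers are made up for this example and do not come from the lesson.

```python
import numpy as np


def sum_squared_residuals(y_true, y_pred):
    """Return the sum of squared differences between actual and predicted values."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sum(residuals ** 2))


# Made-up values, for illustration only.
y_actual = [3.0, 5.0, 7.0]
y_predicted = [2.5, 5.0, 8.0]

# Residuals are 0.5, 0.0 and -1.0, so the cost is 0.25 + 0.0 + 1.0.
cost = sum_squared_residuals(y_actual, y_predicted)
```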

3
Q

Linear Regression Equation

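A

$y = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n + \varepsilon$, where y is the dependent variable, the x_i are the independent variables, the β are the regression coefficients, and ε is the error term.
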
4
Q

Independent Variable

A

Variables (features) used to predict the target variable.

5
Q

Dependent Variable

A

The target variable or output being predicted (e.g., RH - Relative Humidity).

6
Q

Dataset Used

A

Air Quality UCI dataset.

7
Q

Data Cleaning Steps

A

Drop NaNs, convert decimal commas in floats to decimal points, remove columns with >10% missing values, replace remaining -200 values (the dataset's missing-value marker) with the column median.
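
The cleaning steps might look roughly like the sketch below, run on a small made-up frame rather than the real file. The column values, the `max_missing_share` name, the choice to drop only fully empty rows, and the exact order of operations are assumptions for illustration, not the lesson's actual code.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw data: decimal commas stored as text, -200 marking missing readings.
raw = pd.DataFrame({
    "T": ["13,6", "13,3", "11,9", "11,0", "11,2", "11,2", "11,3", "10,7", "10,7", "10,3"],
    "CO(GT)": [2.6, 2.0, 2.2, 2.2, 1.6, 1.2, -200.0, 1.0, 0.9, 0.6],
    "NMHC(GT)": [150.0] + [-200.0] * 9,
})

max_missing_share = 0.10  # hypothetical name for the 10% threshold

clean = raw.dropna(how="all")  # drop fully empty rows (assumed meaning of "drop NaNs")
clean["T"] = clean["T"].str.replace(",", ".", regex=False).astype(float)

# Treat -200 as missing so its share per column can be measured.
clean = clean.replace(-200, np.nan)
missing_share = clean.isna().mean()

# Keep only columns whose missing share does not exceed the threshold.
clean = clean.loc[:, missing_share <= max_missing_share]

# Fill the remaining gaps with each column's median.
clean = clean.fillna(clean.median())
```

Here `NMHC(GT)` is dropped because 90% of its values are missing, while the single gap in `CO(GT)` is filled with that column's median.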

8
Q

Date Parsing

A

Convert ‘Date’ column into datetime objects and extract Year, Month, Day, and Day Name.
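
A minimal sketch of this step with pandas, using three made-up dates. The day-first format string `%d/%m/%Y` and the new column names are assumptions; adjust them to match the actual file.

```python
import pandas as pd

# Made-up sample rows; the real column would come from the dataset.
df = pd.DataFrame({"Date": ["10/03/2004", "11/03/2004", "04/04/2005"]})

# Parse the text into datetime objects (day-first format assumed).
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Derive calendar parts through the .dt accessor.
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day
df["Day Name"] = df["Date"].dt.day_name()
```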

9
Q

Pandas .groupby()

A

Used to group data by ‘Month’ for 2004 and 2005 separately.
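
One way the per-year grouping could be written is sketched below on made-up numbers. The card does not say which aggregate was used, so taking the monthly mean of RH is an assumption.

```python
import pandas as pd

# Made-up rows with the Year and Month columns already extracted.
df = pd.DataFrame({
    "Year": [2004, 2004, 2004, 2005, 2005],
    "Month": [3, 3, 4, 1, 1],
    "RH": [48.0, 52.0, 60.0, 40.0, 44.0],
})

# Filter each year first, then group its rows by month.
monthly_2004 = df[df["Year"] == 2004].groupby("Month")["RH"].mean()
monthly_2005 = df[df["Year"] == 2005].groupby("Month")["RH"].mean()
```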

10
Q

Sklearn LinearRegression

A

Used to fit a linear model to predict Relative Humidity (RH).

11
Q

Train-Test Split

A

Performed with 33% test size using train_test_split().
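
A short sketch of a 33% split on synthetic arrays. The data and the `random_state=0` seed are assumptions added to make the example reproducible; the card only specifies the test size.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Hold out roughly a third of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
```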

12
Q

Regression Coefficients

A

Values assigned to features representing their contribution to prediction.
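
To make the idea of coefficients concrete, the sketch below fits sklearn's `LinearRegression` to a synthetic, noise-free target built from known weights and then reads the fitted values back. The data and weights are invented for illustration and are unrelated to the air-quality model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

# Noise-free target built from known weights: intercept 1, coefficients 2 and -3.
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

model = LinearRegression().fit(X, y)

coefficients = model.coef_    # one value per feature
intercept = model.intercept_  # constant term
```

Because the target has no noise, the fitted coefficients recover the weights used to build it, up to floating-point error.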

13
Q

Model Evaluation Metrics

A

R², MSE, RMSE, MAE (from sklearn.metrics).
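
The four metrics can be computed as sketched below on a tiny made-up set of predictions. RMSE is taken here as the square root of MSE with NumPy, which is one common way to get it.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = float(np.sqrt(mse))                 # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
r2 = r2_score(y_true, y_pred)              # share of variance explained
```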

14
Q

Statsmodels OLS

A

Alternative to sklearn to get detailed regression summary including p-values, standard errors.

15
Q

Interpreting Coefficients

A

Shows how each feature impacts the target RH, e.g., AH has a large positive coefficient.

16
Q

Heatmap (correlation matrix)

A

Used to visualize correlations between features (with seaborn.heatmap).
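
The matrix behind such a heatmap can be produced with pandas, as in this sketch on made-up values. The plotting call is left as a comment so the example runs without seaborn installed.

```python
import pandas as pd

# Made-up values: AH rises with T, RH falls with T.
df = pd.DataFrame({
    "T": [10.0, 15.0, 20.0, 25.0],
    "AH": [0.5, 0.75, 1.0, 1.25],
    "RH": [70.0, 60.0, 50.0, 40.0],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# To draw it, one would typically pass the matrix to seaborn:
# import seaborn as sns
# sns.heatmap(corr, annot=True)
```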

17
Q

OLS Summary Output

A

Displays R², F-statistic, coefficients, confidence intervals, etc.

18
Q

Matrix Equation for OLS

A

$XB = E$, where X = feature matrix, B = coefficient vector, E = target vector.

19
Q

Matrix Dimensions

A

Product valid only if cols of first = rows of second; resulting matrix has rows of first, cols of second.

20
Q

Matrix Multiplication (NumPy)

A

Done using np.matmul() or @ operator.
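
A small sketch of the shape rule and the two NumPy spellings, using arrays of ones chosen only for illustration.

```python
import numpy as np

A = np.ones((2, 3))  # 2 rows, 3 columns
B = np.ones((3, 4))  # 3 rows, 4 columns

# Columns of A (3) match rows of B (3), so the product is defined
# and has the rows of A and the columns of B: shape (2, 4).
C = A @ B
D = np.matmul(A, B)  # same operation, function form
```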

21
Q

Identity Matrix (Mentioned)

A

Matrix with 1s on the diagonal, used to solve systems of linear equations (to be covered in next lesson).
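
The sketch below shows two properties that make the identity matrix relevant to solving linear systems: multiplying by it leaves a matrix unchanged, and a matrix times its inverse gives the identity. The matrix `A` and vector `b` are made-up values, and using `np.linalg.solve` is an assumption about how such a system might be solved, not necessarily the next lesson's approach.

```python
import numpy as np

I = np.eye(2)  # 2x2 identity matrix
A = np.array([[8.0, 7.0], [3.0, -5.0]])
b = np.array([1.0, 2.0])

unchanged = A @ I                    # multiplying by I returns A
product = A @ np.linalg.inv(A)       # approximately I for an invertible A

# Solve the system A x = b for x.
x = np.linalg.solve(A, b)
```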

22
Q

Creating Matrices (NumPy)

A

Use np.matrix() or nested lists, e.g., np.matrix("8 7;3 -5").

23
Q

Invalid Multiplication

A

Multiplying matrices with mismatched dimensions leads to ValueError.
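
A minimal sketch of the failure case, using two arrays whose inner dimensions do not match. The shapes are picked only for illustration.

```python
import numpy as np

A = np.ones((2, 3))
B = np.ones((2, 3))  # rows of B (2) do not match columns of A (3)

try:
    A @ B
    error_message = None
except ValueError as exc:
    # NumPy reports the dimension mismatch in the exception text.
    error_message = str(exc)
```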

24
Q

Feature Columns (Air Data)

A

PT08.S1(CO), C6H6(GT), NMHC(GT), NOx(GT), NO2(GT), O3(GT), T, RH, AH, Year, Month, Day

25
Q

Feature Selection for Model

A

All features except RH used as predictors.

26
Q

Reason for RH prediction

A

Relative Humidity chosen as target to be predicted via multiple regression.

27
Q

Assumption in OLS

A

Errors (ε) are normally distributed with mean 0 and constant variance.

28
Q

Multicollinearity Note

A

Condition number in statsmodels output hints at possible multicollinearity.
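
To show why a large condition number suggests multicollinearity, the sketch below compares two synthetic design matrices. It uses NumPy's `np.linalg.cond` as a stand-in for the figure statsmodels prints, and the data and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)

# A near-copy of x1: almost perfectly collinear with it.
x1_copy = x1 + 1e-6 * rng.normal(size=100)

# Design matrices with an intercept column of ones.
X_ok = np.column_stack([np.ones(100), x1, x2])
X_bad = np.column_stack([np.ones(100), x1, x1_copy])

cond_ok = np.linalg.cond(X_ok)    # small for unrelated features
cond_bad = np.linalg.cond(X_bad)  # very large for collinear features
```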