MFDS 2 endsem Flashcards
(11 cards)
SSE, MSE and MAE
SSE - sum of squared residuals
- lower value = better fit (not size-independent: grows with the number of observations)
MSE - mean of squared residuals
- penalizes large errors more due to squaring
MAE - average of absolute differences between actual and predicted values
- more robust to outliers
- treats all errors equally
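A minimal sketch of the three metrics side by side; the actual/predicted values below are made up for illustration:

```python
# Compute SSE, MSE and MAE for a set of actual vs predicted values.
y_true = [3.0, 5.0, 2.0, 8.0]   # hypothetical actuals
y_pred = [2.5, 5.0, 4.0, 7.0]   # hypothetical predictions

residuals = [t - p for t, p in zip(y_true, y_pred)]

sse = sum(r ** 2 for r in residuals)                   # sum of squared residuals
mse = sse / len(residuals)                             # mean of squared residuals
mae = sum(abs(r) for r in residuals) / len(residuals)  # mean absolute error

print(sse, mse, mae)  # 5.25 1.3125 0.875
```

Note how the single large residual (-2) dominates SSE/MSE but contributes only proportionally to MAE.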
What is VIF and its importance?
VIF (Variance Inflation Factor) - a diagnostic tool in multiple linear regression used to detect multicollinearity
1 Detects multicollinearity
- coefficient estimates become unstable and unreliable
2 Improves interpretability
- hard to isolate the effect of each predictor when predictors are correlated
3 Prevents overfitting
- multicollinearity increases complexity without adding new info
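A minimal sketch of the idea, assuming the two-predictor case where VIF = 1/(1 - r²) with r the correlation between the predictors (the data is made up; with more predictors, the R² from regressing each predictor on all the others is used instead):

```python
import math

# Two hypothetical predictors; x2 is almost exactly 2 * x1 (near-perfect collinearity).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.1, 6.0, 8.2, 10.0]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den

r = corr(x1, x2)
vif = 1.0 / (1.0 - r ** 2)  # VIF = 1 / (1 - R^2); with two predictors R^2 = r^2
print(round(vif, 1))        # far above the common rule-of-thumb cutoff of ~10
```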
Variable selection techniques
Help determine the most relevant predictors for a model
1 Filter
- based on statistical tests (ANOVA, Chi-square test)
2 Wrapper
- use a predictive model to evaluate different subsets of variables (Forward selection, Backward elimination)
3 Embedded
- perform variable selection as part of model training (Lasso regression, Tree-based)
4 Dimensionality reduction
- transform variables into a reduced set (PCA)
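A minimal filter-style sketch: score each hypothetical feature independently by its absolute correlation with the target and keep the top k (the feature names and data are made up):

```python
import math

# Hypothetical target and candidate features.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "f1": [1.1, 1.9, 3.2, 3.9, 5.1],  # tracks y closely
    "f2": [2.0, 1.0, 3.0, 1.0, 2.0],  # unrelated to y
}

def abs_corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return abs(num / den)

# Filter step: rank every feature by its individual score, keep the best k.
k = 1
ranked = sorted(features, key=lambda name: abs_corr(features[name], y), reverse=True)
selected = ranked[:k]
print(selected)  # ['f1']
```

Wrapper methods differ in that they would refit a model for every candidate subset instead of scoring features one at a time.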
What are the types of time series?
1 Based on number of variables
Univariate (a single value wrt time)
Multivariate (stock values wrt time)
2 Based on behaviour
Stationary (atmospheric temperature wrt time)
Non-stationary (house pricing wrt time)
3 Based on seasonality
Seasonal (winter clothing sales)
Trend (fashion, stock rise)
4 Based on type of data collection
Continuous (breathing)
Discrete (shopkeeper profit)
Irregular (earthquakes - random observations)
What is Stationarity and auto-correlation
Stationarity - statistical properties (mean, variance, auto-covariance) stay the same over time
Auto-correlation - correlation between a time series and its lagged versions
PACF (partial auto-correlation function) - correlation with a given lag after removing the effect of the intermediate lags
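A minimal sketch of lag-k autocorrelation on a made-up series (a steadily increasing series, so lag-1 correlation is strongly positive):

```python
# Lag-k autocorrelation: correlation of the series with a shifted copy of itself.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

def autocorr(series, k):
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

print(autocorr(x, 1))  # 0.625 - strong positive lag-1 correlation for a trending series
```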
Provide an example where seasonal decomposition is necessary
Seasonal decomposition - breaking down a time series into its distinct components (trend, seasonal, residual).
Need :
Improve forecast accuracy
Increase interpretability
Isolate seasonal effects
Understand structure of the data
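A minimal sketch of classical additive decomposition: estimate the trend with a centered moving average, then estimate the seasonal component by averaging the detrended values per phase. The series below is made up (linear trend plus a repeating pattern of period 4):

```python
# Made-up series: linear trend t plus a repeating seasonal pattern of period 4.
period = 4
pattern = [5.0, 0.0, -5.0, 0.0]
x = [t + pattern[t % period] for t in range(12)]

# Trend: 2x4 centered moving average (weights 0.5, 1, 1, 1, 0.5 span one full period).
half = period // 2
trend = {}
for t in range(half, len(x) - half):
    window = 0.5 * x[t - half] + sum(x[t - half + 1 : t + half]) + 0.5 * x[t + half]
    trend[t] = window / period

# Seasonal: average the detrended values by phase within the period.
detrended = {t: x[t] - trend[t] for t in trend}
seasonal = []
for phase in range(period):
    vals = [v for t, v in detrended.items() if t % period == phase]
    seasonal.append(sum(vals) / len(vals))

print(seasonal)  # recovers the pattern [5.0, 0.0, -5.0, 0.0]
```

This separation is what makes the seasonal effect interpretable on its own and lets a forecast model the trend without the seasonal noise.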
Difference between stationary and non-stationary time series
Mean (constant vs changing over time)
Variance (constant vs changing/increasing)
Auto-covariance (depends only on lag vs depends on time)
Trend (absent vs present)
Seasonality (absent vs often present)
Forecasting (easier vs harder; non-stationary series usually need transformation first)
Example (white noise vs house prices)
Optimization and its types
In ML and DS, optimization refers to the process of adjusting model parameters to minimize or maximize an objective function
Convex (Linear Regression - minimize a convex function)
Gradient-based (Neural Networks - uses the derivative of the loss function)
Gradient-free (Genetic Algorithm - when the derivative is not available)
Constrained (optimization under constraints - Lagrange multipliers, Quadratic programming)
Unconstrained (no constraints - Linear Regression)
Discrete (variables take only integer values - Travelling Salesman Problem)
Continuous (variables take any real value - Neural Networks, Gradient Descent)
Gradient descent method
Iteratively minimizes a differentiable function by stepping in the direction opposite the gradient
x_new = x_old - α f'(x_old)
α = learning rate
ex. f(x) = x^2 + 4x + 4 (minimum at x = -2)
5 iterations starting from x0 = 5.00
with α = 0.1
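The update rule applied to the card's example (x0 = 5.00, α = 0.1, f'(x) = 2x + 4):

```python
def f_prime(x):
    return 2 * x + 4   # derivative of f(x) = x^2 + 4x + 4

alpha = 0.1            # learning rate
x = 5.0                # starting point

for i in range(5):
    x = x - alpha * f_prime(x)
    print(f"iteration {i + 1}: x = {x:.5f}")

# x shrinks toward the true minimum at x = -2; after 5 steps x ~ 0.29376,
# so more iterations (or a larger alpha) are needed to get close.
```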
Constrained Optimization and its techniques
minimize or maximize objective function subject to constraints (equality or inequality)
1 Lagrange multipliers (turn a constrained problem into an unconstrained one, for equality constraints)
2 Karush-Kuhn-Tucker (generalization of Lagrange multipliers, handles inequality constraints)
3 Penalty method (add a penalty term for constraint violation; a large penalty pushes the solution toward feasibility)
4 Barrier method (add a barrier term to the objective function to keep the solution out of the infeasible region)
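A minimal penalty-method sketch (the problem and all the numbers are made up): minimize f(x) = x² subject to x ≥ 1 by adding μ·max(0, 1 - x)² to the objective and running plain gradient descent on the penalized function.

```python
mu = 100.0   # penalty weight; larger values push the solution closer to feasibility
lr = 0.005   # gradient-descent step size
x = 0.0      # infeasible starting point (violates x >= 1)

for _ in range(2000):
    violation = max(0.0, 1.0 - x)
    grad = 2 * x - 2 * mu * violation   # d/dx of x^2 + mu * max(0, 1 - x)^2
    x -= lr * grad

print(round(x, 4))  # ~0.9901, close to the constrained optimum x = 1
```

The fixed point is x = μ/(1 + μ), so the answer approaches the true constrained optimum x = 1 only as μ grows; finite penalties leave a small feasibility gap.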
Meta Heuristic Optimization
High-level, problem-independent strategies for approximate optimization, useful when gradients are unavailable or the search space is rugged
1 Genetic Algorithm
(evolves a population of solutions, ex. feature selection)
2 Particle Swarm Optimization
(particles fly through the solution space, ex. clustering problems)
3 Simulated Annealing
(occasionally accepts worse solutions to escape local optima, ex. Travelling Salesman Problem)
4 Differential Evolution
(works with vectors via recombination and mutation, ex. parameter optimization)
5 Ant Colony Optimization
(ants build solutions based on pheromone trails, ex. shortest paths on graphs)
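A minimal simulated-annealing sketch on a made-up 1-D function; the cooling schedule, step size, and seed are arbitrary illustrative choices:

```python
import math
import random

random.seed(0)

def f(x):
    return (x - 3.0) ** 2   # toy objective with its minimum at x = 3

x = 0.0
best_x, best_f = x, f(x)
temp = 1.0

for step in range(2000):
    candidate = x + random.uniform(-0.5, 0.5)   # random neighbour
    delta = f(candidate) - f(x)
    # Always accept improvements; accept worse moves with probability exp(-delta/temp).
    if delta < 0 or random.random() < math.exp(-delta / temp):
        x = candidate
    if f(x) < best_f:
        best_x, best_f = x, f(x)
    temp *= 0.995                               # geometric cooling

print(round(best_x, 2))
```

Early on the high temperature lets the search accept uphill moves and roam; as temp decays the acceptance rule becomes nearly greedy and the search settles near the minimum.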