Flashcards (25 cards)
What is the importance of data transformation in the preprocessing pipeline?
Data transformation converts raw data into a format suitable for analysis or machine learning. It helps standardize data, handle outliers, normalize scales, and encode categorical variables, ensuring that algorithms can process the data efficiently and accurately. Without transformation, models may misinterpret features, leading to poor performance or biased results.
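A minimal sketch of one such transformation step, assuming scikit-learn and pandas; the column names and values are illustrative only:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Illustrative raw data: one numeric and one categorical feature
df = pd.DataFrame({
    "income": [35000, 52000, 120000, 47000],
    "city": ["Oslo", "Bergen", "Oslo", "Trondheim"],
})

# Scale the numeric column and one-hot encode the categorical one
transform = ColumnTransformer([
    ("scale", StandardScaler(), ["income"]),
    ("encode", OneHotEncoder(), ["city"]),
])

X = transform.fit_transform(df)
print(X)  # numeric values standardized, categories expanded to binary columns
```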
What are the challenges of data preprocessing in real-world machine learning projects and how can data quality be ensured?
Real-world data is often messy, containing missing values, inconsistencies, outliers, and irrelevant features. Challenges include handling incomplete or noisy data, integrating data from multiple sources, and ensuring privacy and security. Data quality can be ensured through validation checks, deduplication, consistent formats across sources, and documenting each cleaning step.
What is the impact of improper handling of missing data and what are common imputation methods?
Improper handling of missing data can introduce bias, reduce model accuracy, or cause algorithms to fail. Common imputation methods include:
- Mean/Median/Mode Imputation: Simple but may distort distributions
- K-Nearest Neighbors (KNN) Imputation: Uses similarity between samples; more accurate but computationally expensive
- Regression Imputation: Predicts missing values using other features
- Dropping Missing Values: Only suitable if missingness is rare and random
The choice of method affects model performance; inappropriate imputation can lead to misleading patterns or overfitting.
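A hedged sketch comparing two of these approaches with scikit-learn; the array values are made up for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

# Toy feature matrix with missing entries (np.nan)
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0],
              [4.0, np.nan]])

# Mean imputation: fast, but can distort the column's distribution
mean_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# KNN imputation: fills values from the most similar rows; costlier but often more faithful
knn_imputed = KNNImputer(n_neighbors=2).fit_transform(X)

print(mean_imputed)
print(knn_imputed)
```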
What is the role of feature engineering and what are some examples?
Feature engineering, in data science, refers to manipulation (addition, deletion, combination, mutation) of your data set to improve machine learning model training, leading to better performance and greater accuracy.
Examples include data binning, scaling, and combining existing features into new ones.
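As a small illustration of one of these techniques, data binning with pandas; the bin edges and labels are arbitrary:

```python
import pandas as pd

ages = pd.Series([5, 17, 24, 36, 52, 71])

# Bin a continuous feature into ordered categories (an engineered feature)
age_group = pd.cut(ages, bins=[0, 18, 35, 60, 100],
                   labels=["child", "young adult", "adult", "senior"])
print(age_group)
```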
Compare label encoding and one-hot encoding for categorical data.
Label encoding assigns each category a unique integer (best for ordinal data); one-hot encoding creates a new binary column for each category (best for nominal data). Label encoding is simple and memory-efficient but imposes an artificial ordinal relationship on nominal data, which can mislead models; one-hot encoding avoids this at the cost of more columns.
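A minimal comparison of the two encodings, assuming scikit-learn; the category values are invented:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

colors = np.array(["red", "green", "blue", "green"])

# Label encoding: one integer per category (implies an order: blue < green < red)
labels = LabelEncoder().fit_transform(colors)
print(labels)

# One-hot encoding: one binary column per category, no implied order
onehot = OneHotEncoder().fit_transform(colors.reshape(-1, 1)).toarray()
print(onehot)
```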
What is an activation function and what is its purpose in neural networks?
An activation function maps a neuron's combined input to its output; it acts as a switch for artificial neurons. Its purpose is to introduce non-linearity and to control when a neuron activates.
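Two common activation functions written in plain NumPy, to make the "switch" behaviour concrete:

```python
import numpy as np

def relu(x):
    # Passes positive inputs through unchanged, zeroes out negative ones
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes any input into (0, 1), acting like a soft on/off switch
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), sigmoid(z))
```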
What are the main types of gradient descent and when should each be used?
Gradient descent is an optimization algorithm used to minimize a function (normally a cost/loss function in machine learning) by iteratively moving in the direction of steepest descent. It is foundational for training models like neural networks and linear regression.
Vanilla (batch) gradient descent (like reviewing all the notes at once):
uses the entire dataset to compute the gradient in each iteration.
Pros: stable, precise result. Cons: heavy computation, slow, may get stuck in a local minimum.
Stochastic gradient descent (like a student reviewing notes with random flashcards):
uses a single randomly chosen sample per iteration to compute the gradient (like a chef tasting the soup while cooking so it can adjust immediately).
Pros: fast updates, can escape local minima due to noise. Cons: noisy convergence, may overshoot the global minimum.
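A rough sketch contrasting the two update styles on a one-parameter least-squares problem; the data, learning rate, and iteration counts are arbitrary:

```python
import numpy as np

# Toy data: y is roughly 3 * x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])
lr = 0.01

# Batch gradient descent: gradient computed from the whole dataset each step
w = 0.0
for _ in range(100):
    grad = np.mean(2 * (w * x - y) * x)   # full-dataset gradient
    w -= lr * grad
print("batch:", w)

# Stochastic gradient descent: gradient from one random sample per step
w = 0.0
rng = np.random.default_rng(0)
for _ in range(400):
    i = rng.integers(len(x))
    grad = 2 * (w * x[i] - y[i]) * x[i]   # single-sample (noisy) gradient
    w -= lr * grad
print("sgd:", w)
```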
What is the difference between random search and grid search for hyperparameter tuning?
Grid search exhaustively tries every combination of the provided hyperparameter values to find the best configuration for the model.
Random search samples configurations at random from the search space.
The search space is a space in which each dimension represents a hyperparameter and each point represents one model configuration.
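A short sketch of both approaches with scikit-learn; the model, parameter values, and dataset are placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0)
params = {"n_estimators": [50, 100, 200], "max_depth": [3, None]}

# Grid search: tries every combination in the grid (3 x 2 = 6 configurations)
grid = GridSearchCV(model, params, cv=3)
grid.fit(X, y)

# Random search: samples a fixed number of points from the same space
rand = RandomizedSearchCV(model, params, n_iter=4, cv=3, random_state=0)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)
```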
How are neural networks trained?
Step 1: Input data is divided into mini-batches and passed through the network (forward pass).
Step 2: The network produces outputs, which are compared to the true labels.
Step 3: A loss function calculates the error between the network’s output and the true label.
Step 4: The error is obtained, i.e. the difference between the prediction and the target, and propagated backward to adjust the weights.
Step 5: Repeat to reduce the error and optimize the model.
Cross Entropy is the distance between what the model believes the output distribution should be & what the original distribution is.
What are the main differences between artificial neurons and neural networks?
An artificial neuron is the basic mathematical unit that processes inputs and produces an output; a neural network is a group of artificial neurons connected and organized into layers.
What are the main types of artificial neural network topologies?
Feedforward neural networks connect neurons of one level to the next without backward connections.
Recurrent neural networks have feedback connections and are suitable for sequence data.
What is imputation of missing values and what are the main methods?
1) Listwise deletion: delete any case (row) that has a missing value.
2) Pairwise deletion: exclude a case only from analyses that require the missing variable.
3) Hot-deck imputation: replace missing values with values from similar cases in the same dataset.
4) Cold-deck imputation: replace missing values with values from similar cases in a different dataset.
5) Mean substitution: replace missing values with the mean of the variable across all other cases.
What is the intuition behind neural network training?
Training involves a forward pass (input passed through the network to produce a prediction), a loss calculation comparing the prediction to the target, a backward pass that assigns blame for the error to each weight, and small weight updates that gradually reduce the error.
What is the “one versus all” strategy for multiclass classification?
It is a strategy for multiclass problems: it provides a way to use binary classification as a series of yes/no predictions across the possible labels.
During training, the model runs through a sequence of binary classifiers, training each to answer a separate classification question.
Example for a pear: is it an apple, yes or no? Is it an orange, yes or no? Is it a pear, yes or no? Is it a grape, yes or no?
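A small sketch of the strategy using scikit-learn's OneVsRestClassifier wrapping a binary classifier; the iris dataset stands in for the fruit example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One binary "is it class k, yes or no?" classifier is trained per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))   # one fitted binary classifier per class (3 for iris)
print(ovr.predict(X[:5]))
```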
What is backpropagation and how does it work?
Backpropagation propagates the error backward through the network to compute how much each weight contributed to the error.
It propagates the error from the output of the ANN back toward its input. We cannot directly compute the derivative of the loss function with respect to the weights of the earlier layers, so the chain rule is applied layer by layer, starting from the output.
Backpropagation is a local process: each neuron only needs its own inputs and the gradient passed back to it; neurons are completely unaware of the complete topology of the network.
What is the simplest form of neural network and what does it do?
The perceptron is the simplest neural network—a single artificial neuron that makes yes/no decisions for linearly separable data. Its limitation is that it cannot handle non-linearity.
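A minimal NumPy perceptron learning the AND function (which is linearly separable); the learning rate and number of passes are arbitrary:

```python
import numpy as np

# AND gate: linearly separable, so a single perceptron can learn it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(20):                          # a few passes over the data
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b > 0)    # step activation: fire or not
        w += lr * (target - pred) * xi       # perceptron learning rule
        b += lr * (target - pred)

print([int(np.dot(w, xi) + b > 0) for xi in X])  # expected: [0, 0, 0, 1]
```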
How can neural networks be trained?
- Forward Propagation:
- Input data is passed through the network’s layers, with each neuron applying weights and an activation function to produce an output.
- Loss Calculation:
- The output is compared to the true target values using a loss function (e.g., mean squared error for regression, cross-entropy for classification).
- Backpropagation:
- The gradient of the loss function with respect to each weight is computed and propagated backward through the network.
- Weight Update:
- The weights are adjusted to reduce the loss, typically using an optimization algorithm like stochastic gradient descent (SGD), Adam, or others.
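A compact sketch of these four steps as a PyTorch training loop; the network size, random data, and learning rate are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder data: 64 samples, 10 features, 3 classes
X = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(10):
    logits = model(X)            # forward propagation
    loss = loss_fn(logits, y)    # loss calculation
    optimizer.zero_grad()
    loss.backward()              # backpropagation: gradients of loss w.r.t. each weight
    optimizer.step()             # weight update
```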
What is the measure used for calculating error in neural networks?
Cross-entropy loss is commonly used.
Cross-entropy loss quantifies the difference between the predicted probability distribution (your model’s output) and the actual true distribution (the correct labels)
For each data point, it compares the predicted probability for the correct class with the actual label (which is 1 for the true class and 0 for all others).
* The loss is calculated using a logarithmic penalty:
* If the model is confident and correct, the loss is small.
* If the model is confident but wrong, the loss is large.
* If the model is uncertain, the loss is moderate.
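A small NumPy illustration of the logarithmic penalty on a single 3-class example; the predicted probabilities are made up:

```python
import numpy as np

def cross_entropy(predicted, actual):
    # -sum(actual * log(predicted)); only the true class's predicted probability matters
    return -np.sum(actual * np.log(predicted))

actual = np.array([0, 1, 0])  # true class is the second one

print(cross_entropy(np.array([0.05, 0.9, 0.05]), actual))   # confident and correct -> small loss
print(cross_entropy(np.array([0.9, 0.05, 0.05]), actual))   # confident but wrong   -> large loss
print(cross_entropy(np.array([0.34, 0.33, 0.33]), actual))  # uncertain             -> moderate loss
```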
What are the two main inputs to the cross-entropy loss measure?
The prediction (output from the neural network) and the actual label (ground truth) are the two main inputs.
How does mini-batch gradient descent address the limitations of stochastic gradient descent?
Mini-batch gradient descent addresses key limitations of Stochastic Gradient Descent (SGD) by balancing computational efficiency with stable convergence.
Mini-batch GD strikes a practical balance—retaining SGD’s speed while mitigating its instability through batch-averaged gradients, making it the default choice for modern deep learning.
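Continuing the earlier gradient-descent sketch, a mini-batch variant that averages the gradient over a small random batch each step; batch size and learning rate are arbitrary:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = 3.0 * x + np.array([0.1, -0.2, 0.1, 0.0, -0.1, 0.2, -0.1, 0.1])
lr, batch_size = 0.01, 4

w = 0.0
rng = np.random.default_rng(0)
for _ in range(200):
    idx = rng.choice(len(x), size=batch_size, replace=False)  # random mini-batch
    grad = np.mean(2 * (w * x[idx] - y[idx]) * x[idx])        # batch-averaged gradient
    w -= lr * grad
print(w)  # close to the true slope of about 3
```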
What are the limitations of multi-layer perceptrons and how can they be overcome?
A Multi-Layer Perceptron (MLP) is a foundational type of artificial neural network (ANN) consisting of an input layer, one or more hidden layers, and an output layer. Each layer is fully connected, with neurons applying nonlinear activation functions (e.g., ReLU, sigmoid) to process data. MLPs excel at learning complex, non-linear relationships and are widely used for tasks like classification, regression, and pattern recognition
Limitations:
1. Overfitting:
- MLPs with excessive hidden neurons/layers tend to memorize training data, reducing generalization.
- Example: High accuracy on training data but poor performance on test data.
2. Computational Intensity:
- Training deep MLPs requires significant computational resources, especially with large datasets.
3. Hyperparameter Sensitivity:
- Performance heavily depends on tuning parameters like learning rate, layer size, and activation functions.
These limitations can be addressed by:
- Regularization (L1/L2):
- L1 regularization (Lasso) adds a penalty proportional to the absolute values of the weights, which can help eliminate less important features.
- L2 regularization (Ridge) adds a penalty proportional to the square of the weights, encouraging smaller weights and reducing model complexity.
- Dropout:
- Randomly deactivates a subset of neurons during each training iteration, preventing the network from relying too heavily on specific neurons and thus reducing co-adaptation.
- Early Stopping:
- Monitors validation performance during training and halts the process when performance on the validation set begins to degrade, preventing the model from overfitting to the training data.
- Model Simplification:
- Reduces the complexity of the network by decreasing the number of hidden layers or neurons, thereby limiting the model’s capacity to memorize noise in the training data.
- Data Augmentation:
- Increases the effective size of the training set by generating new, slightly modified instances of the data (especially useful for image or sequence data), forcing the model to learn more robust features.
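A hedged PyTorch sketch combining two of these remedies, dropout and L2 regularization (weight decay); the layer sizes and penalty strength are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly deactivates half the neurons at each training step
    nn.Linear(64, 2),
)

# weight_decay adds an L2 penalty on the weights to every update
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```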
What are CNNs and RNNs and what are they used for?
CNN: Convolutional Neural Network
- Meaning: A Convolutional Neural Network (CNN) is a type of deep learning model designed primarily for processing grid-like data, such as images.
RNN: Recurrent Neural Network (used where context matters, e.g. NLP, speech recognition)
- Meaning: A Recurrent Neural Network (RNN) is a deep learning architecture designed for processing sequential or time-series data.
- How it works:
- Recurrent connections: Neurons have connections that loop back, allowing the network to maintain a “memory” of previous inputs.
- Hidden state: Stores information from previous steps and combines it with current input.
- Processing: Data is processed one step at a time, with each step influenced by the sequence history.
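Minimal PyTorch definitions of the two layer types, just to show the data shapes involved; channel counts and sizes are placeholders:

```python
import torch
import torch.nn as nn

# CNN building block: a convolution slides small filters over grid-like input (e.g. an image)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
image = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
print(conv(image).shape)            # torch.Size([1, 8, 32, 32])

# RNN building block: processes a sequence step by step, carrying a hidden state
rnn = nn.RNN(input_size=10, hidden_size=16, batch_first=True)
sequence = torch.randn(1, 5, 10)    # one sequence of 5 steps, 10 features each
output, hidden = rnn(sequence)
print(output.shape, hidden.shape)   # torch.Size([1, 5, 16]) torch.Size([1, 1, 16])
```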
What is CRISP-DM?
Cross-Industry Standard Process for Data Mining:
a non-proprietary, documented, and freely available data mining model.
It can be divided into six steps:
- business understanding
- data understanding
- data preparation
- modeling
- evaluation
- deployment
What is data collection?
Data collection is the process of gathering information on targeted variables in an established system.