IFN580 Week 2: Data to Features (7%) Flashcards
(14 cards)
What are the two main types of variables in machine learning?
Categorical and numerical
What are the subtypes of categorical variables?
Binary, nominal, ordinal
What are the subtypes of numerical variables?
Discrete, continuous
What is a binary variable?
Two categories; yes/no, 0/1, true/false
What is a nominal variable?
Unordered categories: states, names of things (e.g., colour)
What is an ordinal variable?
Ordered categories: low/medium/high, small/medium/large
What is a discrete variable?
Variables that take countable, whole-number values (e.g., test score, number of siblings)
What is a continuous variable?
Variables that can take any value within a range, measured rather than counted (e.g., temperature, weight, height)
What is data pre-processing?
Techniques used to clean and prepare raw data, in order to improve its quality before modelling
What is noise in data?
Random error or variance in recorded values, i.e. incorrect data, often indicated by outliers
What is missing data?
Data that is not present or that has been deleted during noise detection
What methods can be used to correct inconsistent data?
Binning, clustering, regression
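Example (binning): a minimal sketch of one common way binning is used, smoothing values by bin means. It assumes equal-frequency bins; the function name smooth_by_bin_means and the sample prices are made up for illustration and are not from the course material.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        # Every member of the bin takes the bin mean.
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical sample data, chosen only for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# → [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Each value is pulled toward its neighbours, which dampens small random fluctuations. Note that the result is in sorted order, not the original order.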
What is feature engineering?
The process of creating or modifying features to improve model performance
What are some examples of feature engineering?
One-hot encoding, scaling
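Example (one-hot encoding and scaling): a minimal sketch in plain Python, assuming min-max scaling as the scaling method (standardisation is another common choice). The helper names one_hot_encode and min_max_scale and the sample values are hypothetical; in practice a library such as pandas or scikit-learn would usually do this.

```python
def one_hot_encode(values):
    """Turn a nominal variable into one 0/1 column per category.
    Categories are sorted so the column order is predictable."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return rows, categories


def min_max_scale(values):
    """Rescale numerical values to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


# Hypothetical sample data, chosen only for illustration.
colours = ["red", "green", "blue", "green"]
encoded, categories = one_hot_encode(colours)
print(categories)  # → ['blue', 'green', 'red']
print(encoded)     # → [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]

heights_cm = [150, 160, 170, 200]
print(min_max_scale(heights_cm))  # → [0.0, 0.2, 0.4, 1.0]
```

One-hot encoding suits nominal variables because it does not impose an order on the categories. Scaling puts numerical features on a comparable range.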