Definitions Flashcards
(62 cards)
Machine learning
The field of study that gives computers the ability to learn without being explicitly programmed.
Supervised Learning
A type of machine learning where the model learns from labeled data — each input has a known correct output — to make predictions on new data.
Unsupervised Learning
A type of machine learning where the model learns from unlabeled data, finding hidden patterns or groupings without knowing the correct output in advance.
Semi-Supervised Learning
A type of machine learning that uses a small amount of labeled data and a large amount of unlabeled data to improve learning accuracy.
Reinforcement Learning
A type of machine learning where an agent learns to make decisions by interacting with an environment and receiving feedback through rewards or penalties.
Classification
A machine learning task where the model learns from labeled data to assign new inputs to predefined categories.
Regression
A machine learning task where the model learns from labeled data to predict a continuous numerical value for new inputs.
Clustering
A machine learning task where the model learns from unlabeled data to group inputs so that similar inputs are placed together, based on patterns in the data.
Anomaly detection
Identifying data points that deviate from the normal patterns in the data.
Association Rule Learning
Finding patterns or rules that show how items are related to each other in data.
Batch Learning
A learning method where the model is trained on the entire dataset at once, usually offline, and updated only when retrained with new data.
Mini-Batch (Online) Learning
A learning method where the model is trained incrementally on small batches of data (mini-batches), allowing it to learn and update continuously as new data arrives.
Dataset
A collection of data that is treated as a single unit by a computer.
- Oxford Dictionary.
A collection of data used for some specific machine learning purpose.
- The Encyclopedia of Machine Learning (Kakas, 2010).
Labels
The output variables (target values) that the model aims to predict.
Features
The input variables used by the model to make predictions.
Training data
The dataset used to train a machine learning model, containing both features and labels.
Testing data
The dataset used to evaluate the performance of a trained machine learning model.
Validation data
A dataset, kept separate from the training and testing data, used to tune the model’s hyperparameters and detect overfitting during training.
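Example (training, validation, and testing data): a minimal sketch of splitting one dataset into the three sets defined in the cards above. The function name `split_dataset`, the 60/20/20 proportions, and the toy rows are assumptions chosen for illustration, not a prescribed recipe.

```python
import random

def split_dataset(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test sets.

    The fractions and the seed are illustrative defaults (assumptions).
    """
    shuffled = list(rows)                      # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for repeatability
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]          # everything left is training data
    return train, val, test

# Toy dataset: each row is (feature, label).
rows = [(x, 2 * x) for x in range(10)]
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))
```

The three sets do not overlap, so the model is never evaluated on rows it was trained on.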
CSV Format
A plain-text file format (comma-separated values) used to store data in tabular form, where each row represents an instance and each column represents a feature.
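Example (CSV format): a small sketch of reading CSV text with Python's standard `csv` module. The column names (`height_cm`, `weight_kg`, `label`) and the values are made up for illustration, and treating the last column as the label rather than a feature is an assumption of this example.

```python
import csv
import io

# Hypothetical CSV content: a header row, then one instance per row.
csv_text = """height_cm,weight_kg,label
170,65,adult
120,25,child
"""

# DictReader maps each row to a dict keyed by the header names.
reader = csv.DictReader(io.StringIO(csv_text))
instances = list(reader)

# Separate the numeric feature columns from the label column.
features = [(float(r["height_cm"]), float(r["weight_kg"])) for r in instances]
labels = [r["label"] for r in instances]

print(features)
print(labels)
```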
Textual data
Data represented in the form of text, including words, sentences, or documents. It needs to be converted into numbers when working with machine learning models.
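Example (textual data): one simple way to convert text into numbers is a bag-of-words count, sketched below. This is only one of several possible encodings, and the function name `bag_of_words` and the sample sentences are assumptions for illustration.

```python
from collections import Counter

def bag_of_words(documents):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is every distinct word, sorted for a stable column order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

vocabulary, vectors = bag_of_words(["the cat sat", "the dog sat down"])
print(vocabulary)  # ['cat', 'dog', 'down', 'sat', 'the']
print(vectors)     # [[1, 0, 0, 1, 1], [0, 1, 1, 1, 1]]
```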
Numerical data
Data that consists of numbers, used for mathematical calculations and statistical analysis.
Bivariate datasets
Datasets with two variables, used to analyze the relationship between them.
Multivariate datasets
Datasets with more than two variables, used to analyze relationships among multiple variables.
Correlation Datasets
Datasets used to measure the relationship between two or more variables, where the variables must be numerical to calculate correlation.
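Example (correlation): a minimal sketch of the Pearson correlation coefficient between two numerical variables. Pearson is assumed here as the correlation measure; the card itself does not name one, and the sample values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours_studied = [1, 2, 3, 4]
exam_score = [2, 4, 6, 8]
r = pearson(hours_studied, exam_score)
print(r)
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.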