GRAHHHHHH Flashcards
(32 cards)
What is Data Mining?
The process of sorting through large data sets to identify patterns and relationships that can help solve business problems.
Involves methods at the intersection of machine learning, statistics, and database systems.
Aims to extract information from a data set and transform it into a comprehensible structure for further use.
Data Analytics vs Data Mining
Data analytics is the process of interpreting data to find trends and patterns.
Data mining is the process of extracting valuable information from a large dataset.
Machine Learning in Data Mining
Helps in identifying patterns, predicting outcomes, and extracting meaningful insights from large datasets, which are essential steps in the data mining process.
Data Mining Process Pyramid
Data Sources
Data Preprocessing
Data Exploration
Data Mining
Data Presentation
Decision Making
Data Preprocessing Tools and Methods
Sampling
Transformation
Cleaning
Feature Extraction
Sampling
Selects a representative subset from a large population of data.
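As a rough illustration of the sampling card, the sketch below draws a simple random sample without replacement using Python's standard library. The population of record IDs, the seed, and the sample size are made up for the example; simple random sampling is only one of several ways to get a representative subset.

```python
import random

# Hypothetical population: 1,000 record IDs standing in for a large data set.
population = list(range(1000))

# A fixed seed (an assumption for this example) makes the sample reproducible.
rng = random.Random(42)

# Simple random sample of 50 records, drawn without replacement.
sample = rng.sample(population, k=50)

print(len(sample))
```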
Transformation
Manipulates raw data to produce a single input.
Cleaning
Imputation: Synthesizes statistically relevant data for missing values.
Normalization: Organizes data for more efficient access.
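A minimal sketch of imputation, assuming the simplest strategy (replacing each missing value with the mean of the observed values). The `ages` column and the use of `None` as the missing-value marker are hypothetical; other strategies such as median or model-based imputation may suit real data better.

```python
from statistics import mean

# Hypothetical column with missing readings marked as None.
ages = [25, None, 31, 40, None, 24]

# Compute the fill value from the observed (non-missing) entries only.
observed = [a for a in ages if a is not None]
fill_value = mean(observed)  # (25 + 31 + 40 + 24) / 4 = 30

# Replace each missing entry with the fill value.
imputed = [fill_value if a is None else a for a in ages]
print(imputed)
```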
Feature Extraction
Pulls out a relevant feature subset that is significant in a particular context.
Data Preprocessing Steps (PCR-TEV)
Data Profiling
Data Cleansing
Data Reduction
Data Transformation
Data Enrichment
Data Validation
Data Mining Techniques
Classification
Regression
Clustering
Anomaly Detection
Time Series Analysis
Neural Networks
Decision Trees
Classification
Categorizes data into predefined classes based on features or attributes.
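One possible way to picture classification is a 1-nearest-neighbour rule: a new point receives the class of the closest labelled example. The training points, feature meanings, and class names below are invented for illustration, and nearest-neighbour is just one of many classification methods.

```python
import math

# Hypothetical labelled examples: (height_cm, weight_kg) -> predefined class.
training = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]

def classify(point):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(training, key=lambda item: math.dist(item[0], point))
    return nearest[1]

print(classify((152.0, 52.0)))  # closest to the "small" examples
print(classify((182.0, 88.0)))  # closest to the "large" examples
```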
Regression
Predicts numeric or continuous values based on the relationship between input variables and a target variable.
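The sketch below fits a straight line by ordinary least squares, one common form of regression. The data points are hypothetical and chosen to lie exactly on y = 2x + 1, so the fitted slope and intercept are easy to check by hand.

```python
from statistics import mean

# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

x_bar, y_bar = mean(xs), mean(ys)

# Least-squares slope: covariance of x and y divided by variance of x.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

def predict(x):
    """Predict the continuous target value for a new input."""
    return slope * x + intercept

print(slope, intercept, predict(5.0))
```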
Clustering
Groups similar data instances together based on intrinsic characteristics.
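As an illustration of clustering, here is a small one-dimensional k-means sketch. The values, the starting centroids, and the fixed iteration count are assumptions for the example; k-means is one clustering method among many, and production code would normally use a library implementation.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group 1-D values around the nearest centroid, then re-centre; repeat."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to the index of its nearest centroid.
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data with two obvious groups.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
print(centroids, clusters)
```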
Anomaly Detection
Identifies rare or unusual data instances that deviate significantly from expected patterns.
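A simple way to flag anomalies, sketched below, is a z-score rule: mark any value that lies more than a chosen number of standard deviations from the mean. The readings and the threshold of 2.0 are assumptions for the example, and z-scores are only one of several detection approaches.

```python
from statistics import mean, pstdev

# Hypothetical sensor readings with one obviously unusual value.
readings = [10, 11, 9, 10, 12, 10, 11, 50]

mu = mean(readings)
sigma = pstdev(readings)  # population standard deviation

# Assumed cut-off: flag values more than 2 standard deviations from the mean.
threshold = 2.0
anomalies = [v for v in readings if abs(v - mu) / sigma > threshold]

print(anomalies)
```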
Time Series Analysis
Analyzes data points collected over time and forecasts future values.
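A minimal time series sketch, assuming a moving-average approach: smooth the series with a sliding window and use the most recent average as a naive forecast for the next point. The series and the window size of 3 are hypothetical.

```python
def moving_average(series, window):
    """Return the mean of each consecutive window of the series."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Hypothetical observations recorded at equal time intervals.
series = [10, 12, 14, 16, 18]

smoothed = moving_average(series, window=3)

# Naive forecast: assume the next value equals the latest moving average.
forecast = smoothed[-1]
print(smoothed, forecast)
```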
Neural Networks
AI models inspired by the human brain, composed of interconnected nodes (neurons) and layers that learn from data.
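To make the idea of a node that learns from data concrete, the sketch below trains a single artificial neuron (a perceptron) on the logical AND function. A real neural network stacks many such nodes into layers. The training data, learning rate, and epoch count are assumptions chosen so the example converges.

```python
# Hypothetical training data: inputs and target outputs for logical AND.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0, 0]
bias = 0
learning_rate = 1  # integer step keeps the arithmetic exact

def predict(x):
    """Fire (1) when the weighted sum of inputs plus bias is positive."""
    z = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1 if z > 0 else 0

# Perceptron learning rule: nudge weights by the error on each sample.
for _ in range(10):
    for x, target in samples:
        error = target - predict(x)
        weights[0] += learning_rate * error * x[0]
        weights[1] += learning_rate * error * x[1]
        bias += learning_rate * error

print([predict(x) for x, _ in samples])
```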
Decision Trees
Tree-structured models that represent decisions and their possible consequences.
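The sketch below represents a tiny decision tree as nested dictionaries and walks it from the root to a leaf. The tree is written by hand rather than learned from data, and the feature names and outcomes are hypothetical.

```python
# Hypothetical hand-built tree: inner nodes test a feature, leaves are outcomes.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
    },
}

def decide(node, record):
    """Follow the branch matching each tested feature until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][record[node["feature"]]]
    return node

print(decide(tree, {"outlook": "sunny", "windy": False}))
print(decide(tree, {"outlook": "rainy", "windy": True}))
```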
Data Mining Tools
Python
R
RapidMiner
SQL
KNIME
Weka
CRISP-DM
Cross-Industry Standard Process for Data Mining
CRISP-DM
A process model with six phases that describes the data science life cycle.
Ensures that business goals remain at the center of the project.
Provides an iterative approach with frequent opportunities to evaluate progress.
Phases of CRISP-DM
Business Understanding
Data Understanding
Data Preparation
Modelling
Entity-Relationship Data Model
Relational Data Model
Object-Oriented Data Model
Evaluation
Deployment
Business Understanding
Understand project objectives and requirements from a business perspective.
Data Understanding
Data Collection
Data Quality Problems
Initial Insights of Data