GRAHHHHHH Flashcards

(32 cards)

1
Q

What is Data Mining?

A

The process of sorting through large data sets to identify patterns and relationships that can help solve business problems.

Involves methods at the intersection of machine learning, statistics, and database systems.

Aims to extract information from a data set and transform it into a comprehensible structure for further use.

2
Q

Data Analytics vs Data Mining

A

Data analytics is the process of interpreting data to find trends and patterns.
Data mining is the process of extracting valuable information from a large dataset.

3
Q

Machine Learning in Data Mining

A

Helps in identifying patterns, predicting outcomes, and extracting meaningful insights from large datasets, which are essential steps in the data mining process.

4
Q

Data Mining Process Pyramid

A

Data Sources
Data Preprocessing
Data Exploration
Data Mining
Data Presentation
Decision Making

5
Q

Data Preprocessing Tools and Methods

A

Sampling
Transformation
Cleaning
Feature Extraction

6
Q

Sampling

A

Selects a representative subset from a large population of data.
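
Example (a minimal sketch, assuming simple random sampling without replacement; the function name and the toy population are hypothetical):

```python
import random

def simple_random_sample(records, k, seed=0):
    """Draw k distinct records; a fixed seed keeps the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(records, k)

population = list(range(1000))          # stand-in for a large data set
sample = simple_random_sample(population, 50)
print(len(sample))                      # 50 records drawn from 1000
```

Other schemes, such as stratified sampling, may give a more representative subset when the groups in the data are uneven in size.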

7
Q

Transformation

A

Manipulates raw data to produce a single, consistent input for analysis.

8
Q

Cleaning

A

Imputation: Synthesizes statistically relevant data for missing values.
Normalization: Organizes data for more efficient access.
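
Example (a minimal sketch of imputation, assuming missing values are marked with None and are filled with the mean of the observed values; other fill strategies such as the median are equally possible):

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]   # hypothetical column with gaps
imputed = impute_mean(ages)
print(imputed)
```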

9
Q

Feature Extraction

A

Pulls out a relevant feature subset that is significant in a particular context.

10
Q

Data Preprocessing Steps (PCR-TEV)

A

Data Profiling
Data Cleansing
Data Reduction
Data Transformation
Data Enrichment
Data Validation

11
Q

Data Mining Techniques

A

Classification
Regression
Clustering
Anomaly Detection
Time Series Analysis
Neural Networks
Decision Trees

12
Q

Classification

A

Categorizes data into predefined classes based on features or attributes.
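
Example (a minimal sketch, assuming a 1-nearest-neighbour classifier as one possible classification method; the training points and labels are made up for illustration):

```python
import math

def nearest_neighbor_label(train, point):
    """train is a list of (features, label) pairs; return the label of the closest one."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], point))
    return label

train = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]
prediction = nearest_neighbor_label(train, (8.5, 8.0))
print(prediction)
```

The classes ("small", "large") are fixed in advance, which is what separates classification from clustering.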

13
Q

Regression

A

Predicts numeric or continuous values based on the relationship between input variables and a target variable.
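
Example (a minimal sketch, assuming simple linear regression with one input variable fitted by ordinary least squares; the data points are made up):

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]                       # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept       # predict the target for a new input
print(slope, intercept, predicted)
```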

14
Q

Clustering

A

Groups similar data instances together based on intrinsic characteristics.
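
Example (a minimal sketch, assuming k-means on one-dimensional values as one possible clustering method; the starting centers and the data are hypothetical):

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and re-averaging."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```

No labels are supplied here; the groups emerge from the values themselves.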

15
Q

Anomaly Detection

A

Identifies rare or unusual data instances that deviate significantly from expected patterns.
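
Example (a minimal sketch, assuming a z-score rule: a value is flagged when it lies more than a chosen number of standard deviations from the mean; the threshold of 2.0 and the readings are assumptions):

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold * std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []                      # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = zscore_outliers(readings)
print(outliers)
```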

16
Q

Time Series Analysis

A

Analyzes and predicts data points collected over time.
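
Example (a minimal sketch, assuming a trailing moving average used both to smooth the series and as a naive next-step forecast; the sales figures are made up):

```python
def moving_average(series, window):
    """Average each run of `window` consecutive points."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

sales = [10, 12, 14, 16, 18]          # one value per time step
smoothed = moving_average(sales, 3)
forecast = smoothed[-1]               # naive forecast: latest average
print(smoothed, forecast)
```

A moving average lags behind a trending series, so it is only a rough baseline compared with dedicated forecasting models.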

17
Q

Neural Networks

A

AI models inspired by the human brain, composed of interconnected nodes (neurons) and layers that learn from data.

18
Q

Decision Trees

A

Graphical models that represent decisions and their possible consequences.

19
Q

Data Mining Tools

A

Python
R
RapidMiner
SQL
KNIME
Weka

20
Q

CRISP-DM

A

Cross Industry Standard Process for Data Mining

21
Q

CRISP-DM

A

A process model with six phases that describes the data science life cycle.

Ensures that business goals remain at the center of the project.

Provides an iterative approach with frequent opportunities to evaluate progress.

22
Q

Phases of CRISP-DM

A

Business Understanding
Data Understanding
Data Preparation
Modelling
  Entity-Relationship Data Model
  Relational Data Model
  Object-Oriented Data Model
Evaluation
Deployment

23
Q

Business Understanding

A

Understand project objectives and requirements from a business perspective.

24
Q

Data Understanding

A

Data Collection
Data Quality Problems
Initial Insights of Data

25
Q

Data Preparation

A

Covers all activities to construct the final dataset from the initial raw data.

26
Q

Modelling

A

Evaluates, selects, and applies appropriate modelling techniques.

Hierarchical Data Model: Composed of links between records.
Entity-Relationship Data Model: Widely adopted for relational databases.
Relational Data Model: Data stored in tables of rows and columns.
Object-Oriented Data Model: Represents data and relationships in a single structure.

27
Q

Evaluation

A

Builds and chooses models based on selected loss functions, ensuring that they generalize to unseen data and cover all key business issues.

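
Example (a minimal sketch, assuming a holdout split with mean squared error as the loss function; the 25% test fraction, the toy data, and the stand-in model are all hypothetical):

```python
import random

def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and set aside a fraction that the model never trains on."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

rows = [(x, 2 * x + 1) for x in range(20)]     # (input, target) pairs
train, test = holdout_split(rows)

# Stand-in for a fitted model; a real project would fit it on `train` only.
predictions = [2 * x + 1 for x, _ in test]
test_mse = mean_squared_error([y for _, y in test], predictions)
print(len(train), len(test), test_mse)
```

A low loss on the held-out rows suggests the model generalizes; whether it also covers the business issues still has to be checked against the project objectives.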
28
Q

Deployment

A

Deploys a code representation of the model into an operational (production) system, including mechanisms to score or categorize new data.

29
Q

GDPR

A

General Data Protection Regulation

30
Q

Key Principles of GDPR

A

Lawfulness, Fairness, and Transparency: Data must be processed legally, fairly, and transparently.
Purpose Limitation: Data should be collected for a specific, explicit purpose and not used in ways incompatible with that purpose.
Data Minimization: Organizations should only collect the data they actually need.
Accuracy: Personal data must be accurate and kept up to date.
Storage Limitation: Data should not be kept longer than necessary.
Integrity and Confidentiality: Data must be kept secure and protected from unauthorized access.
Accountability: Organizations must take responsibility for complying with GDPR.

31
Q

Key Rights of Individuals under GDPR

A

Right to Access: Individuals can request a copy of their data.
Right to Rectification: They can ask for corrections to inaccurate data.
Right to Erasure (Right to be Forgotten): They can request data deletion.
Right to Restrict Processing: They can limit how their data is used.
Right to Data Portability: They can receive their data in a structured, commonly used, machine-readable format.
Right to Object: They can refuse certain types of data processing, like direct marketing.
Rights Related to Automated Decision-Making: Individuals can challenge automated decisions affecting them.

32
Q

Why does GDPR Matter?

A

Strengthens consumer privacy rights.
Ensures transparency and accountability in data handling.
Influences global data protection laws, including in the U.S., UK, and other regions.