Data Mining - Lecture Introduction Flashcards

1
Q

What is data mining?

A

The creative process that provides results useful for decision making.

This process can draw on statistics, machine learning, and programming, among others, and is often applied to big data.

2
Q

What are the 5 V’s of big data?

A
  1. Volume
  2. Velocity
  3. Variety
  4. Veracity
  5. Value
3
Q

What is meant by volume in big data?

A

The amount of data

4
Q

What is meant by velocity in big data?

A

The speed at which data is being generated and changed.

5
Q

What is meant by variety in big data?

A

The different types of data being generated (text, dates, numbers, etc.).

6
Q

What is meant by veracity in big data?

A

The accuracy or truthfulness of a dataset.

7
Q

What is meant by value in big data?

A

Data only has value if it is turned into something useful.

8
Q

What is an algorithm?

A

A specific procedure used to implement a specific data mining technique.

9
Q

What is nominal data?

A

Data that serve as labels (often textual). The values have no order or ranking.
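
As a minimal sketch of what can be done with nominal data: because the labels have no order, the meaningful operations are equality checks and counting, so the mode is the usual summary. The `colors` list below is made-up illustration data, not part of the lecture.

```python
from collections import Counter

# Hypothetical nominal values: labels with no inherent order.
colors = ["red", "blue", "red", "green", "red", "blue"]

# Counting occurrences is meaningful for nominal data.
counts = Counter(colors)

# The mode (most frequent label) is a valid summary; a mean or median is not.
mode, mode_count = counts.most_common(1)[0]

print(mode, mode_count)
```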

10
Q

What is ordinal data?

A

Categorical data (labels, like nominal data) whose values have a meaningful order. The distance between values is not defined.

Example: hot > mild > cool
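
The ordering in this example can be sketched in code by mapping each label to its position on the scale. This is only an illustration: the names `TEMPERATURE_ORDER`, `rank`, and `is_greater` are hypothetical, and the sample `readings` are invented.

```python
# Ordinal scale from the card's example, listed from lowest to highest.
TEMPERATURE_ORDER = ["cool", "mild", "hot"]

def rank(value):
    """Return the position of a label on the ordinal scale."""
    return TEMPERATURE_ORDER.index(value)

def is_greater(a, b):
    """Return True if label a ranks above label b."""
    return rank(a) > rank(b)

# Invented sample data; sorting by rank respects the ordinal order.
readings = ["mild", "hot", "cool", "hot"]
sorted_readings = sorted(readings, key=rank)

print(is_greater("hot", "mild"))
print(sorted_readings)
```

The ranks only express order. Subtracting them (for example, treating hot minus cool as 2) would assume equal spacing, which ordinal data does not guarantee.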

11
Q

What is interval data?

A

Numerical data with an order and fixed, equal intervals between the values. There is no true zero point, so differences are meaningful but ratios are not (for example, temperature in degrees Celsius).

12
Q

What is ratio data?

A

Numerical data with an order and fixed, equal intervals between the values. There is also a true zero point, so ratios are meaningful (for example, weight, or temperature in kelvin).
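
The difference between interval and ratio data can be illustrated with temperature: Celsius is an interval scale (its zero is arbitrary), while kelvin is a ratio scale (zero means no thermal energy). The sketch below uses two made-up readings and a hypothetical helper `celsius_to_kelvin`.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return c + 273.15

# Two invented readings in degrees Celsius.
c_low, c_high = 10.0, 20.0

# Differences are meaningful on both scales and agree with each other.
diff_c = c_high - c_low
diff_k = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# The Celsius ratio suggests "twice as hot", which depends on the arbitrary zero.
naive_ratio = c_high / c_low

# The kelvin ratio is the physically meaningful one and is only slightly above 1.
true_ratio = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

print(diff_c, diff_k, naive_ratio, true_ratio)
```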

13
Q

What are the 10 steps in the data mining process?

A
  1. Develop an understanding of the purpose of the project.
  2. Obtain the dataset to be used in the analysis.
  3. Explore, clean, and preprocess the data.
  4. Reduce the data dimension, if necessary.
  5. Determine the data mining task.
  6. Partition the data (for supervised tasks).
  7. Choose the data mining technique.
  8. Use algorithms to perform the task.
  9. Interpret the results of the algorithms.
  10. Deploy the model (run it on real records).
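
Step 6, partitioning the data, can be sketched as a random split into a training set and a holdout set. This is a minimal illustration using only the standard library; the function name `partition`, the 60/40 split, and the seed are assumptions, not values from the lecture.

```python
import random

def partition(records, train_fraction=0.6, seed=42):
    """Shuffle the records and split them into (training, holdout) lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy, so the input is left unchanged
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Invented stand-in for a dataset: ten record IDs.
records = list(range(10))
train, holdout = partition(records)

print(len(train), len(holdout))
```

The model is fitted on the training records only, and the holdout records are used to check how well it performs on data it has not seen.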
14
Q

What is CRISP-DM?

A

Cross-Industry Standard Process for Data Mining.
It describes the data mining process in six steps.

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Model Building
  5. Testing and Evaluating
  6. Deployment
15
Q

What is SEMMA?

A

A process model for data mining, named after the initials of its five steps.

  1. Sample
  2. Explore
  3. Modify
  4. Model
  5. Assess

It is a cycle, similar to steps 4-6 of CRISP-DM.
