Prelims - honpritz Flashcards

honpritz (86 cards)

1
Q

Data encoders, gatherers

A

Collector

2
Q

Treat, prepare data

A

Data engineer

3
Q

Performs the modeling, testing, and validation

A

Modeler or Data Scientist

4
Q

Do the decision making

A

Business analyst

5
Q

Data Steward

A

Collector

6
Q

Modeler

A

Data Scientist

7
Q

It is a multidisciplinary field that uses scientific methods, processes, algorithms, computations, and systems to extract understanding and insights from structured and/or unstructured data.

A

Data Science

8
Q

____ is the mother of invention.

A

NECESSITY

9
Q

What era:

REPORT WRITING
Goal: Automation

A

1970s

10
Q

What era:

CENTRALIZED SYSTEM
Goal: ERP (Enterprise Resource Planning)/ MIS (Management Info System)

A

1980s

11
Q

Goals of the 1980s Centralized system

A
  • Enterprise Resource Planning
  • Management Info System
12
Q

What era:

Business Intelligence
Goal: Apps for everyone
Applications for personal use were invented and made to share (not YET to analyze)

A

1990s

13
Q

Goal: Apps for everyone
Applications for personal use were invented and made to share (not YET to analyze)

A

1990s Business Intelligence

14
Q

What era:

INTERNET & DATA MINING

A

2000s

15
Q

What era:

BIG DATA &
Data Science (used for real-time analysis)

A

2010s

16
Q

The value in the data haystack is guided by your knowledge of the ____ - not the ___ or ____

A

domain; tools or techniques

17
Q

The combination of all skill sets needed to find the value in the data

A

Analytics

18
Q

Data under Business Intelligence

A
  • Standard reports (What happened?)
  • Ad Hoc, Drill down (Where exactly is the problem?)
  • Alerts (What needs attention?)
19
Q

Data under Predictive analytics

A

Predictive modeling
“What is the next best action?”

20
Q

Data under Prescriptive analytics

A

Optimization
“What is the best thing that can happen?”

21
Q

Evolution of analytics

A

Descriptive → Diagnostic → Predictive → Prescriptive → Cognitive

22
Q

What happened? Describes historical data: Helps understand how things are going

A

Descriptive

23
Q

Why did it happen?
Helps understand unique drivers; Segmentation, Statistical, & Sensitivity analysis

A

Diagnostic

24
Q

What could happen? Forecasts future performance, events, and results

A

Predictive

25
How to make it happen? Analysis that suggests a prescribed action
Prescriptive
26
What to do, why, and how? Proactive action; learn at scale; reason with purpose; interact naturally
Cognitive
27
Data Science & Analytics: in health care
  • Medical image analysis
  • Machine learning in disease diagnosis
  • Genetics & genomics
  • Drug development
  • Virtual assistance for patients and customer support
28
Finding useful patterns in data.
Data Mining
29
It is the process of knowledge discovery, machine learning, and predictive analytics.
Data Mining
30
Data Mining
  • Extracting meaningful patterns
  • Building representative models
  • Combination of statistics, machine learning, and computing
  • Algorithms
31
DATA MINING: Types of Learning Models
  • Supervised
  • Unsupervised
31
Data Mining is NOT about:
  • Descriptive statistics
  • Exploratory visualization
  • Dimensional slicing
  • Hypothesis testing
  • Queries
32
directed data mining
Supervised Learning Model
33
The model generalizes the relationship between the input and output variables.
Supervised Learning Model
34
Undirected data mining
Unsupervised Learning Model
35
The objective of this class of data mining techniques is to find patterns in data based on the relationship between data points themselves
Unsupervised Learning Model
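
Study note: the contrast between the two learning models can be sketched in a few lines of plain Python. This is only an illustration, not how RapidMiner implements either model. The data, the 1-nearest-neighbor rule, and the gap-based grouping are all assumptions chosen for brevity.

```python
# Supervised (directed): labeled examples (input -> output) are given,
# and the model generalizes that relationship to new inputs.
labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]  # made-up data

def predict_nearest(x, training):
    """Return the label of the training input closest to x (1-nearest neighbor)."""
    _, nearest_label = min(training, key=lambda pair: abs(pair[0] - x))
    return nearest_label

# Unsupervised (undirected): no labels; structure comes only from
# the relationship (here, distance) between the data points themselves.
def group_by_gap(values, max_gap):
    """Split sorted values into groups wherever neighbors differ by more than max_gap."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

supervised_prediction = predict_nearest(7.2, labeled)
unsupervised_groups = group_by_gap([9.0, 1.0, 8.0, 1.5], max_gap=2.0)
print(supervised_prediction)
print(unsupervised_groups)
```

The supervised function needs the label column to work at all; the unsupervised one never sees a label.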
36
DATA MINING: Groups of Learning Models
  • Classification Models
  • Regression Models
  • Clustering Models
  • Anomaly Detection
  • Time Series Forecasting
  • Association
  • Text and Sentiment Analysis
37
DATA MINING: Steps
  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Testing and Evaluation
  • Deployment
38
the process of preparing data for analysis by removing or modifying incorrect, incomplete, irrelevant, duplicated, or improperly formatted data.
Data Cleaning
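
Study note: a minimal sketch of what data cleaning does to a small table, written in plain Python rather than RapidMiner. The rows, the column names, and the `RECODE` mapping are hypothetical. The steps shown are trimming white space, recoding a miscoded value, dropping an incomplete row, and removing a duplicate.

```python
# Made-up raw data with typical encoding problems.
raw_rows = [
    {"name": " Ana ", "sex": "F"},
    {"name": "Ana", "sex": "F"},      # duplicate once white space is trimmed
    {"name": "Ben", "sex": "male"},   # miscoded value
    {"name": "", "sex": "M"},         # incomplete: missing name
]

# Hypothetical recoding table for miscoded values.
RECODE = {"male": "M", "female": "F"}

def clean(rows):
    """Trim, recode, drop incomplete rows, and remove duplicates (first one wins)."""
    cleaned, seen = [], set()
    for row in rows:
        name = row["name"].strip()                 # remove white space
        sex = RECODE.get(row["sex"], row["sex"])   # recode miscoded values
        if not name:                               # drop incomplete rows
            continue
        key = (name, sex)
        if key in seen:                            # drop duplicates
            continue
        seen.add(key)
        cleaned.append({"name": name, "sex": sex})
    return cleaned

cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```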
39
Variables of a given data set; represented by columns
Attributes
40
Cases or observations of a given data set; represented by rows
Examples
41
Functions or building blocks that create processes for data analysis
Operators
42
Parts of the RapidMiner Interface
  • Canvas / Process Panel
  • Repository / Source Tabs
  • Operators / Analysis Tabs
  • Parameter Tabs
  • Description Tabs
43
Working area for building processes
Canvas or the Process Panel
44
Storage within RapidMiner Studio for data and RapidMiner processes
Repository / Source Tabs
45
Building blocks used to create RapidMiner processes
Operators / Analysis Tabs
46
Settings that modify operator behavior
Parameter Tabs
47
context-sensitive help for selected operator
Help
48
work area for accessing specific functionality
Views
49
Methods of Importing Data
  • From Repository
  • "Read Excel" Operator
50
many different string values (for example: red, green, blue, yellow)
polynominal
51
exactly two values (for example: true/false, yes/no)
binominal
52
a fractional number (for example: 11.23 or -0.0001).
real
53
a whole number (for example: 23, -5, or 11,024,768).
integer
54
both date and time (for example: 23.12.2014 17:59).
date_time
55
Operator used for filtering cases
Filter Examples
56
Operator used for removing all cases with missing values
Filter Examples
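
Study note: both uses of filtering (keeping cases that meet a condition, and removing all cases with missing values) amount to keeping or dropping rows. The plain-Python sketch below is an analogy only, not the RapidMiner operator itself. The data is made up and `None` stands in for a missing value.

```python
# Made-up example set; None marks a missing value.
examples = [
    {"age": 34, "city": "Manila"},
    {"age": None, "city": "Cebu"},
    {"age": 51, "city": None},
    {"age": 19, "city": "Davao"},
]

def is_complete(example):
    """True when no attribute of the example is missing."""
    return all(value is not None for value in example.values())

# Remove every case that has at least one missing value.
no_missing = [ex for ex in examples if is_complete(ex)]

# Keep only the cases that satisfy a condition on one attribute.
over_30 = [ex for ex in examples if ex["age"] is not None and ex["age"] > 30]

print(no_missing)
print(over_30)
```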
57
Operator used for imputing missing data
Replace Missing Values
58
To remove "white spaces" in the encoding, use the _____ operator.
TRIM
59
To remove "duplicates" in the encoding, use the _________ operator.
Remove Duplicates
60
To recode miscoded values, use the ______ operator.
REPLACE
61
Use the ________ operator to select the attributes that you need for analysis.
Select Attributes
62
Use the _____ operator to tag the attribute that will be used as the label (target variable), or to assign any other role it will play in the analysis.
Set Role
63
If two data sets need to be merged for an analysis, use the ____ operator.
Join
64
Joining Two Data Sets: In the parameter tab, use _____ as join type.
Inner
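
Study note: an inner join keeps only the rows whose key value appears in both data sets. The sketch below shows that behavior in plain Python. The tables, the `id` key, and the `inner_join` helper are hypothetical and are not RapidMiner code.

```python
# Two made-up data sets sharing the key attribute "id".
customers = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Cara"},
]
orders = [
    {"id": 1, "total": 250},
    {"id": 3, "total": 90},
    {"id": 4, "total": 10},
]

def inner_join(left, right, key):
    """Combine rows from left and right that share the same key value."""
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        # Rows with no match on the other side are simply skipped.
        for right_row in right_by_key.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined

joined = inner_join(customers, orders, "id")
print(joined)
```

Here `id` 2 (no order) and `id` 4 (no customer) both drop out of the result.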
65
graphical representation of data
Data Visualization
66
techniques used to communicate insights from data through visual representation.
Data Visualization
67
to distill large datasets into visual graphics to allow for easy understanding of complex relationships within the data
Data Visualization
68
to analyze massive amounts of information and make data-driven decisions.
Data Visualization
69
Visualization Technique: to compare counts, percentage, or other measures (average) for different discrete categories of data
Bar Graph
70
Visualization Technique: to observe trend
Line Graph
71
Visualization Technique: shows the relative contribution that different categories contribute to an overall total
Pie Graph
72
Visualization Technique: the frequency distribution of continuous attribute
Histogram
73
(Bar vs Histo) presents categorical attribute
Bar graph
75
(Bar vs Histo) represents numerical attribute
histogram
76
(Bar vs Histo) have spaces between bars
Bar graph
77
(Bar vs Histo) do not have spaces between bars
Histogram
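
Study note: the bar graph versus histogram distinction comes down to what is being counted. A bar graph counts discrete categories, while a histogram counts how many numeric values fall into adjacent intervals (bins), which is why its bars touch. A minimal sketch with made-up data; the bin start, width, and count are arbitrary assumptions.

```python
from collections import Counter

# Bar graph input: one count per discrete category.
colors = ["red", "blue", "red", "green", "red"]
bar_counts = Counter(colors)

def histogram_counts(values, start, width, n_bins):
    """Count values per bin; bin i covers [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:   # values outside the covered range are ignored
            counts[index] += 1
    return counts

# Histogram input: a continuous attribute split into adjacent intervals.
heights = [150.5, 152.0, 161.2, 168.9, 170.0, 179.9]
hist = histogram_counts(heights, start=150, width=10, n_bins=3)

print(dict(bar_counts))
print(hist)
```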
78
Visualization Technique: plots two numerical attributes
Scatterplot
78
Visualization Technique: graphical representation of the quartiles
Boxplots
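
Study note: the quartiles a boxplot draws can be computed with Python's standard `statistics.quantiles`. The data is made up, and quartile conventions differ between tools, so values from other software may not match exactly. This sketch uses the function's default "exclusive" method.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]  # made-up values

# n=4 splits the data into four equal-probability groups,
# returning the three cut points Q1, median (Q2), and Q3.
q1, median, q3 = quantiles(data, n=4)

# The five numbers a basic boxplot is built from.
five_number_summary = (min(data), q1, median, q3, max(data))
print(five_number_summary)
```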
79
process performed to decide which examples are kept and which are removed
Data filtering
79
Visualization Technique: a graphical representation of data where the individual values contained in a matrix (map) are represented as colors.
Heat maps
80
replaces missing values by the attribute's minimum, maximum, or average value.
Missing Value Imputation
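
Study note: a minimal plain-Python sketch of imputing by an attribute's minimum, maximum, or average. The `impute` helper and the `ages` values are hypothetical, and `None` stands in for a missing value. This is not the RapidMiner operator.

```python
from statistics import mean

def impute(values, method="average"):
    """Replace None entries with the minimum, maximum, or average of the known values."""
    known = [v for v in values if v is not None]
    fill_functions = {"minimum": min, "maximum": max, "average": mean}
    fill = fill_functions[method](known)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]  # made-up attribute with two missing values

by_average = impute(ages, "average")
by_minimum = impute(ages, "minimum")
by_maximum = impute(ages, "maximum")
print(by_average, by_minimum, by_maximum)
```

The fill value is computed from the known values only, so the missing entries do not distort it.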
81
The imputation method is selected in the ____ parameter.
Default
82
Use the ______ operator to create a RapidMiner data set from the process
Store
83
Use the ______ operator to store the data in a format you want.
Write ***