IDSA PRELIMS Flashcards

1
Q

What are new techniques to solve problems?

A

Data Science & Analytics

2
Q

What are the different roles in analytics?

A

Collector/Data Steward
Business Analyst
Modeler/Data Scientist
Data Engineer

3
Q

What are the APEC Analytics Competencies?

A
  • Domain Knowledge & Application
  • Data Management & Governance
  • Operational Analytics
  • Data Visualization & Presentation
  • Research Methods
  • Data Engineering Principles
  • Statistical Techniques
  • Data Analytics Methods & Algorithms
  • Computing
  • 21st Century Skills
4
Q

Who has the best domain knowledge?

A

Steward
Analyst
Manager

5
Q

Who has the best data governance?

A

Steward
Manager

6
Q

Who has the best operational analytics?

A

ALL

7
Q

Who has the best data visualization?

A

Analyst
Manager

8
Q

Who has the best research methods?

A

Scientist

9
Q

Who has the best data engineering?

A

Engineer

10
Q

Who has the best statistical techniques?

A

Scientist

11
Q

Who has the best methods and algorithms?

A

Scientist

12
Q

Who has the best computing?

A

Scientist

13
Q

Who has the best 21st century skills?

A

ALL

14
Q

What are the components of the Data Science Skillset?

A

Substantive Expertise
Math and Statistics Knowledge
Hacking Skills
Traditional Research
Machine Learning
Danger Zone

15
Q

Data science requires the intersection of what abilities?

A

Hacking skills
Math and Statistics Knowledge
Substantive Expertise

16
Q

Necessary for working with massive amounts of electronic data

A

Hacking skills

17
Q

Crucial for generating motivating questions and hypotheses and interpreting results

A

Substantive expertise

18
Q

Allows a data scientist to choose appropriate methods and tools in order to extract insight from data

A

Math & Statistics knowledge

19
Q

Stems from combining hacking skills with math and statistics knowledge, but does not require scientific motivation

A

Machine learning

20
Q

Lies at the intersection of knowledge of math and statistics with substantive expertise in a scientific field

A

Traditional Research

21
Q

Hacking skills combined with substantive scientific expertise, but without rigorous methods, can beget incorrect analyses

A

Danger Zone

22
Q

Data Science or Data Analytics: Uses big data

A

Both

23
Q

Data Science or Data Analytics: Healthcare, gaming, travel, industries with immediate data needs

A

Analytics

24
Q

Data Science or Data Analytics: Macro

A

Science

25
Q

Data Science or Data Analytics: To ask the right questions

A

Science

26
Q

Data Science or Data Analytics: Machine learning, AI, Search engine, engineering, corporate analytics

A

Science

27
Q

Data Science or Data Analytics: To find actionable data

A

Analytics

28
Q

Data Science or Data Analytics: Micro

A

Analytics

29
Q

What is the mother of innovation?

A

Necessity

30
Q

What is the goal of report writing?

A

Automation

31
Q

What are the goals of a centralized system?

A

ERP - Enterprise Resource Planning

MIS - Management Info System

32
Q

Goals: Apps for everyone

A

Business Intelligence

33
Q

Where is data science and analytics seen?

A

Education
Environment
Healthcare

34
Q

The process of knowledge discovery, machine learning, and predictive analytics.

A

Data Mining

35
Q

Data mining is NOT about?

A
  • Descriptive statistics
  • Exploratory visualization
  • Dimensional slicing
  • Hypothesis testing
  • Queries
36
Q

Data mining involves extracting ____, building ____, and is a combination of ____, ____, and ____.

A
  • Extracting Meaningful Patterns.
  • Building Representative Models.
  • Combination of Statistics, Machine Learning, and Computing Algorithms
37
Q

Types of Learning Models in Data Mining?

A

Supervised/Directed
Unsupervised/Undirected

38
Q

What model of data mining: generalizes the relationship between the input and output variables.

A

Supervised

39
Q

What model of data mining: finds patterns in data based on the relationship between data points themselves.

A

Unsupervised

40
Q

DATA MINING: Groups of Learning Models?

A
  • Classification Models (S)
  • Regression Models (S)
  • Clustering Models (S/US)
  • Anomaly Detection (US)
  • Time Series Forecasting (US)
  • Association (US)
  • Text and Sentiment Analysis (US)
41
Q

DATA MINING: Steps?

A

Business Understanding
Data Understanding
Data Preparation
Modeling
Testing and Evaluation
Deployment

42
Q

Data cleaning is the process of preparing data for analysis by removing or modifying?

A

incorrect, incomplete, irrelevant, duplicated, or improperly formatted data.
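
The definition above can be sketched in plain Python. This is only an illustration under stated assumptions: the records, the field names (`name`, `city`), and the `clean` helper are hypothetical and are not part of the course material or of RapidMiner.

```python
# Hypothetical customer records with typical data-quality problems.
raw_records = [
    {"name": " Ana ", "city": "Manila"},
    {"name": "Ana", "city": "Manila"},   # duplicate once whitespace is trimmed
    {"name": "Ben", "city": None},       # incomplete: missing city
    {"name": "Cara", "city": "  Cebu"},  # improperly formatted: leading spaces
]


def clean(records):
    """Trim strings, drop incomplete rows, and drop exact duplicates."""
    cleaned, seen = [], set()
    for row in records:
        # Fix improperly formatted values by stripping stray whitespace.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        # Drop incomplete rows (any missing or empty value).
        if any(v is None or v == "" for v in row.values()):
            continue
        # Drop duplicates, keeping the first occurrence.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


cleaned_records = clean(raw_records)
```

Dropping incomplete rows is only one option; a real workflow might impute the missing values instead.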

43
Q

Parts of the RapidMiner interface

A

Repository
Canvas
Operators/Analysis tabs
Parameter tabs
Description tabs

44
Q

How do you import data in RapidMiner?

A

File → Import Data,
or click the Repository tab

45
Q

Types of data when importing?

A

polynominal
binominal
real
integer
date_time
date
time

46
Q

What type of data:
many different string values (for example: red, green, blue, yellow)

A

polynominal

47
Q

What type of data: (for example: 23.12.2014 17:59).

A

date_time

48
Q

What type of data:
a fractional number (for example: 11.23 or -0.0001).

A

real

49
Q

What type of data: (for example 23.12.2014).

A

date

50
Q

What type of data: (for example 17:59).

A

Time

51
Q

What type of data: a whole number (for example: 23, -5, or 11,024,768).

A

Integer

52
Q

What type of data: exactly two values (for example: true/false, yes/no)

A

binominal

53
Q

After importing data, the data will appear in the ______ tab.

A

Results

54
Q

To find the basic statistics of each attribute, click _____.

A

Statistics

55
Q

When filtering cases, you may add more criteria by clicking ____.

A

Add Entry.

56
Q

In missing value imputation, instead of filtering, you may?

A

Remove all cases with missing values by using the condition class instead of Add Filters.

57
Q

To impute missing data, search for ____ in the Operators tab, then drag and drop it onto the line connecting the Filter Examples operator and the res knob.

A

Replace Missing Values
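
As a rough sketch of what imputation does, the snippet below fills missing entries with the average of the observed values. Using the average is an assumption made for illustration (it is one common replacement choice); the `ages` data and the `impute_with_mean` helper are hypothetical and do not reproduce the RapidMiner operator itself.

```python
from statistics import mean

# Hypothetical attribute; None marks a missing value.
ages = [25, None, 31, 40, None]


def impute_with_mean(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


imputed_ages = impute_with_mean(ages)
```

Unlike filtering, this keeps every case in the data set, at the cost of introducing estimated values.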

58
Q

In dealing with miscoded data, to remove “white spaces” in the encoding, use the ____ operator.

A

TRIM

59
Q

In dealing with miscoded data, connect the _____ and the ______.

A

The out node of the Retrieve Customer operator and the second res of the result knob

60
Q

To remove “duplicates” in the encoding, use the _____ operator.

A

Remove Duplicates

61
Q

To recode miscoded values, use the _____ operator.

A

REPLACE

62
Q

You may impute missing values in other attributes using the _______ operator.

A

REPLACE MISSING VALUES

63
Q

Use the ______ operator to select the attributes that you need for analysis.

A

Select Attributes

64
Q

When is the Set Role operator used?

A

To tag the attribute that will be used as the label (target variable), or to assign any other role it will play in the analysis.

65
Q

When is the Join operator needed?

A

When two data sets need to be merged for an analysis

66
Q

Connect the first data set or its result in the (right/left) node of the Join operator and the other data set at the (right/left) node.

A

Left; right
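
The idea of merging a left and a right data set on a shared key can be sketched as follows. This assumes an inner join (only keys present in both sets are kept), which is an assumption for illustration; the `customers` and `orders` data and the `inner_join` helper are hypothetical.

```python
# Hypothetical left and right data sets sharing an "id" key.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
orders = [{"id": 1, "total": 250}, {"id": 3, "total": 90}]


def inner_join(left, right, key):
    """Merge rows from left and right whose key values match."""
    # Index the right data set by key for fast lookup.
    right_by_key = {row[key]: row for row in right}
    return [
        {**l_row, **right_by_key[l_row[key]]}
        for l_row in left
        if l_row[key] in right_by_key
    ]


joined = inner_join(customers, orders, "id")
```

Here only `id` 1 appears in both inputs, so it is the only row that survives the join.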

67
Q

What are the steps of data preparation in RapidMiner?

A
  1. Importing Data
  2. Data Preparation
  3. Data Filtering
  4. Missing Value Imputation
  5. Dealing with Miscoded Entries
  6. Selecting and Setting Roles of Attributes
  7. Combining Data Sets
  8. Data Cleaning
68
Q

What is data visualization?

A

The graphical representation of data.

Techniques used to communicate insights from data through visual representation.

69
Q

What are the objectives of data visualization?

A
  • to distill large datasets into visual
    graphics to allow for easy understanding
    of complex relationships within the data
  • to analyze massive amounts of information
    and make data-driven decisions.
70
Q

What are the common visualization techniques?

A
  • Bar Graph
  • Line Graph
  • Pie Graph
  • Histogram
  • Scatterplot
  • Boxplot
  • Heatmap
71
Q

What Common Visualization Technique: compares counts, percentages, or other measures (such as averages) for different discrete categories of data

A

Bar Graph

72
Q

T or F: Bar graphs in RapidMiner use aggregated data

A

T

73
Q

In creating bar graphs in RapidMiner, set the ________ and use the _____ function.

A

Group by Stage;
Average aggregate

74
Q

What Common Visualization Technique: used to observe trends

A

Line Graph

75
Q

What Common Visualization Technique: shows the relative contribution that different categories make to an overall total

A

Pie Graph

76
Q

What Common Visualization Technique: shows the frequency distribution of a continuous attribute

A

Histogram

77
Q

A bar graph presents a ____ attribute, while a histogram represents a ____ attribute.

A

categorical
numerical
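
The distinction can be sketched by computing what each chart would plot: one count per category for a bar graph, and one count per numeric interval (bin) for a histogram. The data, the bin width of 10, and the `histogram_bins` helper are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical data: one categorical and one numerical attribute.
colors = ["red", "blue", "red", "green", "red"]
heights = [150, 152, 158, 161, 169, 170]

# Bar graph input: a count for each discrete category.
bar_counts = Counter(colors)


def histogram_bins(values, start, width):
    """Count how many values fall into each [lower, lower + width) bin."""
    bins = Counter()
    for v in values:
        lower = start + ((v - start) // width) * width
        bins[(lower, lower + width)] += 1
    return dict(sorted(bins.items()))


# Histogram input: a count for each contiguous numeric interval.
hist = histogram_bins(heights, start=150, width=10)
```

Because the bins cover a continuous range with no gaps, the bars of a histogram touch, whereas the categories of a bar graph are separate.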

78
Q

T or F: Histograms have spaces between bars

A

F

79
Q

T or F: In creating a histogram, CHECK the reverse axis to keep the order of the values.

A

F; do not check

80
Q

T or F: There can be a histogram for two or more variables

A

T

81
Q

What Common Visualization Technique: plots two numerical attributes

A

Scatterplot

82
Q

What Common Visualization Technique: graphical representation of the quartiles

A

Boxplots
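
As a small sketch, the quartiles a boxplot draws can be computed with Python's standard library. The `scores` data is hypothetical, and `statistics.quantiles` is used with its default method; other tools may use a different quartile convention and give slightly different values.

```python
from statistics import quantiles

# Hypothetical numerical attribute.
scores = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into quartiles and returns the three cut points.
q1, median, q3 = quantiles(scores, n=4)

# The box spans q1 to q3; its length is the interquartile range.
iqr = q3 - q1

# Minimum, first quartile, median, third quartile, maximum.
five_number_summary = (min(scores), q1, median, q3, max(scores))
```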

83
Q

What Common Visualization Technique: graphical representation of data where the individual values contained in a matrix (map) are represented as colors.

A

Heat maps