Lecture 2 Flashcards

1
Q

Data?

A

A collection of facts

2
Q

How is data obtained?

A

As the result of experiences, observations, or experiments

3
Q

What does data consist of?

A

Numbers
Words
Images

4
Q

Data source reliability?

A

How much confidence and belief we can place in the data source

5
Q

Data content accuracy?

A

The data are correct and a good match for the analytics problem (the right data for the job)

6
Q

Data accessibility?

A

Can we easily get to the data when we need to?

7
Q

Data security and privacy?

A

Only people with the proper authority can access the data

8
Q

Data richness?

A

All the required data elements are included in the data set

9
Q

Data consistency?

A

Accurately collected and combined/merged

10
Q

Data currency?

A

Up to date

11
Q

Data granularity?

A

The variables should be defined at the lowest level of detail needed for the intended use of the data

12
Q

Data validity?

A

Match/mismatch between the actual and expected data values of a given variable
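A minimal sketch of a validity check, assuming each variable has a hypothetical expected range or set of allowed codes (the `EXPECTED` rules below are made up for illustration). It reports every value whose actual content does not match what is expected.

```python
from typing import Any

# Hypothetical expectations for two variables (assumed for illustration only).
EXPECTED = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,  # expected numeric range
    "gender": lambda v: v in {"F", "M"},                     # expected category codes
}

def find_invalid(records: list[dict[str, Any]]) -> list[tuple[int, str, Any]]:
    """Return (row index, variable, value) for every value that fails its expectation."""
    invalid = []
    for i, row in enumerate(records):
        for var, is_valid in EXPECTED.items():
            if var in row and not is_valid(row[var]):
                invalid.append((i, var, row[var]))
    return invalid

rows = [{"age": 34, "gender": "F"}, {"age": -5, "gender": "X"}]
print(find_invalid(rows))  # [(1, 'age', -5), (1, 'gender', 'X')]
```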

13
Q

Data relevancy?

A

The variables in the data set are all relevant to the study being conducted

14
Q

Structured Data?

A

Targeted for computers to process

Numeric versus Categorical

15
Q

Unstructured/Textual Data?

A

Targeted for humans to process/digest

16
Q

Semi-Structured Data?

A

XML
HTML
Log files

17
Q

Categorical Structured Data?

A

Nominal

Ordinal

18
Q

Numerical Structured Data?

A

Interval

Ratio

19
Q

Unstructured Data contents?

A

Textual
Multimedia
XML/JSON

20
Q

What does data preprocessing include?

A

Data consolidation
Data cleaning
Data transformation
Data reduction

21
Q

Variables (data reduction)?

A

Dimensional Reduction

Variable Selection

22
Q

Cases/Samples (data reduction)?

A

Sampling

Balancing

23
Q

Data consolidation subtasks?

A

Access and collect the data
Select and filter the data
Integrate and unify the data

24
Q

Data consolidation popular methods?

A

SQL queries
Software agents
Web services
Domain expertise

25
Data cleaning subtasks?
Handle missing values in the data
Identify and reduce noise in the data
Find and eliminate erroneous data
26
Data cleaning, handling missing data popular methods?
Fill in the missing values with the most appropriate values
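Mean and median imputation are two common choices for "the most appropriate value". A minimal sketch, assuming missing entries are encoded as `None`; the `impute` helper is hypothetical.

```python
from statistics import mean, median
from typing import Optional

def impute(values: list[Optional[float]], strategy: str = "mean") -> list[float]:
    """Replace None (missing) entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

print(impute([3.0, None, 5.0, 10.0]))            # [3.0, 6.0, 5.0, 10.0]
print(impute([3.0, None, 5.0, 10.0], "median"))  # [3.0, 5.0, 5.0, 10.0]
```

The median is less sensitive to outliers (here 10.0) than the mean, which is one reason to prefer it for skewed variables.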
27
Data cleaning, identifying and reducing noise in the data popular methods?
Identify the outliers in data with simple statistical techniques or with cluster analysis
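One simple statistical technique is Tukey's interquartile-range (IQR) rule. A minimal sketch; the 1.5 multiplier is a common convention, not something stated on this card, and the `iqr_outliers` helper is hypothetical.

```python
from statistics import quantiles

def iqr_outliers(data: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(data, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [95]
```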
28
Data cleaning, finding and eliminating erroneous data popular methods?
Identify the erroneous values in data, such as odd values, inconsistent class labels, odd distributions
29
Data transformation subtasks?
Normalize the data
Discretize or aggregate the data
Construct new attributes
30
Data transformation, normalizing data popular methods?
Reduce the range of values in each numerically valued variable to a standard range by using a variety of normalization or scaling techniques
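Two widely used scaling techniques are min-max normalization and z-score standardization. A minimal sketch of both; the helper names are hypothetical.

```python
from statistics import mean, pstdev

def min_max(values: list[float], new_min: float = 0.0, new_max: float = 1.0) -> list[float]:
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant column: map everything to new_min
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def z_score(values: list[float]) -> list[float]:
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # no spread: every value equals the mean
    return [(v - mu) / sigma for v in values]

print(min_max([20, 30, 40, 60]))         # [0.0, 0.25, 0.5, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Min-max keeps values in a fixed range but is sensitive to extreme values; z-scores are unbounded but express each value in standard deviations from the mean.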
31
Data transformation, discretize or aggregate data popular methods?
Convert the numeric variables into discrete representations using range- or frequency-based binning techniques
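A minimal sketch of both binning styles: range-based (equal-width) and frequency-based (equal-frequency). The helper names and the sample ages are hypothetical; ties in the equal-frequency version may be split across bins, which a production version would need to handle.

```python
def equal_width_bins(values: list[float], k: int) -> list[int]:
    """Range-based binning: split [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid zero width when all values are equal
    # min(..., k - 1) puts the maximum value into the last bin instead of bin k
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values: list[float], k: int) -> list[int]:
    """Frequency-based binning: rank the values and give each bin about len/k of them."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels

ages = [18, 22, 25, 31, 40, 47, 52, 70]
print(equal_width_bins(ages, 3))      # [0, 0, 0, 0, 1, 1, 1, 2]
print(equal_frequency_bins(ages, 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
```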
32
Data transformation, construct new attributes popular methods?
Derive new and more informative variables from the existing ones using a wide range of mathematical functions
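A minimal sketch of attribute construction, assuming a hypothetical record with weight, height, and income fields: one new variable is a ratio of existing ones and another is a log transform.

```python
import math

def add_derived(record: dict[str, float]) -> dict[str, float]:
    """Return a copy of the record with new attributes computed from existing ones."""
    out = dict(record)
    # Ratio of two existing variables: body-mass index = weight (kg) / height (m) squared
    out["bmi"] = record["weight_kg"] / record["height_m"] ** 2
    # Log transform that compresses a right-skewed variable such as income
    out["log10_income"] = math.log10(record["income"])
    return out

patient = {"weight_kg": 80.0, "height_m": 2.0, "income": 100_000.0}
print(add_derived(patient))
```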
33
Data reduction subtasks?
Reduce the number of attributes
Reduce the number of records
Balance skewed data
34
Data reduction, reducing the number of attributes popular methods?
Principal component analysis
Independent component analysis
Chi-square testing
Correlation analysis
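Correlation analysis can be sketched as: compute the Pearson correlation between attributes and drop one attribute from each highly correlated pair. The 0.9 threshold, the greedy keep-first strategy, and the sample columns are assumptions for illustration; PCA, ICA, and chi-square testing are not shown.

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return sxy / sqrt(sxx * syy)

def drop_correlated(columns: dict[str, list[float]], threshold: float = 0.9) -> list[str]:
    """Keep attributes greedily, skipping any whose |r| with a kept one exceeds threshold."""
    kept: list[str] = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # nearly a copy of height_cm
    "shoe_size": [38, 37, 42, 40, 44],
}
print(drop_correlated(data))  # ['height_cm', 'shoe_size']
```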
35
Data reduction, reducing the number of records popular methods?
Random sampling
Stratified sampling
Expert-knowledge-driven purposeful sampling
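Stratified sampling draws the same fraction from each subgroup so that the sample keeps the population's proportions (plain random sampling would be `rng.sample(records, k)` over all records). A minimal sketch; the `stratified_sample` helper and the customer data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records: list[dict], key: str, fraction: float, seed: int = 0) -> list[dict]:
    """Draw the same fraction from each stratum (records sharing the same value of `key`)."""
    rng = random.Random(seed)
    strata: dict = defaultdict(list)
    for r in records:
        strata[r[key]].append(r)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # keep at least one record per stratum
        sample.extend(rng.sample(group, k))
    return sample

customers = [{"id": i, "region": "north" if i < 80 else "south"} for i in range(100)]
picked = stratified_sample(customers, "region", 0.1)
print(sum(r["region"] == "north" for r in picked),
      sum(r["region"] == "south" for r in picked))  # 8 2
```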
36
Data reduction, balancing skewed data popular methods?
Oversample the less represented or undersample the more represented classes
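A minimal sketch of random oversampling: rows of each smaller class are duplicated at random until every class matches the majority count. Undersampling would instead drop rows from the larger class. The helper name and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(X: list, y: list, seed: int = 0) -> tuple[list, list]:
    """Duplicate randomly chosen minority-class rows until all classes match the majority."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

X = [[1], [2], [3], [4], [5], [6]]
y = ["no", "no", "no", "no", "no", "yes"]
X_bal, y_bal = random_oversample(X, y)
print(dict(Counter(y_bal)))  # {'no': 5, 'yes': 5}
```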
37
Statistics?
A collection of mathematical techniques to characterize and interpret data
38
Descriptive statistics?
Describing the data as it is
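Descriptive statistics summarize the sample at hand (central tendency and dispersion) without generalizing to a population. A minimal sketch with Python's `statistics` module on hypothetical scores:

```python
from statistics import mean, median, mode, stdev

scores = [70, 85, 85, 90, 95]  # hypothetical exam scores

summary = {
    "n": len(scores),
    "mean": mean(scores),                # central tendency
    "median": median(scores),
    "mode": mode(scores),
    "range": max(scores) - min(scores),  # dispersion
    "stdev": stdev(scores),              # sample standard deviation (n - 1 denominator)
}
print(summary)
```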
39
Inferential statistics?
Drawing inferences about the population based on sample data
40
Mean Absolute Deviation?
Average absolute deviation from the mean
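As a formula, MAD = (1/n) × Σ |xᵢ − x̄|. A minimal sketch with a small worked example; the helper name is hypothetical.

```python
from statistics import mean

def mean_absolute_deviation(values: list[float]) -> float:
    """MAD = (1/n) * sum(|x_i - mean|)."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)

# mean = 5, absolute deviations = 3, 1, 1, 3, so MAD = 8 / 4
print(mean_absolute_deviation([2, 4, 6, 8]))  # 2.0
```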
41
Regression?
A part of inferential statistics
The most widely known and used analytics technique in statistics
Used to characterize the relationship between explanatory (input) and response (output) variables
42
What can regression be used for?
Hypothesis testing
Forecasting
43
Correlation vs Regression?
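Correlation is a single statistic (r) that estimates the strength and direction of the linear association between two variables, whereas regression is an entire equation (a fitted line through all of the data points) that describes how the response variable depends on the explanatory variable(s) and can be used for prediction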
44
How to develop linear regression models?
Scatter plots
Ordinary least squares method
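A scatter plot is typically used first to check visually that a straight line is reasonable; ordinary least squares (OLS) then picks the line that minimizes the sum of squared errors. A minimal sketch of the closed-form simple-regression solution, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄; the helper names and data are hypothetical.

```python
def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = b0 + b1*x (closed form)."""
    n = len(x)
    if n < 2 or len(y) != n:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope that minimizes the sum of squared residuals
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

def sse(x: list[float], y: list[float], b0: float, b1: float) -> float:
    """Sum of squared errors, the quantity OLS minimizes."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
b0, b1 = ols_fit(xs, ys)
print(b0, b1, sse(xs, ys, b0, b1))
```

Any other slope or intercept gives a larger SSE than the OLS pair for the same data.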
45
Regression Modelling Assumptions?
Linearity
Independence (of errors)
Normality (of errors)
Constant variance (of errors)
No multicollinearity (explanatory variables are not strongly correlated with each other)
46
What is a report?
Any communication artifact prepared to convey specific information
47
Functions that report can fulfill?
To ensure proper departmental functioning
To provide information
To provide the results of an analysis
To persuade others to act
To create an organizational memory
48
What is a business report?
A written document that contains information regarding business matters
49
Purpose of business report?
To improve managerial decisions
50
Source of business report?
Data from inside and outside the organization
51
Format of business report?
Text + tables + graphs/charts
52
Distribution of business report?
In print
Email
Portal
53
Steps of business report distribution?
Data acquisition -> Information generation -> Decision making -> Process management
54
Types of Business Reports?
Metric Management Reports
Dashboard-Type Reports
Balanced Scorecard-Type Reports
55
Data Visualization?
The use of visual representations to explore, make sense of, and communicate data
56
Information visualization?
Aggregation, summarization, and contextualization of data
57
Types of dimension reduction?
Variable Selection
Principal Components
Multi-dimensional scaling