Exploratory Data Analysis Flashcards

1
Q

Exploratory Data Analysis

A
  • An approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to:
    ◦ maximize insight into a data set;
    ◦ uncover underlying structure;
    ◦ extract important variables;
    ◦ detect outliers and anomalies;
    ◦ test underlying assumptions;
    ◦ develop parsimonious models; and
    ◦ determine optimal factor settings.
  • A set of techniques.
  • The flexibility to respond to the patterns revealed by successive iterations in the discovery process is an important attribute.
  • Free to take many paths in revealing mysteries in the data.
  • Emphasizes visual representations and graphical techniques over summary statistics.
  • A philosophy as to:
    ◦ how we dissect a data set;
    ◦ what we look for;
    ◦ how we look; and
    ◦ how we interpret.
  • Heavily uses the collection of techniques that we call "statistical graphics", but it is not identical to statistical graphics.
2
Q

EDA and Visualization

A
  • Exploratory Data Analysis (EDA) and Visualization
    are very important steps in any analysis task.
  • get to know your data!
    ◦ distributions (symmetric, normal, skewed)
    ◦ data quality problems
    ◦ outliers
    ◦ correlations and inter-relationships
    ◦ subsets of interest
    ◦ suggest functional relationships
3
Q

Previously Designed Techniques for Displaying Data

A
◦ frequency tables
◦ bar charts and histograms
◦ pie charts
◦ stem and leaf displays
◦ boxplots
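
As a rough illustration of the first item, a frequency table can be sketched in a few lines of Python. The function name `frequency_table` and the sample data are assumptions made up for this example, not part of the original card.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, proportion) rows, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, count / total) for value, count in counts.most_common()]


# Hypothetical sample data, invented for illustration only.
colors = ["red", "blue", "red", "green", "blue", "red"]
table = frequency_table(colors)

for value, count, share in table:
    # Left-align the category, right-align the count, show the share to 2 decimals.
    print(f"{value:<6} {count:>3} {share:.2f}")
```

Each row pairs a category with its count and its share of the total, which is the same information a bar chart or pie chart displays graphically.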
4
Q

More detailed explanation of EDA

A

“Exploratory Data Analysis refers to the
critical process of performing initial
investigations on data so as to discover
patterns, to spot anomalies, to test
hypothesis and to check assumptions
with the help of summary statistics and
graphical representations.”

5
Q

History of EDA

A
The seminal work in EDA
is Exploratory Data Analysis, Tukey
(1977). Over the years it has
benefitted from other noteworthy 
publications such as Data Analysis 
and Regression, Mosteller and Tukey 
(1977), Interactive Data Analysis, 
Hoaglin (1977), The ABC's of EDA, 
Velleman and Hoaglin (1981) and has 
gained a large following as "the" way 
to analyze a data set.
6
Q

EDA Techniques

A

  • Most are graphical in nature, with a few quantitative techniques.
  • The reason for the heavy reliance on graphics is that the main role of EDA is to explore open-mindedly. Graphics give the analyst unparalleled power to do so, enticing the data to reveal its structural secrets and leaving the analyst ready to gain new, often unsuspected, insight into the data.
  • Graphics work in combination with the natural pattern-recognition capabilities that we all possess.

7
Q

Proportions

A
The proportion among elements in 
the collection belonging in a given 
category is defined as:
the number of elements belonging in 
the category divided by the total 
number of elements in the collection.
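
The definition above can be sketched in Python as follows. The function name `proportion` and the sample list are hypothetical, chosen only for illustration.

```python
def proportion(collection, category):
    """Number of elements in the category divided by the total number of elements."""
    if not collection:
        raise ValueError("proportion is undefined for an empty collection")
    in_category = sum(1 for item in collection if item == category)
    return in_category / len(collection)


# Hypothetical sample data: 2 of the 4 elements are "cat".
pets = ["cat", "dog", "cat", "bird"]
cat_share = proportion(pets, "cat")
print(cat_share)
```

A proportion computed this way always lies between 0 and 1, and the proportions of all distinct categories in a collection sum to 1.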
8
Q

Percent

A
Percent means “per hundred”, 
“by the hundred”, or “out of a 
hundred”. A proportion can be 
converted to a percentage by 
multiplying it by 100.
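
A minimal sketch of the conversion in Python; the helper names `to_percent` and `to_proportion` are assumptions introduced for this example.

```python
def to_percent(proportion):
    """Convert a proportion (e.g. 0.25) to a percentage (e.g. 25.0)."""
    return proportion * 100


def to_proportion(percent):
    """Convert a percentage back to a proportion by dividing by 100."""
    return percent / 100


quarter = to_percent(0.25)
print(f"{quarter}%")
```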
9
Q

Ratio

A

The ratio of a number x to another number
y expresses the size of one measure x with
respect to the size of another measure y.
• It is written as x:y and is read as “x is to
y”.
• When the measure x is divided by the
measure y, the relationship that x bears
to y is then expressed as a ratio to one.
• The measure y in the denominator is
called the base.
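
The two ways of writing a ratio described above can be sketched in Python. The function names `ratio_to_one` and `simplest_terms` are hypothetical, and reducing x:y to lowest terms is an extra convenience that the card itself does not mention.

```python
from fractions import Fraction


def ratio_to_one(x, y):
    """Express x:y as a ratio to one by dividing x by the base y."""
    if y == 0:
        raise ValueError("the base y must be nonzero")
    return x / y


def simplest_terms(x, y):
    """Reduce an integer ratio x:y to lowest terms (assumes y is nonzero)."""
    reduced = Fraction(x, y)
    return reduced.numerator, reduced.denominator


# Hypothetical example: 30 is to 20 as 1.5 is to 1, or 3:2 in lowest terms.
print(ratio_to_one(30, 20))
print("%d:%d" % simplest_terms(30, 20))
```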

10
Q

Percent Change

A
Percent change = (new amount - original amount)
/ original amount × 100.
When the new amount is less than the
original amount, the number on top will
be a negative number and the result will
be a percent decrease; when the new amount
is greater, the percent change is positive
and is called a percent increase.
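
A short Python sketch of this rule, assuming the standard definition of percent change as (new amount - original amount) divided by the original amount, times 100. The function name `percent_change` and the sample amounts are made up for illustration.

```python
def percent_change(original, new):
    """Percent change from original to new: (new - original) / original * 100."""
    if original == 0:
        raise ValueError("percent change is undefined when the original amount is 0")
    return (new - original) / original * 100


increase = percent_change(80, 100)  # new amount is greater: positive result
decrease = percent_change(100, 80)  # new amount is smaller: negative result
print(increase, decrease)
```

The sign of the result comes entirely from the numerator, new minus original, which matches the increase/decrease distinction on the card.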