Exploring Data and R Flashcards

1
Q

A preliminary exploration of the data to better understand its characteristics.

A

Data Exploration

2
Q

are numbers that summarize properties of the data.

A

Summary Statistics

3
Q

is the percentage of the time a value occurs in the data set.

A

Frequency

4
Q

is the most frequent attribute value.

A

Mode
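
Note: frequency and mode can be computed in R with table(). The following is a minimal sketch; the sample vector is made up for illustration and is not part of the deck.

```r
# Hypothetical sample data, not from the source.
x <- c("red", "blue", "red", "green", "red", "blue")

counts <- table(x)                 # how many times each value occurs
freq <- prop.table(counts) * 100   # frequency as a percentage of all values

# Base R has no statistical mode function (mode() reports the storage type),
# so the most frequent value is read off the table of counts.
most_frequent <- names(counts)[which.max(counts)]

print(freq)
print(most_frequent)
```

If several values tie for the highest count, which.max() returns only the first of them.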

5
Q

2 MEASURES OF LOCATION

A
  1. Mean
  2. Median
6
Q

is the most common measure of the location of a set of points.

A

Mean

7
Q

An alternative to the mean, used because the mean is very sensitive to outliers.

A

Median
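
Note: the sensitivity of the mean to outliers can be seen with a small R sketch. The numbers are invented for illustration.

```r
# Hypothetical values, chosen so the arithmetic is easy to check by hand.
x <- c(2, 3, 4, 5, 6)
x_out <- c(x, 100)   # the same data plus one outlier

mean(x)         # 4
median(x)       # 4

mean(x_out)     # 20, pulled far upward by the single outlier
median(x_out)   # 4.5, barely changed
```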

8
Q

2 WAYS TO MEASURE SPREAD

A
  1. Range
  2. Variance or Standard Deviation
9
Q

is the difference between max and min.

A

Range

10
Q

is the most common measure of the spread of a set of points.

A

Variance or Standard Deviation
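
Note: a minimal R sketch of the spread measures, using made-up data. One thing to watch for is that R's range() returns the minimum and maximum as a pair rather than their difference.

```r
# Hypothetical values, not from the source.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

range(x)         # 2 9: the min and max, not the difference
diff(range(x))   # 7: the range as defined on the card (max minus min)

var(x)           # sample variance (divides by n - 1): 32/7, about 4.57
sd(x)            # standard deviation, the square root of the variance
```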

11
Q

is the conversion of data into a visual or tabular format so that the characteristics of the data and the relationships among data items or attributes can be analyzed or reported.

A

Visualization

12
Q

12 VISUALIZATION TECHNIQUES / METHODS

A
  1. Representation
  2. Arrangement
  3. Selection
  4. Histogram
  5. Box Plots
  6. Two Dimensional Histograms
  7. Scatter Plots
  8. Contour Plots
  9. Matrix Plots
  10. Parallel Coordinates
  11. Star Plot
  12. Chernoff Faces
13
Q

is a visualization technique which is the mapping of information to a visual format.

A

Representation

14
Q

is the placement of visual elements within a display.

A

Arrangement

15
Q

is the elimination or the deemphasis of certain objects and attributes.

A

Selection

16
Q

usually shows the distribution of values of a single variable.

A

Histogram
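
Note: a minimal sketch of a histogram in base R. The iris dataset ships with R; the choice of dataset and column is an assumption made for illustration.

```r
# iris comes with R's built-in datasets package; the column is arbitrary.
values <- iris$Sepal.Length

# hist() draws the histogram and returns the binning it used.
h <- hist(values, main = "Sepal length", xlab = "Sepal length (cm)")

# h$counts holds the number of observations that fell into each bin.
sum(h$counts)   # 150, one per row of iris
```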

17
Q

A simplified version of a PDF/histogram.

A

Box Plots

18
Q

shows the joint distribution of the values of two attributes.

A

Two Dimensional Histograms

19
Q

Attribute values determine the position.

A

Scatter Plots

20
Q

Useful when a continuous attribute is measured on a spatial grid. They partition the plane into regions of similar values.

A

Contour Plots

21
Q

can plot a data matrix.

A

Matrix Plots

22
Q

used to plot the attribute values of high-dimensional data.

A

Parallel Coordinates

23
Q

A similar approach to parallel coordinates, but the axes radiate from a central point.

A

Star Plot

24
Q

approach associates each attribute with a characteristic of a face.

A

Chernoff Faces

25
is a language used as a statistics system. It is an environment within which many classical and modern statistical techniques have been implemented.
R
26
Who developed R?
Ross Ihaka & Robert Gentleman
27
is a powerful and productive 3rd party user interface for R.
RStudio IDE
28
RSTUDIO USER INTERFACE
  1. Console Pane
  2. Source Pane
  3. Environment Pane
  4. Files Pane
29
This is where you can type and execute commands.
Console Pane
30
A text editor, or script window, where you can edit and save a collection of commands.
Source Pane
31
Contains objects such as datasets loaded into R, as well as a history of all commands executed.
Environment Pane
32
open files, view plots, install and load packages.
Files Pane
33
is used for storing data tables. It is a list of vectors of equal length.
Data Frames
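
Note: the "list of vectors of equal length" description can be checked directly in R. A minimal sketch with invented column names and values:

```r
# Hypothetical table; column names and values are for illustration only.
df <- data.frame(
  name  = c("a", "b", "c"),
  score = c(90, 85, 77)
)

is.list(df)   # TRUE: a data frame is a list underneath
lengths(df)   # each column is a vector of length 3
str(df)       # compact view of the columns and their types
```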
34
2 PLOTTING COMMANDS
  1. High-Level Plotting Function
  2. Low-Level Plotting Function
35
is a plotting command that creates a new plot on the graphics device.
High-Level Plotting Function
36
is a plotting command that adds more information to an existing plot, such as extra points, lines, and labels.
Low-Level Plotting Function
37
is the most frequently used plotting function.
plot() Function
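
Note: the difference between high-level and low-level plotting functions can be sketched in base R as follows. The data are made up; plot() starts a new plot, while lines(), points(), and text() only add to the plot that is already open.

```r
# Hypothetical data for illustration.
x <- 1:10
y <- x^2

# High-level: creates a new plot on the graphics device.
plot(x, y, main = "y = x^2", xlab = "x", ylab = "y")

# Low-level: each call adds to the existing plot.
lines(x, y)                                # connect the points
points(5, 25, pch = 19, col = "red")       # highlight one point
text(5, 25, labels = "(5, 25)", pos = 4)   # label it, to the right
```

Calling a low-level function without an open plot is an error, which is a quick way to tell the two kinds apart.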
38
offers a powerful graphics language for creating elegant and complex plots.
ggplot2 Package
39
created the ggplot2 package.
Hadley Wickham
40
is what the ggplot2 package is based on.
Grammar of Graphics
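
Note: a minimal ggplot2 sketch, assuming the package is already installed (for example via install.packages("ggplot2")). The mtcars dataset ships with R; the choice of variables is arbitrary. A plot is built by combining data, aesthetic mappings, and layers with +, following the Grammar of Graphics.

```r
library(ggplot2)   # assumes ggplot2 is installed

# Data plus aesthetic mapping, then a layer of points, then labels.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy vs. weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)   # draws the plot on the graphics device
```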