Chapter 1: Exploring data Flashcards

1
Q

Individuals

A

Individuals are the objects described by a set of data

2
Q

Variable

A

any characteristic of an individual, such as a person, place, thing, or idea; a variable can take different values for different individuals

3
Q

Categorical Variable

A

categorical variables take on values that are names or labels

4
Q

Quantitative Variable

A

quantitative variables take numerical values for which it makes sense to do arithmetic, such as finding an average

5
Q

Continuous

A

continuous distribution is one in which data can take on any value within a specified range

6
Q

Univariate Data

A

data that describe only one variable

7
Q

Bivariate Data

A

data on two variables, usually collected to examine the relationship between them

8
Q

Population

A

population refers to the entire group of individuals (or the total set of observations) that we want information about

9
Q

Sample

A

a sample refers to a set of observations drawn from a population

10
Q

Census

A

a study that obtains data from every member of a population

11
Q

Distribution

A

The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often each occurs

12
Q

Inference

A

statistical inference is the process of using sample data to draw conclusions about a population or about the underlying probability distribution that produced the data

13
Q

Frequency Table

A

when a table shows frequency counts for a categorical variable, it is called a frequency table

14
Q

Relative Frequency

A

Relative frequency = Subgroup count / Total count
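
A minimal Python sketch of building a frequency table and applying the relative frequency formula, using hypothetical categorical data (the responses and the function name `relative_frequencies` are illustrative, not from the source):

```python
from collections import Counter

# Hypothetical categorical data: how 10 students get to school
responses = ["bus", "car", "walk", "bus", "bike",
             "car", "bus", "walk", "bus", "car"]

def relative_frequencies(values):
    """Return {category: (count, relative frequency)} for a categorical variable."""
    counts = Counter(values)        # frequency table: category -> count
    total = sum(counts.values())    # total count
    # Relative frequency = subgroup count / total count
    return {cat: (n, n / total) for cat, n in counts.items()}

table = relative_frequencies(responses)
for category, (count, rel) in sorted(table.items()):
    print(f"{category:<5} {count:>2} {rel:.0%}")
```

The counts are bike 1, bus 4, car 3, and walk 2, so the relative frequencies are 10%, 40%, 30%, and 20%, which add to 100%.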

15
Q

Table

A

tables showing the values of cumulative distribution functions, probability functions, or probability density functions

16
Q

Roundoff Error

A

the difference between an approximation of a number used in computation and its exact (correct) value
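
Roundoff error commonly shows up when percentages in a table are rounded: the rounded values may not add to exactly 100%. A small sketch with three equal, hypothetical counts, each of which rounds to 33%:

```python
# Hypothetical counts for three categories (equal by design)
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

exact = {k: 100 * v / total for k, v in counts.items()}  # 33.333...% each
rounded = {k: round(p) for k, p in exact.items()}         # 33% each

print(sum(exact.values()))    # about 100 (floating-point arithmetic can also add tiny roundoff)
print(sum(rounded.values()))  # 99: rounding each entry lost the 1% difference
```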

17
Q

Pie Chart

A

a circular statistical graphic divided into slices to illustrate numerical proportions; each slice shows one category's share of the whole

18
Q

Bar Graph

A

a chart that represents each category with a rectangular bar (or column) whose length shows the count or percent for that category

19
Q

Two-way Table

A

a statistical table that shows the observed number or frequency for two variables

20
Q

Marginal Distribution

A

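the distribution of values of one categorical variable among all individuals in a two-way table; it is found from the row or column totals as percentages of the table total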

21
Q

Conditional Distribution

A

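the distribution of one variable among only those individuals who have a specific value of another variable; it is found as percentages of a single row or column total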
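
A sketch of the marginal and conditional distributions for a small, hypothetical two-way table (class year by post-graduation plan). The marginal distribution divides column totals by the table total; the conditional distribution divides by a single row total. Conditioning on a column works the same way.

```python
# Hypothetical two-way table: rows = class year, columns = post-graduation plan
table = {
    "Junior": {"College": 30, "Work": 20},
    "Senior": {"College": 45, "Work": 5},
}
columns = ["College", "Work"]
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of plan: column totals as a share of the grand total
marginal = {c: sum(row[c] for row in table.values()) / grand_total for c in columns}

def conditional(row_label):
    """Conditional distribution of plan among the individuals in one row only."""
    row = table[row_label]
    row_total = sum(row.values())
    return {c: row[c] / row_total for c in columns}

print(marginal)               # {'College': 0.75, 'Work': 0.25}
print(conditional("Senior"))  # {'College': 0.9, 'Work': 0.1}
```

Comparing the conditional distributions for Juniors (60% college) and Seniors (90% college) is how an association between the two variables would show up in this table.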

22
Q

Segmented Bar Graph

A

a bar graph in which each bar is split into colored segments that show the conditional distribution of one variable within a category of the other; each bar totals 100%

23
Q

Side-by-side Bar Graph

A

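bars for the categories of one variable are drawn next to each other within each category of the other variable, so the groups can be compared directly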

24
Q

Association

A

any relationship between two measured quantities that renders them statistically dependent

25
Simpson's Paradox
an association that holds within each of several groups can disappear or reverse direction when the groups are combined and the data are examined in aggregate form
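
A sketch with hypothetical survival counts (invented for illustration) that produce the reversal: hospital A has the higher survival rate for both mild and severe cases, yet the lower rate overall, because most of A's patients are severe cases. Patient condition is the lurking variable.

```python
# Hypothetical (survived, treated) counts by hospital and patient condition
data = {
    "A": {"mild": (90, 100), "severe": (300, 1000)},
    "B": {"mild": (800, 1000), "severe": (20, 100)},
}

def rate(survived, treated):
    return survived / treated

# Within each group, A has the higher survival rate
for condition in ("mild", "severe"):
    a = rate(*data["A"][condition])
    b = rate(*data["B"][condition])
    print(f"{condition:<6} A={a:.0%} B={b:.0%}")

def overall(hospital):
    """Combine the groups: total survivors divided by total treated."""
    survived = sum(s for s, _ in data[hospital].values())
    treated = sum(t for _, t in data[hospital].values())
    return survived / treated

# Combined, B has the higher survival rate
print(f"overall A={overall('A'):.0%} B={overall('B'):.0%}")
```
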
26
Dot Plot
a graph for displaying the distribution of a quantitative variable in which each dot represents one observation, stacked above its value on a number line
27
Shape
described by whether the distribution is symmetric or skewed to the left or right, how many peaks it has, and whether it is roughly uniform
28
Mode
the value that appears most often in a set of data
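
One way to find the mode is the standard-library function `statistics.multimode` (Python 3.8 and later), which returns every value tied for the highest count, so it also reports both modes of a bimodal data set. The data below are hypothetical.

```python
from statistics import multimode

scores = [7, 8, 8, 9, 10, 8, 7, 9]   # hypothetical quiz scores
print(multimode(scores))             # [8]: 8 occurs three times

ties = [1, 1, 2, 2, 3]               # two values share the top count
print(multimode(ties))               # [1, 2]
```
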
29
Center
the midpoint of a distribution, usually measured by the mean or median of the data
30
Spread
how similar or varied the set of observed values are for a particular variable (data item)
31
Range
the difference between the largest and smallest observations (maximum minus minimum); a simple measure of variation
32
Outlier
a data point that diverges greatly from the overall pattern of data is called an outlier
33
Symmetric
a symmetric distribution can be divided at the center so that each half is a mirror image of the other
34
Skewed Right
a distribution is skewed right if its right side (toward higher values) extends much farther out than its left side, so the few observations at the high end form a long right tail
35
Skewed Left
a distribution is skewed left if its left side (toward lower values) extends much farther out than its right side, so the few observations at the low end form a long left tail
36
Unimodal
distributions with one clear peak are called unimodal
37
Bimodal
distributions with two clear peaks are called bimodal
38
Multimodal
a probability distribution with more than one peak, or "mode"
39
Stemplot
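a stemplot (stem-and-leaf plot) splits each observation into a stem (all but the final digit) and a leaf (the final digit); the stems are listed in a column on the left and each leaf is written to the right of its stem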
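
A minimal sketch that builds a stemplot for non-negative whole numbers, assuming the leaf is the ones digit and the stem is everything before it (the function name and scores are hypothetical). Empty stems are kept so gaps in the data stay visible.

```python
from collections import defaultdict

def stemplot(data):
    """Return the lines of a stemplot: stem = all but the last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for x in sorted(data):             # sorting puts leaves in increasing order
        leaves[x // 10].append(x % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # include empty stems
        lines.append(f"{stem:>2} | " + "".join(str(d) for d in leaves[stem]))
    return lines

scores = [72, 85, 91, 68, 77, 85, 93, 70, 59, 88]   # hypothetical exam scores
print("\n".join(stemplot(scores)))
```

For these scores the stem 7 line reads `7 | 027`, standing for 70, 72, and 77.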
40
Splitting Stems
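a stemplot in which each stem appears on more than one line (for example, leaves 0 to 4 on the first line and 5 to 9 on the second), which spreads out data that would otherwise crowd onto too few stems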
41
Back-to-back Stemplot
back-to-back stem plots are a graphic option for comparing data from two populations
42
Plots
a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables
43
Histogram
a graph of a continuous, quantitative variable in which the number line is divided into equal-width classes; each bar sits over one class, and its height shows the count (or percent) of observations in that class, with adjacent bars touching
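
A sketch of how a histogram's bar heights are computed: split the number line into equal-width classes and count the observations in each. The function name, the class boundaries (start 50, width 10), and the scores are hypothetical; each class includes its left endpoint and excludes its right one, and values outside the chosen classes are ignored in this sketch.

```python
def histogram_counts(data, start, width, n_bins):
    """Count observations in classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)   # index of the class containing x
        if 0 <= k < n_bins:             # skip values outside the chosen classes
            counts[k] += 1
    return counts

scores = [72, 85, 91, 68, 77, 85, 93, 70, 59, 88]   # hypothetical exam scores
counts = histogram_counts(scores, start=50, width=10, n_bins=5)
for k, c in enumerate(counts):
    lo = 50 + k * 10
    print(f"{lo:>3} to <{lo + 10}: " + "#" * c)     # text bars, one # per observation
```
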
44
Mean
the arithmetic average of the data: the sum of the observations divided by the number of observations
45
Median
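the middle value when the observations are ordered from smallest to largest; when the number of observations is even, it is the average of the two middle values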
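
A minimal sketch of both definitions in plain Python (the function names are illustrative; the standard-library `statistics.mean` and `statistics.median` compute the same values). The hypothetical incomes include one value far above the rest, which shows that the median resists an extreme observation while the mean is pulled toward it.

```python
def mean(data):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

incomes = [20, 22, 25, 27, 30, 150]   # hypothetical incomes, in thousands
print(mean(incomes))    # pulled up by the value 150
print(median(incomes))  # 26.0, the average of 25 and 27
```
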
46
Interquartile Range
a measure of variability based on dividing a data set into quartiles: IQR = Q3 minus Q1, the range of the middle half of the data
47
Five-number Summary
the minimum, first quartile (Q1), median, third quartile (Q3), and maximum; together they give information about the location (from the median), spread (from the quartiles), and range (from the sample minimum and maximum) of the observations
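
A sketch that computes the five-number summary, the range, and the IQR. It assumes the median-of-halves rule for quartiles (Q1 is the median of the observations below the median position, Q3 the median of those above it, and the middle observation is left out when n is odd); other software may use a different quartile rule and report slightly different values. The scores are hypothetical.

```python
def median(s):
    """Median of an already sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); assumes at least two observations."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]          # observations below the median position
    upper = s[(n + 1) // 2 :]    # observations above the median position
    return s[0], median(lower), median(s), median(upper), s[-1]

scores = [72, 85, 91, 68, 77, 85, 93, 70, 59, 88]   # hypothetical exam scores
lo, q1, med, q3, hi = five_number_summary(scores)
print(lo, q1, med, q3, hi)   # 59 70 81.0 88 93
print("range =", hi - lo)    # 34
print("IQR =", q3 - q1)      # 18
```
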
48
Summary
A summary is a brief statement or restatement of main points
49
Boxplot
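a graph of the five-number summary of a quantitative variable: a box spans the quartiles with a line at the median, and whiskers extend from the box to the smallest and largest observations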
50
Standard deviation
a measure of how far the observations typically are from their mean; it is the square root of the variance
51
Variance
the average of the squared deviations of the observations from their mean; for sample data, the sum of squared deviations is divided by n minus 1 instead of n
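
A minimal sketch of the sample variance and standard deviation, assuming the data are a sample (so the sum of squared deviations is divided by n minus 1). The function names and data are illustrative; the standard-library `statistics.variance` and `statistics.stdev` compute the same sample quantities.

```python
import math

def variance(data):
    """Sample variance: sum of squared deviations from the mean divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def standard_deviation(data):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations; mean is 5
print(variance(data))             # squared deviations sum to 32, divided by 7
print(standard_deviation(data))
```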