Introduction To Data Flashcards

(132 cards)

0
Q

3 components of statistics

A

Collect
Analyze
Infer

1
Q

Study of how best to collect, analyze and draw conclusions from data

A

Statistics

2
Q

In a study, the group that serves as the reference point for comparison with the treatment group is the

A

Control group

3
Q

Single number summarizing a large amount of data

A

Summary statistic

4
Q

The first step in most analyses

A

Effective presentation and description of data

5
Q

Each row in the table is the

A

Case

6
Q

Each column on the table is a

A

Variable

7
Q

Row + column

A

Data matrix

8
Q

Another term for case

A

Unit of observation or an observational unit

9
Q

A variable with values that can be added, subtracted or averaged is

A

Numerical

10
Q

A numerical variable that can take only whole, non-negative values (such as counts) is

A

Discrete

11
Q

A variable that denotes classification is

A

Categorical

12
Q

The possible values of a categorical variable are called

A

Levels

13
Q

A categorical variable whose levels have a natural ordering is

A

Ordinal

14
Q

When two variables show some connection with one another, they are called ___________________ or _____________________ variables.

A

Associated; dependent

15
Q

If one variable increases as the other decreases, there is a

A

Negative association

16
Q

If one variable increases as the other increases, there is a

A

Positive association

17
Q

If two variables are not associated, they are said to be

A

Independent

18
Q

Can a pair of variables be associated and independent at the same time?

A

No

19
Q

Each research question refers to a target

A

Population

20
Q

A subset of cases which is a small fraction of the population is known as

A

Sample

21
Q

Data collected in haphazard fashion is

A

Anecdotal evidence

22
Q

If someone were permitted to pick and choose exactly which subjects are included in a sample, this introduces _____________ into the sample.

A

Bias

23
Q

Most basic random sample is

A

Simple random sample

24
In simple random sample, each case in a population has a/an __________ chance of being included
Equal
25
Bias can crop up. If only 30% of people randomly sampled actually responded, it is unclear whether the results are __________________ of the entire population. The _____________ bias can skew results.
Representative / non-response
26
When individuals who are easily accessible are more likely included in the sample, this is _____________________.
Convenience sample
27
Explanatory variable might affect
Response variable
28
Association implies causation. True or false?
False. Association does not by itself imply causation.
29
Two primary types of data collection
Observational studies | Experiments
30
Collecting data in a way that does not directly interfere with how the data arise is
Observational study
31
When researchers want to investigate the possibility of a causal connection, they conduct a/an
Experiment
32
When individuals are randomly assigned to a group, the experiment is called a
Randomized experiment
33
In a two group experiment, the fake treatment is called a
Placebo
34
Causation can only be inferred from a ______________.
Randomized experiment
35
A variable correlated with both the explanatory and response variables
Confounding variable
36
Two forms of observational studies
Prospective | Retrospective
37
Which type of observational study identifies individuals and collects information as events unfold?
Prospective
38
Which type of observational study collects data after events have taken place, e.g., researchers reviewing past events in medical records?
Retrospective
39
Three random sampling techniques
Simple | Stratified | Cluster
40
Most intuitive form of random sampling
Simple random sampling
41
Drawing names from a fishbowl is an example of
Simple random
42
Divide and conquer sampling strategy
Stratified sampling
43
When similar cases are grouped together, then simple random sampling is employed in each group, this is
Stratified sampling
44
A two-stage simple random sample is
A cluster sample
45
Similar in process to stratified sampling, but with no requirement to sample from every group
Cluster
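The three sampling techniques on the cards above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region names, case labels, and function names are hypothetical, and the cluster version follows the two-stage description on the cards (pick clusters at random, then take a simple random sample inside each chosen cluster).

```python
import random

def simple_random_sample(population, k, rng):
    """Draw k cases so that every case has an equal chance of being included."""
    return rng.sample(population, k)

def stratified_sample(strata, k_per_stratum, rng):
    """Take a simple random sample of size k_per_stratum from EVERY stratum."""
    sample = []
    for cases in strata.values():
        sample.extend(rng.sample(cases, k_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, k_per_cluster, rng):
    """Randomly choose some clusters, then sample cases only within those."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], k_per_cluster))
    return chosen, sample

# Hypothetical population: 12 cases grouped into 3 regions.
groups = {
    "north": ["n1", "n2", "n3", "n4"],
    "south": ["s1", "s2", "s3", "s4"],
    "east": ["e1", "e2", "e3", "e4"],
}
everyone = [case for cases in groups.values() for case in cases]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
srs = simple_random_sample(everyone, 4, rng)
stratified = stratified_sample(groups, 2, rng)
chosen_clusters, clustered = cluster_sample(groups, 2, 2, rng)

print("simple random:", srs)
print("stratified:", stratified)
print("cluster:", chosen_clusters, clustered)
```

Note the difference in coverage: the stratified sample always contains cases from all three regions, while the cluster sample contains cases from only the two regions that were drawn.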
46
Studies where researchers assign treatments to cases are called
Experiments
47
Four principles of experimental design
Controlling | Randomizing | Replication | Blocking
48
Asking all patients to drink a 12-ounce glass of water with the pill demonstrates
Control
49
To even out differences and prevent accidental bias, what is done?
Randomization
50
Verifying an earlier finding, or estimating an effect more accurately by observing more cases, requires
Replication
51
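If a variable other than the treatment is suspected to influence the response, first group the cases into categories by that variable, then randomize the cases within each category to the treatment groups. This is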
Blocking
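Randomization and blocking can be combined, as in the following minimal sketch. It assumes a hypothetical study in which each patient is labeled low or high risk; patients are first grouped into blocks by risk and then randomly split between treatment and control within each block. The patient IDs, the risk labels, and the function name are illustrative only.

```python
import random
from collections import defaultdict

def blocked_randomization(cases, block_of, rng):
    """Randomly assign half of each block to treatment and the rest to control."""
    blocks = defaultdict(list)
    for case in cases:
        blocks[block_of[case]].append(case)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]      # copy so the caller's list is not modified
        rng.shuffle(shuffled)      # the randomization step
        half = len(shuffled) // 2
        for case in shuffled[:half]:
            assignment[case] = "treatment"
        for case in shuffled[half:]:
            assignment[case] = "control"
    return assignment

# Hypothetical patients and their blocking variable (risk level).
risk = {
    "p1": "low", "p2": "low", "p3": "low", "p4": "low",
    "p5": "high", "p6": "high", "p7": "high", "p8": "high",
}
assignment = blocked_randomization(list(risk), risk, random.Random(7))

for patient, group in sorted(assignment.items()):
    print(patient, risk[patient], group)
```

Because the split happens inside each block, both treatment and control end up with the same number of low-risk and high-risk patients.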
52
The gold standard in data collection is
Randomized experiments
53
When researchers keep the patients uninformed about their treatment, the study is said to be
Blind
54
Fake treatment
Placebo
55
If a fake treatment results in a slight but real improvement in patients, this is
Placebo effect
56
If doctors and researchers, like patients, are unaware of who is or is not receiving treatment, this is
Double blind
57
Provides a case by case view of data for two numerical variables
Scatterplot
58
Scatterplot helps spot
Associations
59
One-variable scatterplot
Dot plot
60
Common way to measure the center of a distribution of data
Mean
61
Sample mean
$\bar{x}$ (x with a bar above): the sum of the observed values divided by n, the number of cases (observational units)
62
What is the sample size in $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$?
n
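As a small worked example of the sample mean, the sketch below adds up the observations and divides by the sample size n. The data values and the function name are made up for illustration.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the observations divided by n."""
    n = len(values)  # sample size
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty sample")
    return sum(values) / n

observations = [2, 4, 4, 6, 9]  # hypothetical data, n = 5
x_bar = sample_mean(observations)
print(x_bar)  # (2 + 4 + 4 + 6 + 9) / 5 = 5.0
```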
63
The average of all observations in a population is known as ________; its subscript represents ________
The population mean, $\mu$ ; the variable the population mean refers to
64
Sample mean may provide a reasonable estimate of _____________. Although not perfect, this provides a _____________.
$\mu_x$, where $\mu$ is the average of ALL observations in the population and x is the variable ; rough estimate
65
Provides a view of the data density
Histogram
66
Useful when individual values are of interest
Dot plot
67
Useful for highlighting outliers, median and interquartile range
Box plot
68
What determines the direction of skew?
The direction of the long tail
69
Useful for highlighting spatial distribution
Intensity map
70
4 ways to evaluate the relationship between two variables
Direction | Shape | Strength | Outliers
71
3 shapes of a distribution with respect to skew
Left skewed | Symmetric | Right skewed
72
4 modalities of a distribution
Unimodal | Bimodal | Multimodal | Uniform
73
2 measures of variability
Variance | Standard deviation
74
Which one is easier to understand? Variance or standard deviation?
Standard deviation
75
Distance of an observation from the mean is
Deviation
76
What is the symbol for sample variance?
$s^2$ (s with a superscript 2)
77
Formula for sample variance?
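The sum of the squared deviations from the mean, divided by n - 1: $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$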
78
The square root of the variance is
Standard deviation
79
Standard deviation is the
Square root of the variance
80
Sum of the squared deviations from the mean / (n - 1) =
Sample Variance
81
What is variance?
Average squared distance from the mean
82
Square root of the variance
Standard deviation
83
The Greek letter used for the population standard deviation (squared, it denotes the population variance)
Sigma
84
What is the difference between sample variance and population variance?
Sample variance uses n-1 and population variance uses n
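The difference between the two divisors is easy to see numerically. The sketch below computes both variances and the sample standard deviation by hand; the data are hypothetical, and Python's standard `statistics` module is used only as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

data = [2, 4, 4, 6, 9]            # hypothetical data, mean = 5
s2 = sample_variance(data)        # 28 / 4
sigma2 = population_variance(data)  # 28 / 5
s = math.sqrt(s2)                 # standard deviation = square root of variance

print(s2, sigma2, s)
```

The squared deviations are 9, 1, 1, 1, and 16, which sum to 28, so the sample variance is 28 / 4 = 7 while the population formula gives 28 / 5 = 5.6.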
85
Summarizes a data set using five statistics while plotting unusual observations
Box plot
86
The first step in building a box plot is denoting the
Median
87
To find the median, arrange the observations from
Smallest to largest
88
The second step in building a box plot is
Drawing a rectangle to represent the middle 50% of the data
89
The total length of the box in a box plot is the
Interquartile range (IQR)
90
The two boundaries of the box are called
First quartile and third quartile
91
The more variable the data, the _____________ the standard deviation
Larger
92
25% of the data fall below this value
Q1
93
25% of the data fall above this value (the top of the box in a vertical box plot)
Q3
94
What is the formula for IQR?
IQR = Q3-Q1
95
In a box plot, the ____________ attempt to capture the data outside of the box
Whiskers
96
A whisker is never allowed to extend farther from the box than
1.5 x IQR (below Q1 or above Q3)
97
Observations beyond the whiskers, i.e., unusually distant observations, are called
Outliers
98
An observation that appears extreme relative to the rest of the data
Outlier
99
Why is it important to look for outliers?
Insight into interesting properties of the data | Identifying errors in data entry or collection (values to reexamine) | Identifying strong skew
100
Extreme observations have little effect on the
Median and IQR
101
Median and IQR are called ______________ estimates
Robust
102
Why are median and IQR robust estimates?
They are only sensitive to the numbers near Q1, the median and Q3.
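The box plot cards above can be tied together with a short sketch that computes the quartiles, the IQR, the whisker limits, and the outliers, and then checks robustness by making the largest value far more extreme. The data are hypothetical. Quartile conventions differ between textbooks and software; the `inclusive` method of `statistics.quantiles` is just one choice and is an assumption here.

```python
import statistics

def box_plot_summary(values):
    """Return the pieces needed to draw a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr   # whiskers may not reach below this
    upper_limit = q3 + 1.5 * iqr   # whiskers may not reach above this
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]        # hypothetical, one extreme value
extreme = [1, 2, 3, 4, 5, 6, 7, 8, 9, 500]    # same data, far more extreme

summary = box_plot_summary(data)
summary_extreme = box_plot_summary(extreme)

print(summary)
print("means:", statistics.mean(data), statistics.mean(extreme))
print("medians:", summary["median"], summary_extreme["median"])
print("IQRs:", summary["iqr"], summary_extreme["iqr"])
```

The mean moves from 9.5 to 54.5 when 50 becomes 500, while the median and IQR do not change at all, which is what "robust" means on the cards above.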
111
A table that summarizes data for two categorical variables is called a
Contingency table
112
Provides total counts across each row
Row totals
113
Provides total counts down each column
Column totals
114
A table for a single variable is
Frequency table
115
A frequency table replaced with percentages and proportions is called a
Relative frequency table
116
Common way to display a single categorical variable
Bar plot
117
Counts divided by their row totals
Row proportions
118
Count divided by column totals
Column proportion
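A contingency table with its row and column totals and proportions can be built from raw pairs of categorical values. The sketch below uses only the standard library; the group and outcome labels, the counts, and the function names are hypothetical.

```python
from collections import Counter

def contingency_table(pairs):
    """Count each (row, column) combination plus the row and column totals."""
    counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    return counts, row_totals, col_totals

def row_proportions(counts, row_totals):
    """Each count divided by its row total."""
    return {(r, c): n / row_totals[r] for (r, c), n in counts.items()}

def column_proportions(counts, col_totals):
    """Each count divided by its column total."""
    return {(r, c): n / col_totals[c] for (r, c), n in counts.items()}

# Hypothetical data: (group, outcome) for 10 cases.
pairs = (
    [("treatment", "improved")] * 3
    + [("treatment", "not improved")] * 1
    + [("control", "improved")] * 2
    + [("control", "not improved")] * 4
)

counts, row_totals, col_totals = contingency_table(pairs)
row_props = row_proportions(counts, row_totals)
col_props = column_proportions(counts, col_totals)

print(row_totals["treatment"], col_totals["improved"])  # 4 5
print(row_props[("treatment", "improved")])             # 3 / 4 = 0.75
print(col_props[("treatment", "improved")])             # 3 / 5 = 0.6
```

The same cell gives two different proportions: 75% of the treatment group improved (row proportion), while 60% of those who improved were in the treatment group (column proportion).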
119
When do you use a bar plot? A histogram?
Bar plot: categorical variable | Histogram: numerical variable
120
X axis on histogram is
Numerical
121
X axis on barplot
Category
122
Rescaling of the data using a function
Transformation
123
Useful when much of the data cluster near zero relative to the larger values in the data set
Natural log transformation
124
Why transform the variables in a scatterplot?
Make the relationship between variables more linear
125
Goals of transformation
See the data structure differently | Reduce skew to assist in modeling | Straighten a nonlinear relationship in a scatterplot
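The skew-reduction goal can be illustrated with a natural log transformation of a small right-skewed data set. The data are hypothetical, and the moment-based skewness coefficient used to compare before and after is an assumption of this sketch, not something the cards define.

```python
import math

def skewness(values):
    """Moment-based skewness: positive values indicate a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data: most values cluster near zero, one is very large.
raw = [1, 2, 3, 5, 8, 13, 400]

# The natural log is only defined for positive values.
logged = [math.log(x) for x in raw]

print("skewness before:", round(skewness(raw), 2))
print("skewness after: ", round(skewness(logged), 2))
```

After the transformation the large value is pulled much closer to the rest of the data, so the right tail is shorter and the skewness coefficient is smaller.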
126
To visualize two categorical variables
Segmented bar plot
127
Useful for visualizing conditional frequency distributions
Segmented bar plot
128
To explore relationships between variables in a segmented bar plot, we need to compare
Relative frequencies
129
Segmented bar plot that uses proportion is
Relative frequency segmented bar plot
130
It displays marginal distribution, by using the width of a bar
Mosaic plot
131
A mosaic plot is only used for
Categorical variables