Exam 2 Lecture 1 Flashcards

1
Q

Why are protocols necessary?

A

Once we know the question we want to ask, we have to devise a way to get the answer.

2
Q

Why is the structure of data important?

A

Once we have a plan to get the answer, we have to have a plan to manage the data.

3
Q

We got ourselves some data! Now, how do we know it’s not junk?

A

Quality checking

4
Q

Garbage In, Garbage Out

A

In computer science, garbage in, garbage out (GIGO) is the concept that flawed or nonsensical input data produce nonsense output.

5
Q

Data are never as easy as they seem… what are the 3 possible data problems that can introduce bias?

A
  • Messy data
  • Dirty data
  • Missing data

You can’t assume anything!
6
Q

What is bias?

A

Bias is when the data you have don’t actually represent the parameter you are studying, or the sample you have doesn’t actually represent the population you are interested in. Bias is not ‘by chance’; it is systematic error.

7
Q

Fixable errors that can be corrected without making ASSUMPTIONS about what the right answer is + examples

A

Messy data

Examples:
Human error
- Question was clear, checked wrong box
- “Eight” instead of “8”

Computer error
- ZIP code started with 0 (e.g., 08854), but the computer dropped the leading zero and recorded ‘8854’

Equipment error
- Noise in signal that can be removed because the source of the noise is known (i.e., interference from a nearby power line)
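
Fixes like these can often be scripted. A minimal Python sketch, assuming hypothetical field formats (the word-to-digit map and the 5-digit ZIP rule are illustrative assumptions, not from the lecture):

# Sketch of cleaning "messy" (fixable) data; the mappings and rules
# below are illustrative assumptions, not part of the lecture.
WORD_TO_DIGIT = {"eight": "8", "seven": "7", "six": "6"}  # extend as needed

def clean_numeric_answer(raw: str) -> str:
    """Map spelled-out numbers like 'Eight' back to digits."""
    cleaned = raw.strip().lower()
    return WORD_TO_DIGIT.get(cleaned, raw.strip())

def clean_zip_code(raw) -> str:
    """Restore a leading zero a spreadsheet stripped (e.g., 8854 -> '08854')."""
    return str(raw).strip().zfill(5)

print(clean_numeric_answer("Eight"))  # -> '8'
print(clean_zip_code(8854))           # -> '08854'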

8
Q

What are messy data?

A

Fixable errors that can be corrected without making assumptions about what the right answer is

9
Q

What are dirty data?

A

Unfixable errors. You cannot deduce what the answer should be. Data must be discarded.

10
Q

Dirty data and examples

A

These are unfixable errors. You cannot deduce what the answer should be. Data must be discarded.

Examples
Equipment problems
- Noise in signal that cannot be explained

Protocol failure
- Some people smoked cigarettes right before breath test

Question problems
- Poor wording that creates confusion
  “Is an allergic property a key component of your health?”
- Wording that biases the answer
  “How bad do you think marijuana is for you?”

Response problems
- Open-ended answers that are hard to categorize/quantify
  “Sometimes, once when I was at my mom’s but later more often”
- Incomplete response options, so people don’t know how to answer
  Real answer: weekly. Available options: every day, once a month, once a year.

11
Q

Even if you are perfect, if your study involves people (like YOU), there will be ____________

A

Problems

12
Q

Unfixable errors are

A

Dirty data

13
Q

Missing data

A

The absence of clean = dirt

  • Very few datasets are complete. There are always little issues.
  • Sometimes stuff is just missing
    A subject skipped a question; a researcher forgot to weigh a subject
  • Sometimes data need to be discarded
    Heart rate monitor failed for 1 person
  • If there are clear reasons and rules for discarding data, that’s OK, but you can’t just discard data because you don’t like it (see the sketch after this list)
    You want to discard data from 3 adult subjects whose recorded heights are 9 ft, 12 ft, and 25 in (you can Google the tallest and shortest adults)
    You cannot discard data from a young woman who says she can deadlift 350 lbs just because you don’t believe it
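
One way to keep the discarding rules honest is to write them down as code before looking at the results. A minimal sketch, assuming hypothetical plausibility bounds (the 4 ft to 8 ft range and the example values are invented for illustration):

# Rule-based discarding sketch: impossible adult heights become missing
# values with a documented reason; the bounds below are an assumption
# chosen before cleaning, not a lecture-given rule.
PLAUSIBLE_HEIGHT_IN = (48, 96)  # 4 ft to 8 ft for a study of typical adults

heights_in = [68, 108, 144, 25, 63]  # 108 in = 9 ft, 144 in = 12 ft

def apply_height_rule(value):
    """Keep plausible values; flag the rest as missing (None)."""
    lo, hi = PLAUSIBLE_HEIGHT_IN
    return value if lo <= value <= hi else None

print([apply_height_rule(h) for h in heights_in])  # [68, None, None, None, 63]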
14
Q

Having some missing data is usually _________

A

Acceptable, because statistics has a few ways to deal with it.
- You must check whether there is A PATTERN to what is missing (a rough check is sketched below)
- Is the missing stuff RANDOM? If not, then it can introduce BIAS into your results

College athletes may be less likely than non-athletes to report drug use
- Your parameter estimates (RESULTS) will not be accurate for college athletes

Non-exercisers may be more likely to overestimate their activity level
- Your parameter estimates (RESULTS) will not be accurate/representative for non-exercisers
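
A rough way to look for a pattern is to compare the rate of missing responses across groups; in this sketch the groups and responses are invented for illustration.

# Sketch: is the missing drug-use response random, or patterned by group?
# The data below are invented for illustration.
responses = [
    {"group": "athlete",     "drug_use": None},
    {"group": "athlete",     "drug_use": "no"},
    {"group": "athlete",     "drug_use": None},
    {"group": "non-athlete", "drug_use": "no"},
    {"group": "non-athlete", "drug_use": "yes"},
    {"group": "non-athlete", "drug_use": "no"},
]

def missing_rate(rows, group):
    """Fraction of rows in a group with no drug_use answer."""
    in_group = [r for r in rows if r["group"] == group]
    return sum(r["drug_use"] is None for r in in_group) / len(in_group)

for g in ("athlete", "non-athlete"):
    print(g, round(missing_rate(responses, g), 2))
# Very different rates across groups suggest the missingness is NOT random,
# so estimates for the high-missingness group may be biased.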

15
Q

“Hihg” instead of “High”

A

Dirty but fixable. No doubt what they meant: 100 out of 100 people would say “High”.

16
Q

133 when the highest number is 100

A

Dirty, NOT fixable! 100 is the highest number possible, so 133 is clearly wrong, but it is unclear what it was meant to be.

17
Q

Raw data and what must you do with it

A

Raw data are data taken directly from the source, before you make any corrections or changes, even if there are ‘errors’.
- Raw data must be checked and cleaned

18
Q

Transformed data

A

Sometimes response options didn’t quite work, but you can go back and bin or fix the information so it makes more sense/is more usable for statistical analyses (see the sketch below)
- Chartreuse, evergreen -> GREEN (details lost, structure gained)
- Green = 1, Blue = 2, Purple = 3 (turns qualitative/unstructured/descriptive data into quantitative data)
- Heart rate counted over 15 sec? But HR is usually beats per MINUTE, so we multiply by 4 (a constant) -> 15 * 4 = 60
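
A minimal sketch of those three transformations, assuming made-up color mappings and a 15-second beat count:

# Sketch of the three transformations above; the specific mappings are
# illustrative assumptions.

# 1) Bin detailed answers into coarser categories (details lost, structure gained)
COLOR_BIN = {"chartreuse": "GREEN", "evergreen": "GREEN", "navy": "BLUE"}

# 2) Code categories as numbers for analysis (qualitative -> quantitative)
COLOR_CODE = {"GREEN": 1, "BLUE": 2, "PURPLE": 3}

def bin_and_code(raw_color: str) -> int:
    """Bin a free-text color, then return its numeric code."""
    binned = COLOR_BIN.get(raw_color.strip().lower(), raw_color.strip().upper())
    return COLOR_CODE[binned]

# 3) Rescale by a constant: 15-second beat count -> beats per minute
def beats_per_minute(beats_in_15_sec: int) -> int:
    return beats_in_15_sec * 4

print(bin_and_code("Chartreuse"))  # -> 1
print(beats_per_minute(15))        # -> 60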

19
Q

All stats require

A

Interpretation and a little common sense

20
Q

Population

A

Everyone

21
Q

Parameter

A

What you want to know about a population

22
Q

Sample

A

Some of a population

23
Q

Statistic

A

What you can compute from a sample to estimate a parameter

24
Q

How good a statistic is depends on

A

how good your sample is

25
Q

Accuracy

A

How likely it is that your statistic (estimate) actually reflects your parameter

26
Q

Since you can’t measure everyone, you ________ a _________ by using a _________

A

Estimate a parameter by using a statistic

27
Q

What undermines accuracy?

A

Uncertainty and bias

28
Q

A data point =

A

An exact value + noise + error (we are always trying to minimize noise and error)
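
A tiny simulation of that decomposition (all numbers are invented) shows why the two pieces behave differently: noise averages out over many data points, while error does not.

import random

random.seed(1)

TRUE_VALUE = 60.0        # the exact value we wish we could observe (assumed)
SYSTEMATIC_ERROR = 2.5   # e.g., an instrument that reads consistently high (assumed)

def one_data_point():
    noise = random.gauss(0, 1.0)  # patternless, unpredictable irregularity
    return TRUE_VALUE + noise + SYSTEMATIC_ERROR

sample = [one_data_point() for _ in range(1000)]
print(sum(sample) / len(sample))  # hovers near 62.5, not 60: the error stays, the noise fades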

29
Q

Noise

A

Data irregularities that have no pattern. Chalk it up to the reality of life. Unavoidable and unpredictable. Always leaves just a little bit of uncertainty.

30
Q

Error

A

Data irregularities that are explainable (but not always avoidable or detectable). There was a problem in what you did or how you did it. Intentional or unintentional, this is a common way that results become biased.

31
Q

Noise and error=

A

Uncertainty and bias

32
Q

Systematic error

A

Your parameter estimate is off because of an ERROR IN YOUR PROTOCOL. Bias everywhere!
Ex: estimating average US fitness by surveying only people at gyms

33
Q

Measurement error

A

Your parameter estimate is off because an UNEXPECTED, UNRELATED factor made some data different from the rest. BIAS in some data! Leads to missing data unless you can measure the factor and transform the data during the data-cleaning steps
Ex: Today was windy, so running speeds were lower

34
Q

Sampling error

A

Your parameter estimate is off because of A PROBLEM WITH YOUR SAMPLE (it includes people who aren’t part of the population, or leaves out people who are). BIAS before you even begin!
Ex: Average daily step counts, including people with leg injuries
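
A quick simulation of this (every number here is invented) shows how a bad sample shifts the statistic away from the parameter before any analysis happens:

import random

random.seed(2)

# Invented populations for illustration only
healthy_steps = [random.gauss(8000, 1500) for _ in range(1000)]  # population of interest
injured_steps = [random.gauss(2000, 500) for _ in range(200)]    # not part of that population

parameter = sum(healthy_steps) / len(healthy_steps)  # "true" mean daily steps

# Flawed sample: 20% of it comes from people outside the population
bad_sample = random.sample(healthy_steps, 80) + random.sample(injured_steps, 20)
statistic = sum(bad_sample) / len(bad_sample)

print(round(parameter), round(statistic))  # the estimate sits well below the parameter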

35
Q

Statistics ___________ error and uncertainty. This helps determine how _________ an estimate is to be correct.

A

Statistics measures error and uncertainty. This helps determine how likely an estimate is to be correct.

36
Q

A data point =

A

An exact value + noise + error
