
statistics and data analysis Flashcards

(39 cards)

1
Q

what is a nominal variable

A

data in the form of labels or names
but without a natural order or ranking
e.g. eye colour
it is a type of categorical variable

2
Q

what is an ordinal variable
give an example

A

a named variable
which has a natural order e.g. ‘level of satisfaction’ or ‘degree of pain’ or ‘socioeconomic status’ or ‘workplace status’

3
Q

what is an interval variable
give an example

A

named variable
with a natural order
AND equal intervals between values
e.g. temperature, credit scores, SAT scores

but NOT a ‘true zero’ value (that’s a ‘ratio’ variable)
for temperature, you can have negative degrees Celsius, so zero is not a floor that the value cannot go beneath
credit scores cannot be negative or zero, but the scale starts at an arbitrary minimum rather than at a true zero

4
Q

what is a ratio variable
give an example

A

named variable
with a natural order
AND equal intervals between values
AND a ‘true zero’ value
e.g. height, weight, length

e.g. your height could technically be zero but NOT negative
equally with weight

5
Q

how to calculate standard error of the mean

A

SD / square root of sample size
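
A minimal sketch of the formula above, assuming the SD is the sample standard deviation (n - 1 denominator). The function name and the sample values are made up for illustration.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Return SD / sqrt(n), using the sample SD (n - 1 denominator)."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    sd = statistics.stdev(sample)  # assumption: sample SD, not population SD
    return sd / math.sqrt(len(sample))


# Illustrative data only
sample = [2, 4, 4, 4, 5, 5, 7, 9]
sem = standard_error_of_mean(sample)
print(round(sem, 4))
```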

6
Q

how to calculate BMI

A

kg / (m^2)
weight in kg / height in m / height in m
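
As a quick sketch of the calculation, assuming weight in kilograms and height in metres. The function name and the example person are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kg divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Illustrative values: 70 kg and 1.75 m
example_bmi = bmi(70, 1.75)
print(round(example_bmi, 1))
```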

7
Q

obesity class 3 is

A

40+

8
Q

obesity class 2 is

A

35 - 39.9

9
Q

obesity class 1 is

A

30 - 34.9

10
Q

overweight is BMI

A

25 - 29.9

11
Q

Normal BMI is

A

18.5 - 24.9

12
Q

underweight BMI is

A

<18.5
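
The BMI bands from the cards above could be put together roughly like this. It is a sketch only: the function name is hypothetical, and treating the bands as continuous thresholds (so 24.95 counts as normal rather than falling in a gap between 24.9 and 25) is an assumption.

```python
def bmi_category(bmi_value):
    """Map a BMI value to the bands listed on the cards."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25:
        return "normal"
    if bmi_value < 30:
        return "overweight"
    if bmi_value < 35:
        return "obesity class 1"
    if bmi_value < 40:
        return "obesity class 2"
    return "obesity class 3"


for value in (17.0, 22.9, 27.5, 32.0, 37.0, 41.0):
    print(value, bmi_category(value))
```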

13
Q

how to calculate odds ratio

A

First work out the odds in each group: divide the probability that the event will occur by the probability that it will not occur
In other words, odds are a ratio of successes (or wins) to losses (or failures)

Odds = p / (1 - p)
Odds ratio = odds in one group / odds in the other group
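
A small sketch separating the odds of a single event, p / (1 - p), from an odds ratio, which compares the odds in two groups. The function names and the probabilities are invented for illustration.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def odds_ratio(p_group_1, p_group_2):
    """Odds in group 1 divided by odds in group 2."""
    return odds(p_group_1) / odds(p_group_2)


# Illustrative probabilities: 50% of group 1 and 20% of group 2 have the event
single_odds = odds(0.4)
ratio = odds_ratio(0.5, 0.2)
print(round(single_odds, 2), round(ratio, 2))
```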

14
Q

how to calculate standard deviation

A

square root of the variance
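
Spelled out step by step, this might look like the sketch below. The card does not say whether the sample or population variance is meant; the sample variance (n - 1 denominator) is assumed here, and the data are made up.

```python
import math
import statistics


def standard_deviation(values):
    """Square root of the sample variance (n - 1 denominator, an assumption)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / (len(values) - 1)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = standard_deviation(data)
# Cross-check against the standard library's sample SD
print(round(sd, 4), round(statistics.stdev(data), 4))
```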

15
Q

what does a low p-value mean

A

there is strong evidence to reject the null hypothesis
i.e. results at least as extreme as yours would be unlikely if only chance were at work, which points towards a true effect

so a low p-value is usually exciting for your experiment :)

16
Q

what is a t-test

A

assesses whether the means of two groups are statistically different from each other

17
Q

if you are comparing 2 different diets (for example), which test would you use?
if you are comparing 3 diets, which test would you use?

A

2 - two sample t-test
3 - ANOVA

18
Q

What does a p value of 0.05 mean?

A

The p value is a probability: if your p value is 0.05, that means that 5% of the time you would see a test statistic at least as extreme as the one you found if the null hypothesis were true.

19
Q

what is a Forest plot?

A

the way you represent data in a meta-analysis

20
Q

Forest plot's vertical line is…
what value is the line at?

A

the line of no effect
for a ratio measure (RR or OR), the line is at 1.0
for a difference measure (e.g. absolute risk difference or standardised mean difference), the line is at 0

i.e. no difference between 2 interventions
or no association between an exposure and an outcome

21
Q

Forest plot horizontal line at each point is…

A

the confidence interval for that study in the meta-analysis
(a Wider confidence interval is Worse.
Or a Longer line is Less reliable data)

22
Q

on a Forest plot, a bigger box means…

A

a greater weight in the meta-analysis, usually because of a larger sample size
as these studies are more heavily weighted, the confidence interval tends to be narrower too.

23
Q

the diamond-shaped box at the bottom of the Forest plot is…
the lateral tips of the diamond represent…

A

the weighted average of all the studies

the lateral tips of the diamond are the confidence interval of the weighted average of all the studies

24
Q

If the diamond at the bottom of a Forest plot touches the vertical line, that means…

A

the pooled result of the meta-analysis is not statistically significant

25
Q

If the diamond at the bottom of a Forest plot is to the left/right of the no effect line, what does it mean for each side (left vs right)?

A

LEFT of the line - there are FEWER episodes of interest in the treatment group
RIGHT of the line - there are MORE episodes of interest in the treatment group

26
Q

when do you use a chi squared test?
What is the most commonly used chi2 test?

A

When testing a hypothesis regarding a relationship between 2 CATEGORICAL variables with no rankings
The most commonly used is Pearson's chi2 test.

27
Q

what is a t-test

A

To formally test whether or not there is a statistically significant difference between two POPULATION MEANS.
e.g. We want to know if diet A or diet B leads to greater weight loss

28
Q

what does a likelihood ratio actually mean?

A

It assesses the utility of a diagnostic test, and how likely it is that a patient has a condition.
LR+ is about positive test results
LR- is about negative test results
You can use LR to calculate post-test odds, meaning how suspicious you are that a person has a disease based on the result.

29
Q

post-test odds =

A

post-test odds = pre-test odds x LR

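
A minimal sketch of that calculation, including the conversions between probability and odds on either side of it. The function names, the pre-test probability of 0.4 and the LR+ of 6 are all illustrative assumptions.

```python
def probability_to_odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)


def odds_to_probability(o):
    """Convert odds back to a probability: o / (1 + o)."""
    return o / (1 + o)


def post_test_odds(pre_test_odds, likelihood_ratio):
    """post-test odds = pre-test odds x LR"""
    return pre_test_odds * likelihood_ratio


# Illustrative values only: pre-test probability 0.4, positive test with LR+ = 6
pre_odds = probability_to_odds(0.4)
post_odds = post_test_odds(pre_odds, 6)
post_probability = odds_to_probability(post_odds)
print(round(pre_odds, 2), round(post_odds, 2), round(post_probability, 2))
```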
30
Q

difference between odds and probability in an example where n=10, diseased = 4, and non-diseased = 6

A

p(disease) = FRACTION of the total = 4/10 = 40%
odds of disease = RATIO of something happening vs not happening = 4/6 ≈ 0.67 (i.e. 2:3)

31
Q

as a rule of thumb, according to a video I watched, what is a strong LR+

A

>5

32
Q

difference between paired and unpaired t-test

A

Paired t-test: Used to compare the means of two samples when each individual in one sample also appears in the other sample.
Unpaired t-test: Used to compare the means of two samples when each individual in one sample is independent of every individual in the other sample.

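
The difference shows up in how the test statistic is computed. The sketch below is illustrative only: the function names and data are invented, and the unpaired version uses Welch's form (no equal-variance assumption), which is one common choice rather than something the card specifies. Only the t statistic is computed, not the p-value.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """t statistic on the within-individual differences (same people measured twice)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [b - a for a, b in zip(first, second)]
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return statistics.mean(differences) / standard_error


def unpaired_t_statistic(group_a, group_b):
    """Welch's t statistic for two independent groups (assumed variant)."""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error


# Made-up measurements
before = [10, 12, 14, 16]
after = [11, 14, 15, 18]
t_paired = paired_t_statistic(before, after)
t_unpaired = unpaired_t_statistic(before, after)
print(round(t_paired, 3), round(t_unpaired, 3))
```

With the same numbers, the paired statistic is much larger because the variation between individuals cancels out in the differences.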
33
Q

define 'incidence'

A

The incidence can be thought of as the number of new cases occurring in a given time.
The incidence equals the number of newly affected individuals divided by the number of people at risk for the disease for a given duration

34
Q

how to calculate prevalence

A

Prevalence equals the total number of cases divided by the total number of people at risk
(differs from incidence in that incidence counts only NEWLY affected individuals over a certain period of time, e.g. a year)

35
Q

Suppose we have two choices of shirt to wear to a party; then the degrees of freedom are...
Now suppose we go to a second party and cannot repeat the shirt, so only one shirt is left; in this case the degrees of freedom are...

A

first party: dF = 1, as there are 2 choices, so there's 1 degree of freedom
second party: dF = 0, as we do not have any choice on which shirt to choose.
in a chi2 test, dF = (number of columns - 1) x (number of rows - 1)

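
For the chi2 formula, a quick sketch assuming the contingency table is given as a list of equal-length rows. The function name and the counts are hypothetical.

```python
def chi2_degrees_of_freedom(table):
    """dF = (number of columns - 1) x (number of rows - 1)"""
    rows = len(table)
    columns = len(table[0])
    if any(len(row) != columns for row in table):
        raise ValueError("all rows must have the same number of columns")
    return (columns - 1) * (rows - 1)


# Illustrative 2 x 3 table of counts (2 rows, 3 columns)
observed = [
    [12, 7, 9],
    [8, 13, 11],
]
df = chi2_degrees_of_freedom(observed)
print(df)  # 2
```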
36
Q

A study has been designed to investigate whether a certain drug plus physiotherapy treatment is better than drug treatment alone in the management of rheumatoid arthritis. After randomising the patients a small proportion of the drug plus physiotherapy group decide to drop out of the study or omit some treatment sessions specified in the research protocol. What is the correct way of analysing the subsequent data?
A. Assume the patients have withdrawn their consent
B. Exclude these patients from all analysis
C. Extend the trial recruitment to make up the numbers
D. Include these patient outcomes in the drug plus physiotherapy group
E. Interview the patients and report their group separately

A

D is correct - include them in the data
This is the principle of 'intention to treat'.
Intention to treat helps to reduce bias by sticking to the original allocation of treatment and analysing each patient in that treatment group even (and concentrate for this bit) if they don't get the treatment!
e.g. it is possible that the physiotherapy intervention was harmful to the patients and this is why they left.

37
Q

false negative calculation

A

Out of everyone who has the disease, how many tested negative: false negatives / (true positives + false negatives)
so it's almost like calculating sensitivity, but with false negatives instead of true positives on the top
that's hard to remember because it's talking about the TEST result, so it's tempting to use a denominator of everyone who tested negative (but no, that's for negative PREDICTIVE value)

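
To keep the two denominators apart, here is a short sketch contrasting the false negative rate (denominator: everyone with the disease) with the negative predictive value (denominator: everyone who tested negative). The function names and counts are made up.

```python
def false_negative_rate(true_positives, false_negatives):
    """FN / (TP + FN): of everyone WITH the disease, the fraction who tested negative."""
    return false_negatives / (true_positives + false_negatives)


def negative_predictive_value(true_negatives, false_negatives):
    """TN / (TN + FN): of everyone who TESTED negative, the fraction without the disease."""
    return true_negatives / (true_negatives + false_negatives)


# Illustrative counts: 100 people with the disease, 110 negative tests in total
fnr = false_negative_rate(true_positives=80, false_negatives=20)
npv = negative_predictive_value(true_negatives=90, false_negatives=20)
print(round(fnr, 2), round(npv, 2))
```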
38
Q

is a type 1 error a false positive or a false negative

A

false positive = type 1
false negative = type 2

39
Q

formula for calculating first quartile

A

Q1 = (n+1)/4 - th value
so if n = 11, then the data point for Q1 is the third number (when put in ascending order)
Q2 = (n+1)/2 - th value
Q3 = 3(n+1)/4 - th value
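
A sketch of the position formulas, assuming the positions come out as whole numbers (as with n = 11). When they do not, some interpolation rule is needed, which this example does not cover. The function name and data are hypothetical.

```python
def quartiles(data):
    """Return (Q1, Q2, Q3) using the k(n+1)/4-th value rule on sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    result = []
    for k in (1, 2, 3):
        position = k * (n + 1) / 4  # 1-based position in the sorted data
        if position != int(position):
            raise ValueError("position is not a whole number; interpolation needed")
        result.append(ordered[int(position) - 1])
    return tuple(result)


# 11 made-up values, so Q1 is the 3rd, Q2 the 6th and Q3 the 9th in ascending order
values = [7, 1, 3, 5, 9, 11, 13, 15, 17, 19, 21]
q1, q2, q3 = quartiles(values)
print(q1, q2, q3)  # 5 11 17
```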