Definitions Flashcards
Test statistic
Compares how good the model is (variance explained by model) with how bad it is (variance not explained by the model)
–> signal-to-noise ratio
–> used to calculate the p-value, e.g. from F
independent variable
the proposed cause, the predictor, manipulated variable
dependent variable
the proposed effect, the measured outcome variable
What's a categorical variable?
divided into distinct categories
*binary: two options
*nominal: 3+ options with no logical order
*ordinal: options with logical order
What's a continuous variable?
entities get a distinct score
*interval: equal intervals on the scale represent equal differences (no true zero, e.g. temperature in °C)
*ratio: ratios of scores on the scale must make sense (true zero, e.g. weight)
standard deviation
measures how spread out/dispersed the data are around the mean
–> standard deviation is the square root of the variance
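A minimal sketch of that relationship, assuming the sample version of the variance (dividing by n - 1). The function names and the example scores are made up for illustration.

```python
import math

def variance(scores):
    """Sample variance: average squared deviation from the mean (n - 1 in the denominator)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)

def standard_deviation(scores):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(variance(scores), standard_deviation(scores))
```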
statistical model
a representation of a real world process designed to predict outcomes and understand relationships between variables
linearity
Linearity means that two variables have a straight-line relationship, where changes in one variable correspond proportionally to changes in the other
additivity
Additivity assumes that the effects of predictors on the outcome combine linearly, with no interactions, so their total effect is the sum of individual effects.
central limit theorem
if the sample is big enough (rule of thumb: above 30 data points), the sampling distribution of the parameter estimates will be approximately normal, even if the data themselves are not
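A small simulation can make this concrete. The sketch below is only an illustration under assumed settings: it draws many samples of 30 scores from a clearly skewed (exponential) population and looks at the distribution of the sample means. The function name, seed and sample counts are arbitrary choices.

```python
import random
import statistics

def sample_means(n_per_sample, n_samples, seed=0):
    """Draw n_samples samples from a skewed population and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(n_per_sample)]
        means.append(statistics.mean(sample))
    return means

means = sample_means(n_per_sample=30, n_samples=2000)
# The population has mean 1 and standard deviation 1, so the sample means
# should centre near 1 with a spread near 1 / sqrt(30).
print(statistics.mean(means), statistics.stdev(means))
```

A histogram of `means` should look roughly bell-shaped even though the individual scores are strongly skewed.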
parameters
describe the relationship between outcome and predictor
–> “b”
p-value
The probability of getting a result at least as extreme as the observed one if the null hypothesis is true; p < 0.05 is conventionally treated as statistically significant.
r (correlation coefficient)
Measures the strength and direction of a linear relationship; ranges from -1 (negative) to +1 (positive).
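As a rough sketch, r can be computed by dividing the cross-product of the deviations by the square root of the product of the two sums of squares. The function name `pearson_r` and the data are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Cross-product of deviations (numerator of the covariance)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative relationship
```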
R² (coefficient of determination)
Indicates the proportion of variance explained by the model; higher values mean better prediction.
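One common way to write this is R² = 1 - SS_residual / SS_total. The sketch below assumes the model's predictions are already available; the function name and numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Proportion of the total variation in the outcome that the model accounts for."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)                    # total variation
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))    # unexplained variation
    return 1 - ss_resid / ss_total

observed = [1, 2, 3, 4, 5]              # hypothetical outcome scores
predicted = [1.2, 1.9, 3.1, 3.9, 4.9]   # hypothetical model predictions
print(r_squared(observed, predicted))
```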
Confidence interval (CI)
A range where the true parameter likely falls; if we repeated the study many times, 95% of the 95% CIs built this way would contain the true value.
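A minimal sketch of a CI for a mean, assuming the normal approximation (critical value of about 1.96 for 95%). For small samples a t-based critical value would normally be used instead; the function name and data are illustrative.

```python
import math
import statistics

def mean_ci(scores, level=0.95):
    """Approximate confidence interval for the mean: mean ± z * standard error."""
    mean = statistics.mean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    # Critical z value, about 1.96 for a 95% interval (normal approximation)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * standard_error, mean + z * standard_error

lower, upper = mean_ci([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical scores
print(lower, upper)
```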
Chi-square statistic
Tests independence or fit in categorical data; larger values show greater deviation from expectations.
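For a test of independence, the statistic sums (observed - expected)² / expected over all cells of the contingency table, where each expected count is row total × column total / grand total. A sketch with a made-up 2×2 table:

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

print(chi_square([[20, 10], [10, 20]]))  # hypothetical counts
```

The statistic is 0 when every observed count equals its expected count and grows as the table departs from independence.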
F-statistic
F tests the overall significance of the model (compares the variance explained by the model with the unexplained variance)
a high F means strong model fit
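The ratio can be sketched for a regression setting as F = MS_model / MS_residual, where each mean square is a sum of squares divided by its degrees of freedom. The code assumes SS_model = SS_total - SS_residual, which holds for least-squares fits; names and data are hypothetical.

```python
def f_statistic(observed, predicted, n_predictors):
    """F = explained variance per predictor / unexplained variance per residual df."""
    n = len(observed)
    mean = sum(observed) / n
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_model = ss_total - ss_resid  # assumption: holds for least-squares fits
    ms_model = ss_model / n_predictors            # "signal"
    ms_resid = ss_resid / (n - n_predictors - 1)  # "noise"
    return ms_model / ms_resid

observed = [1, 2, 3, 4, 5]              # hypothetical outcome scores
predicted = [1.2, 1.9, 3.1, 3.9, 4.9]   # hypothetical model predictions
print(f_statistic(observed, predicted, n_predictors=1))
```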
odds ratio
measure of effect size for categorical data
The odds ratio indicates how much higher the odds of an event are in one group compared with the reference group.
For chi-square:
OR = 1: no difference between the groups.
OR > 1: higher odds in the examined group.
OR < 1: lower odds in the examined group.
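A short sketch for a 2×2 table: compute the odds (events divided by non-events) in each group, then divide the examined group's odds by the reference group's odds. The function name and counts are invented for illustration.

```python
def odds_ratio(events_group, non_events_group, events_ref, non_events_ref):
    """Odds of the event in the examined group divided by the odds in the reference group."""
    odds_group = events_group / non_events_group
    odds_ref = events_ref / non_events_ref
    return odds_group / odds_ref

# Hypothetical counts: 20 events vs 10 non-events in the examined group,
# 10 events vs 20 non-events in the reference group.
print(odds_ratio(20, 10, 10, 20))  # odds of 2 vs 0.5, so OR = 4.0
```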
standardized residuals
show how far the model is from reality (if non-significant, i.e. values between ±1.96, the model fit is good)
normality
the sampling distribution of the parameter estimate should be normally distributed
-> symmetric and bell-shaped around its mean
z-score
Measures how many standard deviations a value is from the mean.
Make scores comparable
Detect outliers
Check for normal distribution
z = (value - mean) / standard deviation
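The formula can be applied to a whole set of scores like this. The sketch assumes the sample standard deviation; the function name and data are illustrative.

```python
import statistics

def z_scores(values):
    """Convert raw scores to z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(z_scores(scores))
```

After the conversion the scores have a mean of 0 and a standard deviation of 1, which is what makes variables measured on different scales comparable.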