Key Terms Flashcards

(56 cards)

1
Q

validity

A

extent to which a concept/measurement is well-founded and likely corresponds accurately to the real world

internal vs external validity

2
Q

internal validity

A

the obtained effect of x on y for your sample is the correct effect for the sample

-> generalization of causal findings to all cases WITHIN the sample

how to obtain:
-empirical model is correctly specified, estimators are unbiased
-> changes in the dependent variable can be attributed to the independent variable and to no other factors (the challenge is to rule those out)

3
Q

external validity

A

obtained effect of x on y in the sample is the correct effect of x on y in the population P

-> generalization of causal findings to other cases not included in the sample -> the overall population

how to obtain:
-enough cases
-sample represents the population in all relevant characteristics

4
Q

why is validity important

A

-theory and findings need to show a causal effect for the research to be relevant
-stakeholders need to know whether it also holds for other cases
-in practice: experiments usually have low external but high internal validity, or neither perfect internal nor perfect external validity

5
Q

validity vs reliability

A

reliability is the degree of precision with which a specific aspect is measured

6
Q

advantages of scientific observation

A

systematic approach of observing and generating information
-objectivity as opposed to a selective set of observations
-avoidance of “filling in” information
-verifiability

7
Q

population

A

all observational units to which the theory is assumed to apply

8
Q

sample

A

a subset of the theoretically-defined population for which data is assessed
for reasons of validity, we want this subset to be representative of the population

9
Q

descriptive statistics

A
10
Q

inferential statistics

A
11
Q

what is data

A

quantified information
information for one single case: data point

manifest variables: directly observable variables (e.g. body height)
latent variables: abstract concepts only observable through manifest indicators (e.g. democracy)

12
Q

data types by source

A

source:
observable world -> observational data
field or lab experiment -> experimental data
an algorithm -> simulated data

13
Q

data processing

A

-> to eliminate sources of error

processing includes:
-reduction of measurement error
-addressing of inter-coder reliability
-elimination of missing data points
-identification of outliers

14
Q

how to measure data

A

measurements require
-measurement scale
-measurement unit
-measurement instrument

also includes
-counts
-quantifications

15
Q

types of variables

A

can be described by three elements: instrument, measurement unit, scale

16
Q

variables by scale
categorical variables

A

how are observations arranged?

nominal variables
-numerical values are used as a label or type of attributes
-no intrinsic order between categories
-e.g. gender, party affiliation: SPÖ=1, ÖVP=2

ordinal variables:
-variables of two or more categories which can be ranked
-the values and the gaps between them are not interpretable
-e.g. smart (there is no "twice as smart")

17
Q

variables by scale
metric variables

A

interval variables
-variables have a zero value, but it is arbitrary (usually without a clear meaning)
-distance between attributes has the same meaning

ratio variables
-zero means that there is nothing of this variable left

18
Q

use the dataset thedata.dta

A

use thedata.dta

19
Q

delete all variables and data

A

clear

20
Q

summarize a dataset

A

describe, short
describe, simple

summarize
sum, detail

tabulate

list

codebook
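
A minimal sketch of these commands run in sequence. It assumes Stata's bundled example dataset auto (loaded with sysuse) in place of thedata.dta; the variable names make, price, mpg and rep78 belong to that example dataset, not to the course data.

```stata
* Assumption: the shipped example dataset stands in for thedata.dta
sysuse auto, clear

* number of observations and variables only
describe, short
* variable names only
describe, simple
* observations, mean, standard deviation, min, max
summarize price mpg
* adds percentiles, skewness and kurtosis
summarize price, detail
* frequency table of one variable
tabulate rep78
* first five rows of two variables
list make price in 1/5
* type, range and missing values of one variable
codebook rep78
```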

21
Q

import an Excel dataset

A

import excel "…"
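
A slightly fuller sketch of an import call. The file name survey.xlsx and the sheet name Sheet1 are hypothetical placeholders; sheet(), firstrow and clear are options of import excel.

```stata
* Hypothetical file and sheet names, for illustration only
* firstrow: treat the first row of the sheet as variable names
* clear: replace any dataset currently in memory
import excel "survey.xlsx", sheet("Sheet1") firstrow clear
```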

22
Q

remove var1 and var2

A

drop var1 var2

23
Q

remove all variables except var3 and var4

A

keep var3 var4

24
Q

distribution table of a variable

A

tabulate …

missing values are not shown unless the missing option is added:
tabulate …, missing
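
A short sketch of the difference, assuming Stata's bundled auto dataset, in which the variable rep78 contains some missing values.

```stata
* Assumption: example dataset shipped with Stata
sysuse auto, clear

* missing values of rep78 are left out of the table
tabulate rep78
* missing values get their own row
tabulate rep78, missing
```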

25
create a variable
generate
26
give variable another name
rename
27
change the value of a variable to another value
replace
28
add a description to a variable
label variable
29
add a label to a variable value
2 steps needed: label define, then label values
30
change order of variables in dataset
order
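
The commands from cards 25 to 30 can be combined as a sketch. It assumes Stata's bundled auto dataset; the variable names highprice and expensive, the label name yesno and the cut-off of 6000 are made up for illustration.

```stata
* Assumption: example dataset shipped with Stata
sysuse auto, clear

* create a variable, starting at 0 for every car
generate byte highprice = 0
* change the value to 1 for cars above the (arbitrary) cut-off
replace highprice = 1 if price > 6000 & !missing(price)
* give the variable another name
rename highprice expensive
* add a description to the variable
label variable expensive "Price above 6000"
* step 1: define a value label, step 2: attach it to the variable
label define yesno 0 "no" 1 "yes"
label values expensive yesno
* move the variable to the front of the dataset
order expensive, first
```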
31
measuring unobservables
conceptualization and operationalization are needed
-> theoretical definitions
-> clarify how the concept is measured by specifying indicators and how information is gathered
-> systematized
good operationalization is linked to your theory
e.g. concept: study success
-> attributes: academic achievement // acquired abilities
-> components: received prizes, amount of prize money // ability to solve problems etc.
32
issues of conceptualization
problem of conflation:
-sub-components should be conceptually in line with the attributes at the corresponding upper level
-> sub-components should not relate to conceptually different attributes
problem of redundancy:
-components at the same level should be mutually exclusive
33
minimalist definition of attributes
+ availability of data may be enhanced
+ no redundancy with other attributes
-every case is an instance, no variation
-measure might not reflect the concept well (invalidity)
-measure may only be applicable to one situation
34
maximalist definition of attributes
= including too many (irrelevant) attributes
potential drawbacks of overburdening:
-lower usefulness, as the concept has no empirical referents
-tautological and of little analytical use if the main dependent variable is already included as an attribute
35
Median
50% -> middle value: the value located directly in the center of the collected data
how to find it: sum var1, detail -> value at 50%
36
not normally distributed data
sum var1, detail
skewness: a positive value indicates that a variable is skewed to the right (outliers)
-> if highly skewed to the right, the median might be more representative than the mean, because the mean is affected by outliers
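
A sketch of how mean, median and skewness can be read off after summarize, detail. It assumes Stata's bundled auto dataset; price there is skewed to the right, so its mean lies above its median.

```stata
* Assumption: example dataset shipped with Stata
sysuse auto, clear

* detail stores percentiles and skewness in r()
summarize price, detail
display "mean     = " r(mean)
display "median   = " r(p50)
display "skewness = " r(skewness)
```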
37
interpreting a boxplot
well suited for ordinal and metric data
-lower whisker: from the minimum value of the sample to the lower quartile (one quarter of the sample lies here)
-box: from the lower to the upper quartile, with the median marked inside
-upper whisker: from the upper quartile to the maximum value
-whiskers exclude potential outliers
38
mode
modal value = most common value of a variable
39
mean
arithmetic mean = average value of a variable
40
bivariate descriptive statistics
shows the relationship between two variables
options:
-cross tables
-comparison of key measures, e.g. mean
-graphical comparison
-correlational measures
41
correlation vs causality
correlation: var A and var B are correlated if higher/lower values of var A coincide with higher/lower values of var B
-> you don't know whether var A influences var B or vice versa
positive correlation: if values of var A are higher, values of var B are higher as well
negative correlation: if values of var A are higher, values of var B are lower
causality: direct relationship between var A and var B
-> a change in var A leads to a change in var B
-> more difficult to determine, needs a suitable research design
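
A sketch of a correlational measure in Stata, assuming the bundled auto dataset. In that dataset heavier cars tend to have lower mileage, so the coefficient for mpg and weight is negative; the output alone says nothing about causality.

```stata
* Assumption: example dataset shipped with Stata
sysuse auto, clear

* Pearson correlation coefficient between two variables
correlate mpg weight
* pairwise correlation with its significance level
pwcorr mpg weight, sig
```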
42
how can data be visualized
amounts: e.g. bar charts, dots, grouped bars
distributions: e.g. histograms, boxplots
proportions: e.g. pie charts, bars
x-y relationships: e.g. scatterplots
uncertainty: e.g. error bars
geospatial data: e.g. maps
43
sort bars in Stata in descending order
graph hbar, over(var1, sort(1) descending)
44
adjust bin width of histogram
hist var1, width(5) vs hist var1, width(10) -> with the larger width, more values fall into one bar of the histogram
45
increase x and y title size
xtitle(, size(large)) ytitle(, size(large))
46
commands for tables
tabulate or fre
47
color schemes
...., scheme(schemename)
assign individual colors: bar(1, color("black"))
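
The graph options from the cards above can be combined as a sketch. It assumes Stata's bundled auto dataset; the choice of variables, the axis titles and the scheme s1mono are only examples.

```stata
* Assumption: example dataset shipped with Stata
sysuse auto, clear

* mean price per repair record as horizontal bars, largest first,
* bars drawn in black, monochrome scheme
graph hbar (mean) price, over(rep78, sort(1) descending) bar(1, color(black)) scheme(s1mono)

* histogram with bin width 5 and large axis titles
histogram mpg, width(5) xtitle("Miles per gallon", size(large)) ytitle("Density", size(large))
```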
48
different types of inferences
descriptive inference:
-historical accuracy of scientific information
-simply observing sample data
statistical inference:
-use sample properties to infer properties of a population
-understand development over time or the relationship between variables
-focus on understanding how uncertain findings are -> t-tests
causal inference:
-infer the existence of a causal effect from data analysis
49
hypothesis testing in stata
goal: infer from the sample to the population
problem: the population is usually unknown and only one sample (not infinitely many) is available
need: an estimate of the uncertainty resulting from the use of random sampling
solution: use the mean/standard deviation or proportion in the sample as an estimate
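
A sketch of how this looks with t-tests in Stata, assuming the bundled auto dataset. The hypothesized value of 20 is arbitrary and only serves as an example.

```stata
* Assumption: example dataset shipped with Stata
sysuse auto, clear

* sample mean with standard error and confidence interval
mean mpg
* one-sample t-test: is the population mean of mpg equal to 20?
ttest mpg == 20
* two-sample t-test: do domestic and foreign cars differ in mean mpg?
ttest mpg, by(foreign)
```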
50
stratum
a subset of elements from the population that share a characteristic (usually sociodemographic, e.g. age, gender)
51
sampling frame
a list of elements in a population that can be identified
52
convenience sample
-use of information from participants who are convenient to access
-sampling method does not select participants based on any set of criteria
-only use this method if representativeness is not important for the research
53
quota sample
-primarily used when information is to be collected on a specific, definable target population
-if it works well, a quota sample provides a structurally identical representation of the population
-volunteers could still bias the picture
54
stratified sample
stratified sampling involves random selection within predefined groups (e.g. gender, age)
-> people within a stratum are randomly selected
-strata are supposed to ensure that the make-up of the population is adequately mirrored
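
A sketch of random selection within strata using Stata's sample command, assuming the bundled auto dataset with foreign as the stratum variable. The seed and the 50% share are arbitrary choices.

```stata
* Assumption: example dataset shipped with Stata
sysuse auto, clear

* fix the seed so the random draw can be reproduced
set seed 12345
* keep a 50% random sample within each stratum
sample 50, by(foreign)
* check how many cases remain per stratum
tabulate foreign
```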
55
simple random sample
-selection process takes place randomly
-each participant has an equal chance of being selected
56
survey weights
-used when the sample deviates from the actual population
-survey weights are estimated variables -> even out the differences between sample and population
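
A sketch of applying a weight in Stata, assuming the bundled auto dataset. The weight variable wgt is invented for illustration: it pretends that foreign cars are under-represented in the sample by a factor of 2.

```stata
* Assumption: example dataset shipped with Stata
sysuse auto, clear

* hypothetical weight: foreign cars count double
generate wgt = cond(foreign == 1, 2, 1)
* declare the weight as a sampling (probability) weight
svyset [pweight = wgt]
* weighted mean
svy: mean mpg
* unweighted mean for comparison
mean mpg
```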