Stata Flashcards

(31 cards)

1
Q

How can you get detailed information about a variable?

A

Type: codebook [variable]

2
Q

In codebook output, which word labels the increment between values (often 1)?

A

units (range, shown on the same line, is the [minimum, maximum] pair)

3
Q

what should we look at after a codebook?

A

- type: whether the variable is categorical or numeric
- unique values: number of unique, non-missing values
- missing .: number and percent of missing values (shown as .)

4
Q

How do you make a box plot, a bar chart, a line chart, a histogram, a density plot, and a frequency table?

A

Write:
1. graph box [variable] (to restrict to a condition, add: if …)
-> for interval variables
2. graph bar, over([variable])
-> for nominal, ordinal, and interval variables
3. line [Y] [X]
-> line chart: ONLY for numeric, continuous variables
4. histogram [variable]
-> for interval variables
5. kdensity [variable]
-> for interval variables
6. tab (or tabulate) [variable]
-> for categorical variables
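
The six commands above can be tried in one short session. This is only a sketch: it assumes Stata's bundled auto and uslifeexp example datasets, which are not part of these cards, and the variable choices (price, rep78, le, year) are illustrative.

```stata
* Sketch on Stata's bundled example data (an assumption, not the course data)
sysuse auto, clear

graph box price                   // box plot of an interval variable
graph box price if foreign == 1   // same plot, restricted with an if condition
graph bar, over(rep78)            // bar chart by category (recent Stata versions plot percentages)
histogram price                   // histogram of an interval variable
kdensity price                    // kernel density plot
tabulate rep78                    // frequency table of a categorical variable

* A line chart needs a numeric y plotted against an ordered numeric x
sysuse uslifeexp, clear
line le year                      // life expectancy by year
```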

5
Q

how to compare distribution of 2 variables

A

Cross-tabulation: tab [DV] [IV], col
-> the col option adds column percentages

NOT a one-way tabulation, because that is a frequency table for a single variable.
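
As a quick illustration of the column option, here is a sketch on the bundled auto dataset (an assumption; rep78 stands in for the DV and foreign for the IV). With column, the percentages add up to 100 within each category of the IV, which is what makes the columns comparable.

```stata
sysuse auto, clear
tabulate rep78 foreign, column   // rows: rep78 (DV), columns: foreign (IV)
```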

6
Q

difference between sum and codebook

A

* sum V1: gives Obs, Mean, Std. dev., Min, and Max
-> for continuous or numeric variables
* codebook V1: more detailed description
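
Running both commands on the same variable shows the difference side by side. A minimal sketch, assuming the bundled auto dataset and its price and foreign variables as stand-ins:

```stata
sysuse auto, clear
summarize price     // Obs, Mean, Std. dev., Min, Max
codebook price      // adds type, range, units, unique values, missing count, percentiles
codebook foreign    // for a labeled categorical variable: each value with its label
```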

7
Q

how to see skewness on a graph

A

RELATIVE to the MEAN

Symmetrical (skew = 0) → balanced on both sides

Positive skew (right-skewed) → tail is longer on the right (more spread values)

Negative skew (left-skewed) → tail is longer on the left
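
To check the visual impression against a number, one option is summarize with the detail option, which reports skewness and stores it in r(skewness). The sketch below assumes the bundled auto dataset; price is only an example variable.

```stata
sysuse auto, clear
histogram price, normal     // histogram with a normal curve overlaid for reference
summarize price, detail     // detail adds skewness, kurtosis, and percentiles
display "skewness = " r(skewness)
display "mean = " r(mean) ", median = " r(p50)
```

A positive value with the mean above the median points to a longer right tail; a negative value with the mean below the median points to a longer left tail.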

8
Q

how to make a cross-tabulation

A

Write: tab [DV] [IV]
-> DV in the rows, IV in the columns
-> only when Y (the DV) is categorical

9
Q

how to make a comparison table

A

write: tabulate [IV], summarize([DV])
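
A sketch of such a comparison table, assuming the bundled auto dataset with foreign as the IV and mpg as the DV (both stand-ins):

```stata
sysuse auto, clear
tabulate foreign, summarize(mpg)   // mean, std. dev., and frequency of mpg per group
```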

10
Q

To visualize cross-tabulation relationships, we often use bar charts. The general syntax is:

A

graph bar [DV], over([IV])

11
Q

to visualize mean comparisons, we often use box plots. the general syntax is:

A

graph box [DV], over([IV])
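
Both over() graphs can be compared on the same pair of variables. This sketch assumes the bundled auto dataset; mpg and foreign are placeholders for the DV and IV.

```stata
sysuse auto, clear
graph box mpg, over(foreign)          // distribution of mpg within each group
graph bar (mean) mpg, over(foreign)   // bar height = group mean of mpg
```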

12
Q

How do you organize and inspect the data after running codebook?

A

* sort:
- organizes your dataset by the values of one or more variables
- sort [variable]
- ex: sort age orders from youngest to oldest; sort gender age sorts first by gender, then by age within each gender

* list:
- displays selected variables and observations in the Results window
- list [variable1] [variable2]
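
A short sketch of sort followed by list, assuming the bundled auto dataset (foreign and mpg play the roles of gender and age in the example above):

```stata
sysuse auto, clear
sort foreign mpg                // by origin first, then by mpg within origin
list make foreign mpg in 1/5    // first five observations after sorting
```

The in 1/5 qualifier is optional; it only keeps the listing short.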

13
Q

what are X, Y and Z

A

X: IV(s)
Y: DV
Z: CV(s), control variable(s)

14
Q

How do you recode a variable with several categories into a yes/no variable?
How do you recode missing values as true missing values (.)?

A

yes = 1 and no = 0
recode [variable] (1/3=0 "not a large threat") (4=1 "large threat") (miss=.), generate([variable_new])
… (miss=.)
Don't forget the comma before generate(), and give generate() a new name so that you don't modify the original variable.
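
The same pattern on a variable that ships with Stata, as a sketch: the auto dataset, the rep78 variable, the labels, and the new name rep_good are all assumptions made for illustration.

```stata
sysuse auto, clear
* 1-3 become 0, 4-5 become 1, every missing code becomes plain "."
recode rep78 (1/3 = 0 "average or worse") (4/5 = 1 "good") (missing = .), generate(rep_good)
tabulate rep78 rep_good, missing   // check the mapping against the original variable
```

Cross-tabulating the old and new variables is a cheap way to confirm that every category landed where intended.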

15
Q

How do you get the confidence interval (CI)?

A

proportion [variable]
Note: read the two bounds (lower and upper) along the same row.
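
For instance, on the bundled auto dataset (an assumption; foreign is just a convenient categorical variable):

```stata
sysuse auto, clear
proportion foreign              // each row: proportion, std. err., lower and upper 95% bound
proportion foreign, level(90)   // same estimates with a 90% interval
```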

16
Q

What do nofreq and nokey do?

A

They make the cross-tab easier to read:
- nofreq removes the frequencies from the table (only categories and percentages remain)
- nokey removes the key box above the table

17
Q

how to make a mean comparison?

A

tab independent_variable, sum(dependent_variable)

18
Q

How do you investigate the relation between X and Y over (i.e. depending on) another variable (i.e. another X)?

A

bysort educ_high: tab threat_from_china partyid3, column nofreq nokey
* bysort educ_high: sorts the data by educ_high (divides people into, for example, low and high education)
* tab threat_from_china partyid3: creates a two-way table (cross-tabulation) of threat_from_china (rows) by partyid3 (columns)

-> creates multiple cross-tabulations of threat_from_china by partyid3, separately for each value of the variable educ_high
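
The same structure can be sketched on the bundled auto dataset. Everything here is a stand-in: heavy plays the role of educ_high, rep_good the DV, and foreign the IV; the 3000-pound cutoff is arbitrary.

```stata
sysuse auto, clear
generate byte heavy = weight > 3000 if !missing(weight)   // hypothetical grouping variable
recode rep78 (1/3 = 0) (4/5 = 1), generate(rep_good)      // hypothetical binary DV
bysort heavy: tabulate rep_good foreign, column nofreq nokey
```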

19
Q

what graph do we use for a cross-tab

A

a bar chart: graph bar [DV], over([IV])

20
Q

The chi-square test is only for xxx variables; what do we do if we don't have this kind of variable?

A

categorical
-> if we have a numerical variable, we split it into 2 categories around the mean (ex: importance of immigration: 3% no, 27% a bit, 30% yes, 30% high), so we need to find the mean to create the 2 categories: Yes or No

21
Q

how to find the mean of a variable (useful for chi-square a numerical variable)

A

sum [variable] or mean [variable]

22
Q

how to create a binary variable based on the mean?

A

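ex (with mean = 11.8, source variable = general_issue, new variable = immigration_salient):
gen immigration_salient = 1 if general_issue > 11.8 & !missing(general_issue)
replace immigration_salient = 0 if general_issue <= 11.8
-> the !missing() check matters because Stata treats missing values as larger than any number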
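
One way to avoid typing a rounded mean by hand is to reuse the value that summarize stores in r(mean). This is a sketch on the bundled auto dataset; mpg and the new name high_mpg are assumptions.

```stata
sysuse auto, clear
quietly summarize mpg
local cutoff = r(mean)                                    // mean stored by summarize
generate byte high_mpg = mpg > `cutoff' if !missing(mpg)  // 1 above the mean, 0 otherwise
tabulate high_mpg
```

The if !missing() guard leaves observations with a missing source value as missing in the new variable.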

23
Q

how to add labels to the values that we just created (by creating a binary variable)

A

ex (with salient_sbl = name of the value label, immigration_salient = the binary variable):
label define salient_sbl 0 "Not important" 1 "Important"
label values immigration_salient salient_sbl
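
Labeling takes two steps: label define creates the mapping, and label values attaches it to a variable. A sketch with hypothetical names (heavy, heavy_lbl) on the bundled auto dataset:

```stata
sysuse auto, clear
generate byte heavy = weight > 3000 if !missing(weight)   // hypothetical binary variable
label define heavy_lbl 0 "Light" 1 "Heavy"                // define the mapping
label values heavy heavy_lbl                              // attach it to the variable
tabulate heavy                                            // the table now shows the labels
```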

24
Q

how to run the chi-square test

A

1. look at the distribution of frequencies, from which the expected frequencies are calculated
-> tab [variable1] [variable2], chi2
2. compare observed and expected frequencies to get the chi-square, plus some additional information (see the "key" box)
-> tab [variable1] [variable2], col expected cchi2 chi2
3. read Pr = p-value; if it is < 0.05, the relationship is statistically significant
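
The steps above can be sketched on the bundled auto dataset. The variables and the recoded rep_good are illustrative assumptions; after tabulate with chi2, the statistic and its p-value are also available in r(chi2) and r(p).

```stata
sysuse auto, clear
recode rep78 (1/3 = 0 "average or worse") (4/5 = 1 "good"), generate(rep_good)
* observed and expected counts, each cell's chi-square contribution, and the test
tabulate rep_good foreign, column expected cchi2 chi2
display "chi2 = " r(chi2) ", p = " r(p)
```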
25
Q

how to do the chi-square test if we have a numerical variable

A

1. mean general_issue
2. gen immigration_salient = 1 if general_issue > 11.8 & !missing(general_issue)
replace immigration_salient = 0 if general_issue <= 11.8
3. tab right_ex immigration_salient, chi2
26
Q

how to run an ANOVA test

A

1. ex: we want to know whether attitudes toward immigration differ significantly across countries depending on their level/quality of democracy
-> oneway general_issue polity2
2. check that the conditions for ANOVA hold (make sure your data are collected independently: for example, different individuals in each group, no repeated measures)
-> oneway general_issue polity2, tab
3. run the ANOVA test
-> anova general_issue polity2
4. look at R-squared and F:
R² = 0 → the grouping variable explains none of the variability in the outcome
R² = 1 → the grouping variable explains all the variability
higher R² → better explanatory power of the independent variable(s)
5. draw a box plot
-> graph box general_issue, over(polity2)
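
A sketch of the same sequence on the bundled auto dataset, with mpg as the outcome and rep78 as the grouping variable (both assumptions). After anova, F and R-squared are stored in e(F) and e(r2).

```stata
sysuse auto, clear
oneway mpg rep78, tabulate    // group means and std. devs., the F test, Bartlett's test
anova mpg rep78               // same one-way model; also reports R-squared
display "F = " e(F) ", R-squared = " e(r2)
graph box mpg, over(rep78)    // box plot of the outcome within each group
```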
27
Q

correlation test

A

* pwcorr [DV] [IV]
* to get the Pearson correlation coefficient: correlate [var1] [var2]
* to get the significance (p-value): pwcorr [var1] [var2], sig
-> the coefficients are the values of the form 0.… or -0.…; with sig, the p-value is printed beneath each coefficient
* scatter plot: scatter [DV] [IV]
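
A sketch of the three commands on the bundled auto dataset (mpg and weight are stand-ins for the DV and IV):

```stata
sysuse auto, clear
correlate mpg weight      // Pearson correlation coefficient
pwcorr mpg weight, sig    // coefficient with its p-value underneath
scatter mpg weight        // scatter plot, first variable on the y axis
```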
28
Q

bivariate regression and scatter plot with regression line

A

* reg [Y] [X]
* scatter plot with regression line:
graph twoway (lfitci general_issue GTD_total) (scatter general_issue GTD_total)
-> lfitci adds the linear fit + confidence interval (95%)
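
The regression and the plot can be run back to back. This sketch assumes the bundled auto dataset, with mpg as Y and weight as X; after regress, the slope is available as _b[weight] and R-squared as e(r2).

```stata
sysuse auto, clear
regress mpg weight
display "b = " _b[weight] ", R-squared = " e(r2)
twoway (lfitci mpg weight) (scatter mpg weight)   // fit line with 95% CI, points drawn on top
```

Listing lfitci first draws the confidence band underneath the points rather than over them.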
29
Q

regression

A

reg [Y: DV] [X: IV]
* coefficient (b): for every one-unit increase in X, Y changes by b units -> first column of the table, labeled Coef., just before Std. err.
* p-value -> Prob > F in the header (overall model test); each coefficient's own p-value is in the P>|t| column
* R² -> R-squared in the header
30
Q

multivariate OLS

A

reg [DV] [X1] [X2] ...
31
Q

logit model

A

* rescale Y into a dummy (0/1)
* logit [Y] [X1] [X2] [X3] ...
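
A sketch of both steps on the bundled auto dataset. The dummy high_mpg and its cutoff of 20 are arbitrary assumptions chosen only to produce a 0/1 outcome.

```stata
sysuse auto, clear
generate byte high_mpg = mpg > 20 if !missing(mpg)   // hypothetical dummy; cutoff is arbitrary
logit high_mpg weight foreign                        // coefficients in log-odds
logit high_mpg weight foreign, or                    // same model, reported as odds ratios
```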