Stata Flashcards by Joséphine De Ch

How can you get detailed information about a variable?

By typing: codebook [variable]

In the codebook output, which word gives the increment between values (often 1)?

units
-> range gives the minimum and maximum values, not the increment

What should we look at in the codebook output?

-type: what kind of variable it is (categorical or numeric)
-unique values: Number of unique, non-missing values
-missing .: Number and percent of missing values (noted as .)

How do you make a box plot, a bar chart, a line chart, a histogram, a density plot and a frequency table?

Type:
1. graph box [variable] (to restrict to a condition, add: if …)
-> for interval variables
2. graph bar, over([variable])
-> for nominal, ordinal and interval variables
3. line [yvar] [xvar]
-> line chart: ONLY for numeric, continuous variables
4. histogram [variable]
-> for interval variables
5. kdensity [variable]
-> for interval variables
6. tab (or tabulate) [variable]
-> for categorical variables
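
A minimal sketch of these commands on real data. It assumes Stata's built-in auto dataset as a stand-in (the variables mpg, weight, rep78 and foreign come from that dataset, not from these cards):

```stata
* Assumption: the shipped auto dataset stands in for your own data
sysuse auto, clear

* 1. box plot of an interval variable, restricted with an if condition
graph box mpg if foreign == 0

* 2. bar chart: share of observations in each category
graph bar, over(rep78)

* 3. line chart of one continuous variable against another
line mpg weight, sort

* 4. histogram and 5. kernel density of an interval variable
histogram mpg
kdensity mpg

* 6. frequency table of a categorical variable
tabulate rep78
```

Each graph command replaces the previous graph window, so run them one at a time when exploring.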

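How do you compare the distributions of 2 variables?

cross-tabulation: tab DV IV, col
-> add the col option if you want column percentages

NOT a simple tabulation, because tab with a single variable gives a frequency table of that variable alone; a cross-tabulation combines 2 categorical variables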
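
As an illustration, here is the difference between the two tables. This is a sketch on the built-in auto dataset, where rep78 plays the DV and foreign the IV (both are stand-ins, not variables from the cards):

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* one-way tabulation: frequency table of a single variable
tabulate rep78

* cross-tabulation: DV in the rows, IV in the columns, with column percentages
tabulate rep78 foreign, column
```

With column percentages, compare the rows across columns: each column sums to 100%.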

What is the difference between sum and codebook?

*sum V1: gives Obs, Mean, Std. dev., Min and Max
-> for continuous or numeric variables
*codebook V1: a more detailed description

How do you see skewness on a graph?

RELATIVE to the MEAN:

Symmetrical (skew = 0) → balanced on both sides

Positive skew (right-skewed) → tail is longer on the right (more spread values)

Negative skew (left-skewed) → tail is longer on the left

How do you make a cross-tabulation?

Type: tab [DV] [IV]
-> DV in the rows and IV in the columns
-> only when Y (the DV) is categorical

How do you make a comparison table?

Type: tabulate [IV], summarize([DV])
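
A short sketch of a comparison table, assuming the built-in auto dataset with foreign as the IV and mpg as the DV (stand-in variables):

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* mean, standard deviation and frequency of mpg within each category of foreign
tabulate foreign, summarize(mpg)
```

The IV should be categorical and the DV numeric, since the table reports the mean of the DV in each group.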

To visualize cross-tabulation relationships, we often use bar charts. The general syntax is:

graph bar [DV], over([IV])

To visualize mean comparisons, we often use box plots. The general syntax is:

graph box [DV], over([IV])

How do you organize and display the data after running codebook?

*sort:
-Organizes your dataset by the values of one or more variables
-sort [variable]
-ex: sort age orders the observations from youngest to oldest; sort gender age sorts first by gender, then by age within each gender

*list:
-Displays selected variables and observations in the Results window.
-list [variable1] [variable2]
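
For instance, sort and list can be combined as below. This sketch assumes the built-in auto dataset; foreign and mpg stand in for gender and age:

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* sort first by foreign, then by mpg within each value of foreign
sort foreign mpg

* show three variables for the first 10 observations only
list make foreign mpg in 1/10
```

The in 1/10 qualifier is optional; without it, list prints every observation.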

What are X, Y and Z?

X: IV(s)
Y: DV
Z: CV(s), control variable(s)

How do you recode a variable with several categories into yes/no?
How do you keep missing values as true missing values when recoding?

yes=1 and no=0
recode [variable] (1/3=0 "not a large threat") (4=1 "large threat") (missing=.), generate([variable_new])
-> (missing=.) keeps the missing values coded as missing (.)
-> don't forget the comma before generate, and give the new variable a new name so that you don't modify the original variable
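
A runnable sketch of the same pattern, assuming the built-in auto dataset. Here rep78 (values 1 to 5, with some missing) stands in for the threat variable, and rep_high is a hypothetical new variable name:

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* 1-3 become 0, 4-5 become 1, missing stays missing; the original is untouched
recode rep78 (1/3 = 0 "Low") (4/5 = 1 "High") (missing = .), generate(rep_high)

* check the recode against the original variable, missing values included
tabulate rep78 rep_high, missing
```

Cross-tabulating the old and new variables is a quick way to confirm that every category landed where intended.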

How do you get the confidence interval (CI)?

proportion [variable]
-> note: read the 2 bounds (lower and upper) on the same row
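
A brief sketch, assuming the built-in auto dataset with foreign as a stand-in categorical variable:

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* proportion of each category with its 95% confidence interval
proportion foreign

* same table with a 90% confidence interval instead
proportion foreign, level(90)
```

Each row of the output is one category; the last two columns on that row are the lower and upper bounds.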

What do nofreq and nokey do?

They make the cross-tab easier to read:
-nofreq gets rid of the frequencies in the table (only categories and percentages remain)
-nokey hides the key box

How do you make a mean comparison?

tab independent_variable, sum(dependent_variable)

How do you investigate the relationship between X and Y over (i.e. depending on) another variable (i.e. another X)?

bysort educ_high: tab threat_from_china partyid3, column nofreq nokey
*bysort educ_high: sorts the data by educ_high and repeats the command for each of its values (divides people into, for ex, low and high education)
*tab threat_from_china partyid3: creates a two-way table (cross-tabulation) of threat_from_china (rows) by partyid3 (columns)

-> creates multiple cross-tabulations of threat_from_china by partyid3, but separately for each value of the variable educ_high
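
The same idea as a self-contained sketch. It assumes the built-in auto dataset; rep_high and heavy are hypothetical variables created only for this illustration:

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* hypothetical DV: repair record recoded into low (0) and high (1)
recode rep78 (1/3 = 0 "Low") (4/5 = 1 "High"), generate(rep_high)

* hypothetical third variable: 1 if the car weighs more than 3000 lbs
generate heavy = weight > 3000 if !missing(weight)

* one cross-tab of rep_high by foreign for each value of heavy
bysort heavy: tabulate rep_high foreign, column nofreq nokey
```

Options are separated by spaces after a single comma; a second comma or options glued together (nofreqnokey) produce a syntax error.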

Which graph do we use for a cross-tab?

graph bar

The chi-square test works only for xxx variables; so what do we do if we don't have this kind of variable?

categorical
-> if we have a numerical variable, we split it into 2 categories using the mean (ex: importance of immigration: 3% no, 27% a bit, 30% yes, 30% high), so we need to find the mean to create 2 categories: Yes or No

How do you find the mean of a variable (useful for running a chi-square test on a numerical variable)?

sum [variable] or mean [variable]

How do you create a binary variable based on the mean?

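ex (with mean = 11.8, original variable = general_issue, new variable = immigration_salient):
gen immigration_salient = 1 if general_issue > 11.8 & !missing(general_issue)
replace immigration_salient = 0 if general_issue <= 11.8
-> Stata treats missing values as larger than any number, so without !missing() they would be coded 1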

How do you add labels to the values of the binary variable that we just created?

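ex (with salient_sbl = name of the value label)
label define salient_sbl 0 "Not important" 1 "Important"
label values immigration_salient salient_sbl
-> label define only creates the label; label values attaches it to the variable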
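
Putting the last cards together, a sketch that splits a numeric variable at its mean and labels the result. It assumes the built-in auto dataset; high_mpg and high_mpg_lbl are hypothetical names:

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* summarize stores the mean in r(mean), so it does not need to be typed by hand
summarize mpg

* 1 if above the mean, 0 otherwise; missing values stay missing
generate high_mpg = (mpg > r(mean)) if !missing(mpg)

* create the value label, then attach it to the new variable
label define high_mpg_lbl 0 "At or below mean" 1 "Above mean"
label values high_mpg high_mpg_lbl

tabulate high_mpg
```

Using r(mean) right after summarize avoids rounding the cut-off, which matters when observations sit close to the mean.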

How do you run the chi-square test?

*look at the frequency distribution to calculate the expected frequencies
-> tab [variable1] [variable2], chi2
*the chi-square statistic is calculated from the differences between observed and expected frequencies; to display them with some additional information (see the "key" box):
-> tab [variable1] [variable2], col expected cchi2 chi2
*read Pr (the p-value): if Pr < 0.05, the relationship is statistically significant
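
A self-contained sketch of the test, assuming the built-in auto dataset. rep_high is a hypothetical recoded variable and foreign is a stand-in for the second categorical variable:

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* hypothetical categorical variable built from the repair record
recode rep78 (1/3 = 0 "Low") (4/5 = 1 "High"), generate(rep_high)

* observed and expected frequencies, each cell's contribution, and the test
tabulate rep_high foreign, column expected cchi2 chi2

* the statistic and its p-value are also stored for later use
display "chi2 = " r(chi2)
display "p    = " r(p)
```

Keep in mind that the chi-square approximation becomes unreliable when expected frequencies are small, which the expected option makes easy to check.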

How do you run the chi-square test if we have a numerical variable?

1. mean general_issue
2. gen immigration_salient = 1 if general_issue > 11.8 & !missing(general_issue)
   replace immigration_salient = 0 if general_issue <= 11.8
3. tab right_ex immigration_salient, chi2

How do you run an ANOVA test?

1. ex: we want to know whether there are significant differences in attitudes toward immigration across countries depending on their level/quality of democracy.
-> oneway general_issue polity2
2. Check that the conditions for ANOVA are met (make sure your data are collected independently: for example, different individuals in each group, no repeated measures):
-> oneway general_issue polity2, tab
3. Run the ANOVA test:
-> anova general_issue polity2
4. Look at R-squared and F:
R² = 0 → The grouping variable explains none of the variability in the outcome.
R² = 1 → The grouping variable explains all the variability.
Higher R² → Better explanatory power of the independent variable(s)
5. Draw a box plot:
-> graph box general_issue, over(polity2)
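
The steps above can be sketched on the built-in auto dataset, assuming mpg as a stand-in outcome and rep78 as a stand-in grouping variable:

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* one-way ANOVA with a table of group means and standard deviations
oneway mpg rep78, tabulate

* same model with anova, which also reports R-squared
anova mpg rep78

* R-squared and F are stored after anova
display "R-squared = " e(r2)
display "F         = " e(F)

* box plot of the outcome within each group
graph box mpg, over(rep78)
```

oneway and anova give the same F test here; anova is the one that prints R-squared directly.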

Correlation test

*pwcorr [DV] [IV]
*to get the Pearson correlation coefficient: correlate [var1] [var2]
*to get the significance (p-value): pwcorr [var1] [var2], sig
-> these are the values shown as 0.… or -0.…
*scatter plot: scatter [DV] [IV]
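
A minimal sketch of these commands, assuming the built-in auto dataset with mpg and weight as stand-in variables:

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* Pearson correlation coefficient
correlate mpg weight

* coefficient with its p-value printed underneath
pwcorr mpg weight, sig

* scatter plot with the DV on the vertical axis
scatter mpg weight
```

With sig, the upper number in each cell is the coefficient and the lower number is the p-value.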

Bivariate regression and scatter plot with regression line

*reg [Y] [X]
*scatter plot with regression line: graph twoway (lfitci general_issue GTD_total) (scatter general_issue GTD_total)
-> lfitci adds the linear fit + confidence interval (95%)
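
A runnable sketch of the same two steps, assuming the built-in auto dataset with mpg as a stand-in for Y and weight as a stand-in for X:

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* bivariate regression of Y on X
regress mpg weight

* slope and R-squared are stored after regress
display "b  = " _b[weight]
display "R2 = " e(r2)

* fitted line with 95% confidence band, scatter drawn on top
graph twoway (lfitci mpg weight) (scatter mpg weight)
```

Listing lfitci first draws the confidence band underneath the points, so the band does not hide them.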

Regression

reg [Y:DV] [X:IV]
*to get the coefficient (b) (for every one-unit increase in X, Y changes by b units): first column, labelled Coef., just before Std. err.
*for the p-value of each coefficient: the P>|t| column; Prob > F (top right) is the p-value of the model as a whole (the two are the same test when there is only one X)
*to get the R²: top right, next to R-squared

Multivariate OLS

reg [DV] [X1 X2...]

Logit model

*rescale Y into a dummy (0/1)
*logit [Y] [X1 X2 X3...]
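
A short sketch, assuming the built-in auto dataset. Its variable foreign is already coded 0/1, so no rescaling step is needed in this example; mpg and weight are stand-in predictors:

```stata
* Assumption: auto dataset used as a stand-in
sysuse auto, clear

* confirm the outcome only takes the values 0 and 1
tabulate foreign

* logit model with two predictors
logit foreign mpg weight

* same model reported as odds ratios instead of log-odds
logit foreign mpg weight, or
```

If the outcome is not already a dummy, create one first with recode or generate, keeping missing values missing.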

stata Flashcards

(31 cards)