Stata Flashcards
(31 cards)
how can you get detailed information about a variable?
by typing: codebook [variable]
when you type codebook, which word gives the increment between values (often 1)?
units (shown next to range, which gives the minimum and maximum)
what should we look at after a codebook?
-type: the kind of value (categorical or numeric)
-unique values: number of unique, non-missing values
-missing .: number and percent of missing values (shown as .)
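A minimal sketch of reading a codebook, assuming Stata's built-in auto dataset and its variable rep78 (neither is part of these cards; use your own data and variable):

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* Detailed description of one variable:
* type, range, units, unique values, missing .
codebook rep78
```

In the output, check the type line first, then the unique values and missing . counts.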
how to make a box plot; a bar chart; a line chart; a histogram; a density plot; a frequency table
write:
1. graph box [variable] (if you want a condition, add: if…)
-> for interval variables
2. graph bar, over([variable])
-> for nominal, ordinal and interval variables
3. line [y variable] [x variable] (line chart): ONLY for numeric and continuous variables
4. histogram [variable]
-> for interval variables
5. kdensity [variable]
-> for interval variables
6. tab (or tabulate) [variable]
-> for categorical variables
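A short sketch of these commands in use, assuming Stata's built-in auto dataset (price, rep78 and foreign are variables of that dataset, not of the course data):

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* Box plot of an interval variable, then the same with a condition
graph box price
graph box price if foreign == 0

* Bar chart of the categories of a variable
graph bar, over(rep78)

* Histogram and density plot of an interval variable
histogram price
kdensity price

* Frequency table of a categorical variable
tabulate rep78
```

Each graph command replaces the previous graph window, so run them one at a time.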
how to compare the distributions of 2 variables
cross-tabulation: tab DV IV, col
-> add , col if you want column %
NOT a one-way tabulation, because that is a frequency table for only 1 variable
difference between sum and codebook
*sum V1: gives you Obs, Mean, Std. dev., Min and Max
-> for continuous or numeric variables
*codebook V1: more detailed description
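To see the difference side by side, a minimal sketch assuming Stata's built-in auto dataset and its variable mpg (both are illustrative, not from these cards):

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* Short summary: Obs, Mean, Std. dev., Min, Max
summarize mpg

* Longer description: type, range, units, unique values, missing, percentiles
codebook mpg
```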
how to see skewness on a graph
RELATIVE to the MEAN
Symmetrical (skew = 0) → balanced on both sides
Positive skew (right-skewed) → tail is longer on the right (more spread values)
Negative skew (left-skewed) → tail is longer on the left
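One possible way to check skewness both numerically and on a graph, sketched with Stata's built-in auto dataset (the variable price is an assumption for illustration):

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* The detail option adds skewness (and the median) to the summary
summarize price, detail

* The skewness value is stored in r(skewness) after summarize, detail
display r(skewness)

* Histogram with a normal curve overlaid, to see which tail is longer
histogram price, normal
```

A positive value of r(skewness) goes with a longer right tail, a negative value with a longer left tail.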
how to make a cross-tabulation
write: tab [DV] [IV]
-> so DV in the rows and IV in the columns
only when Y (the DV) is categorical
how to make a comparison table
write: tabulate [IV], summarize([DV])
to visualize cross-tabulation relationships, we often use bar charts. the general syntax is:
graph bar [DV], over([IV])
to visualize mean comparisons, we often use box plots. the general syntax is:
graph box [DV], over([IV])
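A sketch of the table and graph commands together, assuming Stata's built-in auto dataset, with foreign playing the IV and rep78 or mpg playing the DV (these roles are chosen only for illustration):

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* Cross-tabulation: DV in the rows, IV in the columns, with column %
tabulate rep78 foreign, column

* Comparison table: mean of the DV for each category of the IV
tabulate foreign, summarize(mpg)

* Bar chart: by default, bar height is the mean of the DV in each IV group
graph bar mpg, over(foreign)

* Box plot of the DV for each IV group
graph box mpg, over(foreign)
```

Because graph bar plots means by default, a DV coded 0/1 gives bars equal to the proportion of 1s in each group.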
how to organize and display the data after having done codebook
*sort:
-Organizes your dataset by the values of one or more variables
-sort [variable]
-ex: sort age orders from youngest to oldest; sort gender age sorts first by gender, then by age within each gender
*list:
-Displays selected variables and observations in the Results window.
-list [variable1] [variable2]
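A minimal sketch of sort and list together, assuming Stata's built-in auto dataset (make, foreign and price are its variables; the in 1/5 range is just to keep the output short):

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* Sort first by foreign, then by price within each value of foreign
sort foreign price

* Display selected variables for the first 5 observations only
list make foreign price in 1/5
```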
what are X, Y and Z
X: IV(s)
Y: DV
Z: CV(s) control variable(s)
how to recode when the categories differ (useful to get yes/no)
how to recode missing values into true missing values?
yes=1 and no=0
recode [variable] (1/3=0 "not a large threat") (4=1 "large threat") (miss=.), generate([variable_new])
…(miss=.) keeps missing values as true missing (.)
don't forget the comma before generate (use a new name so that you don't modify the original variable)
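The same pattern on data that ships with Stata, as a sketch: the auto dataset, the cut between 3 and 4, and the names rep78_high, "Lower" and "Higher" are all assumptions made for illustration.

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* Collapse a 1-5 scale into 0/1, keep missing as missing,
* and store the result in a NEW variable (rep78_high is a made-up name)
recode rep78 (1/3 = 0 "Lower") (4/5 = 1 "Higher") (missing = .), generate(rep78_high)

* Check the result, including the missing values
tabulate rep78_high, missing
```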
how to get the CI (confidence interval)
proportion [variable]
note: read the 2 bounds (lower and upper) on the same row
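A quick sketch, assuming Stata's built-in auto dataset and its binary variable foreign (illustrative only):

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* Proportion of each category with its 95% confidence interval
proportion foreign

* Same table with a 90% interval instead
proportion foreign, level(90)
```

Each row of the output lists one category with its proportion, followed by the lower and upper bounds of the interval.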
what do nofreq and nokey mean
they give a cleaner look at the cross-tab
nofreq gets rid of the frequencies in the table (only categories and percentages); nokey hides the key box
how to make a mean comparison?
tab independent_variable, sum(dependent_variable)
how to investigate the relation between X and Y depending on another variable (i.e. another X)
bysort educ_high: tab threat_from_china partyid3, column nofreq nokey
* sorts the data by educ_high (divides people into, for example, low and high education)
* tab threat_from_china partyid3 creates a two-way table (cross-tabulation) of threat_from_china (rows) by partyid3 (columns)
-> creates multiple cross-tabulations of threat_from_china by partyid3, but separately for each value of the variable educ_high
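A runnable sketch of the same idea, assuming Stata's built-in auto dataset; heavy is a hypothetical grouping variable and 3000 an arbitrary cutoff, both invented for this example.

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* Hypothetical third variable: 1 if the car weighs more than 3000, else 0
generate heavy = weight > 3000 if !missing(weight)

* One cross-tab of rep78 by foreign for each value of heavy,
* showing column percentages only (no frequencies, no key box)
bysort heavy: tabulate rep78 foreign, column nofreq nokey
```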
what graph do we use for cross tab
graph bar
chi-square test, only for xxx variables; so what do we do if we don’t have this kind of variable?
categorical
-> if we have a numerical variable, we create 2 categories using the mean (ex: importance of immigration: 3% no, 27% a bit, 30% yes, 30% high), so we need to find the mean to create 2 categories: Yes or No
how to find the mean of a variable (useful for running a chi-square test with a numerical variable)
sum [variable] or mean [variable]
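how to create a binary variable based on the mean?
ex (with mean=11.8, original variable=general_issue, new variable=immigration_salient):
gen immigration_salient = 1 if general_issue > 11.8 & !missing(general_issue)
replace immigration_salient = 0 if general_issue <= 11.8
-> !missing() is needed because Stata treats missing values as larger than any number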
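An alternative sketch that avoids typing the mean by hand, assuming Stata's built-in auto dataset; mpg_high is a made-up variable name.

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* summarize stores the mean in r(mean)
summarize mpg

* 1 if above the mean, 0 otherwise; missing values stay missing
generate mpg_high = mpg > r(mean) if !missing(mpg)

* Check the two categories
tabulate mpg_high, missing
```

Using r(mean) keeps the cutoff exact instead of a rounded value copied from the output.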
how to add labels to the values that we just created (by creating a binary variable)
ex (with salient_sbl = name of the value label)
label define salient_sbl 0 "Not important" 1 "Important"
label values immigration_salient salient_sbl
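A self-contained sketch of the two labeling steps, assuming Stata's built-in auto dataset; expensive, expensive_lbl and the cutoff 6000 are hypothetical names and values chosen for illustration.

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* Hypothetical binary variable: 1 if price is above 6000, else 0
generate expensive = price > 6000 if !missing(price)

* Step 1: define the value label (a named set of value-to-text pairs)
label define expensive_lbl 0 "Not expensive" 1 "Expensive"

* Step 2: attach the value label to the variable
label values expensive expensive_lbl

* The table now shows the text labels instead of 0 and 1
tabulate expensive
```

Defining a label alone changes nothing in the tables; it only shows up once label values attaches it to a variable.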
how to run the chi-square test
- look at the frequency distribution to calculate the expected frequencies
-> tab [variable1] [variable2], chi2 - we calculate the difference between observed and expected frequencies to compute chi-square and get some additional information (see the "key" box)
-> tab [variable1] [variable2], col expected cchi2 chi2 - look at Pr (the p-value): if it is < 0.05, the relationship is statistically significant
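A minimal sketch of the test, assuming Stata's built-in auto dataset with rep78 and foreign standing in for the two categorical variables (illustrative only):

```stata
* Assumption: built-in example data, loaded only for illustration
sysuse auto, clear

* Observed and expected frequencies, column %, and the chi-square test
tabulate rep78 foreign, column expected chi2

* The statistic and its p-value are also stored after the command
display r(chi2)
display r(p)
```

Compare r(p) with 0.05 in the same way as the Pr value printed under the table.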