data analysis Flashcards
when is linear regression used?
to describe and quantify the association between an exposure and a continuous outcome, by fitting a straight line that shows how the outcome changes as the exposure changes
how can you get a more representative sample?
collect more data - a larger sample is more likely to reflect the population it was drawn from
what does statistics allow for?
it allows us to take in all the data and summarise it in a way that is understandable and useful
what are the two main properties of data that we want to capture through statistics?
what the values look like (where quantitative data sits in numerical space, and which categories of categorical data are more or less common) and what the relationship between variables is
what does the analysis done depend on?
how the data is recorded, how the data is distributed, and the research question - does the analysis answer what it is meant to?
how is categorical data usually recorded?
as text or labels
what is ordinal data?
categorical data whose categories are ordered or ranked
how can you present categorical data?
counts, percentages, tables and graphs
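As a rough illustration of the counts and percentages mentioned above, here is a minimal Python sketch. The variable name and the labels are made up for the example and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical categorical data recorded as text labels (made up for illustration)
smoking_status = ["never", "current", "never", "former", "never", "current"]

# Count how many times each label appears
counts = Counter(smoking_status)
total = sum(counts.values())

# Percentage of observations falling in each category
percentages = {label: 100 * n / total for label, n in counts.items()}

# Print a small frequency table, most common category first
for label, n in counts.most_common():
    print(f"{label:<8} {n:>3} {percentages[label]:5.1f}%")
```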
what alters how you present data?
who you are presenting the data to
in what order are the parts of a Stata command written?
command name, then the arguments for the command, then any further options after a comma
what are arguments?
they are the inputs, such as variables, that determine how the command is run, e.g. bar
when should you add graphics to the bar chart?
only if they provide more information and help to understand the information already given
what are the methods for testing relationships?
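logistic regression, t-tests and chi-squared tests - t-tests and logistic regression are used where we have one categorical and one continuous variable, whereas the chi-squared test is used where both variables are categorical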
what is numerical data?
it is when the data is in numbers - you can count or measure the values
what is discrete data?
numerical data that can only take separate values, usually whole numbers such as counts
how can you summarise the size of numerical values?
mean and median
how can you summarise the spread of numerical values?
variance, SD and IQR
how can we report some sort of extreme in numerical values?
modal value, the minimum and the maximum
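The summaries of size, spread and extremes listed above can be sketched with Python's standard `statistics` module. The values are invented for illustration, and the quartiles use the module's default method; other software may use a slightly different quartile rule and so report a slightly different IQR.

```python
import statistics

# Hypothetical numerical data (made up for illustration)
values = [2, 4, 4, 5, 7, 9, 11]

# Size (typical value)
mean = statistics.mean(values)
median = statistics.median(values)

# Spread
variance = statistics.variance(values)  # sample variance (divides by n - 1)
sd = statistics.stdev(values)           # sample standard deviation
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles, default method
iqr = q3 - q1

# Extremes and most common value
mode = statistics.mode(values)
minimum, maximum = min(values), max(values)

print(f"mean={mean}, median={median}")
print(f"variance={variance}, SD={sd:.2f}, IQR={iqr}")
print(f"mode={mode}, min={minimum}, max={maximum}")
```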
what do you need to consider when analysing numerical data?
the specific reason for comparing groups or populations
what can you get from a simple plot?
the range and the mode, and an understanding of how the data fits together
what do histograms show?
how common the values are relative to each other - where the typical or most common values fall
what summary statistics would you use for a normal distribution?
it is symmetrical, so the mean and SD
how do you calculate SD?
you subtract the mean from each value and then square each result. Add the squares all together, divide by one less than the total number of values, and take the square root
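The steps above can be followed one at a time in a short Python sketch. The four values are made up for illustration, and the final line compares the hand calculation with the standard library's `statistics.stdev`, which also divides by one less than the number of values.

```python
import math
import statistics

# Hypothetical values (made up for illustration)
values = [3, 5, 7, 9]

# Step 1: find the mean
mean = sum(values) / len(values)

# Step 2: subtract the mean from each value and square each result
squared_deviations = [(x - mean) ** 2 for x in values]

# Step 3: add the squares together and divide by one less than the number of values
sample_variance = sum(squared_deviations) / (len(values) - 1)

# Step 4: take the square root
sd_by_hand = math.sqrt(sample_variance)

print(f"SD by hand: {sd_by_hand:.3f}")
print(f"statistics.stdev: {statistics.stdev(values):.3f}")
```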
what is the mean?
the sum of all values divided by the total number of values