Data analysis Flashcards
(25 cards)
What are the main types of variables?
Numerical and Categorical
What are the types of numerical variables?
Continuous and discrete
What are continuous variables?
Real numbers, e.g. height in metres
What are discrete variables?
Integer numbers, e.g. number of inhabitants in a town
What are the types of categorical variables?
Nominal and ordinal
What are nominal variables?
Unordered categories, e.g. colours
What are ordinal variables?
Ordered categories, e.g. clothes sizes
What are descriptive statistics?
Analysis of data that helps describe, show or summarise data
What do descriptive statistics aim to summarise?
The central tendency of the data
The variability of the data
What is central tendency?
A value that is used to describe the centre of the data, e.g. the mean, median or mode
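The three measures named on this card can be compared on a small sample. This is a minimal sketch using Python's standard `statistics` module; the `heights` values are made up for illustration.

```python
import statistics

# Hypothetical sample of heights in cm (illustrative values only)
heights = [150, 160, 160, 170, 210]

mean = statistics.mean(heights)      # arithmetic average
median = statistics.median(heights)  # middle value of the sorted data
mode = statistics.mode(heights)      # most frequent value

print(mean, median, mode)
```

Here the single large value (210) pulls the mean above the median, which is why more than one measure is usually reported.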
What is the range?
The difference between the highest and lowest data points (highest minus lowest)
What is the interquartile range (IQR)?
The spread from the 25th percentile (1st quartile) to the 75th percentile (3rd quartile)
Contains approximately one half of the observations
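The range and IQR can be computed as in the sketch below. The data are made up, and the quartile convention (`method="inclusive"`) is an assumption: other software may use a slightly different rule and give slightly different quartiles.

```python
import statistics

# Hypothetical sorted sample (illustrative values only)
data = [2, 4, 4, 5, 7, 9, 10, 12]

# Range: highest minus lowest
data_range = max(data) - min(data)

# quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
# method="inclusive" is one of several quartile conventions (assumption).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```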
What is standard deviation?
A measure of how widely or tightly the data are spread around the mean
What do different standard deviations (SD) show?
Low - Data points are close to the mean
High - Data points are spread out
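A quick way to see the difference is to compare two samples with the same mean. This is a sketch with made-up numbers; `statistics.stdev` computes the sample standard deviation (dividing by n - 1).

```python
import statistics

# Two hypothetical samples with the same mean (illustrative values only)
tight = [9, 10, 10, 10, 11]
wide = [0, 5, 10, 15, 20]

# Sample standard deviation of each
sd_tight = statistics.stdev(tight)  # small: points sit close to the mean
sd_wide = statistics.stdev(wide)    # large: points are spread out

print(statistics.mean(tight), statistics.mean(wide))
print(round(sd_tight, 2), round(sd_wide, 2))
```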
What does SD show with normally distributed data?
About 68% falls within 1 SD either side of the mean
About 95% falls within 2 SD either side of the mean
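These proportions can be checked against a standard normal distribution. The sketch below uses `statistics.NormalDist` from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
z = NormalDist(mu=0, sigma=1)

# Probability of landing within k standard deviations of the mean
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

print(f"{within_1_sd:.2%}")  # 68.27%
print(f"{within_2_sd:.2%}")  # 95.45%
```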
What does data analysis show?
Shows obvious data patterns
Shows potential problems (outliers, correlations between independent variables)
Helps to find problems (typos, mismatch of units)
What is the use of plotting data?
Looking for trends/patterns
Checking distribution
Checking whether the data conform to the assumptions of a statistical test
How does a boxplot work?
The box contains the middle 50% of the data
The bottom of the box is the 1st quartile and the top is the 3rd quartile; the distance between them is the IQR
The solid line indicates the median, whilst the dashed line is the mean
The T-shaped whiskers extend to the highest and lowest data points that lie within 1.5 x IQR of the box (above the 3rd quartile and below the 1st quartile)
Any point beyond the whiskers is plotted as an outlier
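The whisker rule can be worked through by hand. This is a sketch with made-up data, assuming the common 1.5 x IQR rule and the `"inclusive"` quartile convention; plotting libraries may compute quartiles slightly differently.

```python
import statistics

# Hypothetical sample with one extreme value (illustrative values only)
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

# Box edges and median (quartile convention is an assumption)
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Limits beyond which a point counts as an outlier
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points still inside the limits
inside = [x for x in data if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, median, q3, iqr)
print(whisker_low, whisker_high, outliers)
```

Note that the whiskers stop at actual data points (1 and 8 here), not at the limits themselves (-3 and 13).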
What is standard error?
Standard deviation divided by the square root of the sample size: SE = SD / √n
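As a worked example, the standard error of the mean for a small sample can be computed as follows. This is a sketch; the `sample` values are made up, and the sample standard deviation (n - 1 denominator) is assumed.

```python
import math
import statistics

# Hypothetical sample (illustrative values only)
sample = [4, 8, 6, 5, 7]

sd = statistics.stdev(sample)  # sample standard deviation
n = len(sample)                # sample size
se = sd / math.sqrt(n)         # standard error of the mean

print(round(sd, 3), n, round(se, 3))
```

Because n is in the denominator, collecting more data shrinks the standard error even when the standard deviation stays the same.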
What do scatter plots show?
X-axis - Numerical
Y-axis - Numerical
What do box plots and categorical scatter (strip) plots show?
X-axis - Categorical
Y-axis - Numerical
What does a bar chart show?
X-axis - Categorical
Y-axis - Frequency
What does a histogram show?
X-axis - Numerical
Y-axis - Frequency
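The frequencies on a histogram's y-axis come from sorting numerical values into bins. A minimal sketch of that counting step, with made-up values and an assumed bin width of 1.0:

```python
from collections import Counter

# Hypothetical numerical measurements (illustrative values only)
values = [1.2, 1.9, 2.1, 2.5, 3.7, 4.0, 4.4, 4.9]
bin_width = 1.0  # assumed bin width

# Map each value to the left edge of its bin, then count values per bin
counts = Counter(int(v // bin_width) * bin_width for v in values)

for left_edge in sorted(counts):
    print(f"{left_edge:.1f} to {left_edge + bin_width:.1f}: {counts[left_edge]}")
```

A bar chart uses the same counting idea, but the x-axis groups are categories rather than numerical bins.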
What does a contingency table show?
Rows - Categorical
Columns - Categorical
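A contingency table counts how often each combination of two categorical variables occurs. The sketch below builds one from made-up (colour, clothes size) observations using only the standard library.

```python
from collections import Counter

# Hypothetical observations: (colour, clothes size) per item (illustrative only)
observations = [
    ("red", "S"), ("red", "M"), ("blue", "S"),
    ("red", "S"), ("blue", "L"), ("blue", "S"),
]

counts = Counter(observations)  # frequency of each (colour, size) pair

colours = sorted({colour for colour, _ in observations})  # row labels (nominal)
sizes = ["S", "M", "L"]                                    # column labels (ordinal)

# One row per colour, one column per size; unseen combinations count as 0
table = [[counts[(colour, size)] for size in sizes] for colour in colours]

print("      " + "  ".join(sizes))
for colour, row in zip(colours, table):
    print(f"{colour:<6}" + "  ".join(str(n) for n in row))
```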