Summarising and Displaying Data Flashcards

1
Q

---- are scales with an underlying defined unit.
Examples:
– A count (number of children)
– An accepted unit
* Years
* Metres
* Euros
These scales can be ---- or ----

A

numeric scales
continuous or discrete

2
Q

True or false:
- Many things cannot have a defined unit,
such as depression, satisfaction and pain
- We recognise that people can be satisfied, or in pain, to a
greater or lesser extent
- The problem is measuring these concepts without a defined
unit

A

true

3
Q

---- are used to measure relative quantity

A

ordinal scales

4
Q

Age measured in years and a unit of days are examples of:

A

defined units

5
Q

– Severity of pain: mild, moderate, severe
– Alcohol consumption: none, low, high
– Quality of life score: 0, 1, 2, ..., 10
are examples of:

A

ordinal scales (check slide 12)

6
Q

Numeric and ordinal scales are labels that tell us ----. The more basic kind of label tells us ----, and ---- is the basis of that measurement

A

how much
what
classification

7
Q

Labelling schemes that classify people, things or events are ----
Examples are:

A

nominal measurement scales
– Disease classification schemes, e.g. ICD-10 (International Classification
of Diseases)
– Eye color: blue, green, brown, hazel, gray
– Types of activity: sitting, walking, cycling, swimming, other

8
Q

Nominal measurement scales tell us ---- of thing something is, and they are based on an ----

A

what kind
agreed classification

9
Q

Some scales have only two labels; these are called ----

A

dichotomous scales

10
Q

– Eye color: blue, green, brown, hazel, gray
– Types of activity: sitting, walking, cycling, swimming, other
are examples of:

A

nominal measurement scales (blood group types are another example)

11
Q

– Disease status: Presence or absence of disease
– Lab test result: Positive or negative
– Mortality status: alive or dead
– Exam result: pass or fail
are examples of:

A

dichotomous scales (the simplest sort of scale)

12
Q

Types of variables summary:
1- ---- variables
– Defined units, tell us how much in an absolute sense
– Can be continuous or discrete
2- Categorical variables
* ---- scales
– Tell us how much, but in a relative rather than absolute sense
* ---- scales
– Classify. Tell us what rather than how much
– Called a ---- scale when there are only two values

A

numeric
ordinal
nominal
dichotomous

13
Q

Knowing the measurement scale of the data tells us how we should ---- and ---- it

A

display and summarise it

14
Q

Summaries are ---- than the original because of what they leave out
* So any summary is a ---- of the original
Things can go wrong when:
1- We present aspects of the data that lead to the wrong conclusion
2- We leave out some important aspect of the data, leading the reader to draw the wrong conclusion
- In practice, data analysts examine the data in ---- ways to avoid these pitfalls when reporting on them

A

smaller
simplification
different

15
Q

The most basic summary statistic is a —-

A

frequency, given as a count or a percent (check the stacked histogram graph); frequencies can be presented in a frequency table
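
A frequency table can be sketched in a few lines of Python. This is only an illustration: the `ratings` list is invented data (pain severity labels), not taken from the slides.

```python
from collections import Counter

# Hypothetical pain-severity ratings (illustrative data, not from the deck)
ratings = ["mild", "severe", "mild", "moderate",
           "mild", "moderate", "severe", "mild"]

counts = Counter(ratings)   # frequency as a count
total = len(ratings)

# Frequency table: category -> (count, percent)
frequency_table = {
    category: (count, 100 * count / total)
    for category, count in counts.items()
}

# Print the categories in their natural (ordinal) order
for category in ["mild", "moderate", "severe"]:
    count, percent = frequency_table[category]
    print(f"{category:<10}{count:>3}{percent:>8.1f}%")
```

The same counts could be drawn as a bar graph; the table keeps the precise numbers.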

16
Q

Rule of thumb:
—- for precise information
—- for patterns and understanding

A

numbers
graphs

17
Q

A simple graph displaying
frequencies of categories is a ----
– ---- bars are preferable, but they
are often drawn as ---- bars

A

bar graph
horizontal
vertical

18
Q

When the data are measured on a
continuous scale but we have
relatively small amounts of data, we
can display the data as —

A

dots, i.e. a dot plot. Example: heights of women and men from a small study
– For men or women with the
same height, the dots are shown
beside each other

19
Q

With ---- amounts of data, we don’t need to rely on summaries; we can simply show all the data in a plot
* But with ---- datasets, the dots become too numerous and we rely more and more on summaries

A

small
larger

20
Q

Death in the intensive care unit:
Patients had their risk of death calculated using ---- scores
* These scores combine ---- to produce an ---- of the ---- of death
* The study also looked at length of stay
* The two variables are length of stay and APACHE-II scores
- What the dots show will be the ----

A

APACHE-II
risk indicators
overall prediction
chance
predicted risk of
death (APACHE-II scores)
(check slides 27 and 28)

21
Q

Summarising the risk scores using % cut-offs:
- These summaries don’t show us ---- the data, but they give us a good idea of the ----
- They show the ----
- They give some idea of how scores ---- around that

A

all
key markers
middle point/halfway
vary

22
Q

A ---- is a value representing a cut-off for a specified percentage of the data

A

percentile (percentiles are a kind of quantile; check graph 29)

23
Q

The ---- is the half-way point of the data values.
– Strictly speaking, half of the values lie ---- the median
– It is the ---- percentile!

A

median
at or below
50th
(check slide 31)
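
A minimal sketch of percentiles and the median in Python, using the fasting blood sugar values quoted in the variance card of this deck. Percentile conventions differ between software packages; this sketch assumes the default ("exclusive") method of the standard library's `statistics.quantiles`, so other tools may report slightly different quartiles.

```python
import statistics

# Fasting blood sugar values (mmol/l) from the example in this deck
values = [5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10]

# The median is the half-way point of the sorted values
median = statistics.median(values)

# quantiles(n=4) returns three cut points: the 25th, 50th and 75th percentiles
q25, q50, q75 = statistics.quantiles(values, n=4)

print("median:", median)
print("25th, 50th, 75th percentiles:", q25, q50, q75)
```

The 50th percentile and the median are the same value, which is the point of the card above.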

24
Q

The ---- is the average; it indicates approximately where the data are located on the number line.
It is calculated as:

A

mean
“Sum up the individual values then
divide by the number of them”
The mean can be misleading, though (check bar graph 35)
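
The "sum then divide" recipe, sketched in Python with the fasting blood sugar values used in the variance card of this deck:

```python
# Fasting blood sugar values (mmol/l) from the example in this deck
values = [5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10]

total = sum(values)          # sum up the individual values
mean = total / len(values)   # divide by the number of them

print(mean)
```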

25
Q

----: a "tail" of exceptionally long stay times pushes the mean up
----: statistics that maintain their properties even if the underlying distributional assumptions are incorrect

A

outliers (check slide 36 for more info)
robust summary statistics
26
Q

The mean is sensitive to ----, while ---- such as the median are hardly affected by them

A

outliers
robust summary measures
(the median is a robust statistic because it has a breakdown point of 50%)
– Omitting the four highest values moves the median from 3.55 to 3.50 days
– That's a change of about an hour (which is very small in comparison with the effect on the mean)
* This explains the differences we see when we look at medians instead of means
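
A small sketch of how differently the mean and the median react to a tail of long stays. The `stays` values are invented for illustration; they are not the ICU data from the slides.

```python
import statistics

# Hypothetical lengths of stay in days (illustrative only)
stays = [1, 2, 2, 3, 3, 4, 4, 5, 40, 82]

# Drop the two longest stays to see how each summary reacts
trimmed = sorted(stays)[:-2]

mean_all = statistics.mean(stays)
median_all = statistics.median(stays)
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

mean_shift = mean_all - mean_trimmed        # large: the mean follows the tail
median_shift = median_all - median_trimmed  # small: the median is robust

print("mean moved by", mean_shift, "days")
print("median moved by", median_shift, "days")
```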
27
Q

The range gives us an idea of ----, but it is not always a good measure
- It needs 2 pieces of information: the ---- and ---- values
- These two values are the ones most likely to be ---- cases or ----
- The range is not ----: it will be affected by ----

A

variability
biggest and smallest
atypical, errors
robust, outliers
(The range of length of stay is 82 days, but it's only 38 days if we ignore the longest-staying patient, and 19 days if we ignore the three longest-staying patients)
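
The range can be sketched as follows. The `stays` values are hypothetical, invented so that the three ranges come out as the 82, 38 and 19 days quoted in the card; the real data are not given in the deck.

```python
def value_range(values):
    """Range = biggest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical lengths of stay in days, chosen to reproduce the card's figures
stays = [1, 2, 3, 3, 4, 5, 20, 25, 39, 83]
ordered = sorted(stays)

range_all = value_range(ordered)                        # every patient
range_without_longest = value_range(ordered[:-1])       # ignore the longest stay
range_without_three_longest = value_range(ordered[:-3]) # ignore the three longest

print(range_all, range_without_longest, range_without_three_longest)
```

Removing a handful of extreme values changes the range a lot, which is why it is not a robust measure.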
28
Q

* A quarter of all patients scored 17 or less, and three quarters scored 66 or less
* So the middle 50% of patients scored between 17 and 66
– That's a range of 49 (66 - 17)
* This is called the ----, which is ---- to outliers because they occur at the ---- and not in the ----

A

interquartile range (abbreviated as IQR)
robust
extreme
middle
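
A sketch of the interquartile range in Python. The `scores` are hypothetical values chosen so that the quartiles match the card's 17 and 66; the quartiles use the default method of `statistics.quantiles`, which is one of several conventions.

```python
import statistics

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical risk scores (illustrative only)
scores = [5, 10, 17, 20, 30, 40, 50, 60, 66, 90, 95]

# Same data, but with the highest score replaced by an extreme value
scores_with_extreme = scores[:-1] + [500]

iqr_original = iqr(scores)
iqr_extreme = iqr(scores_with_extreme)
range_original = max(scores) - min(scores)
range_extreme = max(scores_with_extreme) - min(scores_with_extreme)

print("IQR:", iqr_original, "->", iqr_extreme)
print("range:", range_original, "->", range_extreme)
```

The extreme value sits outside the middle 50%, so the IQR does not move while the range does.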
29
Q

---- is the average of the squared differences from the mean, a measure of how far a set of numbers is ----
– No one apart from professional statisticians understands it fully
- Example: fasting blood sugar was checked for 10 employees
– The results are: 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10 mmol/l
Mean = 7.75
$$\text{Variance} = \frac{(5.5-7.75)^2 + (6-7.75)^2 + (6.5-7.75)^2 + \dots + (10-7.75)^2}{10} = \frac{20.625}{10} \approx 2.063$$
(Dividing by n - 1 = 9 instead gives the sample variance, 20.625 / 9 ≈ 2.292)

A

variance
spread out
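
The variance calculation for the blood sugar example can be checked step by step. The sketch computes both common versions: dividing by n (the "average of the squared differences", which gives the card's 2.063) and dividing by n - 1 (the sample variance that most statistical software reports by default).

```python
import statistics

# Fasting blood sugar values (mmol/l) from the card
values = [5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10]
n = len(values)

mean = sum(values) / n

# Squared difference of each value from the mean
squared_diffs = [(v - mean) ** 2 for v in values]
sum_of_squares = sum(squared_diffs)

population_variance = sum_of_squares / n      # divide by n
sample_variance = sum_of_squares / (n - 1)    # divide by n - 1

print(round(population_variance, 3), round(sample_variance, 3))

# The standard library agrees with the hand calculation
assert abs(statistics.pvariance(values) - population_variance) < 1e-12
assert abs(statistics.variance(values) - sample_variance) < 1e-12
```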
30
Q

- The square root of the variance is the ----
– It is in the same units as the ----
* A ---- SD indicates that the data points tend to be very close to the mean (and to each other)
* A ---- SD indicates that the data points are very spread out from the mean and from each other
- Blood sugar example:

A

standard deviation (SD)
original values
small
large
$SD = \sqrt{2.063} = 1.44$ mmol/l (check slides 46 and 37)
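
A short sketch of the standard deviation for the blood sugar example, next to a tighter data set for comparison. The `tight` values are invented for illustration, and `pstdev` is the divide-by-n version that matches the card's 1.44.

```python
import math
import statistics

# Fasting blood sugar values (mmol/l) from the card
values = [5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10]

# Hypothetical tightly clustered values (illustrative only)
tight = [7.5, 7.75, 8.0]

# SD is the square root of the variance, in the original units (mmol/l)
population_sd = math.sqrt(statistics.pvariance(values))
tight_sd = statistics.pstdev(tight)

print(round(population_sd, 2))  # spread-out data: larger SD
print(round(tight_sd, 2))       # clustered data: smaller SD
```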
31
Q

Box plots are useful for ---- and present ---- key summary statistics for each group, which are: ----
- They are shown in a ----
- They display the ---- by building a box around the ---- and ---- percentiles

A

comparing groups
5
the minimum, 25th percentile, 50th percentile (median), 75th percentile and maximum
simple visual display
interquartile range
25th and 75th
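
The five statistics behind a box plot can be computed per group as below. The heights are hypothetical, and the quartiles again assume the default method of `statistics.quantiles`; plotting libraries may use a slightly different convention.

```python
import statistics

def five_number_summary(values):
    """Minimum, 25th percentile, median, 75th percentile, maximum."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))

# Hypothetical heights in cm for two groups (illustrative only)
groups = {
    "women": [155, 158, 160, 162, 163, 165, 168],
    "men": [168, 170, 173, 175, 178, 180, 185],
}

summaries = {name: five_number_summary(h) for name, h in groups.items()}

for name, summary in summaries.items():
    print(name, summary)
```

Each tuple is what one box in a box plot draws: the box spans the 25th to 75th percentile, with the median marked inside it.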
32
Q

Biomedical example:
- Mass spectrometry experiments where proteins are ---- in ---- samples from patients
- Prior to identifying biomarkers of interest:
1– Boxplots for each sample can be used to identify ---- with sample preparation or with calibration of the mass spectrometer
– Based on this, samples may then be ---- or ----
– Note the whiskers, extending to min & max

A

quantified
biological
problems
excluded
re-aligned (normalization)
33
Q

An ---- is a data point which is abnormally distant from the rest of the data
- We can modify a ---- to show outliers:
– Using a ---- that is based on the IQR, we change the length of the whiskers
– Individual points ---- the whiskers are shown as outliers
- We can then further investigate the nature of the outliers:
– Often they are valid observations: reporting ---- is recommended

A

outlier
box plot
detection rule
outside
robust summary statistics
(check slides 51, 52 and 53)
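
The card does not say which IQR-based detection rule the slides use. A common choice, assumed here, is the 1.5 × IQR rule: points more than 1.5 IQRs below the 25th percentile or above the 75th percentile are flagged. The `stays` data and the multiplier `k` are illustrative assumptions.

```python
import statistics

def find_outliers(values, k=1.5):
    """Flag points beyond k * IQR from the quartiles (k=1.5 is an assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical lengths of stay in days (illustrative only)
stays = [1, 2, 2, 3, 3, 4, 4, 5, 6, 40, 83]

outliers = find_outliers(stays)
print(outliers)
```

Flagged points are candidates for a closer look, not automatic errors; as the card notes, they are often valid observations.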
34
Q

* In the examples for length of stay in ICU and BMI, there appeared to be an excess of high values
– An excess of low or high values is called ----
– This may be visualised with ----
* A special case of data without skewness is the ----

A

skewness
dotplots, boxplots and histograms
normal distribution
35
Q

True or false:
The normal distribution is important because it fits many natural phenomena
* Many things we measure are approximately normal
– e.g. blood pressure & height
* But nothing is truly normal (it is a mathematical concept)

A

true