Summarising and displaying data: Flashcards by jood kh

---- are scales with an underlying defined unit.
Examples:
– A count (number of children)
– An accepted unit
* Years
* Metres
* Euros
These scales can be ---- or ----

numeric scales
continuous or discrete

True or false:
- Many things cannot have a defined unit, such as depression, satisfaction and pain
- We recognise that people can be satisfied, or in pain, to a greater or lesser extent
- The problem is measuring these concepts without a defined unit

true

---- are used to measure relative quantity

ordinal scales

Years (as in age measured in years) and days are examples of

defined units

– Severity of pain: mild, moderate, severe
– Alcohol consumption: none, low, high
– Quality of life score: 0, 1, 2, …, 10
are examples of:

ordinal scales ( check slide 12)

Numeric and ordinal scales are labels that tell us ----; the more basic question is ----, for which ---- is the basis of measurement

how much
what
classification

Labelling schemes that classify people, things or events are ----
Examples are:

nominal measurement scales
– Disease classification schemes, e.g. ICD 10 (International Classification of Diseases)
– Eye color: Blue, green, brown, hazel, gray
– Types of activity: sitting, walking, cycling, swimming, other

Nominal measurement scales tell us ---- of thing something is, and they are based on an ----

what kind
agreed classification

Some scales have only two labels; these are called ----

dichotomous scales

– Eye color: Blue, green, brown, hazel, gray
– Types of activity: sitting, walking, cycling, swimming, other
are examples of

nominal measurement scales (as are blood group types)

– Disease status: Presence or absence of disease
– Lab test result: Positive or negative
– Mortality : alive or dead status
– Exam result: Pass or fail
are examples of

dichotomous scales (the simplest sort of scale)

Types of variables, summary:
---- variables
– Defined units, tell us how much in an absolute sense
– Can be continuous or discrete
Categorical variables
* ---- scales
– Tell us how much, but in a relative rather than absolute sense
* ---- scales
– Classify. Tell us what rather than how much
– Called a ---- scale when there are only two values

numeric
ordinal
nominal
dichotomous

Knowing the measurement scale of data tells us how we should ---- and ---- it

display
summarise

Summaries are ---- than the original because of what they leave out
* So any summary is a ---- of the original
Things can go wrong in two ways:
1- We present aspects of the data that lead to the wrong conclusion
2- We leave out some important aspect of the data, leading the reader to draw the wrong conclusion
- In practice, data analysts examine the data in ---- ways to avoid these pitfalls when reporting on them

smaller
simplification
different

The most basic summary statistic is a ----

frequency, given as a count or a percentage (check the stacked histogram graph); frequencies can be shown in a frequency table
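
As a rough sketch of a frequency table, the snippet below counts categories and converts the counts to percentages. The pain-severity ratings are invented for illustration and do not come from the slides.

```python
from collections import Counter

# Hypothetical pain-severity ratings, invented for illustration only.
ratings = ["mild", "severe", "moderate", "mild", "mild", "moderate", "severe", "mild"]

counts = Counter(ratings)
total = len(ratings)

# Frequency table: category -> (count, percent of all ratings).
table = {category: (n, 100 * n / total) for category, n in counts.items()}

for category in ["mild", "moderate", "severe"]:  # keep the ordinal order
    n, percent = table[category]
    print(f"{category:<10}{n:>3}{percent:>7.1f}%")
```

Counts give the precise information; the percentages make categories easier to compare across groups of different sizes.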

Rule of thumb:
---- for precise information
---- for patterns and understanding

numbers
graphs

A simple graph displaying frequencies of categories is a ----
– A ---- orientation is preferable, but these graphs are often presented in a ---- orientation

bar graph
horizontal
vertical

When the data are measured on a continuous scale but we have relatively small amounts of data, we can display the data as ----

dots, also known as a dot plot; this can be used, for example, for the heights of women and men from a small study
– For men or women with the same height, the dots are shown beside each other
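
A minimal text sketch of a dot plot, assuming a handful of invented heights (not the study data): each value gets one row, and people with the same height appear as dots beside each other.

```python
from collections import Counter

# Hypothetical heights in cm, invented for illustration only.
heights_cm = [160, 162, 162, 165, 165, 165, 170]

counts = Counter(heights_cm)

# One row per distinct height; repeated heights become dots side by side.
rows = [f"{height} {'o' * counts[height]}" for height in sorted(counts)]

for row in rows:
    print(row)
```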

With —- amounts of data, we don’t need to rely on the summaries, we can simply show all the data in a plot
* But with —- datasets, the dots become too numerous and we rely more and more on summaries

small
larger

Death in the intensive care unit:
Patients had their risk of death calculated using ---- scores
* These scores combine ---- to produce an ---- of the ---- of death
* The study also looked at length of stay
* Of these two variables (length of stay and APACHE-II scores), the dots shown will be the ----

APACHE-II
risk indicators
overall prediction
chance
predicted risk of death (APACHE-II scores)
(check slides 27 and 28)

Summarising the risk scores using % cut-offs:
- These summaries don’t show us ---- the data, but they give us a good idea of ----
- They show the ----
- They give some idea of how scores ---- around that

all
key marker
middle point/halfway
vary

A ---- is a value marking the cut-off at or below which a specified percentage of the data lies

percentile, also called a quantile (check graph 29)

—– is the half-way point of the data values.
– Strictly speaking, half of the values lie —— the median
– The — percentile!

median
at or below
50th
(check slide 31)
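
To make "at or below" concrete, here is a small sketch of one percentile convention, the nearest-rank rule: the smallest value with at least p% of the data at or below it. This convention is an assumption chosen for illustration (software packages use several slightly different rules), and the scores are invented.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based position in the sorted data
    return ordered[rank - 1]


# Hypothetical risk scores, invented for illustration only.
scores = [5, 12, 17, 23, 31, 40, 52, 66, 71, 90]

median = percentile(scores, 50)  # the 50th percentile
share_at_or_below = sum(s <= median for s in scores) / len(scores)

print(median, share_at_or_below)
print(percentile(scores, 25), percentile(scores, 75))
```

With this rule the median is always one of the observed values; other conventions average the two middle values when the number of observations is even.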

---- is the average; it indicates approximately where the data are located on the number line.
It is calculated as:

mean
“Sum up the individual values then divide by the number of them”
The mean can be misleading, though (check bar graph 35)

---- : a "tail" of exceptionally long stay times pushes the mean up
---- : statistics that maintain their properties even if the underlying distributional assumptions are incorrect

outliers (check slide 36 for more info)
robust summary statistics

The mean is sensitive to ----, while the median is a ---- measure on which they have little effect

outliers
robust summary (the median is a robust statistic because it has a breakdown point of 50%)
– Omitting the four highest values moves the median from 3·55 to 3·50
– That's a change of about an hour, which is very small in comparison with the effect on the mean
* This explains the differences we see when we look at medians instead of means
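
The contrast can be sketched with a few invented lengths of stay (these are not the ICU data from the slides): one very long stay drags the mean up, while the median barely moves.

```python
import statistics

# Hypothetical lengths of stay in days, invented for illustration only.
stays_days = [2, 3, 3, 4, 4, 5, 60]

# Mean: sum up the individual values, then divide by the number of them.
mean_all = sum(stays_days) / len(stays_days)
median_all = statistics.median(stays_days)

# Drop the single longest stay (the "tail") and recompute both summaries.
trimmed = sorted(stays_days)[:-1]
mean_trimmed = sum(trimmed) / len(trimmed)
median_trimmed = statistics.median(trimmed)

print(f"mean:   {mean_all:.2f} -> {mean_trimmed:.2f}")
print(f"median: {median_all} -> {median_trimmed}")
```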

The range gives us an idea of ----, but it is not always a good summary
- It needs two pieces of information: the ---- and ---- values
- These two values are the most likely to be ---- or ----
- The range is not ----: it will be affected by ----

variability
biggest
smallest
atypical cases
errors
robust
outliers
(The range of length of stay is 82 days, but it's only 38 days if we ignore the longest-staying patient, and 19 days if we ignore the three longest-staying patients)
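
A small sketch of why the range is not robust. The individual stays below are invented; they were chosen only so that the three ranges reproduce the 82, 38 and 19 days quoted on the card.

```python
def value_range(values):
    """Range: biggest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical lengths of stay in days, invented so the ranges match the card.
stays_days = [1, 2, 3, 5, 8, 20, 30, 39, 83]
ordered = sorted(stays_days)

full_range = value_range(ordered)                    # all patients
without_longest = value_range(ordered[:-1])          # ignore the longest stay
without_three_longest = value_range(ordered[:-3])    # ignore the three longest stays

print(full_range, without_longest, without_three_longest)
```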

* A quarter of all patients scored 17 or less, and three quarters scored 66 or less
* So the middle 50% of patients scored between 17 and 66
– That's a range of 49 (66 – 17)
* This is called the ----, which is ---- to outliers because they occur at the ---- and not in the ----

interquartile range (abbreviated as IQR)
robust
extremes
middle
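
As an illustrative sketch, the IQR can be computed with Python's `statistics.quantiles`, here with its default method; other quartile conventions can give slightly different values. The scores are invented. Replacing the largest value with an extreme one changes the range a lot but leaves the IQR untouched.

```python
import statistics


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# The card's own quartiles: 66 - 17.
card_iqr = 66 - 17

# Hypothetical scores, invented for illustration only.
scores = list(range(1, 12))            # 1, 2, ..., 11
with_outlier = scores[:-1] + [1000]    # replace the largest value with an extreme one

print(card_iqr)
print(iqr(scores), iqr(with_outlier))
print(max(scores) - min(scores), max(with_outlier) - min(with_outlier))
```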

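---- is the average of the squared differences from the mean, a measure of how far a set of numbers is ----
– No-one apart from professional statisticians understands it fully
- Example: fasting blood sugar was checked for 10 employees
– The results are: 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10 mmol/l
– Mean = 7.75
$$\text{Variance} = \frac{(5.5-7.75)^2 + (6-7.75)^2 + (6.5-7.75)^2 + \dots + (10-7.75)^2}{10} = \frac{20.625}{10} \approx 2.063$$
(Dividing by n = 10 gives 2.063, the value whose square root is the SD of 1.44; dividing by n − 1 = 9, the sample variance, gives about 2.292.)

variance
spread out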
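
The blood sugar example can be checked with a few lines of Python. The sketch computes the variance both ways: dividing by n = 10 reproduces the card's 2.063 (2.0625 before rounding), while dividing by n − 1 = 9, the usual sample variance, gives about 2.292.

```python
# Fasting blood sugar values (mmol/l) from the card.
values = [5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10]

n = len(values)
mean = sum(values) / n

# Sum of the squared differences from the mean.
squared_diffs = sum((x - mean) ** 2 for x in values)

variance_n = squared_diffs / n               # "average" of the squared differences
variance_n_minus_1 = squared_diffs / (n - 1) # sample variance

print(mean, squared_diffs)
print(f"{variance_n:.4f} {variance_n_minus_1:.4f}")
```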

- The square root of the variance is the ----
– It is in the same units as the ----
* A ---- SD indicates data points tend to be very close to the mean (and to each other)
* A ---- SD indicates that the data points are very spread out from the mean and from each other
- Blood sugar example:

standard deviation (SD)
original values
small
large
SD = square root of 2.063 = 1.44
(check slide 46 37)
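
A short check of the blood sugar SD, assuming the variance of 2.063 was obtained by dividing by n = 10; `statistics.pstdev` uses that same divisor.

```python
import math
import statistics

# Fasting blood sugar values (mmol/l) from the card.
values = [5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10]

sd_from_variance = math.sqrt(2.063)    # square root of the card's variance
sd_direct = statistics.pstdev(values)  # divides by n, like the card's 2.063

print(f"{sd_from_variance:.2f} {sd_direct:.2f}")
```

Both come out at about 1.44 mmol/l, in the same units as the original values.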

Box plots are useful for ---- and present ---- key summary statistics for each group, which are: ----
- They are shown in a ----
- They display the ---- by building a box around the ---- and ---- percentiles

comparing groups
5
the minimum, 25th percentile, 50th percentile (median), 75th percentile and maximum
simple visual display
interquartile range
25th
75th
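
The five statistics behind a box plot can be sketched as below. The heights are invented for illustration, and the quartiles use the default method of Python's `statistics.quantiles`, which is only one of several conventions.

```python
import statistics


def five_number_summary(values):
    """Minimum, 25th percentile, median, 75th percentile and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, median, q3, max(values))


# Hypothetical heights in cm for two groups, invented for illustration only.
groups = {
    "women": [150, 152, 155, 158, 160, 162, 164, 166, 168, 170, 175],
    "men": [160, 165, 168, 170, 172, 175, 178, 180, 183, 185, 192],
}

summaries = {name: five_number_summary(heights) for name, heights in groups.items()}

for name, summary in summaries.items():
    print(name, summary)
```

The box spans the second and fourth numbers of each summary, so its length is the interquartile range; putting the summaries side by side is what makes the groups easy to compare.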

Biomedical example:
- Mass spectrometry experiments where proteins are ---- in ---- samples from patients
- Prior to identifying biomarkers of interest:
1– Boxplots for each sample can be used to identify ---- with sample preparation or with calibration of the mass spectrometer
– Based on this, samples may then be ---- or ----
– Note the whiskers, extending to min & max

quantified
biological
problems
excluded
re-aligned (normalization)

An ---- is a data point which is abnormally distant from the rest of the data
- We can modify a ---- to show outliers:
– Using a ---- that is based on the IQR, we change the length of the whiskers*
– Individual points ---- the whiskers are shown as outliers
- We can then further investigate the nature of the outliers:
– Often they are valid observations: reporting ---- is recommended

outlier
box plot
detection rule
outside
robust summary statistics
(check slides 51, 52 and 53)
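
The card does not say which IQR-based rule is used. The sketch below assumes the common 1.5 × IQR convention, so the multiplier is an assumption here, and the data are invented. Points beyond the fences are flagged as outliers, and the whiskers stop at the most extreme points that are not flagged.

```python
import statistics

# Hypothetical lengths of stay in days, invented for illustration only.
stays_days = [2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 30]

q1, _median, q3 = statistics.quantiles(stays_days, n=4)
iqr = q3 - q1

# Assumed rule: flag anything more than 1.5 x IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in stays_days if x < lower_fence or x > upper_fence]
inside = [x for x in stays_days if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))  # whisker ends on the modified box plot

print(outliers, whiskers)
```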

* In the examples for length of stay in ICU and BMI, there appeared to be an excess of high values
– An excess of low or high values is called ----; it may be visualised with ----
* A special case of data without skewness is the ----

skewness
dotplots, boxplots and histograms
normal distribution

True or false:
The normal distribution is important because it fits many natural phenomena
* Many things we measure are approximately normal
– e.g. blood pressure & height
* But nothing is truly normal (it is a mathematical concept)

true

summarising and displaying data Flashcards

(35 cards)