Statistics 1, chapter 1: Flashcards

Flashcards in chapter 1 Deck (21):
1

sample data values x1,...,xn are the observed values of a simple random sample of size n from the population if

each sample member is chosen independently of the other sample members, and each population member is equally likely to be included in the sample

2

Exploratory Data Analysis (EDA):

refers to a collection of techniques for initial exploration of a data set

3

a data set:

write {x1,...,xn} for a data set of n values, typically arranged in time order, i.e. the first value observed is x1 and the last is xn

4

order statistics

write {x(1),...,x(n)} for the order statistics of the sample: the data set is rearranged so that the values are increasing in size, x(1) ≤ x(2) ≤ ... ≤ x(n)

5

smallest value and largest value

x(1) and x(n)

6

x(i)

the ith smallest value; it has rank i
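The notation on the cards above can be sketched in a few lines of Python. This is only an illustration: the data values and the helper name `order_statistic` are made up for the example, and ranks are taken to be 1-based as on the cards.

```python
# Hypothetical data in observation (time) order: x1, ..., xn.
data = [7.1, 3.4, 9.8, 5.0, 6.2]

# Order statistics x(1) <= ... <= x(n): the same values sorted increasingly.
order_stats = sorted(data)


def order_statistic(values, i):
    """Return x(i), the i-th smallest value (i is a 1-based rank)."""
    if not 1 <= i <= len(values):
        raise ValueError("rank i must satisfy 1 <= i <= n")
    return sorted(values)[i - 1]


smallest = order_statistic(data, 1)           # x(1)
largest = order_statistic(data, len(data))    # x(n)
```

Note that `data[0]` is x1 (the first value seen), while `order_stats[0]` is x(1) (the smallest value); the two generally differ.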

7

what is the difference between xi and x(i)?

the xi are observations of IID random variables, whereas the x(i) are neither independent nor identically distributed

8

median

the 'middle observation' in the ranked order x(1) ≤ ... ≤ x(n)
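A minimal sketch of the median in Python. The card does not say what to do when n is even; the code assumes the common convention of averaging the two middle order statistics, and the function name `sample_median` is chosen for this example only.

```python
def sample_median(values):
    """Middle observation of the ranked sample.

    Assumption (not stated on the card): for even n, average the two
    middle order statistics x(n/2) and x(n/2 + 1).
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the sample must be non-empty")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # x((n+1)/2), 0-based index mid
    return (xs[mid - 1] + xs[mid]) / 2    # average of the two middle values
```

For example, `sample_median([1, 2, 3, 4, 1000])` is still 3: replacing the largest value by an extreme one does not move the median.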

9

properties of median

pro: not sensitive to extreme values, because it depends only on there being as many observations above the median as below it, not on how far away they lie
con: hard to calculate when combining two samples, since the median of the combined sample cannot be obtained by summing or multiplying the medians of the two subsamples

10

sample mean:

xbar= (x1+...+xn)/n

11

properties of the sample mean

pro: easy to calculate, easy to combine two samples, easy to derive statistical properties
con: sensitive to extreme values
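To illustrate "easy to combine two samples": the mean of the pooled sample is the size-weighted average of the two sample means, so the raw data are not needed. A small sketch, with made-up sample values and hypothetical function names:

```python
def sample_mean(values):
    """Return xbar = (x1 + ... + xn) / n."""
    if not values:
        raise ValueError("the sample must be non-empty")
    return sum(values) / len(values)


def combined_mean(mean_a, n_a, mean_b, n_b):
    """Mean of the pooled sample, using only the two means and sizes."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


sample_a = [1.0, 2.0, 3.0]   # hypothetical values
sample_b = [10.0, 20.0]      # hypothetical values
pooled = combined_mean(sample_mean(sample_a), len(sample_a),
                       sample_mean(sample_b), len(sample_b))
```

Here `pooled` agrees with `sample_mean(sample_a + sample_b)`. No comparable shortcut exists for the median.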

12

trimmed sample mean

to remedy the sensitivity of the mean to extreme values or outliers, we define the Δ% trimmed mean as follows:
- first take k = floor(n × Δ/100), where floor(x) is the largest integer less than or equal to x
- remove the smallest k values and the largest k values of the sample
- calculate the sample mean of the remaining values
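The three steps above can be sketched as follows. The function name `trimmed_mean` and the parameter name `percent` (the trimming percentage) are choices made for this example; the guard against trimming away the whole sample is an added assumption.

```python
import math


def trimmed_mean(values, percent):
    """Trimmed mean: drop the k smallest and k largest values, then average.

    k = floor(n * percent / 100), as in the definition on the card.
    """
    n = len(values)
    k = math.floor(n * percent / 100)
    if 2 * k >= n:
        # Assumption: refuse to trim when nothing would be left to average.
        raise ValueError("trimming percentage removes the whole sample")
    xs = sorted(values)
    kept = xs[k:n - k]          # remove k values from each end
    return sum(kept) / len(kept)
```

With the hypothetical sample `[1, 2, 3, 4, 100]`, the ordinary mean is 22.0, while the 20% trimmed mean has k = 1 and averages `[2, 3, 4]` to give 3.0.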

13

sample variance s^2=

(sum j=1 to n (xj-xbar)^2)/(n-1) = ((sum j=1 to n xj^2) - n xbar^2)/(n-1)
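Both forms of the sample variance can be checked against each other numerically. A minimal sketch, with hypothetical function names; note the divisor n-1 applies to the whole sum in each form.

```python
def sample_variance(values):
    """s^2 as the sum of squared deviations from xbar, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """s^2 as (sum of squares - n * xbar^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n
    return (sum(x * x for x in values) - n * xbar ** 2) / (n - 1)
```

The two functions agree up to floating-point rounding. The shortcut form can lose precision when the values are large relative to their spread, so the first form is usually the safer one to compute with.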

14

what does sum j=1 to n (xj-xbar) equal?

0
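The reason is a one-line calculation from the definition of the sample mean, xbar = (x1+...+xn)/n, written out here in LaTeX:

```latex
\begin{align*}
\sum_{j=1}^{n} (x_j - \bar{x})
  &= \sum_{j=1}^{n} x_j - n\bar{x} \\
  &= n\bar{x} - n\bar{x} = 0 .
\end{align*}
```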

15

hinges

the lower hinge H1 is the median of the set {data values with rank <= rank of sample median}

the upper hinge H3 is the median of the set {data values with rank >= rank of sample median}
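A sketch of the hinge definitions in Python. It assumes the sample median has rank (n+1)/2, so for odd n the median itself belongs to both halves and for even n the sample splits into two halves of size n/2. The helper names are hypothetical, and the inner median averages the two middle values when a half has even size.

```python
def median_of_sorted(xs):
    """Median of an already sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2


def hinges(values):
    """Return (H1, H3), the lower and upper hinges of the sample."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the sample must be non-empty")
    lower_half = xs[:(n + 1) // 2]   # ranks <= (n+1)/2
    upper_half = xs[n // 2:]         # ranks >= (n+1)/2
    return median_of_sorted(lower_half), median_of_sorted(upper_half)
```

For the values 1,...,7 the halves are `[1, 2, 3, 4]` and `[4, 5, 6, 7]`, giving H1 = 2.5 and H3 = 5.5.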

16

quartiles

the lower quartile Q1 is the data value with rank (n+1)/4: Q1 = x((n+1)/4)
the upper quartile Q3 is the data value with rank 3(n+1)/4: Q3 = x(3(n+1)/4), if both of these ranks are integers.

If these ranks are not integers then let k = floor((n+1)/4) and define Q1 = x(k) + ((n+1)/4 - k)[x(k+1) - x(k)]
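The quartile rule can be sketched as a single interpolation helper. The card spells out the interpolation only for Q1; applying the same formula at rank 3(n+1)/4 for Q3 is an assumption made here, as are the function names and the requirement n >= 3 (needed so that both ranks fall between 1 and n).

```python
import math


def value_at_rank(xs_sorted, rank):
    """x(rank) for integer rank, else linear interpolation between
    x(k) and x(k+1) with k = floor(rank)."""
    k = math.floor(rank)
    frac = rank - k
    if frac == 0:
        return xs_sorted[k - 1]              # ranks are 1-based
    return xs_sorted[k - 1] + frac * (xs_sorted[k] - xs_sorted[k - 1])


def quartiles(values):
    """Return (Q1, Q3) using ranks (n+1)/4 and 3(n+1)/4."""
    xs = sorted(values)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    q1 = value_at_rank(xs, (n + 1) / 4)
    q3 = value_at_rank(xs, 3 * (n + 1) / 4)  # assumption: same rule as Q1
    return q1, q3
```

For the values 1,...,6 the ranks are 1.75 and 5.25, so Q1 = 1 + 0.75 × (2 - 1) = 1.75 and Q3 = 5 + 0.25 × (6 - 5) = 5.25.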

17

five number summary

refers to the minimum, the lower hinge, the median, the upper hinge and the maximum

18

interquartile range IQR

Q3-Q1

19

outliers

points lying more than 1.5 × (H3-H1), which is roughly 1.5 × IQR, beyond the hinges, i.e. below H1 - 1.5 × (H3-H1) or above H3 + 1.5 × (H3-H1)
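A minimal sketch of this rule, reading "away from the hinges" as below the lower hinge or above the upper hinge by more than 1.5 × (H3-H1). The hinges are passed in as arguments; the function name `find_outliers` and the `factor` parameter are hypothetical.

```python
def find_outliers(values, h1, h3, factor=1.5):
    """Return the values lying more than factor * (H3 - H1) outside the hinges."""
    spread = h3 - h1
    lower_fence = h1 - factor * spread
    upper_fence = h3 + factor * spread
    return [x for x in values if x < lower_fence or x > upper_fence]
```

For the made-up sample `[1, 2, 3, 4, 5, 6, 7, 50]` with hinges H1 = 2.5 and H3 = 6.5, the fences are -3.5 and 12.5, so only 50 is flagged.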

20

we say the distribution of the data is skewed to the right if

H3 minus median > median minus H1 i.e. histogram has a long right tail

21

we say the distribution of the data is skewed to the left if

H3 minus median < median minus H1 i.e. histogram has a long left tail
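The two skewness cards amount to comparing the gaps on either side of the median. A small sketch, with a hypothetical function name; the "roughly symmetric" label for equal gaps is an addition not covered by the cards.

```python
def skew_direction(h1, median, h3):
    """Classify skewness by comparing H3 - median with median - H1."""
    upper_gap = h3 - median
    lower_gap = median - h1
    if upper_gap > lower_gap:
        return "right"               # long right tail
    if upper_gap < lower_gap:
        return "left"                # long left tail
    return "roughly symmetric"       # assumption: label for equal gaps
```

For instance, with H1 = 2.5, median = 4 and H3 = 15 the upper gap is 11 and the lower gap is 1.5, so the data are skewed to the right.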