Flashcards in chapter 1 Deck (21):

1

## sample data values x1,...,xn are the observed values of a simple random sample of size n from the population if

### each sample member is chosen independently of the other sample members and each population member is equally likely to be included in the sample

2

## Exploratory Data Analysis (EDA):

### refers to a collection of techniques for initial exploration of a data set

3

## a data set:

### write {x1,...,xn} for a data set where the n values are typically arranged in time order, i.e. the first value seen is x1 and the last is xn

4

## order statistics

### write {x(1),...,x(n)} for the order statistics of the sample: the data set is rearranged so the values are increasing in size, x(1) <= x(2) <= ... <= x(n)

5

## smallest value and largest value

### x(1) and x(n)

6

## x(i)

### the ith smallest value, which has rank i
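The notation on the last few cards can be illustrated with a minimal Python sketch. The function names `order_statistics` and `value_with_rank` are hypothetical helpers introduced here for illustration only.

```python
def order_statistics(data):
    """Return [x(1), ..., x(n)]: the sample sorted into increasing order."""
    return sorted(data)


def value_with_rank(data, i):
    """Return x(i), the ith smallest value (ranks start at 1)."""
    ordered = order_statistics(data)
    return ordered[i - 1]


data = [7, 2, 9, 4]                      # x1, ..., x4 in the order observed
print(order_statistics(data))            # [2, 4, 7, 9]
print(value_with_rank(data, 1))          # x(1), the smallest value: 2
print(value_with_rank(data, len(data)))  # x(n), the largest value: 9
```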

7

## what is the difference between xi and x(i)?

### the xi are observations of IID random variables, whereas the x(i) are neither independent nor identically distributed

8

## median

### the 'middle observation' in the ranked order x(1) <= ... <= x(n)

9

## properties of median

###
pro: not sensitive to extreme values, because as many sample values are larger than the median as are smaller

con: hard to calculate when combining two samples, as the medians of the two samples cannot simply be summed or multiplied to give the combined median
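Both properties can be checked with a short Python sketch. It assumes the common convention that, for an even sample size, the median is the average of the two middle order statistics; the card itself only speaks of the 'middle observation'. The name `sample_median` is a hypothetical helper.

```python
def sample_median(data):
    """Median of the sample; averages the two middle values when n is even
    (an assumed convention, not stated on the card)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Pro: replacing the largest value by an extreme one leaves the median alone.
clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]
print(sample_median(clean), sample_median(with_outlier))            # 3 3
print(sum(clean) / len(clean), sum(with_outlier) / len(with_outlier))  # 3.0 102.0

# Con: the median of the pooled sample is not determined by the two medians.
a = [1, 2, 3]               # median 2
b = [10, 20, 30, 40, 50]    # median 30
print(sample_median(a + b))  # 15.0
```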

10

## sample mean:

### xbar= (x1+...+xn)/n

11

## properties of the sample mean

###
pro: easy to calculate, easy to combine two samples, easy to derive statistical properties

con: sensitive to extreme values

12

## trimmed sample mean

###
to remedy the sensitivity of the mean to extreme values or outliers, we define the Δ% trimmed mean as follows:

- first take k = floor(n x Δ/100), where floor(x) is the largest integer less than or equal to x

- remove the smallest k values and the largest k values of the sample

- calculate the sample mean of the remaining values
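The three steps above can be sketched in Python as follows. The function name `trimmed_mean` and the parameter name `percent` (the trimming percentage) are illustrative choices, and the guard against trimming away the whole sample is an added assumption.

```python
import math


def trimmed_mean(data, percent):
    """Trimmed mean: drop the k smallest and k largest values, then average,
    where k = floor(n * percent / 100)."""
    n = len(data)
    k = math.floor(n * percent / 100)
    if 2 * k >= n:
        # Assumed guard: nothing would be left to average.
        raise ValueError("trimming percentage removes the whole sample")
    ordered = sorted(data)
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(trimmed_mean(data, 10))   # k = 1, so 1 and 100 are dropped: 5.5
print(sum(data) / len(data))    # untrimmed mean: 14.5
```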

13

## sample variance s^2=

### (sum j=1 to n (xj-xbar)^2)/(n-1) = ((sum j=1 to n xj^2) - n xbar^2)/(n-1)

14

## what does sum j=1 to n (xj-xbar) equal?

### 0
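The result follows directly from the definition of the sample mean; a one-line derivation:

```latex
\sum_{j=1}^{n} (x_j - \bar{x})
  = \sum_{j=1}^{n} x_j - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j .
```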

15

## hinges

###
the lower hinge H1 is the median of the set {data values with rank <= rank of sample median}

the upper hinge H3 is the median of the set {data values with rank >= rank of sample median}
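A minimal sketch of the hinge definition in Python. It assumes the sample median has rank (n+1)/2, so that for even n the lower half holds the n/2 smallest values and the upper half the n/2 largest, and it averages the two middle values when a half has even size. The names `sample_median` and `hinges` are hypothetical.

```python
def sample_median(ordered):
    """Median of an already sorted list (averages the two middle values
    when the length is even, an assumed convention)."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def hinges(data):
    """Return (H1, H3) under the assumption that the median has rank (n+1)/2."""
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2             # number of values with rank <= (n+1)/2
    lower = ordered[:half]          # ranks 1, ..., half
    upper = ordered[n - half:]      # the `half` largest values
    return sample_median(lower), sample_median(upper)


print(hinges([1, 3, 4, 6, 8, 9, 12]))   # (3.5, 8.5)
```

For odd n the median itself belongs to both halves, which is what the '<=' and '>=' in the definition imply.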

16

## quartiles

###
lower quartile Q1 is the data value with rank (n+1)/4: Q1 = x((n+1)/4)

upper quartile Q3 is the data value with rank 3(n+1)/4: Q3 = x(3(n+1)/4), if both of these ranks are integers.

If these ranks are not integers, then let k = floor((n+1)/4) and define Q1 = x(k) + ((n+1)/4 - k)[x(k+1) - x(k)]
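The quartile rule can be sketched in Python as below. The card only writes out the interpolation formula for Q1; applying the same interpolation at rank 3(n+1)/4 for Q3 is an assumption made here, as are the helper names `value_at_rank` and `quartiles` and the n >= 3 guard.

```python
import math


def value_at_rank(ordered, r):
    """Order statistic at rank r, interpolating linearly between x(k) and
    x(k+1) when r is not an integer (k = floor(r))."""
    k = math.floor(r)
    frac = r - k
    if frac == 0:
        return ordered[k - 1]
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])


def quartiles(data):
    """Return (Q1, Q3) at ranks (n+1)/4 and 3(n+1)/4.
    Interpolating Q3 the same way as Q1 is an assumption."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 values for both ranks to exist")
    return (value_at_rank(ordered, (n + 1) / 4),
            value_at_rank(ordered, 3 * (n + 1) / 4))


print(quartiles([1, 3, 4, 6, 8, 9, 12]))      # ranks 2 and 6: (3, 9)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))    # ranks 2.25 and 6.75: (2.25, 6.75)
```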

17

## five number summary

### refers to the median, the upper and lower hinges and the maximum and minimum

18

## interquartile range IQR

### Q3-Q1

19

## outliers

### points more than 1.5 x (H3-H1), which is roughly 1.5 x IQR, away from the hinges, i.e. below H1 - 1.5 x (H3-H1) or above H3 + 1.5 x (H3-H1)
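As a minimal sketch, the rule amounts to flagging values outside two 'fences' built from the hinges. The function name `find_outliers` is hypothetical, and the hinges are passed in as arguments rather than computed here.

```python
def find_outliers(data, h1, h3):
    """Return the values lying more than 1.5 * (H3 - H1) beyond either hinge."""
    spread = h3 - h1
    lower_fence = h1 - 1.5 * spread
    upper_fence = h3 + 1.5 * spread
    return [x for x in data if x < lower_fence or x > upper_fence]


# For this sample the hinges are H1 = 3.5 and H3 = 8.5,
# so the fences sit at -4.0 and 16.0.
data = [1, 3, 4, 6, 8, 9, 40]
print(find_outliers(data, 3.5, 8.5))   # [40]
```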

20

## we say the distribution of the data is skewed to the right if

### H3 minus median > median minus H1, i.e. the histogram has a long right tail
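The criterion is a single comparison; a small sketch with a hypothetical function name, taking the hinges and median as given:

```python
def is_right_skewed(h1, median, h3):
    """True when the upper hinge is further from the median than the lower one."""
    return (h3 - median) > (median - h1)


print(is_right_skewed(2, 3, 10))      # True: 7 > 1
print(is_right_skewed(3.5, 6, 8.5))   # False: both distances are 2.5
```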

21