Descriptive statistics Flashcards


Flashcards in Descriptive statistics Deck (22)
1
Q

Measures: mean, trimmed mean, median, quantile, IQR, mode, SD, SEM, CV (formulas)

A

Mean, median, and mode are measures of central tendency.
Mean: sensitive to outliers.

Trimmed mean: drop the lowest and highest 10% of the values, then average the rest, mean(v1, trim = 0.1); less sensitive to outliers.

Median: exactly the middle value of the sorted data; not sensitive to outliers, i.e. a robust measure.

Quartile: splits the sorted data into quarters (the lower and upper quartiles form the box of a boxplot).

Mode: which value occurs most often? (There is no dedicated command for it in R.) It marks the peak of the distribution.

SD: square root of the variance.
For normally distributed data:
* about 2/3 (68%) of the data lie within 1 SD of the mean
* about 95% of the data lie within 2 SD of the mean

coefficient of variation (CV): CV% = 100*sd(x)/mean(x)
–> used to compare the variability of data sets with different magnitudes

• standard error of the mean (SEM)

  • SEM = sd(x)/sqrt(N)
  • how close the sample mean is likely to be to the true population mean
  • more measurements –> smaller SEM, i.e. closer to the true mean

IQR: interquartile range, the distance between the upper and the lower quartile (Q3 - Q1)
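A minimal R sketch of the measures on this card, assuming a small made-up vector with one outlier (100) so the difference between mean, trimmed mean, and median is visible. The helper mode_value is a hypothetical name; base R has no mode function for data values.

```r
# Hypothetical sample with one outlier (100)
x <- c(2, 3, 3, 4, 5, 6, 7, 8, 9, 100)

mean(x)               # 14.7: pulled up by the outlier
mean(x, trim = 0.1)   # 5.625: drops 10% (one value) from each end first
median(x)             # 5.5: middle of the sorted data
quantile(x)           # 0%, 25%, 50%, 75%, 100% quantiles (default type 7)
IQR(x)                # 4.5: upper minus lower quartile
sd(x)                 # square root of var(x)

cv  <- 100 * sd(x) / mean(x)      # coefficient of variation in %
sem <- sd(x) / sqrt(length(x))    # standard error of the mean

# No built-in mode for data values, so this hypothetical helper returns the
# most frequent value; ties resolve to the smallest value.
mode_value <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}
mode_value(x)         # 3
```

The outlier moves the mean to 14.7, while the trimmed mean and the median stay near the bulk of the data.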

2
Q

How to report results in general

A
  • problem
  • test name
  • sample size
  • test statistic
  • P value
  • confidence interval
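Most R tests return an "htest" object whose fields cover this checklist. The sketch below collects them into one sentence; report_test is a hypothetical helper, the counts are made up, and the problem statement and sample size are passed in by hand because the test object does not store them.

```r
# Hypothetical helper: formats the reporting checklist from an "htest" object
report_test <- function(res, n, problem) {
  stat <- sprintf("%s = %.2f", names(res$statistic), res$statistic)
  ci <- ""
  if (!is.null(res$conf.int)) {   # not every test computes a CI
    level <- 100 * attr(res$conf.int, "conf.level")
    ci <- sprintf(", %g%% CI [%.3f, %.3f]", level, res$conf.int[1], res$conf.int[2])
  }
  sprintf("%s: %s, n = %d, %s, p = %.3g%s",
          problem, res$method, n, stat, res$p.value, ci)
}

# Made-up data: 20 of 30 vs 30 of 70 successes
res <- prop.test(x = c(20, 30), n = c(30, 70))
out <- report_test(res, n = 100, problem = "Success rate, group 1 vs group 2")
out
```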
3
Q

Contingency tables

A

• tables of counts
• each count is the number of cases at a certain level, or sharing a given combination of levels
• normally used for factors/categorical data
• can also be used for continuous data after binning with cut(); choose sensible breaks –> hint: use quantiles as breaks
frequency: number of times a category is counted, $n_c$; the frequencies sum to the sample size, $\sum_c n_c = n$
relative frequency: sample proportion for each possible category, $p_c = n_c / n$ (e.g. $p_1 = n_1 / n$); the proportions sum to 1, $\sum_c p_c = 1$
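A sketch of tabulating continuous data with quantile breaks, assuming 100 simulated normal values as stand-in data. include.lowest = TRUE keeps the minimum inside the first bin; with quartiles as breaks each bin receives about a quarter of the values.

```r
set.seed(1)
values <- rnorm(100)   # simulated stand-in for continuous measurements

breaks <- quantile(values, probs = seq(0, 1, by = 0.25))   # quartiles as breaks
groups <- cut(values, breaks = breaks, include.lowest = TRUE)

counts <- table(groups)   # frequencies n_c
counts
sum(counts)               # n = 100
counts / sum(counts)      # relative frequencies p_c, they sum to 1
```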

4
Q

Tabulating Categorical Data, ftable

A
  • table function for tabulating one or two variables
  • ftable function for tabulating more than two variables
  • dim for exploring dimensions of a table
  • sum to count the number of all items
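The four functions from this card applied to three made-up factors for eight subjects (the variable names and values are hypothetical):

```r
# Hypothetical factors for 8 subjects
sex    <- factor(c("f", "m", "m", "f", "m", "f", "m", "m"))
smoker <- factor(c("y", "n", "y", "n", "n", "y", "n", "y"))
group  <- factor(c("a", "a", "b", "b", "a", "b", "a", "b"))

t1 <- table(sex)                  # one variable
t2 <- table(sex, smoker)          # two variables: 2D table
t3 <- table(sex, smoker, group)   # three variables: 3D array
ftable(t3)                        # flat, readable layout of the 3D table
dim(t3)                           # 2 2 2
sum(t3)                           # 8: all items counted
```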
5
Q

1D, 2D, 3D table

A

Multidimensional tables are accessed like matrices and data frames, using square brackets with n-1 commas, where n is the number of dimensions.
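Indexing illustrated on UCBAdmissions, a 3D table (Admit x Gender x Dept) that ships with R's datasets package. Three dimensions means two commas inside the brackets; leaving a position empty keeps that whole dimension.

```r
# Built-in 3D table: Admit x Gender x Dept
dim(UCBAdmissions)                      # 2 2 6 -> n = 3 dimensions, so 2 commas
UCBAdmissions["Admitted", "Male", "A"]  # one cell
UCBAdmissions[, , "A"]                  # 2D slice: department A only
UCBAdmissions["Admitted", , ]           # 2D: admitted by gender and department
UCBAdmissions[, "Female", "A"]          # 1D: admitted/rejected women in dept. A
```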

6
Q

Independence table

A

• expected number of observations in each cell if there were no dependency between the variables (independence)
• Expected = (row total * column total) / grand total
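The formula for all cells at once, assuming a made-up 2x2 table. outer() multiplies every row total with every column total; the result should match the expected counts that chisq.test reports.

```r
# Hypothetical 2x2 table of counts
tab <- matrix(c(20, 30, 10, 40), nrow = 2,
              dimnames = list(group = c("g1", "g2"), outcome = c("yes", "no")))

# Expected = row total * column total / grand total, for every cell
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected                     # 15 15 / 35 35
chisq.test(tab)$expected     # same values
```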

7
Q

Pearson residuals

A

–> A normalized measure of the distance between observed and expected counts: (observed - expected) / sqrt(expected).
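A sketch computing Pearson residuals by hand on a made-up 2x2 table and checking them against chisq.test. correct = FALSE switches off the Yates correction, which alters the statistic but not the reported residuals; without it, the squared residuals sum to the chi^2 statistic.

```r
# Hypothetical 2x2 table of counts
tab <- matrix(c(20, 30, 10, 40), nrow = 2,
              dimnames = list(group = c("g1", "g2"), outcome = c("yes", "no")))

expected    <- outer(rowSums(tab), colSums(tab)) / sum(tab)
pearson_res <- (tab - expected) / sqrt(expected)   # Pearson residuals
pearson_res

res <- chisq.test(tab, correct = FALSE)
res$residuals        # same values
sum(pearson_res^2)   # equals the chi^2 statistic, about 4.76
res$statistic
```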

8
Q

prop.table

A

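Expresses table entries as fractions of a marginal table: prop.table(x) divides by the grand total, prop.table(x, 1) by the row totals, prop.table(x, 2) by the column totals.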
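The three margin choices on a made-up 2x2 table:

```r
# Hypothetical 2x2 table of counts
tab <- matrix(c(20, 30, 10, 40), nrow = 2,
              dimnames = list(group = c("g1", "g2"), outcome = c("yes", "no")))

prop.table(tab)      # each cell / grand total (100): all cells sum to 1
prop.table(tab, 1)   # each cell / its row total: every row sums to 1
prop.table(tab, 2)   # each cell / its column total: every column sums to 1
```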

9
Q

Graphics 1D

A

pie, barplot, dotchart
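A sketch drawing the three 1D plots for a made-up one-way table. It writes to a temporary PDF file so it also runs without a screen; in an interactive session the pdf()/dev.off() lines can be dropped.

```r
f <- tempfile(fileext = ".pdf")
pdf(f)                                             # draw into a temporary PDF

counts <- table(c("A", "B", "B", "C", "C", "C"))   # hypothetical 1D table
pie(counts)
mids <- barplot(counts)                            # returns the bar midpoints
dotchart(as.numeric(counts), labels = names(counts))

invisible(dev.off())
```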

10
Q

Graphics 2D

A

assocplot, mosaicplot, fourfoldplot
• explore the relationship between two variables
• mosaicplot –> absolute counts visualized (as tile areas)
• assocplot –> residuals shown (whether a cell has more or fewer observations than expected, and by how much)
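The three 2D plots applied to a made-up 2x2 table, again written to a temporary PDF so the sketch runs non-interactively. fourfoldplot only accepts 2x2 (or 2x2xk) tables.

```r
# Hypothetical 2x2 table of counts
tab <- matrix(c(20, 30, 10, 40), nrow = 2,
              dimnames = list(group = c("g1", "g2"), outcome = c("yes", "no")))

f <- tempfile(fileext = ".pdf")
pdf(f)
mosaicplot(tab)     # tile areas proportional to the absolute counts
assocplot(tab)      # bar heights follow the Pearson residuals:
                    # above the line = more, below = fewer than expected
fourfoldplot(tab)   # one quarter circle per cell of the 2x2 table
invisible(dev.off())
```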

11
Q

Prefixes of the probability functions in R (r, p, d, q)

A
  • r: random number generator
  • p: probability function (cumulative distribution function, c.d.f.)
  • d: density function (for discrete distributions: the point probability P(X = x))
  • q: quantile function (inverse c.d.f.)
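The four prefixes combined with the normal distribution (norm) and, for the discrete case, the binomial (binom). The last two lines recover the "2/3 within 1 SD, 95% within 2 SD" rule for normal data.

```r
dnorm(0)            # density of N(0, 1) at x = 0
pnorm(1.96)         # P(X <= 1.96), about 0.975
qnorm(0.975)        # inverse of pnorm: about 1.96
set.seed(42)
rnorm(3)            # three random draws from N(0, 1)
dbinom(3, size = 10, prob = 0.5)   # discrete case: point probability P(X = 3)

# Share of normal data within 1 and 2 SD of the mean
pnorm(1) - pnorm(-1)   # about 0.683
pnorm(2) - pnorm(-2)   # about 0.954
```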
12
Q

Poisson Distribution

A
  • binominal distribution has an upper limit
  • if we through 50 times, the maximum achievable value is 50
  • count numbers that are without theoretical limits (spatial or temporal) often follow a Poisson distribution
  • lower limit is zero, but no upper limit
  • parameter as the rate of occurence within a certain time or space
  • count cells in a grid, number of visits of doctors at a
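A sketch contrasting the two count distributions in R, assuming a hypothetical rate of lambda = 2 events per grid square. The binomial assigns zero probability above its number of trials, while the Poisson keeps a (tiny) positive probability for any count.

```r
lambda <- 2                              # hypothetical rate per grid square

dpois(0:4, lambda)                       # P(X = 0), ..., P(X = 4)
ppois(5, lambda, lower.tail = FALSE)     # P(X > 5)

dbinom(51, size = 50, prob = 0.5)        # 0: cannot exceed 50 trials
dpois(51, lambda)                        # tiny, but > 0: no upper limit

set.seed(1)
counts <- rpois(1000, lambda)            # simulated counts
mean(counts)                             # close to lambda
```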
14
Q

Chi-squared Distribution

A

distribution of the chi^2 statistic for contingency tables when there is no dependency between the variables
df = (number of levels var1 - 1) * (number of levels var2 - 1)
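The df formula checked on a made-up 3x2 table, plus the critical value at α = 0.05 and the p-value taken directly from the chi^2 distribution. For tables larger than 2x2 chisq.test applies no continuity correction, so its p-value equals the upper tail of pchisq at the statistic.

```r
# Hypothetical 3x2 table: one variable with 3 levels, one with 2
tab <- matrix(c(12, 15, 9, 18, 10, 16), nrow = 3)

df  <- (nrow(tab) - 1) * (ncol(tab) - 1)   # (3 - 1) * (2 - 1) = 2
res <- chisq.test(tab)
res$parameter                              # df = 2

qchisq(0.95, df = df)                      # critical value, about 5.99
pchisq(res$statistic, df = df, lower.tail = FALSE)   # same as res$p.value
```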

15
Q

Distributions for numerical vs categorical data

A
• Numerical data
- Uniform
- Normal
- t
- (Wilcoxon)
• Categorical data
- Bernoulli - 1 trial, upper limit 1
- Binomial - fixed number of trials n, upper limit n
- Poisson - no upper limit
- Chi-squared - two variables (contingency tables)
16
Q

chisq.test

A

• same p-value as prop.test for 2x2 tables
• but no CI computed
• can be used for more than two levels (e.g. 3x2 tables, where one of the two variables has 3 levels)
• the chi^2 statistic is compared with the chi^2 distribution (tabulated values) –> p.value
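A check of the first two bullets on a made-up 2x2 table. Both functions apply the Yates continuity correction by default for 2x2 tables; prop.test reads each row as one group, with column 1 as successes and column 2 as failures.

```r
# Hypothetical 2x2 table: rows = groups, columns = successes / failures
tab <- matrix(c(20, 30, 10, 40), nrow = 2,
              dimnames = list(group = c("g1", "g2"), outcome = c("yes", "no")))

ct <- chisq.test(tab)
pt <- prop.test(tab)

c(chisq = ct$p.value, prop = pt$p.value)   # identical p-values
ct$conf.int                                # NULL: no CI from chisq.test
pt$conf.int                                # CI for the difference in proportions
```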

17
Q

prop.test

A

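tests whether the proportions (e.g. of successes) are equal across groups; for two groups it also reports a confidence interval for the difference in proportions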
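A sketch of prop.test with success counts x and group sizes n given as vectors (all counts are made up). With two groups the result includes a CI for the difference; with more than two groups it only tests H0 "all proportions are equal" and no CI is returned.

```r
# Two groups: 20 of 30 vs 30 of 70 successes (hypothetical)
two <- prop.test(x = c(20, 30), n = c(30, 70))
two$estimate    # sample proportions, about 0.667 and 0.429
two$conf.int    # CI for the difference of the two proportions
two$p.value

# Three groups (hypothetical): H0 = all proportions equal, no CI
three <- prop.test(x = c(15, 25, 30), n = c(50, 50, 50))
three$estimate
three$p.value
```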

18
Q

fisher.test

A
  • exact test over all tables with the same margins (permutations) –> slower than prop.test
  • should be used if an expected count in a 2x2 table is below 5
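A made-up 2x2 table where every expected count is 2, so the chi^2 approximation is unreliable (chisq.test warns about it; the warning is suppressed here). Note that fisher.test reports a conditional maximum-likelihood estimate of the odds ratio, not the simple sample odds ratio.

```r
# Hypothetical small 2x2 table
tab <- matrix(c(3, 1, 1, 3), nrow = 2)

E <- suppressWarnings(chisq.test(tab))$expected
E                # all cells 2 -> below 5, use the exact test

res <- fisher.test(tab)
res$p.value      # 34/70, about 0.486
res$estimate     # conditional MLE of the odds ratio
res$conf.int
```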

19
Q

Odds/Odds ratios

A
Odds:
• odds = number of times the event did occur / number of times it did not occur
• ranges from 0 to Inf
• probability of 0.5 == odds of 1.0
• probability of 1/3 == odds of 0.5
• probability of 0.75 == odds of 3
Formula: odds = probability/(1-probability)
Odds ratios
• again computed from a 2x2 contingency table
• odds1 / odds2 = odds ratio
• 0.19 (AZT) / 0.39 (Placebo) = 0.49
OR = O1/O2
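The odds formula and OR = O1/O2 in R, assuming a hypothetical 2x2 table (it is not the AZT data from the card). The helper name odds is illustrative only.

```r
# Odds from a probability (hypothetical helper)
odds <- function(p) p / (1 - p)
odds(c(0.5, 1/3, 0.75))   # 1.0 0.5 3.0

# Hypothetical 2x2 table: rows = group, columns = event yes / no
tab <- matrix(c(10, 20, 90, 80), nrow = 2,
              dimnames = list(group = c("treated", "control"), event = c("yes", "no")))
o1 <- tab["treated", "yes"] / tab["treated", "no"]   # 10/90
o2 <- tab["control", "yes"] / tab["control", "no"]   # 20/80
o1 / o2                                              # OR = O1/O2, about 0.44
```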
20
Q

Effect sizes

A
  • Cohen's w
  • Cohen's h
  • Odds Ratio
  • Relative Risk
  • Numbers needed to treat (NNT)
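A sketch of three effect sizes from this list for a hypothetical trial in which 10 of 100 treated and 20 of 100 control subjects had an (unwanted) event. It uses the standard definitions RR = p1/p2, NNT = 1/absolute risk reduction, and Cohen's h = 2 asin(sqrt(p1)) - 2 asin(sqrt(p2)).

```r
# Hypothetical trial: event = unwanted outcome
p_treat   <- 10 / 100
p_control <- 20 / 100

rr  <- p_treat / p_control                          # relative risk: 0.5
nnt <- 1 / (p_control - p_treat)                    # numbers needed to treat: 10
h   <- 2 * asin(sqrt(p_treat)) - 2 * asin(sqrt(p_control))   # Cohen's h, about -0.28

c(RR = rr, NNT = nnt, h = h)
```

An NNT of 10 reads as: treat 10 patients to prevent one event.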
21
Q

Cohen's w

A

Cohen's w is the square root of the chi^2 value computed on proportions instead of counts:

$$w = \sqrt{\sum_{i=1}^{n} \frac{(p_{o,i} - p_{e,i})^2}{p_{e,i}}}$$

It is also useful for larger contingency tables.
• $p_{o,i}$: observed proportion in cell i
• $p_{e,i}$: expected proportion in cell i
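The formula applied cell by cell to a made-up 2x2 table. Because the proportions are the counts divided by N, the same value follows from the uncorrected chi^2 statistic as w = sqrt(chi^2 / N).

```r
# Hypothetical 2x2 table of counts
tab <- matrix(c(20, 30, 10, 40), nrow = 2)
N   <- sum(tab)

p_o <- tab / N                                      # observed proportions
p_e <- outer(rowSums(tab), colSums(tab)) / N^2      # expected proportions
w   <- sqrt(sum((p_o - p_e)^2 / p_e))
w                                                   # about 0.218

chi2 <- chisq.test(tab, correct = FALSE)$statistic
sqrt(unname(chi2) / N)                              # same value
```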

22
Q

α and Type I/II errors

A
  • α is a decision threshold, the significance level
  • most commonly used in science: α = 0.05
  • but this choice is completely arbitrary (!)
  • lowering α –> fewer false positives, more false negatives
  • increasing α –> fewer false negatives but more false positives
  • rejecting H0 although H0 is true –> type I error
  • not rejecting (accepting) H0 although H0 is false –> type II error
  • α sets the probability of making a type I error
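A simulation of the last bullet. The test is swapped in for illustration: a two-sided z-test with known SD = 1 on simulated data where H0 (true mean = 0) holds. Under a true H0 the p-values are uniform, so the share of rejections, i.e. type I errors, should come out close to α.

```r
set.seed(123)
alpha <- 0.05
n     <- 20

# 2000 simulated studies in which H0 (true mean = 0) is TRUE
p_values <- replicate(2000, {
  x <- rnorm(n, mean = 0, sd = 1)
  z <- mean(x) * sqrt(n)    # z statistic, SD assumed known (= 1)
  2 * pnorm(-abs(z))        # two-sided p-value
})

mean(p_values < alpha)      # share of type I errors, close to 0.05
```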