Biostats Flashcards
Adjusted Rate (adjustment)
A summarizing procedure for a statistical measure in which the effects of differences in composition of the populations being compared have been minimized by statistical methods. Often performed on rates or relative risks, commonly because of differing age distributions in the populations being compared. The mathematical procedure commonly used to adjust rates for age differences is direct or indirect standardization. Examples: adjustment by regression analysis and by standardization.
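As a rough illustration of direct standardization, the sketch below weights each age-specific rate by the share of a common standard population in that age band. The function name, the three age bands, the rates, and the standard population are all hypothetical values chosen for the example.

```python
def directly_standardized_rate(stratum_rates, standard_population):
    """Weight each stratum-specific rate by the standard population's share of that stratum."""
    total = sum(standard_population)
    return sum(rate * n / total for rate, n in zip(stratum_rates, standard_population))

# Hypothetical age-specific death rates (per person-year) in three age bands
rates_a = [0.001, 0.005, 0.020]
rates_b = [0.002, 0.006, 0.018]

# Hypothetical standard population counts for the same three age bands
standard = [50_000, 30_000, 20_000]

adjusted_a = directly_standardized_rate(rates_a, standard)
adjusted_b = directly_standardized_rate(rates_b, standard)
print(adjusted_a, adjusted_b)
```

Because both populations are weighted by the same standard age structure, the two adjusted rates can be compared without the comparison being driven by differing age distributions.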
Alpha (α)
the probability of a type I error, the error of rejecting a true null hypothesis (declaring a difference exists when it does not)
Alternative Hypothesis
1. A supposition, arrived at from observation or reflection, that leads to refutable predictions. 2. Any conjecture cast in a form that will allow it to be tested and refuted.
Analysis of Variance (ANOVA)
The separation of variance attributable to one cause from the variance attributable to others. By partitioning the total variance of a set of observations into parts due to particular factors, for example, sex, treatment group, etc, and comparing variances (mean squares) by way of F-tests, differences between means can be assessed. The simplest analysis of this type involves a one-way design, in which N subjects are allocated, usually at random, to the k different levels of a single factor. The total variation in the observations is then divided into a part due to differences between level means (the between groups sum of squares) and a part due to the differences between subjects in the same group (the within groups sum of squares, also known as the residual sum of squares). These terms are usually arranged as an analysis of variance table.
If the means of the populations represented by the factor levels are the same, then, within the limits of random variation, the between groups mean square and the within groups mean square should be the same. Whether this is so can, if certain assumptions are met, be assessed by a suitable F-test. The assumptions are that the response variable is normally distributed in each population and that the populations have the same variance. Essentially an example of a generalized linear model with an identity link function and normally distributed errors.
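The partition of the total variation in a one-way design can be sketched as follows. This is a minimal illustration with made-up data, not a full ANOVA table; the function name and the three groups are hypothetical.

```python
def one_way_anova(groups):
    """Return (between groups SS, within groups SS, F statistic) for a one-way design."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between groups sum of squares: variation of the level means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups (residual) sum of squares: variation of subjects around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)        # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)    # N - k degrees of freedom
    return ss_between, ss_within, ms_between / ms_within

# Hypothetical observations for k = 3 levels of a single factor
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
ss_between, ss_within, f_stat = one_way_anova(groups)
print(ss_between, ss_within, f_stat)
```

The F statistic would then be compared with an F distribution on k - 1 and N - k degrees of freedom, which this sketch does not do.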
Bayes’ theorem
A procedure for revising and updating the probability of some event in the light of new evidence. The theorem originates in an essay by the Reverend Thomas Bayes. In its simplest form the theorem may be written in terms of conditional probabilities as,
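Pr( Bj | A ) = Pr( A | Bj ) Pr( Bj ) / [ Pr( A | B1 ) Pr( B1 ) + Pr( A | B2 ) Pr( B2 ) + … + Pr( A | Bk ) Pr( Bk ) ]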
where Pr( A | Bj ) denotes the conditional probability of event A conditional on event Bj, and B1, B2, …, Bk are mutually exclusive and exhaustive events. The theorem gives the probabilities of the Bj when A is known to have occurred. The quantity Pr( Bj ) is termed the prior probability and Pr( Bj | A ) the posterior probability. Pr( A | Bj ) is equivalent to the (normalized) likelihood, so that the theorem may be restated as posterior ∝ (prior) × (likelihood).
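A small numerical sketch may help: each prior Pr( Bj ) is multiplied by the likelihood Pr( A | Bj ), and the products are divided by their sum over all Bj. The screening-test numbers below (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical, as is the function name.

```python
def posterior_probabilities(priors, likelihoods):
    """Apply Bayes' theorem over mutually exclusive, exhaustive events B1..Bk."""
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    evidence = sum(joint)  # Pr(A), the sum over all Bj of Pr(A | Bj) Pr(Bj)
    return [j / evidence for j in joint]

# Hypothetical example: B1 = diseased, B2 = not diseased, A = positive test result
priors = [0.01, 0.99]        # Pr(B1), Pr(B2)
likelihoods = [0.95, 0.05]   # Pr(A | B1), Pr(A | B2)

posteriors = posterior_probabilities(priors, likelihoods)
print(posteriors)
```

With these assumed inputs the posterior probability of disease after a positive test is 0.0095/0.059, roughly 0.16, far higher than the 1% prior but still well below one half.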
Beta (β)
The probability of a type II error, the error of failing to reject a false null hypothesis, i.e. declaring that a difference does not exist when in fact it does.
Bias
In general terms, deviations of results or inferences from the truth, or processes leading to such deviation. More specifically, the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated, or does not test the hypothesis to be tested. In estimation, usually measured by the difference between the expected value of a parameter estimate and the true value of the parameter. An estimator for which the expected value equals the true parameter value is said to be unbiased.
Binary Variable (Binary Observation)
Observations which occur in one of two possible states, these often being labeled 0 and 1. Such data are frequently encountered in medical investigations; commonly occurring examples include ‘dead/alive’, ‘improved/not improved’ and ‘depressed/not depressed.’ Data involving this type of variable often require specialized techniques for their analysis, such as logistic regression.
Binomial Distribution
The distribution of the number of ‘successes’, X, in a series of n independent Bernoulli trials where the probability of success at each trial is p and the probability of failure is q = 1 - p. Specifically the distribution is given by
Pr(X = x) = [n! / (x!(n - x)!)] p^x q^(n - x), x = 0, 1, 2, …, n
The mean, variance, skewness and kurtosis of the distribution are as follows:
mean = np
variance = npq
skewness = (q - p)/(npq)^(1/2)
kurtosis = 3 - (6/n) + 1/(npq)
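The probability function and the four summary quantities can be checked numerically with a short sketch. The function names and the choice n = 10, p = 0.3 are illustrative assumptions; only the Python standard library is used.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Pr(X = x) for n independent Bernoulli trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_moments(n, p):
    """Mean, variance, skewness and kurtosis of the binomial distribution."""
    q = 1 - p
    return {
        "mean": n * p,
        "variance": n * p * q,
        "skewness": (q - p) / sqrt(n * p * q),
        "kurtosis": 3 - 6 / n + 1 / (n * p * q),
    }

# Hypothetical parameters chosen for illustration
n, p = 10, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
moments = binomial_moments(n, p)

# The mean computed directly from the probabilities should agree with np
mean_from_pmf = sum(x * pr for x, pr in enumerate(pmf))
print(moments, mean_from_pmf)
```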
Biostatistics
A branch of science which applies statistical methods to biological problems. The science of biostatistics encompasses the design of biological experiments, especially in medicine and health sciences.
Bivariate
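Involving two variables, as in a bivariate distribution, the joint distribution of two random variables. Example: when each of the two outcomes belongs to one of two categories (e.g. yes/no, acceptable/defective), the pair may follow a “bivariate binomial distribution”.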
Blinded Study (Blinding)
A procedure used in clinical trials to avoid the possible bias that might be introduced if the patient and/or doctor knew which treatment the patient is receiving. If neither the patient nor the doctor is aware of which treatment has been given, the trial is termed double-blind. If only one of the patient or doctor is unaware, the trial is called single-blind. Clinical trials should use the maximum degree of blinding that is possible, although in some areas, for example, surgery, it is often impossible for an investigation to be double-blind.
Bonferroni correction
A procedure for guarding against an increase in the probability of a type I error when performing multiple significance tests. To maintain the probability of a type I error at some selected value (α), each of the m tests to be performed is judged against a significance level (α/m ). For a small number of simultaneous tests (up to five) this method provides a simple and acceptable answer to the problem of multiple testing. It is however highly conservative and not recommended if large numbers of tests are to be applied, when one of the many other multiple comparison procedures available is generally preferable.
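The correction amounts to a single division, as the sketch below shows. The function name and the five p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Judge each of the m tests against the significance level alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]

# Hypothetical p-values from m = 5 simultaneous significance tests
p_values = [0.004, 0.020, 0.030, 0.011, 0.250]
decisions = bonferroni_reject(p_values, alpha=0.05)
print(decisions)
```

With these made-up values each test is judged against 0.05/5 = 0.01, so only the first test is declared significant, although four of the five p-values fall below the unadjusted 0.05 level.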
Case-Control Study
(Syn: case comparison study, case compeer study, case history study, case referent study, retrospective study) The observational epidemiologic study of persons with the disease (or other outcome variable) of interest and a suitable control (comparison, reference) group of persons without the disease.
Categorical Data
Categorical data represent types of data which may be divided into groups. Examples of categorical variables are race, sex, age group, and educational level. While the latter two variables may also be considered in a numerical manner by using exact values for age and highest grade completed, it is often more informative to categorize such variables into a relatively small number of groups.
Censored Information
An observation (Xi) on some variable of interest is said to be censored if it is known only that Xi ≤ Li (left-censored) or Xi ≥ Ui (right-censored), where Li and Ui are fixed values. Such observations arise most frequently in studies where the main response variable is time until a particular event occurs (for example, time to death) when, at the completion of the study, the event of interest has not happened to a number of subjects.
Central Limit Theorem
If a random variable Y has population mean µ and population variance σ^2, then the sample mean, Ȳ, based on n observations, has an approximate normal distribution with mean µ and variance σ^2/n, for sufficiently large n. The theorem occupies an important place in statistical theory. In short, the Central Limit Theorem states that if the sample size is large enough, the distribution of sample means can be approximated by a normal distribution, even if the original population is not normally distributed.
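A quick simulation can make the statement concrete. The sketch below draws repeated samples from an exponential population (clearly non-normal, with mean 1 and variance 1) and looks at the distribution of the sample means. The sample size, number of replicates, seed, and function name are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_sample_means(n, replicates, rng):
    """Draw `replicates` samples of size n from an exponential(1) population; return their means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replicates)
    ]

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 50
means = simulate_sample_means(n=n, replicates=5000, rng=rng)

# Under the theorem these should be close to the population mean (1) and variance / n (1/50)
mean_of_means = statistics.fmean(means)
variance_of_means = statistics.variance(means)
print(mean_of_means, variance_of_means)
```

A histogram of `means` would look roughly bell-shaped even though the exponential population itself is strongly skewed.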
Chi-Square Distribution
The chi-square distribution is based on a normally distributed population with variance σ^2, from which independent samples of size n are randomly selected and the sample variance s^2 is computed for each sample. The sample statistic χ^2 = (n - 1)s^2/σ^2 has a chi-square distribution with n - 1 degrees of freedom. The chi-square distribution is skewed, its values can be zero or positive but not negative, and it is different for each number of degrees of freedom. Generally, as the number of degrees of freedom increases, the chi-square distribution approaches a normal distribution.
Chi-square statistic
A statistic having, at least approximately, a chi-squared distribution.
Chi-square test for trend
A test applied to a two-dimensional contingency table in which one variable has two categories and the other has k ordered categories, to assess whether there is a difference in the trend of the proportions in the two groups. The result of using the ordering in this way is a test that is more powerful than using the chi-squared statistic to test for independence.
clinical trial (phases 1-4)
(Syn: therapeutic trial) A research activity that involves the administration of a test regimen to humans to evaluate its efficacy and safety. The term is subject to wide variation in usage, from the first use in humans without any control treatment to a rigorously designed and executed experiment involving test and control treatments and randomization. Several phases of clinical trials are distinguished:
Phase I trial: Safety and pharmacologic profiles. The first introduction of a candidate vaccine or a drug into a human population to determine its safety and mode of action. In drug trials, this phase may include studies of dose and route of administration. Phase I trials usually involve fewer than 100 healthy volunteers.
Phase II trial: Pilot efficacy studies. Initial trial to examine efficacy, usually in 200 to 500 volunteers; with vaccines, the focus is on immunogenicity, and with drugs, on demonstration of safety and efficacy in comparison to other existing regimens. Usually, but not always, subjects are randomly allocated to study and control groups.
Phase III trial: Extensive clinical trial. This phase is intended for complete assessment of safety and efficacy. It involves larger numbers, perhaps thousands, of volunteers, usually with random allocation to study and control groups, and may be a multicenter trial.
Phase IV trial: With drugs, this phase is conducted after the national drug registration authority (e.g., the Food and Drug Administration in the United States) has approved the drug for distribution or marketing. Phase IV trials may include research designed to explore a specific pharmacologic effect, to establish the incidence of adverse reactions, or to determine the effects of long-term use. Ethical review is required for phase IV clinical trials, but not for routine postmarketing surveillance.
coefficient of variation (CV)
The measure of spread for a set of data defined as
CV = 100 × standard deviation / mean
Sample: CV = 100 × s / x̄
Population: CV = 100 × σ / µ
Originally proposed as a way of comparing the variability in different distributions, but found to be sensitive to errors in the mean. Simpler definition: The ratio of the standard deviation to the mean. This is meaningful only if the variable is measured on a ratio scale.
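A minimal sketch of the sample version, using the sample standard deviation s and the sample mean. The function name and the measurements are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample CV as a percentage: 100 * s / mean, with s the sample standard deviation."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical measurements on a ratio scale (for example, weights in kg)
weights = [10, 12, 14]
cv = coefficient_of_variation(weights)
print(cv)
```

Here the mean is 12 and the sample standard deviation is 2, so the CV is 100 × 2/12, about 16.7%. The function will fail or mislead when the mean is zero or close to it, which reflects the sensitivity to the mean noted above.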
Cohort Study
(Syn: concurrent, follow-up, incidence, longitudinal, prospective study) The analytic method of epidemiologic study in which subsets of a defined population can be identified who are, have been, or in the future may be exposed or not exposed, or exposed in different degrees, to a factor or factors hypothesized to influence the probability of occurrence of a given disease or other outcome.
complementary event
Mutually exclusive events A and B for which Pr(A) + Pr(B) = 1, where Pr denotes probability.