Experimental Design Flashcards

1
Q

Questions to conduct analysis

A
  • How to arrange data?
  • Do we conduct one-way ANOVA?
  • Is the response continuous or discrete?
  • What is the model assumption?
  • What is the distribution for the error, still normal?
  • Consider blocking factors and interactions
2
Q

Define Factor

A

Variable whose influence upon a response variable is being studied in an experiment

3
Q

Factor Level

A

Numerical values or settings for a factor

4
Q

Trial (or run)

A

application of a treatment to an experimental unit

5
Q

Treatment or level combination

A

set of values for all factors in a trial

6
Q

Experimental unit

A

Object to which a treatment is applied

7
Q

Randomization and reasons for doing this

A

using a chance mechanism to assign treatments to experimental units or run order

Reasons for randomization: It reduces the chance that an unanticipated variable effect will confuse the results of the experiment (protects against unknown variables); most of these unanticipated effects manifest themselves over time. It also reduces the biases that an experimenter may impose on a design. Lastly, it ensures the validity of the estimate of experimental error and provides a basis for inference in analyzing the experiment.

8
Q

5 categories of Experimental design

A
  1. Treatment comparisons
  2. Variable screening
  3. Response Surface Modeling
  4. System optimization
  5. System robustness
9
Q

Treatment Comparisons

A

Purpose is to compare several treatments of a factor

10
Q

Variable Screening

A

Have a large number of factors, but only a few are important. The experiment should identify the important few.

11
Q

Response Surface Exploration

A

After important factors have been identified, their impact on the system is explored; regression model building

12
Q

System Optimization

A

Interested in determining the optimum conditions

13
Q

System Robustness

A

Wish to optimize a system and also reduce the impact of uncontrollable (noise) factors. (example: car running well on different road conditions and with different driving habits)

14
Q

Systematic Approach to experimentation

A
  • State the objective of the study
  • Choose the response variable
  • Choose factors and levels
  • Choose experimental design (plan)
  • Perform the experiment
  • Analyze the data
  • Draw conclusions
15
Q

Three fundamental principles of experimental design

A

Replication, Randomization, Local control of error (blocking and covariates)

16
Q

Define replication, difference between it and repetition

A

Each treatment is applied to units that are representative of the population. This helps to reduce variance and increase power to detect significant differences.

Repetition would be the repetition of a measurement any number of times on one unit. Replication is replicating the measurement process with a new unit

17
Q

Define Randomization and list its advantages

A

Use of a chance mechanism to assign treatments to units or to run order.

It has the following advantages:

  • protects against latent variables or “lurking” variables
  • Reduces influence of subjective bias in treatment assignments
  • ensures validity of statistical inference
18
Q

Define blocking, notes on effective blocking strategies

A

A block refers to a collection of homogeneous units (example: hours, batches, lots, etc.).

Effective blocking: larger between-block variations than within-block

“block what you can and randomize what you cannot”

Run and compare treatments within the same blocks, randomizing within each block. This eliminates block-to-block variation and reduces the variability of the treatment effect estimates.

19
Q

Define learning effect

A

Advantage given to the unit/person in an experiment. Mitigated with balanced randomization

20
Q

Define balanced randomization

A

Randomizing subject to the constraint that each treatment is assigned to an equal number of units
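
One way to picture balanced randomization is the minimal sketch below. The function name `balanced_randomization`, the unit labels, and the seed are hypothetical choices for illustration; the only idea taken from the card is that each treatment ends up on the same number of units.

```python
import random

def balanced_randomization(units, treatments, seed=None):
    """Randomly assign treatments so that each one is used equally often."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    reps = len(units) // len(treatments)
    # Build a pool with each treatment repeated `reps` times, then shuffle it.
    pool = [t for t in treatments for _ in range(reps)]
    rng = random.Random(seed)
    rng.shuffle(pool)
    return dict(zip(units, pool))

# Hypothetical example: 8 units, 2 treatments.
assignment = balanced_randomization(range(1, 9), ["A", "B"], seed=1)
counts = {t: list(assignment.values()).count(t) for t in ["A", "B"]}
print(assignment)
print(counts)  # each treatment is assigned to 4 units
```

Shuffling a pre-balanced pool, rather than drawing a treatment independently for each unit, is what guarantees the equal counts.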

21
Q

Two things that the scientific method requires

A
  1. Data collection
  2. Data analysis
22
Q

Three basic methods of collecting data and explanations

A
  1. Retrospective studies (historical data)- least expensive and quickest way, data is readily available, data mining, but less than optimal for research goals/ questionable reliability
  2. Observational studies- monitor processes by observation (beware the Heisenberg uncertainty principle), employ simple random sampling (most common)/ stratified random sampling/ systematic sampling
  3. Designed experiments- intentionally disturb the process and observe the results: manipulate factors, let the process reach equilibrium, and observe the response
23
Q

Difference between experimental and observational unit

A

The experimental unit is the smallest unit to which we apply a treatment combination.

The observational unit is the unit upon which we make the measurement. (May or may not be the experimental unit)

24
Q

Define experimental error and observational error

A

Experimental error measures the variability among the experimental units. May be thought of as background noise, represents variability from trying to repeat the application of the specific combination of the factor levels

Observational error measures the variability due to the observational units. It is part of the experimental error, but only a part. (Think of baking pies in two different ovens)

25
Basic idea of local control of error
Reduce the random error among the experimental units. Control or account for anything which might affect the response other than the factors.
26
OLS Estimation for simple linear regression (the model, what to minimize, and the different values and their variances)
27
R^2 formulas
R^2 = RegrSS/CTSS = 1 - (RSS/CTSS)
28
Another way to express RegrSS
29
What is another way to express MSE?
RSS/(n-p) = sig hat^2, the estimate of the error variance sig^2 (p = number of regression parameters)
30
SE(B1 hat)
sqrt (MSE/ Sxx)
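
The quantities on the simple linear regression cards above can be tied together in a small sketch. It assumes the model y = b0 + b1*x + error with p = 2 parameters; the function name `simple_ols` and the data are hypothetical, and the identity RegrSS = b1^2 * Sxx used in the last line is the standard simple-regression result rather than something stated on the cards.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x + error, with the usual summaries."""
    n = len(x)
    p = 2  # number of regression parameters (intercept and slope)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx         # slope that minimizes the residual sum of squares
    b0 = ybar - b1 * xbar  # intercept
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ctss = sum((yi - ybar) ** 2 for yi in y)  # corrected total sum of squares
    regr_ss = ctss - rss                      # regression sum of squares
    mse = rss / (n - p)                       # estimate of the error variance
    return {
        "b0": b0, "b1": b1, "sxx": sxx,
        "rss": rss, "ctss": ctss, "regr_ss": regr_ss,
        "r2": regr_ss / ctss,
        "mse": mse,
        "se_b1": math.sqrt(mse / sxx),
    }

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(x, y)
print(fit)
# RegrSS can also be written as b1^2 * Sxx (equal up to rounding).
print(fit["regr_ss"], fit["b1"] ** 2 * fit["sxx"])
```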
31
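What does det[(X'X)^-1] represent?
It is proportional to the square of the volume of the confidence ellipsoid for the estimated coefficients. Equivalently, det(X'X) is proportional to the reciprocal of that squared volume, so a larger det(X'X) means more precise estimation.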
32
Principle of Parsimony
Occam's razor: "entities should not be multiplied beyond necessity". So choose fewer variables with sufficient explanatory power. This is a desirable modeling strategy.
33
Explain a One-way layout design in words
A single-factor experiment with k levels (treatments)
34
Linear model for the one-way layout
35
ANOVA for one-way layout estimated model
36
Describe over-parameterization
When there are k types of observations but more than k regression parameters. When fitting the model, (X'X)^-1 does not exist because X is not of full rank, so X'X is singular. Constraints are needed to make X'X a nonsingular matrix.
37
What are the two types of constraints for an over-parameterized model?
1) Requiring the treatment effects to sum to zero (zero-sum constraint)
2) Setting one of the treatment effects to zero (dropping its column from the model matrix X; called a baseline constraint)
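
A small numerical sketch can make the singularity concrete. It assumes a one-way layout with k = 3 treatments and 2 observations each; the helper names `gram` and `det` and the three model matrices are illustrative choices, not part of the cards.

```python
def gram(X):
    """Return X'X for a matrix given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def det(M):
    """Determinant by cofactor expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

# Over-parameterized: intercept plus one indicator column per treatment.
X_over = [[1, 1, 0, 0], [1, 1, 0, 0],
          [1, 0, 1, 0], [1, 0, 1, 0],
          [1, 0, 0, 1], [1, 0, 0, 1]]
# Baseline constraint: drop the indicator of treatment 1.
X_base = [[1, 0, 0], [1, 0, 0],
          [1, 1, 0], [1, 1, 0],
          [1, 0, 1], [1, 0, 1]]
# Zero-sum constraint: the effect of treatment 3 is minus the sum of the others.
X_zero = [[1, 1, 0], [1, 1, 0],
          [1, 0, 1], [1, 0, 1],
          [1, -1, -1], [1, -1, -1]]

print(det(gram(X_over)))  # 0: X'X is singular
print(det(gram(X_base)))  # 8: nonsingular
print(det(gram(X_zero)))  # 72: nonsingular
```

In the over-parameterized matrix the intercept column equals the sum of the three indicator columns, which is exactly why X'X cannot be inverted.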
38
Describe the purpose of the multiple comparisons test and describe the two methods associated with it
After a global F-test of the treatments rejects the null hypothesis, the multiple comparisons test identifies which pairs of treatments differ significantly.
1) Bonferroni method- the alpha/2 level of the t test with N-k degrees of freedom is divided by k' = (k choose 2), the number of pairwise comparisons
2) Tukey method- a pair of treatments is declared different if the absolute t statistic exceeds 1/sqrt(2) times the upper alpha quantile of the studentized range distribution with k and N-k degrees of freedom
39
Process to find B for linear and quadratic effects
40
P1(x) and P2(x) and equation for finding y with orthogonal polynomials
41
Explanation of one-way random effects model
Basically a one-way fixed effects model but now the operators considered come from a pool (population) of operators. This gives rise to the variance components, variance between the operators and within the operators
42
Do you do multiple comparison tests with a random effects model? Why or why not?
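No. The treatment effects are random draws from a population, so comparing specific levels is not of interest, and the model has two error terms (between and within). Instead, find the expected values of the two mean squares and use them to estimate the variance components.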
43
What is the variance of the average in a random effects model?
MSTr/(nk)
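
A short derivation of this result, as a sketch. It assumes the one-way random effects model y_ij = eta + tau_i + epsilon_ij with k treatments and n replicates each, independent tau_i ~ N(0, sigma_tau^2) and epsilon_ij ~ N(0, sigma^2); the symbols eta, tau_i and sigma_tau^2 are notation chosen here, not taken from the card.

```latex
\begin{aligned}
\bar{y}_{..} &= \eta + \bar{\tau}_{.} + \bar{\epsilon}_{..} \\
\operatorname{Var}(\bar{y}_{..}) &= \frac{\sigma_\tau^2}{k} + \frac{\sigma^2}{nk}
  = \frac{\sigma^2 + n\sigma_\tau^2}{nk} \\
E(\mathrm{MSTr}) &= \sigma^2 + n\sigma_\tau^2
  \quad\Longrightarrow\quad
  \widehat{\operatorname{Var}}(\bar{y}_{..}) = \frac{\mathrm{MSTr}}{nk}
\end{aligned}
```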
44
Before using hypothesis testing and confidence intervals, what model assumptions must be made?
1) Have all important effects been captured? 2) Are the errors independent and normally distributed? 3) Do the errors have constant variance?
45
What are the three major properties of residuals? And what is the variance of any residual?
E(r) = 0
r and yhat are independent
r ~ Multi Norm (0, sig^2 (I - H))
Var(ri) = sig^2 (1 - hii)
46
What four residual plots will help show model assumptions?
Plot ri vs yhati
Plot ri vs xi
Plot ri vs time sequence, i
Plot ri vs replicates grouped by treatment
47
What should you do if there is a large number of replicates per treatment? What is helpful about this method?
Use a Box-whisker plot. It enables the location, dispersion, skewness, and extreme values of the replicated observations to be displayed in a single plot
48
IQR, IQR whiskers, implications for outliers and skewness
IQR = Q3 - Q1
Whiskers = [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
Anything outside the whisker bounds is considered an outlier. If Q1 and Q3 are not symmetric about the median, this implies skewness.
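
The whisker rule can be sketched in a few lines. The function name `box_plot_summary` and the data are hypothetical, and the quartiles come from the default method of Python's `statistics.quantiles`, which is only one of several quartile conventions (an assumption, since the card does not say which one to use).

```python
import statistics

def box_plot_summary(data):
    """Quartiles, IQR, whisker bounds, and the points flagged as outliers."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "whiskers": (lower, upper), "outliers": outliers}

# Hypothetical replicated observations with one extreme value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary["iqr"])       # 5.0
print(summary["whiskers"])  # (-5.0, 15.0)
print(summary["outliers"])  # [100]
```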
49
Explain the purpose and process of the normal probability plot
Purpose: to check whether the residuals follow a normal distribution
Process: Order the residuals r(1) <= ... <= r(N) and give the i-th ordered residual the cumulative probability pi = (i - .5)/N. Plotting pi vs r(i) should give a roughly S-shaped curve if the residuals are approximately normally distributed. Typically, however, the probabilities are transformed to normal quantiles so that the desired shape is a straight line (think qq-plot).
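
A minimal sketch of the plotting coordinates, assuming the straight-line (normal quantile) version of the plot. The function name `normal_plot_coordinates` and the residuals are made up for illustration; the drawing itself is left to whatever plotting tool is at hand.

```python
from statistics import NormalDist

def normal_plot_coordinates(residuals):
    """Pair each ordered residual with the normal quantile of (i - 0.5)/N."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    coords = []
    for i, r in enumerate(ordered, start=1):
        p = (i - 0.5) / n
        coords.append((std_normal.inv_cdf(p), r))
    return coords

# Hypothetical residuals.
coords = normal_plot_coordinates([0.3, -1.2, 0.1, 2.0, -0.4])
for quantile, residual in coords:
    print(round(quantile, 3), residual)
```

If the points (quantile, residual) fall close to a straight line, the normality assumption looks reasonable.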
50
Name some experiments with more than one factor
Paired comparison design, randomized block design, two-way and multi-way layout, Latin and Graeco-Latin square design, balanced incomplete block design (BIBD), split-plot design, ANCOVA
51
Definitions of paired comparison design and unpaired design
Paired comparison design: can be looked at as an RBD with block size 2. Each block consists of two homogeneous units, and within each block the two treatments are randomly assigned.
Unpaired design: there are still two treatments, but the units are not grouped into homogeneous pairs, so the experiment has more degrees of freedom for error. Because its error estimate includes the between-unit variation that pairing would remove, this design has lower power than the paired comparison design when pairing is effective.
52
t values for paired design
53
t value for unpaired design (two-sampled t-test)
54
Define (complete) randomized block design
k treatments are randomly assigned to the k units of each block, with b blocks and a total sample size of N = bk. For an effective design, the units within each block should be more homogeneous than units in different blocks.
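
The randomization step of a randomized block design might look like the sketch below. The function name `randomized_block_design`, the block labels, and the seed are hypothetical; the point is only that every block receives all k treatments in its own independent random order.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Give every block all treatments, in a separate random order per block."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # fresh, independent randomization within this block
        plan[block] = order
    return plan

# Hypothetical example: b = 3 blocks, k = 4 treatments, N = bk = 12 runs.
plan = randomized_block_design(["block1", "block2", "block3"],
                               ["A", "B", "C", "D"], seed=7)
for block, order in plan.items():
    print(block, order)
```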
55
Model for randomized block design (mixed effects models)
56
F and t stats for RBD
57
ANOVA for RBD
58
Explain a two-way layout experiment
It involves two treatment factors with fixed levels. There is an interest in assessing the interaction effect between the two treatments
59
Show the model and estimation for the two-way layout
60
F test for two-way layout with sum of squares formulations
61
ANOVA for two way layout
62
Describe Multi-way layout designs
Like the two-way layout but expanded to three or more factors (treatments), each with two or more levels
63
Using zero-sum constraints for the three-way layout, show the predicted multi-way layout model and the corresponding formulations
64
ANOVA for three way layout
65
Explain a Latin square design
Each of the k Latin letters (ie treatments) appears once in each row and once in each column (these are the two blocking factors of the experiment)
66
Show the estimated model and formulation for latin square design
67
ANOVA for latin square design
68
Explain a Graeco-Latin square design
Basically a superposition of two Latin square designs. Useful for studying four factors (3 blocking 1 treatment or 2 blocking 2 treatment)
69
Show model and ANOVA for Graeco latin square design
70
Explain BIBD (Balanced incomplete block design)
The number of treatments, t, is greater than the block size, k, so each block is incomplete. The design is balanced because each pair of treatments appears together in the same number of blocks (denoted by lambda)
71
What are the two basic relations for BIBD, involving b (number of blocks), k (block size), r (number of treatment replications), t (number of treatments), and lambda (l, the number of times pairs appear)?
bk = rt
r(k-1) = l(t-1)
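
Both relations can be checked on a concrete design. The sketch below uses a well-known BIBD with 7 treatments in 7 blocks of size 3, picked purely as an illustration; the function name `bibd_parameters` is hypothetical.

```python
from itertools import combinations

def bibd_parameters(blocks):
    """Return (t, b, k, r, lam) for a BIBD, or raise if the design is not balanced."""
    treatments = sorted({trt for block in blocks for trt in block})
    sizes = {len(block) for block in blocks}
    reps = {sum(trt in block for block in blocks) for trt in treatments}
    pair_counts = {sum(a in block and b in block for block in blocks)
                   for a, b in combinations(treatments, 2)}
    if len(sizes) != 1 or len(reps) != 1 or len(pair_counts) != 1:
        raise ValueError("not a balanced incomplete block design")
    return len(treatments), len(blocks), sizes.pop(), reps.pop(), pair_counts.pop()

# Illustrative design: 7 treatments, 7 blocks of size 3.
blocks = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6),
          (2, 5, 7), (3, 4, 7), (3, 5, 6)]
t, b, k, r, lam = bibd_parameters(blocks)
print(t, b, k, r, lam)                  # 7 7 3 3 1
print(b * k == r * t)                   # True
print(r * (k - 1) == lam * (t - 1))     # True
```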
72
Explain split plot design, when and why it should be used and potential advantages/disadvantages.
A split plot should be used for situations where certain factors are hard to change. These hard-to-change factors are treated as whole plot factors, and within each whole plot the subplot factors are varied. Advantages include cost/time effectiveness. Disadvantages include loss of precision in the whole plot treatment comparisons.
73
Explain ANCOVA, when and why it should be used, and potential advantages/disadvantages
ANCOVA should be used when auxiliary covariates are available. In an experiment, it may be impractical to create blocks (think continuous variables), so ANCOVA can be used if the correlation between the covariate and the response is high. Essentially we know the covariate term is important, but it is an uncontrollable source of error. In application it can be viewed as a fusion of one-way treatment comparisons and simple linear regression. Advantages include reducing bias and improving sensitivity/reducing error relative to the original models. Disadvantages include fitting two models where the covariate term was accounted for (this might not always be the case).
74
Explain a 2-level Full Factorial Design, when and why it might be used, and potential advantages/disadvantages
Used to see how the two settings of each 2-level factor (for example, presence or absence) and their combinations affect a response. Would be used for exploratory analysis where linear trends are expected. Advantages include reproducibility and a wider inductive basis because of the symmetry of the experiments. 2-level full factorial experiments are great for preliminary studies and are cost effective. They highlight interactions as well as isolated (main) effects. Disadvantages include the large run size when there are many factors (10 factors means 2^10 = 1024 runs) and the inability to estimate polynomial terms because there are only two levels.
75
3 key properties of full factorial design
Balance- each factor level appears in the same number of runs
Orthogonality- all paired level combinations for factors appear the same number of times
Replication- identical treatments applied to similar experimental units
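
Balance and orthogonality can be verified directly on a generated design. This is a minimal sketch for an unreplicated 2^3 design with levels coded -1 and +1; the function name `full_factorial` and the coding are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations, product

def full_factorial(k):
    """All 2^k level combinations, with levels coded as -1 and +1."""
    return list(product((-1, 1), repeat=k))

k = 3
design = full_factorial(k)

# Balance: each level of each factor appears in the same number of runs.
balance = [Counter(run[j] for run in design) for j in range(k)]

# Orthogonality: for every pair of factors, each of the four level
# combinations appears the same number of times.
pair_counts = {(i, j): Counter((run[i], run[j]) for run in design)
               for i, j in combinations(range(k), 2)}

print(len(design))          # 8 runs
print(balance[0])           # each level appears 4 times
print(pair_counts[(0, 1)])  # each level pair appears 2 times
```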
76
Explain in words a main effect
The difference between the average response over all observations at the high level of a factor and the average response over all observations at its low level
77
Explain in words an interaction
The change in average response, when changing the level of one factor, depends on the level setting of another factor. There are synergistic and antagonistic interactions.
78
Equations for an interaction
79
Conditional main effect equation
ME(B|A+) = zbar(B+|A+) - zbar(B-|A+), where zbar denotes the average response at the stated factor settings.
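
The main effect, conditional main effect, and interaction can be computed side by side on a tiny example. The data are a made-up unreplicated 2^2 experiment keyed by (A, B) levels, and the interaction is taken as half the difference of the two conditional main effects, the usual definition for 2-level designs; all function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def main_effect(data, factor):
    """Average response at the + level minus average response at the - level."""
    high = [y for run, y in data.items() if run[factor] == 1]
    low = [y for run, y in data.items() if run[factor] == -1]
    return mean(high) - mean(low)

def conditional_main_effect(data, factor, given, level):
    """Main effect of `factor` using only the runs where `given` is at `level`."""
    subset = {run: y for run, y in data.items() if run[given] == level}
    return main_effect(subset, factor)

def interaction(data, f1, f2):
    """Half the difference between the two conditional main effects of f2."""
    return 0.5 * (conditional_main_effect(data, f2, f1, 1)
                  - conditional_main_effect(data, f2, f1, -1))

A, B = 0, 1
# Hypothetical responses for the four runs (A level, B level).
data = {(-1, -1): 20, (1, -1): 30, (-1, 1): 25, (1, 1): 45}

print(main_effect(data, A))                    # 15.0
print(main_effect(data, B))                    # 10.0
print(conditional_main_effect(data, B, A, 1))  # 15.0, i.e. ME(B|A+)
print(conditional_main_effect(data, B, A, -1)) # 5.0, i.e. ME(B|A-)
print(interaction(data, A, B))                 # 5.0
```

Here the effect of B is larger when A is at its + level, which is what a positive (synergistic) interaction looks like.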
80
Equations for Box Cox Transformations
81
Split plot SS
82
Split plot ANOVA
83
Split plot model and hypotheses
84
Reasons for power transformation
(i) It gives the most parsimonious model, that is, with the fewest terms, particularly the omission of higher-order terms like cubic effects and interactions.
(ii) There are no unusual patterns in the residual plots.
(iii) The transformation has good interpretability.
85
Three fundamental principles for factorial effects
1) Effect hierarchy principle: lower-order effects are more likely to be important than higher-order effects, and effects of the same order are equally likely to be important
2) Effect sparsity principle: the number of relatively important effects in a factorial experiment is small
3) Effect heredity principle: in order for an interaction to be significant, at least one of its parent factors should be significant
86
Steps to construct a normal plot for factorial effects
1) Order the factorial effect estimates
2) Plot the ordered factorial effect estimates against the corresponding inverse normal coordinates for (i-.5)/N for i=1,...,N
3) Under Ho all factorial effects = 0, so the normal plot should be a straight line
4) Any point which falls off the line is considered significant
87
Half normal plots and their advantages
88
Lenth's method (Individual Error rate version)
89
Nominal the best
90
Deriving the use of log sample variance for dispersion analysis and why this is used
Used because the log transformation transforms multiplicative relationships to additive ones, making them easier to model statistically. It's also easy to transform the log sample variance back to its original scale by exponentiating it.
91
Describe procedures for blocking and optimal arrangement of 2^k factorial designs in 2^q blocks and any assumptions made.
With 2^q blocks, the block size must divide the run size of the experiment. Usually a higher-order factorial effect is chosen to define the assignment of runs to blocks (it is confounded with blocks), because by the effect hierarchy principle it is the least likely to be important. The block effect estimate will be the main effect of the blocks. For more blocks, create more blocking equations (q independent ones for 2^q blocks). One major assumption is that the block-by-treatment interactions are negligible, meaning that the mean response for a given treatment does not depend on the block. Without this, factorial effects would not be estimable by blocking relations.
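
A sketch of how blocking equations assign runs to blocks. Factors are indexed 0, 1, 2 here (so the word (0, 1, 2) plays the role of B = 123), and the function name `assign_blocks` is hypothetical. Each blocking word contributes one sign, and runs that share the same tuple of signs go into the same block.

```python
from collections import Counter
from itertools import product
from math import prod

def assign_blocks(k, blocking_words):
    """Label each run of a 2^k design by the signs of its blocking words."""
    plan = []
    for run in product((-1, 1), repeat=k):
        block = tuple(prod(run[i] for i in word) for word in blocking_words)
        plan.append((run, block))
    return plan

# 2^3 design in 2 blocks: the three-factor interaction is confounded with blocks.
two_blocks = assign_blocks(3, [(0, 1, 2)])
# 2^3 design in 4 blocks: two blocking equations, B1 = 12 and B2 = 13.
four_blocks = assign_blocks(3, [(0, 1), (0, 2)])

sizes_two = Counter(block for _, block in two_blocks)
sizes_four = Counter(block for _, block in four_blocks)
print(sizes_two)   # 2 blocks of 4 runs
print(sizes_four)  # 4 blocks of 2 runs
```

With two blocking equations, their product (here 23) is confounded with blocks as well, which is part of what the aberration criterion on the next card keeps track of.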
92
What makes a good blocking scheme? Define the terms confounding, aberration, and estimability.
Confounding: Setting up a relation which connects one design factor with another (in our case a block, eg: B=123 is a confounding relation). Literally means "confused".
Aberration: For any blocking scheme b, let g_i(b) be the number of i-factor interactions that are confounded with block effects. For two blocking schemes b1 and b2, let r be the smallest i such that g_i(b1) does not equal g_i(b2). If g_r(b1) < g_r(b2), then blocking scheme b1 has less aberration than b2.
Estimability: Estimability of order e is determined by finding the lowest order of interactions confounded with block effects, named e+1. Therefore estimability of order e ensures that all factorial effects of order e are estimable in the blocking scheme.
The best blocking schemes are ones that ensure estimability of order 1 and minimum aberration among all blocking schemes.
93
Explain 2 level fractional factorial designs, when and why they should be used, and potential advantages/disadvantages
2 level fractional factorial designs are a subset of full factorial designs. They have a smaller run size, so effects become aliased with one another and aliasing relations are needed to describe which effects can be estimated. We write this as 2^(k-p), where k is the number of factors and the design is a (1/2)^p fraction of the full 2^k design. Advantages include efficiency both in cost and time. Like a full factorial design, it is reproducible and uses symmetry as the basis of its design. Disadvantages include the complexity of aliasing and scheme selection, and the fact that the full space of the experiment is not explored.
94
Within the context of 2 level fractional factorial designs, define these terms: aliasing relation, word, resolution. Also, how many df, aliasing relations, runs?
Aliasing relation: Describes whatever factor combination is being confounded. Denoted, for example, I=ABC=BCD. There are 2^(k-p) - 1 sets of aliased effects, which is also the number of degrees of freedom.
Word: Any confounded factor combination
Resolution: The length of the shortest word in the defining contrast subgroup. It is desirable to have maximum resolution for fractional factorial designs.
There are 2^(k-p) runs in a 2^(k-p) experiment.
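
These terms can be seen at work in a small sketch. It assumes a 2^(4-1) design with the single generator D = ABC (defining relation I = ABCD), chosen only as an illustration. The helper `multiply` is hypothetical; it multiplies two effects by cancelling repeated letters. With more than one generator, the full defining contrast subgroup would have to be listed, which this sketch does not do.

```python
from itertools import product

def multiply(word1, word2):
    """Product of two effects: letters that appear twice cancel (A*A = I)."""
    return "".join(sorted(set(word1) ^ set(word2)))

# Defining contrast subgroup for the single generator D = ABC.
defining_words = ["ABCD"]
resolution = min(len(word) for word in defining_words)

# Build the 8 runs: a full factorial in A, B, C with D set to ABC.
design = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

print(len(design))             # 8 runs = 2^(4-1)
print(resolution)              # 4, so this is a resolution IV design
print(multiply("A", "ABCD"))   # BCD: main effect A is aliased with BCD
print(multiply("AB", "ABCD"))  # CD: AB is aliased with CD
# The aliasing is visible in the design itself: the AB and CD columns coincide.
print(all(a * b == c * d for a, b, c, d in design))  # True
```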
95
Rules for Resolution IV and V Designs for 2 level fractional factorial designs. Define clear and strongly clear.
Clear: A factorial effect (main effect or two-factor interaction) is clear if none of its aliases are main effects or two-factor interactions
Strongly clear: a factorial effect is strongly clear if none of its aliases are main effects, two-factor interactions, or three-factor interactions
1) In any Res IV design, all main effects are clear
2) In any Res V design, all main effects are strongly clear and all two-factor interactions are clear
3) Among Res IV designs, those with the largest number of clear two-factor interactions are best
96
Variance of a factorial effect
97
Steps for analysis of fractional factorial designs using regression
98
Describe the problem of aliased ambiguities and briefly state plans to resolve them
Aliased ambiguities occur when factorial effects are significant but cannot be distinguished from one another using the experimental data because they are confounded. Plans include: using domain knowledge to judge that some effects are unlikely to be significant, using the hierarchy principle to assume away higher-order effects, and running follow-up experiments using fold-over techniques and optimal design criteria.
99
What is the fold-over technique? Advantages/disadvantages?
The fold-over technique adds a second fraction obtained by switching the signs of columns in the design matrix, which changes the aliasing relations (this doubles run size). A new factor (+,-) represents the two halves of the combined design. Use the augmented design matrix to de-alias the effects believed to be important, then analyze the combined design. For a resolution III original experiment, this method is effective for de-aliasing all the main effects, or one main effect and all its interactions. Its drawbacks are that the scope of de-aliasing is limited and the number of runs must be doubled. There are more efficient ways to accomplish this.
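
A minimal sketch of a full fold-over, assuming the smallest resolution III case: a 2^(3-1) design with C = AB, in which A is aliased with BC. The variable names are illustrative. Reversing every sign gives the second half, and an extra column records which half each run came from.

```python
from itertools import product

# Original 2^(3-1) design with generator C = AB (resolution III).
original = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]
# Fold-over: reverse the sign of every column.
folded = [tuple(-x for x in run) for run in original]
# Extra factor marking the two halves (+1 = original runs, -1 = folded runs).
combined = ([run + (1,) for run in original]
            + [run + (-1,) for run in folded])

# In the original fraction, A and BC are identical columns (fully aliased).
print(all(a == b * c for a, b, c in original))    # True
# In the combined design, A and BC are orthogonal, so A is de-aliased.
print(sum(a * b * c for a, b, c, _ in combined))  # 0
# The 8 combined runs cover every level combination of A, B, C.
print(len({run[:3] for run in combined}))         # 8
```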
100
Describe an optimal design approach and two criteria for this approach
An optimal design approach is a technique for choosing follow-up runs that de-alias ambiguities for the best model identified, using a particular optimal design criterion. The model in use for optimal design should contain
1) All effects and their aliases judged significant a priori
2) A block variable that accounts for differences in the average value of the response between the time periods of the original experiment and the follow-up experiment
3) An intercept
D-optimal criterion: max_d |Xd'Xd| where d=1,....,2*2p where p is the number of regressors in the regression equation
Ds-optimal criterion: max_d |X2'X2 - X2'X1(X1'X1)^-1 X1'X2|
Can think of these in terms of regression. |X'X| is inversely related to the volume of the confidence ellipsoid for the estimated coefficients, so maximizing |X'X| minimizes the volume of this confidence ellipsoid (ie more precise estimation).
101
What should be used to evaluate the effectiveness of a fractional factorial design?
Minimum aberration criterion supplemented by the number of clear effects
102
Explain larger the better and smaller the better problems
Larger the better problems:
1) Find factor settings that maximize E(y)
2) Find other factor settings that minimize Var(y)
Smaller the better problems:
1) Find factor settings that minimize E(y)
2) Find other factor settings that minimize Var(y)
103
What are some practical considerations that make it desirable to study factors with more than two levels?
1) Factors may affect the response in a non-monotone fashion. More levels allow the curvature effect to be understood.
2) A qualitative factor may have multiple levels that need to be understood (eg three separate settings on a machine)
3) If there is an initial setting in an optimization problem, then it would make sense to study the space around that setting. Therefore multiple levels would be needed.
104
Suppose we have a 3^k full factorial experiment for factors A,B,C with three replicates. How many degrees of freedom are there for the terms?
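
The card gives no answer, so the following is only a sketch of the usual degrees-of-freedom bookkeeping. It assumes a 3^3 design (k = 3) with three true replicates of every run and no blocking; each term gets the product of (levels - 1) over its factors, and what is left goes to the residual. The function name `factorial_df` is hypothetical.

```python
from itertools import combinations
from math import prod

def factorial_df(levels, replicates):
    """Degrees of freedom for every factorial term, the residual, and the total."""
    names = sorted(levels)
    df = {}
    for order in range(1, len(names) + 1):
        for combo in combinations(names, order):
            # A term's df is the product of (levels - 1) over its factors.
            df["x".join(combo)] = prod(levels[f] - 1 for f in combo)
    runs = replicates * prod(levels.values())
    df["residual"] = runs - 1 - sum(df.values())
    df["total"] = runs - 1
    return df

df = factorial_df({"A": 3, "B": 3, "C": 3}, replicates=3)
for term, value in df.items():
    print(term, value)
# Main effects: 2 each; two-factor interactions: 4 each; AxBxC: 8;
# residual: 54; total: 80.
```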
105
Explain the linear-quadratic system for 3 level fractional factorial design
106
What does it mean for two factors to be partially aliased?
The pair of effects has an angle between 0 and 90 degrees.
107
Explain an orthogonal array
108
Why use an orthogonal array?
Orthogonal arrays have better run size economy (fewer runs) and flexibility of factor level combinations
109
How to determine OA run size
110
What is RSM?
Response surface methodology uses experimentation, modeling, data analysis, and optimization to understand the surface of the response
111
What are the three types of points for CCD in RSM?
Central composite design: Corner points, axial points, center points
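
The three point types can be generated explicitly. In this sketch the axial distance defaults to sqrt(k) and the number of center runs is a free parameter; both are illustrative choices rather than anything the card specifies, and the function name `central_composite_design` is hypothetical.

```python
import math
from itertools import product

def central_composite_design(k, alpha=None, n_center=1):
    """Return the corner, axial, and center points of a CCD in coded units."""
    if alpha is None:
        alpha = math.sqrt(k)  # assumed axial distance, not taken from the card
    # Corner points: the 2^k factorial portion at levels -1 and +1.
    corner = [tuple(float(v) for v in point)
              for point in product((-1, 1), repeat=k)]
    # Axial points: one factor at -alpha or +alpha, all others at 0.
    axial = []
    for i in range(k):
        for sign in (-1, 1):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    # Center points: every factor at 0, repeated n_center times.
    center = [tuple([0.0] * k) for _ in range(n_center)]
    return corner, axial, center

corner, axial, center = central_composite_design(2, n_center=3)
print(corner)  # 4 corner points
print(axial)   # 4 axial points
print(center)  # 3 center points
```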
112
Explain robust parameter design
Choose control factor settings to make the response less sensitive (ie more robust) to noise variation, exploiting control-by-noise interactions