Module 3 - Data, Learning, Systematic Relationships Flashcards
2 approaches to finding a systematic relationship
- Graphical
- Quantitative
Use engineering judgement to ask:
Should there be a relationship?
Scatterplot plotting process
- Plot values of one var against another
- Look for trend in data (nature of trend = lin? exp? quad?; degree of scatter = if it changes across the range, indicates that variance isn’t cst over range)
In scatterplots, look for…
- Trend betw. “independent” variables and dep variables
- Trend betw supposedly independent variables (indicates these quants may be correlated - codependencies, imp when mult x variables)
- Correlation can produce poor model estimation results
T/F: If scatterplot data arbitrarily placed on graph, the experiment was designed and thought out
F: trends and patterns indicate designed experiment
T/F: Use square graphs to put model on comparable basis
T
Covariance
Expected value, over the joint distribution of X and Y, of the product of their deviations from their means:
Cov(X,Y) = E{(X-mux)(Y-muy)}
Sign of covariance
Indicates sign of slope of systematic LINEAR relationship
T/F: Correlation and covariance are non-lin’r relationships
F: they measure lin’r relationships
Correlation
Dimensionless covariance:
Corr(X,Y) = rho(X,Y) = Cov(X,Y)/(sigmaX*sigmaY)
Properties of correlation
- Dimensionless (no units)
- Range (-1 <= rho(X,Y) <= 1)
- Close to -1 = strong lin’r with -ve slope
- Close to 1 = strong lin’r with +ve slope
T/F: Correlation gives NO info abt actual numerical value of slope
T
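A minimal sketch of why correlation carries no slope information: three perfectly linear data sets with very different slopes. The data and the helper name `sample_corr` are illustrative assumptions, not part of the original cards.

```python
from math import sqrt

def sample_corr(xs, ys):
    """Sample correlation of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
shallow = [0.5 * x for x in xs]    # slope 0.5
steep = [50.0 * x for x in xs]     # slope 50
falling = [-3.0 * x for x in xs]   # slope -3

r_shallow = sample_corr(xs, shallow)
r_steep = sample_corr(xs, steep)
r_falling = sample_corr(xs, falling)
print(r_shallow, r_steep, r_falling)
```

The shallow and steep lines give the same correlation; only the sign changes when the slope is negative.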
If we have N pairs of observations of X and Y values… (covariance & correlation)
Sample covariance:
R = 1/(N-1) * SUM_{i=1..N} (Xi - Xbar)(Yi - Ybar)
Sample correlation:
r = R/(sX*sY), where sX and sY are the sample st. devs. of X and Y
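The two sample formulas can be sketched in plain Python as follows. The function names and the five data pairs are made up for illustration.

```python
from math import sqrt

def sample_cov(xs, ys):
    """Sample covariance: 1/(N-1) * SUM (Xi - Xbar)(Yi - Ybar)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with N >= 2")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Sample correlation: covariance divided by both sample st. devs."""
    s_x = sqrt(sample_cov(xs, xs))  # Cov(X, X) is the sample variance of X
    s_y = sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (s_x * s_y)

# Illustrative data only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(sample_cov(xs, ys), sample_corr(xs, ys))
```

Dividing by N-1 rather than N matches the sample formula on the card.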
Sample correlation and covariance characteristics
- Random fluctuations in data will produce random flucts in computed values
- Are themselves random variables
- Estimates of true covariance and correlation
- Without computing conf intervals, treat the values as guides only
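A small simulation, offered as a sketch, of how the sample correlation fluctuates from one data set to the next. It assumes X and Y are independent standard Normal draws (true correlation 0) with N = 10. The seed, sample size, and helper name are arbitrary choices.

```python
import random
from math import sqrt

def sample_corr(xs, ys):
    """Sample correlation of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N = 10
rs = []
for _ in range(5):
    # Independent draws: the true correlation is 0 by construction
    xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(N)]
    rs.append(sample_corr(xs, ys))

print([round(r, 2) for r in rs])
```

Each repeat gives a different r even though the true value never changes, which is why a single computed value is only a guide.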
Don’t assume ________ equals _________.
Correlation
Causality
Rule of thumb for standard normal random variable
About 95% of values of Normal histogram occur within +/- 2 st. devs. of mean
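The rule of thumb can be checked with the standard library's `statistics.NormalDist` (Python 3.8+). This is a quick check, not a derivation.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # standard Normal

# Probability mass within +/- 2 st. devs. of the mean
within_2 = z.cdf(2.0) - z.cdf(-2.0)

# The exact 95% interval uses +/- 1.96 rather than 2
within_1_96 = z.cdf(1.96) - z.cdf(-1.96)

print(round(within_2, 4), round(within_1_96, 4))
```

The +/- 2 interval holds slightly more than 95% of the mass, so the rule is a rounded-off approximation.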
Another name for bivariate distribution
Joint distribution (betw 2 variables)
T/F: Joint distributions will have covariance matrix, and diagonals are equal to variance of X (top left) and Y (bottom right)
T
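As a sketch, the 2x2 sample covariance matrix can be built by hand and its diagonal compared against `statistics.variance`. The data and the function name `cov_matrix` are illustrative assumptions.

```python
from statistics import variance

def cov_matrix(xs, ys):
    """2x2 sample covariance matrix [[Var(X), Cov], [Cov, Var(Y)]]."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    dx = [x - xbar for x in xs]
    dy = [y - ybar for y in ys]
    cxx = sum(a * a for a in dx) / (n - 1)
    cyy = sum(b * b for b in dy) / (n - 1)
    cxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    return [[cxx, cxy], [cxy, cyy]]

# Illustrative data only
xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 6.0, 8.0]
S = cov_matrix(xs, ys)

# Diagonal entries vs. the library's sample variances
print(S[0][0], variance(xs))
print(S[1][1], variance(ys))
```

The off-diagonal entries are the covariance, so the matrix is symmetric.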
What happens to bivariate distribution as correlation increases?
Distribution is stretched along X = Y line (for +ve correlation, standardized X and Y), contours more elliptical
T/F: If X and Y are not strongly correlated, the distribution will be stretched
F: more circular, less of a trend
T/F: The multivariate Normal distr describes frequency with which vectors of values X1, Y1, X2, Y2,… Xn, Yn occur
F: X1, X2, X3… Xn
Bivariate UNIFORM probability distribution
Takes non-0 values over certain interval: chance of getting value in interval is same everywhere (contour is a single square)
Linear model
Linear in parameter(s)
Distinguish lin’r from nonlin’r regression models
Take first derivative wrt parameters - does derivative depend on parameters? (No = lin’r in parameters; yes = nonlin’r)
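The derivative test can be sketched numerically for a one-parameter model: estimate the derivative wrt the parameter at two parameter values and see whether it changes. The two models, the evaluation point, and the tolerance are assumptions for illustration. A single-point numerical check like this is a heuristic, not a proof.

```python
from math import exp

def d_dtheta(model, x, theta, h=1e-6):
    """Central-difference estimate of d(model)/d(theta) at fixed x."""
    return (model(x, theta + h) - model(x, theta - h)) / (2 * h)

def quad_model(x, theta):
    # Nonlinear in x, but linear in the parameter theta
    return theta * x ** 2

def exp_model(x, theta):
    # Nonlinear in the parameter theta
    return exp(theta * x)

def derivative_depends_on_theta(model, x=1.5, thetas=(0.5, 2.0), tol=1e-4):
    """True if the derivative wrt theta changes between the two theta values."""
    d1 = d_dtheta(model, x, thetas[0])
    d2 = d_dtheta(model, x, thetas[1])
    return abs(d1 - d2) > tol

print(derivative_depends_on_theta(quad_model))  # derivative is x**2, no theta
print(derivative_depends_on_theta(exp_model))   # derivative is x*exp(theta*x)
```

Note that `quad_model` is still a linear regression model even though it is quadratic in x, because linearity is judged wrt the parameters.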