Module 3 - Data, Learning, Systematic Relationships Flashcards

1
Q

2 approaches to finding a systematic relationship

A
  1. Graphical
  2. Quantitative
2
Q

Use engineering judgement to ask:

A

Should there be a relationship?

3
Q

Scatterplot plotting process

A
  1. Plot values of one var against another
  2. Look for trend in data (nature of trend = lin? exp? quad?; degree of scatter = if it changes across the range, variance isn’t cst over range)
4
Q

In scatterplots, look for…

A
  • Trend betw. “independent” variables and dep variables
  • Trend betw supposedly independent variables (indicates these quants may be correlated - codependencies, imp when mult x variables)
  • Correlation betw explanatory (x) variables can produce poor model estimation results
5
Q

T/F: If scatterplot data arbitrarily placed on graph, the experiment was designed and thought out

A

F: trends and patterns indicate designed experiment

6
Q

T/F: Use square graphs to put model on comparable basis

A

T

7
Q

Covariance

A

Expected value, taken over the joint distribution of X and Y, of the product of deviations from the means:

Cov(X,Y) = E{(X-mux)(Y-muy)}

8
Q

Sign of covariance

A

Indicates sign of slope of systematic LINEAR relationship

9
Q

T/F: Correlation and covariance are non-lin’r relationships

A

F: they are lin’r

10
Q

Correlation

A

Dimensionless covariance:

Corr(X,Y) = p(X,Y) = Cov(X,Y)/(sigmaX*sigmaY)

11
Q

Properties of correlation

A
  • Dimensionless (no units)
  • Range (-1 <= p(X,Y) <= 1)
  • Close to -1 = strong lin’r with -ve slope
  • Close to 1 = strong lin’r with +ve slope
12
Q

T/F: Correlation gives NO info abt actual numerical value of slope

A

T
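
A quick numerical check of this card, as a minimal sketch: two made-up data sets (the values and the names `y_shallow` and `y_steep` are illustrative assumptions) lie on perfectly straight lines with very different slopes, yet both give a sample correlation of essentially 1.

```python
import numpy as np

# Hypothetical data: same x values, two exact straight lines
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_shallow = 2.0 * x + 1.0    # slope 2
y_steep = 10.0 * x + 1.0     # slope 10

# np.corrcoef returns the 2x2 correlation matrix; [0, 1] is corr(x, y)
r_shallow = np.corrcoef(x, y_shallow)[0, 1]
r_steep = np.corrcoef(x, y_steep)[0, 1]

print(r_shallow, r_steep)
```

Both correlations agree to within rounding, so the correlation alone cannot tell the two slopes apart; it only reports the sign and strength of the linear trend.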

13
Q

If we have N pairs of observations of X and Y values… (covariance & correlation)

A

Sample covariance:

R = 1/(N-1) SUM (Xi - Xbar)(Yi-Ybar)

Sample correlation:

r = R/(sX*sY), where sX and sY are the sample st. devs. of X and Y
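
The two sample formulas can be sketched in a few lines of NumPy. The function name `sample_cov_corr` and the data values are assumptions made up for illustration; the sample standard deviations use the same N-1 divisor as the covariance.

```python
import numpy as np

def sample_cov_corr(x, y):
    """Return (sample covariance R, sample correlation r) for paired data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # R = 1/(N-1) * SUM (Xi - Xbar)(Yi - Ybar)
    cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
    # Sample standard deviations with the N-1 divisor (ddof=1)
    s_x = x.std(ddof=1)
    s_y = y.std(ddof=1)
    # r = R / (sX * sY)
    return cov, cov / (s_x * s_y)

# Hypothetical observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
R, r = sample_cov_corr(x, y)
print(R, r)
```

The results should match `np.cov(x, y)[0, 1]` and `np.corrcoef(x, y)[0, 1]`, which is a convenient cross-check.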

14
Q

Sample correlation and covariance characteristics

A
  • Random fluctuations in data will produce random flucts in computed values
  • Random variables
  • Estimates of true covariance and correlation
  • Work with values as guides without computing conf intervals
15
Q

Don’t assume ________ equals _________.

A

Correlation

Causality

16
Q

Rule of thumb for standard normal random variable

A

95% of values of Normal histogram occur within +/- 2 st. devs. of mean
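
The rule of thumb can be checked with the standard library alone. This is a small sketch; the helper name `prob_within` is an assumption, and it relies on the identity P(|Z| <= k) = erf(k / sqrt(2)) for a standard Normal Z.

```python
import math

def prob_within(k):
    """P(|Z| <= k) for a standard Normal random variable Z."""
    return math.erf(k / math.sqrt(2.0))

p2 = prob_within(2.0)       # probability within +/- 2 st. devs.
p196 = prob_within(1.96)    # probability within +/- 1.96 st. devs.
print(p2, p196)
```

Within 2 standard deviations the probability is slightly above 95%; the exact 95% interval uses about 1.96 standard deviations, which is why 2 works as a rule of thumb.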

17
Q

Another name for bivariate distribution

A

Joint distribution (betw 2 variables)

18
Q

T/F: Joint distributions will have covariance matrix, and diagonals are equal to variance of X (top left) and Y (bottom right)

A

T

19
Q

What happens to bivariate distribution as correlation increases?

A

distribution is stretched along X = Y line, contours more elliptical
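
One way to see the stretching, sketched under the assumption that X and Y are both scaled to unit variance and the correlation is positive: the contours of a bivariate Normal are ellipses whose axes follow the eigenvectors of the covariance matrix, with axis lengths proportional to the square roots of the eigenvalues. The function name `contour_axes` is hypothetical.

```python
import numpy as np

def contour_axes(rho):
    """Major-axis direction and major/minor axis ratio for unit-variance X, Y."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # eigh returns eigenvalues in ascending order for a symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    major_dir = eigvecs[:, -1]                       # direction of largest variance
    axis_ratio = np.sqrt(eigvals[-1] / eigvals[0])   # 1.0 means circular contours
    return major_dir, axis_ratio

ratios = {}
for rho in (0.0, 0.5, 0.9):
    direction, ratio = contour_axes(rho)
    ratios[rho] = ratio
    print(rho, direction, ratio)
```

At zero correlation the ratio is 1 (circular contours); as the correlation grows, the ratio grows and the major axis points along the X = Y direction.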

20
Q

T/F: If X and Y are not strongly correlated, the distribution will be stretched

A

F: more circular, less of a trend

21
Q

T/F: The multivariate Normal distr describes frequency with which vectors of values X1, Y1, X2, Y2,… Xn, Yn occur

A

F: X1, X2, X3… Xn

22
Q

Bivariate UNIFORM probability distribution

A

Takes non-0 values over certain interval: chance of getting value in interval is same everywhere (contour is a single square)

23
Q

Linear model

A

Linear in parameter(s)

24
Q

Distinguish lin’r from nonlin’r regression models

A

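Take first derivative wrt parameters - does derivative depend on parameters? If no, model is lin’r in the parameters; if yes, nonlin’r (e.g. y = b0 + b1*x^2 is lin’r; y = b0*exp(b1*x) is nonlin’r)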

25
Q

Fundamental framework

A

There is always:
  • Fundamental behaviour (deterministic)
  • Little bit of random noise

26
Q

For the linear model, the observations vector/matrix/table form rows and columns are:

A
  • ROWS: # runs
  • COLUMNS (in matrix): variables by which we're evaluating response (ie. T, V, P)

27
Q

Least Squares Estimation

A

Minimize sum of squared prediction errors (or min squared lengths betw model prediction (line) and observed value)

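
A minimal sketch of least squares for a straight-line model, using made-up data. The variable names (`X`, `beta_hat`, `sse`) are assumptions chosen for illustration; `np.linalg.lstsq` finds the parameter values that minimize the sum of squared prediction errors.

```python
import numpy as np

# Hypothetical observations: one row per run
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Matrix of explanatory variables: a column of ones (intercept) and x (slope)
X = np.column_stack([np.ones_like(x), x])

# Least squares estimates [intercept, slope]
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat          # model predictions
residuals = y - y_hat         # observed minus predicted
sse = np.sum(residuals**2)    # the quantity that was minimized

print(beta_hat, sse)
```

At the minimum, the residuals are orthogonal to every column of `X`, so `X.T @ residuals` is zero to within rounding.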
28
Q

Residual eqn

A

epsilon = y - y(hat)

29
Q

T/F: For lin'r model, values of slope and intercept estimate will NEVER depend on each other

A

F: they will often

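
One way to see the dependence, sketched under the usual least squares assumptions (independent noise with constant variance): the covariance matrix of the estimates is then proportional to (X^T X)^-1, so a nonzero off-diagonal entry means the intercept and slope estimates move together. The function name `xtx_inverse` and the x values are hypothetical.

```python
import numpy as np

def xtx_inverse(x):
    """(X^T X)^-1 for a straight-line model with an intercept column."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.inv(X.T @ X)

# Hypothetical x values located far from zero
x = np.array([10.0, 11.0, 12.0, 13.0, 14.0])

raw = xtx_inverse(x)                 # off-diagonal entry is nonzero
centred = xtx_inverse(x - x.mean())  # off-diagonal entry is zero

print(raw[0, 1], centred[0, 1])
```

With the raw x values the off-diagonal entry is clearly nonzero, so the two estimates are correlated. Centring x about its mean makes the entry zero, which is the special case where the estimates do not depend on each other.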
30
Q

Assumptions of LSE

A
  1. Values of explanatory variables (x's) are known exactly
  2. Model eqn form provides an adequate representation of the data ("model is correct")
  3. Noise variance is cst over range of data collected
  4. Noise in each obs is statistically ind from noise in other obs
  5. Typically assume noise is Normally distr