Regression Flashcards

1
Q

Regression

A

Regression is a statistical method, widely used in machine learning, that models a target value
based on independent predictors. It estimates the relationship between a dependent variable and one
or more independent variables. It is used in forecasting and in investigating cause-and-effect
relationships between variables.

2
Q

Regression techniques differ based on:

A
  1. The number of independent variables
  2. The type of relationship between the independent and dependent variable
3
Q

Data types used in regression

A

Regression is typically performed when the dependent variable is continuous. The
independent variables, however, can be of any data type: continuous, nominal/categorical, etc.

4
Q

What regression methods do

A

Regression methods find the line that describes the relationship between the dependent
variable and the predictors with the least error. In regression, the dependent variable is a function of the
independent variables, their coefficients, and an error term.

5
Q

Correlation

A

Correlation is a measure of the strength of a linear relationship between two quantitative variables
(e.g. price and sales).

  • Correlation is positive when the values increase together
  • Correlation is negative when one value decreases as the other increases
6
Q

Correlation can have a value between -1 and 1:

A
  • 1 is a perfect positive correlation
  • 0 is no correlation (the values don’t seem linked at all)
  • -1 is a perfect negative correlation
7
Q

Cross tabs

A

Cross tabs (cross-tabulations) help us establish a relationship between two variables. The relationship is displayed in tabular form.

8
Q

Column percentages

A

These are percentages calculated within each column, so that each column’s
percentages add up to 100%.
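
As a rough illustration of column percentages, the sketch below builds a small cross tab from made-up (row, column) observations and converts each column to percentages. The survey data and the function name `column_percentages` are hypothetical and used only for this example.

```python
from collections import Counter

def column_percentages(pairs):
    """Return {column: {row: percent}} for a list of (row, column) observations."""
    counts = Counter(pairs)                          # cell counts keyed by (row, column)
    col_totals = Counter(col for _, col in pairs)    # number of observations per column
    table = {}
    for (row, col), n in counts.items():
        # Each cell is expressed as a share of its own column total
        table.setdefault(col, {})[row] = 100.0 * n / col_totals[col]
    return table

# Hypothetical survey: (preferred drink, age group)
observations = [
    ("tea", "under 30"), ("coffee", "under 30"),
    ("coffee", "under 30"), ("coffee", "under 30"),
    ("tea", "30 and over"), ("tea", "30 and over"),
    ("tea", "30 and over"), ("coffee", "30 and over"),
]

percentages = column_percentages(observations)
for col, rows in percentages.items():
    print(col, rows, "column total:", sum(rows.values()))
```

Each age group is a column here, so the percentages within each age group sum to 100 regardless of how many people fall in that group.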

9
Q

In cross tabs, when the variables are not ordered

A

When neither variable is ordered, we can simply refer to the strength of the
correlation without discussing its direction.

10
Q

Scatterplots

A
  • A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric
    variables.
  • The position of each dot on the horizontal and vertical axis indicates values for an individual
    data point.
  • Scatter plots are used to observe relationships between variables.
11
Q

What type of correlation is shown here?

A

This is a negative correlation. As we move along the x-axis toward the greater numbers,
the points move down, which means the y-values decrease as the x-values increase.

12
Q

Pearson’s r

A
  • The Pearson correlation coefficient is used to measure the strength of a linear association between
    two variables.
  • A value of r = 1 means a perfect positive correlation, and a value of r = -1 means a
    perfect negative correlation.
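
One way to see how r behaves is to compute it directly. The sketch below implements the usual formula (sum of products of deviations divided by the square root of the product of the sums of squared deviations); the function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])      # points exactly on a rising line
r_down = pearson_r(x, [10, 8, 6, 4, 2])    # points exactly on a falling line
r_noisy = pearson_r(x, [2, 5, 4, 8, 7])    # rising trend with some scatter
print(r_up, r_down, r_noisy)
```

The first two data sets lie exactly on a line, so r comes out at the extremes of its range; the third has scatter, so r falls strictly between 0 and 1.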
13
Q

Requirements for Pearson’s correlation coefficient

A
  • Scale of measurement should be interval or ratio
  • Variables should be approximately normally distributed
  • The association should be linear
  • There should be no outliers in the data
14
Q

What does this test do?
Pearson’s r

A
  • The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a
    measure of the strength of a linear association between two variables and is denoted by ‘r’.
  • In effect, a Pearson product-moment correlation attempts to draw a line of best fit through the data of
    the two variables.
  • The Pearson correlation coefficient, r, indicates how far the data points lie from this line of best
    fit (i.e., how well the data points fit this line).
15
Q

What values can the Pearson correlation coefficient take?

A
  • The Pearson correlation coefficient, r, can take a range of values from -1 to +1.
  • A value of 0 indicates that there is no linear association between the two variables.
  • A value greater than 0 indicates a positive association; that is, as the value of one variable
    increases, so does the value of the other variable.
  • A value less than 0 indicates a negative association; that is, as the value of one variable
    increases, the value of the other variable decreases.
16
Q

How can we determine the strength of association based on the Pearson correlation coefficient?

A
  • The stronger the association of the two variables, the closer the Pearson correlation coefficient, r, will
    be to either +1 or -1 depending on whether the relationship is positive or negative, respectively.
  • Achieving a value of +1 or -1 means that all your data points are included on the line of best fit: there
    are no data points that show any variation away from this line.
  • Values for r between +1 and -1 (for example, r = 0.8 or -0.4) indicate that there is variation around
    the line of best fit. The closer the value of r is to 0, the greater the variation around the line of
    best fit.
17
Q

If we use a simple linear regression model where y depends on x, then the regression line
of y on x is:

A

y = a + bx

18
Q

Regression constants

A

The two constants a and b are the regression parameters: a is the intercept and b is the slope.
The constant b is denoted b_yx and is called the regression coefficient of y on x.

19
Q

The least squares method is suitable for

A
  • The line obtained by least squares can be called the line of best fit.
  • This method is the most suitable for estimating the value of y from x, i.e. the value of a
    dependent variable from an independent variable.
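
A minimal sketch of a least squares fit for the line y = a + bx follows. It uses the standard closed-form estimates (slope as the ratio of summed cross-deviations to summed squared x-deviations, intercept from the means); the function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: regression coefficient of y on x
    a = mean_y - b * mean_x    # intercept: the fitted line passes through (mean_x, mean_y)
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = least_squares_line(xs, ys)
print(a, b)
```

With data that lie exactly on a line, the fit recovers that line; with noisy data the same formulas return the line with the smallest total squared vertical distance to the points.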
20
Q

The standard form of the regression equation of variable x on y is:

A

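(x - x̄) / Sx = r (y - ȳ) / Sy

where x̄ and ȳ are the means of x and y, Sx and Sy are their standard deviations, and r is the Pearson correlation coefficient.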
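
Rearranging the standard form makes the slope of the line explicit. In the worked step below, the bars denote means, S_x and S_y denote standard deviations, and the symbol b_{xy} for the regression coefficient of x on y is an assumed notation chosen to mirror b_yx for y on x.

```latex
\frac{x - \bar{x}}{S_x} = r\,\frac{y - \bar{y}}{S_y}
\;\Longrightarrow\;
x - \bar{x} = r\,\frac{S_x}{S_y}\,(y - \bar{y}),
\qquad
b_{xy} = r\,\frac{S_x}{S_y}.
```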

21
Q

a regression line

A

In statistics, a regression line is a line that best describes the behaviour of a set of data. In other
words, it’s the line that best fits the trend of a given data set.

22
Q

The regression line formula is like the following:

A

Y = a + bX + u, where a is the intercept, b is the slope and u is the regression residual.

23
Q

The multiple regression formula looks like this

A

Y = a + b1X1 + b2X2 + b3X3 + … + btXt + u

where u is the regression residual (the error term).
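
To make the multiple regression formula concrete, here is a small sketch that estimates a, b1, ..., bt by solving the normal equations with plain Gaussian elimination. This is one possible approach, not the only one; the function names and the data are hypothetical, and the data are generated with no residual so the true coefficients are known.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_multiple_regression(X, y):
    """Return [a, b1, ..., bt] minimising squared error for Y = a + b1*X1 + ... + bt*Xt."""
    rows = [[1.0] + list(r) for r in X]      # the leading 1 carries the intercept a
    k = len(rows[0])
    # Normal equations: (X^T X) coeffs = X^T y
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)

# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 with u = 0
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
coeffs = fit_multiple_regression(X, y)
print(coeffs)
```

In practice a numerical library would normally handle this step; the hand-written solver is used here only to keep the example self-contained.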

24
Q

Purpose of a regression line

A
  • Regression lines are very useful for forecasting.
  • The purpose of the line is to describe the relationship between a dependent variable (Y) and one or
    more independent variables (X).
  • By using the equation of the regression line, an analyst can forecast future behaviour of the
    dependent variable by inputting different values for the independent ones.
  • Regression lines are widely used in the financial sector and in business in general.
  • Financial analysts employ linear regressions to forecast stock prices and commodity prices and to
    perform valuations for many different securities.
  • Companies employ regressions to forecast sales, inventories and many other variables that are
    crucial for strategy and planning.
25
Q

Correlation

A

Correlation is a statistical technique that tells us how strongly a pair of variables are linearly related and
change together.
- It does not tell us the why or how behind the relationship; it only says that the relationship exists.
- Example: the correlation between ice cream sales and sunglasses sold.

26
Q

Causation

A
  • Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a
    causal relationship between the two events. This is also referred to as cause and effect.