Flashcards in GEOG364 Final Deck (141):

1

## runs count

### a one dimensional autocorrelation measure
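As a rough illustration, a runs count on a one dimensional sequence can be sketched as below. The function name `count_runs` and the coin-flip sequence are made up for this example.

```python
def count_runs(sequence):
    """Count runs: maximal stretches of identical consecutive outcomes."""
    if not sequence:
        return 0
    runs = 1
    for previous, current in zip(sequence, sequence[1:]):
        if current != previous:  # a change of outcome starts a new run
            runs += 1
    return runs


flips = "HHHTTHTTTT"  # hypothetical sequence of coin flips
print(count_runs(flips))  # 4 runs: HHH, TT, H, TTTT
```

Few runs relative to the length of the sequence suggest clustering; many runs suggest alternation.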

2

## joins count

### a two dimensional autocorrelation measure
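A minimal sketch of a joins count on a small grid, assuming two categories (B and W) and rook adjacency (shared edges only). The grids and the function name `count_joins` are hypothetical.

```python
def count_joins(grid):
    """Count BB, WW and BW joins under rook adjacency (shared edges only)."""
    counts = {"BB": 0, "WW": 0, "BW": 0}
    n_rows, n_cols = len(grid), len(grid[0])
    for r in range(n_rows):
        for c in range(n_cols):
            # look only right and down so each join is counted once
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    a, b = grid[r][c], grid[rr][cc]
                    counts[a + b if a == b else "BW"] += 1
    return counts


clustered = ["BBWW",
             "BBWW"]
checkerboard = ["BWBW",
                "WBWB"]
print(count_joins(clustered))     # {'BB': 4, 'WW': 4, 'BW': 2}
print(count_joins(checkerboard))  # {'BB': 0, 'WW': 0, 'BW': 10}
```

Many like joins (BB, WW) point to positive autocorrelation; many unlike joins (BW) point to negative autocorrelation.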

3

## spatial autocorrelation generally explained

###
the correlation of a variable to itself through space

similarity in position vs similarity in attributes

4

## free sampling and example

###
the outcome is always random and not determined by previous results

example being flipping a coin

5

## non-free sampling and example

###
when the outcome is affected by the previous result

example being a card being picked from a deck. each card taken affects the probability of the next card

6

## 4 factors that can dramatically influence spatial autocorrelation results

###
a sample size smaller than 30

one category of values occurs in less than 20% of the data

the region is elongated and has few joins

there are a couple of features with many joins and some with very few

7

## name a limitation of joins counts

###
it does not work for numeric data

numbers can be reclassed as "high/low," but this throws away much information

8

## what are the two alternatives to joins counts for measuring spatial autocorrelation

###
moran's i

geary's c

9

## in general, what does moran's i and geary's c measure?

### they compare the differences in values between neighbors to the differences in values across the entire study area

10

## in moran's i or geary's c what does it mean if the difference between neighboring features is less than between all other features

### it would mean that the neighboring features could be considered clustered

11

## which spatial autocorrelation uses squared differences between adjacent cases

### geary's c

12

## which spatial autocorrelation measure uses a covariance term

### moran's i

13

## name two similarities between geary's c and moran's i FORMULAs

###
they both divide by total "w" to account for the number of pairs of cases

they both divide by a variance term in order to account for range of data
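Both formulas can be sketched in plain Python to make the shared pieces visible: the division by total w and by a variance term. This is a sketch under assumptions, not the course's own implementation: it uses binary weights for cells arranged in a single row, and the helper `chain_weights` and the sample values are hypothetical.

```python
def chain_weights(n):
    """Binary weights for n cells in a row: w = 1 for immediate neighbors."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_w = sum(sum(row) for row in weights)
    # covariance term: products of deviations for neighboring pairs
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(d * d for d in dev)


def gearys_c(values, weights):
    n = len(values)
    mean = sum(values) / n
    total_w = sum(sum(row) for row in weights)
    # squared differences between neighboring pairs
    squared = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    variance_term = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * total_w)) * squared / variance_term


w = chain_weights(4)
clustered = [1, 1, 5, 5]
alternating = [1, 5, 1, 5]
print(morans_i(clustered, w), gearys_c(clustered, w))      # I > 0, c < 1
print(morans_i(alternating, w), gearys_c(alternating, w))  # I < 0, c > 1
```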

14

## explain what -1, 0, and 1 would mean in a spatial autocorrelation analysis

###
it would mean you are using moran's i

-1 means negative autocorrelation and the data is dispersed

0 means there is no autocorrelation and pattern is random

1 would mean positive autocorrelation and attributes are clustered

15

## explain what 0, 1, and 2 would mean in a spatial autocorrelation analysis

###
it would mean you are using geary's c

0 means positive autocorrelation and values are clustered

1 means no autocorrelation with random values and no apparent pattern

2 means negative spatial autocorrelation with dispersed value (high-low)

16

## match the numbers of moran's i to geary's c

###
-1 = 2 = negative spatial auto

0 = 1 = no autocorrelation

1 = 0 = positive autocorrelation

17

## what does the w represent in a spatial autocorrelation analysis?

###
the weight given to each pair of features to set adjacency

for example, what distance/time/cost would make two features neighbors?

18

## what is the alternative method for testing significance when testing geary's c or moran's i?

### the monte carlo simulation

19

## what does monte carlo simulation do?

### it generates a sampling distribution for a given test statistic by repeatedly reshuffling the observed values at random. the observed test statistic is then compared against this distribution to assess significance
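One common way to do this is a permutation test: shuffle the observed values across locations many times, recompute the statistic each time, and see how often the shuffled statistic is at least as large as the observed one. The sketch below assumes Moran's I with binary weights along a single row of cells; the function names, the 999 permutations, and the data are all illustrative choices.

```python
import random


def morans_i(values, weights):
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_w = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(d * d for d in dev)


def permutation_p_value(values, weights, n_permutations=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between value and location
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    # the +1 counts the observed arrangement as one possible outcome
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


n = 10
weights = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
values = list(range(1, n + 1))  # hypothetical smooth trend along the row
observed_i, p_value = permutation_p_value(values, weights)
print(observed_i, p_value)
```

A small p-value means that very few random arrangements are as clustered as the observed one.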

20

## global statistics

### a single value that summarizes a characteristic for an entire study region

21

## why is it important to use measures of autocorrelation in a region?

### spatial homogeneity does not exist over global regions/entire study area

22

## what do you call it when autocorrelation is low in one area of a region and high in another

###
spatial heterogeneity

23

## LISA

###
local indicators of spatial autocorrelation

local versions of geary's c and moran's i

24

## what does LISA measure that is different than geary's c or moran's i?

### LISA measures levels of particular clusters, not overall clustering

25

## what is the preferred test of choice for local clustering measures

### moran's i

26

## name 4 objectives of a regression analysis

###
to determine whether a relationship exists

to describe the nature of the relationship mathematically

to assess the degree of accuracy with which the model represents the relationship

in the case of multiple regression, to understand the relative importance of individual independent variables

27

## regression VS correlation

###
correlation provides us with the extent of a relationship between two variables

a regression analysis provides us with the nature of that relationship

28

## y in regression

### the dependent variable

29

## x in regression

### the independent variable

30

## a and b in regression

### the regression coefficients: a is the intercept and b is the slope

31

## e in regression

### the random error or residual that the model does not account for

32

## what is the line and what does it show in regression

### the line is a statistical model that shows the expected mean value of y for each value x

33

## how do we create a regression line

### by applying a least squares criterion

34

## what does the least squares criterion do

### it chooses the line that minimizes the sum of the squared vertical differences (residuals) between the line it creates and the data points that are given
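For simple regression (y = a + bx + e) the least squares line has a closed-form solution. The sketch below uses made-up data and hypothetical function names, and also checks that nudging a or b away from the fitted values makes the sum of squared residuals larger.

```python
def fit_line(xs, ys):
    """Least squares estimates of intercept a and slope b in y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """The quantity that the least squares criterion minimizes."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observations
a, b = fit_line(xs, ys)
print(a, b)  # roughly a = 0.05, b = 1.99
print(sum_squared_residuals(xs, ys, a, b))
print(sum_squared_residuals(xs, ys, a + 0.1, b))  # larger than the fitted line
```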

35

## what are the 4 steps to a regression analysis

###
1. specify independent and dependent variables

2. use sample data to estimate a and b in the model

3. estimate model error and check assumptions

4. evaluate the statistical usefulness of the model

36

## can regression describe causality?

### NO, it only helps describe the nature of the relationship

37

## what are the 4 assumptions made in a regression analysis

###
1. mean error is 0

2. variance of the error is constant across x values

3. error is normally distributed

4. errors are independent of one another (no relationship exists between the residual/error and x)

38

## is regression an extension of correlation or is correlation an extension of regression?

### regression is an extension of correlation

39

## what does ANOVA measure

### it measures the variance and overall significance of a regression model

40

## how does the size of residuals affect a regression model

### smaller residuals mean that the line is a good fit and the model is accurate

41

## what is the range for r squared values

### 0-1
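The two ends of that range can be illustrated with a small sketch, assuming the usual definition of r squared as 1 minus (residual sum of squares / total sum of squares). The data and the function name are hypothetical.

```python
def r_squared(observed, predicted):
    """Share of the variation in y that the model accounts for."""
    mean_y = sum(observed) / len(observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_residual / ss_total


ys = [2, 4, 6, 8]
print(r_squared(ys, [2, 4, 6, 8]))  # 1.0: predictions match every observation
print(r_squared(ys, [5, 5, 5, 5]))  # 0.0: no better than predicting the mean
```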

42

## what does an r squared value of 0 or 1 mean

###
0 means the line explains none of the variation in y (it does no better than the mean of y)

1 means the line fits perfectly and there is no difference between observed and predicted values

43

## what may r squared look like in a software output

### ESS

44

## what does standard error of estimates show

###
it estimates the standard deviation of the errors/residuals

how close are the observed values to the line?

about 95% of the observed values should fall within 2 standard errors of the fitted line

45

## what is a regression model not good for?

### estimating a value outside the range of observed values (EXTRAPOLATION)

46

## what is the difference between multiple regression and simple regression

### multiple regression uses multiple independent variables

47

## name an example of a multiple regression

### a linear trend surface

48

## multicollinearity

###
high correlation among the independent variables in a multiple regression analysis

multiple regression assumes it is absent, i.e. that independent variables do not exhibit high correlation among each other

49

## what is trend surface analysis an example of

### how regression analysis can be applied to spatial problems

50

## what does ANOVA stand for

### analysis of variance

51

## what is a synonym of ANOVA

### statistical analysis, but ANOVA goes over the top

52

## what does ANOVA address?

### different types of variance and then relates them to overall variance

53

##
how could you apply ANOVA to following regression

predicting plant growth by fertilizer application

### you could additionally ask whether different types of fertilizer have varying effects on plant growth

54

## what is a name for two or more categorical predictor variables in ANOVA

### factors

55

## in terms of columns and rows what does ANOVA compare?

### the variation among values within each column to the variation between the different columns

56

## name the 4 probability distributions

###
normal

z

t

f

57

## what probability distribution does ANOVA use

### f distribution

58

## what is the f statistic

### a measure for the ratio of the first sample variance to the second sample variance

59

## what is a HUGE factor in determining variance values

### sample size

60

## the larger the sample size, the _____ the sample variance

### smaller

61

## how do we determine degrees of freedom

### sample size minus 1

62

## when is ANOVA very useful

###
when predictor variables are categorical

gender, regions, beer labels

63

## what two groups is variance split into in an ANOVA distribution

###
within group variance and between group variance

INTRA AND INTER
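A minimal one-way ANOVA sketch showing that split, with the f statistic computed as the ratio of the between group to the within group variance estimate. The three groups (say, plant growth under three fertilizers) and the function name are hypothetical.

```python
def one_way_anova(groups):
    """Return (between-group SS, within-group SS, F statistic)."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    # between (INTER): how far each group mean sits from the grand mean
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # within (INTRA): how far each value sits from its own group mean
    ss_within = sum((v - m) ** 2
                    for group, m in zip(groups, group_means) for v in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
ss_between, ss_within, f_stat = one_way_anova(groups)
print(ss_between, ss_within, f_stat)  # 54.0 6.0 27.0
```

The two pieces add up to the total variation around the grand mean (54 + 6 = 60 here). Judging significance would still require comparing the f statistic against the f distribution.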

64

## why are interaction effects important

### think about the gender and beer goggles example. beer has a very different effect on different genders. but when looking at the beer goggles effect free of gender the results are very different

65

## what is the main advantage of using anova

### it allows for individual studies to be replaced by one study that compares more factors

66

## what happens when your test statistics are not continuous, but categorical

### use non parametric stats

67

## name 2 parameters

### mean and standard deviation

68

## give three examples of non parametric data

###
raw counts

number of protected plants in a forest that are stable or declining

number of people receiving social security or not

number of crimes in spring VS summer vs Winter

69

## what is another way to describe the chi square test

### a goodness of fit test

70

## what are two variables in the chi square formula

### expected and observed frequencies
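A small sketch of the chi square statistic, the sum of (observed - expected)² / expected over all categories. The seasonal crime counts are invented, and the expected counts assume crimes are spread evenly across the three seasons.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [30, 50, 40]  # hypothetical crimes in spring, summer, winter
expected = [40, 40, 40]  # what an even split of 120 crimes would look like
print(chi_square(observed, expected))  # 5.0
```

The larger the statistic, the worse the observed counts fit the expected ones. Significance is then read from the chi square distribution with (number of categories - 1) degrees of freedom.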

71

## what is the most popular non parametric test

### chi square

72

## what is the arguable 5th scale of measurement?

###
cyclic

compass directions, months of year

what would the avg direction be between north and south?

73

## main objective of descriptive stats

### organization and summary of data

74

## what is the main difference between descriptive and inferential stats

### inferential stats provide insights of a population on the basis of SAMPLES and test a hypothesis

75

## the three measures of central tendency

###
mean

median

mode

76

## the three measures of dispersion

###
range

iqr

variance and/or stand dev

77

## how do you find range of data

### it is the difference between the highest and lowest valued observation

78

## what is the IQR

### the difference between the first and third quartile

79

## variance

### the average of the squared differences between each value and the mean

80

## what is stand dev

### the square root of the variance
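The measures of dispersion can be computed with Python's standard `statistics` module. This sketch uses made-up data and the population versions of variance and stand dev; the sample versions (`statistics.variance`, `statistics.stdev`) divide by n - 1 instead. Quartile conventions differ between textbooks and software, so the IQR may not match a hand calculation exactly.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

data_range = max(data) - min(data)                  # highest minus lowest
q1, _median, q3 = statistics.quantiles(data, n=4)   # quartile cut points
iqr = q3 - q1                                       # third minus first quartile
variance = statistics.pvariance(data)               # mean squared deviation
std_dev = statistics.pstdev(data)                   # square root of variance

print(data_range, iqr, variance, std_dev)
```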

81

## what is first order variation

### changes in observation in spatial autocorrelation are due to changes in local environment

82

## what is second order variation

### variation in spatial autocorrelation is due to relationship with other attributes - not the environment itself

83

## ecological fallacy

### assuming that a relationship observed in aggregated data (groups or areas) also holds for the individuals within them

84

## MAUP

### modifiable areal unit problem: changing the boundaries, scale, or extent of the areal units can change the results and the display of the data

85

## non uniformity of space

### coastal areas may have more cases of the flu not because they are near the water, but because they also often have higher population density.

86

## edge effects

### entities may only have a neighbor on one side. think of a crime map of mexico along the US border without US data

87

## what is the difference between euclidean and manhattan block distance

###
we can consider euclidean as the crow flies

manhattan must go around edges, following a grid of blocks (distance in x plus distance in y)
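A quick sketch of the two distances for points given as (x, y) pairs. The points and function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Straight-line distance, as the crow flies."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def manhattan(p, q):
    """Distance along a grid: change in x plus change in y."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


start, end = (0, 0), (3, 4)
print(euclidean(start, end))  # 5.0
print(manhattan(start, end))  # 7
```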

88

## quantile classification

### every class contains the same number of entities

89

## equal interval classification

###
dividing your data into equal intervals

the difference between the highest and lowest value in each class is the same
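The contrast between equal interval and quantile classification can be sketched on a small, unevenly distributed dataset. The values and both function names are made up for illustration.

```python
def equal_interval_breaks(values, n_classes):
    """Class boundaries that split the data range into equal-width classes."""
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    return [low + width * k for k in range(1, n_classes)]


def quantile_classes(values, n_classes):
    """Split the sorted values so each class holds (about) the same count."""
    ordered = sorted(values)
    size = len(ordered) / n_classes
    return [ordered[round(size * k):round(size * (k + 1))]
            for k in range(n_classes)]


values = [1, 2, 3, 4, 5, 6, 50, 80, 100]  # hypothetical, unevenly distributed
print(equal_interval_breaks(values, 3))  # [34.0, 67.0]
print(quantile_classes(values, 3))       # [[1, 2, 3], [4, 5, 6], [50, 80, 100]]
```

With equal interval breaks at 34 and 67, six of the nine values land in the lowest class, while the quantile scheme puts three values in every class.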

90

## advantage and disadvantage of natural breaks

###
advantage is that it is good for unevenly distributed data

disadvantage is that datasets cannot be easily compared

91

## quantile advantage and disadvantage

###
advantage is that relative positions (top 20%) can be shown GOOD for evenly distributed data

disadvantage is that the breaks are unnatural

92

## equal interval advantage and disadvantage

###
good for mapping continuous data and is easy to understand

disadvantage is that if data is clustered, some classes will hold most of the features while others are nearly empty

93

## goods and bads of stand dev classification

###
it is good for normally distributed data and getting an idea of how data compares to mean

disadvantage is that the actual values are not displayed and outliers strongly influence mean

94

## what classification scheme should be used for evenly/unevenly distributed data

###
for evenly distributed data use equal interval, stand dev, or quantile

for uneven use natural breaks

95

## about how many classes should be used

### use between 3 and 7 classes

96

## mean center

###
simply the average of the x and y coordinates

(center of gravity)

97

## what is the problem with mean center

### outliers affect the hell out of it

98

## what is an example of weighted mean center

### rather than simply finding the mean center of national parks, weight each park's coordinates by the number of visitors it has per year
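A sketch of the mean center and its weighted version. The park coordinates and visitor counts are invented, and `mean_center` is a hypothetical helper name.

```python
def mean_center(points, weights=None):
    """Average x and y, optionally weighting each point (e.g. by visitors)."""
    if weights is None:
        weights = [1] * len(points)  # unweighted: every point counts equally
    total = sum(weights)
    x = sum(w * px for (px, _py), w in zip(points, weights)) / total
    y = sum(w * py for (_px, py), w in zip(points, weights)) / total
    return x, y


parks = [(0, 0), (10, 0), (0, 10)]   # hypothetical park coordinates
visitors = [1, 1, 2]                 # hypothetical visitors (millions/year)
print(mean_center(parks))            # about (3.33, 3.33)
print(mean_center(parks, visitors))  # (2.5, 5.0), pulled toward the busy park
```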

99

## median center

### the coordinate with the shortest total distance to all features in the study

100

## central feature

### the FEATURE with the shortest distance to all other features

101

## median center vs central feature

###
median center does not need to exist

central feature must exist

median calculates the most accessible location while central feature finds the most accessible entity

102

## what are the three defining parameters of a standard deviational ellipse

###
the dispersion along the major axis

the dispersion along the minor axis

the angle of rotation

103

## what is the difference between absolute and relative frequencies

###
relative frequencies are absolute frequencies divided by total number of observations

all of them will add up to 1

104

## what is the link between observed data and the normal distribution curve

### the z score
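A short sketch of that link: the z score expresses an observation as a number of stand devs from the mean, which can then be looked up on the standard normal curve. The numbers are hypothetical, and `statistics.NormalDist` (Python 3.8+) stands in for a z table.

```python
from statistics import NormalDist


def z_score(value, mean, std_dev):
    """How many stand devs the value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


z = z_score(130, mean=100, std_dev=15)
share_below = NormalDist().cdf(z)  # area under the standard normal curve

print(z)                      # 2.0
print(round(share_below, 4))  # 0.9772
```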

105

## population

### total set of elements under examination in a study

106

## sample

### group of elements actually studied

107

## census

### when an entire population is studied

108

## sampling error

### when uncertainty arises from working with a sample rather than a population

109

## sampling bias

### when the samples used over-represent or under-represent certain population characteristics

110

## central limit theorem

###
if many samples of the same size are taken, the distribution of their sample means will be approximately normal

the mean should be the same as the population mean
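The theorem can be checked with a small simulation. This sketch assumes a strongly skewed (exponential) population with mean 1.0, a sample size of 30, and 2000 repeated samples; all of these are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def sample_mean(sample_size):
    """Mean of one random sample from an exponential population (mean 1.0)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])


sample_means = [sample_mean(30) for _ in range(2000)]
mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)

print(mean_of_means)    # close to the population mean of 1.0
print(spread_of_means)  # close to 1 / sqrt(30), about 0.18
```

Even though the population is skewed, a histogram of `sample_means` would look roughly bell shaped and be centered near the population mean.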

111

## what is a type 1 error

### the null hypothesis is true, but we reject it

112

## what is a type 2 error

### the null hypothesis is false, but we do not reject it

113

## what type of error is it if the alternative hypothesis is true, but we accept the null

### type 2

114

## what type of error is it if the alternative hypothesis is false, but we reject the null

### type 1

115

## what kind of associations can there be between 2 variables?

### experimental and correlational

116

## experimental correlation

### we are in charge of one of the variables

117

## correlational correlation

### we simply observe both variables without controlling either one

118

## what does pearson's r measure?

### the strength of a linear relationship between two variables

119

## what is the value range for pearson's r?

### -1 to 1
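Both ends of that range can be shown with a plain-Python sketch of pearson's r, the covariance of x and y divided by the product of their spreads. The data and function name are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Strength and direction of the linear relationship between x and y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4]
print(pearson_r(xs, [2, 4, 6, 8]))  # 1.0: y rises in step with x
print(pearson_r(xs, [8, 6, 4, 2]))  # -1.0: y falls as x rises
```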

120

## what would the pearson value be if both x and y increase simultaneously

###
near 1

positive

121

## what are 2 conditions that should be had if using pearson's r

###
the data should not contain extreme outliers

the variance of x and the variance of y should be roughly equal - homoscedasticity

122

## what happens to mean and variation when data is aggregated?

### variation is minimized, but mean remains constant

123

## problem with MAUP and data aggregation

### if you aggregate data n/s vs e/w the aggregated results will be different

124

## what kind of distance can have multiple shortest routes?

### manhattan distance

125

## is adjacency a binary concept?

### yes

126

## how do you calculate margin of error

### plus or minus 1/(SQRT(N))
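This rule of thumb is commonly used as a rough 95% margin for a proportion. A tiny sketch with arbitrary sample sizes shows how it behaves:

```python
import math


def margin_of_error(sample_size):
    """Rule-of-thumb margin of error: plus or minus 1 / sqrt(N)."""
    return 1 / math.sqrt(sample_size)


for n in (100, 400, 1600):
    print(n, margin_of_error(n))  # 0.1, 0.05, 0.025
```

Quadrupling the sample size only halves the margin of error.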

127

## what happens to margin of error as the sample size increases

### it lowers

128

## when will a distribution table be two tailed

### when the alternative hypothesis is non-directional (it only says "not equal")

129

## what is the empirical rule

### for normally distributed data, about 68% of data lies within 1 stand dev of mean, about 95% within 2, and about 99.7% within 3
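The 68% figure, and the matching figures for 2 and 3 stand devs, can be checked against the standard normal curve using `statistics.NormalDist` (Python 3.8+). The helper name `share_within` is made up.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution lying within k stand devs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # 0.6827, 0.9545, 0.9973
```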

130

## define type 1 and 2 errors simply

###
if ho is true and we reject it: type 1

if ho is false and we do not reject it: type 2

131

## why must we use INVERSE distance weighting

### if we used the raw distances as weights, then features at greater distances would have a greater effect, but we want them to have less of an effect because they are far away
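A minimal sketch of inverse distance weights, assuming the simple form w = 1 / distance^power. The `power` parameter and the distances are illustrative assumptions.

```python
def inverse_distance_weight(distance, power=1):
    """Weight that shrinks as distance grows (distance must be > 0)."""
    return 1 / distance ** power


for d in (1, 2, 10):
    print(d, inverse_distance_weight(d))  # 1.0, 0.5, 0.1
```

A higher `power` makes the influence of distant features drop off faster.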

132

## Does a value of .8 mean moran's I is significant?

### no, moran's i indicates the strength of a correlation and significance must be addressed in an entirely different manner

133

## explain clusters vs clustering

###
when we say clusters we are referring to a specific cluster of high values (for example counties)

if we speak of clustering we may be discussing the general amount of clusters all throughout Pennsylvania

134

## what is the difference between pearson's r and spearman's correlation coefficient?

###
pearson's r refers to a parametric test involving two quantitative variables

spearman's refers to a non parametric test used for ranked or ordinal data

135

## when is mean not a good measure of center?

### when the data is not normally distributed, for example when it is skewed left or right

136

## give an example of a run in coins

### having 8 heads in a row would be a run

137

## give an example of a join using coins

### a join would be having a head, then a tail

138

## what determines whether it will be one or two tailed

### the alternative hypothesis

139

## when will you have a two tailed distribution

### when the alternative hypothesis only states that the value being tested does not equal the control, without giving a direction

140

## how do you standardize a row

###
divide the weight in question by the sum of the entire row.

basically it's getting the percentage
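Row standardization of a small weights matrix can be sketched as follows. The 3 x 3 binary matrix is hypothetical, and rows with no neighbors are left as zeros here, which is one possible convention.

```python
def row_standardize(weights):
    """Divide each weight by its row total so every row sums to 1."""
    standardized = []
    for row in weights:
        total = sum(row)
        # a feature with no neighbors keeps a row of zeros
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized


weights = [[0, 1, 1],
           [1, 0, 0],
           [1, 1, 0]]
print(row_standardize(weights))
# [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
```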

141