GEOG364 Final Flashcards

Flashcards in GEOG364 Final Deck (141):
1

runs count

a one dimensional autocorrelation measure

2

joins count

a two dimensional autocorrelation measure

3

spatial autocorrelation generally explained

the correlation of a variable with itself across space
similarity in position vs similarity in attributes

4

free sampling and example

the outcome is always random and not determined by previous results
example being flipping a coin

5

non-free sampling and example

when the outcome is affected by previous results
example being drawing cards from a deck without replacement: each card taken changes the probability of the next card

6

4 factors that can dramatically influence spatial autocorrelation results

a sample size smaller than 30
one category of values occurs in less than 20% of the data
the region is elongated and has few joins
there are a couple of features with many joins and some with very few

7

name a limitation of join counts

it only works for binary (categorical) data, not numeric data
numbers can be reclassed as "high/low," but this throws away much information

8

what are the two alternatives to join counts for measuring spatial autocorrelation?

moran's i
geary's c

9

in general, what do moran's i and geary's c measure?

they compare the differences between neighboring values to the differences among all values in the entire study area

10

in moran's i or geary's c, what does it mean if the differences between neighboring features are smaller than the differences among all features?

it would mean that similar values sit near each other, so the features could be considered clustered (positive spatial autocorrelation)

11

which spatial autocorrelation measure uses squared differences between adjacent cases?

geary's c

12

which spatial autocorrelation measure uses a covariance term

moran's i

13

name two similarities between the geary's c and moran's i formulas

they both divide by total "w" to account for the number of pairs of cases
they both divide by a variance term in order to account for range of data
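
Both formulas can be sketched in Python as below, assuming a binary weights matrix `w` where `w[i][j]` is 1 when features i and j are neighbors and 0 otherwise. The function names and the four-feature example are hypothetical. Both divide by the total weight and by a variance term; moran's i uses a covariance (cross-product) term while geary's c uses squared differences.

```python
def morans_i(x, w):
    """Global Moran's I: covariance of neighboring values / overall variance."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    total_w = sum(sum(row) for row in w)          # total "w" (sum of all weights)
    cross = sum(w[i][j] * dev[i] * dev[j]         # covariance term for neighbor pairs
                for i in range(n) for j in range(n))
    variance = sum(d * d for d in dev)            # variance term
    return (n / total_w) * (cross / variance)


def gearys_c(x, w):
    """Global Geary's C: squared neighbor differences / overall variance."""
    n = len(x)
    mean = sum(x) / n
    total_w = sum(sum(row) for row in w)
    sq_diff = sum(w[i][j] * (x[i] - x[j]) ** 2    # squared differences between neighbors
                  for i in range(n) for j in range(n))
    variance = sum((v - mean) ** 2 for v in x)
    return ((n - 1) * sq_diff) / (2 * total_w * variance)


# Hypothetical example: four features in a row, each adjacent to the next.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 1, 5, 5], w), gearys_c([1, 1, 5, 5], w))  # clustered: I > 0, C < 1
print(morans_i([1, 5, 1, 5], w), gearys_c([1, 5, 1, 5], w))  # dispersed: I < 0, C > 1
```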

14

explain what -1, 0, and 1 would mean in a spatial autocorrelation analysis

it would mean you are using moran's i
-1 means negative autocorrelation and the data is dispersed
0 means there is no autocorrelation and pattern is random
1 would mean positive autocorrelation and attributes are clustered

15

explain what 0, 1, and 2 would mean in a spatial autocorrelation analysis

it would mean you are using geary's c
0 means positive autocorrelation and values are clustered
1 means no autocorrelation with random values and no apparent pattern
2 means negative spatial autocorrelation with dispersed values (high next to low)

16

match the numbers of moran's i to geary's c

-1 = 2 = negative spatial autocorrelation
0 = 1 = no autocorrelation
1 = 0 = positive autocorrelation

17

what does the w represent in a spatial autocorrelation analysis?

the spatial weight that defines adjacency between features
for example, what distance/time/cost would make two features neighbors?

18

what is the alternative method for testing significance when testing geary's c or moran's i?

the monte carlo simulation

19

what does monte carlo simulation do?

it generates a sampling distribution for a given test statistic by repeatedly shuffling the observed values among the locations. the observed test statistic is then compared to this distribution to assess significance
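
A minimal sketch of a monte carlo (random permutation) test for moran's i, assuming values are shuffled among fixed locations and that 999 permutations is enough (a common but arbitrary choice). `monte_carlo_p` and the example data are hypothetical, and the pseudo p-value is one-sided, testing only for clustering.

```python
import random


def morans_i(x, w):
    """Global Moran's I for values x and weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    total_w = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(d * d for d in dev)


def monte_carlo_p(x, w, permutations=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation (clustering)."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    shuffled = list(x)
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)                  # randomly reassign values to locations
        if morans_i(shuffled, w) >= observed:
            as_extreme += 1
    # +1 counts the observed arrangement as one of the possible outcomes
    return observed, (as_extreme + 1) / (permutations + 1)


# Hypothetical example: ten features in a row, low values at one end, high at the other.
n = 10
w = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
print(monte_carlo_p([0] * 5 + [1] * 5, w))
```

A small p-value means few random rearrangements are as clustered as the observed map.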

20

global statistics

a single value that summarizes a characteristic for an entire study region

21

why is it important to use local measures of autocorrelation in a region?

spatial homogeneity rarely holds across an entire study area, so a single global value can hide local differences

22

what do you call it when autocorrelation is low in one area of a region and high in another

spatial heterogeneity

23

LISA

local indicators of spatial autocorrelation
local versions of geary's c and moran's i

24

what does LISA measure that is different than geary's c or moran's i?

LISA measures the clustering around each individual feature, identifying particular clusters rather than overall clustering
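
As a rough sketch, one common formulation of local moran's i gives one value per feature. The version below assumes a binary weights matrix; the function name and data are hypothetical. Positive values mean a feature sits among similar values (high-high or low-low), negative values mark a local outlier.

```python
def local_morans_i(x, w):
    """Local Moran's I for each feature: I_i = (z_i / m2) * sum_j w_ij * z_j."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]                 # deviations from the mean
    m2 = sum(d * d for d in z) / n            # variance of the values
    return [z[i] / m2 * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]


# Hypothetical example: four features in a row, each adjacent to the next.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(local_morans_i([1, 1, 5, 5], w))   # end features sit next to similar values
```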

25

what is the preferred test of choice for local clustering measures?

moran's i

26

name 4 objectives of a regression analysis

to determine whether a relationship exists
to describe the nature of the relationship mathematically
to assess the degree of accuracy with which the model represents the relationship
in the case of multiple regression, to understand the relative importance of individual independent variables

27

regression VS correlation

correlation provides us with the extent of a relationship between two variables
a regression analysis provides us with the nature of that relationship

28

y in regression

the dependent variable

29

x in regression

the independent variable

30

a and b in regression

the regression coefficients: a is the intercept and b is the slope

31

e in regression

the random error or residual that the model does not account for

32

what is the line and what does it show in regression

the line is a statistical model that shows the expected mean value of y for each value of x

33

how do we create a regression line

by applying the least squares criterion

34

what does the least squares criterion do?

it chooses the line that minimizes the sum of the squared vertical differences (residuals) between the line and the observed data points
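
The least squares line can be computed directly from the sample means. A minimal sketch with hypothetical data and a hypothetical function name; for a least squares line with an intercept, the residuals always sum to zero.

```python
def least_squares(x, y):
    """Return intercept a and slope b of the line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                  # slope
    a = mean_y - b * mean_x        # intercept: the line passes through the two means
    return a, b


# Hypothetical data
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # e = observed - predicted
print(a, b, residuals)
```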

35

what are the 4 steps to a regression analysis

1. specify independent and dependent variables
2. use sample data to estimate a and b in the model
3. estimate model error and check assumptions
4. evaluate the statistical usefulness of the model

36

can regression describe causality?

NO, it only helps describe the nature of the relationship

37

what are the 4 assumptions made in a regression analysis

1. mean error is 0
2. variance of the error is constant across x values
3. error is normally distributed
4. the errors are independent: no relationship exists between x and the residual/error

38

is regression an extension of correlation or is correlation an extension of regression?

regression is an extension of correlation

39

what does ANOVA measure

it partitions the variance and tests the overall significance of a regression model

40

how does the size of residuals affect a regression model

smaller residuals mean that the line is a good fit and the model is accurate

41

what is the range for r squared values

0-1

42

what does an r squared value of 0 or 1 mean

0 means the line explains none of the variation in y (a poor fit)
1 means the line explains all of the variation in y (a perfect fit with no residuals)
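
One common way to compute r squared is 1 minus the residual sum of squares over the total sum of squares. A minimal sketch (function name hypothetical) showing both extremes of the range:

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)                   # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))       # residual sum of squares
    return 1 - rss / tss


y = [1, 2, 3]
print(r_squared(y, [1, 2, 3]))   # 1.0: the line hits every point
print(r_squared(y, [2, 2, 2]))   # 0.0: the line is just the mean of y
```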

43

what may r squared look like in a software output

ESS (explained sum of squares) relative to the total sum of squares: r squared = ESS / TSS

44

what does standard error of estimates show

it estimates the standard deviation of the errors/residuals
how close are the observed values to the line?
about 95% of observed values should fall within 2 standard errors of the fitted line (assuming normally distributed errors)

45

what is a regression model not good for?

estimating a value outside the range of the observed values (EXTRAPOLATION)

46

what is the difference between multiple regression and simple regression

multiple regression uses multiple independent variables

47

name an example of a multiple regression

a linear trend surface

48

multicollinearity

high correlation among the independent variables in a multiple regression analysis
the analysis assumes it is absent, i.e. that the independent variables do not exhibit high correlation with each other

49

what is trend surface analysis an example of

how regression analysis can be applied to spatial problems

50

what does ANOVA stand for

analysis of variance

51

what is a synonym of ANOVA

statistical analysis, but ANOVA goes over the top

52

what does ANOVA address?

it splits the total variance into different sources (within groups and between groups) and relates each to the overall variance

53

how could you apply ANOVA to the following regression:
predicting plant growth from fertilizer application

you could additionally ask whether different types of fertilizer have different effects on plant growth

54

what are categorical predictor variables called in ANOVA?

factors

55

in terms of columns and rows what does ANOVA compare?

it compares the variation of values within each column (group) to the variation between the columns (groups)

56

name the 4 probability distributions

normal
z
t
f

57

what probability distribution does ANOVA use

f distribution

58

what is the f statistic

the ratio of one sample variance to another; in ANOVA, the between-group variance divided by the within-group variance

59

what is a HUGE factor in determining variance values

sample size

60

the larger the sample size, the _____ the sampling variance (the variance of the sample mean)

smaller

61

how do we determine degrees of freedom

sample size minus 1

62

when is ANOVA very useful

when predictor variables are categorical
gender, regions, beer labels

63

what two parts is variance split into in an ANOVA?

within group variance and between group variance
INTRA AND INTER
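
A minimal sketch of a one-way ANOVA F statistic, assuming equal variances across groups; the function name and fertilizer numbers are hypothetical. A large F means the between-group (inter) variance is large relative to the within-group (intra) variance; F would then be compared to an F distribution with k - 1 and n - k degrees of freedom.

```python
def one_way_anova_f(groups):
    """F = between-group (inter) variance / within-group (intra) variance."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)      # degrees of freedom: k - 1
    ms_within = ss_within / (n - k)        # degrees of freedom: n - k
    return ms_between / ms_within


# Hypothetical plant growth (cm) under two fertilizer types
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))   # 13.5
```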

64

why are interaction effects important

an interaction means the effect of one factor depends on the level of another. think about the gender and beer goggles example: alcohol has a very different effect on each gender, so looking at the beer goggles effect while ignoring gender gives a misleading result

65

what is the main advantage of using anova

it allows for individual studies to be replaced by one study that compares more factors

66

what do you use when your data are categorical rather than continuous?

non-parametric statistics

67

name 2 parameters

mean and standard deviation

68

give three examples of non-parametric data

raw counts, such as:
number of protected plants in a forest that are stable or declining
number of people receiving social security or not
number of crimes in spring vs summer vs winter

69

what is another way to describe the chi square test

a goodness of fit test

70

what are two variables in the chi square formula

the observed and expected frequencies
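
The chi square statistic sums the squared gaps between observed and expected counts, each scaled by the expected count. A minimal sketch with hypothetical counts and a hypothetical function name:

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical crime counts in spring, summer, winter, compared with an even split
observed = [30, 50, 40]
expected = [40, 40, 40]
print(chi_square(observed, expected))   # 5.0, compared to chi-square with 3 - 1 = 2 df
```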

71

what is the most popular non parametric test

chi square

72

what is the arguable 5th scale of measurement?

cyclic
compass directions, months of year
what would the avg direction be between north and south?
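
Because bearings are cyclic, averaging the raw numbers fails: the arithmetic mean of 350 and 10 degrees is 180 (due south) although both point roughly north. A common workaround, sketched below assuming bearings in degrees clockwise from north (the function name is hypothetical), turns each bearing into a unit vector, averages, and converts back. North and south cancel out, so they have no meaningful average direction.

```python
import math


def mean_direction(bearings):
    """Average compass bearings (degrees) as unit vectors; None if they cancel out."""
    s = sum(math.sin(math.radians(b)) for b in bearings)   # east component
    c = sum(math.cos(math.radians(b)) for b in bearings)   # north component
    if math.hypot(s, c) < 1e-9:
        return None                      # e.g. north and south: no meaningful average
    # round away floating-point noise, then map into [0, 360)
    return round(math.degrees(math.atan2(s, c)), 6) % 360


print(mean_direction([0, 90]))     # north and east: about 45 (northeast)
print(mean_direction([350, 10]))   # about 0 (north), not 180
print(mean_direction([0, 180]))    # north and south: None
```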

73

main objective of descriptive stats

organization and summary of data

74

what is the main difference between descriptive and inferential stats

inferential stats draw conclusions about a population on the basis of SAMPLES and test hypotheses

75

the three measures of central tendency

mean
median
mode

76

the three measures of dispersion

range
iqr
variance and/or stand dev

77

how do you find range of data

it is the difference between the highest and lowest valued observation

78

what is the IQR

the difference between the first and third quartile

79

variance

the average squared difference between each value and the mean (dividing by n - 1 for a sample); it measures how much the values differ from the mean

80

what is stand dev

the square root of the variance
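
The measures of dispersion above can be computed with Python's standard `statistics` module; a minimal sketch with a hypothetical sample. Quartiles have several conventions, and `statistics.quantiles` defaults to its "exclusive" method, so other software may report a slightly different IQR.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical sample

data_range = max(values) - min(values)            # highest minus lowest
q1, _, q3 = statistics.quantiles(values, n=4)     # quartile cut points
iqr = q3 - q1                                     # third quartile minus first
variance = statistics.pvariance(values)           # mean squared deviation (divides by n)
sample_variance = statistics.variance(values)     # sample version (divides by n - 1)
std_dev = statistics.pstdev(values)               # square root of the variance

print(data_range, iqr, variance, sample_variance, std_dev)
```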

81

what is first order variation

observed values change across space because of changes in the underlying local environment

82

what is second order variation

values vary because observations interact with or influence their neighbors (spatial dependence), not because of the environment itself

83

ecological fallacy

inferring characteristics of individuals from data aggregated for a group or area

84

MAUP

changing the scale (level of aggregation) or the zoning (boundaries) of areal units can change the results of an analysis and how the data appear

85

non uniformity of space

coastal areas may have more cases of the flu not because they are near the water, but because they also often have higher population density

86

edge effects

features at the edge of the study area may only have neighbors on one side. think of a crime map of Mexico along the US border without US data

87

what is the difference between euclidean and manhattan block distance

we can consider euclidean as the straight line, "as the crow flies"
manhattan must follow a grid, moving only along the axes like city blocks
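
A minimal sketch of both distances for 2-D coordinates (function names hypothetical):

```python
import math


def euclidean(p, q):
    """Straight-line distance, 'as the crow flies'."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def manhattan(p, q):
    """City-block distance: travel only along the x and y axes."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))   # 5.0 7
```

Going 3 east then 4 north, or 4 north then 3 east, both give the same manhattan distance of 7, which is why manhattan distance can have multiple shortest routes.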

88

quantile classification

every class contains the same number of entities

89

equal interval classification

dividing your data into equal intervals
the difference between the highest and lowest value in each class is the same
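
The two schemes above can be sketched as follows, with hypothetical function names and data. The outlier shows why equal interval struggles with clustered data: nine of ten values land in the first class. In this simple quantile version, tied values may be split across neighboring classes.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of the first k - 1 classes, each spanning the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def quantile_classes(values, k):
    """Class index (0 to k - 1) for each value so classes hold about equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]       # hypothetical, with one outlier
print(equal_interval_breaks(data, 3))   # [34.0, 67.0]: nine values fall in the first class
print(quantile_classes(data, 2))        # five values in each class
```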

90

advantage and disadvantage of natural breaks

advantage is that it is good for unevenly distributed data
disadvantage is that datasets cannot be easily compared

91

quantile advantage and disadvantage

advantage is that relative positions (top 20%) can be shown; GOOD for evenly distributed data
disadvantage is that the breaks are unnatural

92

equal interval advantage and disadvantage

advantage is that it is good for mapping continuous data and is easy to understand
disadvantage is that if data are clustered, some classes will contain many values while others contain few or none

93

goods and bads of stand dev classification

it is good for normally distributed data and for getting an idea of how data compare to the mean
disadvantage is that the actual values are not displayed and outliers strongly influence the mean

94

what classification scheme should be used for evenly/unevenly distributed data

for evenly distributed data use equal interval, stand dev, or quantile
for uneven use natural breaks

95

about how many classes should be used

use between 3 and 7 classes

96

mean center

simply the average of the x and y coordinates
(center of gravity)

97

what is the problem with mean center

outliers affect the hell out of it

98

what is an example of weighted mean center

rather than simply finding the mean center of national parks, weight each park by its number of visitors per year
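
A minimal sketch of the mean center and the weighted mean center; the function name, park locations, and visitor counts are hypothetical.

```python
def mean_center(points, weights=None):
    """Mean center of (x, y) points; pass weights for the weighted mean center."""
    if weights is None:
        weights = [1] * len(points)
    total = sum(weights)
    x = sum(w * px for w, (px, _) in zip(weights, points)) / total
    y = sum(w * py for w, (_, py) in zip(weights, points)) / total
    return x, y


# Hypothetical parks at two locations; the second gets three times the visitors.
parks = [(0, 0), (10, 0)]
print(mean_center(parks))                 # (5.0, 0.0)
print(mean_center(parks, [1000, 3000]))   # (7.5, 0.0): pulled toward the busier park
```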

99

median center

the location with the shortest total distance to all features in the study

100

central feature

the FEATURE with the shortest total distance to all other features

101

median center vs central feature

the median center does not need to be an existing feature
the central feature must be an existing feature
the median center finds the most accessible location, while the central feature finds the most accessible entity
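
The difference can be sketched in Python. The central feature is a simple search over existing features. The median center has no closed form, so this sketch swaps in Weiszfeld's iterative algorithm, starting from the mean center; skipping a point the estimate lands exactly on is a simplification, and the function names and points are hypothetical.

```python
import math


def central_feature(points):
    """The existing feature with the smallest total distance to all others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def median_center(points, iterations=100):
    """Approximate location minimizing total distance (Weiszfeld's algorithm)."""
    x = sum(p[0] for p in points) / len(points)   # start at the mean center
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d == 0:
                continue                  # simplification: skip a point we sit on
            num_x += px / d               # nearer points pull harder (weight 1/d)
            num_y += py / d
            denom += 1 / d
        x, y = num_x / denom, num_y / denom
    return x, y


pts = [(0, 0), (4, 0), (0, 3), (10, 10)]   # hypothetical features
print(central_feature(pts), median_center(pts))
```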

102

what are the three defining parameters of a standard deviational ellipse

the dispersion along the major axis
the dispersion along the minor axis
the angle of rotation

103

what is the difference between absolute and relative frequencies

relative frequencies are absolute frequencies divided by the total number of observations
together they add up to 1

104

what is the link between observed data and the normal distribution curve

the z score

105

population

total set of elements under examination in a study

106

sample

group of elements actually studied

107

census

when an entire population is studied

108

sampling error

when uncertainty arises from working with a sample rather than a population

109

sampling bias

when the sample systematically over- or under-represents certain characteristics of the population

110

central limit theorem

if many samples of the same size are taken, the distribution of their means will be approximately normal
the mean of the sample means should equal the population mean

111

what is a type 1 error

the null hypothesis is true, but we reject it

112

what is a type 2 error

the null hypothesis is false, but we do not reject it

113

what type of error is it if the alternative hypothesis is true, but we fail to reject the null

type 2

114

what type of error is it if the alternative hypothesis is false, but we reject the null

type 1

115

what kind of associations can there be between 2 variables?

experimental and correlational

116

experimental association

we control (manipulate) one of the variables

117

correlational association

we simply observe both variables without controlling either

118

what does pearson's r measure?

the strength of a linear relationship between two variables

119

what is the value range for pearson's r?

-1 to 1

120

what would the pearson value be if both x and y increase simultaneously

near 1
positive

121

what are 2 conditions that should be had if using pearson's r

the data should not contain extreme outliers
the spread of y should be roughly the same across all values of x (homoscedasticity)
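
Pearson's r can be computed from the same sums used in regression; a minimal sketch with a hypothetical function name and data:

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0: both increase together
print(pearson_r([1, 2, 3], [6, 4, 2]))   # -1.0: y falls as x rises
```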

122

what happens to mean and variation when data is aggregated?

variation is reduced, but the mean remains constant

123

problem with MAUP and data aggregation

if you aggregate the same data into north-south zones vs east-west zones, the aggregated results will be different

124

what kind of distance can have multiple shortest routes?

manhattan distance

125

is adjacency a binary concept?

yes

126

how do you calculate margin of error

plus or minus 1/sqrt(N) (a rough approximation at 95% confidence)

127

what happens to margin of error as the sample size increases

it lowers
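
The 1/sqrt(N) rule of thumb is the conservative 95% margin of error for a proportion, since 1.96 * sqrt(0.5 * 0.5 / N) is about 0.98 / sqrt(N). A minimal sketch (function name hypothetical) showing how it shrinks as N grows:

```python
import math


def margin_of_error(n):
    """Rough 95% margin of error for a sample proportion: plus or minus 1/sqrt(n)."""
    return 1 / math.sqrt(n)


for n in (100, 400, 1600):
    print(n, margin_of_error(n))   # quadrupling n halves the margin of error
```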

128

when will a test be two-tailed?

when the alternative hypothesis is non-directional (it states only that the value is not equal to the hypothesized value)

129

what is the empirical rule

for a normal distribution, about 68% of data lies within 1 stand dev of the mean, 95% within 2, and 99.7% within 3
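
The empirical rule can be checked with the standard normal distribution in Python's `statistics` module; a minimal sketch (function name hypothetical):

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()                      # standard normal: mean 0, sd 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))   # roughly 0.68, 0.95, 0.997
```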

130

define type 1 and 2 errors simply

if ho is true and we reject it: type 1
if ho is false and we fail to reject it: type 2

131

why must we use INVERSE distance weighting

if we used the raw distances, then features farther away would have a greater effect, but we want them to have less of an effect because they are far away
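
A minimal sketch of an inverse distance weights matrix, assuming no two features share a location (that would divide by zero). The function name is hypothetical, and the `power` parameter is an assumption: exponents of 1 or 2 are common choices.

```python
import math


def inverse_distance_weights(points, power=1):
    """w[i][j] = 1 / distance**power between features i and j; w[i][i] = 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1 / math.dist(points[i], points[j]) ** power
    return w


# Hypothetical features at 0, 1 and 4 units along a line
w = inverse_distance_weights([(0, 0), (1, 0), (4, 0)])
print(w[0])   # [0.0, 1.0, 0.25]: the nearer feature gets the larger weight
```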

132

Does a value of .8 mean moran's I is significant?

no, moran's i indicates the strength of the autocorrelation; significance must be tested separately (e.g. with a z-test or a monte carlo simulation)

133

explain clusters vs clustering

when we say clusters we are referring to specific groups of similar values (for example, a cluster of high-value counties)
when we speak of clustering we are discussing the overall tendency of values to cluster across the whole study area (for example, all of Pennsylvania)

134

what is the difference between pearson's r and spearman's correlation coefficient?

pearson's r refers to a parametric test involving two quantitative variables
spearman's refers to a non-parametric test used for ranked (ordinal) data

135

when is mean not a good measure of center?

when the data are not normally distributed, e.g. strongly skewed left or right

136

give an example of a run in coins

having 8 heads in a row would be a run

137

give an example of a join using coins

a join is a pair of adjacent outcomes; a head next to a tail is a head-tail join (heads next to heads and tails next to tails are also joins)
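
Runs and joins in a 1-D sequence of coin flips can be counted as below; a minimal sketch with hypothetical function names and flips.

```python
def count_runs(seq):
    """Number of runs: maximal blocks of identical consecutive outcomes."""
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def count_joins(seq):
    """Tally the joins (adjacent pairs) in a 1-D sequence of 'H'/'T' by type."""
    joins = {"HH": 0, "TT": 0, "HT": 0}
    for a, b in zip(seq, seq[1:]):
        joins["HT" if a != b else a + b] += 1
    return joins


flips = "HHHTTH"                       # hypothetical coin flips
print(count_runs(flips))               # 3 runs: HHH, TT, H
print(count_joins(flips))              # {'HH': 2, 'TT': 1, 'HT': 2}
```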

138

what determines whether a test will be one- or two-tailed?

the alternative hypothesis

139

when will you have a two-tailed distribution?

when the alternative hypothesis states only that the test statistic does not equal the hypothesized (control) value, without a direction

140

how do you standardize a row

divide each weight in the row by the sum of the entire row
basically it converts the weights into proportions, so each row sums to 1
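
A minimal sketch of row standardization for a weights matrix stored as nested lists; the function name and weights are hypothetical, and a feature with no neighbors is assumed to keep a row of zeros.

```python
def row_standardize(w):
    """Divide each weight by its row total so every row sums to 1."""
    standardized = []
    for row in w:
        total = sum(row)
        # a feature with no neighbors keeps a row of zeros
        standardized.append([v / total if total else 0.0 for v in row])
    return standardized


w = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]                        # hypothetical binary weights
print(row_standardize(w))              # [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
```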

141

what is factorial ANOVA

this is used when you want to measure the effects two or more independent variables (factors) have on a dependent variable