Topic 8 Flashcards
(36 cards)
what is a histogram
graph showing the frequency of measurements/observations plotted against the range of observations
an important data exploration and summary tool
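A minimal sketch of how the frequencies behind a histogram can be counted. The observations, the bin width, and the helper name `histogram_counts` are made up for illustration.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each half-open bin [start, start + bin_width)."""
    counts = Counter()
    for v in values:
        bin_start = (v // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

observations = [2, 3, 3, 7, 8, 12, 13, 13, 14, 21]  # made-up data
width = 5
counts = histogram_counts(observations, width)

# Text version of the graph: one '#' per observation in the bin.
# Bins with no observations are simply absent from the dictionary.
for start, n in counts.items():
    print(f"{start:>2}-{start + width:<2} {'#' * n}")
```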
explain modality and symmetry
median - 50% higher and 50% lower
skew… what way is the tail facing
symmetrical distribution
mean, median, and mode are coincident
modality
when there is more than one value with a high frequency
greatly impacts the use of median and mean measures
sample mean vs population mean
sum of values divided by the sample size (sample mean) or by the size of the entire population (population mean)
we use deviation from the mean or deviation from the median
median is more popular because it isn't as easily affected by outliers
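A small sketch of why the median is less affected by outliers than the mean. The values are made up, and the single outlier (200) is an assumption chosen to make the effect obvious.

```python
import statistics

values = [10, 12, 13, 15, 16]   # made-up sample
with_outlier = values + [200]   # same sample plus one extreme value

# How far does each measure of center move when the outlier is added?
mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)

print(f"mean moved by {mean_shift:.2f}, median moved by {median_shift:.2f}")
```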
standard deviation is important with data transformations T/F?
true
what is data normalization
raw totals (Numerator) are standardized against a denominator
min-max scaling
comparing something to make it comprehensible
standardization (z-score normalization)
types of normalization
denominator is standard deviation
max is 1
min is 0
goes to 0
min-max scaling
important for rasters
range of data
min and max values
does not go to 0 only to the min value
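A minimal sketch of the two normalization types named in these cards: min-max scaling and z-score standardization. The data are made up. The sketch assumes the values are not all identical (otherwise both denominators are zero) and uses the population standard deviation, which is an assumption.

```python
import statistics

def min_max_scale(values):
    """Rescale so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_scores(values):
    """Subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD, assumed here
    return [(v - mean) / sd for v in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data: mean 5, SD 2
scaled = min_max_scale(data)
standardized = z_scores(data)
```

With min-max scaling the results always span 0 to 1. With z-scores the results are centered on 0 and expressed in standard deviation units.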
how do you know when you need to standardize your data?
know your data before normalizing it. Normalizing unrelated data is like mixing apples and oranges: it makes fruit salad, not a good analysis
not all variables need to be normalized
results can be proportions or percentages
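A short sketch of standardizing a raw total against a denominator to get a percentage. The district names, counts, and populations are hypothetical.

```python
# hypothetical districts: (raw count, total population)
districts = {"A": (50, 1000), "B": (50, 10000)}

# Same raw total in both districts, very different percentages.
percentages = {
    name: 100 * count / population
    for name, (count, population) in districts.items()
}
print(percentages)
```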
data classification considerations
grouping of numerical data into classes for mapping, with each class represented by an individual symbol
class interval: where to put breaks in the data
number of intervals: 4-7
describe equal intervals
equal intervals or steps along the number line
determine data range
not very good
susceptible to outliers
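A minimal sketch of equal-interval class breaks, with made-up data that include one outlier to show the weakness noted above. The helper name `equal_interval_breaks` is hypothetical.

```python
def equal_interval_breaks(values, n_classes):
    """Split the data range into n_classes steps of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + i * width for i in range(1, n_classes)]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 101]  # made-up data, 101 is an outlier
breaks = equal_interval_breaks(data, 4)

# The outlier stretches the range, so almost everything lands in the first class.
in_first_class = sum(v <= breaks[0] for v in data)
```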
describe quantiles
each class contains the same number of observations/values
easy to understand
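A minimal sketch of quantile classification, where each class gets the same number of observations. The data are made up, and ties and uneven remainders are handled naively here, which is an assumption.

```python
def quantile_classes(values, n_classes):
    """Sort the values and cut them into n_classes groups of (nearly) equal count."""
    ordered = sorted(values)
    size = len(ordered) / n_classes
    return [
        ordered[round(i * size):round((i + 1) * size)]
        for i in range(n_classes)
    ]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 101]  # made-up data
classes = quantile_classes(data, 4)
```

Note that the last class groups 10 and 11 together with 101: equal counts do not guarantee similar values inside a class.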
describe mean standard deviation
derive classes from the descriptive statistics of overall data distribution
worst method
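A minimal sketch of deriving class breaks from the mean and standard deviation. Placing breaks at whole standard deviations from the mean is an assumption; other spacings (for example half a standard deviation) are also used.

```python
import statistics

def mean_sd_breaks(values, multiples=(-1, 0, 1)):
    """Place class breaks at the mean plus or minus multiples of the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD, assumed here
    return [mean + m * sd for m in multiples]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data: mean 5, SD 2
breaks = mean_sd_breaks(data)
```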
maximum breaks (defined interval)
derive classes from groups of similar data values according to a local criterion
calculation of classes: order data from low to high
use largest differences as class breaks
can be good
susceptible to outliers
you don't see contrast
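The maximum breaks steps above (order the data, then use the largest differences as class breaks) can be sketched as follows. The data are made up, and placing each break midway across the gap is an assumption.

```python
def maximum_breaks(values, n_classes):
    """Use the largest gaps between neighboring sorted values as class breaks."""
    ordered = sorted(values)
    # (gap size, index of the value just below the gap)
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    largest = sorted(gaps, reverse=True)[:n_classes - 1]
    # Assumed convention: put each break in the middle of its gap.
    return sorted((ordered[i] + ordered[i + 1]) / 2 for _, i in largest)

data = [1, 2, 3, 10, 11, 12, 30, 31]  # made-up data
breaks = maximum_breaks(data, 3)
```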
natural breaks
subjective, visual/manual determination of logical breaks in data distribution in dispersion graph or histogram
depends on what you want to highlight
geometrical intervals
class breaks are based on a geometric series
good for highly skewed data
good for computers
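A minimal sketch of class breaks that follow a geometric series, where each break is a constant ratio times the previous one. This is one simple form and assumes strictly positive values; GIS software may use a more elaborate algorithm under the same name.

```python
def geometric_breaks(lo, hi, n_classes):
    """Class edges lo, lo*r, lo*r**2, ..., hi with a constant ratio r."""
    ratio = (hi / lo) ** (1 / n_classes)
    return [lo * ratio ** i for i in range(n_classes + 1)]

# Made-up range from 1 to 1000 in 3 classes: edges are roughly 1, 10, 100, 1000
# (floating-point results may differ in the last digits).
breaks = geometric_breaks(1, 1000, 3)
```

Class widths grow with the values, which is why this suits highly skewed data: narrow classes where values are dense, wide classes in the long tail.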
optimal (fisher-jenks)
computational approaches to minimizing classification error
most common method
identifies low points in data
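A rough sketch of the idea behind optimal classification: try every way of cutting the sorted data and keep the one with the smallest classification error. This brute-force search is only for illustration and is not the Fisher-Jenks algorithm itself, which reaches the same kind of result far more efficiently. Measuring error as the sum of absolute deviations about class medians is an assumption; implementations differ on this.

```python
from itertools import combinations
import statistics

def abs_dev_about_median(cls):
    """Error of one class: total absolute deviation from the class median."""
    med = statistics.median(cls)
    return sum(abs(v - med) for v in cls)

def optimal_classes(values, n_classes):
    """Exhaustively search all cut positions; only practical for tiny datasets."""
    ordered = sorted(values)
    n = len(ordered)
    best = None
    for cuts in combinations(range(1, n), n_classes - 1):
        edges = (0, *cuts, n)
        classes = [ordered[a:b] for a, b in zip(edges, edges[1:])]
        error = sum(abs_dev_about_median(c) for c in classes)
        if best is None or error < best[0]:
            best = (error, classes)
    return best

data = [1, 2, 3, 10, 11, 12, 30, 31]  # made-up data
error, classes = optimal_classes(data, 3)
```

Repeating the search for different numbers of classes and comparing the resulting errors is one way such a method can help with choosing how many classes to use.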
rating of classification methods major points
quantiles is only good for ordinal data
optimal is the only one that helps with selecting the number of classes
enumeration and spatial fallacies
areal aggregation
census tracts
best when units are similar sizes
MAUP
the geometry of the spatial units impacts your outcome and how it will look on the map
change the area = change results
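A toy sketch of MAUP: the same underlying values aggregated into zones of different sizes give different results. The values and the one-dimensional "map" are made up for illustration.

```python
# hypothetical values measured at 8 locations along a line
values = [1, 1, 1, 9, 9, 9, 1, 1]

def aggregate(values, zone_size):
    """Average the values inside consecutive zones of zone_size locations."""
    return [
        sum(values[i:i + zone_size]) / zone_size
        for i in range(0, len(values), zone_size)
    ]

zones_of_2 = aggregate(values, 2)  # small zones keep the high-value cluster visible
zones_of_4 = aggregate(values, 4)  # large zones smooth it away

print(zones_of_2)
print(zones_of_4)
```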
jenks (optimal) tends to stay away from using mean as a central measure T/F?
True
what is multivariate mapping
encoding two or more variables into the symbolization
trade-off between the information content and the complexity of the map
two main groups
inter-symbol encoding
intra-symbol encoding
bivariate choropleth maps
bivariate normalization (value by alpha)
what is inter-symbol encoding
symbolize 2 symbols concurrently (complementary symbols)