Data exploration and classification Flashcards

Lecture 14

1
Q

What is data exploration?

A

The process of examining data prior to formal, structured data analysis.

2
Q

What is data classification in ArcGIS?

A

The data classification tool can be used to explore spatial data and is based on descriptive statistics.

3
Q

What does data exploration include in GIS?

A
  1. In GIS it involves both spatial and attribute data (how & where?)
  2. Media used in GIS includes maps (spatial), graphs, and tables.
4
Q

What is the crime rate like in Gauteng and where is the highest crime found in this province?

A

Projected on a map together with statistics.

5
Q

What does data visualisation involve?

A
  • Rendering – what to show in a graphic plot & what type of plot to
    make
  • Manipulation – how to operate on individual plots and how to
    organise multiple plots
6
Q

What are the fundamental tasks for data exploration?

A
  1. Finding patterns
  2. Posing queries, i.e. exploring data characteristics and data subsets
  3. Making comparisons, i.e. between variables or data subsets

Q – Which portion of my field produces the highest / lowest yield?
Q2 – Why do certain portions of my land produce higher yields?
Q – Which areas of Tanzania are most suitable for growing Pinotage?
Q – How does wildfire susceptibility vary across a nature reserve?
Q – What is the groundwater recharge potential of the Winelands municipality?
Q – How do deforestation rates vary across the Peruvian Amazon?

7
Q

What types of statistics are used in spatial data exploration?

A

Can be:
* Descriptive
* Inferential

8
Q

What are descriptive statistics?

A

Statistics that summarise a dataset (summary statistics):
1. Measures of central tendency: describe data by identifying its central position.
2. Measures of dispersion: describe the spread of the data.
3. Skewness
4. Kurtosis

9
Q

What are inferential statistics?

A

Generalizing from a sample to a population with a calculated degree of certainty, i.e. drawing conclusions.

10
Q

What are measures of central tendency?

A

Median, mode, mean
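
As a small illustration, the three measures can be computed with Python's standard `statistics` module. The `yields` list is made-up data, used only for this sketch.

```python
import statistics

# Hypothetical yield values, for illustration only
yields = [2, 3, 3, 5, 7]

mean = statistics.mean(yields)      # arithmetic average
median = statistics.median(yields)  # middle value of the sorted data
mode = statistics.mode(yields)      # most frequent value

print(mean, median, mode)
```

Here the mean is pulled above the median by the larger values, a hint that the data are skewed.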

11
Q

What are measures of dispersion?

A

Measures that look at the statistical spread or distribution of a dataset.
They include:
1. Standard deviation
2. Variance
3. Standardised score (z score)

They show the spread of, or trends in, the data and can be used to identify outliers.

12
Q

What is the standard deviation?

A

Shows how much variation or “dispersion” exists from the average (mean); it is the square root of the variance.

13
Q

What is the variance?

A

A measure of how far a set of numbers is spread out: the average squared deviation from the mean.

14
Q

What is the standard score (z score)?

A

The standardized or z score tells you how many standard deviations a reading is above or below the mean.
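
A minimal sketch of the three dispersion measures, assuming the data are treated as a whole population (so the population variance and standard deviation are used). The `values` list and the `z_score` helper are hypothetical and exist only for this example.

```python
import statistics

# Hypothetical readings, for illustration only
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
variance = statistics.pvariance(values)  # population variance (assumption)
std_dev = statistics.pstdev(values)      # square root of the variance


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z_scores = [z_score(x, mean, std_dev) for x in values]
print(variance, std_dev, z_scores)
```

Readings with a large absolute z score are candidates for outliers.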

15
Q

What is classification?

A

The process of reducing a large number of individual quantitative values to a smaller number of ordered categories, each of which comprises a portion of the original data value range.

16
Q

What are the different types of classification?

A

Each classification type divides the data value range in a different way. They are (mostly) used to classify interval and ratio data:
1. Natural breaks
2. Equal interval classes
3. User defined
4. Quantiles
5. Mean and Standard Deviation
6. Geometric Interval

17
Q

What is the fundamental principle of classification?

A
  • Each of the original (un-classed) data values must fall into exactly one of the classes
  • None of the original data values falls into more than one class
  • Classes are always mutually exclusive (no value can belong to two classes) and exhaustive (every value belongs to some class).
18
Q

How do you decide on the number of classes?

A

Rules of thumb:
* Monochrome color schemes: no more than 5 to 7 classes
* Multi-hue maps: no more than 9 classes
Need to consider:
* Communication goal
* Complexity of the spatial pattern
* Available symbol types

19
Q

What is quantitative precision?

A

Communication goal:
* Use a larger number of class intervals.
* Each class will represent a relatively small range of the original data values and will therefore represent those values more precisely.
Trade-offs:
* Too much information
* Indistinct symbols

20
Q

What is immediate graphic impact?

A

Communication goal:
* Use a smaller number of class intervals.
* Each class will be graphically clear, but will be imprecise quantitatively.
Trade-offs:
* Potential for oversimplification
* One class may include wildly varying data values

21
Q

What is Jenks natural breaks?

A
  • Jenks natural breaks is the default classification method in ArcGIS
  • Minimum variation in value within classes.
  • Maximum variation in value between classes.
  • The method seeks to reduce the variance within classes and maximize the variance between classes.
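
The idea of minimising within-class variation can be sketched with a brute-force search. This is not Jenks' optimised algorithm and not ArcGIS's implementation; it simply tries every way of cutting the sorted values into contiguous classes, which is only practical for small datasets. The function names and sample data are hypothetical.

```python
from itertools import combinations


def within_class_ss(group):
    """Sum of squared deviations of a class from its own mean."""
    m = sum(group) / len(group)
    return sum((v - m) ** 2 for v in group)


def natural_breaks_brute_force(values, n_classes):
    """Return the split into n_classes contiguous groups of the sorted values
    that has the smallest total within-class sum of squares."""
    data = sorted(values)
    best_groups, best_total = None, float("inf")
    # Each combination is a set of cut positions between sorted values
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        edges = (0, *cuts, len(data))
        groups = [data[a:b] for a, b in zip(edges, edges[1:])]
        total = sum(within_class_ss(g) for g in groups)
        if total < best_total:
            best_groups, best_total = groups, total
    return best_groups


# Hypothetical data with three obvious clusters
groups = natural_breaks_brute_force([1, 2, 3, 10, 11, 12, 30, 31], 3)
print(groups)
```

Because the total variation of the data is fixed, minimising the variation within classes is the same as maximising the variation between them.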
22
Q

What are the advantages of natural breaks?

A
  • Maximizes the similarity of values within each class
  • Increases the precision of the map given the number of
    classes
23
Q

What are the disadvantages of natural breaks?

A
  • Class breaks often look random
  • Need to explain the method
  • Method will be difficult to grasp for those lacking a background in statistical methods.
24
Q

What is equal interval classification?

A
  • Each class represents an equal portion of original data range.
  • Also called equal size or equal width classification
  Calculation:
    1. Determine the range of the original values: {Range = Max – Min}
    2. Decide on the number of classes, {N}
    3. Calculate the class width: {CW = Range / N}
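
The calculation can be sketched in a few lines of Python. The function name and the sample values are hypothetical; the break points returned include the minimum and the maximum.

```python
def equal_interval_breaks(values, n_classes):
    """Class boundaries for equal interval classification."""
    lo, hi = min(values), max(values)
    class_width = (hi - lo) / n_classes  # CW = Range / N
    return [lo + i * class_width for i in range(n_classes + 1)]


# Hypothetical data: range 0 to 100 split into 4 classes
breaks = equal_interval_breaks([0, 12, 35, 47, 100], 4)
print(breaks)
```

With this sample, four of the five values fall into the two lowest classes, which hints at why the method suits evenly spread data better than skewed data.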
25
Q

What are the advantages of equal interval?

A
  • Easy to understand, intuitive appeal
  • Each class represents an equal range or amount of the original data range
  • Good for rectangular data distributions.
26
Q

What are the disadvantages of equal intervals?

A
  • Rectangular distributions do not often occur in geographic phenomena
  • Not good for skewed data distributions.
27
Q

What is defined interval classification?

A

The map author specifies an interval (i.e. the class size) by which to equally divide a range of values.
* Intervals may need to be altered to fit the range of the data
* Different from equal interval, where the user specifies the number of classes
* ArcMap automatically determines the number of classes based on the interval.

Calculation:
1. Set the interval size, {CW}
2. Determine the range of the original data values:
{Range = Maximum – Minimum}
3. Calculate the number of classes:
{N = Range / CW}
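
A rough sketch of this calculation follows. Two things are assumptions of the example rather than documented ArcMap behaviour: the number of classes is rounded up so that the maximum is covered, and the first class starts at the minimum value. The function name and data are hypothetical.

```python
import math


def defined_interval_breaks(values, class_width):
    """Number of classes and class boundaries for a fixed class width."""
    lo, hi = min(values), max(values)
    # N = Range / CW, rounded up (assumption) so the maximum is included
    n_classes = math.ceil((hi - lo) / class_width)
    breaks = [lo + i * class_width for i in range(n_classes + 1)]
    return n_classes, breaks


# Hypothetical skewed data with an interval size of 10
n_classes, breaks = defined_interval_breaks([3, 8, 15, 22, 48], 10)
print(n_classes, breaks)
```

In this sample the classes 23 to 33 and 33 to 43 contain no values, showing how skewed data can leave classes empty.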

28
Q

What are the advantages of the defined interval?

A
  • Easy to understand, intuitive appeal
  • Each class represents a specified amount
  • Good for rectangular data distributions
  • Example: Good for data with “assumed” breaks
29
Q

What are the disadvantages of the defined intervals?

A
  • Not good for skewed data distributions
    ➢ Many classes will be empty and not mapped.
30
Q

What is quantile classification?

A
  • Places an equal number of cases in each class
  • Sets class break points wherever they need to be in order to accomplish this
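
A simplified sketch of the idea: sort the values and cut them into groups of equal (or near-equal) count. The function name and data are hypothetical, and the handling of tied values, which a GIS package has to deal with, is ignored here.

```python
def quantile_classes(values, n_classes):
    """Split the sorted values into n_classes groups of (near-)equal count."""
    data = sorted(values)
    n = len(data)
    # Class i holds data[edges[i]:edges[i + 1]]
    edges = [i * n // n_classes for i in range(n_classes + 1)]
    return [data[a:b] for a, b in zip(edges, edges[1:])]


# Hypothetical data: ten values in five classes (quintiles)
classes = quantile_classes([7, 1, 9, 3, 40, 5, 2, 8, 6, 4], 5)
print(classes)
```

Each class holds two values, but the top class spans 9 to 40, so the class widths are very uneven.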
31
Q

What are the advantages of quantiles?

A
  • Each class has equal representation on the map
  • Intuitive appeal: map readers like to be able to identify the “top 20%” or the “bottom 20%”
  • Example: Very useful for ordinal data.
32
Q

What are the disadvantages of quantiles?

A
  • Very irregular break points unless data have rectangular distribution.
  • Breaks can sometimes over-weight an outlier within its class.
33
Q

What is Mean and Standard Deviation classification?

A

Places break points at the Mean and at various Standard Deviation intervals above and below the mean.
* Mean: measure of central tendency
* Standard Deviation: measure of variability
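
A minimal sketch, assuming break points at whole standard deviations and the population standard deviation; a GIS package may offer other interval sizes. The function name, the `n_sd` parameter, and the data are hypothetical.

```python
import statistics


def mean_sd_breaks(values, n_sd=2):
    """Break points at the mean and at whole standard deviations around it."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (assumption)
    return [mean + k * sd for k in range(-n_sd, n_sd + 1)]


# Hypothetical data with mean 5 and standard deviation 2
breaks = mean_sd_breaks([2, 4, 4, 4, 5, 5, 7, 9], n_sd=2)
print(breaks)
```

The middle break is the mean itself, so classes below it hold below-average observations and classes above it hold above-average ones.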

34
Q

What are the advantages of the mean and SD?

A
  • Shows how much the feature’s attribute value varies from the mean
  • Useful to emphasize which observations are above the mean and which observations are below the mean
  • Example: Income and education levels
35
Q

What are the disadvantages of mean and SD?

A
  • Many map readers are not familiar with the concept of the standard deviation
  • Not good for skewed data.
36
Q

What is the geometric interval classification?

A
  • Used for visualizing continuous data that is not distributed normally
  • The width of each succeeding class interval is larger than the previous interval by a constant amount.

Calculating the constant amount, CW:
* Decide on the number of classes, N.
* Calculate the range: R = Max - Min
* Solve R = CW + 2CW + ... + N·CW for CW, which gives CW = 2R / (N(N + 1))
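
The formula on this card can be sketched as follows. Since R = CW + 2CW + ... + N·CW = CW · N(N + 1) / 2, the constant amount is CW = 2R / (N(N + 1)). This follows the card's definition only; the geometric interval algorithm in a given GIS package may differ. The function name and data are hypothetical.

```python
def increasing_width_breaks(values, n_classes):
    """Class boundaries where class k (1-based) has width k * CW."""
    lo, hi = min(values), max(values)
    data_range = hi - lo
    # R = CW + 2CW + ... + N*CW = CW * N * (N + 1) / 2
    cw = 2 * data_range / (n_classes * (n_classes + 1))
    breaks = [lo]
    for k in range(1, n_classes + 1):
        breaks.append(breaks[-1] + k * cw)
    return cw, breaks


# Hypothetical skewed data: range 60 in 4 classes
cw, breaks = increasing_width_breaks([0, 4, 9, 20, 60], 4)
print(cw, breaks)
```

The class widths here are 6, 12, 18 and 24, so the narrow classes sit at the low end where the sample values are densest.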

37
Q

What are the advantages of geometric intervals?

A
  • Uneven, but regular class breaks
  • Used for data that contains excessive duplicate values, e.g., 35% of the features have the same value
  • Tends to even out class frequencies for skewed distributions while making class widths relatively small in areas where there is high frequency.
38
Q

What are the disadvantages of geometric intervals?

A
  • Uncommon
  • Unequal width classes