Week 2: Data capture, Acquisition and Big Data Flashcards

(78 cards)

1
Q

The two types of data sources

A

Primary and secondary

2
Q

Data requirements are driven by

A

Question parameters

3
Q

Layers of data requirements (data slices) that represent the real world

A
  1. Your data
  2. Survey control
  3. Water features
  4. Boundaries
  5. Addresses
  6. Transportation
  7. Elevation
  8. Imagery
4
Q

Foundation of all GIS processes

A

Spatial data

5
Q

Data determines:

A
  1. What theme (issues/problems) you can answer
  2. What types of analysis (e.g. forestry vs. emergency services dispatch)
  3. Where (study area dimensions and location - a building in Dunedin or the entire South Island)
  4. When (time / date stamp of data)
  5. Quality of your analysis (e.g. historical data, paper media, confidentiality)
6
Q

What determines what spatial data is needed

A
  1. What questions you are trying to answer
  2. What data sets you need to answer those questions
  3. E.g. the location of all trees over 2 metres in height in Dunedin
7
Q

2 ways of acquiring spatial data for GIS?

A
  1. Data capture
  2. Data transfer
8
Q

2 methods of data capture

A
  1. Create from primary sources
  2. Derive from secondary sources
9
Q

How is data captured from primary sources

A

Field collection - GPS, imagery, tree heights, measurement

10
Q

How is data captured from secondary sources

A

Scan or digitize from existing maps/images

11
Q

1 method of data transfer

A

Acquire existing data

12
Q

How is existing data acquired

A
  1. Government
  2. Commercial vendors
  3. Data agreements for limited use
  4. Open source
  5. Crowd sourced
13
Q

Pros of collecting data yourself

A
  1. Control how and when data is collected
  2. Can specify spatial resolution, extent, and spatial and temporal collection parameters
14
Q

Cons of collecting data yourself

A
  1. Can be costly (hiring equipment, personnel, aircraft, computer hardware)
  2. Often time consuming to plan and execute
  3. May not be possible to achieve what is required without specific equipment, expertise or permits
15
Q

Pros of using existing data sets

A
  1. Can generally be acquired quickly if in digital format
  2. Numerous data source options (open source, government, commercial vendors)
16
Q

Cons of using existing data sets

A
  1. No control over collection parameters - resolution, extent…
  2. May have limited metadata to describe how it was collected
  3. Fit for purpose - does it meet all data requirements
  4. Cost
17
Q

5 stages in data collection projects

A
  1. Planning
  2. Preparation
  3. Digitizing / transfer
  4. Editing / improvement
  5. Evaluation
  6. Repeat
18
Q

High quality georeferenced data represents how much of a GIS project's resources

A

3/4

19
Q

Data collection is a balance between

A
  1. Speed of data capture
  2. Data quality
  3. Price
20
Q

What should be conducted if possible to test equipment and processes

A

Pilot study

21
Q

How is data obtained from government sources

A
  1. Land Information New Zealand (LINZ) - hardcopy topo maps, Landonline for land parcels
  2. Department of Conservation (DOC) - infrastructure and ecological regions
  3. Landcare Research (land cover)
  4. GNS (surface geology, faults)
  5. Territorial local authorities: city councils (e.g. DCC)
22
Q

How is data obtained from commercial sources

A
  1. Eagle Technology (ArcGIS dealer) - for StatsNZ census
  2. Critchlow Associates (MapInfo products), cadastral
  3. GeographX (digital elevation models, shaded relief images)
  4. DigitalGlobe - high resolution global satellite imagery
23
Q

How is data obtained from data sharing agreements

A
  1. Limited usage licenses
  2. Educational data sets
24
Q

How is data obtained from open source

A
  1. Government agencies - LINZ - Topo Hydro, Elevation
  2. City councils - DCC - aerial imagery, street maps, rates maps
  3. USGS - EarthExplorer - elevation, imagery, land cover
  4. Data warehouses and portals - NSIDC - National Snow and Ice Data Center
  5. Commercial and independent sites - Koordinates.com - Pacific Data Hub (Pacific nations data portals)
25
How is data obtained from crowdsourced sources
  1. OpenStreetMap (OSM) - communally created mapping features
  2. Humanitarian OpenStreetMap Team (HOT) - mapathons (disaster response, public health, clean water)
26
What has to be considered for data usability
Source reliability and quality
27
What determines if a source is reliable
  1. Data creator, currency, scale, file format
  2. Metadata
  3. Is the data fit for purpose
  4. Data cleanup / preparation may be required
28
Main methods of inputting primary data collection into a GIS
  1. Raster - remote sensing, imagery
  2. Vector - GPS, surveying
29
Main methods of inputting secondary data collection
  1. Create from hardcopy maps (digitizing and scanning)
  2. Convert existing digital data: data model conversions (raster to vector and vice versa), data format conversions (e.g. ArcGIS to MapInfo), remote sensing data classification
30
What is remote sensing
Measuring the properties of objects (e.g. physical, chemical, biological) without direct contact
31
Satellite-based data collection of electromagnetic radiation reflected by
Different earth surfaces
32
Spectral signatures can differentiate land cover between
  1. Water
  2. Green vegetation
  3. Dry bare soil
33
Spatial resolution
  1. "The shortest distance over which change is recorded"
  2. Dimensions of the smallest object a system can detect
34
Temporal resolution
"the minimum time between collection iterations"
35
Satellite configurations
  1. Geostationary
  2. Earth orbiting
36
How are GPS/GNSS and surveying instruments used for collecting spatial data
  1. Field collection of position data
  2. Accompanied by recording of attribute data
  3. Linking spatial features to existing tabular database
37
How is surveying used for collecting spatial data
  1. Determine location (x, y, z) by measuring distances and angles from known benchmark points (analogue: transits and theodolites; digital: total stations and LIDAR; up to 1 mm accuracy)
  2. Highly accurate but expensive
  3. Ideal for smaller numbers of vector objects (e.g. property boundaries)
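As a rough illustration of locating a point from a known benchmark using a measured distance and angle, the sketch below assumes a flat local grid, a bearing measured clockwise from grid north, and a horizontal distance. The function name and benchmark coordinates are made up for the example.

```python
import math


def point_from_benchmark(easting, northing, bearing_deg, distance_m):
    """Return (easting, northing) of a point observed from a known benchmark.

    Assumes a flat grid and a bearing measured clockwise from grid north.
    """
    bearing = math.radians(bearing_deg)
    return (
        easting + distance_m * math.sin(bearing),
        northing + distance_m * math.cos(bearing),
    )


# Hypothetical benchmark at (1000 m E, 5000 m N); the target is 100 m due east.
new_e, new_n = point_from_benchmark(1000.0, 5000.0, 90.0, 100.0)
```

Real survey reductions also handle height (z), slope distances and map projection corrections, which this sketch leaves out.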
38
Point cloud definition
  1. A set of data points that can be visualised as the framework of an object or the surface of terrain
  2. Has a 3D coordinate system: (X, Y) coordinate values and a (Z) elevation value for each point
39
Types of lidar from different collection platforms
  1. Satellite LIDAR
  2. Airborne LIDAR
  3. Mobile LIDAR
  4. Tripod LIDAR
40
To determine a location, the GNSS receiver measures
The distance (range) to several satellites
41
What does GNSS stand for
Global navigation satellite systems
42
How many satellites are needed for coverage
3 with extremely good clocks
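One way to see why the clocks matter: a GNSS range is the signal travel time multiplied by the speed of light, so a tiny timing error becomes a large distance error. A minimal sketch (the function name is illustrative, not from any GNSS library):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def range_error_m(clock_error_s):
    """Distance error caused by a given timing error in the range measurement."""
    return SPEED_OF_LIGHT_M_PER_S * clock_error_s


# A clock that is wrong by one microsecond shifts the range by roughly 300 m.
error_1_microsecond = range_error_m(1e-6)
```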
43
A fourth satellite is required to know
The 3D location of the receiver
44
Multi-beam echo sounding (MBES) is used in what type of survey
Hydrographic
45
Many kinds of secondary data are only recorded on
Hardcopy maps
46
What is involved during raster scanning of maps
Map is passed over scanner bed (drum or flat) and map features are recorded on a grid
47
Pixel value in raster scanning is determined by
Reflectance of light from the paper map at that point
48
Spatial resolution of map scanning for GIS
400 to 1000 dpi
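To relate a scan resolution in dpi to ground distance, the sketch below converts one scanned pixel to metres on the ground. The 1:50,000 map scale is an assumption chosen for illustration, and the function name is hypothetical.

```python
MM_PER_INCH = 25.4


def ground_pixel_size_m(dpi, scale_denominator):
    """Ground distance (m) covered by one scanned pixel of a paper map."""
    pixel_on_paper_mm = MM_PER_INCH / dpi
    return pixel_on_paper_mm * scale_denominator / 1000.0


# Assumed 1:50,000 map scanned at the two ends of the 400 to 1000 dpi range.
coarse = ground_pixel_size_m(400, 50_000)   # about 3.2 m per pixel
fine = ground_pixel_size_m(1000, 50_000)    # about 1.3 m per pixel
```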
49
Large output files require
  1. Format conversion
  2. Georeferencing
  3. Editing and feature extraction
50
Old style of digitizing maps involved
  1. Digitizer: a table embedded with a fine wire grid
  2. "Trace" linear features or locate point features
  3. Resolution from 0.01" to 0.001"
  4. Operator controls which map features to extract
  5. Many existing map data were captured using digitizing
51
New style of digitizing in software involves
  1. Digitizing features on computer screen based on digital orthophotos
  2. Relatively easy, but usually not as accurate as table digitizing
52
Format conversion is
Conversion of data formats within a single data model - e.g. importing an AutoCAD DWG file into a vector GIS as shapefiles
53
Rasterization
Converting vector data to raster data - from points / lines / polygons to grid cells
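A minimal sketch of rasterization for point features, assuming a square-cell grid whose origin (x_min, y_min) is the lower-left corner and whose row 0 is the bottom row. The function name and sample coordinates are invented for illustration.

```python
def rasterize_points(points, x_min, y_min, cell_size, n_cols, n_rows):
    """Return a grid (list of rows) where 1 marks cells containing a point."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        # Points falling outside the grid extent are ignored.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            grid[row][col] = 1
    return grid


sample_points = [(0.5, 0.5), (2.2, 1.7), (9.0, 9.0)]  # last point is out of extent
point_grid = rasterize_points(sample_points, 0.0, 0.0, 1.0, n_cols=4, n_rows=3)
```

Lines and polygons need more work (deciding which cells a line crosses or a polygon covers), which GIS software handles for you.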
54
Vectorization
Converting raster data to vector data - from grid cells to points / lines / polygons
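A minimal sketch of the simplest vectorization case, turning each non-zero grid cell into a point at the cell centre. It assumes square cells with the grid origin at the lower-left corner and row 0 at the bottom; the function name is hypothetical.

```python
def vectorize_cells(grid, x_min, y_min, cell_size):
    """Return (x, y) centre points for every non-zero cell in the grid."""
    points = []
    for row, values in enumerate(grid):
        for col, value in enumerate(values):
            if value:
                points.append((
                    x_min + (col + 0.5) * cell_size,
                    y_min + (row + 0.5) * cell_size,
                ))
    return points


cell_points = vectorize_cells([[1, 0], [0, 1]], 0.0, 0.0, 2.0)
```

Extracting lines or polygons from a raster additionally requires tracing connected cells, which is not shown here.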
55
Big data can be described by
The 3 Vs
56
What are the 3 Vs
  1. Volume
  2. Velocity
  3. Variety
57
How is volume defined in big data
Data is too large for standard tools and processing computers
58
How is velocity defined in big data
Data is being created / collected very quickly
59
How is variety defined in big data
There are many types of data in many databases (e.g. emails, texts, images via social media)
60
Approximately how much data is created each day
328.77 million terabytes
61
Approximately how much data will be generated in 2024
120 zettabytes
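The daily and 2024 figures in these cards are consistent with each other, which can be checked with a little arithmetic. The sketch assumes decimal (SI) units, 1 TB = 10^12 bytes and 1 ZB = 10^21 bytes, and a 365-day year.

```python
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21

daily_tb = 328.77e6                                # 328.77 million terabytes per day
daily_zb = daily_tb * BYTES_PER_TB / BYTES_PER_ZB  # about 0.33 ZB per day
annual_zb = daily_zb * 365                         # about 120 ZB per year
```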
62
Approximately how much data will be generated in 2025
181 zettabytes
63
Videos account for how much of internet data traffic
Over half
64
How many data centres do the US and NZ have
2700 in the US; 81 in NZ
65
Stores and vendors often provide loyalty rewards cards to customers to encourage them to
Shop at their store, in return for discounts
66
The data collected from loyalty cards are very valuable because
They can be used to track what people are buying, when they are buying it and at what price (data mining)
67
Big data challenges
  1. Exceeds capacity of current computing systems
  2. Data quality - messiness in data
  3. Different date formats (Excel supports two date systems)
  4. Biases in the data may not be visible (the method of collection may not be documented, selection bias)
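To show why the two Excel date systems matter, the sketch below converts the same serial number under the 1900 and 1904 systems. The function name is hypothetical, and the 1900 branch is only valid for serials of 61 or more, because Excel treats 1900 as a leap year.

```python
from datetime import date, timedelta


def excel_serial_to_date(serial, date_system=1900):
    """Convert an Excel day serial to a date under the given date system."""
    if date_system == 1900:
        # Offset epoch that absorbs Excel's fictitious 29 Feb 1900 (serial >= 61).
        epoch = date(1899, 12, 30)
    elif date_system == 1904:
        epoch = date(1904, 1, 1)
    else:
        raise ValueError("date_system must be 1900 or 1904")
    return epoch + timedelta(days=serial)


as_1900 = excel_serial_to_date(43831, 1900)
as_1904 = excel_serial_to_date(43831, 1904)
gap_days = (as_1904 - as_1900).days
```

Reading a file with the wrong system silently shifts every date by about four years, so the date system is worth recording in the metadata.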
68
Why can data quality have messiness
  1. May contain errors or be incomplete due to automated data collection (online survey or instrument failure)
  2. Differences in data formatting and column headers (may require data cleanup if detected)
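A small sketch of one kind of cleanup, bringing differently written column headers to a common form before data sets are combined. The normalisation rule and the header names are assumptions for illustration only.

```python
def normalise_header(header):
    """Lower-case a column header and join its words with underscores."""
    words = header.strip().lower().replace("-", " ").split()
    return "_".join(words)


raw_headers = ["Tree Height ", "tree-height", "TREE  HEIGHT"]
clean_headers = [normalise_header(h) for h in raw_headers]
```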
69
What data is being created (about/by) you right now?
  1. Phone - location, text, images
  2. Smart watch (biometrics) - immobility, heart rate, O2
  3. IP ping - wifi, cell network
  4. Logins to digital accounts
  5. Social media
  6. Traffic status
70
There are often uses for big data beyond what could be achieved with
Normal data
71
Limitations of data usage are caused by
Uses of the data not being defined or considered when the data was collected/created
72
Unethical GIS practices
  1. Lying with maps
  2. Using GIS to plan/support war (intentional harm)
  3. Using GIS as a surveillance tool
  4. Providing insufficient data quality in a GIS, resulting in unreliable decision making
  5. Causing unintentional harm to people (through degradation in their quality of life or safety)
  6. Allowing deception in the use of GIS for analysis when the means are suspect but a socially desirable result occurs
  7. Not acknowledging the work of others
73
Open data
Data that can be freely accessed, used, reused, and redistributed without hindrance by anyone - subject only, at most, to the requirement for attribution and share-alike
74
What is the cost of open data
  1. There is an overhead cost for a government or organization to compile/host/update data
  2. Raw data can be less expensive to provide but may not be sufficient for the general data user
75
VGI stands for
Volunteered geographic information
76
VGI - OpenStreetMap allows users to
Add features based on aerial imagery to build up unmapped areas
77
The privacy debate on open data is
The need for readily available information for the social and economic benefit of the community vs. the responsibility to ensure that the rights of individuals and groups are not infringed by abuse of that information
78
Census data aggregations
Meshblocks will be reported no smaller than a certain number to ensure privacy
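A minimal sketch of the idea of withholding small counts, assuming a simple suppression rule. The threshold of 6, the meshblock IDs and the function name are hypothetical; they are not taken from any statistics agency's actual confidentiality rules.

```python
def suppress_small_counts(counts, min_count):
    """Replace counts below min_count with None so they are not published."""
    return {
        area: (count if count >= min_count else None)
        for area, count in counts.items()
    }


# Hypothetical meshblock counts; MB2 is too small to report.
published = suppress_small_counts({"MB1": 12, "MB2": 3}, min_count=6)
```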