Midterm two Flashcards

(58 cards)

1
Q

What is a database?

A

-Backbone of how raw data is stored and how information is generated from it

-A collection of related data

2
Q

What is a database used for? (7)

A

DEOTSSO

  1. Database management systems (DBMS) can house large quantities of data and allow multiple users to work on the database concurrently
  2. (data) Evolve in a database
  3. (in) One location, therefore reduced redundancy
  4. Transfer user knowledge between applications
  5. (data) Sharing is encouraged
  6. (data) Security and standards for data access can be developed and enforced
  7. (If) Organized, maintenance costs are reduced and there is a reduced data application
3
Q

What is the process of turning data into value?

A

Structure/organization turns DATA into INFORMATION and efficient management of INFORMATION gives it VALUE

4
Q

What is the difference between data and information?

A

DATA is the RAW collection; it is then turned INTO information

5
Q

Why do we use databases? (4)

A
  1. order data
  2. reorder data
  3. summarise data
  4. combine data
    in order to obtain information of value
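A minimal sketch of the four operations using Python's built-in sqlite3 module. The tables `parcels` and `zones` and their values are hypothetical and only illustrate ORDER BY (order / reorder), GROUP BY (summarise) and JOIN (combine).

```python
import sqlite3

# In-memory database with two hypothetical tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, zone TEXT, area_ha REAL);
    CREATE TABLE zones (zone TEXT PRIMARY KEY, description TEXT);
    INSERT INTO parcels VALUES (1, 'R', 2.5), (2, 'C', 1.0), (3, 'R', 4.0), (4, 'A', 12.0);
    INSERT INTO zones VALUES ('R', 'Residential'), ('C', 'Commercial'), ('A', 'Agricultural');
""")

# 1. Order: sort records by one field (smallest area first).
ordered = con.execute("SELECT parcel_id FROM parcels ORDER BY area_ha").fetchall()

# 2. Re-order: sort the same records a different way (largest area first).
reordered = con.execute("SELECT parcel_id FROM parcels ORDER BY area_ha DESC").fetchall()

# 3. Summarise: total area per zone.
summary = con.execute(
    "SELECT zone, SUM(area_ha) FROM parcels GROUP BY zone ORDER BY zone"
).fetchall()

# 4. Combine: join the two tables on their shared 'zone' field.
combined = con.execute("""
    SELECT p.parcel_id, z.description
    FROM parcels AS p JOIN zones AS z ON p.zone = z.zone
    ORDER BY p.parcel_id
""").fetchall()
```

Each query returns a new view of the same stored records. The raw rows never change.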
6
Q

How can information of value be achieved? (5)

A

1) shared manner
2) easy access
3) with concurrent access (simultaneous)
4) minimum data duplication
5) with integrity (validity) ensured

7
Q

Why are databases important to GIS?

A

store attribute data associated with spatial data in a GIS

store topological data associated with spatial data in a GIS

store metadata associated with spatial data in a GIS

overall, make querying of spatial data possible

8
Q

Why are DBMS successful?

A
-A data model to represent real world objects in a digital context stored in a computer

-A data load capability: tools to help import and load data into the database structure

-Indices: help speed up searches
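To see how an index speeds up searches, the sqlite3 sketch below asks SQLite how it would run the same query before and after an index exists. The `wells` table and the index name are hypothetical. Without the index SQLite scans every row. With it, SQLite searches the index.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE wells (well_id INTEGER PRIMARY KEY, township TEXT, depth_m REAL)")
# 10,000 hypothetical well records spread over 50 townships.
con.executemany(
    "INSERT INTO wells (township, depth_m) VALUES (?, ?)",
    [(f"T{i % 50}", float(i % 120)) for i in range(10_000)],
)

query = "SELECT well_id, depth_m FROM wells WHERE township = ?"

def plan(sql):
    """Return SQLite's query-plan text (the 4th column of EXPLAIN QUERY PLAN rows)."""
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql, ("T7",)))

before = plan(query)  # full table scan
con.execute("CREATE INDEX idx_wells_township ON wells (township)")
after = plan(query)   # search through the index
print(before)
print(after)
```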

9
Q

How do you create a database?

A

-Engine filters out to different types of database
-More about thinking things through

1) Data investigation
-Why are you collecting the data and what aspects do you need?
-Type, quantity or quality; choosing the right entities and attributes

2) Data relationships
-When you have multiple tables, understand the relationships between entities and their attributes

3) Data design and structure
-What software? What field names, structures, and types are you going to develop?

4) The database
-How do you populate the database, and by what process? How is the data maintained/upgraded?

10
Q

What are the different types of relationships between entities?
What do entity relationship models help us recu / What relationships can databases be joined by?

A
  1. one to many
  2. many to many
    • multiples compared to multiples, these WILL NOT JOIN
  3. one to one
    • a unique value to a unique value (there is not more than one)
  4. many to one

-Helps users decide on the proper tables for their relational database
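A minimal Python sketch, not from the course material, that names the relationship between two tables by counting how often each shared key value appears in each table. The inputs are hypothetical key columns, and only keys present in both tables are considered.

```python
from collections import Counter

def relationship_type(origin_keys, destination_keys):
    """Classify a relationship from the key value of every record in each table."""
    origin_counts = Counter(origin_keys)
    destination_counts = Counter(destination_keys)
    shared = origin_counts.keys() & destination_counts.keys()
    # "Many" on a side means some shared key appears more than once in that table.
    origin_many = any(origin_counts[k] > 1 for k in shared)
    destination_many = any(destination_counts[k] > 1 for k in shared)
    if origin_many and destination_many:
        return "many to many"
    if origin_many:
        return "many to one"
    if destination_many:
        return "one to many"
    return "one to one"

# Hypothetical example: one owner (origin) can hold several parcels (destination).
print(relationship_type(["O1", "O2"], ["O1", "O1", "O2"]))
```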

11
Q

What is Chen's entity relationship model composed of?

A
  1. Entity sets: the fundamental thematic groupings or phenomena being modelled
  2. Relationship sets: “subsets of the cross product of two or more entity sets”, ex. the subset of hotels in a certain town
  3. Mappings: define the relationship between the members of entity sets; may be one to many, one to one, or many to many

12
Q

What are the database types through the years?

A
  1. 1960; hierarchical database (aka navigational)
    -Tree-like structure
    -Starts from a root
    -Child nodes and parent nodes
    -Describes one to many relationships
  2. 1980; relational database
    -Multiple tables
    -Relationships maintained by a common field
    -One to many and many to many relationships
  3. 1990; object-oriented database
  4. Today; NoSQL databases, cloud databases, self-driving databases
    -Resides on a private, public or hybrid cloud computing platform
-Resides on a private, public or hybrid cloud computing platform

13
Q

What are two database types?

A

1) Single files (a database type)
-AKA flat files
-Spreadsheets; the data is stored in columns (fields) and rows (records)
-Useful for smaller datasets
-Will have limited impact on computational speed or storage space

2) Relational databases (a database type)

-Related to one another via unique identifiers that allow users to link two or more databases or tables together
-The unique identifier can also be referred to as the primary key in the origin table
-The same field in the adjoining (destination) table is called the foreign key
-If both tables have the same field name in each database, this is referred to as the common key
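A small sqlite3 sketch of the primary key / foreign key / common key idea. The `stations` and `readings` tables are hypothetical water-monitoring examples.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Origin table: station_id is the primary key (unique identifier).
    CREATE TABLE stations (
        station_id INTEGER PRIMARY KEY,
        name TEXT
    );
    -- Destination table: station_id here is the foreign key.
    -- Both tables use the same field name, so it is also the common key.
    -- (SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.)
    CREATE TABLE readings (
        reading_id INTEGER PRIMARY KEY,
        station_id INTEGER REFERENCES stations (station_id),
        ph REAL
    );
    INSERT INTO stations VALUES (1, 'Upper Creek'), (2, 'Lower Creek');
    INSERT INTO readings (station_id, ph) VALUES (1, 7.1), (1, 6.9), (2, 7.4);
""")

# Link the two tables through the key to attach station names to readings.
rows = con.execute("""
    SELECT s.name, r.ph
    FROM stations AS s
    JOIN readings AS r ON r.station_id = s.station_id
    ORDER BY r.reading_id
""").fetchall()
print(rows)
```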

14
Q

What are the types of attribute data? (4)

A

1. Nominal
2. Ordinal
3. Interval
4. Ratio (age, height, weight)

15
Q

What can deleting and adding attribute fields help with?

A
  1. Inputting the result of a classification or computation
  2. Allows specificity for the type of data field you require
  3. Removes redundant or unnecessary fields
  4. Improves storage capacity and processing time
16
Q

What records can data entry be made in?

A
  1. Single record; typing and changing that record only
  2. Multiple records; calculate command; these records are selected first and then updated with a specific value or classification
17
Q

How can you query data?

A

  1. Select by attribute: ask questions of the data via its attributes
  2. Select by location: select data based on their proximity to other features
  3. Use Structured Query Language (SQL)
    -Syntax of SQL statements (using = )
    -Boolean connectors (using AND or OR)
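A sketch of both query styles on a hypothetical `monitors` table. Select by attribute is an SQL WHERE clause with = and Boolean connectors. Select by location is approximated here as a simple straight-line distance test in Python, which is an assumption: a GIS would use its own spatial selection tools.

```python
import math
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE monitors (id INTEGER PRIMARY KEY, type TEXT, depth_m REAL, x REAL, y REAL);
    INSERT INTO monitors VALUES
        (1, 'well',   30.0,   0.0,   0.0),
        (2, 'well',   80.0, 400.0, 300.0),
        (3, 'stream',  0.0, 100.0,   0.0),
        (4, 'well',   55.0,  50.0,  50.0);
""")

# Select by attribute: "=" plus the Boolean connectors AND / OR.
by_attribute = [row[0] for row in con.execute(
    "SELECT id FROM monitors "
    "WHERE type = 'well' AND (depth_m > 50 OR depth_m < 10) ORDER BY id"
)]

# Select by location: keep features within a given distance of a target point.
def select_within(target, distance):
    return [
        fid for fid, x, y in con.execute("SELECT id, x, y FROM monitors ORDER BY id")
        if math.dist(target, (x, y)) <= distance
    ]

near_origin = select_within((0.0, 0.0), 120.0)
print(by_attribute, near_origin)
```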
18
Q

What is data quality?

A

The indication of how good the data are; the overall fitness and suitability of data for a specific purpose

19
Q

Why is data quality important?

A
  1. “Garbage in-garbage out” (aka GIGO) –> need good data because bigger decisions will be based on it
  2. Error prone data
    -Can affect the reliability of the final product
    -Lead to misinterpretation of the final product, affects decision making
    -Provide inaccurate measurements or models
    -Provide inaccurate results of queries
20
Q

How can data quality be improved? (7)

A
  1. Metadata
    -Metadata is “data about the data”; the 5 W’s, scale, projection and coordinate system, transformation, and usage
  2. Understanding accuracy vs precision
    -Accuracy is the extent to which estimated data values approach their true values, ex. plus or minus two meters
    -Precision is the recorded level of detail of your data, ex. more decimal places
  3. Understanding errors
    -Errors are flaws in data; the difference between reality and the GIS computer environment
    -Errors can be single, definable departures from reality, ex. the easting and northing location for one water monitor was entered incorrectly
    -Errors can also be persistent, widespread deviations throughout a whole database, ex. the eastings and northings for all water monitor locations were entered incorrectly
  4. Completeness
    -Covers the entire study area and time period; complete set of attributes in the database
  5. Compatibility
    -Can the data be used together sensibly? Data should be collected and captured using similar methods, ex. are the overlays at the same scale?
  6. Consistency
    -Applies not only to separate data sets but also within individual data sets
  7. Applicability
    -Appropriateness or suitability of data for a set of commands, operations, analysis or to solve a specific problem
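A small sketch separating the two ideas, using the notes' definitions. Accuracy is the distance from the true value. Precision is the recorded level of detail (decimal places). The true value and the two readings are hypothetical.

```python
TRUE_VALUE = 100.0  # hypothetical surveyed "true" value, in metres

def decimal_places(recorded: str) -> int:
    """Precision as recorded level of detail: digits after the decimal point."""
    return len(recorded.split(".")[1]) if "." in recorded else 0

def absolute_error(recorded: str) -> float:
    """Accuracy: how far the recorded value is from the true value."""
    return abs(float(recorded) - TRUE_VALUE)

# Reading A is less precise but more accurate; reading B is the opposite.
readings = {"A": "100.4", "B": "103.2791"}
for name, value in readings.items():
    print(name, "decimals:", decimal_places(value), "error:", round(absolute_error(value), 4))
```

More decimal places (B) do not make a value more accurate.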
21
Q

What errors can exist within data? (3)

A
  1. Bias
    -Systematic variation of data from reality
    -Can be technical or human based
  2. Resolution
    -Describes the smallest feature in a dataset that can be displayed or mapped
    -Raster resolution is set by cell size –> only features at least as large as the set cell size are seen
    -Vector resolution is determined by the scale of the original map, the point size and line width of the features represented thereon, and the precision of digitizing
  3. Generalization
    -Simplifying the complexities of the real world to produce scaled models and maps
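The notes don't name a specific generalization method. One widely used line-simplification algorithm is Ramer-Douglas-Peucker, sketched below as an illustration. It keeps a vertex only if it lies farther than a tolerance from the simplified line. The tolerance and coordinates are hypothetical.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def simplify(points, tolerance):
    """Ramer-Douglas-Peucker line simplification."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the vertex farthest from the straight line between the end points.
    index, max_distance = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_distance:
            index, max_distance = i, d
    if max_distance <= tolerance:
        return [start, end]  # every vertex is within tolerance: drop them all
    # Otherwise keep that vertex and simplify each half recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # avoid repeating the shared vertex

wiggly = [(0, 0), (1, 0.1), (2, 0), (3, 0.05), (4, 0)]
print(simplify(wiggly, 0.5))
```

A larger tolerance gives a coarser, more generalized line. This mirrors choosing a smaller map scale.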
22
Q

What is metadata and why is it important? (6)

A

-International Organization for Standardization (ISO)

-Important because it protects an institution's or organization's data investment:
1. Helps users understand data
2. Enables discovery
3. Limits liability
4. Highlights prudent data stewardship
5. Reduces workload associated with questions about data
6. Reduces overall costs in the long term

23
Q

What are some sources of conceptual error? (6)

A

conceptual errors: Errors stemming from our knowledge and understanding and trying to model reality.

  1. Ecological fallacy; the assumption that an individual from a specific group or area will exhibit a trait that is predominant in the group as a whole –> take the group (whole) context and apply it to every individual
  2. MAUP (modifiable areal unit problem); a challenge that occurs during the spatial analysis of aggregated data, in which the results differ when the same analysis is applied to the same data but different aggregation schemes are used
  3. Mental maps
  4. Individual perception of reality
  5. Spatial models used to reflect reality; vector and raster
  6. Coming from different backgrounds/disciplines; reductionist view (detailed, explains parts of a system, ex. biology) OR holistic view (broader, tries to explain interrelationships at meso and macro scales)
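A toy illustration of MAUP: the same six hypothetical block values are aggregated under two different zoning schemes. The zone averages, and therefore any conclusion drawn from them, change even though the underlying data do not.

```python
from statistics import mean

# Hypothetical values (ex. households per block) for six adjacent blocks.
blocks = {"A": 2, "B": 10, "C": 9, "D": 3, "E": 8, "F": 2}

def zone_means(scheme):
    """Aggregate the same block values under one zoning scheme."""
    return {zone: mean(blocks[b] for b in members) for zone, members in scheme.items()}

scheme_1 = {"Z1": ["A", "B"], "Z2": ["C", "D"], "Z3": ["E", "F"]}
scheme_2 = {"Z1": ["A"], "Z2": ["B", "C"], "Z3": ["D", "E", "F"]}

print(zone_means(scheme_1))
print(zone_means(scheme_2))
```

Under scheme 1 no zone averages above 6. Under scheme 2 one zone averages 9.5. The block data are identical in both cases.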
24
Q

What are errors in source data? 3

A
  1. Survey data
    -Operation of equipment (GPS)
    -Inputting the wrong attributes into the database
  2. Remote sensing data or air photos
    -Georeferenced incorrectly
    -Misclassification (clouds or shadows in an image)
    -Time sensitive
  3. Maps
    -Digitizing process
    -Generalization
    -Boundaries (fuzzy)
25
What are some sources of errors in data encoding? (2)
  1. Source map error: continuous data
    -Fuzzy boundaries
    -Map scale
    -Map measurements
  2. Operational
    -Mainly focusing on the digitizing process (map registration)
    -Human error; psychological, physiological, line thickness, method of digitizing (point vs stream mode)
26
Overall, what are the sources of error? (5)
  1. Conceptual errors: errors stemming from our knowledge and understanding and trying to model reality
  2. Errors stemming from source data
  3. Errors in data encoding
  4. Errors in data editing and conversion
  5. Errors in data processing and analysis
27
What are some sources of errors in data editing and conversion?
  1. Considered the “last line of defense” against errors
    -Cleaning and editing is done manually by scrutinizing the work
    -Cleaning and editing is done automatically by setting parameters, tolerances, and topology rules within the software to help detect errors (vector)
    -Minimize misclassifications by eliminating “noise” from raster cells; this can be rectified by filtering and reclassifying each cell to match the trends in the data
  2. Conversion
    -Raster to vector
      -Topological problems
    -Vector to raster
      -Loss of connectivity via topological errors
      -Loss of small polygons
      -Change to grid orientation
      -Change to grid origin and datum
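One common way to filter raster "noise" is a 3x3 majority filter. Each cell is reclassified to the most frequent class in its window. The sketch below is a plain-Python illustration. Treating edge windows as partial and keeping the original value on ties are assumptions made here, not rules from the notes.

```python
from collections import Counter

def majority_filter(grid):
    """Replace each cell with the most common class in its 3x3 window.

    Edge cells use only the part of the window inside the grid;
    on a tie the original cell value is kept.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            counts = Counter(window).most_common()
            top_value, top_count = counts[0]
            tied = len(counts) > 1 and counts[1][1] == top_count
            out[r][c] = grid[r][c] if tied else top_value
    return out

# A single misclassified cell (2) inside a uniform class (1) is smoothed away.
noisy = [[1, 1, 1], [1, 2, 1], [1, 1, 1]]
print(majority_filter(noisy))
```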
28
What are some sources of errors in data processing and analysis? (3)
GIS analysis techniques can bring in error:
  1. Classification; what intervals are used to classify; reflectance values for each pixel within a specific wavelength
  2. Aggregation/disaggregation; aggregating smaller units into larger ones is generally no problem
  3. Integration through overlay procedures
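To show how the choice of classification intervals alone can change a result, the sketch below classifies the same hypothetical values with equal-interval and quantile breaks. The quantile rule used here (take the value at each rank cut) is a simplification.

```python
def equal_interval_breaks(values, n_classes):
    """Split the value range into n classes of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)]

def quantile_breaks(values, n_classes):
    """Simplified quantiles: roughly the same number of values per class."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_classes] for i in range(1, n_classes)]

def classify(value, breaks):
    """Class number (1-based); a value equal to a break goes to the higher class."""
    return 1 + sum(value >= b for b in breaks)

values = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100]
eq = equal_interval_breaks(values, 2)
qt = quantile_breaks(values, 2)
print(eq, qt, classify(8, eq), classify(8, qt))
```

The value 8 lands in class 1 with equal intervals but in class 2 with quantiles.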
29
What are some methods for finding errors? (4)
  1. Visual inspection: cleaning, editing, and comparing your output with the source data
  2. Double digitizing: the map is digitized twice
  3. Error signatures: each person digitizes differently. If they make the same mistakes, those mistakes become identifiable to that individual; “their error signature”
  4. Statistical analysis: outliers, error modelling (epsilon), error bands around digitized lines (Monte Carlo)
30
What does the history of a data record identify? (5)
  1. Source
  2. Data capture method
  3. Editing and cleaning procedures
  4. Conversion and transformation
  5. Identifying known errors
31
What are some benefits of (data?) lineage? (4)
  1. Error detection
  2. Management accountability
  3. External accountability
  4. Quality reporting
32
What is a buffer?
The generation of a unique purpose polygon that is a specified distance around a point, line or area feature
33
What are the characteristics/techniques of buffering in vector models? (5)
  1. Positive buffering: the buffering algorithm identifies the geometric coordinates of a line exactly n units away from a point, line or area feature. It identifies the x and y of the beginning node, intermediate vertices and the ending node
  2. Point buffering: buffering around a point creates a perfectly circular zone extending n units from the point feature
  3. Line buffering: buffering around a line creates a zone that has the approximate shape of the original linear feature, extended out n units from it
  4. Area/polygon buffering: buffering around an area creates a zone that has the approximate shape of the original polygon feature, extended out n units from it
  5. Variable buffering: the buffer distance or buffer size can vary according to numerical values provided in the vector layer attribute table for each feature
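A plain-Python sketch of three of these ideas. A point buffer is approximated as a polygon with 32 vertices. A line buffer is expressed as a "within n units of any segment" test. Variable buffering reads each feature's distance from its own attribute. The `wells` records and the `buffer_m` field are hypothetical. GIS software builds true buffer polygons rather than using this simplified test.

```python
import math

def point_buffer(x, y, distance, segments=32):
    """Approximate the circular buffer around a point as polygon vertices."""
    return [
        (x + distance * math.cos(2 * math.pi * i / segments),
         y + distance * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def in_line_buffer(p, line, distance):
    """True if p lies within `distance` of any segment of the polyline."""
    return any(
        distance_to_segment(p, line[i], line[i + 1]) <= distance
        for i in range(len(line) - 1)
    )

# Variable buffering: each feature's buffer distance comes from its attribute.
wells = [
    {"id": 1, "x": 0.0, "y": 0.0, "buffer_m": 50.0},
    {"id": 2, "x": 500.0, "y": 0.0, "buffer_m": 150.0},
]
buffers = {w["id"]: point_buffer(w["x"], w["y"], w["buffer_m"]) for w in wells}
```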
34
What are the types of buffering?
  1. Arbitrary
    -Best guess by a GIS analyst of what the size of a buffer should be; not based on scientific principle
  2. Causative
    -Landscapes or conditions surrounding points, lines, and areas are not homogeneous
    -If you have a priori (before the fact) knowledge about these conditions, then you can apply causative buffer logic to identify what the buffer distance should be for each
  3. Mandated
    -Agencies set strict guidelines and restrictions for buffer dimensions around features
35
What is overlay analysis using vector data?
-An overlay of two or more thematic features can be performed using either vector (topological) or raster (logic) layers
-Topological overlay involves comparing two or more vector feature layers that have been topologically structured to produce a new topologically structured vector feature layer
-Basically, the new layer combines the attributes and geometric characteristics of the two initial input layers
36
What are the types of topological overlays in vector buffering?
**see visual representation
  1. Intersection; uses AND
  2. Union; uses OR
  3. Symmetrical difference; uses XOR
  4. Identity; computes the geometric intersection of the input layer and the identity layer = the input features get the attributes of the identity layer
  5. Clipping; none of the attributes are combined
  6. Erasing; none of the attributes are combined
  7. Splitting
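The Boolean logic behind these overlays can be seen with Python sets. This is a deliberate simplification: each hypothetical layer is reduced to the set of grid cells it covers, instead of true vector geometry.

```python
def cells(x0, y0, x1, y1):
    """All integer cells in the half-open box [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

floodplain = cells(0, 0, 4, 4)  # hypothetical input layer (16 cells)
parcel = cells(2, 2, 6, 6)      # hypothetical overlay layer (16 cells)

intersection = floodplain & parcel            # AND: area in both layers
union = floodplain | parcel                   # OR: area in either layer
symmetrical_difference = floodplain ^ parcel  # XOR: area in one layer but not both
erase = floodplain - parcel                   # input area outside the erase layer
# Clip: same area as the intersection, but only the input layer's own attributes are kept.
clip = {cell for cell in floodplain if cell in parcel}
# Identity: every input cell is kept; cells inside the parcel also get its
# (hypothetical) parcel ID attribute.
identity = {cell: ("P-17" if cell in parcel else None) for cell in floodplain}

print(len(intersection), len(union), len(symmetrical_difference), len(erase))
```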
37
What are the polygon functionalities in vector buffering?
  1. Overlay of a point layer on a polygon layer = point-in-polygon overlay
  2. Overlay of a polygon layer on a polygon layer = polygon-in-polygon overlay
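A point-in-polygon overlay needs a test for whether a point falls inside a polygon. The classic ray-casting method is sketched below. Each point then inherits the attribute of the polygon that contains it. The zone and well layers are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray to the right of (x, y) crosses."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > y) != (yj > y):  # edge straddles the ray's height
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside  # odd number of crossings = inside
        j = i
    return inside

zones = {
    "North": [(0, 10), (10, 10), (10, 20), (0, 20)],
    "South": [(0, 0), (10, 0), (10, 10), (0, 10)],
}
wells = {"W1": (5, 15), "W2": (3, 4), "W3": (15, 5)}

# Each well gets the name of the zone it falls in (None if outside all zones).
well_zone = {
    well: next((name for name, poly in zones.items() if point_in_polygon(x, y, poly)), None)
    for well, (x, y) in wells.items()
}
print(well_zone)
```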
38
What is a GIS flowchart?
**view example
  1. Shows a step-by-step approach to how a GIS analysis was conducted
  2. Automates the process
  3. Makes it shareable
  4. Makes it transferable and interoperable
39
What is reclassification in raster spatial operations?
The process of taking input cell values and replacing them with new output cell values. This technique changes a cell's value or meaning; it can simplify or reinterpret cell values.
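A sketch of two common reclassification styles on a raster stored as nested lists: a lookup table (old value to new value) and value ranges. The land-cover codes and ranges are hypothetical.

```python
# Hypothetical land-cover codes grouped into a simpler scheme
# (1 = water, 2 = developed, 3 = forest).
LOOKUP = {11: 1, 12: 1, 21: 2, 22: 2, 41: 3}

def reclassify(grid, table, nodata=0):
    """Replace every cell value with its new value from the lookup table."""
    return [[table.get(v, nodata) for v in row] for row in grid]

def reclassify_ranges(grid, ranges):
    """Range-based reclassification; ranges are (low, high, new) with low <= v < high."""
    def recode(v):
        for low, high, new in ranges:
            if low <= v < high:
                return new
        return None  # value falls outside every range
    return [[recode(v) for v in row] for row in grid]

print(reclassify([[11, 21], [41, 99]], LOOKUP))
print(reclassify_ranges([[5, 150], [300, 999]], [(0, 100, 1), (100, 500, 2)]))
```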
39
What are distance measurement operations (distance) in raster spatial operations?
-To determine the Euclidean distance (as the crow flies) between the centroids of two pixels (cells) in a raster, you would use the Pythagorean theorem
-This uses Euclidean distance and simply searches a specific distance away from a point, line, or area and recodes all of those pixels into a user-specified value (ex. distance value = 100 m)
-Point Euclidean, line Euclidean, area (polygon) Euclidean
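A plain-Python sketch of a Euclidean distance raster. The distance from each cell centre to the nearest source cell centre comes from the Pythagorean theorem, scaled by cell size. Cells within 100 m are then recoded to a chosen value. The 4 x 4 grid, 50 m cells and single source cell are hypothetical.

```python
import math

def euclidean_distance(rows, cols, sources, cell_size):
    """Distance from each cell centre to the nearest source cell centre."""
    return [
        [min(math.hypot(r - sr, c - sc) for sr, sc in sources) * cell_size
         for c in range(cols)]
        for r in range(rows)
    ]

def within(distance_grid, threshold, inside_value=1, outside_value=0):
    """Recode cells within the threshold distance to a user-specified value."""
    return [[inside_value if d <= threshold else outside_value for d in row]
            for row in distance_grid]

dist = euclidean_distance(rows=4, cols=4, sources=[(0, 0)], cell_size=50.0)
zone_100m = within(dist, 100.0)
for row in zone_100m:
    print(row)
```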
40
What are local raster operations applied to a single raster dataset?
Each pixel (cell) at a location in a raster dataset may be considered a local object. The value of this pixel can be operated upon independently of neighbouring pixels (basically changing the value of a cell)
  1. Arithmetic
  2. Logarithmic
  3. Trigonometric
  4. Power functions (integer, sin, log10, ^2)
41
What are local operations applied to multiple raster datasets?
Map algebra (Tomlin): a suite of arithmetical and algebraic operations/functions. Take two (or multiple) rasters with different values and combine them
  1. Simple math operations (min, max, mean, std dev)
  2. Simple math operations; ranking (summation)
  3. Simple math operations; frequencies (majority, minority, diversity)
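A minimal map-algebra sketch: one statistic is applied to the stack of values found at each cell position across several aligned rasters. The three small "year" rasters are hypothetical. The diversity statistic is written here as the count of distinct values.

```python
from collections import Counter
from statistics import mean

def cell_statistics(rasters, stat):
    """Apply `stat` to the stack of values at each cell position."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [[stat([r[i][j] for r in rasters]) for j in range(cols)] for i in range(rows)]

def majority(values):
    """Most frequent value in the stack."""
    return Counter(values).most_common(1)[0][0]

def diversity(values):
    """Number of distinct values in the stack."""
    return len(set(values))

year_1 = [[1, 2], [3, 3]]
year_2 = [[1, 2], [2, 3]]
year_3 = [[1, 3], [2, 1]]
stack = [year_1, year_2, year_3]

print(cell_statistics(stack, max))
print(cell_statistics(stack, sum))       # summation
print(cell_statistics(stack, majority))  # frequency
```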
42
What is a map overlay using local operations applied to multiple raster layers?
-Basically, adding a bunch of layers to create a sandwich
-Extent is the same between each layer
-Each cell is the same value?
-The cells in each layer must line up with the next layer, or else the overlay will not work
-Raster is not just cell on top of cell
-Resolution MUST be consistent between each layer
43
What is an application example of a raster model?
RUSLE (revised universal soil loss equation): determines the long-term average annual soil loss in tons per acre per year for a geographic area (if erosion occurs, what restoration factors will you put into place? The map tells you where, and possibly how, you can)
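RUSLE is commonly written as A = R × K × LS × C × P. Here R is rainfall-runoff erosivity, K is soil erodibility, LS is slope length and steepness, C is cover-management and P is support practice. As a raster model it is a local multiplication of aligned factor rasters. All factor values below are hypothetical, so the output is illustrative only.

```python
def rusle(R, K, LS, C, P):
    """Cell-by-cell RUSLE: A = R * K * LS * C * P on aligned factor rasters."""
    layers = [R, K, LS, C, P]
    shape = (len(R), len(R[0]))
    # Overlay only works if every layer has the same extent and resolution.
    if any((len(g), len(g[0])) != shape for g in layers):
        raise ValueError("factor rasters must share extent and resolution")
    return [
        [R[i][j] * K[i][j] * LS[i][j] * C[i][j] * P[i][j] for j in range(shape[1])]
        for i in range(shape[0])
    ]

# Hypothetical 2 x 2 factor rasters.
R = [[100, 100], [100, 100]]
K = [[0.3, 0.3], [0.2, 0.2]]
LS = [[1.0, 2.0], [1.0, 4.0]]
C = [[0.1, 0.1], [0.5, 0.5]]
P = [[1.0, 0.5], [1.0, 1.0]]
soil_loss = rusle(R, K, LS, C, P)
print(soil_loss)
```

The highest-loss cell (bottom right) is where restoration measures could be targeted first.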
44
What are neighborhood operations to change raster-based values?
  1. Shape
    -A local raster operation modifies the values of each cell in a raster layer, while neighborhood operations depend on the characteristics of neighboring cells
    -So, a cell's (f) value changes due to the cells surrounding it
    -Uses a spatial filter (3x3 or 5x5 most common)
  2. Filter
    -The value of the output cell is a function of the logic applied to the cells in the n x n spatial moving window. Can fill in missing values; as the spatial window moves, it recalculates all values
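A sketch of a focal (neighbourhood) mean with an n x n moving window, 3x3 by default. Averaging only the in-grid neighbours at the edges is one possible edge rule, assumed here. Software may instead pad the edges or leave them as NoData.

```python
def focal_mean(grid, size=3):
    """Mean of the size x size moving window centred on each cell."""
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect the neighbours inside the grid around cell (r, c).
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out

elevation = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(focal_mean(elevation))
```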
45
What are neighbourhood operations applied to a single raster dataset?
  1. Qualitative operations; land cover (majority, minority, diversity/unique values)
  2. Quantitative operations; elevation (minimum, mean, max, std dev)
46
What are (neighbourhood?) zonal operations? Break down areas by creating zones
  1. Zonal maximum: the maximum elevation value in each zone is assigned to all the pixels in the zone
  2. Zonal minimum: the minimum elevation value in each zone is assigned to all the pixels in the zone
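A sketch of a zonal operation. Values are grouped by the zone each pixel belongs to, a statistic is computed per zone, and the result is written back to every pixel in that zone. The zone and elevation rasters are hypothetical. Any statistic (ex. a mean function) can be passed the same way as `max` and `min`.

```python
def zonal(zones, values, stat):
    """Compute `stat` per zone and assign it to every pixel of that zone."""
    groups = {}
    for zone_row, value_row in zip(zones, values):
        for z, v in zip(zone_row, value_row):
            groups.setdefault(z, []).append(v)
    result = {z: stat(vs) for z, vs in groups.items()}
    return [[result[z] for z in zone_row] for zone_row in zones]

zones = [["A", "A", "B"], ["A", "B", "B"]]
elevation = [[120, 135, 210], [128, 190, 205]]
zonal_max = zonal(zones, elevation, max)
zonal_min = zonal(zones, elevation, min)
print(zonal_max)
print(zonal_min)
```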
47
What is a network?
-We use networks to move goods, resources, services, and ourselves from place to place
-Even other species rely on them to migrate from habitat to habitat or landscape to landscape
-A system of topologically interconnected lines and intersections that represent a linear network of possible routes from one location to another
47
What is geocoding?
-Process of finding a geographic location from an address or a postal code
-Process of transforming a description of a location into a location on the Earth's surface
-Most common type is address matching
48
How are networks used?
  1. Optimum paths or pathways
  2. Locating resources or facilities within a network
49
What is address matching?
-Plots addresses as points on a map
-Addresses are processed along road segments based on attributes in a geocoded database
-Results are also shown as the % of addresses successfully matched = match rate
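A simplified sketch of address matching by range interpolation. An address is placed proportionally along the road segment whose address range contains its house number, and the match rate is the percentage of input addresses that were placed. The reference segments and input addresses are hypothetical. Real geocoders also handle odd/even street sides, side offsets and fuzzy name matching.

```python
# Hypothetical reference dataset: road segments with address ranges and end coordinates.
segments = [
    {"street": "MAIN ST", "from_no": 100, "to_no": 198,
     "start": (0.0, 0.0), "end": (200.0, 0.0)},
    {"street": "OAK AVE", "from_no": 1, "to_no": 99,
     "start": (0.0, 0.0), "end": (0.0, 100.0)},
]

def geocode(number, street):
    """Interpolate an address along the matching segment; None if unmatched."""
    for seg in segments:
        if seg["street"] == street and seg["from_no"] <= number <= seg["to_no"]:
            t = (number - seg["from_no"]) / (seg["to_no"] - seg["from_no"])
            (x0, y0), (x1, y1) = seg["start"], seg["end"]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return None

# Input dataset: one address out of range, one on a street not in the reference data.
addresses = [(149, "MAIN ST"), (50, "OAK AVE"), (500, "MAIN ST"), (10, "ELM ST")]
results = [geocode(n, s) for n, s in addresses]
match_rate = 100 * sum(r is not None for r in results) / len(results)
print(results, f"match rate: {match_rate}%")
```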
50
What are the main components of geocoding?
  1. Input dataset; records of addresses
  2. Output dataset; resulting address locations that are geographically referenced; can be impacted if the input data is not quality assured
  3. Processing algorithm; determines the appropriate position of each input record in the output dataset, based on the attribute data in the reference dataset
  4. Reference dataset; has all the geographically referenced information to aid in finding the correct locations of your input dataset
51
What is geocoding quality (match rate) impacted by?
  1. Misspellings
  2. Incorrect city names
  3. Wrong direction
  4. House numbers out of street range
  5. Wrong street type
  6. Abbreviations
  7. Missing data
  8. Must specify the network analysis to be performed, ex. straight line
52
What are the geocoding network types?
  1. Undirected
    -Transportation based
    -Movement along traversed lines (edges) in any direction
    -Instances of one-way streets and U-turns are permissible
    -Fastest or shortest route applications
  2. Directed
    -Movement of services or materials in one direction within a network
    -AKA geometric utility
53
Undirected network analysis: steps to building a topologically correct database
*see diagram
  1. Collect source network information
    -GPS, digitization
  2. Build topologically correct network dataset elements
    -Edges (derived from lines, ex. one way, length, x and y)
    -Junctions (derived from intersections, ex. complete stop, 4-way, barriers)
    -Turns (right, left, yield)
  3. Specify the network analysis
    -Identify the optimum route, find the closest facility, identify the service area and optimum delivery route, and then location-allocation modelling
  4. Solve the network analysis problem and present results
    -Cartographic display, route instruction (turn by turn)
*The network layer defines the elements, and the connectivity gives values to the element attributes
54
Directed network analysis: steps to building a topologically correct database
* see diagram
  1. Collect source network information
    -GPS, digitization
  2. Build topologically correct utility network dataset elements
    -Edges (derived from lines; cables, pipelines)
    -Junctions (source and intersection)
    -Point and area ancillary features
  3. Specify the utility network analysis
    -Determine flow direction in a utility network
    -Trace edges and upstream/downstream junctions
    -Use barriers to cordon off or isolate specific parts of the network
  4. Solve the utility network analysis problem and present results
    -Cartographic display
    -Route instruction (turn by turn)
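A sketch of tracing in a directed (utility) network. The network is stored as junction-to-junction edges in the direction of flow. A downstream trace follows the edges forward, and an upstream trace follows them reversed. A barrier junction (ex. a closed valve) stops the trace. The network and junction names are hypothetical.

```python
from collections import deque

# Hypothetical utility network: each edge points in the direction of flow.
flow = {
    "source": ["J1"],
    "J1": ["J2", "J3"],
    "J2": ["sink_A"],
    "J3": ["J4"],
    "J4": ["sink_B"],
    "sink_A": [],
    "sink_B": [],
}

def trace(network, start, barriers=frozenset()):
    """Junctions reachable from `start` along edge direction, never entering a barrier."""
    reached, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in network.get(node, []):
            if nxt not in reached and nxt not in barriers:
                reached.add(nxt)
                queue.append(nxt)
    return reached

def reverse(network):
    """Flip every edge so a trace follows the flow upstream."""
    upstream = {node: [] for node in network}
    for node, targets in network.items():
        for t in targets:
            upstream.setdefault(t, []).append(node)
    return upstream

downstream_of_J1 = trace(flow, "J1")
upstream_of_sink_B = trace(reverse(flow), "sink_B")
isolated = trace(flow, "J1", barriers={"J3"})  # closed valve at J3
print(sorted(isolated))
```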
55
UNDIRECTED: Specify the network analysis to be performed: components
  1. Shortest path or optimal route
    -Uses impedances or costs to move from one location to another
  2. Network service area analysis
    -Geographic region that encases all parts of the network that can be reached within a certain impedance (cost)
  3. Location-allocation modelling for facility location
    -Finding the best locations for one or more facilities that service a given set of points (customers)
    -Takes into account factors such as facilities available, cost and maximum impedance times
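The first two analyses can be sketched with Dijkstra's algorithm, a standard least-cost path method, on a hypothetical undirected street network. Here impedance is travel time in minutes. The shortest path follows the least accumulated impedance, and the service area is every node reachable within an impedance limit.

```python
import heapq

# Hypothetical undirected network: (node, node, impedance in minutes).
edges = [("A", "B", 4), ("A", "C", 2), ("C", "B", 1),
         ("B", "D", 5), ("C", "D", 8), ("D", "E", 3)]

graph = {}
for u, v, cost in edges:
    graph.setdefault(u, []).append((v, cost))
    graph.setdefault(v, []).append((u, cost))  # undirected: travel both ways

def dijkstra(start):
    """Least accumulated impedance from start to every node, plus each node's predecessor."""
    best, previous = {start: 0}, {}
    heap = [(0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # stale entry; a cheaper route was already found
        for nxt, w in graph[node]:
            new_cost = cost + w
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return best, previous

def shortest_path(start, goal):
    best, previous = dijkstra(start)
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1], best[goal]

def service_area(start, max_impedance):
    """All nodes reachable from start within the impedance limit."""
    best, _ = dijkstra(start)
    return {node for node, cost in best.items() if cost <= max_impedance}

print(shortest_path("A", "E"))
print(service_area("A", 5))
```

The route A to C to B is preferred over the direct A to B edge because its total impedance (3) is lower than 4.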
56
What are some components of a directed network analysis (steps to building a topologically correct database)?
  1. A junction: could be a valve (with its condition: diameter, age, open, closed, etc.)
  2. A source junction: points show how the material is being delivered into the network and how it is pushed through it
  3. A sink point: shows where the commodity or material is used, collected or leaves the network
  4. A barrier: an isolated part of the network (ex. lake or pond)