Midterm two Flashcards
(58 cards)
What is a database?
-Backbone of how raw data is stored and how information is generated from it
-A collection of related data
what is a database used for? (7)
DEOTSSO
- Database management systems (DBMS) can house large quantities of data and allow multiple users to work on the database concurrently
- (data can) Evolve in a database
- (in) One location, therefore reduced redundancy
- Transfer user knowledge between applications
- (data) Sharing is encouraged
- (data) Security and standards for data access can be developed and enforced
- (If) Organized, maintenance costs are reduced and there is a reduced data application
what is the process of turning data into value?
Structure/organization turns DATA into INFORMATION and efficient management of INFORMATION gives it VALUE
What is the difference between data and information?
DATA is the RAW collection; structure and organization turn it INTO INFORMATION
Why do we use databases? (4)
- order data
- reorder data
- summarise data
- combine data
in order to obtain information of value
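A minimal sketch of the four operations above, using Python's built-in sqlite3 module. The tables (wells, samples) and their values are invented for illustration and are not from the course material.

```python
import sqlite3

# Hypothetical example data: wells and the nitrate samples taken from them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wells (well_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, well_id INTEGER, nitrate REAL);
    INSERT INTO wells VALUES (1, 'North', 'A'), (2, 'South', 'B'), (3, 'East', 'A');
    INSERT INTO samples VALUES (1, 1, 4.0), (2, 1, 6.0), (3, 2, 1.0), (4, 3, 3.0);
""")

# Order: sort the records by one field.
ordered = conn.execute("SELECT name FROM wells ORDER BY name").fetchall()

# Reorder: the same records sorted a different way.
reordered = conn.execute(
    "SELECT name FROM wells ORDER BY region, name DESC"
).fetchall()

# Summarise: collapse many records into one value per group.
summary = conn.execute(
    "SELECT well_id, AVG(nitrate) FROM samples GROUP BY well_id ORDER BY well_id"
).fetchall()

# Combine: join the two tables through their shared well_id field.
combined = conn.execute("""
    SELECT w.region, AVG(s.nitrate)
    FROM wells AS w JOIN samples AS s ON s.well_id = w.well_id
    GROUP BY w.region ORDER BY w.region
""").fetchall()

print(ordered)    # [('East',), ('North',), ('South',)]
print(reordered)  # [('North',), ('East',), ('South',)]
print(summary)    # [(1, 5.0), (2, 1.0), (3, 3.0)]
```

The last query is where raw data starts to become information: average nitrate per region is not stored anywhere, it is derived by combining and summarising.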
How can be information of value be achieved 5
1) shared manner
2) easy access
3) with concurrent access (simultaneous)
4. minimum data duplication
5. with integrity (validity) ensured
Why are databases important to GIS?
store attribute data associated with spatial data in a GIS
store topological data associated with spatial data in a GIS
store metadata associated with spatial data in a GIS
overall make querying of spatial data possible
Why are DBMS successful?
- a data model: represents real world objects in a digital context stored in a computer
- a data load capability: tools to help import and load data into the database structure
- indices: help speed up searches
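A rough sketch of how an index changes a search, assuming SQLite through Python's sqlite3 module. The monitors table and the index name are made up for the example, and EXPLAIN QUERY PLAN is SQLite-specific; other DBMS expose their query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monitors (monitor_id INTEGER PRIMARY KEY, site TEXT, depth REAL)"
)
# Hypothetical data: 1000 records spread over 100 site names.
conn.executemany(
    "INSERT INTO monitors (site, depth) VALUES (?, ?)",
    [(f"site_{i % 100}", float(i)) for i in range(1000)],
)

query = "SELECT depth FROM monitors WHERE site = 'site_42'"

def plan(sql):
    """Return SQLite's description of how it intends to run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

plan_before = plan(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_monitors_site ON monitors (site)")
plan_after = plan(query)   # with the index: a direct search on site

matches = conn.execute(query).fetchall()
print(plan_before)
print(plan_after)
print(len(matches))  # 10
```

The query returns the same rows either way; only the amount of work needed to find them changes.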
How to create a database?
-engine filters out to different types of database
-more about thinking things through
1) Data investigation
-why are you collecting the data and what aspects do you need?
-type, quantity or quality, choosing the right entity and attribute
2) Data relationships
- when you have multiple tables, understand the relationships between entities and their attributes
3) Data design and structure
-which software? which field names, structures and types are you going to develop?
4) The database
-how do you populate the database? what is the process? how is the data maintained/upgraded?
What are the different types of relationships between entities?
what do entity relationship models help us recu / what relationships can databases be joined by ?
- one to many
- many to many
- multiples compared to multiples, these WILL NOT JOIN
- one to one
- a unique value to a unique value (there is not more than one)
- many to one
-help users decide on the proper tables for their relational database
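One way to read "WILL NOT JOIN" for many to many: a SQL engine will still run such a join, but every record on one side is paired with every matching record on the other, so the result no longer describes one real-world relationship. A small sketch with hypothetical tables (parcels, zones, inspections):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zones (zone TEXT PRIMARY KEY, label TEXT);
    CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, zone TEXT);
    CREATE TABLE inspections (inspection_id INTEGER PRIMARY KEY, zone TEXT);
    INSERT INTO zones VALUES ('R', 'Residential'), ('C', 'Commercial');
    INSERT INTO parcels VALUES (1, 'R'), (2, 'R'), (3, 'C');
    INSERT INTO inspections VALUES (10, 'R'), (11, 'R'), (12, 'R');
""")

# Many to one: each parcel matches exactly one zone, so 3 parcels give 3 rows.
many_to_one = conn.execute("""
    SELECT p.parcel_id, z.label
    FROM parcels AS p JOIN zones AS z ON z.zone = p.zone
""").fetchall()

# Many to many: 'R' repeats on both sides, so 2 parcels x 3 inspections = 6 rows,
# and nothing says which inspection actually belongs to which parcel.
many_to_many = conn.execute("""
    SELECT p.parcel_id, i.inspection_id
    FROM parcels AS p JOIN inspections AS i ON i.zone = p.zone
""").fetchall()

print(len(many_to_one))   # 3
print(len(many_to_many))  # 6
```

The usual fix is to restructure the tables so that each join has a unique value on at least one side.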
What is Chen's entity relationship model comprised of?
1. Entity sets: the fundamental thematic groupings or phenomena being modelled
2. Relationship sets: “subsets of the cross product of two or more entity sets” ex. the subset of hotels in a certain town
3. Mappings: which define the relationship between the members of entity sets, may be one to many, one to one, many to many
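The "subset of the cross product" wording can be made concrete with a short sketch. The hotel and town names are invented; only the structure matters.

```python
from itertools import product

# Two entity sets (hypothetical members).
hotels = {"Seaview", "Grand", "Alpine Lodge"}
towns = {"Harbourton", "Peakville"}

# Cross product: every possible (hotel, town) pairing, 3 x 2 = 6 pairs.
cross_product = set(product(hotels, towns))

# Relationship set "is located in": only the pairings that are actually true.
located_in = {
    ("Seaview", "Harbourton"),
    ("Grand", "Harbourton"),
    ("Alpine Lodge", "Peakville"),
}
is_subset = located_in <= cross_product

# The example from the card: the subset of hotels in a certain town.
hotels_in_harbourton = sorted(h for h, t in located_in if t == "Harbourton")

# Mapping: each hotel appears once, a town may appear many times,
# so hotel -> town is many to one (town -> hotel is one to many).
towns_per_hotel = {h: sum(1 for h2, _ in located_in if h2 == h) for h in hotels}

print(len(cross_product))      # 6
print(is_subset)               # True
print(hotels_in_harbourton)    # ['Grand', 'Seaview']
```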
What are the database types through the years?
1) 1960; hierarchical database (aka navigational)
-Tree-like structure
-Starts from a root
-Child nodes and parents nodes
-Describes one to many relationships
2) 1980; relational database
-Multiple tables
-Relationships maintained by a common field
-One to many relationship and many to many relationships
3) 1990; object-oriented database
4)Today; NoSQL database, cloud databases, self-driving databases
-Resides on a private, public or hybrid cloud computing platform
What are two database types?
1) Single files (a database type)
-AKA flat files
-Spreadsheets; the data is stored in columns (fields) and rows (records)
-Useful for smaller datasets
-Will have limited impact on computational speed or storage space
2) Relational databases (a database type)
-Tables are related to one another via unique identifiers that allow users to link two or more databases or tables together
-The unique identifier can also be referred to as the primary key in the origin table
-The same field in the adjoining (destination) table is called the foreign key
-If both tables have the same field name in each database, this is referred to as the common key
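A sketch of primary, foreign and common keys, assuming SQLite via Python's sqlite3 module. The stations and readings tables are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other DBMS usually enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Origin table: station_id is the primary key (one unique value per station).
conn.execute("CREATE TABLE stations (station_id INTEGER PRIMARY KEY, name TEXT)")

# Destination table: station_id is the foreign key pointing back at stations.
# The field has the same name in both tables, so it is also the common key.
conn.execute("""
    CREATE TABLE readings (
        reading_id INTEGER PRIMARY KEY,
        station_id INTEGER NOT NULL REFERENCES stations (station_id),
        temperature REAL
    )
""")
conn.execute("INSERT INTO stations VALUES (1, 'Ridge'), (2, 'Valley')")
conn.executemany(
    "INSERT INTO readings (station_id, temperature) VALUES (?, ?)",
    [(1, 12.5), (1, 13.1), (2, 9.8)],
)

# Link the two tables through the key.
linked = conn.execute("""
    SELECT s.name, r.temperature
    FROM readings AS r JOIN stations AS s ON s.station_id = r.station_id
    ORDER BY r.reading_id
""").fetchall()

# A reading for a station that does not exist is rejected.
try:
    conn.execute("INSERT INTO readings (station_id, temperature) VALUES (99, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(linked)           # [('Ridge', 12.5), ('Ridge', 13.1), ('Valley', 9.8)]
print(orphan_rejected)  # True
```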
What are the types of attribute data? (4)
1. Nominal
2. Ordinal
3. Interval
4. Ratio (age, height, weight)
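A hedged illustration of why the four types matter: each one supports a different set of meaningful operations. The sample values below are made up, and the choice of operations follows the usual textbook treatment of measurement levels rather than anything stated on the card.

```python
from statistics import mode

# Nominal: names only. Counting and the most common category make sense.
land_use = ["forest", "urban", "forest", "water"]
most_common_land_use = mode(land_use)

# Ordinal: categories have an order, so a median rank makes sense,
# but the distance between "low" and "medium" is undefined.
risk_order = ["low", "medium", "high"]
risk = ["low", "high", "medium", "high", "low"]
ranks = sorted(risk_order.index(r) for r in risk)
median_risk = risk_order[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not (zero is arbitrary).
temps_c = [10.0, 20.0]
temp_difference = temps_c[1] - temps_c[0]
ratio_in_celsius = temps_c[1] / temps_c[0]
ratio_in_kelvin = (temps_c[1] + 273.15) / (temps_c[0] + 273.15)

# Ratio: a true zero exists, so "twice as tall" is a valid statement.
heights_m = [1.5, 3.0]
height_ratio = heights_m[1] / heights_m[0]

print(most_common_land_use)   # forest
print(median_risk)            # medium
print(temp_difference)        # 10.0
print(ratio_in_celsius, round(ratio_in_kelvin, 3))  # 2.0 1.035
print(height_ratio)           # 2.0
```

The Celsius and Kelvin ratios disagree for the same two temperatures, which is why 20 degrees is not "twice as hot" as 10 degrees.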
What can Deleting and adding attribute fields help with?
- Inputting the result of a classification or computation
- Allows specificity for the type of data field you require
- Removes redundant or unnecessary fields
- Improves storage capacity and processing time
What records can data entry be applied to?
- Single record; typing and changing that record only
- Multiple records; calculate command; these records are selected first and then updated with a specific value or classification
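In SQL terms the two modes look roughly like this. The sketch uses sqlite3 with a hypothetical roads table; GIS packages wrap the same idea in an attribute-table editor and a field calculator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roads (road_id INTEGER PRIMARY KEY, speed_limit INTEGER, road_class TEXT)"
)
conn.executemany(
    "INSERT INTO roads (road_id, speed_limit) VALUES (?, ?)",
    [(1, 30), (2, 50), (3, 100), (4, 110)],
)

# Single record: change one value in one record, identified by its key.
single_count = conn.execute(
    "UPDATE roads SET speed_limit = 40 WHERE road_id = 1"
).rowcount

# Multiple records: the WHERE clause selects the records first,
# then every selected record receives the same classification.
highway_count = conn.execute(
    "UPDATE roads SET road_class = 'highway' WHERE speed_limit >= 100"
).rowcount
local_count = conn.execute(
    "UPDATE roads SET road_class = 'local' WHERE speed_limit < 100"
).rowcount

classes = conn.execute(
    "SELECT road_id, road_class FROM roads ORDER BY road_id"
).fetchall()

print(single_count, highway_count, local_count)  # 1 2 2
print(classes)  # [(1, 'local'), (2, 'local'), (3, 'highway'), (4, 'highway')]
```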
How can you do data queries?
1. Select by attribute: ask questions of the data via its attributes
2. Select by location: select data based on their proximity to other features
3. Use Structured Query Language (SQL)
-SQL statement syntax (using = )
-Boolean connectors (using “and” or “or”)
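A short select-by-attribute sketch showing = together with the AND and OR connectors. The parcels table is hypothetical, and select by location is not covered here because it needs a spatial extension rather than plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, land_use TEXT, area_ha REAL)"
)
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "forest", 12.0), (2, "urban", 3.5), (3, "forest", 0.8), (4, "water", 20.0)],
)

def select_ids(where_clause):
    """Return the parcel ids matching a WHERE clause, in id order."""
    sql = f"SELECT parcel_id FROM parcels WHERE {where_clause} ORDER BY parcel_id"
    return [row[0] for row in conn.execute(sql)]

# A single condition using =
forest = select_ids("land_use = 'forest'")

# AND: both conditions must be true, so the selection narrows.
large_forest = select_ids("land_use = 'forest' AND area_ha > 5")

# OR: either condition may be true, so the selection widens.
forest_or_water = select_ids("land_use = 'forest' OR land_use = 'water'")

print(forest)           # [1, 3]
print(large_forest)     # [1]
print(forest_or_water)  # [1, 3, 4]
```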
What is data quality?
the indication of how good the data are; overall fitness and suitability of data for a specific purpose
Why is data quality important?
- “Garbage in-garbage out” (aka GIGO) –> need good data because bigger decisions will be based on it
- Error prone data
-Can affect the reliability of the final product
-Lead to misinterpretation of the final product, affects decision making
-Provide inaccurate measurements or models
-Provide inaccurate results of queries
How to improve data quality? (7)
1. Metadata
-Metadata is “data about the data”; the 5 W’s, scale, projection and coordinate system, transformation, and usage
2. Understanding accuracy vs precision
-Accuracy is the extent to which an estimated data value approaches its true value ex. plus or minus two meters
-Precision is the recorded level of detail of your data ex. more decimal points
3. Understanding errors
-Errors are flaws in data; the difference between reality and the GIS computer environment
-Errors can be single, definable departures from reality ex. the easting and northing location for one water monitor was entered incorrectly
-Errors can also be persistent, widespread deviations throughout a whole database ex. the eastings and northings for all water monitor locations were entered incorrectly
4. Completeness
-Data will cover the entire study area and time period; complete set of attributes in database
5. Compatibility
-Can the data be used together sensibly? Data should be collected and captured using similar methods ex. are the overlays at the same scale?
6. Consistency
-Applies not only to separate data sets but also within individual data sets
7. Applicability
-Appropriateness or suitability of data for a set of commands, operations, analysis or to solve a specific problem
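Accuracy and precision from the card above are easy to mix up, so here is a small numeric sketch. The coordinates and the 2 m tolerance are invented for illustration.

```python
# Hypothetical true easting of a water monitor, in metres.
true_easting = 500000.0

# Reading A: very precise (six decimal places) but about 25 m off.
reading_a = 500025.123456

# Reading B: imprecise (nearest metre) but only 1 m off.
reading_b = 500001.0

tolerance_m = 2.0  # "plus or minus two meters"

def error_m(value):
    """Absolute distance between a recorded value and the true value."""
    return abs(value - true_easting)

a_is_accurate = error_m(reading_a) <= tolerance_m
b_is_accurate = error_m(reading_b) <= tolerance_m

print(round(error_m(reading_a), 6), a_is_accurate)  # 25.123456 False
print(error_m(reading_b), b_is_accurate)            # 1.0 True
```

More decimal places did not make reading A any closer to the truth: precision describes how finely a value is recorded, accuracy describes how close it is to reality.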
What errors can exist within data? (3)
1. Bias
-Systematic variation of data from reality
-Can be technical or human based
2. Resolution
-Describes the smallest feature in a dataset that can be displayed or mapped
-Raster resolution is set by cell size –> only features the size of the set cell or larger can be seen
-Vector resolution is determined by the scale of the original map, the point size and line width of the features represented thereon, and the precision of digitizing
3. Generalization
-Simplifying the complexities of the real world to produce scaled models and maps
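Bias, as a systematic variation from reality, shows up as errors that all lean the same way. A minimal sketch with made-up elevation values, assuming reference ("true") values are available to compare against:

```python
from statistics import mean

# Hypothetical reference elevations and the values recorded by an instrument.
true_elevation = [100.0, 150.0, 200.0, 250.0]
measured = [102.1, 151.9, 202.0, 252.0]

# Error for each record: measured minus true.
errors = [m - t for m, t in zip(measured, true_elevation)]

# Random error would average out near zero; a mean error well away
# from zero suggests a systematic offset (bias).
mean_error = mean(errors)

# If the offset is trusted, it can be subtracted from every record.
corrected = [m - mean_error for m in measured]
max_remaining_error = max(abs(c - t) for c, t in zip(corrected, true_elevation))

print(round(mean_error, 2))           # 2.0
print(round(max_remaining_error, 2))  # 0.1
```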
What is metadata and why is it important? (6)
-International Organization for Standardization (ISO)
-important because
Protects an institution's or organization's data investment
1. Helps user understand data
2. Enables discovery
3. Limits liability
4. Highlights prudent data stewardship
5. Reduces workload associated with questions about data
6. Reduces overall costs in the long term
What are some sources of conceptual error? (6)
Conceptual errors: errors stemming from our knowledge and understanding when trying to model reality.
- Ecological fallacy; the assumption that an individual from a specific group or area will exhibit a trait that is predominant in the group as a whole –> take the pattern of the whole group and apply it to every individual
- MAUP (modifiable areal unit problem); a challenge that occurs during the spatial analysis of aggregated data in which the results differ when the same analysis is applied to the same data, but different aggregation schemes are used
- Mental maps
- Individual perception of reality
- Spatial models used to reflect reality; vector and raster
- Coming from different backgrounds/disciplines; reductionist view (detailed, explains parts of a system ex. biology), OR holistic view (broader, tries to explain interrelationships at meso and macro scales)
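MAUP is easier to remember with numbers. In this sketch the same eight values along a transect are aggregated with two different zone boundaries; the values and both zoning schemes are invented for illustration.

```python
from statistics import mean

# Hypothetical values for eight neighbouring units along a transect.
values = [2, 8, 8, 2, 2, 8, 8, 2]

# Two aggregation schemes over the same units (lists of unit indices).
# Both use contiguous zones; only the boundaries differ.
scheme_a = [[0, 1], [2, 3], [4, 5], [6, 7]]
scheme_b = [[0], [1, 2], [3, 4], [5, 6], [7]]

def zone_means(data, scheme):
    """Mean value of each zone under a given aggregation scheme."""
    return [mean([data[i] for i in zone]) for zone in scheme]

means_a = zone_means(values, scheme_a)
means_b = zone_means(values, scheme_b)

print(means_a)  # [5, 5, 5, 5]
print(means_b)  # [2, 8, 2, 8, 2]
```

Scheme A suggests a uniform area, scheme B suggests strong contrasts between zones, and neither changed a single underlying value.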
What are errors in source data? (3)
1. Survey data
-Operation of equipment (GPS)
-Inputting the wrong attributes into the database
2. Remote sensing data or air photos
-Georeferenced incorrectly
-Misclassification (clouds or shadows in an image)
-Time sensitive
3. Maps
-Digitizing process
-Generalization
-Boundaries (fuzzy)