Computerization Flashcards

1
Q

What is data?

A

Values of qualitative or quantitative variables belonging to a set of items. Individual facts, statistics, or items of information

2
Q

How is data organized?

A

Most data are placed into some context that underlies their structure.
Most of these constructs are linear in nature (one-dimensional).

Data items are laid out in a list of some sort (more later) with a beginning that progresses linearly to an end.

Data items may be numbers or characters, dates, etc.

Data items may repeat in the list.

The actual sequence of the data may not be significant, or may be very significant (more later as well)

3
Q

What is a database?

A

A database is an organized collection of data, today typically in digital form. The data are organized to model relevant aspects of reality (for example, the availability of seats on planes), in a way that supports processes requiring this information (for example, finding a flight with window seats).

4
Q

What does a database typically do?

A

A database typically puts lists vertically into tables, in which each column is a list and each row is an object (it has an identity, such as a company in the example). Care must be taken that the lists all “line up” by row.

5
Q

What is spatial data? What does it require? What do we adapt to create spatial data? What do

A

The big jump into spatial data from lists is the addition of multiple dimensions. Spatial data (like maps transformed from reality) require at least two dimensions since “location” is an important (likely critical) aspect of meaning.

We adapt coordinate systems to create spatial data models.

6
Q

What does spatial data have to store?

A

An attribute of location, among other attributes. Maps are a good example of this. On maps, location is shown by position on the page; for “data” we need to measure coordinates.

7
Q

Why do we need spatial data?

A

Data stored in a one-dimensional list carry no location. If we know the location of each item, it can help us gather information, and it gives us an understanding that we cannot get from linear data.

8
Q

What are the three things spatial data require?

A

1) Require consideration of measurement types (as do list data)
2) Require appropriate modeling by appropriate geometric dimension
3) Require appropriate consideration of variation across space.

9
Q

What is all data?

A

Measurements.

10
Q

What is nominal data?

A

Nominal data have names only: only qualitative differences are recognized.

Mathematically: = <>

Examples: Voting by party, Soils, Land cover, Male/Female

11
Q

What is ordinal data?

A

Ordinal data are the lowest level of quantitative data in that they give implied rank through names.

Mathematically: = <> > <

Examples: high, medium, low; near, far; more, less; strongly agree, agree, neutral, disagree, strongly disagree

12
Q

What is interval data?

A

Interval data use numbers and amount of difference between numbers, but have no “0” that is meaningful. “0” is usually an arbitrary value used for convenience.

Mathematically: = <> > < + -

Examples:
Time
Temperature
Earth Grid systems

13
Q

What is ratio data?

A

Ratio data use numeric scales like interval, but in this case “0” means “none”.

Mathematically: = <> > < + - / *

Examples:
Counts of most anything
Age: Intervals from an interval scale (subtractions)
Distances, areas
Percentages
Densities
14
Q

Spatial data have______

A

Dimensionality. For mapping and databases, we use a geometric transformation from reality to a database representation. We recognize that we can use simple geometric analogs to reality in making a database of spatial data: Points, Lines, Areas, and Surfaces/Volumes.

15
Q

What are point objects?

A

0 dimension.

Locate features that either:

1) take up no space on Earth (true points): survey locations, graticule, crossings such as that at Four Corners, NM
2) take up a small space at the scale of the map (cartographic points): cities, buildings, utility poles

16
Q

What are line objects?

A

1 dimension

Represent features that are extremely long compared to their width. May be:

1) true lines that have no width on Earth: all survey boundaries: national, state, county, or property, contour lines, graticule lines
2) cartographic lines that have width on Earth, but we do not choose to maintain it on the map. This choice is a function of scale. Examples: roads, streams/rivers

17
Q

What are area objects?

A

Two dimensions

Used for features that have discernible length and width at the scale of the map. Features can be of several types based on variation and on measurement level.

18
Q

What are surfaces?

A

2.5 Dimensions

Most GIS applications do not use true 3D coordinate systems (X, Y, Z) for location; they use a map attribute for the third dimension, so we consider surfaces 2.5D. We place an attribute at each (X,Y) location.

19
Q

A calculation based on scale and dimension: At 1 inch to 1000 feet, 300 feet is______.

A

0.3 inches (300 feet ÷ 1000 feet per inch)
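
The scale arithmetic can be checked with a short sketch. The function name `map_distance_inches` is hypothetical and only illustrates dividing a ground distance by the number of ground feet represented by one map inch.

```python
def map_distance_inches(ground_feet: float, feet_per_inch: float) -> float:
    """Map distance for a scale of 1 inch = feet_per_inch feet on the ground."""
    return ground_feet / feet_per_inch


# At 1 inch to 1000 feet, a 300-foot feature measures 300 / 1000 inches.
print(map_distance_inches(300, 1000))
```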

20
Q

Surface models vary across _______

A

Space.

21
Q

Because surface models collect data “everywhere” to form the surface, there are some issues to consider when deciding to model a surface. What are these issues?

A

At what density do we want to provide information (usually called the resolution)?

How do the values vary from place to place? Are they continuous (spatially dependent on values at neighbors)? Are they discrete (spatially less dependent on values at neighbors)?

22
Q

What are discrete surfaces?

A

Discrete surfaces are not predictable by space. Data must be collected everywhere, but there are a finite number of locations that have data (e.g. SALES TAX RATES BY STATE).

Observed by noting that neighboring areas do not necessarily have similar values

Slide 28

23
Q

What are continuous surfaces?

A

Continuous surfaces are somewhat predictable by space, data may be collected anywhere, and there are an infinite number of locations that have unique data - (e.g. TEMPERATURES)

Observed by noting that neighboring locations have similar values.

Slide 29

24
Q

What is spatial autocorrelation?

A

If a particular variable like elevation is spatially autocorrelated, the values nearer a measured location are closer in value to that at the measured location and those farther away are likely farther in value from that at the measured location.

25
Q

A key purpose of GIS database design is to use the GIS to retrieve data efficiently. We can use GIS to do ________ based on fields in a table, but also to do retrieval based on spatial geometry________. That is a large part of the power of GIS.

A

standard attribute queries

spatial query

26
Q

What are the two approaches to streamlining a GIS database?

A

Simple lists are suited for data that have to be used in a particular sequence or they lose their meaning (book text, movie frames, lines in vector GIS – later), or for small databases.

Ordered lists are used to speed up the search process when the initial data sequence is not important and the data order can be changed.

27
Q

What are simple lists?

A

Simple lists have only one sequence inherent in them: top to bottom. Items in a list may not have an underlying internal order, and may not need one.

A shopping list is a good example. What is the order on a shopping list?

28
Q

For a simple list, what is the formula?

A

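int(log(n)/log(2))+1

Note that this is the maximum number of comparisons needed to find an item in an ordered (sorted) list of n items by repeatedly halving it. A simple (unordered) list has to be searched in sequence: up to n comparisons, about n/2 on average.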

Slide 43
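
A minimal sketch of how the formula int(log(n)/log(2))+1 relates to search cost, assuming it counts the worst-case probes of a halving (binary) search on an ordered list. The function names are hypothetical; `n.bit_length()` is used as an exact integer equivalent of the formula for n >= 1.

```python
def sequential_search(items, target):
    """Scan a simple (unordered) list; return (position or None, comparisons made)."""
    for count, value in enumerate(items, start=1):
        if value == target:
            return count - 1, count
    return None, len(items)


def binary_search(sorted_items, target):
    """Halve an ordered list repeatedly; return (position or None, probes made)."""
    low, high, probes = 0, len(sorted_items) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return None, probes


def max_probes(n):
    """Integer form of int(log(n)/log(2)) + 1 for n >= 1."""
    return n.bit_length()


data = list(range(1000))
worst_sequential = sequential_search(data, 999)[1]
worst_binary = max(binary_search(data, t)[1] for t in data)
print(worst_sequential, worst_binary, max_probes(len(data)))
```

The sequential scan may have to touch every item, while the halving search never needs more probes than the formula allows.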

29
Q

What is the problem with searching simple lists?

A

Most databases have many fields (remember field = list), and the records/tuples can only be sorted physically on one of them. You can therefore search the database efficiently only on that field; the rest behave as simple lists.

30
Q

What is the solution to the problem associated with searching simple lists?

A

Keep a copy of the data for every field. A five-field database would require five copies; then use the best one each time. This wastes a great deal of space.

Find a way to sort the database without copying all the data.

31
Q

What is indexing?

A

It is when we invert the files. In indexing we number the data records from 1-n, then we sort the record numbers not the data themselves. Index files for each variable give the logical record sequence to follow for sorting on that variable. Slide 46
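
The idea of sorting record numbers rather than the data can be sketched as follows. The records, field names, and `build_index` helper are made up for illustration and are not from the slides.

```python
# Records stay in their original physical order; they are numbered 1..n.
records = [
    {"name": "Delta", "population": 1200},
    {"name": "Alpha", "population": 5400},
    {"name": "Charlie", "population": 300},
    {"name": "Bravo", "population": 2500},
]


def build_index(records, field):
    """Return record numbers (1..n) in the order that sorts the given field."""
    return sorted(
        range(1, len(records) + 1),
        key=lambda rec_no: records[rec_no - 1][field],
    )


name_index = build_index(records, "name")
population_index = build_index(records, "population")

# Following an index visits the records in sorted order without moving them.
for rec_no in population_index:
    print(rec_no, records[rec_no - 1])
```

Each index is only a list of integers, so keeping one per field costs far less space than keeping a sorted copy of the whole table per field.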

32
Q

Why are spatial queries important?

A

It makes it more efficient to search spatial data.

Spatial queries must be computed from the data in the database, and its structure is very important.

Sorting in two or three dimensions is less efficient than in one, but there are efficiencies to be had from that (we’ll leave that to the software people for now).

Our two basic models for structuring spatial data allow efficiency in different ways. Efficiency is also measured in more ways with spatial data than with list data.

33
Q

What must spatial database models support?

A

2D location attributes plus associated “character” attributes. Most support measurement levels of variable types (nominal and ranked). They must support both continuous and discrete variables.

34
Q

What are the two sampling approaches for simplifying spatial database models?

A

Location Generalization – reduce the infinite number of locations in space (2.5D surfaces) to a fixed and reasonable number

Feature Generalization – reduce the infinite number of locations making up features in space (0, 1, or 2D features) and reduce their complexity

35
Q

How do we do location generalization?

A

As in the Map Generalization process, we first select the pertinent features to collect and decide on a classification.

Since there are an infinite number of points on a surface, we cannot store data at every point. We therefore tessellate (grid) the area in question and store a single value (for a single feature) in each grid cell – forms a systematic sample of space at a particular resolution. Database size is directly related to the desired resolution.

Slides 51,52, 53, 55 and 56

36
Q

If the resolution of the raster surface increases from 100 meters to 10 meters, the number of cells goes up by _____. Why? What does this do to the image and the database?

A
  1. 100x
  2. We increase the rows and columns by 10x each.
  3. We get a finer resolution, and the database size grows with the square of the change in resolution (10 x 10 = 100 times as many cells).
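
A quick check of the 100x figure, assuming a hypothetical square study area 10 km on a side (the extent is an illustration only; the ratio does not depend on it):

```python
def cell_count(width_m: int, height_m: int, cell_size_m: int) -> int:
    """Number of grid cells needed to cover a rectangular extent."""
    rows = height_m // cell_size_m
    cols = width_m // cell_size_m
    return rows * cols


coarse = cell_count(10_000, 10_000, 100)  # 100 rows x 100 columns
fine = cell_count(10_000, 10_000, 10)     # 1000 rows x 1000 columns
print(coarse, fine, fine // coarse)
```

Rows and columns each grow tenfold, so the cell count grows by 10 x 10.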
37
Q

Is raster data efficient? Why or why not?

A
Raster data are efficient in that they need to store the location attribute (X,Y) of the grid only once, plus a Z at every location.
All other (X,Y) location coordinates are computable because:

1) The grid is aligned with the coordinate system axes
2) The cells are all the same size and shape

38
Q

A raster surface dataset has an upper left corner at (650,000, 4,500,000) and a cell size of 100 meters. The cell at row 50, column 40 would have an easting coordinate of:_______.

A

650,000 + 40 * 100 = 654,000
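
The same computation as a small sketch. It follows the card's convention of adding column x cell size to the origin easting; the northing line is an added assumption (rows counted downward from the upper left corner, so northing decreases). The function name `cell_corner` is hypothetical.

```python
def cell_corner(origin_x, origin_y, cell_size, row, col):
    """Corner coordinate of a cell, counting columns right and rows down from the origin."""
    easting = origin_x + col * cell_size
    northing = origin_y - row * cell_size  # assumption: north-up grid, rows run southward
    return easting, northing


easting, northing = cell_corner(650_000, 4_500_000, 100, row=50, col=40)
print(easting, northing)
```

Only the origin and the cell size are stored; every other cell location falls out of this arithmetic.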

39
Q

What is vector coding?

A

Feature generalization.

You don’t limit the locations, but record a finite set of desired features at a finite scale wherever they occur.

This approach still requires simplification of the features themselves as they contain an infinite number of locations (unless they are points), but the features do not occur everywhere.

Examples: slide 64

40
Q

What two models dominate vector data models?

A

1) Spaghetti model

2) Topologic model

41
Q

What are Spaghetti models?

A

Lines are lists of points with no relationships with other lines. Slide 66.

42
Q

What are Topologic models?

A

Lines are streams that end and begin at intersections. Lines are coded between endpoints (nodes). Linear Topology allows networks which describe connectivity and allow routes and paths to be computed (Google map directions.). Slide 71
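
A minimal sketch of why coding lines between nodes makes routes computable. The node names and line list are hypothetical, and breadth-first search is used here only as one simple way to find a path with the fewest line segments; it is not presented as what any particular GIS package does.

```python
from collections import defaultdict, deque

# Each line is coded between its two end nodes (hypothetical network).
lines = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("E", "D")]


def build_network(lines):
    """Connectivity table: for each node, the set of nodes reachable by one line."""
    network = defaultdict(set)
    for start, end in lines:
        network[start].add(end)
        network[end].add(start)
    return network


def shortest_route(network, origin, destination):
    """Breadth-first search for a route using the fewest lines, or None if unconnected."""
    queue = deque([[origin]])
    visited = {origin}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            return path
        for neighbor in sorted(network[node]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


network = build_network(lines)
route = shortest_route(network, "A", "D")
print(route)
```

In a spaghetti model the lines would be unrelated coordinate lists, so there would be no shared nodes from which to build this connectivity table.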

43
Q

What are shapefiles?

A

Basic feature classes. In ArcMap we use shapefiles (not in a Geodatabase) or Feature Classes (inside Geodatabases). Both of these are spaghetti file formats. At the time of creation, someone decided where the lines would begin and end, and logically that might be at connection points.

44
Q

When do raster data models dominate?

A

In cases in which data are continuous in nature and do not show distinct edges or boundaries.

45
Q

When do vector models dominate?

A

In cases in which the data are discrete so features are clearly defined with edges and boundaries

46
Q

What do raster data models require?

A

Raster models require only one (X,Y) coordinate, but a lot of Z codes. If used for continuous data in which all values are (or can be) different, this is fine. If used in discrete cases, these codes can be very redundant.

47
Q

What do vector models require?

A

Vector models require a lot of (X,Y) coordinates to match the shapes of their Earth features, but only require a single Z code for each feature. Very efficient for discrete features. Difficult to code continuous surfaces as (X,Y, and Z) codes are needed at many points to capture the surface shape.

48
Q

How long is computation time for raster data?

A

Raster data tend to be more efficient to process in most operations as their geometry is extremely simple (cells in rows and columns). But raster databases are often larger than vector equivalents.

49
Q

How long is computation time for vector data?

A

Vector data are best for networks and linear features. Computations are often more complex as they have no regular geometry, precision is greater, and, if the data are also accurate, spatial accuracy is better.

50
Q
Which of the following should be modeled as raster surfaces: 
elevation
Temperature
Counties
Streams
Picnic tables
Campsites
Travel time from Blacksburg to places in Virginia
A

1) elevation
2) Temperature
3) Travel time from Blacksburg to places in Virginia

51
Q

Should I use raster or vector data?

A

The best overall recommendation is to use both types to their strengths. Raster for continuous surfaces and vector for discrete phenomena.

52
Q

How does the GIS database store maps? How are vector and raster data stored in it?

A

Conceptually, GIS databases store maps in layers (also called themes, coverages, or elements) which are registered together by Geography (Coordinate systems)

Raster or Vector models are used to store each layer’s spatial data as appropriate.

Linear attribute data vary by implementation

53
Q

What is cardinality?

A

Data relationships. When we collect data for GIS we have to collect both spatial and attribute data.

54
Q

Geographic features can have different types of relationships with their attributes, and with each other: What are the two types?

A

1:1 – a one to one relationship means that a single feature can have only a single attribute of that type
(Examples: city at 37° 14’ N, 80° 25’ W has one name – Blacksburg
city at 37° 14’ N, 80° 25’ W has one population – 45,354)

1:many – a single feature could have multiple attributes of the same type
(Examples: Road in Blacksburg has more than one name (S. Main Street, U.S. 460 Business)
Land Parcel at 1115 S. Main Street has three owners
Land Parcel at 1115 S. Main Street has 6 addresses (apartments))

55
Q

Geographic features can have different types of relationships with their attributes or each other.

A

many:1 – in this case many features share a single attribute
(Example: a land owner owns many land parcels in Blacksburg, or a parcel has many owners)

many:many – in this case anything goes:
(Example: A land parcel has several buildings on it, and the buildings run across the boundary and appear in many land parcels.)

56
Q

How does structure relate to data?

A

Choices of structures are based on the relationships that the data hold to one another. Both vectors and rasters can be used with tables to hold additional attributes.

57
Q

What are generic flat files?

A

Simplest type of database – basically a text table delimited with characters that do not occur in the data: tabs, commas or spaces are common though the delimiters can vary

Except for the delimiters, no overhead memory space is used

Classic “list of lists” – rows are “objects”, columns are lists

Good for transfer between systems, editable in Excel and readable by Arc as a text table

BUT: Very slow to search, provides no flexibility and so is not useful in GIS

58
Q

What are hierarchical structures?

A

Data are arranged in a parent-child relationship, each level a breakdown of the one above it. Slide 88/89

59
Q

Which of the following could be handled using a hierarchical design?

1 You and your Facebook “friends” (1 to many, but how about their friends who are also your friends?)
2 Your family tree - genealogy
3 The paths on the drill field (do they connect in some hierarchical manner)
4 Airline flights from Dulles airport (do they connect in some hierarchical manner)
5 Census Geography

A

Your family tree - genealogy

Census Geography

60
Q

What do network structures allow?

A

Network structures allow data to be tied together more freely than hierarchical structures. Like a road network, you can add a new path when there is a consistent need for one.

Allows 1 to 1, 1 to many, many to 1, and many to many relationships, an improvement in flexibility over hierarchical structures.
Does not require that we establish “levels”, so data on each end do not have to be parts of the same wholes.

61
Q

What is the problem with network structures?

A

Problem is that the links still have to be planned and the data search speed is not adequate for unexpected cases that don’t have paths. Still not flexible enough.

62
Q

What are relational structures?

A

Simple yet very powerful systems that offer ultimate flexibility (all relationships can be supported) and have a strong mathematical basis.

Relations are tables of rows and columns that hold data. You have seen these as they are used in ArcMap for the attributes and, today, in ArcMap, actually hold the spatial data as well if you are using a GeoDatabase.

They look a lot like flat file tables, but are far more powerful because they can change form at the request of the user.

63
Q

What is a relation?

A

A table in a relational structure design

64
Q

What is a tuple?

A

A row in a table. In GIS a tuple is a feature modeled on an entity. Each row represents one feature.

65
Q

What is a field?

A

A column in a table that contains a list of information of a particular type for all rows. Types originally were aimed at business applications: integers, floating-point numbers (decimals), text, dates.

In GIS, a field is a single attribute of a set of features.

66
Q

What is a key?

A

A field that uniquely identifies each tuple. Often used to join relations together, but does not have to have that purpose

67
Q

What is a schema?

A

The layout of the relation, order and type of fields: text, integer, double, etc

68
Q

What is a join?

A

A logical connection of relations based on matching up the values in a key field. These can get rather complex as data are separated out into multiple tables, but that is the reason for the flexibility. Modern relational database interfaces make this rather easy today.
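
A join can be sketched with plain dictionaries standing in for relations. The table contents, field names, and the `join` helper are invented for illustration; a real relational database would do this with its own query language.

```python
# Two narrow relations that share the key field "county_id" (made-up values).
counties = [
    {"county_id": 1, "name": "County A"},
    {"county_id": 2, "name": "County B"},
    {"county_id": 3, "name": "County C"},
]
population = [
    {"county_id": 1, "population": 1000},
    {"county_id": 2, "population": 2500},
]


def join(left, right, key):
    """Match rows whose key values are equal and merge their fields."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


joined = join(counties, population, "county_id")
for row in joined:
    print(row)
```

Rows without a matching key value (County C here) simply do not appear in the joined result, and the two source tables are left unchanged.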

69
Q

A key to performance is that the database have a smart design. What are three ways a database can have a smart design?

A

Relations with fewer fields are better than those with many. “Narrow” tables are preferred. Length is not important, especially if indexed or sorted on the field to be searched.

Relations are designed by “topic”. If you have a map in GIS that has counties, and you want to store a large number of attributes for each county, split them into narrow relations along logical lines – they can be joined later as needed with careful design: Population data, Environmental data, Economic data

Each split requires a key added to the new table to allow for joins, so make the key as small and simple as possible

70
Q

In a relational structure, how would I save space in the following data set:

“I currently have a single table of soil polygons for Montgomery County, and it contains 25,000 polygons. Each polygon contains Arc Id information and geometry plus a soil type that has 27 properties, some stored in long text string fields, and many in numeric values.

As I inspect the data, I note that there are 14 different soil types and the attributes are always the same for each type.”

A

The best strategy is to look at the many to one relationship between the polygons and the soil properties.

There are many soil polygons, but only 14 different soil types. Each soil type matches only 1 soil property set.

The original table stores the 27 attributes 25,000 times so it is wasting a lot of space.

Split the tables as in C and make each table one field wider with a small key. Use an integer key 1-14 as it takes up very little space.

Then we need only to put the attributes in a table of 14 rows. That saves 24,986 copies (25,000 − 14) that we did not need.

Slide 100
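
The saving can be tallied with a rough count of stored attribute values. This is a sketch that ignores field widths and the geometry and Arc Id fields, which are the same in both designs; the variable names are hypothetical.

```python
POLYGONS = 25_000
SOIL_TYPES = 14
PROPERTIES = 27

# Single wide table: every polygon row repeats all 27 soil properties.
single_table_values = POLYGONS * PROPERTIES

# Split design: each polygon row carries one small integer key,
# and a 14-row lookup table holds the key plus the 27 properties.
split_values = POLYGONS * 1 + SOIL_TYPES * (PROPERTIES + 1)

# Copies of the property set that are no longer stored.
copies_saved = POLYGONS - SOIL_TYPES

print(single_table_values, split_values, copies_saved)
```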

71
Q

What wins in GIS (what's the best to use)?

A

Relational structures.

Flexibility is best of the major options

Speed of data retrieval is improved by having lots of small relations (less data to read through – divide and conquer) and appropriate joins in place to build only as large a database as needed at any time (un-joined tables are not involved).