Statistics and Census Flashcards by Sarah Walwema

Research Assessment Steps

Define problem
Specify the boundaries to the problem
Develop a fact base
List goals and objectives
Identify the range of solutions
Define potential costs and benefits
Review the problem statement

“the process of studying a procedure or business to identify its goal and
purposes and create procedures that will efficiently achieve them”.

a problem-solving technique that breaks down a system into its component
pieces and examines how well those parts work and interact to accomplish their purpose

Systems Analysis

used to compare the means of two sets of observed data and to determine to what extent the difference between them could have arisen by chance

only 2 sets of data can be compared

T-test
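
A minimal sketch of the comparison described on this card, assuming Welch's version of the t statistic (which does not require equal variances). The function name and the commute-time figures are hypothetical and only illustrate the arithmetic.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for two independent samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

commute_a = [22, 25, 27, 30, 31]  # hypothetical commute times, area A
commute_b = [28, 33, 35, 36, 38]  # hypothetical commute times, area B
t_stat = welch_t(commute_a, commute_b)
print(round(t_stat, 3))
```

The further the statistic is from zero, the less plausible it is that the difference in means is due to chance alone; a p-value would come from comparing it to the t-distribution.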

The main difference is that a t-test is used for small sample sizes (n < 30) or when the population variance is unknown, and it uses the t-distribution. A Z-test is used for large sample sizes (n ≥ 30) with known population variance and relies on the normal distribution.

Z-test

Sampling error vs. sampling bias?

Sampling bias: the sample is collected in such a way that some members of the intended population have a lower or higher probability of being selected than others, which results in a biased sample.
Sampling error: the difference between a sample statistic and the corresponding population parameter.

Each individual has an equal chance of being
selected for the sample.

Simple random

every Xth individual is selected from the list, starting at a randomly chosen point

Systematic sampling
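
A short sketch of the "every Xth individual" rule, assuming the list is already in a fixed order. The function name, the `rng` parameter, and the household list are hypothetical.

```python
import random

def systematic_sample(items, step, rng):
    """Pick every `step`-th item, starting at a random offset below `step`."""
    start = rng.randrange(step)  # random starting point in the first interval
    return items[start::step]

households = list(range(1, 101))  # hypothetical list of 100 household IDs
sample = systematic_sample(households, step=10, rng=random.Random(0))
print(len(sample))
```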

the population has 2 or more groups (strata), and a random sample is drawn from each group; provides the best results because it ensures even coverage of the population while maintaining the random selection probabilities
* can be disproportional when sampling is not proportional to each group's share of the population

Stratified sampling
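
A sketch of proportional stratified sampling, assuming the same sampling fraction is applied to every stratum. The function name, the strata, and the 10% fraction are hypothetical.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from each stratum."""
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # size proportional to the stratum
        sample[name] = rng.sample(members, k)
    return sample

# hypothetical population: 600 renters and 400 owners
strata = {"renters": list(range(600)), "owners": list(range(600, 1000))}
picked = stratified_sample(strata, 0.10, random.Random(42))
print({name: len(members) for name, members in picked.items()})
```

Using a different fraction for each stratum would give the disproportional variant mentioned on the card.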

population is divided into smaller geographic units such as neighborhoods within a city or blocks within a district; a random selection of those units is drawn, and all individuals within the selected units are sampled

Cluster sampling
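
A sketch of one-stage cluster sampling, assuming whole clusters are drawn at random and everyone inside a drawn cluster is surveyed. The function name and the block data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose clusters, then keep every individual in each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# hypothetical census blocks mapped to their residents
blocks = {
    "block_1": ["a", "b", "c"],
    "block_2": ["d", "e"],
    "block_3": ["f", "g", "h", "i"],
    "block_4": ["j"],
}
surveyed = cluster_sample(blocks, n_clusters=2, rng=random.Random(1))
print(sorted(surveyed))
```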

assignment of numbers or symbols for the purpose of designating subclasses that represent unique characteristics:
1. renaming (social security/uniform numbers)
2. categorical (male, female)

Weakest level of measurement

Nominal scale

type of statistical distribution where the data points are clustered more toward the lower side of the scale, and there are very few higher scores, resulting in a longer tail extending towards the right side of the distribution

Skewed right

negatively skewed, shows a distribution where the majority of data points are clustered on the right side, and the tail of the distribution extends towards the left. This means that the smaller values (the “left tail”) are less frequent than the larger values.

Skewed left

a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value.

(in squared units)

Variance

squaring emphasizes the deviation from the mean, making the measure more sensitive to large deviations. This can be helpful for certain statistical inferences

the square root of the variance; roughly the typical distance of values from the mean, expressed in the original units

Standard deviation
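
A small worked example of variance and standard deviation, assuming the data are a whole population (so the sum of squared deviations is divided by n). The household-size values are hypothetical.

```python
import statistics

household_sizes = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean is 5

# population variance: mean of squared deviations, in squared units
variance = statistics.pvariance(household_sizes)
# population standard deviation: square root of the variance, in original units
std_dev = statistics.pstdev(household_sizes)

print(variance, std_dev)
```

For a sample rather than a full population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.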

Linear regression vs. t test?

t-test: whether the difference between the means of 2 datasets is statistically significant
linear regression: the extent to which independent variables influence a dependent variable

Hierarchy of Census Data

method of estimating future population size by analyzing data points that indirectly reflect population changes, such as school enrollments, voter registrations, utility connections, or housing permits, rather than relying solely on direct population counts; this data is then used to project future population trends and inform planning decisions based on these indicators.

Symptomatic projections

you have population projections for the county/region but you want projections for your local community; you know that, historically, your municipality's population has been 10% of the county population, so you "step down" the county numbers by that share

Stepdown/Ratio method:
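
A sketch of the step-down arithmetic from this card, assuming the historical local share (10% here) stays constant. The function name and the county figures are hypothetical.

```python
def step_down(parent_projection, local_share):
    """Apply a constant local share to each projected parent-area population."""
    return {year: round(pop * local_share)
            for year, pop in parent_projection.items()}

county = {2030: 250_000, 2040: 280_000}  # hypothetical county projections
town = step_down(county, local_share=0.10)
print(town)
```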

Uses births, deaths and net migration to estimate population projections

Most complex and used for census and pyramids
Net migration is the most difficult to predict

Cohort-Component Method:
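
The accounting behind this method can be sketched for a single period as follows. This is a simplification that assumes one aggregate population rather than separate age-sex cohorts; the function name and figures are hypothetical.

```python
def project_population(start_pop, births, deaths, net_migration):
    """End-of-period population = start + natural increase + net migration."""
    natural_increase = births - deaths
    return start_pop + natural_increase + net_migration

# hypothetical community: 10,000 people, 600 births, 400 deaths, 50 net out-migrants
end_pop = project_population(10_000, births=600, deaths=400, net_migration=-50)
print(end_pop)
```

In a full cohort-component model the same bookkeeping is repeated for each age-sex cohort, with cohort-specific birth, death, and migration rates.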

ratio that measures the number of people who are not in the labor force compared to the number of people who are. It's used to measure the strain on the working population.
A good ratio is low, meaning there are enough working-age people to support the dependent population

Dependency rate:
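
A sketch of the ratio, assuming it is expressed as dependents per 100 working people (a common convention, not stated on the card). The function name and counts are hypothetical.

```python
def dependency_ratio(dependents, working):
    """Dependents per 100 people in the working population."""
    return dependents / working * 100

# hypothetical town: 4,000 dependents and 8,000 workers
ratio = dependency_ratio(dependents=4_000, working=8_000)
print(ratio)
```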

2010 Census: Fastest growing state, state with largest overall population increase, fastest growing muni

Palm Coast, FL was the fastest growing between 2000 and 2010, growing by 92%

Nevada is fastest growing state AND Texas had largest overall population increase

2020 Census: Fastest growing state, state with largest overall population increase, fastest growing muni

The Villages in FL grew the fastest

Utah is fastest growing state, TX has largest numeric increase

Sizes of census geographies (tracts, block groups, blocks and public use microdata area)

Metropolitan Statistical Area: one city with a population of 50,000 or more
Consolidated MSA (CMSA): metropolitan statistical area with a population of 1 million or more (18 CMSAs)
Urbanized area/urban cluster: densely settled areas with a population of 50,000; core block groups/blocks with at least 1,000 people per square mile delineate the urban core
Census tracts: generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people
Block group (smallest for Summary File 3 and 4): 300 to 3,000 people
Census blocks: the smallest unit with 100% tabulation data; average size is 100 people

Assumes growth will occur at a constant rate
Normally accurate for short projection periods; often overestimates population in the long term

graphed as concave ascending curve

Exponential Curve
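
A sketch of a constant-rate (exponential) projection, assuming annual compounding. The function name, the 2% rate, and the base population are hypothetical.

```python
def exponential_projection(base_pop, annual_rate, years):
    """Project population assuming the same percentage growth every year."""
    return base_pop * (1 + annual_rate) ** years

# hypothetical town of 10,000 growing 2% per year
in_10_years = exponential_projection(10_000, 0.02, 10)
in_50_years = exponential_projection(10_000, 0.02, 50)
print(round(in_10_years), round(in_50_years))
```

Because the increments themselves keep growing, long horizons produce very large figures, which is why this curve tends to overestimate in the long term.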

an upper limit of population is defined; assumes a declining rate or percentage of growth as the upper capacity is approached; the further out the projection, the closer to buildout; graphed as a convex, ascending curve

Modified exponential curve

assumes growth begins slowly, increases momentum until it reaches the inflection point, then slows to increments of continuously decreasing rates

Gompertz Growth Curve

Assumes growth is predictable based upon past trends in a different area

Comparative Method

assumes relationship between local and larger area will remain constant

Ratio Method (Shift Share)

population change is a function of natural increase (births - deaths) and net migration
Population cohorts are typically 5-year ranges
Visualized in a population pyramid model

Cohort-Component Method

Footnote: Cohort survival, on the other hand, is a specific part of the cohort-component method, focusing on how a cohort (a group of people born in the same year) survives over time due to mortality.

Constant vs. Shift Share

Constant-Share: Assumes that the local economy will grow at the same rate as the larger (reference) economy
Shift-Share: Recognizes that the local economy rarely maintains a constant share of the reference economy
* Modifies the "constant-share" method by adding a "shift" term in the formula that accounts for the differences between the local and reference areas' growth rates
* The "shift" term is equal to the difference between an industry's local growth rate and that of the reference economy for a specified period
* Assumes that the difference between the local and reference economy growth rates will remain constant over time
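
A sketch contrasting the two projections, assuming one common textbook formulation in which the shift term is added to the reference area's growth ratio. All function names and employment figures are hypothetical.

```python
def shift_term(local_start, local_end, ref_start, ref_end):
    """Local growth rate minus reference growth rate over a past period."""
    local_rate = (local_end - local_start) / local_start
    ref_rate = (ref_end - ref_start) / ref_start
    return local_rate - ref_rate

def constant_share(local_base, ref_base, ref_future):
    """Local area grows at the same rate as the reference area."""
    return local_base * (ref_future / ref_base)

def shift_share(local_base, ref_base, ref_future, shift):
    """Constant-share projection adjusted by a constant shift term."""
    return local_base * (ref_future / ref_base + shift)

# hypothetical industry: local jobs 1,000 -> 1,200, reference jobs 50,000 -> 55,000
shift = shift_term(1_000, 1_200, 50_000, 55_000)
# reference economy is projected to grow from 55,000 to 60,500 jobs
constant = constant_share(1_200, 55_000, 60_500)
shifted = shift_share(1_200, 55_000, 60_500, shift)
print(round(shift, 2), round(constant), round(shifted))
```

Here the local industry has been outgrowing the reference economy, so the shift-share projection comes out higher than the constant-share one.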

Selected from a population that is divided into multiple groups or classes; provides the best results because it ensures even coverage of the population while maintaining the random selection probability

Stratified Sample | Disproportional: when sampling is stratified but not proportional to the

Cohort-Component Method
* Estimates are calculated for current population levels
* Projections are calculated for future population levels
* Forecasts are subjective and apply only to selected projections
* Migration is the movement of people into and out of a given study area
* Birth Rate is the total number of babies born per 1000 females in their childbearing years (typically 15-40)
* Death Rate is the total number of deaths per 1000 people in the total population

A population of at least 100,000 people and includes at least one city that has a population of at least 50,000 people or an urbanized area

Metropolitan Statistical Area

assumes that the population grows by equal absolute increments per year, decade, or other unit of time. It also assumes that growth will follow a similar pattern in future years. When best to use: use when the pattern of growth is similar to a straight line; it is especially useful when projecting areas experiencing slow growth (like rural areas) **or decline**

Linear projection
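
A sketch of a linear projection, assuming the average absolute change between two past counts continues unchanged. The function name and the population figures are hypothetical.

```python
def linear_projection(pop_start, pop_end, years_observed, years_ahead):
    """Extend the average absolute yearly change into the future."""
    annual_change = (pop_end - pop_start) / years_observed
    return pop_end + annual_change * years_ahead

# hypothetical rural town: 5,000 people in 2010 and 5,400 in 2020
pop_2030 = linear_projection(5_000, 5_400, years_observed=10, years_ahead=10)
print(pop_2030)
```

A negative annual change works the same way, which is why the method also suits areas in decline.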

used to collect more detailed information from approximately one in six households. In addition to all of the 100-percent data, the ----- questionnaire for Census 2000 collected sample data on the social and economic characteristics of the population and the physical and financial characteristics of housing.

Long form census

dissimilarity index

demographic measure of the evenness with which two groups are distributed across component geographic areas that make up a larger area. A group is evenly distributed when each geographic unit has the same percentage of group members as the total population. The index score can also be interpreted as the percentage of one of the two groups included in the calculation that would have to move to different geographic areas in order to produce a distribution that matches that of the larger area. **The index of dissimilarity can be used as a measure of segregation.** A score of zero (0%) reflects a fully integrated environment; a score of 1 (100%) reflects full segregation.
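
A sketch of the index, assuming the usual formula D = 0.5 × Σ |a_i/A − b_i/B|, where a_i and b_i are the two groups' counts in area i and A and B are their totals. The function name and tract counts are hypothetical.

```python
def dissimilarity_index(group_a, group_b):
    """Half the sum of absolute differences in each area's share of the two groups."""
    total_a, total_b = sum(group_a), sum(group_b)
    return 0.5 * sum(abs(a / total_a - b / total_b)
                     for a, b in zip(group_a, group_b))

# hypothetical counts of two groups across two tracts
segregated = dissimilarity_index([100, 0], [0, 100])
integrated = dissimilarity_index([50, 50], [20, 20])
partial = dissimilarity_index([80, 20], [20, 80])
print(segregated, integrated, round(partial, 2))
```

The partial case reads as: 60% of one group would have to move to a different tract to match the other group's distribution.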

A properly selected statistical sample of a large population will always

Provide a mathematical estimate of the accuracy of the calculated population characteristics.

Descriptive vs. inferential

descriptive statistics state facts and observed outcomes for a population, whereas inferential statistics analyze samples to make predictions about larger populations

computer-accessible files containing records for a sample of housing units, with information on the characteristics of each housing unit and the people in it. The geographic units used for these files, Public Use Microdata Areas (PUMAs), are based primarily upon counties and may be whole counties, groups of counties, or places (at least 100,000 people)

Public Use Microdata Samples (PUMS)

Statistics and Census Flashcards

(39 cards)