Statistics and Census Flashcards
(39 cards)
Research Assessment Steps
- Define problem
- Specify the boundaries to the problem
- Develop a fact base
- List goals and objectives
- Identify the range of solutions
- Define potential costs and benefits
- Review the problem statement
“the process of studying a procedure or business to identify its goal and
purposes and create procedures that will efficiently achieve them”.
a problem-solving technique that breaks down a system into its component
pieces to study how well those parts work and interact to accomplish their purpose
Systems Analysis
used to compare the means of two different sets of observed data and to find to what extent such difference is ‘by chance’
only 2 sets of data can be used
T-test
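A minimal sketch of the comparison this card describes, using only the Python standard library. It uses the unequal-variance (Welch) form of the t statistic, which is an assumption since the card does not name a variant. The commute-time figures are made up for illustration.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return the t statistic for the difference between two sample means."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical commute times (minutes) from two neighborhoods
group_a = [22, 25, 27, 30, 31]
group_b = [18, 20, 21, 24, 22]
t_stat = welch_t(group_a, group_b)
```

The larger the absolute value of t_stat, the less likely the difference in means is due to chance. Turning it into a p-value requires the t-distribution, which this sketch leaves out.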
The main difference is that a t-test is used for small sample sizes (n < 30) or when the population variance is unknown, and it uses the t-distribution. A Z-test is used for large sample sizes (n > 30) with known population variance and relies on the normal distribution.
Z-test
Sampling error vs. sampling bias?
- sampling bias: the sample is collected in such a way that some members of the intended population have a lower or higher probability of being selected than others, which results in a biased sample
- sampling error: the difference between the sample statistic and the population parameter
Each individual has an equal chance of being
selected for the sample.
Simple random
every Xth individual is selected from the list, starting at a randomly chosen point
Systematic sampling
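A short sketch of "every Xth individual from a random start", assuming the population is available as an ordered list. The function name and the roster are hypothetical.

```python
import random

def systematic_sample(items, step, rng=random):
    """Pick every `step`-th item, starting at a random position in the first interval."""
    start = rng.randrange(step)  # random start index in [0, step)
    return items[start::step]

# Hypothetical roster of 100 numbered residents
roster = list(range(1, 101))
sample = systematic_sample(roster, step=10, rng=random.Random(0))
```

With 100 residents and a step of 10, the sample always has 10 members spaced 10 apart. Only the starting point is random.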
the population has 2 or more groups (strata) in the study, and a random sample is drawn from each; provides the best results because it ensures even coverage of the population while maintaining random selection probabilities
* can be disproportional when the sample drawn from each group is not proportional to that group's share of the population
Stratified sampling
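A minimal sketch of the proportional version, assuming each group is given as a list of members. The group names, sizes, and the 10% sampling fraction are illustrative only.

```python
import random

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # sample size for this stratum
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 600 renters and 400 owners
strata = {"renters": list(range(600)), "owners": list(range(400))}
picked = stratified_sample(strata, fraction=0.1)
```

Using a different fraction per stratum would make the sample disproportional.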
population is divided into smaller geographic units such as neighborhoods within a city or blocks within a district; the sample consists of a random selection of those units, and all individuals within the selected units are sampled
Cluster sampling
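A small sketch of the idea, assuming the clusters are stored as a dictionary that maps a block name to its residents. All names here are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly choose whole clusters, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical blocks and their residents
blocks = {
    "block_a": ["a1", "a2", "a3"],
    "block_b": ["b1", "b2"],
    "block_c": ["c1", "c2", "c3", "c4"],
    "block_d": ["d1"],
}
surveyed = cluster_sample(blocks, n_clusters=2)
```

The randomness applies to which blocks are picked, not to individuals inside a block.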
assignment of numbers or symbols for the purpose of designating subclasses that represent unique characteristics:
1. renaming (social security/uniform numbers)
2. categorical (male, female)
Weakest level of measurement
Nominal scale
type of statistical distribution where the data points are clustered more toward the lower side of the scale, and there are very few higher scores, resulting in a longer tail extending towards the right side of the distribution
Skewed right
negatively skewed, shows a distribution where the majority of data points are clustered on the right side, and the tail of the distribution extends towards the left. This means that the smaller values (the “left tail”) are less frequent than the larger values.
Skewed left
a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value.
(in squared units)
Variance
squaring emphasizes deviations from the mean, making the measure more sensitive to large deviations. This can be helpful for certain kinds of statistical inference
the square root of the variance; the typical distance from the mean, in the original units
Standard deviation
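A quick worked example of variance versus standard deviation with Python's statistics module. The data set is made up, and the population (not sample) formulas are used as an assumption.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
variance = statistics.pvariance(values)  # average squared deviation (squared units)
std_dev = statistics.pstdev(values)      # square root of the variance (original units)
```

Here the mean is 5, the variance is 4, and the standard deviation is 2.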
Linear regression vs. t test?
t-test: tests whether the difference between 2 datasets is significant
linear regression: the extent to which independent variables influence a dependent variable
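A minimal sketch of simple linear regression with one independent variable, fitted by ordinary least squares. The permit and population figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: housing permits issued (x) and population gain (y)
permits = [10, 20, 30, 40]
gain = [25, 45, 65, 85]
slope, intercept = fit_line(permits, gain)
```

The slope estimates how much the dependent variable changes for each one-unit change in the independent variable. In this toy data it is 2, with an intercept of 5.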
Hierarchy of Census Data
method of estimating future population size by analyzing data points that indirectly reflect population changes, such as school enrollments, voter registrations, utility connections, or housing permits, rather than relying solely on direct population counts; this data is then used to project future population trends and inform planning decisions based on these indicators.
Symptomatic projections
you have populations for the county/region but you want projections for your local community; you know that historically, your muni population has been 10% of the county population, so you “stepdown” those numbers
Stepdown/Ratio method:
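The arithmetic behind the stepdown can be sketched as below, assuming the 10% historical share holds in future years. The county figures are hypothetical.

```python
def stepdown_projection(county_projection, local_share):
    """Apply the community's historical share to each projected county population."""
    return {year: round(pop * local_share) for year, pop in county_projection.items()}

# Hypothetical county projections by year
county = {2030: 120_000, 2040: 135_000}
local = stepdown_projection(county, local_share=0.10)
```

This gives 12,000 for 2030 and 13,500 for 2040.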
Uses births, deaths and net migration to estimate population projections
Most complex method; used for census projections and population pyramids
Net migration is the most difficult to predict
Cohort-Component Method:
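A simplified sketch of one projection step, assuming three age groups whose width equals the projection interval. The survival rates, birth count, and migration figures are all made up, and a real model would derive births from age-specific fertility rates.

```python
def project_one_step(cohorts, survival, births, net_migration):
    """Age each cohort forward one interval, then add births and net migration."""
    aged = [births]  # newborns enter the youngest group
    for i in range(len(cohorts) - 1):
        aged.append(cohorts[i] * survival[i])  # survivors move up one group
    # The oldest group is open-ended, so its survivors stay in it
    aged[-1] += cohorts[-1] * survival[-1]
    return [a + m for a, m in zip(aged, net_migration)]

# Hypothetical inputs: young, working-age, and older groups
cohorts = [1000, 800, 600]
survival = [0.99, 0.95, 0.5]
next_cohorts = project_one_step(cohorts, survival, births=120, net_migration=[10, 20, -5])
```

The result is roughly 130, 1,010, and 1,055 people in the three groups. The net migration input is the least certain of the three components.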
ratio that measures the number of people who are not in the labor force compared to the number of people who are. It’s used to measure the strain on the working population.
A good ratio is low, meaning there are enough working-age people to support the dependent population
Dependency rate:
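A one-line calculation of the ratio, expressed per 100 working people (a common convention, assumed here). The counts are hypothetical.

```python
def dependency_ratio(dependents, workers):
    """Dependents per 100 people in the labor force."""
    return dependents / workers * 100

# Hypothetical community: 4,000 dependents and 8,000 workers
ratio = dependency_ratio(4000, 8000)
```

A result of 50 means there are 50 dependents for every 100 workers.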
2010 Census: Fastest growing state, state with largest overall population increase, fastest growing muni
Palm Coast, FL was the fastest growing between 2000 and 2010, growing by 92%
Nevada is fastest growing state AND Texas had largest overall population increase
2020 Census: Fastest growing state, state with largest overall population increase, fastest growing muni
The Villages in FL grew the fastest
Utah is fastest growing state, TX has largest numeric increase
Sizes of census geographies (tracts, block groups, blocks and public use microdata area)
Metropolitan Statistical Area: at least one city with 50,000 or more residents
Consolidated MSA: CMSA; metro statistical area with 1M or more (18 CMSAs)
Urbanized area/urban cluster: densely settled areas with a population of 50,000 or more; core block groups/blocks with a density of at least 1,000 people per square mile delineate the urban core
Census tracts: generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people
Block group (smallest for Summary File 3 and 4); 300 to 3,000 people
Census blocks: the smallest unit with 100% tabulation data; average size is about 100 people
- Assumes growth will occur at a constant rate
- Normally accurate for short projection periods; often overestimates population in the long term
graphed as an upward-bending (concave up) ascending curve
Exponential Curve
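A short sketch of a constant-rate projection, assuming annual compounding. The base population and the 2% rate are illustrative.

```python
def exponential_projection(base_population, annual_rate, years):
    """Project population forward at a constant annual growth rate."""
    return base_population * (1 + annual_rate) ** years

# Hypothetical town of 10,000 growing 2% per year
in_10_years = exponential_projection(10_000, 0.02, 10)
in_50_years = exponential_projection(10_000, 0.02, 50)
```

Ten years out the town reaches about 12,190 people. Because the growth compounds, the 50-year figure is far more than five times the 10-year gain, which is why the method tends to overestimate over long periods.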