# Think Stats - Allen Downey Flashcards

Why would a survey be ‘oversampled’?

So that the number of respondents from underrepresented groups is large enough to draw statistical inferences.

To make underrepresented groups in a sample large enough to draw statistical inferences, what process would be used?

Oversampling
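
A minimal sketch of the idea. It assumes a made-up population of 10,000 in which group "B" is 5% of members, and it assumes sampling rates of 5% for group A and 40% for group B. These are illustrative numbers, not taken from any real survey design. The code compares a simple random sample with a design that deliberately samples group B at a higher rate.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 9,500 members of group "A", 500 of group "B"
population = ["A"] * 9_500 + ["B"] * 500

# Simple random sample of 1,000: expect only about 50 members of group B
srs = random.sample(population, 1_000)

# Oversampled design with assumed rates: 5% of group A, 40% of group B
group_a = [p for p in population if p == "A"]
group_b = [p for p in population if p == "B"]
oversampled = random.sample(group_a, 475) + random.sample(group_b, 200)

print("B in simple random sample:", srs.count("B"))
print("B in oversampled sample:  ", oversampled.count("B"))  # 200
```

Group B is now over-represented in `oversampled`. Statistics about the whole population computed from this sample therefore need to weight each respondent, for example by the inverse of its group's sampling rate.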

In a survey, what documents the design of the study, the survey questions, and the encoding of the responses?

The codebook

What is the codebook for?

It documents the design of the study, the survey questions, and the encoding of the responses.
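
A sketch of how a codebook is typically used when cleaning data. The codebook entries here are hypothetical: the column names `pregnant` and `weight_lb` and the codes shown (1/5 for Yes/No, 97/98/99 for non-answers) are illustrative and not taken from a real codebook. Special codes become missing values with `Series.replace`, and categorical codes are decoded with `Series.map`.

```python
import numpy as np
import pandas as pd

# Hypothetical codebook entries for two raw survey columns:
#   pregnant:  1 = Yes, 5 = No
#   weight_lb: 97 = Not ascertained, 98 = Refused, 99 = Don't know
SPECIAL_CODES = [97, 98, 99]
PREGNANT_LABELS = {1: "Yes", 5: "No"}

raw = pd.DataFrame({"pregnant": [1, 5, 1], "weight_lb": [7, 98, 99]})

# Turn the "special" codes into missing values before doing any arithmetic
raw["weight_lb"] = raw["weight_lb"].replace(SPECIAL_CODES, np.nan)

# Decode the numeric answer codes into readable labels
raw["pregnant_label"] = raw["pregnant"].map(PREGNANT_LABELS)

print(raw)
```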

Where might the codebook for publicly available data be kept?

It might be on GitHub.

What is a DataFrame?

The fundamental data structure provided by pandas, containing a row for each record and a column for each variable.
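
A minimal sketch of building a DataFrame from records, assuming a few hypothetical pregnancy records (the column names are illustrative). Each dict becomes one row, and each key becomes one column.

```python
import pandas as pd

# Each dict is one record (row); each key is a variable (column)
records = [
    {"caseid": 1, "prglngth": 39, "birthwgt_lb": 8},
    {"caseid": 2, "prglngth": 40, "birthwgt_lb": 7},
    {"caseid": 3, "prglngth": 38, "birthwgt_lb": 6},
]
df = pd.DataFrame(records)

print(df.shape)          # (3, 3): 3 records, 3 variables
print(list(df.columns))  # ['caseid', 'prglngth', 'birthwgt_lb']
```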

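What is a way to access a column from a DataFrame?

Index the DataFrame with the column name. This returns a Series, which is like a Python list but with an index:

`my_series = df2["columnName"]`

Passing a list of names (double brackets) returns a DataFrame containing those columns, not a Series:

`subset = df2[["columnName", "columnName2", "columnName3"]]`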
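
A quick check of what the two kinds of indexing return. It uses a small hypothetical `df2` with the placeholder column names from the card.

```python
import pandas as pd

# Hypothetical DataFrame using the placeholder column names from the card
df2 = pd.DataFrame({
    "columnName": [1, 2, 3],
    "columnName2": [4, 5, 6],
    "columnName3": [7, 8, 9],
})

one_col = df2["columnName"]                                     # single brackets -> Series
many_cols = df2[["columnName", "columnName2", "columnName3"]]  # list of names -> DataFrame

print(type(one_col).__name__)    # Series
print(type(many_cols).__name__)  # DataFrame
print(one_col.index.tolist())    # [0, 1, 2]: the Series keeps the row index
print(one_col[1])                # 2: lookup by index label
```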

What is a recode?

A variable that is not part of the raw data but is calculated from it. Example: ‘processDuration’ could be a recode calculated as processFinish - processStart.
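
A sketch of the ‘processDuration’ recode, assuming hypothetical timestamps stored as pandas datetimes. Subtracting two datetime columns gives a timedelta column, which is converted here to minutes with `.dt.total_seconds()`.

```python
import pandas as pd

# Hypothetical raw data: start and finish times for three processes
df = pd.DataFrame({
    "processStart": pd.to_datetime(["2024-01-01 09:00", "2024-01-01 09:30", "2024-01-01 10:00"]),
    "processFinish": pd.to_datetime(["2024-01-01 09:45", "2024-01-01 10:30", "2024-01-01 10:20"]),
})

# The recode: derived from the raw columns rather than collected directly
df["processDuration"] = df["processFinish"] - df["processStart"]

# Express the timedelta recode in minutes for easier analysis
minutes = df["processDuration"].dt.total_seconds() / 60
print(minutes.tolist())  # [45.0, 60.0, 20.0]
```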

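In pandas, how do you add a new column to a DataFrame?

Use bracket (dictionary-like) syntax. Name the new column and assign what it is to be populated with:

`df["totalwgt_lb"] = df.birthwgt_lb + df.birthwgt_oz / 16.0`

NOT dot notation, like this:

`df.totalwgt_lb = df.birthwgt_lb + df.birthwgt_oz / 16.0`

Dot notation only sets an attribute on the DataFrame object. pandas does not treat that attribute as a new column.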
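
A small demonstration of the difference between the two assignment styles, using toy values for the `birthwgt_lb` and `birthwgt_oz` columns. pandas emits a `UserWarning` for the dot-notation assignment. The warning is silenced here only to keep the output clean. `wrong_total` is a hypothetical name chosen so the two results don't collide.

```python
import warnings

import pandas as pd

# Toy data using the birth-weight column names from the card
df = pd.DataFrame({"birthwgt_lb": [7, 8, 6], "birthwgt_oz": [8, 0, 4]})

# Dot notation: sets an attribute on the object, NOT a column
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas warns about this pattern
    df.wrong_total = df.birthwgt_lb + df.birthwgt_oz / 16.0
print("wrong_total" in df.columns)  # False

# Bracket notation: creates a real column
df["totalwgt_lb"] = df.birthwgt_lb + df.birthwgt_oz / 16.0
print(df["totalwgt_lb"].tolist())  # [7.5, 8.0, 6.25]
```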

Are histograms good for comparing two distributions against each other?

Not on their own. For example, if one distribution has fewer data points than the other, some of the apparent differences between the histograms will be due to sample size. Normalizing each histogram so that it sums to 1 (a PMF) removes this effect.
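
A sketch with two hypothetical samples of pregnancy lengths in weeks. The second sample has exactly the same shape as the first but twice as many values. The raw counts (histograms) differ. The normalized counts from `value_counts(normalize=True)`, which form a PMF, are identical.

```python
import pandas as pd

# Hypothetical pregnancy lengths (weeks) for two groups of different sizes
first = pd.Series([39, 39, 40, 41])
others = pd.Series([39, 39, 39, 39, 40, 40, 41, 41])

# Raw histograms (counts) differ only because "others" is twice as large
hist_first = first.value_counts().sort_index()
hist_others = others.value_counts().sort_index()

# Normalizing each histogram so it sums to 1 gives a PMF, removing the size effect
pmf_first = first.value_counts(normalize=True).sort_index()
pmf_others = others.value_counts(normalize=True).sort_index()

print(hist_first.to_dict() == hist_others.to_dict())  # False
print(pmf_first.to_dict() == pmf_others.to_dict())    # True
```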

In statistics, what is a parameter?

A value that tells us something about the whole population (for example, the population mean), as opposed to a statistic, which is computed from a sample.

What does ddof stand for?

Delta degrees of freedom
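
In NumPy and pandas, variance and standard deviation divide by `n - ddof`. The defaults differ: `numpy.var` uses `ddof=0`, while `pandas.Series.var` uses `ddof=1`. Here is a small check with toy data, where the mean is 2.5 and the sum of squared deviations is 5.0:

```python
import numpy as np
import pandas as pd

x = [1, 2, 3, 4]  # toy data: mean 2.5, sum of squared deviations 5.0

pop_var = np.var(x)              # NumPy default ddof=0: divides by n = 4  -> 1.25
sample_var = np.var(x, ddof=1)   # ddof=1: divides by n - 1 = 3            -> about 1.667
pandas_var = pd.Series(x).var()  # pandas default ddof=1                   -> about 1.667

print(pop_var, sample_var, pandas_var)
```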

In statistics, what is estimation?

Inferring the parameters of a distribution from sample statistics.

In statistics, what is an ‘estimator’?

A statistic used to estimate a parameter.
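
A minimal sketch of estimation. It assumes a simulated normal population whose parameter `MU` is known, so the estimates can be checked; in a real study the parameter is unknown. The sample mean and the sample median are two different estimators of the same parameter, and both are statistics computed from the sample.

```python
import random
import statistics

# Assumed population model: normal with parameters MU = 5.0, SIGMA = 2.0
MU, SIGMA = 5.0, 2.0

random.seed(42)  # fixed seed so the sketch is reproducible
sample = [random.gauss(MU, SIGMA) for _ in range(10_000)]

# Two different estimators (statistics) of the same parameter MU
xbar = statistics.mean(sample)      # sample mean
median = statistics.median(sample)  # sample median

print(f"estimate of MU via mean:   {xbar:.3f}")
print(f"estimate of MU via median: {median:.3f}")
```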

What is anecdotal evidence?

Evidence based on data that is unpublished and usually personal.

What are five steps to approach a problem using statistics?

1. Data Collection
2. Descriptive Statistics
3. Exploratory Analysis
4. Estimation
5. Hypothesis Testing

What is anecdotal evidence?

Evidence, often personal, that is collected casually rather than by a well-designed study.

What is a population?

A group we are interested in studying. “Population” often refers to a group of people, but the term is used for other subjects, too.

What is a cross-sectional study?

A study that collects data about a population at a particular point in time.

In a study, what is a cycle?

In a repeated cross-sectional study, each repetition of the study is called a cycle.

What is a longitudinal study?

A study that follows a population over time, collecting data from the same group repeatedly.

In a statistical study, what is a record?

In a dataset, a collection of information about a single person or other subject.

In a statistical study, what is a respondent?

A person who responds to a survey.

In a statistical study, what is a sample?

The subset of a population used to collect data.

In a statistical study, what does ‘representative’ mean?

A sample is representative if every member of the population has the same chance of being in the sample.
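
A sketch of drawing a simple random sample with `DataFrame.sample`, assuming a hypothetical population table. With the default `replace=False` and no `weights`, every row has the same chance of being selected, which is the property this card describes.

```python
import pandas as pd

# Hypothetical population of 1,000 members with one numeric variable
population = pd.DataFrame({
    "member_id": range(1_000),
    "age": [18 + (i % 60) for i in range(1_000)],
})

# Simple random sample without replacement: every row has the same chance of selection
sample = population.sample(n=100, random_state=1)

print(len(sample))                    # 100
print(sample["member_id"].is_unique)  # True: no row is drawn twice
print(population["age"].mean(), sample["age"].mean())
```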