Data Management Flashcards

1
Q

Respondents’ Unique IDs can potentially be used to:

A

Link respondents’ personally identifiable information to their responses

Link tables with different structures in a relational database

Link raw data to analysis code and to analysis output
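Below is a minimal Python sketch of the first two uses. It assumes a hypothetical survey in which name and phone are the only PII fields. Each respondent gets a unique ID, PII and responses are stored in separate tables, and the ID is the only link between them. The sequential ID scheme is purely illustrative; real projects often randomize IDs.

```python
# Hypothetical raw records: PII and responses collected together in the field.
raw = [
    {"name": "Ana", "phone": "555-0101", "q1": 3, "q2": "yes"},
    {"name": "Ben", "phone": "555-0102", "q1": 5, "q2": "no"},
]

PII_FIELDS = {"name", "phone"}  # assumed list of identifying variables


def split_pii(records, pii_fields):
    """Assign each respondent a unique ID and split PII from responses."""
    pii_table, response_table = [], []
    for i, rec in enumerate(records, start=1):
        uid = f"R{i:04d}"  # illustrative sequential ID
        pii_table.append({"uid": uid, **{k: v for k, v in rec.items() if k in pii_fields}})
        response_table.append({"uid": uid, **{k: v for k, v in rec.items() if k not in pii_fields}})
    return pii_table, response_table


def relink(pii_table, response_table):
    """Re-attach PII to responses through the unique ID (authorized use only)."""
    pii_by_uid = {row["uid"]: row for row in pii_table}
    return [{**pii_by_uid[row["uid"]], **row} for row in response_table]


pii, responses = split_pii(raw, PII_FIELDS)
```

The response table can then be shared or analyzed without exposing identities, while the PII table stays in restricted storage.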

2
Q

The metadata we track for the data collection process includes:

A

Surveyor assignments
Completion rate
Surveyor attrition
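As a rough illustration, completion rates could be computed from a hypothetical tracking log that records each respondent's surveyor assignment and survey status. The status codes "complete" and "refused" are assumptions, and surveyor attrition is not modeled here.

```python
from collections import defaultdict

# Hypothetical tracking log: one row per assigned respondent.
tracking = [
    {"uid": "R0001", "surveyor": "S1", "status": "complete"},
    {"uid": "R0002", "surveyor": "S1", "status": "refused"},
    {"uid": "R0003", "surveyor": "S2", "status": "complete"},
    {"uid": "R0004", "surveyor": "S2", "status": "complete"},
]


def completion_rates(log):
    """Share of assigned respondents with a completed survey, by surveyor."""
    assigned = defaultdict(int)
    completed = defaultdict(int)
    for row in log:
        assigned[row["surveyor"]] += 1
        if row["status"] == "complete":
            completed[row["surveyor"]] += 1
    return {s: completed[s] / assigned[s] for s in assigned}


rates = completion_rates(tracking)
overall = sum(r["status"] == "complete" for r in tracking) / len(tracking)
```

Breaking the rate down by surveyor helps flag surveyors who need support or follow-up.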

3
Q

Master code files are meant specifically to:

A

Run (call) all other coding files in the project

4
Q

We use relative references (relative file paths) in our code so that:

A

We do not need to repeat the full file path of the working directory for each file used or created

Different analysts who have different locations for their project folder do not need to change the file path for each file
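Here is a Python sketch of the same idea, using a hypothetical folder layout. The project root is set once, and every other path is built relative to it, so a second analyst only edits that single line. In a do-file the equivalent is typically one global holding the root folder.

```python
from pathlib import Path

# Each analyst sets this ONE line to their own copy of the project folder.
# The folder below is a hypothetical placeholder.
PROJECT_ROOT = Path("/home/analyst/my_project")

# Every other path is written relative to the project root, so none of
# these lines change when the project folder moves.
RAW_DATA = PROJECT_ROOT / "data" / "raw" / "survey.csv"
CLEAN_DATA = PROJECT_ROOT / "data" / "clean" / "survey_clean.csv"
OUTPUT_DIR = PROJECT_ROOT / "output"


def project_path(*parts):
    """Build a path inside the project from its relative components."""
    return PROJECT_ROOT.joinpath(*parts)
```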

5
Q

When publishing data, when should the codebook be created?

A

After the dataset is final, before data publication

The codebook describes the data: variable names, labels, question text, and summary statistics such as the mean, minimum, and maximum values. Because variables may be generated during analysis, and summary statistics may change after certain cleaning decisions, it is best to produce the codebook at the very end, once the datasets are final.

6
Q

Which documents are included in the “manual” for the published data?

A

ReadMe file

Codebook

7
Q

According to Gentzkow and Shapiro, rather than naming the latest version of a file regressions_022713_mg.do, one should instead:

A

Use version control software, and keep dates out of file names

8
Q

What is required to merge two datasets?

A

There needs to be a relational parameter or “foreign key”, i.e., a variable present in both datasets on which to merge them

9
Q

Merge

A

A horizontal combination of datasets, matched on a unique ID

Adds variables to the existing observations
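A minimal sketch of a 1:1 merge on a unique ID, using plain Python lists of dicts and hypothetical baseline/endline data. It first checks that the key is unique in each dataset, then combines the variables of matched rows and reports IDs that appear in only one dataset. Keeping only matched rows is a design choice for this sketch; Stata's merge, for instance, keeps unmatched rows by default and flags them.

```python
# Two hypothetical datasets sharing the key variable "uid".
baseline = [
    {"uid": "R0001", "age": 34},
    {"uid": "R0002", "age": 41},
]
endline = [
    {"uid": "R0002", "income": 1200},
    {"uid": "R0001", "income": 950},
    {"uid": "R0003", "income": 700},
]


def index_unique(rows, key):
    """Map key -> row, failing loudly if the key is not unique."""
    index = {}
    for row in rows:
        if row[key] in index:
            raise ValueError(f"duplicate {key}: {row[key]}")
        index[row[key]] = row
    return index


def merge_1to1(left, right, key):
    """Horizontal 1:1 merge: match rows on `key` and combine their variables.

    Returns matched rows (in the left dataset's order) and the sorted keys
    found in only one of the two datasets. If a non-key variable exists in
    both, the right dataset's value wins.
    """
    left_idx = index_unique(left, key)
    right_idx = index_unique(right, key)
    merged = [{**left_idx[k], **right_idx[k]} for k in left_idx if k in right_idx]
    unmatched = sorted(set(left_idx) ^ set(right_idx))
    return merged, unmatched


merged, unmatched = merge_1to1(baseline, endline, "uid")
```

Here respondent R0003 appears only at endline, so it is reported as unmatched rather than silently dropped.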

10
Q

Append

A

A vertical combination of datasets that share variables (at least a subset), with the same variable names and data types

Adds observations to the existing variables
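A minimal sketch of appending two hypothetical survey rounds, stored as lists of dicts. Shared variables must have the same type. A variable present in only one dataset is kept and filled with None, which stands in for a missing value here.

```python
# Two hypothetical survey rounds; round 2 added a new question, q2.
round1 = [{"uid": "R0001", "q1": 3}, {"uid": "R0002", "q1": 5}]
round2 = [{"uid": "R0003", "q1": 4, "q2": "yes"}]


def append(top, bottom):
    """Stack the observations of two datasets (a vertical combination)."""
    rows = top + bottom
    # Check that each variable holds a single type across all observations.
    types = {}
    for row in rows:
        for var, value in row.items():
            if value is None:
                continue
            if var in types and types[var] is not type(value):
                raise TypeError(f"variable {var!r} has mixed types")
            types.setdefault(var, type(value))
    # Collect the union of variable names, in first-seen order.
    variables = []
    for row in rows:
        for var in row:
            if var not in variables:
                variables.append(var)
    # Every observation gets every variable; absent ones become None.
    return [{var: row.get(var) for var in variables} for row in rows]


stacked = append(round1, round2)
```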

11
Q

Master file

A

A file that runs ALL code in your project

Useful for:
– Setting any globals that might be used across do-files
– Installing user-written commands
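A rough Python analogue of a master file, offered as a sketch rather than a prescribed setup. Shared settings are defined once and passed to every script, a required-package check stands in for installing user-written commands, and the scripts run in a fixed order. The script names, settings, and package list are all hypothetical.

```python
import importlib.util
import runpy
from pathlib import Path

# Project-wide settings, defined once and handed to every script
# (the role globals play in a master do-file). Values are hypothetical.
SETTINGS = {
    "PROJECT_ROOT": Path.cwd(),
    "SEED": 20240101,
}

# Hypothetical scripts, listed in the order they must run.
STEPS = ["01_import.py", "02_clean.py", "03_analysis.py"]

# Replace with the packages the project actually depends on.
REQUIRED_MODULES = ["csv", "statistics"]


def check_requirements(modules):
    """Stop early if a required package is not installed."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    if missing:
        raise RuntimeError(f"Install these packages first: {missing}")


def run_all(code_dir, steps, settings):
    """Run every script in order, giving each one a copy of the shared settings."""
    check_requirements(REQUIRED_MODULES)
    results = {}
    for name in steps:
        # run_path executes the file; init_globals makes the settings visible to it.
        results[name] = runpy.run_path(str(Path(code_dir) / name), init_globals=dict(settings))
    return results

# Usage (assuming the scripts live in a "code" subfolder):
# run_all(SETTINGS["PROJECT_ROOT"] / "code", STEPS, SETTINGS)
```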

12
Q

Codebook

A

• Contains information about the data: variable names, labels, question text, min/max values, etc.
• Critical for easy interpretation of the data and for further analysis
• Have a do-file that creates the codebook from the raw data
• When: Created once the dataset is final
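To show what generating a codebook with code (rather than by hand) might look like, here is a Python sketch. The dataset, labels, and question text are hypothetical. Summary statistics are computed only for numeric variables, with None treated as missing.

```python
import statistics

# Hypothetical final dataset.
data = [
    {"uid": "R0001", "age": 34, "income": 950.0},
    {"uid": "R0002", "age": 41, "income": 1200.0},
    {"uid": "R0003", "age": 29, "income": None},
]

# Hypothetical variable labels and question text: variable -> (label, question).
labels = {
    "uid": ("Respondent unique ID", ""),
    "age": ("Age in years", "How old are you?"),
    "income": ("Monthly income", "What was your income last month?"),
}


def make_codebook(rows, labels):
    """Build one codebook entry per variable, with summary stats for numeric ones."""
    entries = []
    for var, (label, question) in labels.items():
        values = [r[var] for r in rows if r.get(var) is not None]
        entry = {"variable": var, "label": label, "question": question,
                 "n_nonmissing": len(values)}
        if values and all(isinstance(v, (int, float)) for v in values):
            entry.update(mean=statistics.mean(values), min=min(values), max=max(values))
        entries.append(entry)
    return entries


codebook = make_codebook(data, labels)
```

Because the codebook comes from code, it can simply be rerun after the last cleaning decision.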

13
Q

ReadMe Files

A

• Outlines key information about all published files: data and analysis files, questionnaires, codebooks
– E.g., format of the data (such as the number of observations per student and the number of variables)
• Describes how data/analysis files interact with one another, e.g., which came first, and is one a subset of another?
• When: Immediately after each round of data collection
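A sketch of how part of a ReadMe could be filled in automatically. The code lists each CSV data file with its number of observations and variables. It assumes one observation per row and a header row. The notes on how the files relate to one another (which came first, which is a subset) are still written by hand and passed in. File names and notes are hypothetical.

```python
import csv
from pathlib import Path


def describe_csv(path):
    """Count observations (rows after the header) and variables (columns)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_obs = sum(1 for _ in reader)
    return n_obs, len(header)


def write_readme(data_dir, readme_path, notes):
    """Write a plain-text ReadMe listing each data file, its size, and a note."""
    lines = ["README", ""]
    for path in sorted(Path(data_dir).glob("*.csv")):
        n_obs, n_vars = describe_csv(path)
        lines.append(f"{path.name}: {n_obs} observations, {n_vars} variables")
        lines.append(f"  {notes.get(path.name, 'No description yet.')}")
    Path(readme_path).write_text("\n".join(lines) + "\n")
    return lines
```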
