M7 U3 - Data Types and Sources - Q2 Flashcards

1
Q

What decides the type of data that will be used in the project? (2)

A

The tasks and methods that were defined at the same time as the project’s business and analytic objectives.

The type of data will influence the source and data collection techniques.

2
Q

List the categories of numeric/categorical data (4)

A
3
Q

What’s one interesting thing about qualitative data?

A

Qualitative data is sometimes transformed to enable it to be used in certain machine learning modeling techniques that require quantitative data.
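One common way to do this transformation is one-hot encoding. The card does not name a specific technique, so the following is only a minimal sketch under that assumption; the function name `one_hot_encode` and the sample values are hypothetical.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with one position per category."""
    # Sort the distinct categories so the column order is fixed and repeatable.
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, encoded


# Hypothetical qualitative observations.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row is now numeric, so it can be passed to a modeling technique that requires quantitative input.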

4
Q

Describe structured vs unstructured data and examples of their storage backbones

A

Structured Data

  • Fixed format (usually a row-and-column structure)
  • Easy to extract
  • Requires a predefined schema
  • Examples: spreadsheets, relational databases and other repositories in the row and column format.

Unstructured Data

  • Most difficult to extract
  • Doesn’t fit row and column structure
  • Cannot be maintained in a uniform format
  • Doesn’t need a predefined schema
  • Examples: text, multimedia files, server log files, NoSQL databases
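The difference in extraction effort can be sketched roughly as follows. The sample CSV text, the log line, and the regular expression are all made up for illustration; real log formats vary widely.

```python
import csv
import io
import re

# Structured: the header row acts as a predefined schema, so fields are
# addressed by column name.
structured = io.StringIO("order_id,amount\n1001,25.50\n1002,13.00\n")
rows = list(csv.DictReader(structured))
total = sum(float(row["amount"]) for row in rows)

# Unstructured: free text has no columns, so each field has to be pulled out
# with a hand-written pattern that only fits this particular message shape.
log_line = "2024-01-05 12:00:01 ERROR payment failed for order 1002"
match = re.search(r"(ERROR|WARN|INFO) .* order (\d+)", log_line)
level, order_id = match.groups() if match else (None, None)

print(total)            # 38.5
print(level, order_id)  # ERROR 1002
```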
5
Q

What are some ways to classify data? (4)

A
  • Data type: Numeric vs categorical and subtypes of each
  • Qualitative and Quantitative Data
  • Structured and Unstructured Data
  • Internal and External Data
6
Q

What’s a Secondary data source?

A
  • Secondary data sources: gathered from sources external to an organization
7
Q

What’s a primary data source?

A

Primary data sources: collected and processed by an organization and housed internally

8
Q

What sources can internal data come from?

A

It can come from a primary or a secondary data source.

9
Q

What data sources can an organization’s data governance framework affect?

A

Both primary and secondary sources

Any data used by the organization.

10
Q

What’s the key to distinguishing between internal and external data sources?

A

I believe: If the data is stored in a company’s DB and completely controlled by that company, it’s internal. Otherwise, external.

The data does not have to be about things within the company to be internal (but I think it usually is).

11
Q

What’s the key to distinguishing between primary and secondary data sources?

A

Whether or not you collected it yourself. If so, it’s primary.

12
Q

Is primary data internal or external? What about secondary data?

A
  • Primary data can be collected from internal or external sources
  • Secondary data will usually come from external sources.
13
Q

What do we know about secondary data?

A

It is often used by others too, i.e. it is usually not your own.

14
Q

Give examples of each of the 4 groupings of data sources

A
  • Primary Internal: Data scientist conducts questionnaires and focus groups with employees of their own company.
  • Primary External: Data scientist conducts questionnaires and focus groups with customers.
  • Secondary Internal: Your company purchases potential client data from data brokers (an external source). That data has now become your company’s data, to be used for marketing, advertising, etc. (it has now become internal data).
  • Secondary External: Data used in a Kaggle competition, or a dataset from the popular UCI Machine Learning Repository. You did not collect that data and it has been used by others.
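The four groupings above form a 2x2 grid, which can be sketched as a lookup keyed by two labels. The names `Collection`, `Origin`, `EXAMPLES`, and `describe` are hypothetical and only restate the card's examples; they do not define a rule for deciding which label applies.

```python
from enum import Enum


class Collection(Enum):
    PRIMARY = "primary"      # you collected the data yourself
    SECONDARY = "secondary"  # someone else collected it


class Origin(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


# One example per cell of the grid, taken from the card above.
EXAMPLES = {
    (Collection.PRIMARY, Origin.INTERNAL): "questionnaires and focus groups with your own employees",
    (Collection.PRIMARY, Origin.EXTERNAL): "questionnaires and focus groups with customers",
    (Collection.SECONDARY, Origin.INTERNAL): "client data purchased from data brokers",
    (Collection.SECONDARY, Origin.EXTERNAL): "a Kaggle or UCI Machine Learning Repository dataset",
}


def describe(collection, origin):
    """Return a label such as 'Primary Internal: ...' for one grid cell."""
    label = f"{collection.value.capitalize()} {origin.value.capitalize()}"
    return f"{label}: {EXAMPLES[(collection, origin)]}"


for collection in Collection:
    for origin in Origin:
        print(describe(collection, origin))
```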
15
Q

What data does data governance affect?

A

Any data that is used by the organization for decision making

16
Q

What is data collection?

A

Data collection is the process of gathering and analyzing data that can meet defined business and analytic objectives.

17
Q

Data can be collected for one of three purposes. List them. (3)

A
  1. Data collected to define business and analytic objectives
  2. Data collected to define business requirements
  3. Data needed for developing an analytic solution
18
Q

What are the traditional data collection methods?

A
  • Questionnaires and surveys
  • Interviews
  • Observations
  • Focus groups
19
Q

What are obtrusive vs unobtrusive data collection methods? Give examples.

A
  • Obtrusive: Participants of the data collection exercise are aware that data is being collected from them for a purpose. E.g. the 4 traditional data collection methods.
  • Unobtrusive: Can be done without the knowledge of the subject of the study. E.g. web sources of data, social media data, data sets from a data repository.
20
Q

The traditional data collection processes are similar to the requirements gathering techniques. What is the difference between the data collected during both processes?

A

The data collected during the requirements phase is useful in determining what data is collected in the data gathering phase.

21
Q

Where is the first place you should start looking for data during the data collection process?

A

Start from within the organization. No matter how small, you should start collecting data from within your client organization.

22
Q

What’s the main idea of how data collection fits into the project?

A

When you defined the project’s business and analytic objectives, the tasks and methods were proposed as well. Those tasks and methods drive the type of data that will be used in the project, and the type of data in turn influences the source and the data collection techniques.

23
Q

Give examples of external data (4)

A
  • Statistics from surveys
  • Questionnaires
  • Research
  • Customer feedback
24
Q

What influences the source and data collection techniques?

A

The type of data

25
Q

What do we know about nominal data? (2)

A

Can’t be:

  • ordered
  • measured
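A small sketch of what this means in practice: with nominal labels, only equality checks and counts are meaningful. The ordinal example is included purely for contrast and is not part of this card; all sample values and the `SIZE_RANK` mapping are hypothetical.

```python
from collections import Counter

# Nominal: labels with no inherent order. Counting them (and finding the most
# frequent one) is meaningful; sorting or averaging them is not.
blood_types = ["A", "O", "B", "O", "AB", "O"]
most_common_type, count = Counter(blood_types).most_common(1)[0]

# Ordinal, for contrast: an explicit rank makes ordering meaningful.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}
sizes = ["large", "small", "medium"]
sorted_sizes = sorted(sizes, key=SIZE_RANK.__getitem__)

print(most_common_type, count)  # O 3
print(sorted_sizes)             # ['small', 'medium', 'large']
```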
26
Q

List examples of internal data (4)

A

Data about:

  • Operations
  • Maintenance
  • Personnel
  • Finance
27
Q

Data collected from Twitter by a presidential candidate’s election campaign team is considered which of the following?

  • Internal
  • External
  • Primary
A

External data