GA - The Value of Data Flashcards

1
Q

What is the role of a data analyst when dealing with stored data?

A

> Data on its own isn’t interesting or important. A data analyst must turn it into something meaningful.

> Data analysts do this by uncovering insights concealed within data.

> They connect multiple datasets together to show how data interacts, and they generally make the world more understandable.

2
Q

Why are companies storing more data?

A

Companies use it to answer strategic questions, make informed decisions, and drive growth.

3
Q

What are the four V’s of big data?

A
  1. Volume - the scale of data
  2. Variety - the different forms of data
  3. Veracity - the trustworthiness of the data
  4. Velocity - the frequency of incoming data that needs to be processed
4
Q

What are the six parts of the Analytics Workflow?

A

This is not a strictly top-to-bottom workflow; you will revisit steps along the way as needed.

  1. Identify - understand the problem that you’re trying to solve
  2. Obtain - find or collect the right data to help answer your question
  3. Understand - make sure you can correctly interpret the results and trust the data
  4. Prepare - make sure the data doesn’t contain incorrect or missing values
  5. Analyze - uncover the answers to your questions
  6. Present - determine the best way to share your results with others
5
Q

What is the most tangible goal when identifying the problem? (Identify the Problem)

A

The most tangible goal is to transform a “business question” into a “data question.”

6
Q

What are the most common places for data storage? (Obtaining the Data)

A

> Flat Files (e.g., comma-delimited text files, commonly called CSV)

> Spreadsheets

> Databases

7
Q

What tools know how to interact with data stored in flat files, spreadsheets, and databases? (Obtaining the Data)

A

> Excel (or another spreadsheet software)

> Structured Query Language (SQL)
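
As a rough illustration of how such tools reach stored data, the sketch below parses comma-delimited text with Python's csv module and then summarizes it with a SQL query through the built-in sqlite3 module. The sample data, the sales table, and the function names are made up for this example, and Python stands in here for a tool the card does not name.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file contents; a real file would be read with open(path, newline="").
CSV_TEXT = "region,sales\nNorth,120\nSouth,80\nNorth,50\n"


def load_rows(csv_text):
    """Parse comma-delimited text into a list of (region, sales) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["region"], int(row["sales"])) for row in reader]


def total_sales_by_region(rows):
    """Load the rows into an in-memory database and summarize them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, sales INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        query = "SELECT region, SUM(sales) FROM sales GROUP BY region ORDER BY region"
        return conn.execute(query).fetchall()
    finally:
        conn.close()


rows = load_rows(CSV_TEXT)
totals = total_sales_by_region(rows)
print(totals)  # [('North', 170), ('South', 80)]
```

The same SELECT statement would work against a real database table; only the connection step would change.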

8
Q

What are some steps we should take when understanding what data we have to work with? (Understand the Data)

A

> Define each column of data

> Think about potential usefulness

> Think about potential shortcomings

9
Q

What are some items we are looking for when preparing data? (Prepare the Data)

A

> Incorrect values

> Missing data

> Duplicate line items

Explore some of the ways that a dataset might contain bad data and some of the solutions to those problems. https://github.com/Quartz/bad-data-guide
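
A minimal sketch of checking for incorrect values, missing data, and duplicate line items in plain Python. The records, the rule that a negative quantity counts as incorrect, and the audit function are assumptions for illustration, not part of the course material.

```python
# Hypothetical order records as (order_id, quantity) pairs.
records = [
    ("A1", 3),
    ("A2", None),  # missing quantity
    ("A3", -5),    # negative quantity, assumed to be invalid in this example
    ("A1", 3),     # exact duplicate of the first line item
]


def audit(records):
    """Sort problem records into missing, incorrect, and duplicate buckets."""
    missing, incorrect, duplicates = [], [], []
    seen = set()
    for rec in records:
        if rec in seen:
            # An identical record was already encountered earlier in the list.
            duplicates.append(rec)
            continue
        seen.add(rec)
        _, quantity = rec
        if quantity is None:
            missing.append(rec)
        elif quantity < 0:
            incorrect.append(rec)
    return {"missing": missing, "incorrect": incorrect, "duplicates": duplicates}


report = audit(records)
print(report)
```

What counts as "incorrect" depends on the dataset, so the negative-quantity rule would be replaced by whatever validity checks fit the columns at hand.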

10
Q

What is aggregated data? (Analyze the Data)

A

Aggregated data represents many data points with a single value.

> The most common aggregations are the sum, count, average (mean), minimum, and maximum

> We might summarize data in other ways such as ranking values, or showing the range of values
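
The aggregations named above can be sketched with Python's built-in functions and the statistics module. The list of values is invented for the example.

```python
from statistics import mean

values = [4, 8, 15, 16, 23, 42]  # hypothetical data points

# Each entry collapses the whole list into a single number.
summary = {
    "sum": sum(values),
    "count": len(values),
    "mean": mean(values),
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
}

# Ranking keeps every value but orders them from largest to smallest.
ranked = sorted(values, reverse=True)

print(summary)
print(ranked)
```

Here the six values reduce to a sum of 108, a mean of 18, and a range of 38.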

11
Q

What is raw data?

A

Raw data refers to any data object that hasn’t undergone thorough processing, either manually or through automated computer software.
