Data Science Fundamentals Flashcards

(19 cards)

1
Q

What is Data Science?

A
  • Using data to answer questions
  • Data Science is a broad field, hence the broad description
  • A data scientist is broadly defined as someone who combines the skills of a software programmer, a statistician, and a storyteller/artist to extract the nuggets of gold hidden under mountains of data
2
Q

Three qualities of big data

A
  1. Volume
  2. Velocity
  3. Variety
3
Q

What does Volume stand for in Big Data?

A

How much data there is

4
Q

What does Velocity stand for in Big Data?

A

The rate at which data is being generated

5
Q

What does Variety stand for in Big Data?

A

The many forms the data comes in

6
Q

Diagram of Data Science skills overlap

A
7
Q

Data Science steps

A
  1. Subject matter expertise: know enough about the area you are asking about to formulate good questions.
  2. Cleaning and formatting data: typically requires some programming.
  3. Analyzing data: typically requires statistics and math knowledge.
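
As a rough illustration of steps 2 and 3, the sketch below cleans a small list of inconsistently entered values and then summarizes them. The data and the names `raw_hours` and `clean` are made up for this example; real cleaning work is usually far more involved.

```python
from statistics import mean

# Hypothetical raw survey answers (hours of sleep), entered inconsistently.
raw_hours = ["7", " 8.5 ", "n/a", "6", "", "7.5"]

def clean(values):
    """Step 2: strip whitespace and drop entries that are not numbers."""
    cleaned = []
    for value in values:
        text = value.strip()
        try:
            cleaned.append(float(text))
        except ValueError:
            continue  # skip missing or non-numeric entries
    return cleaned

hours = clean(raw_hours)
average_hours = mean(hours)  # Step 3: a simple statistical summary
print(hours, average_hours)
```

Step 1 does not appear in the code: deciding that average sleep is worth asking about, and that "n/a" means a missing answer, is where subject matter expertise comes in.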
8
Q

What can you do with R?

A
  • Access data
  • Experiment with the data
  • Analyze the data
  • Plot the data
9
Q

Why are data scientists in so much demand?

A
  • Because most of the answers are not already outlined in textbooks.
  • A data scientist needs to be somebody who knows how to find answers to novel problems.
10
Q

What is data?

A
  • A set of values of qualitative or quantitative variables
  • In statistics, the "set" is the population you are trying to discover something about
11
Q

What is a variable?

A

Measurements or characteristics of an item

12
Q

What is a qualitative variable?

A

Measurements or information about qualities

13
Q

What is a quantitative variable?

A

Measurements or information about quantities or numerical items
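
To make the qualitative/quantitative distinction concrete, here is a minimal sketch that labels each field of one record by whether its value is numeric. The record, its field names, and the helper `variable_kind` are hypothetical and chosen only for illustration.

```python
# Hypothetical record describing one item (a person); field names are illustrative.
person = {"country": "Canada", "eye_color": "brown", "height_cm": 172.5, "siblings": 2}

def variable_kind(value):
    """Label a value as quantitative (numeric) or qualitative (everything else)."""
    # bool is excluded because Python treats True/False as integers.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "quantitative"
    return "qualitative"

kinds = {name: variable_kind(value) for name, value in person.items()}
print(kinds)
```

Checking the type is only a rule of thumb: a numeric code such as a postal code is stored as a number but describes a quality, so it would still be a qualitative variable.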

14
Q

Common types of messy data

A
  1. Sequencing data
  2. Population census data
  3. Electronic Medical Records (EMR) or other large databases
  4. Geographic information system (GIS) data (mapping)
  5. Image analysis and image extrapolation
  6. Language and translations
  7. Website traffic
  8. Personal/ad data (e.g., Facebook, Netflix predictions)
15
Q

What is sequencing data?

A
  • Data produced by sequencing machines
  • For example, DNA or RNA sequencing data
16
Q

What format is sequencing data often found in?

A
  • FASTQ format
  • This is a raw file format produced by sequencing machines.
  • These files are often hundreds of millions of lines long.
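
A FASTQ file stores each read as four lines: an identifier line starting with "@", the sequence, a separator line starting with "+", and a quality string. The sketch below reads records on that basis. It assumes every record is exactly four lines with no wrapped sequences, and the two records in `fastq_text` are made up for illustration.

```python
import io

# Two made-up records, for illustration only.
fastq_text = "@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+\nFFFF\n"

def read_fastq(handle):
    """Yield (identifier, sequence, quality) tuples, four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line beginning with '+'
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'

records = list(read_fastq(io.StringIO(fastq_text)))
print(records)
```

Reading one record at a time matters here because real files can run to hundreds of millions of lines, so loading everything into memory at once is rarely practical.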
17
Q

Why is image analysis messy data?

A

There is a lot of information coded in an image or video and it has to be extracted.

18
Q

Why is census information considered messy data?

A
  • Almost all residents of a country answer a set of standardized questions
  • With that many respondents, the data is large and messy
19
Q

Is data of secondary or primary importance?

A

Secondary
Data is important, but a good data scientist asks questions first and seeks out relevant data second.