DS Foundations Part 1 Flashcards

(30 cards)

1
Q

What is the difference between structured and unstructured data?

A

Structured data fits into tables with rows and columns; unstructured data includes text, images, audio, etc.

2
Q

What are examples of categorical data?

A

Gender, product category, ZIP code (numeric in form, but a label rather than a quantity).

3
Q

What is ordinal data?

A

Categorical data with a meaningful order, but no fixed spacing (e.g., rankings, Likert scales).

4
Q

What is nominal data?

A

Categorical data without an inherent order (e.g., colors, names).

5
Q

What is the difference between discrete and continuous data?

A

Discrete data has countable values; continuous data can take any value in a range.

6
Q

What is Exploratory Data Analysis (EDA)?

A

The process of summarizing a dataset's main characteristics using visualizations and summary statistics.

7
Q

What is the purpose of EDA?

A

To understand data distributions, spot anomalies, and form hypotheses.

8
Q

What plot is best for showing distribution of a numerical variable?

A

Histogram or boxplot.

9
Q

What plot is best for identifying relationships between two numerical variables?

A

Scatter plot.

10
Q

What does a boxplot show?

A

Median, quartiles, and potential outliers of a dataset.

11
Q

What is the mean?

A

The arithmetic average: the sum of all data values divided by the number of values.

12
Q

What is the median?

A

The middle value when the data is sorted; with an even number of values, the average of the two middle values.

13
Q

When is the median better than the mean?

A

When the data is skewed or contains outliers.
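
A minimal sketch of why the median holds up better, using Python's standard `statistics` module. The `ages` list is hypothetical data invented for illustration, with one extreme value standing in for an outlier.

```python
from statistics import mean, median

# Hypothetical ages; 1000 is a made-up data-entry error acting as an outlier.
ages = [30, 32, 35, 38, 40, 1000]

avg = mean(ages)    # pulled far upward by the single extreme value
mid = median(ages)  # average of the two middle values, 35 and 38

print(f"mean   = {avg:.1f}")
print(f"median = {mid}")
```

Here the mean lands far above every typical value, while the median stays among them.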

14
Q

What is standard deviation?

A

A measure of how much values vary around the mean.
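
As a rough illustration, the standard library offers two versions: `pstdev` treats the data as a whole population (divides by n), and `stdev` treats it as a sample (divides by n - 1). The `values` list is hypothetical.

```python
from statistics import mean, pstdev, stdev

# Hypothetical values chosen so the arithmetic is easy to check by hand.
values = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(values)           # 5
population_sd = pstdev(values)  # square root of (sum of squared deviations / n)
sample_sd = stdev(values)       # square root of (sum of squared deviations / (n - 1))

print(center, population_sd, round(sample_sd, 3))
```

The sample version is always slightly larger, because dividing by n - 1 compensates for estimating the mean from the same data.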

15
Q

What is IQR?

A

The interquartile range, Q3 − Q1: the spread of the middle 50% of the data.
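
A small sketch of the calculation with `statistics.quantiles`. The data is hypothetical, and the `"inclusive"` method is just one of several quartile conventions; other tools may return slightly different Q1 and Q3 for the same data.

```python
from statistics import quantiles

# Hypothetical data, already sorted for readability.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```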

16
Q

What is skewness?

A

A measure of the asymmetry of a distribution.

17
Q

What is kurtosis?

A

A measure of the ‘tailedness’ of a distribution.
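
Both skewness and kurtosis can be sketched from central moments. The helper names below are hypothetical, and these are the simple (biased) moment-based formulas; statistical libraries often apply a bias correction, so their numbers may differ slightly. Excess kurtosis subtracts 3 so that a normal distribution scores 0.

```python
from statistics import fmean

def central_moment(values, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    mu = fmean(values)
    return fmean([(x - mu) ** k for x in values])

def skewness(values):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment scaled by the squared variance, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Hypothetical datasets for illustration.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 20]

print(skewness(symmetric))      # zero for perfectly symmetric data
print(skewness(right_skewed))   # positive: long tail on the right
print(excess_kurtosis(symmetric))
```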

18
Q

What does a right-skewed distribution look like?

A

It has a long tail on the right; the mean is typically greater than the median.

19
Q

What is a uniform distribution?

A

A distribution where all outcomes have equal probability.

20
Q

What does a bimodal distribution suggest?

A

There may be two subgroups or populations in the data.

21
Q

What are common data quality issues?

A

Missing values, duplicates, outliers, inconsistent formats.

22
Q

What is a missing value?

A

An entry in a dataset that has no recorded value.

23
Q

How can you handle missing data?

A

Methods include deletion, mean/median imputation, or predictive models.
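
Two of these methods can be sketched in plain Python. The `readings` list is hypothetical, and `None` is assumed to mark a missing entry; real datasets may use other markers such as NaN or empty strings.

```python
from statistics import median

# Hypothetical sensor readings; None marks a missing value (an assumption).
readings = [4.0, None, 7.0, 5.0, None, 100.0]

# Deletion: keep only the entries that were actually recorded.
observed = [r for r in readings if r is not None]

# Median imputation: fill each gap with the median of the observed values.
fill = median(observed)
imputed = [fill if r is None else r for r in readings]

print(observed)
print(imputed)
```

The median is used as the fill value here because the 100.0 reading would drag a mean-based fill upward.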

24
Q

What is data duplication?

A

When the same observation appears more than once unnecessarily.
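
One simple way to drop exact duplicates while keeping the original order is sketched below. The `records` are hypothetical, and the approach only catches rows that match on every field; near-duplicates (typos, different formats) need separate handling.

```python
# Hypothetical purchase records: (customer, date, amount).
records = [
    ("alice", "2024-01-05", 19.99),
    ("bob", "2024-01-05", 5.00),
    ("alice", "2024-01-05", 19.99),  # exact repeat of the first record
    ("alice", "2024-01-06", 19.99),  # same customer, different day: kept
]

# dict keys are unique and preserve insertion order, so this keeps
# the first occurrence of each record and drops later repeats.
unique_records = list(dict.fromkeys(records))

print(len(records), "->", len(unique_records))
```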

25
Q

What is outlier detection?

A

The process of identifying data points that are significantly different from others.
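
One common rule of thumb, not the only one, flags points more than 1.5 × IQR below Q1 or above Q3. The sketch below assumes that rule; the function name `iqr_outliers` and the `measurements` data are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (a common convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one suspicious reading.
measurements = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
flagged = iqr_outliers(measurements)

print(flagged)
```

A flagged point is a candidate for review, not automatically an error; whether to remove it depends on context.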

26
Q

Why is data ethics important?

A

To protect individual rights, ensure fairness, and prevent misuse of data.

27
Q

What is informed consent in data collection?

A

When individuals agree to data use with full knowledge of risks and purpose.

28
Q

What is algorithmic bias?

A

When data or models systematically favor certain groups over others.

29
Q

What is the difference between correlation and causation?

A

Correlation shows association; causation implies one variable influences another.
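
Association is usually quantified with the Pearson correlation coefficient, sketched here by hand. The function name and the study-hours data are hypothetical, made up for illustration.

```python
from math import sqrt
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: hours studied and exam scores for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 61, 70, 74]

r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.3f}")
```

A value of r close to 1 only says the two variables move together in this sample. It cannot show that studying caused the higher scores; a third factor, such as prior knowledge, could drive both.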

30
Q

Why is context important in data interpretation?

A

Because the same metric can mean different things in different domains.