Other questions Flashcards

(44 cards)

1
Q

What is sensitivity?

A

The proportion of actual positives that are correctly identified as positive: TP / (TP + FN)

2
Q

What is the true positive rate?

A

Sensitivity

3
Q

What is specificity?

A

The proportion of actual negatives that are correctly identified as negative: TN / (TN + FP)

4
Q

What is the true negative rate?

A

Specificity
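Sensitivity and specificity can both be computed from the four confusion-matrix counts. A minimal Python sketch, assuming binary labels where 1 means positive and 0 means negative; the labels below are made-up example data.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(y_true, y_pred):
    """True positive rate: share of actual positives predicted positive."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """True negative rate: share of actual negatives predicted negative."""
    _, _, tn, fp = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(sensitivity(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print(specificity(y_true, y_pred))  # 4 / (4 + 2), about 0.667
```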

5
Q

What are the different types of distribution?

A

.Normal
.Skewed
.Uniform
.Bi-modal
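One way to get a feel for these shapes is to draw samples from each. This Python sketch uses the standard `random` module; the exponential distribution stands in for a (right-)skewed one and the bimodal sample is an assumed 50/50 mix of two normals, so treat the parameters as illustrative.

```python
import random
import statistics

rng = random.Random(0)  # seeded so the samples are reproducible
n = 10_000

normal = [rng.gauss(0, 1) for _ in range(n)]       # symmetric bell curve
skewed = [rng.expovariate(1.0) for _ in range(n)]  # long right tail
uniform = [rng.uniform(0, 10) for _ in range(n)]   # every value in [0, 10] equally likely
# Bimodal: a 50/50 mixture of two normals centred at -3 and +3 (assumed construction)
bimodal = [rng.gauss(-3, 1) if rng.random() < 0.5 else rng.gauss(3, 1) for _ in range(n)]

for name, xs in [("normal", normal), ("skewed", skewed), ("uniform", uniform), ("bimodal", bimodal)]:
    print(f"{name:8} mean={statistics.mean(xs):6.2f} median={statistics.median(xs):6.2f}")
```

In the skewed sample the mean sits above the median, because the long right tail pulls the mean up.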

6
Q

What calculation is used to identify an outlier?

A

A value below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR, where IQR = Q3 - Q1
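The 1.5 x IQR rule (Tukey's fences) checks both ends of the data. A minimal Python sketch using `statistics.quantiles`; tools compute quartiles in slightly different ways, so the exact fence values can differ a little between libraries.

```python
import statistics


def iqr_outliers(data):
    """Return the values lying outside the 1.5 x IQR fences."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]


print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # [100]
```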

7
Q

How do you handle missing data?

A

.Remove the entire sample
.Imputation: make a reasonable guess as to what the missing value may be (e.g. the column mean or median)
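Both options, sketched in Python on a tiny made-up table where `None` marks a missing value. Mean imputation is only one choice of "reasonable guess"; the median or the most frequent value are common alternatives.

```python
import statistics

rows = [
    {"age": 25, "income": 30_000},
    {"age": None, "income": 42_000},  # None marks a missing value
    {"age": 35, "income": None},
    {"age": 45, "income": 58_000},
]

# Option 1: remove any sample (row) that has a missing value
complete = [r for r in rows if None not in r.values()]


# Option 2: impute by replacing each missing value with that column's mean
def impute_mean(rows, column):
    observed = [r[column] for r in rows if r[column] is not None]
    fill = statistics.mean(observed)
    # Build new dicts so the original rows are left unchanged
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


imputed = impute_mean(impute_mean(rows, "age"), "income")
print(len(complete))         # 2
print(imputed[1]["age"])     # mean of 25, 35, 45 = 35
print(imputed[2]["income"])  # mean of 30000, 42000, 58000, about 43333.33
```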

8
Q

Which measures are included in central tendency?

A

Mean, median, mode
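A quick check of all three with Python's standard `statistics` module on a small made-up sample:

```python
import statistics

data = [1, 2, 2, 3, 7]
print(statistics.mean(data))    # 3 (sum 15 / 5 values)
print(statistics.median(data))  # 2 (middle value once sorted)
print(statistics.mode(data))    # 2 (most frequent value)
```

The single large value (7) pulls the mean above the median, which is why the median is often preferred for skewed data.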

9
Q

What is variability?

A

How spread out the data is
Measured with variance, standard deviation and range
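The three measures in Python on a made-up sample. `pvariance` and `pstdev` treat the data as a whole population; `statistics.variance` and `statistics.stdev` divide by n - 1 instead and give the sample versions.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(max(data) - min(data))       # range: 9 - 2 = 7
print(statistics.pvariance(data))  # population variance: 4
print(statistics.pstdev(data))     # population standard deviation: 2.0
```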

10
Q

What is the CRISP-DM life cycle?

A

The Cross-Industry Standard Process for Data Mining: a business-centric approach to the data science loop

11
Q

What is a hosted dataset?

A

Data made available for download as hosted files

12
Q

What is an API?

A

Application programming interface
Allows you to retrieve info through predefined functions
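A minimal Python sketch of calling a JSON web API with the standard library. The URL and the response shape (an `items` list with `name` fields) are hypothetical placeholders; only the parsing step is run here, on a sample response, so the example works offline.

```python
import json
import urllib.request

# Hypothetical endpoint for illustration only; a real API documents its own URLs and fields.
API_URL = "https://api.example.com/v1/items?limit=2"


def fetch_json(url):
    """Send a GET request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def item_names(payload):
    """Extract 'name' from a payload shaped like {"items": [{"name": ...}, ...]} (assumed shape)."""
    return [item["name"] for item in payload["items"]]


# Offline demo: parse a sample response body instead of calling fetch_json(API_URL).
sample_body = '{"items": [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]}'
print(item_names(json.loads(sample_body)))  # ['alpha', 'beta']
```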

13
Q

What is web scraping?

A

Scripts used to pull info from websites
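A small Python sketch of the idea using the standard library's `html.parser`. The HTML is a hard-coded string so the example runs offline; a real scraper would first download the page.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


html = '<p>See <a href="/docs">docs</a> and <a href="https://example.com">example</a>.</p>'
scraper = LinkScraper()
scraper.feed(html)
print(scraper.links)  # ['/docs', 'https://example.com']
```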

14
Q

What are reals?

A

Real numbers, represented as ℝ: all rational and irrational numbers, e.g. 3.1, 3.2, 3.3, π

15
Q

What is structured data?

A

Data organised into a predefined structure, such as a dictionary of keys and values
Allows for easy data retrieval
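An illustration in Python: because every record shares the same keys, a value can be retrieved directly by key and records can be filtered on any field. The customer records are made-up example data.

```python
# Every record follows the same schema (same keys), so fields can be looked up directly.
customers = {
    "C001": {"name": "Ana", "city": "Leeds", "age": 34},
    "C002": {"name": "Ben", "city": "York", "age": 29},
}

print(customers["C002"]["city"])  # York

# Because the schema is shared, filtering on a field is straightforward too.
over_30 = [cid for cid, rec in customers.items() if rec["age"] > 30]
print(over_30)  # ['C001']
```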

16
Q

What is quantitative data?

A

Numerical data, which can either be discrete or continuous

17
Q

What is qualitative data?

A

Also called categorical data
Doesn’t have an inherent numerical value

18
Q

Give examples of qualitative data.

A

Phone number
Post code

19
Q

What are nominal values?

A

Qualitative data with no order
. Hair colour
. Nationality
. Name

20
Q

What are ordinal values?

A

Qualitative data
. Has an order
. Likely, not likely, unlikely…
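Nominal and ordinal values are usually encoded differently before modelling. A Python sketch, assuming one-hot encoding for nominal data and integer ranks for ordinal data; the ranks follow the order the card lists the levels in, which is an assumption about their ranking.

```python
def one_hot(values):
    """Nominal data: one 0/1 column per category, so no ranking is implied."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


# Ordinal data: levels in the order listed on the card (assumed ranking)
LIKELIHOOD_ORDER = ["likely", "not likely", "unlikely"]


def ordinal_encode(values, order):
    """Map each level to an integer that preserves its position in the order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]


cats, rows = one_hot(["brown", "black", "brown", "red"])
print(cats)  # ['black', 'brown', 'red']
print(rows)  # [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(ordinal_encode(["unlikely", "likely", "not likely"], LIKELIHOOD_ORDER))  # [2, 0, 1]
```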

21
Q

What is a simple hypothesis?

A

States a relationship between one independent variable and one dependent variable

22
Q

What is a complex hypothesis?

A

States a relationship between multiple independent and/or dependent variables

23
Q

What is a directional hypothesis?

A

Predicts the direction of the effect the independent variable has on the dependent variable (e.g. an increase or a decrease)

24
Q

What is unstructured data?

A

Data that does not conform to a predefined data model or structure (e.g. free text, images)

25
Q

What is data?

A

A small piece of information; a recorded observation

26
Q

What is atomic data?

A

A primitive data type
Cannot be broken down further

27
Q

What is composite data?

A

Data that can be broken down into atomic data or further composite data
e.g. strings, lists

28
Q

What is normalizing a dataset?

A

. Rescales features so that all features contribute equally
. Use when the model needs features weighted equally (e.g. k-means)

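Min-max normalization is one common way to do this. A Python sketch with made-up feature values:

```python
def min_max_normalize(values):
    """Rescale values linearly into [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(x - lo) / (hi - lo) for x in values]


heights_cm = [150, 160, 170, 180, 190]
incomes = [20_000, 35_000, 50_000, 65_000, 80_000]
print(min_max_normalize(heights_cm))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(min_max_normalize(incomes))     # [0.0, 0.25, 0.5, 0.75, 1.0]
```

After rescaling, height and income share the same 0 to 1 scale, so in a distance-based method such as k-means income no longer dominates just because its numbers are larger.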
29
Q

What is standardising a dataset?

A

Rescaling data so that it has a mean of 0 and a standard deviation of 1

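This is the z-score, z = (x - mean) / standard deviation. A Python sketch, assuming the population standard deviation (some libraries use the sample version, dividing by n - 1):

```python
import statistics


def standardize(values):
    """Convert each value to a z-score using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```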
30
Q

What are the two types of variables?

A

Categorical and numeric

31
Q

What types of categorical data can you have?

A

. Nominal: no order (e.g. gender)
. Ordinal: ordered (e.g. good to bad)

32
Q

What types of numeric data can you have?

A

. Continuous: can take any value within a range (e.g. height)
. Discrete: countable values (e.g. number of people)

33
Q

What is a global optimum?

A

The absolute best solution across the whole search space, not just the best nearby solution (a local optimum)

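A local optimum is only the best solution in its neighbourhood. This Python sketch runs plain gradient descent on a made-up function with two minima; depending on the starting point it settles in either the local or the global minimum. The function, learning rate and step count are illustrative assumptions.

```python
def f(x):
    """Two minima: a local one near x = 1.13 and the global one near x = -1.30."""
    return x**4 - 3 * x**2 + x


def df(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, lr=0.01, steps=1000):
    """Step downhill from x; ends in the minimum whose basin x starts in."""
    for _ in range(steps):
        x -= lr * df(x)
    return x


local_x = gradient_descent(2.0)    # starts in the right-hand basin
global_x = gradient_descent(-2.0)  # starts in the left-hand basin
print(round(local_x, 2), round(f(local_x), 2))    # 1.13 -1.07
print(round(global_x, 2), round(f(global_x), 2))  # -1.3 -3.51
```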
34
Q

What is the scale of normalization, and when is it best used?

A

. Scale: 0 to 1
. Best used when the data has a fixed range

35
Q

What is the downside of normalization?

A

It's sensitive to outliers: one extreme value becomes the min or max and squeezes all the other values together

36
Q

When is standardization best used?

A

. When new data may fall outside the range seen in the sample
. It handles outliers better than normalization

37
Q

Why reduce dimensionality?

A

To avoid the curse of dimensionality (too many dimensions):
. Hard to visualize
. Slower to compute
. Possibly irrelevant features

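Principal component analysis (PCA) is one common way to reduce dimensionality. It is not named on the card, so treat it as one option among several. A NumPy sketch on synthetic data: 10 features generated from 2 underlying signals can be compressed to 2 components with almost no loss of variance.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto their top-k principal components (PCA via SVD)."""
    X_centered = X - X.mean(axis=0)  # PCA works on centred features
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # share of variance carried by each component
    return X_centered @ Vt[:k].T, explained[:k]


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))  # 2 underlying signals
# 10 synthetic features: mixtures of the 2 signals plus a little noise
X = base @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(200, 10))

Z, ratio = pca_reduce(X, k=2)
print(Z.shape)    # (200, 2)
print(ratio.sum())  # close to 1.0: two components keep almost all the variance
```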
38
Q

What is a true positive?

A

A positive case that the model correctly predicted as positive

39
Q

What is a ROC curve and what does it plot?

A

Shows how well a classifier performs across decision thresholds
Plots TPR (true positive rate) against FPR (false positive rate)

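A minimal Python sketch of how the points on a ROC curve are computed: each distinct score is used as a threshold, and the FPR and TPR at that threshold become one point. The labels and scores are made-up example data.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, from strict to lenient."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive when the score is at or above the threshold
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```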
40
Q

What's the issue with using training data to evaluate a model?

A

You're testing on data the model has already seen, so the score overestimates how well it performs on new data

41
Q

What is underfitting?

A

High bias, low variance (model too simple)

42
Q

What is overfitting?

A

Low bias, high variance (model too complex)

43
Q

How do you guard against overfitting?

A

Split the data, e.g. 80% training set and 20% test set, and evaluate on the test set

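A Python sketch of an 80/20 hold-out split. `holdout_split` is a hypothetical helper written for this example; libraries such as scikit-learn provide their own versions. Shuffling first stops any ordering in the data from ending up entirely in one set.

```python
import random


def holdout_split(data, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data, then take the first test_fraction as the test set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


samples = list(range(100))
train, test = holdout_split(samples)
print(len(train), len(test))       # 80 20
print(set(train).isdisjoint(test))  # True: no sample appears in both sets
```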
44