Defining Data Science Flashcards

1
Q

What is data science?

A

The art of processing data to find answers to your questions

Analyzing data to get answers

2
Q

What can we get from data science?

A

1- Descriptive (summarize the characteristics of a data set, without interpretation)
2- Exploratory (explore patterns, trends and relationships within a data set; used to generate hypotheses for future investigation)
3- Predictive (use data to predict an outcome)
4- Inferential (draw conclusions about a larger population from a data set)
5- Causal, not “casual” (if we change one factor, will it lead to a change in other factors?)
6- Mechanistic (explain the underlying mechanisms behind the observed patterns)

3
Q

Framework for doing data science:

A

1- Problem identification
2- Data discovery (through screening, inventory, or acquisition)
3- Data ingestion and governance
4- Data wrangling
5- Fitness for use
6- Statistical modelling and analysis
7- Communication and dissemination
8- Ethics review, which applies to every step above
4
Q

Factors affecting problem identification

A

1- Theories and hypotheses
2- Domain expertise
3- Domain knowledge

5
Q

Factors affecting data discovery

A

1- Potential data sources

2- Data integration

7
Q

Data collection types

A

1- Statistically designed (surveys, experiments, remote sensing)
2- Administrative (governmental agencies, registered student data)
3- Opportunity (from the internet, e.g. via an API)
4- Procedural (related to processes and policies, like a change in insurance policy)

8
Q

Data Wrangling

A

Transforming raw data into an appropriate form for analysis
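
A minimal Python sketch of what wrangling can look like. The raw rows, field names, and cleaning rules below are all hypothetical, chosen only to show raw text being turned into a consistent, typed form:

```python
# Hypothetical raw rows, as they might arrive from a CSV export.
raw_rows = [
    {"name": "  Alice ", "age": "34", "city": "cairo"},
    {"name": "BOB", "age": "", "city": " Giza"},
]

def wrangle(row):
    """Return a cleaned copy of one raw row."""
    age_text = row["age"].strip()
    return {
        "name": row["name"].strip().title(),
        # Empty strings become None so missing ages are explicit.
        "age": int(age_text) if age_text else None,
        "city": row["city"].strip().title(),
    }

clean_rows = [wrangle(r) for r in raw_rows]
print(clean_rows)
```

Real wrangling usually involves many more steps (reshaping, merging, unit conversion); this only illustrates the idea.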

9
Q

Data governance and ingestion

A

Data governance: establishing and adhering to rules regarding data access, dissemination and destruction

10
Q

Data ingestion

A

Bringing data into a data management platform

11
Q

Fitness for use assessment

A

Assessing the constraints imposed on the data by the statistical methods that will be used

12
Q

Data analysis types

A

1- Summarization
2- Visualization
3- Classification: predicting a category for new data
4- Regression: predicting a quantitative value for a new observation
5- Clustering: finding unlabeled subgroups
6- Estimation: taking measurements for a small number of members of a large group and making a good guess for the whole group
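
Estimation (item 6) can be sketched in Python as follows. The population here is simulated, and the numbers (10,000 heights, a sample of 100) are assumptions made purely for illustration:

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated heights in cm.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Measure only a small random sample of the group.
sample = random.sample(population, 100)

# The sample mean is our guess for the population mean.
estimate = statistics.mean(sample)
true_mean = statistics.mean(population)
print(f"estimate from sample: {estimate:.1f}, population mean: {true_mean:.1f}")
```

In practice the population mean is unknown; it is computed here only to show that the estimate lands close to it.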

13
Q

Variable types:

Quantitative and Qualitative

A

Quantitative: takes numerical values that measure a quantity, like wind pressure
Qualitative: takes values from a finite set of categories; also known as categorical variables, discrete variables, or factors

14
Q

Attribute:

A

A data field representing a characteristic of a data object

15
Q

Qualitative attribute- Nominal

A

Names or labels with no meaningful order, like patient ID or occupation

16
Q

Qualitative attribute - Binary

A

Can have only two values, like True or False

17
Q

Qualitative attribute - ordinal

A

Has a meaningful order, like pain level: minimal, moderate, severe

18
Q

Quantitative - measurable quantity

Interval-scaled: measured on a scale of equal-sized units

A

1- Does not have a true zero point

2- Mean, median, and mode are meaningful

19
Q

Quantitative - Ratio-scaled

A

Ordered numeric values with a true zero point, like weight, so ratios between values are meaningful
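
A short worked example in Python of why the zero point matters. The temperatures and weights are made-up values; Celsius stands in for an interval scale, and Kelvin and kilograms for ratio scales:

```python
# Interval scale: Celsius has no true zero, so ratios mislead.
c_low, c_high = 10.0, 20.0
naive_ratio = c_high / c_low  # 2.0, but 20 °C is not "twice as hot"

# Kelvin has a true zero, so its ratio is meaningful.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_ratio = k_high / k_low  # about 1.035

# Ratio scale: weight has a true zero, so 80 kg is twice 40 kg.
weight_ratio = 80.0 / 40.0

print(naive_ratio, round(kelvin_ratio, 3), weight_ratio)
```

Differences remain meaningful on both kinds of scale: the gap between 10 °C and 20 °C is the same 10 units in Celsius and in Kelvin.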

20
Q

Predicting output using input

A

Regression for quantitative outputs

Classification for qualitative outputs
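
A minimal Python sketch contrasting the two kinds of output. The data (hours studied and exam scores), the function names, and the pass mark are hypothetical. The classifier simply thresholds the regression output, which is a simplification for illustration and not a standard classification method:

```python
# Hypothetical training data: hours studied (input) and exam score (output).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

def fit_line(xs, ys):
    """Ordinary least squares for one input; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(hours, scores)

def predict_score(h):
    """Regression: the output is a number."""
    return slope * h + intercept

def predict_pass(h, pass_mark=55.0):
    """Classification: the output is a category."""
    return "pass" if predict_score(h) >= pass_mark else "fail"

print(predict_score(6.0), predict_pass(1.0), predict_pass(6.0))
```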

21
Q

A data frame is like a table: a rectangular object containing data

A

True

22
Q

Every column of a data frame is a vector, and all columns in a data frame must have the same length

A

True
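
The equal-length rule can be sketched in Python, assuming a table stored as a plain dictionary of columns. This is a stand-in for a real data frame type, and the column names and the `is_rectangular` helper are hypothetical:

```python
# A tiny table stored as named columns; each column is a same-typed sequence.
table = {
    "patient_id": [101, 102, 103],
    "pain_level": ["minimal", "severe", "moderate"],
    "weight_kg": [70.5, 82.0, 64.3],
}

def is_rectangular(columns):
    """True when every column has the same number of entries."""
    lengths = {len(values) for values in columns.values()}
    return len(lengths) <= 1

print(is_rectangular(table))  # True

ragged = dict(table, weight_kg=[70.5, 82.0])  # one column is too short
print(is_rectangular(ragged))  # False
```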

23
Q

Vector vs list

A

Vector: all elements have the same data type
List: elements can have different data types: integer, character, logical, even another list
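
The distinction can be illustrated in Python, with the caveat that the cards appear to describe R-style vectors and lists. Here `array` from the standard library stands in for a same-type vector and a plain `list` for the mixed-type list; that mapping is an assumption, not an exact equivalent:

```python
from array import array

# array('i') behaves like a vector: every element must be an integer.
vector = array("i", [1, 2, 3])

try:
    vector.append("four")  # wrong type
    rejected = False
except TypeError:
    rejected = True

# A plain list may mix integer, character, logical, and even another list.
mixed = [1, "a", True, [2, 3]]

print(rejected, [type(x).__name__ for x in mixed])
```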