Intro to Data Science Flashcards

(74 cards)

1
Q

What is Data Science?

A

The application of computational and statistical techniques to data in order to gain insight

2
Q

What is the difference between data and information?

A

Data is raw and of little use until it is organized, whereas information is the result of processing data and putting it into context

3
Q

Explain what computer programming is

A

Creating a sequence of instructions that automates a system to perform a specific task

4
Q

What are the main features of an IDE?

A

Availability of tools to test and debug

5
Q

What does Jupyter Notebook stand for?

A

The name Jupyter refers to the three core languages it supports: Julia, Python, and R

6
Q

What's the difference between structured and unstructured data?

A

Structured data is organized according to a predetermined set of rules.

Unstructured data is data for which it is difficult to determine a predetermined set of rules to organize it.

7
Q

XML vs JSON: which takes less storage, and what do they stand for?

A

JSON, as it doesn’t use end tags

JSON: JavaScript Object Notation
XML: Extensible Markup Language

8
Q

What are the properties of lists, tuples, sets, and dictionaries?

A

List: Ordered, changeable, allows duplicates

Tuple: Ordered, unchangeable, allows duplicates

Set: Unordered, unindexed, no duplicates (existing items cannot be changed in place, but you can add or remove items)

Dictionary: Ordered, changeable, no duplicate keys (ordered by insertion as of Python 3.7; in Python 3.6 and earlier, dictionaries are considered unordered)
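
A minimal Python sketch of these properties. The variable names and values are made up for illustration.

```python
# Illustrative values only; all names here are hypothetical.
nums_list = [3, 1, 3]          # list: ordered, changeable, duplicates allowed
nums_list[0] = 9               # in-place change works

nums_tuple = (3, 1, 3)         # tuple: ordered, unchangeable, duplicates allowed
try:
    nums_tuple[0] = 9
except TypeError:
    tuple_is_immutable = True  # item assignment is rejected

nums_set = {3, 1, 3}           # set: duplicates are dropped on creation
nums_set.add(7)                # items can still be added...
nums_set.discard(1)            # ...or removed

ages = {"ann": 30, "bob": 25}  # dict: unique keys, changeable
ages["ann"] = 31               # reassigning a key overwrites its value
ages["cat"] = 40               # new keys keep insertion order (Python 3.7+)

print(nums_list, nums_tuple, sorted(nums_set), list(ages))
```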

9
Q

Which types of data structure can contain different data types?

A

Lists and data frames

10
Q

How do you access components of a list in R?

A

Using the $ sign or [[ ]]

11
Q

How to find the length of a string in R?

A

nchar()

12
Q

What is a database?

A

An organized collection of data stored in a computer system

13
Q

What does a DBMS allow us to do?

A

Store, query, update, and manage data, and control access to it

14
Q

What are the advantages of using a DBMS?

A

Stores massive amounts of data

Access for multiple users

Concurrency

Efficient Manipulation

15
Q

What are iterations?

A

Constructs that tell the computer to run the same commands repeatedly

16
Q

What are the 3 main types of iterations?

A

for, while, repeat

17
Q

Can you rewrite a for loop with a while loop?

A

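Yes. The reverse does not always hold: a while loop does not need to know the number of iterations in advance, so not every while loop can be rewritten as a for loop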
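
A minimal sketch of the for-to-while direction. It is written in Python rather than R, and the list `values` is a made-up example.

```python
values = [2, 4, 6]

# for loop: iterates over a known sequence
total_for = 0
for v in values:
    total_for += v

# equivalent while loop: the index and stop condition are managed by hand
total_while = 0
i = 0
while i < len(values):
    total_while += values[i]
    i += 1

print(total_for, total_while)
```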

18
Q

Can you rewrite a while loop with repeat?

A

Yes, but the converse is not true

19
Q

What is debugging?

A

Task of fixing problems in our code

20
Q

State 3 condition-handling tools in R

A

withCallingHandlers()

tryCatch()

try()

21
Q

State 3 debugging tools in R

A

traceback()
options()
browser()

22
Q

What is defensive programming?

A

The strategy of making code fail in a well-defined manner

23
Q

State the fail-fast principle

A

Signal an error as soon as something wrong is discovered. In practice:

Avoid functions that use non-standard evaluation

Avoid functions that return different types of output depending on their input
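
One possible Python sketch of failing fast. The function `mean_of` is hypothetical: it checks its input up front, raises a clear error immediately, and always returns the same output type.

```python
def mean_of(values):
    """Return the arithmetic mean, failing fast on bad input."""
    # Validate first and raise a clear error right away, instead of
    # letting a confusing failure surface somewhere later.
    if not isinstance(values, (list, tuple)):
        raise TypeError("values must be a list or tuple")
    if len(values) == 0:
        raise ValueError("values must not be empty")
    if not all(isinstance(v, (int, float)) for v in values):
        raise TypeError("values must contain only numbers")
    # Always return a float, so the output type never depends on the input.
    return float(sum(values)) / len(values)


print(mean_of([1, 2, 3]))
```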

24
Q

What objects are mutable?

A

Lists and dictionaries

25
Does aliasing work the same way in both Python and R?
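No. R uses a copy-on-modify strategy, so modifying an alias leaves the original unchanged. In Python, both names refer to the same object, so modifying a mutable object through an alias also changes the original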
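
A short sketch of the Python side of this comparison. The names are illustrative, and `copy.copy` is shown as one way to get an independent (shallow) copy.

```python
import copy

original = [1, 2, 3]
alias = original                   # no copy: both names refer to the same list
alias.append(4)                    # the change is visible through `original` too

independent = copy.copy(original)  # explicit shallow copy
independent.append(5)              # `original` is not affected

print(original)           # [1, 2, 3, 4]
print(alias is original)  # True
print(independent)        # [1, 2, 3, 4, 5]
```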
26
What is the difference between nums.sort() and sorted(nums)?
nums.sort() sorts the original list in place and returns None.
sorted(nums) returns a new sorted list and leaves the original unchanged; the result is lost unless it is assigned to a variable.
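
A small sketch of the difference, using a made-up list:

```python
nums = [3, 1, 2]

new_list = sorted(nums)            # returns a new sorted list
unchanged = (nums == [3, 1, 2])    # the original is still in its old order

result = nums.sort()               # sorts in place and returns None

print(new_list, unchanged, result, nums)
```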
27
In OOP, what is a class?
A class defines a new type of object, bundling data and functionality together.
Attributes are attached to maintain the object's state, and methods to modify that state.
28
In OOP, what is encapsulation?
Bundling data and methods together and restricting direct access to an object's data
29
In OOP, what is inheritance?
A child class is based on a parent class and has access to the parent class's methods
30
In OOP, what is polymorphism?
The ability of methods in a child class to behave differently from the same methods in the parent class
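
A compact Python sketch tying the four OOP cards together. The `Animal` and `Dog` classes are hypothetical. Note that Python enforces encapsulation mostly by convention (the leading underscore) rather than by strict access control.

```python
class Animal:
    """Parent class bundling data (a name) with behaviour (methods)."""

    def __init__(self, name):
        self._name = name  # leading underscore: internal by convention

    @property
    def name(self):
        # Encapsulation: read access goes through a property, and there
        # is no setter, so the name cannot be reassigned directly.
        return self._name

    def speak(self):
        return f"{self._name} makes a sound"

    def describe(self):
        return f"I am {self._name}"


class Dog(Animal):
    """Child class: inherits describe() and overrides speak()."""

    def speak(self):
        # Polymorphism: same method name, different behaviour.
        return f"{self._name} barks"


animals = [Animal("Generic"), Dog("Rex")]
sounds = [a.speak() for a in animals]
print(sounds)
```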
31
What is data wrangling?
The process of designing and implementing the steps that turn raw, unstructured data into a form suitable for analysis
32
What is the apply() function used for?
Applying a function repeatedly over data without writing explicit loops
33
Which functions do we use to convert wide data to long format, and vice versa?
Wide to long: melt() or pivot_longer() in R; DataFrame.melt() in Python (pandas).
Long to wide: acast() and dcast() in R; pivot_table() in Python (pandas).
34
What is the difference between .loc[ ] and .iloc[ ]?
.loc[ ] selects values by label.
.iloc[ ] selects values by integer position.
35
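How do you handle missing values in Python (pandas)?
.dropna(): drop rows or columns that contain missing values
.fillna(): replace missing values with a specified value
.interpolate(): fill missing values by interpolation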
36
When subsetting with a negative integer, how does the behavior differ between R and Python?
In R, a negative integer removes the element at that index.
In Python, a negative integer counts from the end of the sequence, so -1 refers to the last element.
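
A short sketch of the Python behavior with a made-up list. The last line shows the closest Python analogue of R's `x[-1]` (dropping the first element).

```python
letters = ["a", "b", "c", "d"]

last = letters[-1]           # -1 is the last element
second_last = letters[-2]    # -2 is the one before it
tail = letters[-2:]          # slice from the second-to-last element onward

# Closest analogue of R's letters[-1], which drops the first element:
without_first = letters[1:]

print(last, second_last, tail, without_first)
```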
37
What is the workflow of a DS project?
1. Import and wrangle data
2. EDA and visualize
3. Consider several models and separate signal from noise
4. Compare models and inform future decisions
38
How do you visualize distributions?
Box plots, Violin Plots, Histogram, Density Plots (Kernel and ridgeline)
39
What graph do we use to explore the association between two continuous variables?
Scatter plot
40
What are histograms and kernel density plots used for?
Showing the distribution of a continuous variable
41
What is the difference between a violin plot and a kernel density plot?
Violin plot: shows the empirical density of a continuous variable across the categories of another variable.
Kernel density plot: shows the empirical density of a single variable.
42
Is ggplot2 the grammar of graphics? If not, why?
No. ggplot2 evolved from the ideas of the grammar of graphics; it is not the grammar itself
43
What is the difference between facet_grid and facet_wrap?
facet_grid draws a panel for every combination of the faceting variables, even if a panel has no data.
facet_wrap only displays panels that contain actual values.
44
Which library do we use to visualize networks in Python?
The networkx package
45
What is a machine learning model?
An algorithm that takes data as input and outputs predictions based on its parameters
46
What is the aim of machine learning?
The primary aim is to achieve a low test error, ideally (but not necessarily) with a low training error as well.
47
What is the typical sequence in machine learning?
```
Data Collection
Data Wrangling
Model Building
Model Evaluation
Model saving & testing
```
48
How do you impute missing values?
Impute the mean, median, or mode
Impute a value taken from another observation
Sample a value based on the variable's histogram
Remove cases with missing values
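
A minimal sketch of the first and last options, using only the standard library. The `ages` data and the `impute` helper are made up for illustration, with `None` marking a missing value.

```python
import statistics

ages = [23, None, 31, 23, None, 40]  # None marks a missing value

# Keep only the observed (non-missing) values.
observed = [a for a in ages if a is not None]


def impute(values, fill):
    """Replace every None in values with fill."""
    return [fill if v is None else v for v in values]


mean_filled = impute(ages, statistics.mean(observed))
median_filled = impute(ages, statistics.median(observed))
mode_filled = impute(ages, statistics.mode(observed))
dropped = observed  # alternative: remove cases with missing values

print(mean_filled, median_filled, mode_filled, dropped)
```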
49
What is the difference between supervised and unsupervised machine learning?
Supervised ML: there is a single target/response variable to predict.
Unsupervised ML: there is no single target/response variable.
50
ML models for Regression
Linear, lasso, ridge regression
51
ML models for classification
Logistic regression
Penalised logistic regression
Support vector machine (SVM)
52
ML models that work for any type of response variable
```
Random forest
Gradient boosting
Decision trees
Gaussian process
Neural networks
```
53
How do we evaluate regression tasks?
Mean Squared error (MSE)
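
MSE is the average of the squared differences between the true values and the predictions. A plain-Python sketch, with a hypothetical function name and made-up numbers:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Squared errors are 1, 0 and 4, so the mean is 5/3.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(mse)
```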
54
Why do we use cross-validation?
To reduce the effect of a fortunate or unfortunate train/test split that happens by chance, by evaluating the model over several different splits
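
A rough sketch of how k-fold cross-validation splits the data, so that every observation is used for testing exactly once. The function name is hypothetical. The folds here are contiguous and unshuffled, which is a simplification; in practice the indices are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=7, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

The model would be fitted on each `train_idx` and scored on the matching `test_idx`, and the scores averaged.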
55
Explain what hyperparameter tuning is
The process of determining the optimal values of a model's hyperparameters, typically via cross-validation, so that the model predicts the outcome as accurately as possible
56
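What are sensitivity and specificity?
Sensitivity is the true positive rate: the proportion of actual positives that are correctly identified.
Specificity is the true negative rate: the proportion of actual negatives that are correctly identified.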
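
A small sketch computing both rates from binary labels coded as 0/1. The function name and data are made up, and it assumes at least one positive and one negative case are present.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```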
57
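How do you evaluate a classification task in ML?
Plot the ROC curve and calculate the area under the curve (AUC): the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case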
58
What is the ROC curve?
A chart that illustrates the quality of a classification method by plotting its sensitivity (y-axis) against 1 - specificity (x-axis) over a range of thresholds.
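
A rough plain-Python sketch of building the ROC points and the area under them with the trapezoidal rule. The function names, labels, and scores are invented for illustration; a library routine would normally be used instead.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points, one per threshold, strictest first."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        # x = 1 - specificity (false positive rate), y = sensitivity
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
area = auc(curve)
print(curve, area)
```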
59
Which SDLC model has no provision for revisiting previous phases?
Waterfall Model
60
What are the features of an iterative model?
Rigid, but insights are gained from earlier iterations
61
Which SDLC model should we use for a fast and flexible workflow?
Agile model
62
Explain the features of a V-shaped model
Each development phase is paired with a corresponding testing phase, which is planned before that phase is implemented
63
Explain what the DevOps model is
The most recent model.
Software developers and engineers work together from development through to interaction with customers.
64
Explain what a spiral SDLC model is
A combination of the iterative and waterfall models.
The most flexible model, as it can adopt multiple models based on risk patterns.
65
State all types of software testing
Unit testing
Integration testing
Acceptance testing
System testing
66
Can you write down unit tests before writing the code itself?
Yes
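
A small test-first sketch using Python's built-in `unittest`. The `slugify` function is a hypothetical example: the tests are written first to describe the wanted behavior, then just enough code is added to make them pass.

```python
import unittest


# Step 1: write the tests first. They describe what a hypothetical
# slugify() function should do before that function exists.
class TestSlugify(unittest.TestCase):
    def test_lowercases_and_joins_with_hyphens(self):
        self.assertEqual(slugify("Data Science 101"), "data-science-101")

    def test_strips_surrounding_whitespace(self):
        self.assertEqual(slugify("  Intro  "), "intro")


# Step 2: write just enough code to make the tests pass.
def slugify(text):
    return "-".join(text.lower().split())


# Run the tests programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSlugify)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```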
67
Should you write documentation in files separate from the main script?
Yes
68
Should you write documentation in files separate from the main script?
Yes
69
What are the advantages of organizing code into packages?
Easy to share with other developers
Modular structure eases the debugging process
70
What is the typical sequence of Git commands from creating a local repository to storing it in a remote repository?
1. git init
2. git add
3. git status
4. git commit
5. git remote
6. git push
71
What is git checkout used for?
Switching to another branch (or, with -b, creating a new branch and switching to it)
72
Explain what git diff does
Shows the differences between commits, or between a commit and the working tree
73
What does git merge do?
Combine multiple sequences of commits into one unified history.
74
What is the difference between git clone and git pull?
git clone: creates a local copy of a remote repository.
git pull: fetches content from the remote repository and merges it into the local repository so that the two match.