Lecture 1 Flashcards

1
Q

Data

A

A set of discrete, objective facts about events

2
Q

Dataset

A

a collection of data with a defined structure

3
Q

Data point

A

a single instance in the dataset

4
Q

Attribute

A

A single property of the dataset
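
A minimal Python sketch tying cards 2 to 4 together. It assumes a dataset is held as a list of records: the whole list is the dataset, each dict is one data point, and each key is an attribute. The field names and values are hypothetical and exist only for illustration.

```python
# Hypothetical loan records; values are invented for illustration only.
dataset = [
    {"credit_score": 500, "interest_rate": 8.0},
    {"credit_score": 600, "interest_rate": 7.0},
    {"credit_score": 700, "interest_rate": 6.0},
    {"credit_score": 800, "interest_rate": 5.0},
]

data_point = dataset[0]            # one instance (row) of the dataset
attributes = list(data_point)      # the property names (columns)
credit_scores = [row["credit_score"] for row in dataset]  # one attribute across all data points

print(len(dataset), attributes)    # 4 ['credit_score', 'interest_rate']
print(credit_scores)               # [500, 600, 700, 800]
```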

5
Q

Data science

A

a collection of techniques used to extract value from data

process of building a representative model that fits the observational data

6
Q

Model

A

representation of a relationship between variables in a dataset

7
Q

modeling

A

process in which a representative abstraction is built from the observed dataset

8
Q

A data science model serves two purposes

A
  1. it predicts the output (e.g., an interest rate) based on a new, unseen set of input variables
  2. the model can be used to understand the relationship between the output variable and all the input variables
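
Both purposes show up in a minimal sketch that fits a straight line by ordinary least squares. This is one simple choice of model, assumed here for illustration, and the credit-score and interest-rate pairs are hypothetical. The fitted line predicts rates for unseen scores (purpose 1), and its slope describes how the output changes with the input (purpose 2).

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: credit scores (input) and interest rates in % (output).
credit_scores = [500, 600, 700, 800]
interest_rates = [8.0, 7.0, 6.0, 5.0]

slope, intercept = fit_line(credit_scores, interest_rates)

# Purpose 1: predict the output for a new, unseen input.
predicted = intercept + slope * 650

# Purpose 2: interpret the relationship. In this toy data, each extra
# credit-score point lowers the rate by about 0.01 percentage points.
print(round(slope, 4), round(predicted, 2))  # -0.01 6.5
```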
9
Q

techniques used in the steps of a data science process

A
- descriptive statistics
- exploratory visualization
- dimensional slicing
- hypothesis testing
- data engineering
- business intelligence
10
Q

Supervised model

A

supervised data science tries to infer a function or relationship based on labeled training data and uses this function to map new unlabeled data
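
A minimal supervised sketch using a 1-nearest-neighbor classifier. This is one of many possible supervised techniques, chosen here for brevity. The labeled training data defines the mapping, and a new unlabeled point gets the label of its closest training example. The features and labels are hypothetical.

```python
import math

def nearest_neighbor_classify(training_data, new_point):
    """Label new_point with the label of the closest labeled training example (1-NN)."""
    best_label, best_dist = None, math.inf
    for features, label in training_data:
        dist = math.dist(features, new_point)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical labeled training data: (feature vector, label) pairs.
training_data = [
    ((1.0, 1.0), "low risk"),
    ((1.2, 0.8), "low risk"),
    ((5.0, 5.0), "high risk"),
    ((5.5, 4.5), "high risk"),
]

print(nearest_neighbor_classify(training_data, (1.1, 0.9)))  # low risk
print(nearest_neighbor_classify(training_data, (4.8, 5.2)))  # high risk
```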

11
Q

Unsupervised model

A

uncovers hidden patterns in unlabeled data

12
Q

Classification and regression techniques

A

predicting a target variable based on input variables (classification predicts a categorical target, regression a numeric one)

13
Q

Clustering

A

the process of identifying the natural groupings in a dataset
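
Clustering is also the standard example of the unsupervised approach in card 11, because no labels are given. The sketch below uses k-means, one common clustering algorithm assumed here for illustration. Initializing with the first k points and capping the iterations are simplifications. The 2-D points are hypothetical.

```python
import math

def k_means(points, k, max_iterations=100):
    """Group points into k clusters by alternating assignment and centroid updates."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```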

14
Q

recommendation engines

A

systems that recommend items to users based on each user's individual preferences
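
A minimal sketch of one simple recommendation approach: user-based collaborative filtering with Jaccard similarity, assumed here for illustration. It finds the user whose liked items overlap most with the target user's. It then recommends the items that user liked and the target has not seen. User and item names are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two item sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target_user, preferences):
    """Recommend items liked by the most similar other user that target_user has not liked yet."""
    liked = preferences[target_user]
    others = [u for u in preferences if u != target_user]
    most_similar = max(others, key=lambda u: jaccard(liked, preferences[u]))
    return sorted(preferences[most_similar] - liked)

# Hypothetical user preferences: the set of items each user liked.
preferences = {
    "ana": {"book_a", "book_b", "book_c"},
    "ben": {"book_a", "book_b", "book_d"},
    "cai": {"book_e", "book_f"},
}

print(recommend("ana", preferences))  # ['book_d']
```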

15
Q

anomaly or outlier detection

A

identifies the data points that are significantly different from other data points in a dataset
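
A minimal sketch using a z-score rule, one simple statistical approach among many. A value is flagged when it lies more than a chosen number of standard deviations from the mean. The threshold of 2 and the sensor readings are illustrative assumptions.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` population standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical sensor readings with one suspicious value.
readings = [10, 11, 9, 10, 12, 10, 50]
print(zscore_outliers(readings))  # [50]
```

In small samples a single extreme value inflates the standard deviation itself, which is why a modest threshold is used here.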

16
Q

time-series forecasting

A

the process of predicting the future value of a variable based on its past values, which may exhibit trend and seasonality
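
A minimal forecasting sketch using the seasonal naive method. This is a common baseline, assumed here for illustration, and not a full trend-and-seasonality model. Each future period is forecast with the value observed in the same period of the most recent season. The quarterly sales figures are hypothetical.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future step with the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Step h repeats the matching position of the last observed season.
    return [last_season[h % season_length] for h in range(horizon)]

# Hypothetical quarterly sales for two years (season length = 4 quarters).
sales = [120, 150, 170, 130, 125, 158, 176, 135]
print(seasonal_naive_forecast(sales, season_length=4, horizon=6))
# [125, 158, 176, 135, 125, 158]
```

Because it only repeats the last season, this baseline captures seasonality but not trend.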

17
Q

text mining

A

a data science application where the input data is text, which can be in the form of documents, messages, emails, or web pages
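
A common first step in text mining is turning free text into numbers. The minimal bag-of-words sketch below counts term frequencies across a few hypothetical customer messages. It assumes a simple lowercase-and-regex tokenizer, whereas real pipelines usually add steps such as stop-word removal and stemming.

```python
import re
from collections import Counter

def term_frequencies(documents):
    """Count how often each lowercase word occurs across all documents (bag of words)."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return counts

# Hypothetical customer messages.
messages = [
    "Refund my order, the order arrived late.",
    "Great service, fast delivery!",
    "Late delivery again; I want a refund.",
]
tf = term_frequencies(messages)
print(tf["order"], tf["refund"], tf["late"], tf["delivery"])  # 2 2 2 2
```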

18
Q

feature selection

A

A process in which attributes in a dataset are reduced to a few attributes that really matter
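
A minimal filter-style sketch that ranks attributes by the absolute Pearson correlation with the target and keeps the top k. This is one simple criterion among many, assumed here for illustration, and it only detects linear relationships. All attribute names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_top_k(features, target, k):
    """Keep the k attributes whose values correlate most strongly (in absolute value) with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical attributes for four data points, and a numeric target.
features = {
    "credit_score": [500, 600, 700, 800],
    "age": [30, 45, 28, 50],
    "customer_id": [3, 1, 4, 2],
    "constant_flag": [1, 1, 1, 1],
}
target = [8.0, 7.0, 6.0, 5.0]

print(select_top_k(features, target, k=2))  # ['credit_score', 'age']
```

Here `customer_id` and `constant_flag` both score 0 and are dropped, which matches the intuition that an arbitrary identifier or a constant carries no predictive information.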

19
Q

association analysis

A

identifying pairs of items that are purchased together, so that specific items can be bundled or placed next to each other
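
A minimal sketch of the counting step behind association analysis. It assumes support is measured as the fraction of transactions that contain a pair. It counts how often each pair of items appears in the same basket and keeps pairs at or above a minimum support. Full algorithms such as Apriori extend this to larger itemsets and add measures like confidence. The baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of transactions containing both) is >= min_support."""
    pair_counts = Counter()
    for basket in transactions:
        # Sort so each pair is counted under one canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in pair_counts.items() if count / n >= min_support}

# Hypothetical shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
]
print(frequent_pairs(transactions, min_support=0.5))
# {('bread', 'milk'): 0.5, ('bread', 'butter'): 0.5}
```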

20
Q

deep learning

A

a family of machine learning techniques based on multi-layer artificial neural networks, increasingly used for classification and regression problems

21
Q

Big data

A

High-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization

22
Q

Big data characteristics (5 Vs)

A

Volume, velocity, variety, veracity, and value

23
Q

volume

A

increase in data size coming from an ever-growing number of sources

24
Q

velocity

A
  • increase in the speed of input and output data and the ability to quickly incorporate new data
  • ability to quickly add new data sources
25
Q

Variety

A

increasing diversity of data types and structures:

  • structured data
  • semi-structured data
  • unstructured data
26
Q

Veracity

A

valid and truthful data that provides the right direction for future decisions and actions

  • data freshness
  • quality dimensions (challenges)
  • trust, quality, and validity of data
27
Q

Value

A

data that has high veracity provides higher value

- usefulness of data for an enterprise

28
Q

Data science tends to fall into three broad categories

A

investigating, predicting, and optimizing

29
Q

Data science tasks

A
- regression
- clustering
- association analysis
- anomaly detection
- recommendation engines
- deep learning
- time-series forecasting
- text mining
- feature selection
- classification