Chapter 9: Big Data Analytics for Managing Risk - Vocabulary Flashcards

(45 cards)

1
Q

Big data

A

Sets of data that are too large to be gathered and analyzed by traditional methods

2
Q

Structured data

A

Data organized into databases with defined fields, including links between databases

3
Q

Unstructured data

A

Data that is not organized into predetermined formats, such as databases, and often consists of text, images, or other nontraditional media

4
Q

Data science

A

An interdisciplinary field involving the design and use of techniques to process very large amounts of data from a variety of sources and to provide knowledge based on the data

5
Q

Internal data

A

Data that is owned by an organization

6
Q

External data

A

Data that belongs to an entity other than the organization that wishes to acquire and use it

7
Q

Economic data

A

Data regarding interest rates, asset prices, exchange rates, the Consumer Price Index, and other information about the global, the national, or a regional economy

8
Q

Geodemographic data

A

Data regarding classifications of a population

9
Q

Data-driven decision making

A

An organizational process to gather and analyze relevant and verifiable data and then evaluate the results to guide business strategies

10
Q

Predictive model

A

A model used to predict an unknown outcome by means of a defined target variable

11
Q

Training data

A

Data that is used to train a predictive model and that therefore must have known values for the target variable of the model

12
Q

Target variable

A

The predefined attribute whose value is being predicted in a data analytical model

13
Q

Class label

A

The value of the target variable in a model

14
Q

Attribute

A

A variable that describes a characteristic of an instance within a model

15
Q

Instance (example)

A

The representation of a data point described by a set of attributes within a model’s dataset

16
Q

Overfitting

A

The process of fitting a model too closely to the training data for the model to be effective on other data

17
Q

Holdout data

A

In the model training process, existing data with a known target variable that is not used as part of the training data
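
A minimal sketch of setting aside holdout data, assuming a simple random split. The function name `split_holdout`, the 25% fraction, and the seed are illustrative choices, not part of the chapter.

```python
import random


def split_holdout(instances, holdout_fraction=0.25, seed=0):
    """Randomly split instances into (training, holdout) lists."""
    shuffled = list(instances)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    # The first n_holdout shuffled instances are withheld from training.
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Twenty hypothetical instance IDs, each assumed to have a known target value.
training, holdout = split_holdout(range(20))
```

A model would be fitted on `training` only and then scored on `holdout`. A large gap between the two scores is one sign of overfitting.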

18
Q

Generalization

A

The ability of a model to apply itself to data outside the training data

19
Q

Accuracy

A

In model performance evaluation, a model’s correct predictions divided by its total predictions

20
Q

Precision

A

In model performance evaluation, a model’s correct positive predictions divided by its total positive predictions

21
Q

Recall

A

In model performance evaluation, a model’s correct positive predictions divided by the sum of its correct positive predictions and incorrect negative predictions

22
Q

F-score

A

In statistics, the measure that combines precision and recall and is the harmonic mean of precision and recall
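
The four evaluation measures above (accuracy, precision, recall, F-score) can be sketched from confusion-matrix counts. The function name and the counts below are hypothetical, and the sketch assumes every denominator is nonzero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, f_score) from prediction counts.

    tp: correct positive predictions    fp: incorrect positive predictions
    tn: correct negative predictions    fn: incorrect negative predictions
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score


# Hypothetical counts for a model that flags fraudulent claims.
accuracy, precision, recall, f_score = classification_metrics(
    tp=40, fp=10, tn=930, fn=20
)
```

With these counts, accuracy is 970/1000, precision is 40/50, and recall is 40/60. Accuracy looks high mainly because negative cases dominate the data, which is why precision and recall are reported alongside it.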

23
Q

Supervised learning

A

A type of model creation, derived from the field of machine learning, in which the target variable is defined

24
Q

Unsupervised learning

A

A type of model creation, derived from the field of machine learning, that does not have a defined target variable

25
Machine learning
Artificial intelligence in which computers continually teach themselves to make better decisions based on previous results and new data
26
Segmentation
An analytical technique in which data is divided into categories
27
Association rule learning
Examining data to discover new and interesting relationships among attributes that can be stated as business rules
28
Classification tree
A supervised learning technique that uses a structure similar to a tree to segment data according to known attributes to determine the value of a categorical target variable
29
Leaf node
A terminal node of a classification tree that is used to classify an instance based on its attributes
30
Arrow
A pathway in a classification tree
31
Node
A representation of a data attribute
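
A hand-built sketch of how the parts of a classification tree fit together. The attributes `attorney_involved` and `days_to_report`, the 30-day threshold, and the class labels are hypothetical; a real tree would be learned from training data.

```python
def classify_claim(claim):
    """Classify a claim instance as 'complex' or 'simple'."""
    # Root node: the first attribute tested.
    if claim["attorney_involved"]:
        # The "yes" arrow leads to a leaf node that assigns the class label.
        return "complex"
    # The "no" arrow leads to another node that tests a second attribute.
    if claim["days_to_report"] > 30:
        return "complex"  # leaf node
    return "simple"  # leaf node


# One instance, described by its attributes.
label = classify_claim({"attorney_involved": False, "days_to_report": 45})
```

Each `if` plays the role of a node, each branch taken is an arrow, and each `return` is a leaf node.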
32
Algorithm
An operational sequence used to solve mathematical problems and to create computer programs
33
Linear regression
A statistical method to predict the numerical value of a target variable based on the values of explanatory variables
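
A minimal sketch of linear regression with a single explanatory variable, fitted by ordinary least squares. The function name and the data points are illustrative only.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: explanatory variable xs, numerical target variable ys.
intercept, slope = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
predicted = intercept + slope * 5  # prediction for a new value x = 5
```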
34
Generalized linear model (GLM)
A statistical technique that increases the flexibility of a linear model by linking it with a nonlinear function
35
Link function
A mathematical function that describes how the random values of a target variable depend on the mean value generated by a linear combination of the explanatory variables (attributes)
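
A sketch of one common link function, the log link, which is an assumption chosen for illustration since the card does not name a specific link. The coefficients and attribute values are hypothetical.

```python
from math import exp, log


def linear_predictor(intercept, coefficients, attributes):
    """Linear combination of the explanatory variables (attributes)."""
    return intercept + sum(c * x for c, x in zip(coefficients, attributes))


def log_link(mean):
    """Log link: maps the mean of the target variable to the linear scale."""
    return log(mean)


def inverse_log_link(eta):
    """Inverse of the log link: maps the linear predictor back to a mean."""
    return exp(eta)


# Hypothetical fitted coefficients and one instance's attribute values.
eta = linear_predictor(0.5, [0.2, -0.1], [3.0, 1.0])
expected_mean = inverse_log_link(eta)
```

Because the exponential is always positive, the log link keeps the predicted mean above zero, which suits targets such as claim counts.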
36
Cluster analysis
A model that determines previously unknown groupings of data
37
Artificial Intelligence (AI)
Computer processing or output that simulates human reasoning or knowledge
38
Social network analysis
The study of the connections and relationships among people in a network
39
Neural network
A data analysis technique composed of three layers, including an input layer, a hidden layer with nonlinear functions, and an output layer, that is used for complex problems
40
Complex claim
A claim that contains one or more characteristics that cause it to cost more than the average claim
41
Information gain
A measure of the predictive power of one or more attributes
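
A sketch of one common way to compute information gain, assuming the entropy-based definition used by many classification tree learners (the card itself does not specify a formula). The labels and attribute values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy reduction from splitting the labels by an attribute's values."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Average entropy of the groups, weighted by group size.
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


labels = ["complex", "complex", "simple", "simple"]
# This attribute separates the classes perfectly, so its gain is maximal.
gain_informative = information_gain(labels, ["yes", "yes", "no", "no"])
# This attribute leaves each group as mixed as before, so its gain is zero.
gain_uninformative = information_gain(labels, ["yes", "no", "yes", "no"])
```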
42
Recursively
Successively applying a model
43
Root node
The first node in a classification tree
44
Combination of nodes
A representation of a data attribute in a classification tree
45
Lift
In model performance evaluation, the percentage of positive predictions made by the model divided by the percentage of positive predictions that would be made in the absence of the model
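
A sketch of lift under one common reading of the definition: the positive rate among instances selected by the model, divided by the positive rate in the whole population. The function name and counts are hypothetical.

```python
def lift(positives_in_selected, selected, positives_overall, population):
    """Positive rate among model-selected instances over the baseline rate."""
    model_rate = positives_in_selected / selected
    baseline_rate = positives_overall / population
    return model_rate / baseline_rate


# Hypothetical: 30 of the 100 claims flagged by the model are fraudulent,
# compared with 50 fraudulent claims among all 1,000 claims.
model_lift = lift(30, 100, 50, 1000)
```

Here the model's selections contain fraud at about six times the baseline rate. A lift near 1 would mean the model does no better than selecting claims at random.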