Chapter 9: Big Data Analytics for Managing Risk - Vocabulary Flashcards
(45 cards)
Big data
Sets of data that are too large to be gathered and analyzed by traditional methods
Structured data
Data organized into databases with defined fields, including links between databases
Unstructured data
Data that is not organized into predetermined formats, such as databases, and often consists of text, images, or other nontraditional media
Data science
An interdisciplinary field involving the design and use of techniques to process very large amounts of data from a variety of sources and to provide knowledge based on the data
Internal data
Data that is owned by an organization
External data
Data that belongs to an entity other than the organization that wishes to acquire and use it
Economic data
Data regarding interest rates, asset prices, exchange rates, the Consumer Price Index, and other information about the global, the national, or a regional economy
Geodemographic data
Data that classifies a population into groups, typically by combining geographic location with demographic characteristics
Data-driven decision making
An organizational process to gather and analyze relevant and verifiable data and then evaluate the results to guide business strategies
Predictive model
A model used to predict an unknown outcome by means of a defined target variable
Training data
Data that is used to train a predictive model and that therefore must have known values for the target variable of the model
Target variable
The predefined attribute whose value is being predicted in a data analytical model
Class label
The value of the target variable in a model
Attribute
A variable that describes a characteristic of an instance within a model
Instance (example)
The representation of a data point described by a set of attributes within a model’s dataset
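The modeling terms above (instance, attribute, target variable, class label, training data) can be tied together with a small sketch. The auto-insurance records, field names, and values below are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
# Hypothetical auto-insurance records; every name and value is illustrative.
TARGET_VARIABLE = "claim_filed"  # the attribute whose value is being predicted

# Each dictionary is one instance, described by a set of attributes.
instances = [
    {"driver_age": 23, "prior_claims": 2, "annual_mileage": 15000, "claim_filed": "yes"},
    {"driver_age": 47, "prior_claims": 0, "annual_mileage": 8000, "claim_filed": "no"},
    {"driver_age": 35, "prior_claims": 1, "annual_mileage": 12000, "claim_filed": "no"},
    {"driver_age": 52, "prior_claims": 0, "annual_mileage": 6000, "claim_filed": None},
]

# Attributes other than the target variable, used as inputs to a model.
predictor_attributes = [name for name in instances[0] if name != TARGET_VARIABLE]

# Training data must have known values for the target variable.
training_data = [row for row in instances if row[TARGET_VARIABLE] is not None]

# Instances whose outcome is unknown are the ones a predictive model would score.
to_predict = [row for row in instances if row[TARGET_VARIABLE] is None]

# Class labels are the values the target variable takes.
class_labels = sorted({row[TARGET_VARIABLE] for row in training_data})

print(predictor_attributes)
print(class_labels)
```

In this sketch the fourth record is excluded from the training data because its target value is unknown.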
Overfitting
The process of fitting a model too closely to the training data for the model to be effective on other data
Holdout data
In the model training process, existing data with a known target variable that is not used as part of the training data
Generalization
The ability of a model to perform well on data outside its training data
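Overfitting, holdout data, and generalization can be seen together in a minimal sketch. It assumes a toy dataset and a deliberately poor "memorizer" model that only looks up instances it saw during training; the helper names (`holdout_split`, `fit_memorizer`) are hypothetical and not taken from any library.

```python
import random

def holdout_split(instances, holdout_fraction=0.2, seed=0):
    """Shuffle labeled instances and set a fraction aside as holdout data."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]  # (training, holdout)

def accuracy(model, instances):
    """Correct predictions divided by total predictions."""
    correct = sum(1 for x, label in instances if model(x) == label)
    return correct / len(instances)

def fit_memorizer(training):
    """An overfit model: it memorizes training instances and guesses 0 otherwise."""
    table = {x: label for x, label in training}
    return lambda x: table.get(x, 0)

# Toy data: the true rule is "label is 1 when x is at least 50".
data = [(x, 1 if x >= 50 else 0) for x in range(100)]
training, holdout = holdout_split(data)

memorizer = fit_memorizer(training)
train_accuracy = accuracy(memorizer, training)
holdout_accuracy = accuracy(memorizer, holdout)

print("training accuracy:", train_accuracy)
print("holdout accuracy:", holdout_accuracy)
```

The memorizer is perfect on the training data by construction, so any drop in accuracy on the holdout data reflects its failure to generalize.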
Accuracy
In model performance evaluation, a model’s correct predictions divided by its total predictions
Precision
In model performance evaluation, a model’s correct positive predictions divided by its total positive predictions
Recall
In model performance evaluation, a model’s correct positive predictions (true positives) divided by the sum of its correct positive predictions and incorrect negative predictions (false negatives)
F-score
In statistics, a measure that combines precision and recall by taking their harmonic mean
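The four evaluation measures defined above can be computed from counts of correct and incorrect predictions. The following is a minimal sketch for a two-class problem; the function name `classification_metrics` and the sample labels are assumptions made for illustration.

```python
def classification_metrics(actual, predicted, positive=1):
    """Compute accuracy, precision, recall, and F-score for two-class labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # correct positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # incorrect positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # incorrect negatives
    tn = sum(1 for a, p in pairs if a != positive and p != positive)  # correct negatives

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall.
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

# Hypothetical labels: 1 = claim is fraudulent, 0 = claim is legitimate.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

With these sample labels the model makes 2 correct positive, 1 incorrect positive, 2 incorrect negative, and 5 correct negative predictions, so accuracy is 7/10, precision is 2/3, recall is 2/4, and the F-score is 4/7.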
Supervised learning
A type of model creation, derived from the field of machine learning, in which the target variable is defined
Unsupervised learning
A type of model creation, derived from the field of machine learning, that does not have a defined target variable
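The difference between the two learning types can be sketched on one-dimensional toy data. Both functions below are simplified stand-ins chosen for illustration, not standard algorithms from the chapter: the supervised one uses the class labels to place a cutoff, while the unsupervised one groups the same values without seeing any labels.

```python
def fit_threshold(labeled):
    """Supervised: use class labels to place a cutoff midway between class means."""
    positives = [x for x, label in labeled if label == 1]
    negatives = [x for x, label in labeled if label == 0]
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return (mean_pos + mean_neg) / 2

def split_at_largest_gap(values):
    """Unsupervised: form two groups by cutting the sorted values at the widest gap."""
    ordered = sorted(values)
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, cut = max(gaps)
    return ordered[:cut + 1], ordered[cut + 1:]

# Hypothetical (value, class label) pairs; the target variable is the label.
labeled = [(1, 0), (2, 0), (3, 0), (10, 1), (11, 1), (12, 1)]

cutoff = fit_threshold(labeled)                          # needs the labels
groups = split_at_largest_gap([x for x, _ in labeled])   # ignores the labels

print(cutoff)
print(groups)
```

Here the supervised cutoff depends on which instances carry which label, whereas the unsupervised grouping would be the same no matter how the instances were labeled.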