Chapter 4 Key Terms Flashcards

0
Q

algorithm

A

A step-by-step search in which improvement is made at every step until the best solution is found.

1
Q

Adaptive resonance theory

A

An unsupervised learning method created by Stephen Grossberg.
ART is a neural network architecture that is aimed at being brainlike in unsupervised mode.

2
Q

Apriori algorithm

A

The most commonly used algorithm to discover association rules by recursively identifying frequent itemsets.

3
Q

area under the ROC curve

A

A graphical assessment technique for binary classification models where the true positive rate is plotted on the Y-axis and the false positive rate is plotted on the X-axis. The area under the resulting curve summarizes the classifier's performance in a single number.

4
Q

artificial neural network (ANN)

A

Computer technology that attempts to build computers that operate like a human brain. The machines possess simultaneous memory storage and work with ambiguous information. Sometimes called, simply, a neural network.
See neural computing.

5
Q

association

A

A category of data mining algorithm that establishes relationships about items that occur together in a given record.

6
Q

axon

A

An outgoing connection (i.e. terminal) from a biological neuron.

7
Q

backpropagation

A

The best-known learning algorithm in neural computing where the learning is done by comparing computed outputs to desired outputs of training cases.

8
Q

bootstrapping

A

A sampling technique where a fixed number of instances from the original data are sampled (with replacement) for training and the rest of the dataset is used for testing.
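
A minimal sketch of this sampling scheme, using only Python's standard library. The function name `bootstrap_split`, the fixed seed, and the default of drawing as many instances as the dataset holds are illustrative assumptions, not part of the definition.

```python
import random

def bootstrap_split(data, n_samples=None, seed=0):
    """Sample indices with replacement for training; instances that were
    never drawn form the test set."""
    rng = random.Random(seed)
    n = len(data)
    if n_samples is None:
        n_samples = n  # assumption: draw as many instances as the dataset holds
    train_idx = [rng.randrange(n) for _ in range(n_samples)]
    drawn = set(train_idx)
    train = [data[i] for i in train_idx]                   # may contain repeats
    test = [data[i] for i in range(n) if i not in drawn]   # never-drawn instances
    return train, test

train, test = bootstrap_split(list(range(10)))
print(len(train), len(test))
```

Because sampling is with replacement, the training set usually contains duplicates, which is why some instances are left over for testing.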

9
Q

business analyst

A

An individual whose job is to analyze business processes and the support they receive (or need) from information technology.

10
Q

categorical data

A

Data that represent the labels of multiple classes used to divide a variable into specific groups.

11
Q

chromosome

A

A candidate solution for a genetic algorithm.

12
Q

classification

A

Supervised induction used to analyze the historical data stored in a database and to automatically generate a model that can predict future behavior.

13
Q

clustering

A

Partitioning a database into segments in which the members of a segment share similar qualities.

14
Q

confidence

A

In association rules, the conditional probability of finding the RHS of the rule present in a list of transactions where the LHS of the rule exists.

15
Q

connection weight

A

The weight associated with each link in a neural network model.
Neural network learning algorithms assess connection weights.

16
Q

CRISP-DM

A

A cross-industry standardized process for conducting data mining projects, which is a sequence of six steps that starts with a good understanding of the business and the need for the data mining project (i.e., the application domain) and ends with the deployment of the solution that satisfies the specific business need.

17
Q

data mining

A

A process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases.

18
Q

decision trees

A

A graphical presentation of a sequence of interrelated decisions to be made under assumed risk. This technique classifies specific entities into particular classes based upon the features of the entities. A tree consists of a root followed by internal nodes; each node (including the root) is labeled with a question, and the arcs associated with each node cover all possible responses.

19
Q

dendrite

A

The part of a biological neuron that provides inputs to the cell.

20
Q

discovery-driven data mining

A

A form of data mining that finds patterns, associations, and relationships among data in order to uncover facts that were previously unknown or not even contemplated by an organization.

21
Q

distance measure

A

A method used to calculate the closeness between pairs of items in most cluster analysis methods. Popular distance measures include Euclidean distance (the ordinary distance between two points that one would measure with a ruler) and Manhattan distance (also called the rectilinear distance, or taxicab distance, between two points).
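
The two measures named above can be sketched in a few lines of Python. The function names are illustrative, and points are assumed to be equal-length sequences of numbers.

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Rectilinear (taxicab) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

For the same pair of points, the Manhattan distance is never smaller than the Euclidean distance.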

22
Q

entropy

A

A metric that measures the extent of uncertainty or randomness in a data set. If all the data in a subset belong to just one class, then there is no uncertainty or randomness in that data set, and therefore the entropy is zero.
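
The card does not give a formula. The sketch below assumes Shannon entropy measured in bits, which is the form commonly used with decision trees; the function name and the sample labels are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    # Each class contributes -p * log2(p), where p is its relative frequency.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["yes", "yes", "yes"]))       # single class: no uncertainty
print(entropy(["yes", "no", "yes", "no"]))  # two balanced classes: 1 bit
```

A subset whose members all share one class yields zero, matching the definition above.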

23
Q

fuzzy logic

A

A logically consistent way of reasoning that can cope with uncertain or partial information. Fuzzy logic is characteristic of human thinking and expert systems.

24
Q

genetic algorithm

A

A software program that learns in an evolutionary manner, similar to the way biological systems evolve.

25
Q

Gini index

A

A metric that is used in economics to measure the diversity of the population. The same concept can be used to determine the purity of a specific class as a result of a decision to branch along a particular attribute/variable.
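
The card does not state a formula. As a hedged illustration, the sketch below assumes the Gini impurity used by CART-style decision trees, one minus the sum of squared class proportions; the function name is illustrative.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_impurity(["a", "a", "a", "a"]))  # pure node
print(gini_impurity(["a", "a", "b", "b"]))  # evenly mixed two-class node
```

A pure node scores 0, and a two-class node is most impure at 0.5, so a split that lowers the value produces purer branches.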

26
Q

heuristics

A

Informal, judgmental knowledge of an application area that constitutes the rules of good judgment in the field. Heuristics also encompasses the knowledge of how to solve problems efficiently and effectively, how to plan steps in solving a complex problem, how to improve performance, and so forth.

27
Q

hidden layer

A

The middle layer of an artificial neural network that has three or more layers.

28
Q

hypothesis-driven data mining

A

A form of data mining that begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition.

29
Q

information gain

A

The splitting mechanism used in ID3 (a popular decision-tree algorithm).

30
Q

interval data

A

Variables that can be measured on interval scales.

31
Q

k-fold cross-validation

A

A popular accuracy assessment technique for prediction models where the complete dataset is randomly split into k mutually exclusive subsets of approximately equal size. The classification model is trained and tested k times. Each time it is trained on all but one fold and then tested on the remaining single fold. The cross-validation estimate of the overall accuracy of a model is calculated by simply averaging the k individual accuracy measures.
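
The procedure can be sketched with the standard library alone. The names `k_fold_indices` and `cross_validate`, and the callables `train_fn` and `accuracy_fn`, are hypothetical placeholders for whatever model and scoring routine are in use.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, labels, k, train_fn, accuracy_fn, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every index that is not in the held-out fold is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn([data[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        scores.append(accuracy_fn(model,
                                  [data[j] for j in test_idx],
                                  [labels[j] for j in test_idx]))
    return sum(scores) / k

print([len(fold) for fold in k_fold_indices(10, 3)])
```

Each instance lands in exactly one fold, so it is used for testing once and for training k - 1 times.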

32
Q

knowledge discovery in databases (KDD)

A

A machine-learning process that performs rule induction or a related procedure to establish knowledge from large databases.

33
Q

Kohonen’s self-organizing feature map

A

A type of neural network model for machine learning.

34
Q

learning algorithm

A

The training procedure used by an artificial neural network.

35
Q

link analysis

A

The linkage among many objects of interest is discovered automatically, such as the link between Web pages and referential relationships among groups of academic publication authors.

36
Q

machine learning

A

The process by which a computer learns from experience (e.g. using programs that can learn from historical cases).

37
Q

Microsoft Enterprise Consortium

A

Worldwide source for access to Microsoft’s SQL Server 2008 software suite for academic purposes–teaching and research.

38
Q

multi-layered perceptron (MLP)

A

Layered structure of artificial neural network where several hidden layers can be placed between the input and output layers.

39
Q

neural computing

A

An experimental computer design aimed at building intelligent computers that operate in a manner modeled on the functioning of the human brain.
See artificial neural network (ANN).

40
Q

neurons

A

A cell (i.e. processing element) of a biological or artificial neural network.

41
Q

nominal data

A

A type of data that contains simple codes assigned to objects as labels; the codes are not measurements.
For example, the variable marital status can be generally categorized as (1) single, (2) married, and (3) divorced.

42
Q

numeric data

A

A type of data that represents the numeric values of specific variables. Examples of numerically valued variables include age, number of children, total household income (in U.S. dollars), travel distance (in miles), and temperature (in Fahrenheit degrees).

43
Q

ordinal data

A

Data that contain codes assigned to objects or events as labels that also represent the rank order among them.
For example, the variable credit score can be generally categorized as (1) low, (2) medium, and (3) high.

44
Q

pattern recognition

A

A technique of matching an external pattern to a pattern stored in a computer’s memory (i.e. the process of classifying data into predetermined categories). Pattern recognition is used in inference engines, image processing, neural computing, and speech recognition.

45
Q

prediction

A

The act of telling about the future.

46
Q

processing element (PE)

A

A neuron in a neural network.

47
Q

RapidMiner

A

A popular, open-source, free-of-charge data mining software suite that employs a graphically enhanced user interface, a rather large number of algorithms, and a variety of data visualization features.

48
Q

ratio data

A

Continuous data where both differences and ratios are interpretable.
The distinguishing feature of a ratio scale is the possession of a nonarbitrary zero value.

49
Q

regression

A

A data mining method for real-world prediction problems where the predicted values (i.e., the output variable or dependent variable) are numeric (e.g., predicting the temperature for tomorrow as 68 degrees).

50
Q

result (outcome) variable

A

A variable that expresses the result of a decision (e.g. one concerning profit), usually one of the goals of a decision-making problem.

51
Q

SAS Enterprise Miner

A

A comprehensive, commercial data mining software tool developed by SAS Institute.

52
Q

SEMMA

A

An alternative process for data mining projects proposed by the SAS Institute. The acronym “SEMMA” stands for “sample, explore, modify, model, and assess.”

53
Q

sensitivity analysis

A

A study of the effect of a change in one or more input variables on a proposed solution.

54
Q

sequence mining

A

A pattern discovery method where relationships among things are examined in terms of their order of occurrence to identify associations over time.

55
Q

sigmoid function

A

An S-shaped transfer function in the range of 0 to 1.
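
One common sigmoid is the logistic function, sketched below. The helper `neuron_output`, which feeds a weighted sum of inputs through the sigmoid, is an illustrative assumption about how such a transfer function is typically applied in a neuron.

```python
import math

def sigmoid(x):
    """Logistic function: maps any moderate real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights):
    """Hypothetical neuron: weighted sum of the inputs passed through the sigmoid."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(activation)

print(sigmoid(0))                            # 0.5
print(neuron_output([1, 1], [0.5, -0.5]))    # 0.5
```

This plain form raises `OverflowError` for very large negative inputs (roughly below -709), so production code usually guards against that case.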

56
Q

simple split

A

Data is partitioned into two mutually exclusive subsets called a training set and a test set (or holdout set). It is common to designate two-thirds of the data as the training set and the remaining one-third as the test set.

57
Q

SPSS PASW Modeler

A

A very popular, commercially available, comprehensive data, text, and Web mining software suite developed by SPSS (formerly Clementine).

58
Q

summation function

A

A mechanism to add all the inputs coming into a particular neuron.

59
Q

supervised learning

A

A method of training artificial neural networks in which sample cases are shown to the network as input, and the weights are adjusted to minimize the error in the outputs.

60
Q

support

A

The measure of how often products and/or services appear together in the same transaction; that is, the proportion of transactions in the dataset that contain all of the products and/or services mentioned in a specific rule.
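
A small sketch of support, together with the confidence of a rule (the conditional probability of the RHS given the LHS), computed over a made-up list of transactions. The function names and the shopping-basket data are illustrative assumptions.

```python
def support(transactions, items):
    """Proportion of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Support of LHS and RHS together, divided by the support of LHS alone."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))       # 0.5
print(confidence(baskets, {"bread"}, {"milk"}))
```

Here bread and milk appear together in two of four baskets, and milk appears in two of the three baskets that contain bread. The sketch does not guard against an LHS that never occurs, which would divide by zero.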

61
Q

support vector machines (SVM)

A

A family of generalized linear models, which achieve a classification or regression decision based on the value of the linear combination of input features.

62
Q

synapse

A

The connection (where the weights are) between processing elements in a neural network.

63
Q

transformation (transfer) function

A

In a neural network, the function that sums and transforms inputs before a neuron fires. It shows the relationship between the internal activation level and the output of a neuron.

64
Q

unsupervised learning

A

A method of training artificial neural networks in which only input stimuli are shown to the network, which is self-organizing.

65
Q

Weka

A

A popular, free-of-charge, open-source suite of machine-learning software written in Java, developed at the University of Waikato.