Informatics-Based Research Processes: Data Mining and Artificial Intelligence Flashcards
is an iterative process that explores and models big data to identify patterns and provide meaningful insights.
data mining
is the term for the use of voting and averaging in predictive data mining to synthesize the predictions from many models or methods, or from the same type of model applied to different data.
bagging
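A minimal sketch of bagging, assuming scikit-learn is installed; the synthetic data set and parameter values are illustrative, not from the source:

    # Bagging: fit many models to bootstrap resamples of the data,
    # then vote/average their predictions into one final prediction.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic data standing in for a real data set (illustrative only).
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 50 copies of the default base model (a decision tree), each trained
    # on a different bootstrap sample; predictions are combined by voting.
    bagger = BaggingClassifier(n_estimators=50, random_state=0)
    bagger.fit(X_train, y_train)
    print("bagging accuracy:", bagger.score(X_test, y_test))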
is what the term implies: a means of increasing the power of the generated models by weighting the combination of their predictions into a final predicted classification.
boosting
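A short boosting sketch using AdaBoost, one common boosting method (the specific algorithm is an assumption, chosen to illustrate the weighting idea), again assuming scikit-learn:

    # Boosting: fit models sequentially, reweighting the training examples
    # each round; the final classification is a weighted vote of all models.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # Each of the 100 weak learners gets a weight based on its accuracy;
    # those weights combine the learners' predictions into one classification.
    booster = AdaBoostClassifier(n_estimators=100, random_state=1)
    booster.fit(X_train, y_train)
    print("boosting accuracy:", booster.score(X_test, y_test))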
is an approach that uses mainly graphical techniques to gain insight into a data set. Its goal varies based on the purpose of the analysis, but it can be applied to the data set to extract variables, detect outliers, or identify patterns.
Exploratory data analysis (EDA)
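A minimal EDA sketch, assuming pandas and matplotlib; the random data stands in for a real data set:

    # Exploratory data analysis: summary statistics plus graphical views
    # to extract variables, detect outliers, and spot patterns.
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Random data standing in for a real data set (illustrative only).
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"age": rng.normal(50, 12, 200),
                       "lab_value": rng.lognormal(1.0, 0.5, 200)})

    print(df.describe())            # numeric summary of each variable
    df.hist()                       # distributions: skew, gaps, clusters
    df.boxplot()                    # boxplots make outliers easy to spot
    pd.plotting.scatter_matrix(df)  # pairwise patterns between variables
    plt.show()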
is a subset of AI that permits computers to learn either inductively or deductively.
Machine learning
is the process of reasoning and making generalizations or extracting patterns and rules from huge data sets—that is, reasoning from a large number of examples to a general rule.
Inductive machine learning
moves from premises that are assumed true to conclusions that must be true if the premises are true.
Deductive machine learning
combines the predictions from several models. It is helpful when several models are used in the same project. The predictions from the different classifiers can be included as input into the meta-learner.
Meta-learning
The goal is to synthesize these predicted classifications to generate a final, best-predicted classification, which is a process also referred to as
stacking
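A compact stacking sketch, assuming scikit-learn; the two base classifiers and the logistic-regression meta-learner are illustrative choices:

    # Stacking: the predictions of several different classifiers become
    # the inputs of a meta-learner that issues the final classification.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=2)

    # Two different model types feed their predictions to the meta-learner,
    # which synthesizes them into a final, best-predicted classification.
    stack = StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier(random_state=2)),
                    ("nb", GaussianNB())],
        final_estimator=LogisticRegression(),
    )
    stack.fit(X, y)
    print("stacked prediction:", stack.predict(X[:1]))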
represent nonlinear predictive models.
neural networks
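A small sketch of a neural network as a nonlinear predictive model, assuming scikit-learn; the layer size and data set are arbitrary:

    # A neural network learns a nonlinear mapping from inputs to outputs.
    from sklearn.datasets import make_moons
    from sklearn.neural_network import MLPClassifier

    # Two interleaving half-moons: a pattern no linear model can separate.
    X, y = make_moons(n_samples=500, noise=0.2, random_state=3)

    # One hidden layer of 25 units gives the model its nonlinearity.
    net = MLPClassifier(hidden_layer_sizes=(25,), max_iter=2000, random_state=3)
    net.fit(X, y)
    print("neural network accuracy:", net.score(X, y))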
is so named because the sets of decisions form a tree-shaped structure.
decision tree
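A brief sketch, assuming scikit-learn, that fits a tree and prints the tree-shaped structure the card describes:

    # A decision tree: each branch is a decision on a feature value,
    # and the branches fan out into a tree-shaped structure.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=4)
    tree.fit(iris.data, iris.target)

    # Print the fitted tree as indented text to see the structure.
    print(export_text(tree, feature_names=list(iris.feature_names)))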
is a technique in which the user manually chooses specific data points or subsets of data on an interactive data display.
brushing
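A rough brushing sketch using matplotlib's RectangleSelector, assuming an interactive matplotlib backend; dragging a rectangle over the scatter plot highlights the brushed points:

    # Brushing: drag a rectangle over the display to select data points.
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.widgets import RectangleSelector

    rng = np.random.default_rng(5)
    x, y = rng.random(200), rng.random(200)

    fig, ax = plt.subplots()
    points = ax.scatter(x, y, c="gray")

    def on_select(press, release):
        # Recolor the points that fall inside the brushed rectangle.
        x0, x1 = sorted([press.xdata, release.xdata])
        y0, y1 = sorted([press.ydata, release.ydata])
        inside = (x >= x0) & (x <= x1) & (y >= y0) & (y <= y1)
        points.set_color(np.where(inside, "red", "gray"))
        fig.canvas.draw_idle()

    selector = RectangleSelector(ax, on_select, useblit=True)
    plt.show()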
involves selection of the modeling methods and their application to the prepared data set.
modeling
aims to decrease discrepancies in business and manufacturing processes through dedicated improvements.
Six Sigma
is the "application of computer technology and three-dimensional modeling to large sets of biological data."
bioinformatics
is the activity of analyzing, synthesizing, and interpreting biological data to develop processes, algorithms, or models in order to understand biological systems and their interrelationships. It focuses on "theoretical models, computational simulations, and mathematical models for statistical inference."
computational biology