Chapter 1 (Textbook) Flashcards
Define Data Mining.
Data mining is the process of discovering interesting patterns and knowledge from large amounts of data.
What is Knowledge Discovery from Data (KDD)?
Knowledge Discovery from Data (KDD), also called knowledge discovery in databases, refers to the overall process that includes data preparation, search for patterns, knowledge evaluation, and refinement.
Explain Data Cleaning.
Data cleaning involves the removal of noise and inconsistent data from the database to prepare high-quality data.
What is a Data Warehouse?
A data warehouse is a central repository of information, collected from multiple sources and stored under a unified schema at a single site to support management’s decision-making process.
Describe Data Integration.
Data integration involves combining data from multiple sources into a coherent data store to provide a unified view of these data.
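As a rough illustration of integration, the sketch below merges two sources into one store keyed by customer id. The two sources, their field names, and the `integrate` helper are hypothetical, made up for this card.

```python
# Hypothetical sources: a sales system and a CRM that both key on customer id.
sales = {101: {"total_spent": 250.0}, 102: {"total_spent": 80.0}}
crm = {101: {"name": "Ann"}, 103: {"name": "Cy"}}


def integrate(*sources):
    """Merge per-customer records from several sources into one unified store."""
    unified = {}
    for source in sources:
        for customer_id, fields in source.items():
            # Create the customer entry on first sight, then add this source's fields.
            unified.setdefault(customer_id, {}).update(fields)
    return unified


unified_view = integrate(sales, crm)
```

Customers that appear in only one source are kept with whatever fields that source provides; a real integration step would also have to resolve conflicting values and mismatched keys.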
What does Data Selection entail?
Data selection retrieves the data relevant to the analysis task from the database.
Define Data Transformation.
Data transformation is the process of converting data into appropriate forms for mining.
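One common transformation is min-max normalization, which rescales a numeric attribute to the range [0, 1]. The sketch below is only one example of a transformation, and the `incomes` values are invented for illustration.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


incomes = [20000, 50000, 80000]  # hypothetical attribute values
normalized = min_max_normalize(incomes)
```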
What is Pattern Evaluation in data mining?
Pattern evaluation involves identifying the truly interesting patterns representing knowledge.
What does Knowledge Presentation involve in data mining?
Knowledge presentation uses visualization and knowledge representation techniques to present the mined knowledge to users, making it understandable and useful.
Explain the difference between Data Characterization and Data Discrimination.
Data characterization summarizes the general characteristics or features of a target class of data. Data discrimination compares the general features of a target class against those of one or more contrasting classes to highlight the differences.
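The distinction can be sketched with simple averages: characterization summarizes one class, while discrimination contrasts two. The customer records, the `segment` labels, and both helper functions are hypothetical.

```python
from statistics import mean

# Hypothetical customer records, each labeled with a class ("segment").
customers = [
    {"segment": "frequent", "age": 30, "spend": 900},
    {"segment": "frequent", "age": 40, "spend": 1100},
    {"segment": "rare", "age": 50, "spend": 100},
    {"segment": "rare", "age": 60, "spend": 300},
]


def characterize(records, segment):
    """Characterization: summarize the general features of one class."""
    rows = [r for r in records if r["segment"] == segment]
    return {
        "avg_age": mean(r["age"] for r in rows),
        "avg_spend": mean(r["spend"] for r in rows),
    }


def discriminate(records, target, contrast):
    """Discrimination: difference between the target and contrasting class summaries."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {key: t[key] - c[key] for key in t}


frequent_profile = characterize(customers, "frequent")
difference = discriminate(customers, "frequent", "rare")
```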
What are the typical applications of Data Mining?
Typical applications include business intelligence, web search engines, market analysis, and healthcare data analysis, where the extracted patterns and insights can significantly influence decisions and strategies.
What challenges does data mining face?
Challenges include handling big data, integrating diverse data types, mining knowledge in multidimensional space, and ensuring privacy and security of data.
Define “Association Analysis” in data mining.
Association analysis is a type of data mining that involves finding interesting associations or correlation relationships among a large set of data items.
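Associations between items are commonly scored with support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). These two measures are not named on the card, and the transactions below are invented for illustration.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
```

Here bread and milk appear together in 2 of 4 transactions, and milk appears in 2 of the 3 transactions that contain bread.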
What is “Classification” in data mining?
Classification is the process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown.
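A minimal sketch of the idea, using a 1-nearest-neighbor rule as the "model". This is just one simple classifier chosen for brevity, and the training points and labels are hypothetical.

```python
import math

# Hypothetical labeled training objects: (feature vector, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]


def predict(point, labeled_points):
    """Return the label of the training object closest to the given point."""
    nearest = min(labeled_points, key=lambda pair: math.dist(point, pair[0]))
    return nearest[1]


label_a = predict((2.0, 1.5), training)
label_b = predict((7.0, 9.0), training)
```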
Define “Regression” in the context of data mining.
Regression predicts missing or unavailable numerical values, rather than class labels, by modeling continuous-valued functions.
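As an illustration, the sketch below fits a straight line by ordinary least squares and uses it to predict an unseen numeric value. Simple linear regression is only one kind of regression model, and the data points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept  # predict the value at x = 5
```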
What is “Clustering”?
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups.
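One way to see this grouping in action is a small k-means loop on one-dimensional points. k-means is only one clustering method, and the points and starting centers below are hypothetical.

```python
def k_means_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to the index of its closest center.
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
```

With these inputs the loop settles on two groups, the small values and the large values, each centered on its own mean.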
Explain “Outliers” in data mining.
Outliers are data objects that do not comply with the general behavior or model of the data. They can be seen as exceptions or anomalies.
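A simple way to flag such objects is a z-score test: values far from the mean, measured in standard deviations, are treated as outliers. The threshold of 2.0 and the readings below are assumptions chosen for illustration, not values from the textbook.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10, 11, 9, 10, 12, 10, 11, 50]  # hypothetical sensor readings
outliers = find_outliers(readings)
```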
What are the primary steps involved in the data mining process?
The primary steps include data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.
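The seven steps can be sketched as one toy pipeline. Every stage here is deliberately trivial, and the two sources, the field names, and the `run_pipeline` function are hypothetical; the point is only the order of the steps.

```python
from statistics import mean

# Two hypothetical sources that share an "id" key.
source_a = [{"id": 1, "age": 25}, {"id": 2, "age": None}, {"id": 3, "age": 41}]
source_b = {1: {"spend": 100.0}, 2: {"spend": 300.0}, 3: {"spend": 500.0}}


def run_pipeline(rows, extra, min_rows=2):
    # 1. Data cleaning: drop rows with a missing age.
    rows = [r for r in rows if r["age"] is not None]
    # 2. Data integration: attach fields from the second source.
    rows = [{**r, **extra.get(r["id"], {})} for r in rows]
    # 3. Data selection: keep only the attribute relevant to the task.
    spends = [r["spend"] for r in rows if "spend" in r]
    # 4. Data transformation: express spend in thousands.
    spends = [s / 1000 for s in spends]
    # 5. Data mining: here, a trivial "pattern" (the average spend).
    avg_spend_k = mean(spends) if spends else 0.0
    # 6. Pattern evaluation: keep the pattern only if enough rows back it.
    if len(spends) < min_rows:
        return None
    # 7. Knowledge presentation: render a readable summary.
    return f"Average spend: {avg_spend_k:.2f}k over {len(spends)} customers"


report = run_pipeline(source_a, source_b)
```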
What role does “Data Cleaning” play in data mining?
Data cleaning helps in improving the quality of data by removing noise and handling missing or inconsistent data.
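As a rough illustration, the sketch below drops records with missing values and normalizes inconsistent spellings. Discarding incomplete records is only one possible policy, and the records, field names, and `clean` helper are hypothetical.

```python
# Hypothetical raw records: one has a missing age, two spell the city inconsistently.
raw_records = [
    {"name": "Ann", "age": 34, "city": "new york"},
    {"name": "Bob", "age": None, "city": "Boston"},
    {"name": "Cy", "age": 29, "city": " New York "},
]


def clean(records):
    """Drop records with missing fields and standardize the city spelling."""
    cleaned = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # one simple policy: discard incomplete records
        fixed = dict(rec)
        # Trim stray whitespace and use consistent capitalization.
        fixed["city"] = rec["city"].strip().title()
        cleaned.append(fixed)
    return cleaned


cleaned_records = clean(raw_records)
```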
How is “Data Integration” important in data mining?
Data integration is crucial as it combines data from different sources, providing a unified view that can be more effectively analyzed.
Why is “Data Selection” important?
Data selection is critical because it involves choosing the relevant portion of data necessary for the mining process, thus ensuring efficiency and effectiveness.
Describe the significance of “Data Transformation” in the mining process.
Data transformation converts data into formats suitable for mining, facilitating easier and more effective analysis.
What does “Pattern Evaluation” entail?
Pattern evaluation involves determining which patterns produced by the data mining process are actually interesting and potentially useful.
What is the goal of “Knowledge Presentation”?
The goal of knowledge presentation is to visualize and present the results of data mining in an understandable manner to the end-user.