Test Prep Flashcards Preview

1
Q

Digital Transformation

A

Digital Technology creates Disruptive Digital Forces, which demand Digital Transformation.

Digital transformation is inevitable for firms.

2
Q

Nature of Big Data

A
  • Volume: Large-capacity data storage is not only a data-integration problem but also a critical challenge for analysis.
  • Velocity: Data latency, availability, and liquidity become critical; velocity indicates the speed of data changes as well as the need for timely data access and processing.
  • Variety: Firms have more new data for analysis, such as social media, mobile data, various databases that store hierarchical data, text records, e-mail, metering data, video, images, audio, stock ticker data, and financial transactions.
3
Q

Role of IT in organizations

A

Support ->
Enhance Effectiveness & Efficiency ->
Value Creation

4
Q

Data - Information - Knowledge Pyramid

A

(Bottom up) -

Data -> Information -> Knowledge

5
Q

Organization and Information Systems

A

There is a growing interdependence between a firm’s information systems and its business capabilities and operations: changes in strategy, rules, and business processes increasingly require changes in hardware, software, databases, and telecommunications; often, what an organization can do depends on what its information systems permit it to do.

6
Q

Uses of Information Technology by Firms

A

• Cost reduction through technology-enabled automation.
• Improve firm competitiveness; e.g., IT as a strategic driver.
• Empowerment of customers, suppliers, and internal employees.
• IT-enabled self-service practices and models.
• Operations excellence by increasing organizational effectiveness and efficiency; e.g., decision making, communications, and coordination.
• Accelerate information velocity and make the organization more agile, adaptive, and competitive.
• Effective customer relationship management for customer intimacy.
• Effective supply chain integration for supplier intimacy.
• Enable new product development, service design, and business models.

7
Q

IT Empowers

A

Suppliers - Firm - Customers

8
Q

Identifying Valuable Customers: RFM Analysis

A

Recency
Frequency
Monetary

“Valuable customers” are more important to a firm !
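A minimal Python sketch of how the three raw RFM values might be computed from transaction records; the Transaction fields, dates, and amounts below are illustrative assumptions, not part of the card:

```python
from dataclasses import dataclass
from datetime import date
from collections import defaultdict

@dataclass
class Transaction:
    customer_id: str
    day: date
    amount: float

def rfm_values(transactions, today):
    """Compute raw recency/frequency/monetary values per customer."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for t in transactions:
        last_purchase[t.customer_id] = max(last_purchase.get(t.customer_id, t.day), t.day)
        frequency[t.customer_id] += 1
        monetary[t.customer_id] += t.amount
    return {
        cid: {
            "recency_days": (today - last_purchase[cid]).days,  # smaller = more recent
            "frequency": frequency[cid],                        # number of purchases
            "monetary": monetary[cid],                          # total spend
        }
        for cid in frequency
    }

# Example usage with hypothetical transactions
txns = [
    Transaction("C1", date(2024, 1, 5), 120.0),
    Transaction("C1", date(2024, 3, 2), 80.0),
    Transaction("C2", date(2023, 11, 20), 40.0),
]
print(rfm_values(txns, today=date(2024, 4, 1)))
```

In practice the raw values are commonly binned (for example into quintiles) to produce R, F, and M scores used to rank customers.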

9
Q

Identifying Valuable Customers: Clustering Analysis in N-Dimensional Space

A
Growth Potential
Making Referrals
Revenue Contributions
Purchase Frequency
Profit Margin
10
Q

Customer Life Cycle

A

Acquire, Enhance, Retain

11
Q

Supply Chain: Bullwhip Effect

A
  • Small changes in actual demand create much greater problems for upstream partners.
  • Lack of trust and information sharing among channel partners !!
  • Stockpiling translates to huge costs because firms manage inventories “just-in-case.”
  • Manufacturers cannot plan production efficiently and effectively.
12
Q

Data and Firm Performance

A

Data (Information) Visibility
Data (Information) Accessibility
Data (Information) Analytics Capability
Information Velocity

13
Q

Data quality:

A

The totality of features and characteristics of data that bear on their ability to satisfy given business purposes.

14
Q

Total Data Quality Management

A

Define
Measure
Analyze
Improve

15
Q

Data Governance

A
  • Data governance establishes a formal structure and processes by which an organization manages all important issues surrounding data, including data quality as measured by accuracy, completeness, consistency, security, and availability.
  • In addition to data quality management, data governance also defines data ownership, data access rights, data sharing mechanisms, and data audits across different departments and business units, all of which are critical to the entire organization.
  • Central to effective data governance are formal structure, commitment, technology, processes, and accountability.
16
Q

Data-Centric Information Value Chain

A
  1. Increasing Visibility
  2. Mirroring Capability
  3. Creating Value
17
Q

Key Challenges in IT Management

A
  • A young technology (discipline).
  • Rapid technological advancements and changes.
  • Deep penetration into all aspects of organizations.
  • Widening business-technology gap.
  • Increasing specialization and sub-specialization.
  • Shifting focus; disruptive technologies.
19
Q

Business Intelligence

A

Business intelligence (BI) refers to the use of technology and statistical techniques to gather and analyze large amounts of data to support business decision making; i.e., discovering important patterns and phenomena for interpretation and business action.

20
Q

Data mining:

A

A common technological/computational approach to extract business intelligence from vast amounts of (high-quality) data.

Data mining: A process that extracts previously unknown, interesting, valid, and actionable data patterns (knowledge) from a large set of data.

21
Q

Web mining:

A

Discovery and analysis of useful patterns and information from the Web (e.g., Web resources, Web structures, clickstream data); firms can use Web mining to better understand customer behaviors, evaluate the effectiveness of a website, or manage marketing campaigns.

22
Q

Data Mining: General Process

A
Selection ->
Preprocessing ->
Transformation -> 
Data Mining -> 
Interpretation/Evaluation
23
Q

Data Mining: A High-Level View

A
  1. Prepare data
  2. Analyze Mining Results
  3. Collect Business Outcomes and Assimilate Knowledge
  4. Learning and Feedback
24
Q

Cluster analysis

A

divides a data set into mutually exclusive, distinct groups (subgroups) such that members of each group are as close to one another as possible, and the different groups are as far apart as possible.

25
Q

Association pattern/rule analysis

A

reveals the degree to which variables in a data set are associated with one another, in terms of intensity and frequency; e.g., “25% of the time, events A and B happened at the same time”; “15% of the time, customers purchased X and Y together”; association rule analysis is also known as “market basket analysis” (analyzing the products a customer purchased in a transaction).

26
Q

Classification analysis:

A

Assign an instance (example) to one of the predefined outcome classes; i.e., output bins.

27
Q

Statistical analysis:

A

e.g., correlation, distribution, variance analysis.

28
Q

Data Mining: Examples

A

• Associations are occurrences linked to a single event.
• In sequences, events are linked over time.
• Classification recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules.
• Clustering works in a manner similar to classification when no groups have yet been defined.
• Although associations and classifications involve predictions, forecasting uses predictions in a different way; it uses a series of existing values to forecast what other values will be.

29
Q

Web Mining: Overview

A
  • Web mining, also known as knowledge discovery from the Web (KDW), refers to the process of using data mining techniques and their extensions to analyze Web data to discover patterns and knowledge important to firms.
  • Web mining is about discovering and extracting information or knowledge from Web data/documents/services.
30
Q

Data Mining: Common Pitfalls

A
  • Not understanding business needs and problems !!
  • Careless handling of data: Excessively quantified data, miscoded data, performing analysis without taking precautions against sampling errors, loss of precision due to improper rounding or estimations of data value, and inappropriate handling of missing values.
  • Lack of data mining model development and validation.
  • Believing in alchemy.
  • Insufficient participation by business domain experts: In principle, business domain experts should lead data mining operations in the organization, not IT professionals.
31
Q

Data Mining: Complexity

A

• Scalability: With advances in data generation and collection, data sets with sizes of gigabytes, terabytes, or even petabytes have become common !!
• High dimensionality: Data sets with hundreds or thousands of attributes are now encountered !!
• Heterogeneous and complex data: As data mining is widely used in business, science, medicine, and other fields, the need for techniques capable of handling heterogeneous data and complex attributes grows.
• Data ownership and distribution: Data are geographically distributed among resources owned by distinct entities.
• Non-traditional analysis: Current data analysis tasks often require the generation and evaluation of thousands of hypotheses; consequently, the development of some data mining techniques has been motivated by the desire to automate the process of hypothesis generation and evaluation.

32
Q

Association pattern/rule analysis (market basket analysis)

A

Association pattern/rule analysis (market basket analysis) discovers interesting co-occurrence of items from a set of transactions, each of which contains a collection of items.

33
Q

Itemset:

A

A set of one or more items; e.g., {egg, milk}.

34
Q

Cardinality (or size) of an itemset:

A

The exact number of items in an itemset.

35
Q

Support of an itemset:

A

The ratio between the number of transactions that include all the items in an itemset and the total number of transactions under analysis.
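Stated as a formula (a restatement of the definition above, with N the total number of transactions and T a transaction):

\[
\mathrm{support}(X) = \frac{\lvert \{\, T : X \subseteq T \,\} \rvert}{N}
\]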

36
Q

Support of X→Y

A

Support of X→Y is obtained by dividing the number of transactions that contain both X and Y by the total number of transactions in the data.
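In symbols, counting the transactions that contain every item of both X and Y (i.e., X ∪ Y ⊆ T in the usual set notation), with N the total number of transactions:

\[
\mathrm{support}(X \rightarrow Y) = \frac{\lvert \{\, T : X \cup Y \subseteq T \,\} \rvert}{N}
\]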

37
Q

Confidence of X→Y

A

Confidence of X→Y is obtained by dividing the number of transactions that contain both X and Y by the number of transactions that contain X.
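In symbols, this is the rule's support divided by the support of its antecedent X:

\[
\mathrm{confidence}(X \rightarrow Y) = \frac{\lvert \{\, T : X \cup Y \subseteq T \,\} \rvert}{\lvert \{\, T : X \subseteq T \,\} \rvert} = \frac{\mathrm{support}(X \rightarrow Y)}{\mathrm{support}(X)}
\]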

38
Q

Downward Closure Property of Support

A
  • When the support of an itemset (A) is less than the specified minimum support, then any itemsets or association patterns that contain this itemset (A) should not be considered because the support definitely will NOT satisfy the minimum support threshold; i.e., Apriori algorithm.
  • This analysis technique is also known as minimum support pruning, built on the downward closure property of support.
39
Q

Virtual Items

A

• Virtual items allow association rule analysis to use information beyond transaction items.
• Different stores exhibit different selling patterns because of geographic differences, management effectiveness, and demographic characteristics.
• A virtual item may be used to capture the payment method; e.g., cash, credit card, or check.
• A virtual item can be used to represent the store location (type) where a purchase was made, allowing analysis of purchase behaviors across a company’s different store locations (types).

40
Q

Apriori Algorithm

A

Step 1: Generate large 1-itemsets
Step 2: Generate candidate 2-itemsets
Step 3: Generate large 2-itemsets
Step 4: Construct association rules from large 2-itemsets
Step 5: Repeat steps 2-4 to generate and evaluate candidate 3-itemsets (and larger itemsets)
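A minimal Python sketch of steps 1-3, limited to 1- and 2-itemsets for brevity; the transactions and the minimum-support threshold are illustrative assumptions, and rule construction (step 4) and larger itemsets (step 5) are omitted:

```python
from itertools import combinations

def apriori_2itemsets(transactions, min_support):
    """Find frequent (large) 1- and 2-itemsets using minimum-support pruning."""
    n = len(transactions)

    # Step 1: large 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    large1 = {item for item, c in counts.items() if c / n >= min_support}

    # Step 2: candidate 2-itemsets built only from large 1-itemsets
    # (downward closure: a superset of an infrequent itemset cannot be frequent)
    candidates = [frozenset(pair) for pair in combinations(sorted(large1), 2)]

    # Step 3: keep candidates that meet the minimum support threshold
    large2 = {}
    for cand in candidates:
        support = sum(1 for t in transactions if cand <= t) / n
        if support >= min_support:
            large2[cand] = support
    return large1, large2

# Example usage with hypothetical transactions
txns = [frozenset(t) for t in (
    {"milk", "bread"}, {"milk", "egg"}, {"milk", "bread", "egg"}, {"bread"},
)]
print(apriori_2itemsets(txns, min_support=0.5))
```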

41
Q

Clustering Analysis for Market Segmentation: Basic Steps

A
  1. Formulate the segmentation problem and select the variables that we want to use as the basis for clustering.
  2. Compute the distance between customers along the selected variables.
  3. Apply the clustering procedure to the chosen distance measure.
  4. Decide the number of clusters.
  5. Map and interpret clusters and draw conclusions; e.g., illustrative techniques include perceptual maps, which are useful for interpreting the resulting clusters.
42
Q

k-Means Clustering: Example

A

The centroid of a cluster is the average of all the data points in that cluster.

43
Q

K-Means clustering: General Process

A
  1. Choose the number of clusters, k.
  2. Generate k random points as cluster centroids.
  3. Assign each point to the nearest cluster centroid.
  4. Re-compute each cluster centroid.
  5. Repeat steps 3-4 until some convergence criterion is satisfied.
    A typical convergence criterion is that the assignment of customers to clusters has not changed over multiple iterations.
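A minimal plain-Python sketch of this loop; the 2-D points, Euclidean distance, and random initialization below are illustrative assumptions:

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Basic k-means: random initial centroids, assign, re-compute, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2: k random starting centroids
    assignments = None
    for _ in range(max_iters):
        # Step 3: assign each point to the nearest cluster centroid
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Convergence check: assignments did not change since the last iteration
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: re-compute each centroid as the mean of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Example usage with hypothetical 2-D customer measurements
data = [(1.0, 2.0), (1.2, 1.8), (8.0, 8.5), (7.8, 8.1), (0.9, 2.2)]
print(kmeans(data, k=2))
```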
44
Q

Classification Analysis

A
  • Classification is the process that establishes classes with attributes from a set of instances (called training examples). The class of an instance must be one from a finite set of pre-determined class values, while attributes of the instance are descriptors of the instance that are likely to affect its outcome class value.
  • Classification techniques: ID3 and its descendants (such as C4.5), CN2, A
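ID3-style decision trees choose the attribute to split on by information gain (entropy reduction). A minimal sketch of that computation; the row/label representation and the example data are illustrative assumptions:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attribute], []).append(label)
    remainder = sum(
        (len(subset) / len(labels)) * entropy(subset) for subset in by_value.values()
    )
    return entropy(labels) - remainder

# Example usage with hypothetical training examples
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "no"]
print(information_gain(rows, labels, "outlook"))
```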
45
Q

Neural Network Architecture

A
  • Processing units are grouped into linear arrays (layers).
  • A neural network always has an input layer and an output layer, and may or may not have “hidden” layers.
  • A processing unit processes its inputs and produces a single output value; this processing is known as the unit’s activation function.
  • For an input node, the activation function simply passes its value to the output of the node.
  • For a non-input node, the activation function has two parts: a combination function and a transfer function.
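A minimal sketch of one non-input unit's activation, with a weighted sum as the combination function and a sigmoid as an illustrative (assumed) choice of transfer function:

```python
import math

def unit_output(inputs, weights, bias):
    """One processing unit: combination function followed by a transfer function."""
    # Combination function: weighted sum of the unit's inputs
    combined = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Transfer function: a sigmoid squashing the combined value into (0, 1)
    return 1.0 / (1.0 + math.exp(-combined))

# Example usage with hypothetical weights for a three-input unit
print(unit_output([0.5, 1.0, -0.2], [0.4, -0.6, 0.1], bias=0.05))
```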
46
Q

Confusion Matrix

A
A confusion matrix is a table that categorizes predictions according to whether they match the actual value.
• The class of interest is known as the positive class.

A confusion matrix shows the number of correct and incorrect predictions made by the classification model compared to the actual outcomes (target values) in the data.

47
Q

False Positive, False Negative and Accuracy

A

• False Positive (FP): Incorrectly classified as the class of interest; a.k.a. error of commission, Type I error, or false alarm.
• False Negative (FN): Incorrectly classified as not the class of interest; a.k.a. error of omission, Type II error.
• Accuracy: The overall correctness of the model, calculated as the sum of correct classifications divided by the total number of classifications.
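In symbols, with TP, TN, FP, and FN the counts of true positives, true negatives, false positives, and false negatives from the confusion matrix:

\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]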
48
Q

F-Measure or F1

A

• F-measure: The harmonic mean of precision and recall.
• F-measure can be used as a single measure of performance.
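In symbols, with precision = TP / (TP + FP) and recall = TP / (TP + FN):

\[
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]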

49
Q

Database: Essential Characteristics

A
  • Self-describing collection of data: A database contains, in addition to the user’s data, a description of the overall database design and essential data structure and inter-relationships.
  • Related data: A logically coherent collection of data with some inherent meaning with respect to business operations, users’ needs or common data characteristics or properties.
  • Integrated data: A unification of several otherwise distinct data files, reducing or completely eliminating unplanned data redundancy among the source files.
  • Shared data: A database provides a central information repository which can be accessed by different application programs and users.
50
Q

Database and Enterprise Systems

A

An enterprise system consists of a set of integrated software modules and a central (enterprise) database that can be simultaneously shared by many different business processes and functional areas throughout the enterprise.

51
Q

Database Approach: Costs and Risks

A
  • New and specialized staff; e.g., database administrator, database designers and application developers.
  • Considerable system installation and management costs and complexity.
  • Data and system conversion costs.
  • Need for routine backup and error recovery.
  • Potential organizational conflicts.
52
Q

Data Quality: Overview

A
  • Data quality refers to the totality of features and characteristics of data that bear on their ability to satisfy a given purpose; data quality is the sum of the degrees of excellence for factors related to data.
  • Data quality is the degree of excellence exhibited by the data in relation to the portrayal of the actual phenomena.
  • Data quality is not rocket science: Data quality has been around in customer relationship management for many years; data quality is the baseline for effective customer relationship management.
  • Quality data are critical to data-driven business intelligence !!
53
Q

Data Quality: A TDQM Approach

A
  1. Define
  2. Measure
  3. Analyze
  4. Improve
54
Q

Data administration

A

Data administration defines the policies and procedures through which data can be managed as an organizational resource across all functional departments and business units.

55
Q

Data quality audit

A

Data quality audit is a structured survey of the accuracy and level of completeness of the data in an information system; data quality audits can be performed by surveying entire data files, surveying samples from data files, or surveying end users for their perceptions of data quality.

56
Q

Enterprise Resource Planning (ERP) System: A Functional View

A
  • Based on a database platform
  • Eases exchange of information and data among different corporate divisions
  • Unites major business practices within a single group of software modules
  • Modules run on client/server environment
  • Each module works separately, performing data processing functions
57
Q

Database versus Data Warehouse

A

• Database
• Designed and optimized to ensure that every transaction gets recorded and stored immediately.
• Volatile because data are constantly being updated, added, or edited.
• Online transaction processing (OLTP) system because transactions must be recorded and processed as they occur, that is, in real time.
• Data Warehouse
• Designed and optimized for analysis and quick response to queries.
• Nonvolatile; once data are stored, they are read-only and rarely deleted, so that they can be used for comparison with newer data.
• Online analytic processing (OLAP) system.
• Subject-oriented; i.e., data captured are organized to have similar data linked together.