Final Flashcards

(201 cards)

1
Q

There are basic chart types and specialized chart types. A Gantt chart is a specialized chart type.

A

True

2
Q

This measure of central tendency is the sum of all the values/observations divided by the number of observations in the data set.

A

arithmetic mean
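
A minimal sketch of the definition above, assuming a plain Python list of numbers; the sample values are illustrative and not taken from the course material.

```python
def arithmetic_mean(values):
    """Return the sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


sample = [4, 8, 15, 16, 23, 42]  # illustrative observations
print(arithmetic_mean(sample))  # 18.0
```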

3
Q

Subject oriented databases for data warehousing are organized by detailed subjects such as disk drives, computers, and networks.

A

False

4
Q

Two-tier data warehouse/BI infrastructures offer organizations more flexibility but cost more than three-tier ones.

A

False

5
Q

A(n) ________ architecture is used to build a scalable and maintainable infrastructure that includes a centralized data warehouse and several dependent data marts.

A

Hub-and-spoke

6
Q

Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?

A

Classification

7
Q

Converting continuous-valued numerical variables to ranges and categories is referred to as discretization.

A

TRUE
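
A small sketch of discretization, assuming hypothetical age ranges and labels chosen only for illustration: each continuous value is mapped to the label of the first range that contains it.

```python
def discretize(value, bins):
    """Map a continuous value to the label of the first range containing it.

    bins is a list of (exclusive_upper_bound, label) pairs sorted by bound.
    """
    for upper_bound, label in bins:
        if value < upper_bound:
            return label
    raise ValueError(f"{value} falls outside every range")


# Hypothetical ranges; real cut points depend on the analytics problem.
AGE_BINS = [(18, "minor"), (65, "adult"), (float("inf"), "senior")]

ages = [12.5, 30.0, 64.9, 71.2]
labels = [discretize(age, AGE_BINS) for age in ages]
print(labels)  # ['minor', 'adult', 'adult', 'senior']
```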

8
Q

Data source reliability means that data are correct and are a good match for the analytics problem.

A

False

9
Q

The competitive imperatives for BI include all of the following except

A

Right user

10
Q

In the 2000s, the DW-driven DSSs began to be called BI systems.

A

TRUE

11
Q

How are enterprise resource planning (ERP) systems related to supply chain management (SCM) systems?

A

Complementary systems

12
Q

OLTP systems are designed to handle ad hoc analysis and complex queries that deal with many data items.

A

FALSE

13
Q

Information dashboards enable ________ operations that allow the users to view underlying data sources and obtain more detail.

A

drill-down/drill-through

14
Q

In the opening case, police detectives used data mining to identify possible new areas of inquiry.

A

False

15
Q

Clustering partitions a collection of things into segments whose members share

A

Similar Characteristics

16
Q

Ratio data is a type of categorical data.

A

False

17
Q

Which of the following is a data mining myth?

A

Data mining requires a separate, dedicated database.

18
Q

If using a mining analogy, “knowledge mining” would be a more appropriate term than “data mining.”

A

True

19
Q

The cost of data storage has plummeted recently, making data mining feasible for more firms.

A

True

20
Q

In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.

A

False

21
Q

All of the following statements about data mining are true EXCEPT

A

the process aspect means that data mining should be a one-step process to results.

22
Q

Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.

A

true

23
Q

K-fold cross-validation is also called sliding estimation.

A

False
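
For reference, k-fold cross-validation is also called rotation estimation. The sketch below shows one way the folds could be generated, assuming samples are addressed by index; `k_fold_indices` is a hypothetical helper, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every sample lands in exactly one test fold and in the training set
    of the remaining k - 1 rounds.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), test)
```

In practice the data would usually be shuffled before the folds are cut; that step is left out here to keep the rotation visible.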

24
Q

Big Data often involves a form of distributed storage and processing using Hadoop and MapReduce. One reason for this is

A

the processing power needed for the centralized model would overload a single computer.

25
In the Opening Vignette on Sports Analytics, what type of modeling was used to predict offensive tactics?
Heat Maps
26
What type of analytics seeks to recognize what is going on as well as the likely forecast and make decisions to achieve the best performance possible?
Prescriptive
27
Demands for instant, on-demand access to dispersed information decrease as firms successfully integrate BI into their operations.
False
28
The use of dashboards and data visualizations is seldom effective in identifying issues in organizations, as demonstrated by the Silvaris Corporation Case Study.
false
29
Today, many vendors offer diversified tools, some of which are completely preprogrammed (called shells). How are these shells utilized?
All a user needs to do is insert the numbers.
30
The growth in hardware, software, and network capacities has had little impact on modern BI innovations.
False
31
Information systems that support such transactions as ATM withdrawals, bank deposits, and cash register scans at the grocery store represent transaction processing, a critical branch of BI.
False
32
If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."
True
33
Because of performance and data quality issues, most experts agree that the federated architecture should supplement data warehouses, not replace them.
True
34
Ratio data is a type of categorical data
FALSE
35
The use of dashboards and data visualizations is seldom effective in identifying issues in organizations, as demonstrated by the Silvaris Corporation case study.
False
36
Market basket
False
37
In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.
True
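
A minimal sketch of the support calculation described above, assuming each document has already been reduced to a set of concepts; the documents and concept names are made up for illustration.

```python
def support(documents, concept_a, concept_b):
    """Return the fraction of documents that contain both concepts."""
    if not documents:
        raise ValueError("support is undefined for an empty collection")
    both = sum(1 for doc in documents if concept_a in doc and concept_b in doc)
    return both / len(documents)


# Hypothetical concept sets extracted from four documents.
docs = [
    {"price", "quality"},
    {"price"},
    {"quality", "service"},
    {"price", "quality", "service"},
]
print(support(docs, "price", "quality"))  # 0.5
```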
38
________ is a segmentation metric for social networks that measures the strength of the bonds between actors in a social network
Cohesion
39
What has caused the growth of the demand for instant, on-demand access to dispersed information?
the more pressing need to close the gap between the operational data and strategic objectives
40
The need for more versatile reporting than what was available in 1980s era ERP systems led to the development of what type of system?
executive information systems
41
What storage system and processing algorithm were developed by Google for Big Data?
* Google developed the Google File System (GFS) for storing large amounts of data in a distributed way; its design inspired the Hadoop Distributed File System (HDFS), which was released as an open-source Apache project.
* Google developed the MapReduce algorithm for pushing computation to the data, instead of pushing data to a computing node; Hadoop provides an open-source implementation of it.
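
The MapReduce idea can be illustrated with a single-process word count. This is only a sketch of the map, shuffle, and reduce phases under the assumption that each text chunk stands in for a block stored on a separate node; it does not use Hadoop or any distributed runtime, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


chunks = ["big data big", "data lake"]  # each chunk stands in for one node's block
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'data': 2, 'lake': 1}
```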
42
Describe the role of the simple split in estimating the accuracy of classification models.
The simple split (or holdout or test sample estimation) partitions the data into two mutually exclusive subsets called a training set and a test set (or holdout set). It is common to designate two-thirds of the data as the training set and the remaining one-third as the test set. The training set is used by the inducer (model builder), and the built classifier is then tested on the test set. An exception to this rule occurs when the classifier is an artificial neural network. In this case, the data is partitioned into three mutually exclusive subsets: training, validation, and testing.
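
A sketch of the simple split described above, assuming the records fit in a list and that a fixed seed is acceptable for repeatability; `simple_split` is a hypothetical helper.

```python
import random


def simple_split(records, train_fraction=2 / 3, seed=0):
    """Partition records into mutually exclusive training and test sets."""
    shuffled = records[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


data = list(range(30))  # illustrative record ids
train, test = simple_split(data)
print(len(train), len(test))  # 20 10
```

For the neural network case mentioned in the answer, the same idea would be applied twice to obtain training, validation, and test subsets.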
43
Data is the contextualization of information, that is, information set in context
False
44
This measure of dispersion is calculated by simply taking the square root of the variance.
standard deviation
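
A short sketch of the calculation, assuming a plain list of numbers: the variance is computed first and the standard deviation is its square root. The `sample` flag (an assumption of this sketch) switches between the sample (n - 1) and population (n) denominators.

```python
import math


def variance(values, sample=True):
    """Average squared deviation from the mean (n - 1 denominator for samples)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


def standard_deviation(values, sample=True):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative observations
print(standard_deviation(data, sample=False))  # 2.0
```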
45
Nominal data represent the labels of multiple classes used to divide a variable into specific groups.
False
46
In the Dallas Cowboys case study, the focus was on using data analytics to decide which players would play every week.
False
47
This plot is a graphical illustration of several descriptive statistics about a given data set
Box and whisker plot
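
The descriptive statistics a box-and-whisker plot usually summarizes (minimum, quartiles, median, maximum) can be sketched as follows. Quartile conventions vary between tools; this sketch assumes the "inclusive" method of Python's `statistics.quantiles` (Python 3.8+), and the data are illustrative.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a data set."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


data = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # illustrative observations
print(five_number_summary(data))  # (1, 3.0, 5.0, 7.0, 9)
```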
48
Which type of visualization tool can be very helpful when the intention is to show relative proportions of dollars per department allocated by a university administration?
Pie chart
49
Which type of visualization tool can be very helpful when a data set contains location data?
Geographic map
50
The data storage component of a business reporting system builds the various reports and hosts them for, or disseminates them to users. It also provides notification, annotation, collaboration, and other services.
False
51
One way an operational data store differs from a data warehouse is the recency of their data.
True
52
Properly integrating data from various databases and other disparate sources is a trivial process.
False
53
What is Six Sigma?
a methodology aimed at reducing the number of defects in a business process
54
A Web client that connects to a Web server, which is in turn connected to a BI application server, is reflective of a
three-tier architecture
55
The data warehousing maturity model consists of six stages: prenatal, infant, child, teenager, adult, and sage.
True
56
When representing data in a data warehouse, using several dimension tables that are each connected only to a fact table means you are using which warehouse structure?
Star schema
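
A minimal star schema sketch using Python's built-in `sqlite3` module: two dimension tables connect only to a central fact table. The table names, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: each one is linked only to the fact table.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: one row per sale, with a foreign key per dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id INTEGER REFERENCES dim_store(store_id),
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'laptop'), (2, 'phone');
    INSERT INTO dim_store VALUES (1, 'Austin'), (2, 'Boston');
    INSERT INTO fact_sales VALUES (1, 1, 900.0), (2, 1, 500.0), (1, 2, 950.0);
""")

# Summarize the facts along one dimension (sales per city).
rows = conn.execute("""
    SELECT s.city, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON f.store_id = s.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
print(rows)  # [('Austin', 1400.0), ('Boston', 950.0)]
```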
57
User-initiated navigation of data through disaggregation is referred to as "drill up."
False
58
Data warehouses are subsets of data marts.
False
59
The BPM development cycle is essentially a one-shot process where the requirement is to get it right the first time.
False
60
Operational or transaction databases are product oriented, handling transactions that update the database. In contrast, data warehouses are
Subject-oriented and nonvolatile
61
What type of analytics seeks to determine what is likely to happen in the future?
Predictive
62
Online transaction processing (OLTP) systems handle a company's routine ongoing business. In contrast, a data warehouse is typically
a distinct system that provides storage for data that will be made use of in analysis.
63
In the Opening Vignette on Sports Analytics, what was adjusted to drive one-time ticket sales?
Ticket prices
64
Successful BI is a tool for the information systems department, but is not exposed to the larger organization.
False
65
Business intelligence (BI) is a specific term that describes architectures and tools only.
False
66
Managing information on operations, customers, internal procedures and employee interactions is the domain of cognitive science.
False
67
The user interface of a BI system is often referred to as a(n) ________.
Dashboard
68
As the number of potential BI applications increases, the need to justify and prioritize them arises. This is not an easy task due to the large number of ________ benefits.
Intangible
69
________ series forecasting is the use of mathematical modeling to predict future values of the variable of interest based on previously observed values.
Time
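
One of the simplest time series forecasting models is a moving average, sketched below under the assumption that the next value is predicted as the mean of the last few observations. The sales figures and the window size are illustrative; real forecasting models are usually more elaborate.

```python
def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    if window < 1 or len(history) < window:
        raise ValueError("history must contain at least `window` observations")
    return sum(history[-window:]) / window


monthly_sales = [100, 120, 110, 130, 125]  # previously observed values
forecast = moving_average_forecast(monthly_sales)
print(round(forecast, 2))
```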
70
Dashboards present visual displays of important information that are consolidated and arranged on a single ________.
Screen
71
Descriptive statistics is all about describing the sample data on hand.
True
72
Which characteristic of data requires that the variables and data values be defined at the lowest (or as low as required) level of detail for the intended use of the data?
data granularity
73
In the FEMA case study, the BureauNet software was the primary reason behind the increased speed and relevance of the reports FEMA employees received.
True
74
Dashboards provide visual displays of important information that is consolidated and arranged across several screens to maintain data order.
False
75
Which characteristic of data means that all the required data elements are included in the data set?
Data richness
76
Data source reliability means that data are correct and are a good match for the analytics problem.
False
77
With the balanced scorecard approach, the entire focus is on measuring and managing specific financial goals based on the organization's strategy.
False
78
Moving the data into a data warehouse is usually the easiest part of its creation.
False
79
Data warehouse administrators (DWAs) do not need strong business insight since they only handle the technical aspect of the infrastructure.
False
80
Because the recession has raised interest in low-cost open source software, it is now set to replace traditional enterprise software.
False
81
The three main types of data warehouses are data marts, operational ________, and enterprise data warehouses.
Data stores
82
Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.
False
83
In the Influence Health case study, what was the goal of the system?
increasing service use
84
Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from
analyzing the vast data amounts routinely collected.
85
Statistics and data mining both look for data sets that are as large as possible.
False
86
The data field "ethnic group" can be best described as
nominal data.
87
In the Target case study, why did Target send maternity ads to a teen?
Target's analytic model suggested she was pregnant based on her buying habits.
88
One way to accomplish privacy and protection of individuals' rights when data mining is by ________ of the customer records prior to applying data mining applications, so that the records cannot be traced to an individual.
de-identification
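
A sketch of de-identification, assuming customer records are dictionaries and that the listed field names are the direct identifiers; both are hypothetical. Dropping direct identifiers alone may not guarantee anonymity, since combinations of the remaining fields can sometimes still single out a person.

```python
# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "ssn"}


def de_identify(record):
    """Return a copy of the record without directly identifying fields."""
    return {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS
    }


customer = {"name": "Pat Doe", "ssn": "000-00-0000", "zip": "73301", "basket_total": 84.5}
print(de_identify(customer))  # {'zip': '73301', 'basket_total': 84.5}
```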
89
Patterns have been manually ________ from data by humans for centuries, but the increasing volume of data in modern times has created a need for more automatic approaches.
Extracted
90
In the Influence Health case, the company was able to evaluate over ________ million records in only two days.
195
91
What are the most important assumptions in linear regression?
1. Linearity. This assumption states that the relationship between the response variable and the explanatory variables is linear. That is, the expected value of the response variable is a straight-line function of each explanatory variable, while holding all other explanatory variables fixed. Also, the slope of the line does not depend on the values of the other variables. It also implies that the effects of different explanatory variables on the expected value of the response variable are additive in nature.
2. Independence (of errors). This assumption states that the errors of the response variable are uncorrelated with each other. This independence of the errors is weaker than actual statistical independence, which is a stronger condition and is often not needed for linear regression analysis.
3. Normality (of errors). This assumption states that the errors of the response variable are normally distributed. That is, they are supposed to be totally random and should not represent any nonrandom patterns.
4. Constant variance (of errors). This assumption, also called homoscedasticity, states that the response variables have the same variance in their error, regardless of the values of the explanatory variables. In practice this assumption is invalid if the response variable varies over a wide enough range/scale.
5. Multicollinearity. This assumption states that the explanatory variables are not correlated (i.e., they do not replicate the same information but each provide a different perspective on the information needed for the model). Multicollinearity can be triggered by having two or more perfectly correlated explanatory variables presented to the model (e.g., if the same explanatory variable is mistakenly included in the model twice, once with a slight transformation of the same variable). A correlation-based data assessment usually catches this error.
92
With ________, all the data from every corner of the enterprise is collected and integrated into a consistent schema so that every part of the organization has access to the single version of the truth when and where needed.
Enterprise Resource Planning (ERP)
93
Briefly describe five techniques (or algorithms) that are used for classification modeling.
* Decision tree analysis. Decision tree analysis (a machine-learning technique) is arguably the most popular classification technique in the data mining arena.
* Statistical analysis. Statistical techniques were the primary classification algorithm for many years until the emergence of machine-learning techniques. Statistical classification techniques include logistic regression and discriminant analysis.
* Neural networks. These are among the most popular machine-learning techniques that can be used for classification-type problems.
* Case-based reasoning. This approach uses historical cases to recognize commonalities in order to assign a new case into the most probable category.
* Bayesian classifiers. This approach uses probability theory to build classification models based on the past occurrences that are capable of placing a new instance into a most probable class (or category).
* Genetic algorithms. This approach uses the analogy of natural evolution to build directed-search-based mechanisms to classify data samples.
* Rough sets. This method takes into account the partial membership of class labels to predefined categories in building models (collection of rules) for classification problems.
94
Six Sigma rests on a simple performance improvement model known as DMAIC. What are the steps involved?
1. Define. Define the goals, objectives, and boundaries of the improvement activity. At the top level, the goals are the strategic objectives of the company. At lower levels (department or project levels), the goals are focused on specific operational processes.
2. Measure. Measure the existing system. Establish quantitative measures that will yield statistically valid data. The data can be used to monitor progress toward the goals defined in the previous step.
3. Analyze. Analyze the system to identify ways to eliminate the gap between the current performance of the system or process and the desired goal.
4. Improve. Initiate actions to eliminate the gap by finding ways to do things better, cheaper, or faster. Use project management and other planning tools to implement the new approach.
5. Control. Institutionalize the improved system by modifying compensation and incentive systems, policies, procedures, manufacturing resource planning, budgets, operation instructions, or other management systems.
95
Many business users in the 1980s referred to their mainframes as "the black hole," because all the information went into it, but little ever came back and ad hoc real-time querying was virtually impossible.
True
96
Computerized support is only used for organizational decisions that are responses to external pressures, not for taking advantage of opportunities.
False
97
Data generation is a precursor, and is not included in the analytics ecosystem.
False
98
In what decade did disjointed information systems begin to be integrated?
1980s
99
Major commercial business intelligence (BI) products and services were well established in the early 1970s.
False
100
BI represents a bold new paradigm in which the company's business strategy must be aligned to its business intelligence analysis initiatives.
False
101
Kaplan and Norton developed a report that presents an integrated view of success in the organization called
balanced scorecard-type reports.
102
Interval data are variables that can be measured on interval scales.
True
103
Predictive algorithms generally require a flat file with a target variable, so making data analytics ready for prediction means that data sets must be transformed into a flat-file format and made ready for ingestion into those predictive algorithms.
True
104
Data accessibility means that the data are easily and readily obtainable
True
105
This measure of central tendency is the sum of all the values/observations divided by the number of observations in the data set
arithmetic mean
106
Structured data is what data mining algorithms use and can be classified as categorical or numeric.
True
107
Key performance indicators (KPIs) are metrics typically used to measure
Internal results
108
Visual analytics is aimed at answering, "What is happening?" and is usually associated with business analytics.
False
109
Oper marts are created when operational data needs to be analyzed
multidimensionally.
110
With the balanced scorecard approach, the entire focus is on measuring and managing specific financial goals based on the organization's strategy.
False
111
Which of the following BEST enables a data warehouse to handle complex queries and scale up to handle many more requests?
parallel processing
112
When querying a dimensional database, a user went from summarized data to its underlying details. The function that served this purpose is
Drill down
113
_______ is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases.
Enterprise information integration (EII)
114
Which data warehouse architecture uses a normalized relational warehouse that feeds multiple data marts?
hub-and-spoke data warehouse architecture
115
All of the following are benefits of hosted data warehouses EXCEPT
greater control of data.
116
Why is a performance management system superior to a performance measurement system?
because measurement alone has little use without action
117
In the Influence Health case study, what was the goal of the system?
increasing service use
118
Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings?
CRISP-DM
119
In estimating the accuracy of data mining (or other) classification models, the true positive rate is
the ratio of correctly classified positives divided by the total positive count.
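
A minimal sketch of the true positive rate, assuming binary labels where 1 marks the positive class; the label vectors are illustrative.

```python
def true_positive_rate(actual, predicted):
    """Correctly classified positives divided by the total number of positives."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    total_positives = true_positives + false_negatives
    if total_positives == 0:
        raise ValueError("no positive instances in the actual labels")
    return true_positives / total_positives


actual = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 1, 0]  # one missed positive, one false positive
print(true_positive_rate(actual, predicted))  # 0.75
```

The false positive in the fifth position does not affect the result, because the rate only looks at instances that are positive in reality.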
120
What is the main reason parallel processing is sometimes used for data mining?
because of the massive data amounts and search efforts involved
121
Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications?
Insurance
122
________ is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases.
Enterprise information integration (EII)
123
Which data warehouse architecture uses a normalized relational warehouse that feeds multiple data marts?
hub-and-spoke data warehouse architecture
124
Data warehouses provide indirect benefits to organizations. Which of the following is an indirect benefit of data warehouses?
improved customer service
125
All of the following are true about in-database processing technology except
The potentially useful aspect means that the results should lead to some business benefit
126
The data warehousing maturity model consists of six stages: prenatal, infant, child, teenager, adult, and sage.
TRUE
127
List 4 possible analytics applications in the retail value chain
Inventory, Price Elasticity, Shopper Insight, Store Layout
128
In the Dell case study, the largest issue was how to properly spend the online marketing budget.
FALSE
129
The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit.
FALSE
130
Using data mining on data about imports and exports can help to detect tax avoidance and money laundering
TRUE
131
Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from
analyzing the vast data amounts routinely collected.
132
Which of the following is a data mining myth?
Data mining requires a separate, dedicated database.
133
Nominal data represent the labels of multiple classes used to divide a variable into specific groups
False
134
Which type of question does visual analytics seek to answer?
Why did it happen?
135
To respond to its market challenges, Sirius XM decided to focus on manufacturing efficiency.
False
136
Data is the main ingredient for any BI, data science, and business analytics initiative.
False
137
Google Maps has set new standards for data visualization with its intuitive Web mapping software.
TRUE
138
Dashboards provide visual displays of important information that is consolidated and arranged across several screens to maintain data order
False
139
Traditional BI systems use a large volume of static data that has been extracted, cleaned, and loaded into a data warehouse to produce reports and analyses.
TRUE
140
Big Data often involves a form of distributed storage and processing using Hadoop and MapReduce. One reason for this is
the processing power needed for the centralized model would overload a single computer.
141
Which of the following is NOT an example of transaction processing?
Sales report
142
Data generation is a precursor, and is not included in the analytics ecosystem
FALSE
143
What type of analytics seeks to determine what is likely to happen in the future?
Predictive
144
If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."
TRUE
145
Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales
FALSE
146
In data mining, classification models help in prediction.
TRUE
147
Structured data is what data mining algorithms use and can be classified as categorical or numeric
TRUE
148
Which of the following is LEAST related to data/information visualization?
Statistical graphics
149
Visualization differs from traditional charts and graphs in complexity of data sets and use of multiple dimensions and measures.
TRUE
150
Dashboards can be presented at all the following levels EXCEPT
The visual cube level
151
Descriptive statistics is about describing the sample data on hand
TRUE
152
Business applications have moved from transaction processing and monitoring to other activities. Which of the following is NOT one of those activities?
Data monitoring
153
Managing data warehouses requires special methods, including parallel computing and/or Hadoop/Spark
TRUE
154
The very design that makes an OLTP system efficient for transaction processing makes it inefficient for
end-user ad hoc reports, queries, and analysis.
155
Real-time data warehousing can be used to support the highest level of decision making sophistication and power. The major feature that enables this in relation to handling the data is
speed of data transfer.
156
Data warehouse administrators (DWAs) do not need strong business insight since they only handle the technical aspect of the infrastructure.
FALSE
157
Data warehouses are subsets of data marts
FALSE
158
Subject oriented databases for data warehousing are organized by detailed subjects such as disk drives, computers, and networks
FALSE
159
Organizations seldom devote a lot of effort to creating metadata because it is not important for the effective use of data warehouses.
FALSE
160
Which approach to data warehouse integration focuses more on sharing process functionality than data across systems?
Enterprise application integration
161
Which kind of data warehouse is created separately from the enterprise data warehouse by a department and not reliant on it for updates?
Independent data mart
162
A large storage location that can hold vast quantities of data (mostly unstructured) in its native/raw format for future/potential analytics consumption is referred to as a(n)
Data Lake
163
The "islands of data" problem in the 1980s describes the phenomenon of unconnected data being stored in numerous locations within an organization.
True
164
Which of the following developments is NOT contributing to facilitating growth of decision support and analytics?
Locally concentrated workforces
165
During classification in data mining, a false positive is an instance classified as true by the model while being false in reality.
TRUE
166
In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as
association rule mining
167
All of the following statements about data mining are true EXCEPT:
The ideas behind it are relatively new
168
Third party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by
removing identifiers such as names and social security numbers.
169
Which data mining process
CRISP
170
Contextual metadata for a dashboard includes all the following EXCEPT
which operating system is running the dashboard server software.
171
What is the management feature of a dashboard?
Operational data that identify what actions to take to resolve a problem
172
Benefits of the latest visual analytics tools, such as SAS Visual Analytics, include all of the following EXCEPT
they explore massive amounts of data in hours, not days.
173
When you tell a story in a presentation, all of the following are true EXCEPT
a well-told story should have no need for subsequent discussion.
174
Relational databases began to be used in the:
1980s
175
Decision support system (DSS) and management information system (MIS) have precise definitions agreed to by practitioners.
FALSE
176
Computer applications have moved from transaction processing
FALSE
177
Describe and define Big Data. Why is a search engine a Big Data application?
Big Data is data that cannot be stored in a single storage unit. It refers to data that arrives in multiple forms (structured or unstructured, or in a stream). A search engine is a Big Data application because, for each user query, it searches billions of Web pages and delivers the relevant ones in a fraction of a second.
178
There are several basic information system architectures that can be used for data warehousing. What are they?
Some IS architectures that can be used for data warehousing are one-, two-, and three-tier architectures.
179
List 5 reasons for the growing popularity of data mining in the business world.
* It recognizes fraud.
* It identifies risk factors.
* It can improve customer relationships.
* Computer hardware and software have both advanced.
* It has become more accessible and affordable.
180
List the five most common functions of a business report.
* To ensure that all departments are functioning properly
* To provide information
* To provide the results of an analysis
* To persuade others to act
* To create an organizational memory (as part of a knowledge management system)
181
More data, coming in faster and requiring immediate conversion into decisions, means that organizations are confronting the need for RDW. What is RDW?
Real-time data warehousing (RDW), also known as active data warehousing (ADW), is the process of loading and providing data via the data warehouse as they become available.
182
Which of the following is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies?
BI
183
Describe the difference between descriptive and inferential statistics.
Descriptive statistics describe the sample data on hand. Inferential statistics draw conclusions about the characteristics of a population based on sample data.
184
A common way of introducing data warehousing is to refer to its fundamental characteristics. Describe three characteristics of data warehousing.
Subject oriented: Data is organized by detailed subject, such as sales, products, or customers, containing only data relevant for decision support.
Integrated: Data from different sources must be placed into a consistent format. To do so, data warehouses have to deal with various conflicts.
Nonvolatile: After the data is entered into the data warehouse, users cannot change or update it. Changes are recorded as new data.
185
Based on the lessons learned from the Target case, what legal warning would you give another reseller using data mining for marketing?
If you look at the case you can see that Target didn't violate any law. Target didn't use any information that violates customer privacy. They only used transactional data that every other retail store obtains and stores. In terms of legal matters they didn't do anything wrong.
186
In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe for customers.
True
187
Search engine optimization (SEO) is a means by which
Web site developers can increase Web site search rankings
188
In the Wimbledon case study, designers balanced the needs of mobile and desktop computer users.
True
189
What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation?
small- to medium-sized documents
190
Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters.
False
191
Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining.
False
192
In the car insurance case study, text mining was used to identify auto features that caused injuries
False
193
________ is a connections metric for social networks that measures the ties that actors in a network have with others that are geographically close.
Propinquity
194
________ Web analytics refers to measurement and analysis of data relating to your company that takes place outside your Web site.
Off-site
195
Categorization and clustering of documents during text mining differ only in the preselection of categories.
True
196
Web-based media has nearly identical cost and scale structures as traditional media.
false
197
Web site usability may be rated poor if
Web site visitors download few of your offered PDFs and videos.
198
Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.
False
199
In text mining, tokenizing is the process of
categorizing a block of text in a sentence
200
Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences.
True
201
IBM's Watson utilizes a massively parallel, text mining-focused, probabilistic evidence-based computational architecture called
DeepQA