Chapter 7 Flashcards

1
Q

authoritative pages

A

Web pages that are identified as particularly popular based on links by other Web pages and directories.

2
Q

clickstream analysis

A

The analysis of data that occur in the Web environment.

3
Q

clustering

A

Partitioning a database into segments in which the members of a segment share similar qualities.
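
Clustering is commonly done with k-means, which alternates between assigning each record to its nearest centroid and recomputing each centroid as the mean of its members. The sketch below is one assumed way to write it in plain Python. Seeding with the first `k` points is a simplification chosen for determinism (library implementations typically use random or k-means++ seeding), and the function name and sample points are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means sketch; returns (labels, centroids)."""
    # Assumed seeding: the first k points become the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return labels, centroids


points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Even though both seeds start in the lower-left group, two rounds of reassignment move the second centroid to the upper-right group.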

4
Q

corpus

A

In linguistics, a large and structured set of texts (now usually stored and processed electronically) prepared for the purpose of conducting knowledge discovery.

5
Q

deception detection

A

A way of identifying deception (intentionally propagating beliefs that are not true) in voice, text, and/or body language of humans.

6
Q

hubs

A

One or more Web pages that provide a collection of links to authoritative pages.

7
Q

hyperlink-induced topic search

A

(HITS) The most popular publicly known and referenced algorithm in Web mining; it is used to discover hubs and authorities.
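
The mutual reinforcement between hubs and authorities can be sketched in Python. This is a minimal, assumed implementation of the iterative HITS update (not taken from the chapter): a page's authority score is the sum of the hub scores of the pages linking to it, its hub score is the sum of the authority scores of the pages it links to, and both vectors are normalized after each round. The function name `hits`, the `iterations` parameter, and the four-page link graph are hypothetical.

```python
import math


def hits(links, iterations=50):
    """Iterative HITS sketch; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of every page that links in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub update: sum the authority scores of every page linked to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize both vectors (Euclidean norm) so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical link graph: A and B both point to C and D; C points to D.
links = {"A": ["C", "D"], "B": ["C", "D"], "C": ["D"], "D": []}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # D
```

D receives links from every other page, so it ends up as the strongest authority, while A and B, which point to both C and D, tie as the strongest hubs.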

8
Q

polarity identification

A

Given an opinionated piece of text, the goal is to classify the opinion as falling under one of two opposing sentiment polarities or to locate its position on the continuum between these two polarities.

At the word/term level, polarity can be identified in two ways:
1) Use a lexicon as a reference library.
2) Use a collection of training documents.
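
The lexicon-based option (1) can be sketched as follows: look up each word in a polarity lexicon and average the scores, so the sign gives the class and the magnitude gives a rough position on the negative-to-positive continuum. The tiny `LEXICON` and the `polarity` function are hypothetical; a real project could draw on a resource such as SentiWordNet, and this sketch ignores negation ("not good"). The training-document option (2) would instead fit a classifier on labeled examples.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score in [-1, 1].
LEXICON = {
    "good": 1.0, "great": 1.0, "excellent": 1.0, "love": 0.8,
    "bad": -1.0, "poor": -0.8, "terrible": -1.0, "hate": -0.8,
}


def polarity(text, lexicon=LEXICON):
    """Return (label, score), where score is the mean lexicon value of matched words."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return "neutral", 0.0  # no opinion words found
    score = sum(scores) / len(scores)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score


print(polarity("I hate this terrible phone")[0])  # negative
print(polarity("Great battery, but a poor screen")[0])  # positive
```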

9
Q

polyseme

A

Words, also called homonyms, that are syntactically identical (i.e., spelled exactly the same) but have different meanings (e.g., bow can mean “to bend forward,” “the front of a ship,” “the weapon that shoots arrows,” or “a kind of tied ribbon”).

10
Q

search engine

A

A program that finds and lists Web sites or pages (designated by URLs) that match some user-selected criteria.
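
Matching pages to user-selected criteria is usually done with an inverted index that maps each term to the pages containing it. The sketch below assumes keyword queries in which every term must appear (Boolean AND) and skips ranking entirely; the function names, page URLs, and page text are hypothetical.

```python
import re
from collections import defaultdict


def words(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: term -> set of URLs whose text contains the term."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in words(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs (sorted) that contain every query term."""
    terms = words(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


pages = {
    "https://example.com/a": "Text mining and Web mining",
    "https://example.com/b": "Web analytics for e-commerce",
    "https://example.com/c": "Text analytics",
}
index = build_index(pages)
print(search(index, "web mining"))  # ['https://example.com/a']
print(search(index, "analytics"))   # ['https://example.com/b', 'https://example.com/c']
```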

11
Q

sentiment analysis

A

The technique used to detect favorable and unfavorable opinions toward specific products and services using a large number of textual data sources (e.g., customer feedback in the form of Web postings).

12
Q

SentiWordNet

A

An extension of WordNet used for sentiment identification.

13
Q

singular value decomposition

A

Closely related to principal components analysis; reduces the overall dimensionality of the input matrix (number of input documents by number of extracted terms) to a lower-dimensional space, where each successive dimension captures the largest remaining degree of variability (between words and documents).
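
In the standard linear-algebra formulation (general background, not specific to this chapter), the input matrix A (m documents by n terms, of rank r) is factored and then truncated to its k largest singular values:

```latex
A = U \Sigma V^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

A_k = U_k \Sigma_k V_k^{\top}, \qquad k < r
```

Here U_k and V_k keep the first k columns of U and V, and Sigma_k keeps the k largest singular values. By the Eckart-Young theorem, A_k is the best rank-k approximation of A in the Frobenius norm; the rows of U_k Sigma_k serve as k-dimensional document coordinates and the rows of V_k Sigma_k as term coordinates.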

14
Q

social media analytics

A

The systematic and scientific way to consume the vast amount of content created by Web-based social media outlets, tools, and techniques for the betterment of an organization’s competitiveness.

15
Q

social network analysis

A

(SNA) The mapping and measuring of relationships and information flows among people, groups, organizations, computers, and other information- or knowledge-processing entities. The nodes in the network are the people and groups, whereas the links show relationships or flows between the nodes.
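
One common way to measure such a network (an assumption here; the definition does not name a specific metric) is degree centrality: the number of direct ties a node has, divided by the n - 1 other nodes it could be tied to. The sketch treats links as undirected; the function name and the people in the example are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as (a, b) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


edges = [("Ann", "Bob"), ("Ann", "Cara"), ("Ann", "Dev"), ("Bob", "Cara")]
centrality = degree_centrality(edges)
print(centrality["Ann"])  # 1.0
```

Ann is tied to all three other people, so she has the maximum score; Dev, with a single tie, scores 1/3.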

16
Q

spider

A

An application used to read through the content of a Web site automatically (Web Crawler).

17
Q

stemming

A

A process of reducing words to their respective root forms in order to better represent them in a text mining project.
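
A crude suffix-stripping stemmer shows the idea. The rule set below is hypothetical and far simpler than the Porter or Snowball stemmers typically used in practice: it strips the first matching suffix from a longest-first list, provided at least three characters of stem remain.

```python
# Hypothetical suffix list, ordered longest first so that a longer suffix
# is tried before a shorter one it contains (e.g. "es" before "s").
SUFFIXES = ("ment", "ness", "ing", "ion", "ed", "ly", "es", "s")


def stem(word):
    """Strip one suffix, keeping a stem of at least three characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


variants = ["connect", "connected", "connecting", "connects", "connection"]
print({stem(w) for w in variants})  # {'connect'}
```

Collapsing these five variants to one term means a text mining project counts them as a single feature instead of five.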

18
Q

stop words

A

Words that are filtered out prior to or after processing of natural language data (i.e., text).
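
Stop-word filtering is usually a set-membership test applied to tokens. The short `STOP_WORDS` list and the `remove_stop_words` function below are hypothetical; real projects use longer, language-specific lists and often tune them to the corpus.

```python
# Hypothetical, deliberately short stop-word list.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on", "with"}
)


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = "The analysis of text is an important part of data mining".split()
print(remove_stop_words(tokens))
# ['analysis', 'text', 'important', 'part', 'data', 'mining']
```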

19
Q

term-document matrix

A

A frequency matrix created from digitized and organized documents (the corpus), where the columns represent the terms and the rows represent the individual documents.
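
Building the matrix amounts to giving each distinct term a column and counting its occurrences in each document (row). The sketch assumes simple lowercase word tokens with no stemming or stop-word removal; the function name and the three documents are hypothetical.

```python
import re


def term_document_matrix(docs):
    """Return (terms, matrix): one row per document, one column per term, raw counts."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    terms = sorted({t for tokens in tokenized for t in tokens})
    column = {t: j for j, t in enumerate(terms)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(terms)
        for t in tokens:
            row[column[t]] += 1
        matrix.append(row)
    return terms, matrix


terms, matrix = term_document_matrix(["web mining", "text mining text", "web analytics"])
print(terms)   # ['analytics', 'mining', 'text', 'web']
print(matrix)  # [[0, 1, 0, 1], [0, 1, 2, 0], [1, 0, 0, 1]]
```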

20
Q

text mining

A

The application of data mining to nonstructured or less structured text files. It entails the generation of meaningful numeric indices from the unstructured text and then processing those indices using various data mining algorithms.

21
Q

tokenizing

A

Categorizing a block of text (a token) according to the function it performs.
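
A regular-expression tokenizer can split text into tokens and label each one by the role it plays. The three categories (word, number, punctuation), the pattern, and the `tokenize` function are assumptions for illustration; production tokenizers handle many more cases, such as URLs, hyphenation, and abbreviations.

```python
import re

# Hypothetical token categories, tried in order: number, word, punctuation.
TOKEN_PATTERN = re.compile(
    r"(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<word>[A-Za-z]+(?:'[A-Za-z]+)?)"
    r"|(?P<punct>[^\w\s])"
)


def tokenize(text):
    """Return (category, token) pairs; characters matching no category
    (whitespace, underscores) are skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(text)]


print(tokenize("Revenue grew 12.5% in 2023!"))
# [('word', 'Revenue'), ('word', 'grew'), ('number', '12.5'),
#  ('punct', '%'), ('word', 'in'), ('number', '2023'), ('punct', '!')]
```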

22
Q

trend analysis

A

Collecting information and attempting to spot a pattern, or trend, in it.
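
One simple way to quantify a trend (an assumed choice; the definition does not prescribe a method) is the least-squares slope of a series against its time index: a positive slope suggests an upward trend, a negative one a downward trend. The function name and the monthly counts are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope of values against time steps 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


monthly_mentions = [120, 135, 150, 165, 180]
print(linear_trend(monthly_mentions))  # 15.0
```

A slope of 15.0 reads as roughly 15 more mentions per month; real series are noisier, so the slope is usually checked against its variability before calling it a trend.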

23
Q

voice of the customer (VOC)

A

Applications that focus on “who and how” questions by gathering and reporting direct feedback from site visitors, by benchmarking against other sites and offline channels, and by supporting predictive modeling of future visitor behavior.

24
Q

Web analytics

A

The application of business analytics activities to Web-based processes, including e-commerce.

25
Q

Web content mining

A

The extraction of useful information from Web pages.

26
Q

Web crawler

A

An application used to read through the content of a Web site automatically (Spider).
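
The core loop of a crawler can be sketched as a breadth-first traversal: fetch a page, extract its links, resolve them to absolute URLs, and queue any not seen before. To stay self-contained, this sketch takes a `fetch` function (hypothetical) and runs it against an in-memory dictionary of pages; a real crawler would fetch over HTTP, honor robots.txt, and rate-limit its requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unreachable or missing page
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical three-page site; /c is linked but does not exist.
SITE = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="http://example.com/">home</a>',
    "http://example.com/b": '<a href="/c">C</a>',
}
print(crawl("http://example.com/", SITE.get))
# ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```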

27
Q

Web mining

A

The discovery and analysis of interesting and useful information from the Web, about the Web, and usually through Web-based tools.

28
Q

Web structure mining

A

The development of useful information from the links included in Web documents.

29
Q

Web usage mining

A

The extraction of useful information from the data generated through Web page visits, transactions, and so on.
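
A basic usage-mining step is counting which page-to-page transitions occur most often across visits. The sketch assumes the raw logs have already been grouped into per-visitor sessions of page names; the function name and the sessions are hypothetical.

```python
from collections import Counter


def top_transitions(sessions, n=3):
    """Count consecutive (from_page, to_page) pairs across all sessions."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts.most_common(n)


sessions = [
    ["home", "products", "cart"],
    ["home", "products", "checkout"],
    ["home", "blog"],
]
print(top_transitions(sessions, n=1))  # [(('home', 'products'), 2)]
```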

30
Q

WordNet

A

A popular general-purpose lexicon created at Princeton University.