Chapter 7 Flashcards by Michael Elkerton

authoritative pages

Web pages that are identified as particularly popular based on links by other Web pages and directories.

clickstream analysis

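The analysis of clickstream data, that is, the data generated as users navigate in the Web environment (e.g., the pages they request and the links they click).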

clustering

Partitioning a database into segments in which the members of a segment share similar qualities.

corpus

In linguistics, a large and structured set of texts (now usually stored and processed electronically) prepared for the purpose of conducting knowledge discovery.

deception detection

A way of identifying deception (intentionally propagating beliefs that are not true) in voice, text, and/or body language of humans.

hubs

One or more Web pages that provide a collection of links to authoritative pages.

hyperlink-induced topic search

(HITS) The most popular publicly known and referenced algorithm in Web mining used to discover hubs and authorities.
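
To make the hub/authority idea concrete, here is a minimal sketch of the iterative scoring commonly used for hyperlink-induced topic search. The `hits` function name, the toy link graph, and the fixed iteration count are all assumptions made for illustration; they do not come from the flashcards.

```python
import math


def hits(links, iterations=50):
    """Score pages as hubs and authorities.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical toy representation of a Web graph).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]

        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}

        # Normalize both score vectors so the values stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}

    return hub, auth


# Assumed toy Web: two link collections pointing at three content pages.
toy_web = {
    "directory": ["paper_a", "paper_b", "paper_c"],
    "blog": ["paper_a", "paper_b"],
    "paper_a": [],
    "paper_b": [],
    "paper_c": [],
}
hub_scores, auth_scores = hits(toy_web)
```

In this toy graph the page with the most outgoing links to well-cited pages ends up as the strongest hub, and the pages cited by both link collections end up as the strongest authorities.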

polarity identification

The task of classifying the opinion in an opinionated piece of text as falling under one of two opposing sentiment polarities, or of locating its position on the continuum between these two polarities.

At the word/term level, two approaches are used:
1) Use a lexicon as a reference library.
2) Use a collection of training documents.
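
The lexicon approach (option 1) can be sketched as follows. The six-word `LEXICON` and the averaging rule are assumptions for illustration only; a real project would use a full resource such as SentiWordNet and a more careful scoring scheme.

```python
import re

# Hypothetical miniature lexicon: +1 for positive terms, -1 for negative.
LEXICON = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "poor": -1, "hate": -1,
}


def polarity(text):
    """Return a score in [-1, 1]: the mean polarity of the lexicon words found."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    if not scores:
        return 0.0  # no opinion words found, so treat the text as neutral
    return sum(scores) / len(scores)
```

A score near +1 or -1 places the text at one of the two poles, while values in between locate it on the continuum, for example `polarity("love the screen, hate the battery")` averages one positive and one negative term.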

polyseme

Also called homonyms: syntactically identical words (i.e., spelled exactly the same) with different meanings (e.g., bow can mean “to bend forward,” “the front of the ship,” “the weapon that shoots arrows,” or “a kind of tied ribbon”).

search engine

A program that finds and lists Web sites or pages (designated by URLs) that match some user-selected criteria.

sentiment analysis

The technique used to detect favorable and unfavorable opinions toward specific products and services using a large number of textual data sources (customer feedback in the form of Web postings).

SentiWordNet

An extension of WordNet used for sentiment identification.

singular value decomposition

A technique, closely related to principal components analysis, that reduces the overall dimensionality of the input matrix (number of input documents by number of extracted terms) to a lower-dimensional space, where each consecutive dimension represents the largest remaining degree of variability (between words and documents).
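
As a rough illustration of "the first dimension captures the most variability," the sketch below finds only the leading singular value and vectors of a small document-by-term matrix using power iteration. This is a simplified stand-in written in plain Python, not a full decomposition; the function name and the toy matrix are assumptions, and in practice a numerical library routine would be used instead.

```python
import math


def top_singular_triplet(matrix, iterations=200):
    """Approximate the largest singular value and its vectors.

    matrix: list of rows (documents) by columns (terms); assumed non-zero.
    Returns (u, sigma, v) with matrix @ v approximately equal to sigma * u.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # arbitrary unit start vector

    for _ in range(iterations):
        # av = A v
        av = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        # w = A^T (A v), one power-iteration step on A^T A
        w = [sum(matrix[i][j] * av[i] for i in range(n_rows))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    av = [sum(a * x for a, x in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return u, sigma, v


# Assumed toy counts: 3 documents (rows) by 3 terms (columns).
counts = [
    [2, 1, 0],
    [1, 2, 0],
    [0, 0, 1],
]
u, sigma, v = top_singular_triplet(counts)
# Each document's coordinate on the first reduced dimension.
doc_coords = [sigma * x for x in u]
```

Here the first two documents share terms and load heavily on the leading dimension, while the third document, which uses a different term, contributes almost nothing to it.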

social media analytics

The systematic and scientific way to consume the vast amount of content created by Web-based social media outlets, tools, and techniques for the betterment of an organization’s competitiveness.

social network analysis

(SNA) The mapping and measuring of relationships and information flows among people, groups, organizations, computers, and other information- or knowledge-processing entities. The nodes in the network are the people and groups, whereas the links show relationships or flows between the nodes.

spider

An application used to read through the content of a Web site automatically (Web Crawler).

stemming

A process of reducing words to their respective root forms in order to better represent them in a text mining project.
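
A deliberately naive sketch of the idea follows. It only strips a few assumed English suffixes and is not the Porter algorithm or any other published stemmer; the suffix list and the minimum stem length are illustrative choices.

```python
# Assumed suffix list, checked in order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix, keeping at least three leading characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["linked", "linking", "links", "link"]]
```

All four variants collapse to the same root, so they would be counted as one term in a text mining project.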

stop words

Words that are filtered out prior to or after processing of natural language data (i.e., text).
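
A minimal sketch of stop-word filtering, assuming a small hand-picked stop list and simple whitespace splitting (real stop lists are much longer and domain dependent):

```python
# Hypothetical stop list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


kept = remove_stop_words("The analysis of data in the Web")
```

Only the content-bearing words ("analysis", "data", "web") remain for later processing.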

term-document matrix

A frequency matrix created from digitized and organized documents (the corpus) where the columns represent the terms while rows represent the individual documents.
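
The layout described above (one row per document, one column per term) can be sketched as follows. The two-document corpus and the simple letters-only tokenizer are assumptions for illustration.

```python
import re
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, rows): rows[i][j] counts term j in document i."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    terms = sorted({t for tokens in tokenized for t in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[t] for t in terms])
    return terms, rows


# Assumed toy corpus.
corpus = [
    "Web mining finds patterns",
    "Text mining finds text patterns",
]
terms, rows = term_document_matrix(corpus)
```

Each row is the frequency profile of one document, which is the numeric form that downstream data mining algorithms operate on.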

text mining

The application of data mining to nonstructured or less structured text files. It entails the generation of meaningful numeric indices from the unstructured text and then processing those indices using various data mining algorithms.

tokenizing

Categorizing a block of text (token) according to the function it performs.

trend analysis

The collecting of information and attempting to spot a pattern, or trend, in the information.

voice of the customer (VOC)

Applications that focus on “who and how” questions by gathering and reporting direct feedback from site visitors, by benchmarking against other sites and offline channels, and by supporting predictive modeling of future visitor behavior.

Web analytics

The application of business analytics activities to Web-based processes, including e-commerce.

Web content mining

The extraction of useful information from Web pages.

Web crawler

An application used to read through the content of a Web site automatically (Spider).

Web mining

The discovery and analysis of interesting and useful information from the Web, about the Web, and usually through Web-based tools.

Web structure mining

The development of useful information from the links included in Web documents.

Web usage mining

The extraction of useful information from the data being generated through Web page visits, transactions, and so on.

WordNet

A popular general-purpose lexicon created at Princeton University.

Chapter 7 Flashcards

(30 cards)