Intro to Big Data Final Flashcards

(55 cards)

1
Q

What is business intelligence?

A

An umbrella term that combines architectures, tools, databases, analytical tools, applications and methodologies.

2
Q

What is the major objective of business intelligence?

A

To enable interactive access to data, sometimes in real time, and to give business managers and analysts the ability to conduct appropriate analyses

3
Q

What is the process of BI based upon?

A

The transformation of data into information, then into decisions, and finally into actions

4
Q

Who came up with the term Business Intelligence, and when?

A

Gartner Group in the mid-1990s

5
Q

What are the four major components of a BI system?

A

A DW
Business analytics
BPM
User interface / dashboard

6
Q

What legislation requires business leaders to document their business processes and sign off on their legitimacy?

A

Sarbanes-Oxley Act

7
Q

What is BI not?

A

Transaction processing

8
Q

What is OLTP?

A

Online transaction processing: a system that handles a company’s routine, ongoing business. These systems store SCM and CRM data.

9
Q

What is OLAP?

A

Online analytical processing: systems that analyze data drawn from a data warehouse (DW)

10
Q

What is BAM?

A

Business activity management, also commonly called business activity monitoring

11
Q

What are shells?

A

Preprogrammed tools where all you have to do is insert your numbers

12
Q

What is the definition of analytics?

A

The process of developing actionable decisions or recommendations for actions based on insights generated from historical data

13
Q

What are the three levels of Business Analytics?

A

Descriptive, Predictive, Prescriptive

14
Q

What are descriptive analytics?

A

Reporting analytics, knowing what is happening in the organization and understanding some underlying trends and causes of such occurrences

15
Q

What are predictive analytics?

A

They aim to determine what is likely to happen in the future

16
Q

What are prescriptive analytics?

A

The goal is to recognize what is going on, as well as the likely forecast, and to make decisions that achieve the best performance possible (also known as decision or normative analytics)

17
Q

What is a data warehouse?

A

DW is a pool of data produced to support decision making; it is also a repository of current and historical data of potential interest to managers throughout the organization

18
Q

What is teradata?

A

A data warehousing company whose name symbolizes the ability to manage terabytes (trillions of bytes) of data

19
Q

What are the characteristics of Data Warehousing?

A

Subject oriented (comprehensive view of the organization)
Integrated
Time variant (time series)
Nonvolatile

20
Q

What is a data mart?

A

A smaller version of a DW that focuses on a particular subject or department

21
Q

What is a dependent data mart?

A

A data mart created directly from the data warehouse

22
Q

What is an independent data mart?

A

A small warehouse designed for a strategic business unit or a department, whose source is not an EDW

23
Q

What is an Operational Data Store?

A

Provides a fairly recent form of customer information file
Updated throughout the course of business operations
Used for short-term decisions

24
Q

What are oper marts?

A

Created when operational data needs to be analyzed multidimensionally

25
What is EDW?
Enterprise data warehouse: a large-scale data warehouse that is used across the enterprise for decision support
26
What is Metadata?
Data about data. It describes the structure of, and some meaning about, the data, and is usually classified as either technical or business metadata.
27
What are the text mining techniques?
1. Term frequency-inverse document frequency (TF-IDF)
2. Named entity recognition (NER)
3. Topic modeling
4. Event extraction
28
What is TF-IDF?
Term frequency-inverse document frequency. It weighs how frequently a word appears in a document against how common that word is across the whole set of documents. Used to build classifiers or predictive models.
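
The weighting described in this card can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes tf = term count / document length and idf = log(N / document frequency), which is only one of several common variants, and the function name `tf_idf` and the sample documents are made up for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumption: tf = count / document length, idf = log(N / df).
    Other smoothing and weighting variants exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical sample corpus.
docs = [
    "big data needs big storage",
    "data mining finds patterns",
    "data warehouse stores data",
]
scores = tf_idf(docs)
```

Because "data" occurs in every sample document, its idf is log(1) = 0 and it scores zero everywhere, while "big" is frequent in the first document and absent elsewhere, so it scores highest there.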
29
What is NER?
Named entity recognition. It recognizes nouns and can be used to extract persons, organizations, locations, dates, and monetary amounts.
30
What is topic modeling?
Identifies dominant themes in a vast array of documents
31
What is Latent Dirichlet Allocation?
A topic modeling method that treats each document as a mixture of topics and automatically clusters words into those topics
32
What is probabilistic latent semantic indexing?
A method that models the probability of words and documents co-occurring
33
What is event extraction?
A step beyond NER, and harder. It looks at the relationships between nouns and at the kinds of inferences that can be made from incidents in the text.
34
What is the text mining process?
1. Establish the corpus: collect and organize the domain-specific unstructured data
2. Create the term-document matrix: introduce structure to the corpus
3. Extract knowledge: discover novel patterns from the term-document matrix
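
As a rough sketch of step 2, the Python below turns a tiny corpus into a term-document matrix of raw counts. It assumes rows are documents and columns are terms, and it uses plain whitespace tokenization; the function name `term_document_matrix` and the sample corpus are hypothetical. The column totals at the end stand in for a very simple form of step 3.

```python
from collections import Counter

def term_document_matrix(corpus):
    """Build a count matrix: one row per document, one column per term."""
    tokenized = [doc.lower().split() for doc in corpus]
    # Sorted vocabulary gives every term a stable column position.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

# Step 1 (establish the corpus), with a made-up two-document corpus.
corpus = ["data mining finds patterns", "text mining mines text"]

# Step 2 (create the term-document matrix).
vocabulary, matrix = term_document_matrix(corpus)

# Step 3 (a very simple pattern): total frequency of each term.
totals = [sum(column) for column in zip(*matrix)]
```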
35
What is Web Usage Mining?
The extraction of information from data generated through web page visits and transactions
36
What is the goal of sentiment analysis?
To answer the question "What do people feel about a certain topic?"
37
What are the characteristics of Big Data?
Volume
Variety
Velocity
Variability
Veracity
Value
38
What is Hadoop?
An open source framework for storing and analyzing massive amounts of distributed, unstructured data
39
What are the Big Data core technologies?
MapReduce + Hadoop
40
What is MapReduce?
A programming model that distributes processing of very large multi-structured data files across a large cluster of ordinary machines/processors. Developed and popularized by Google.
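
The programming model can be illustrated with the classic word count, simulated here in a single Python process. This is only a sketch of the map, shuffle, and reduce steps; a real framework such as Hadoop would run them in parallel across many machines. The function names and sample documents are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Hypothetical input; each document could live on a different machine.
documents = ["big data big cluster", "data cluster"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```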
41
What are data mining characteristics?
1. The source of data for DM is often a consolidated data warehouse
2. The DM environment is usually a client-server or a web-based information systems architecture
3. Data for DM may include soft/unstructured data
4. The miner is often an end user
42
What are the most common standard processes for data mining?
1. CRISP-DM
2. SEMMA
43
What is CRISP-DM?
Cross-Industry Standard Process for Data Mining
44
What is SEMMA?
Sample, Explore, Modify, Model, and Assess
45
What are the phases for data mining?
1. Define the problem
2. Identify required data
3. Prepare and pre-process
4. Model the data
5. Train and test
6. Verify and deploy
46
What is simple split?
Splitting the data into two mutually exclusive sets: training (~70%) and testing (~30%)
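
A simple split might look like the following Python sketch. The 70/30 ratio comes from the card; shuffling before the cut and fixing a random seed are assumptions added here so the example is unbiased and repeatable, and the name `simple_split` is hypothetical.

```python
import random

def simple_split(records, train_fraction=0.7, seed=42):
    """Shuffle the records, then cut them into training and testing sets."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of 100 record IDs.
train, test = simple_split(range(100))
```

Every record lands in exactly one of the two sets, which is what makes them mutually exclusive.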
47
What are the data mining methods?
1. Classification
2. Regression
3. Clustering
4. Association rule mining
48
What are the types of Business Reporting?
1. Metric management reports
2. Visualizations/dashboards
3. Balanced scorecards
49
What does a balanced scorecard do?
Translates an organization's financial, customer, internal process, and learning and growth objectives into a set of actionable initiatives.
50
What is the definition of data?
A collection of facts usually obtained as a result of experiences, observations, or experiments
51
What are inferential statistics?
Drawing inferences about the population based on sample data
52
What is a histogram?
A frequency chart
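
To make "frequency chart" concrete, the Python sketch below counts how many values fall into each fixed-width bin, which is the table a histogram draws as bars. The bin width, the sample scores, and the function name `histogram` are assumptions chosen for illustration.

```python
def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width.

    Returns {bin_start: count}, ordered by bin_start.
    """
    counts = {}
    for value in values:
        # Floor division maps a value to the lower edge of its bin.
        bin_start = (value // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical exam scores, grouped into bins of width 10.
exam_scores = [52, 67, 71, 75, 78, 83, 85, 88, 91, 96]
freq = histogram(exam_scores, 10)

# Text rendering: one '#' per observation in each bin.
for bin_start, count in freq.items():
    print(f"{bin_start}-{bin_start + 9}: {'#' * count}")
```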
53
What is Kurtosis?
A measure of how peaked (tall and skinny) a distribution is
54
What is skewness?
Measure of asymmetry
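
Both shape measures, skewness and kurtosis, can be computed from central moments, as in the Python sketch below. It assumes the population (biased) moment formulas, with skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3; statistical packages often apply sample-size corrections, so their results can differ slightly. The function names and sample data are made up for the example.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Asymmetry: 0 if symmetric, positive if the right tail is longer."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Peakedness relative to a normal distribution, which scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical samples.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

sym_skew = skewness(symmetric)
right_skew = skewness(right_skewed)
sym_kurt = excess_kurtosis(symmetric)
```

The symmetric sample has zero skewness, and the sample with one large outlier on the right has positive skewness.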
55
What is regression?
A part of inferential statistics used to characterize the relationship between explanatory (input) and response (output) variables
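
As a small illustration, the Python sketch below fits a straight line with one explanatory variable using ordinary least squares. Simple linear regression is just one kind of regression, chosen here as an assumption for brevity; the function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums, without the 1/n factor.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]   # explanatory (input) variable
ys = [3, 5, 7, 9]   # response (output) variable
slope, intercept = fit_line(xs, ys)

# Use the fitted line to predict the response for a new input.
predicted = slope * 5 + intercept
```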