Wk3:Chap3 - Data management, big data analytics, and records management Flashcards

1
Q

Databases?

A
  • Collections of data sets or records stored in a systematic way.
  • Store data generated by business apps, sensors, operations, and transaction-processing systems (TPS).
2
Q

Data Warehouses

A

Integrate data from multiple databases and data silos, and organize them for complex analysis, knowledge discovery, and decision support.

3
Q

Data Marts

A
  • Small-scale data warehouses that support a single function or one department.
  • Enterprises that cannot afford to invest in data warehousing may start with one or more data marts.
4
Q

Business intelligence (BI)

A

Tools and techniques that process data and conduct statistical analysis for insight and discovery.

5
Q

Database Management System (DBMS)

A
  • Integrates with data collection systems such as TPS and business applications.
  • Stores data in an organized way.
  • Provides facilities for accessing and managing data.
6
Q

Relational Database Management System (RDBMS)

A

Provides access to data using a declarative language.

7
Q

Declarative Language

A
  • Simplifies data access by requiring that users specify only what data they want, without defining how it will be retrieved.
  • Structured Query Language (SQL) is an example of a declarative language:
    SELECT column_name(s)
    FROM table_name
    WHERE condition
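
A minimal sketch of the SELECT pattern above, run through Python's built-in sqlite3 module. The table `customers`, its columns, and the sample rows are hypothetical and exist only to illustrate the idea.

```python
import sqlite3

def customers_in_city(city):
    """Return names of customers in `city`; the DBMS decides how to find them."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("Ana", "Lima"), ("Ben", "Oslo"), ("Cy", "Lima")],  # hypothetical rows
    )
    # Declarative: the query states WHAT is wanted (names where city matches),
    # not HOW the table should be scanned or indexed.
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]
```

Calling `customers_in_city("Lima")` returns the matching names without the caller writing any loop over the table.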
8
Q

DBMS Functions

A
  • Data filtering and profiling: checks for errors, inconsistencies, and redundancies
  • Data integrity and maintenance: keeps data consistent
  • Data synchronization: integrates data from different sources
  • Data security: protects data integrity over time
  • Data access: provides authorised access
9
Q

Latency

A

The delay or time elapsed between when data is created and when it is available for reporting.

10
Q

Online Transaction Processing (OLTP)

A
  • DBMSs record and process transactions and support queries.
  • Designed to manage transaction data, which are volatile, and to break down complex information into simpler data tables, striking a balance between transaction-processing efficiency and query efficiency.
11
Q

Online Analytical Processing (OLAP)

A
  • A means of organizing large business databases.
  • Divided into one or more cubes that fit the way business is conducted.

12
Q

Dirty Data

A
  • Lacks integrity/validation and reduces user trust.
  • Incomplete, out of context, outdated, inaccurate, inaccessible, or overwhelming.
  • Need for integrity checks
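
A rough sketch of what a simple integrity check might look like, covering two of the symptoms listed above (incomplete and outdated records). The field names `email` and `updated` and the 90-day threshold are assumptions made for illustration.

```python
from datetime import date

def find_dirty(records, today, max_age_days=90):
    """Return (index, reason) pairs for records that fail simple checks."""
    problems = []
    for i, rec in enumerate(records):
        if not rec.get("email"):
            # Missing or empty required field (hypothetical field name)
            problems.append((i, "incomplete"))
        elif (today - rec["updated"]).days > max_age_days:
            # Last update is older than the allowed age
            problems.append((i, "outdated"))
    return problems
```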
13
Q

Data Life Cycle: Model illustrating how data travels throughout an organisation

A
  1. Principle of Diminishing Data Value
    - The value of data diminishes as they age.
    - Blind spots (lack of data availability) of 30 days or longer inhibit peak performance.
    - Global financial services institutions rely on near-real-time data for peak performance.
  2. Principle of 90/90 Data Use
    - As much as 90 percent of data is seldom accessed after 90 days (except for auditing purposes).
    - Roughly 90 percent of data lose most of their value after 3 months.
  3. Principle of data in context
    - The capability to capture, process, format, and distribute data in near real time or faster requires a huge investment in data architecture.
    - The investment can be justified on the principle that data must be integrated, processed, analyzed, and formatted into “actionable information.”
14
Q

Master Reference File and Data Entities:

A

As data volumes explode, database performance degrades.
Solution = Master Data and Master Data Management (see chapter 2)
MDM processes integrate data from a variety of sources to create a more complete view of an entity.

15
Q

Market share

A

Percentage of total sales in a market captured by a brand, product, or company.
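
As a worked illustration of the definition, a small sketch of the arithmetic. The function name and the sample figures are hypothetical.

```python
def market_share(company_sales, total_market_sales):
    """Percentage of total market sales captured by one company."""
    if total_market_sales <= 0:
        raise ValueError("total market sales must be positive")
    return 100.0 * company_sales / total_market_sales
```

For example, sales of 25 in a market of 200 give `market_share(25, 200)`, a share of 12.5 percent.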

16
Q

Operating Margin

A
  • A measure of the percent of a company’s revenue left over after paying variable costs: wages, raw materials, etc.
  • Increased margins mean earning more per dollar of sales.
  • The higher the operating margin, the better.
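
A small sketch of the calculation, following this card's wording (revenue left over after variable costs). Other definitions subtract all operating expenses, so treat the formula below as an assumption; the function name and figures are hypothetical.

```python
def operating_margin(revenue, variable_costs):
    """Percent of revenue remaining after variable costs (per the card's definition)."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return 100.0 * (revenue - variable_costs) / revenue
```

With revenue of 200 and variable costs of 150, the margin is 25 percent, meaning 25 cents earned per dollar of sales.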
17
Q

Enterprise data warehouses (EDW)

A
  • Data warehouses that pull together data from disparate sources and databases across an entire enterprise.
  • Warehouses are the primary source of cleansed data for analysis, reporting, and Business Intelligence (BI).
  • Their high costs can be offset by using data marts.
18
Q

3 Procedures to Prepare EDW Data for Analytics

A
  • Extract from designated databases.
  • Transform by standardizing formats, cleaning the data, and integrating them.
  • Load into a data warehouse.
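
The three procedures above can be sketched roughly as follows. This is an illustration only: a Python list stands in for the source database, an in-memory SQLite table stands in for the warehouse, and the `sales` table and its `name`/`amount` fields are hypothetical.

```python
import sqlite3

def extract(source_rows):
    """Extract: copy rows out of the designated source."""
    return list(source_rows)

def transform(rows):
    """Transform: standardize formats and drop rows that fail cleaning."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()  # standardize name format
        if not name:
            continue  # cleaning: skip rows with no usable name
        cleaned.append((name, round(float(row["amount"]), 2)))
    return cleaned

def load(rows, conn):
    """Load: write transformed rows into the warehouse table; return row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

A run would chain the steps: `load(transform(extract(source)), sqlite3.connect(":memory:"))`.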
19
Q

Change Data Capture (CDC)

A

Minimises the resources required by ETL by focusing primarily on data changes.
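
One way to picture the idea is to compare two snapshots and pass along only the differences. This is a simplified sketch under that assumption; production CDC tools often read the database's change log instead of diffing snapshots. The function name and data shapes are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by record id and return only what changed."""
    inserted = {k: v for k, v in current.items() if k not in previous}
    updated = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    deleted = [k for k in previous if k not in current]
    return {"inserted": inserted, "updated": updated, "deleted": deleted}
```

Only the returned changes would then need to go through transform and load, rather than the full data set.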

20
Q

Active Data Warehouse (ADW)

A
  • Real-time data warehousing and analytics.
  • Transform by standardizing formats, cleaning the data, integration.

21
Q

Hadoop

A

An Apache processing platform that places no conditions on the structure of the data it processes.
Distributes computing problems across a number of servers.

22
Q

MapReduce

A

Provides a reliable, fault-tolerant software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware.

23
Q

Map stage

A

Breaks up huge data sets into subsets, then distributes them across several servers for processing.

24
Q

Reduce stage

A

Recombines partial results and makes them available to analytical tools.
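
The map and reduce stages can be illustrated with a word count in plain Python. This sketch does not use the Hadoop API; each "server" is simulated by a list slice, and all function names are hypothetical.

```python
from collections import Counter

def split_into_subsets(lines, n_workers):
    """Break the input into n_workers subsets (round-robin)."""
    return [lines[i::n_workers] for i in range(n_workers)]

def map_stage(subset):
    """Each worker counts the words in its own subset."""
    counts = Counter()
    for line in subset:
        counts.update(line.lower().split())
    return counts

def reduce_stage(partials):
    """Recombine the partial counts into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts word by word
    return total
```

In a real cluster the `map_stage` calls would run in parallel on separate nodes; here they would simply run one after another.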

25
Q

Data Mining

A

Software that enables users to analyze data from various dimensions or angles, categorize it, and find correlative patterns among fields in the data warehouse.

26
Q

Text Mining

A

Broad category involving interpreting words and concepts in context (e.g., how could we track what is said about my company?).

27
Q

Sentiment Analysis

A

Trying to understand consumer intent.

28
Q

Text Analytics (Mining) Procedure

A

Exploration

  • Simple word counts
  • Topic consolidation

Preprocessing

  • Standardization
  • May be 80% of processing time
  • Grammar and spell checking

Categorizing and Modelling

  • Create business rules and train models for accuracy and precision
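
The preprocessing and exploration steps can be sketched as below: text is standardized (lowercased, punctuation dropped) and then simple word counts are taken. The sample function names and the regular expression are assumptions for illustration; real preprocessing would also include the spell and grammar checking mentioned above.

```python
import re
from collections import Counter

def preprocess(text):
    """Standardize: lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())

def explore(docs, top_n=3):
    """Exploration: simple word counts across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts.most_common(top_n)
```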