Technical Architecture Flashcards

1
Q

Technical Architecture

A

describes the technologies in the various layers of the architecture

hence, probably better called technology architecture

2
Q

4 main layers of technical architecture

A
3
Q

how has the technical architecture become more complicated

A
  • Data variety, volume and velocity have increased significantly
  • increased need to include external data (customers, suppliers, etc.)
4
Q

Technologies in Data Integration Layer

A

Sophisticated data integration tool suites are used

  • Real-time or near real-time updates

A few years back, tooling started with ETL (Extract, Transform, Load)

  • Take data from SORs, transform it, load it into DW
  • Initially done daily, overnight, in batch mode
5
Q

ETL architecture

A

Uses Data Integration Server in Transform step

  1. ETL extracts data from source systems
  2. Transform it into BI schema using Data Integration processes running on a Data Integration Server
  3. Load it into Data Warehouse

(There are many ETL vendors about; beware simplistic promises from ETL vendors; real life is more complicated.)
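
The three steps above can be sketched roughly as follows. This is a minimal illustration, not a real tool: the source rows, the fact_orders table and the function names are all assumptions made for the example, an in-memory SQLite database stands in for the Data Warehouse, and plain Python stands in for the Data Integration Server.

```python
import sqlite3

# Hypothetical source rows, standing in for a System of Record (SOR).
SOURCE_ROWS = [
    {"order_id": 1, "customer": " alice ", "amount": "19.99"},
    {"order_id": 2, "customer": "BOB", "amount": "5.00"},
]


def extract():
    """Step 1: pull raw rows from the source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Step 2: reshape rows into the BI schema, outside the warehouse."""
    return [
        (r["order_id"], r["customer"].strip().title(), float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Step 3: write the already-transformed rows into the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform, load in that order and return the warehouse."""
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract()))
    return warehouse


etl_result = run_etl().execute(
    "SELECT customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

Note that the warehouse only ever sees cleaned rows; all transformation work happens before the load.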

6
Q

ELT architecture (not ETL)

A

Runs integration services on source or target

  1. Extract Data from Source Systems
  2. Load it into Data Warehouse
  3. After loading, transform it into BI schema using Data Integration processes
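
A rough sketch of the same idea in code, assuming an in-memory SQLite database as the target Data Warehouse. The stg_orders and fact_orders table names and the sample rows are made up for illustration. The point is only the ordering: raw data is loaded first, and the transform runs as SQL inside the target.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system.
RAW_ROWS = [(1, " alice ", "19.99"), (2, "BOB", "5.00")]


def run_elt():
    """Load raw rows first, then transform them inside the target database."""
    dw = sqlite3.connect(":memory:")

    # Steps 1 and 2: extract and load the data untouched into a staging table.
    dw.execute(
        "CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount TEXT)"
    )
    dw.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", RAW_ROWS)

    # Step 3: transform using the target's own SQL engine (no separate server).
    dw.execute(
        "CREATE TABLE fact_orders AS "
        "SELECT order_id, UPPER(TRIM(customer)) AS customer, "
        "CAST(amount AS REAL) AS amount FROM stg_orders"
    )
    dw.commit()
    return dw


elt_result = run_elt().execute(
    "SELECT customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

Because the transform is a SQL statement, it consumes the target database's CPU rather than that of a dedicated integration server.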
7
Q

ELT vs. ETL

A

Trade-offs in capability, performance vs cost, complexity

ETL

  • needs dedicated Data Integration (DI) server

ELT

  • Lower Total Cost of Ownership: No need for dedicated DI server
    • Uses integration services of databases at source or target
  • Less powerful DI capability
  • Performance penalties: uses the CPU of source or target
8
Q

Data sources layer

A
  • internal - from front office and back office
  • external - customers, partners, Facebook etc.
  • structured, unstructured, semi-structured
  • volume, velocity, variety
9
Q

another overview of technical architecture

A

Legend:

Information Access and Data Integration:

  • tools used to query, gather, integrate, cleanse and transform data into information

Data Warehousing

  • the “classic” DW + other databases where data has been transformed for analytics or integrated

Note Master Data Management in here

10
Q

Business Intelligence and Analytics

A

Online and Mobile reports

  • BI applications originally built by the IT Department, who produced reports for the business users
  • Eventually these reports were put online and onto mobile.

Dashboards and Ad-hoc Analysis

  • giving business people the tools to write and run their own queries

OLAP (Online Analytical Processing)

  • tools enabling users to analyze multidimensional data interactively from multiple perspectives
  • OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing
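
The three operations can be illustrated with a small sketch over an in-memory list of facts. The SALES data, its dimensions (year, region, product) and the helper names are assumptions made for this example; a real OLAP tool would work against a cube or database.

```python
from collections import defaultdict

# Hypothetical sales facts: three dimensions and one measure (revenue).
SALES = [
    {"year": 2023, "region": "EU", "product": "A", "revenue": 100},
    {"year": 2023, "region": "EU", "product": "B", "revenue": 50},
    {"year": 2023, "region": "US", "product": "A", "revenue": 70},
    {"year": 2024, "region": "EU", "product": "A", "revenue": 120},
    {"year": 2024, "region": "US", "product": "B", "revenue": 30},
]


def aggregate(facts, dims):
    """Sum revenue grouped by the given dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[d] for d in dims)] += fact["revenue"]
    return dict(totals)


def slice_facts(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dim] == value]


def dice_facts(facts, **allowed):
    """Dice: keep facts whose dimensions fall within the allowed value sets."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]


# Consolidation (roll-up): summarize to the coarsest level, per year.
rollup = aggregate(SALES, ["year"])
# Drill-down: break the same totals out by an extra dimension.
drilldown = aggregate(SALES, ["year", "region"])
# Slice: look only at the EU region.
eu_by_year = aggregate(slice_facts(SALES, "region", "EU"), ["year"])
# Dice: restrict several dimensions at once.
diced = aggregate(dice_facts(SALES, year={2023}, product={"A"}), ["region"])
```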

Excel

  • integration of spreadsheets to BI

Emerging Tools

  • Predictive Analytics, Data Discovery, Data Visualization, in-Memory Analytics, BI appliances, Big Data Analytics
11
Q

Describe

  • BI targets
  • data access APIs
  • integration services
  • integration applications
A

Targets

BI is no longer targeted at just business people; business processes and applications are increasingly important

Data Access Application Programming Interfaces (APIs)

often used in information access and data integration

Integration Services

for when BI applications may need to integrate and transform data to complete a business analysis

Integration Applications

the domain of the application developer, who may need to deploy one or more of the integration applications above

12
Q

Technology architecture - Databases

A
  • much unstructured data from sources such as emails, social media, medical records and legal documents
  • Internet of Things, with networked devices monitoring, measuring and transmitting data about all sorts of things, adds humongous amounts of data
  • the choice of data storage systems is much more complicated; factors include type of analytics, data capture, data integration and data storage
13
Q

Alternative technologies in the data layer

A

RDBMS still predominates, but there are alternatives used in particular parts of the architecture

  • OLAP databases
  • Massively Parallel Processing (MPP) databases
  • Data Virtualization
  • In-database analytics
  • In-memory analytics (e.g. SAP HANA)
  • Cloud-based BI, DW, or data integration
  • BI appliances
  • NoSQL databases
14
Q

MPP Databases + how we got there

A
  • CPU = Central Processing Unit
  • PU = Processing Unit
  • Core = the instruction execution components of the CPU
  • I/O = input/output

Initially a mainframe with one CPU, connected through an I/O subsystem to disks – a uniprocessor

Later added second CPU, sharing the operating system (which was modified to run across more than one CPU) – a multiprocessor.

  • The CPUs tended to be identical, so it was called a Symmetrical Multiprocessor.

Nowadays, computers have more than two PUs

In a cluster, there is a shared database and the servers in the cluster work together. They use some sort of heartbeat mechanism to detect whether another server has failed. If so, a surviving server may take over the workload of the failed server.
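
One way a heartbeat mechanism might work is sketched below. The HeartbeatMonitor class, the reassign helper and the timeout value are hypothetical; real cluster software is considerably more involved. Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Tracks when each server was last heard from (hypothetical sketch)."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def beat(self, server, now):
        """Record a heartbeat from a server at time `now` (in seconds)."""
        self.last_seen[server] = now

    def failed_servers(self, now):
        """Servers whose last heartbeat is older than the timeout."""
        return sorted(
            s for s, seen in self.last_seen.items() if now - seen > self.timeout
        )


def reassign(workloads, failed, survivor):
    """Move the workloads of failed servers onto a surviving server."""
    result = {s: list(w) for s, w in workloads.items() if s not in failed}
    for server in failed:
        result.setdefault(survivor, []).extend(workloads.get(server, []))
    return result


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.beat("server_a", now=0.0)
monitor.beat("server_b", now=0.0)
monitor.beat("server_a", now=5.0)  # server_b has gone quiet

failed = monitor.failed_servers(now=6.0)
after_failover = reassign(
    {"server_a": ["q1"], "server_b": ["q2", "q3"]}, failed, survivor="server_a"
)
```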

Massively Parallel Processing

  • each server operates independently
  • connected by a network
  • software splits processing and coordinates communication across servers
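
The split-and-coordinate idea can be sketched as follows. This is only an analogy: worker threads stand in for independent servers, and the function names are invented for the example. Each "node" computes a partial result on its own partition, and a coordinator combines the partials.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Split rows across nodes round-robin; each node owns its own slice."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def node_partial(rows):
    """Work done independently on one node: a local (count, sum)."""
    return len(rows), sum(rows)


def mpp_average(rows, n_nodes=4):
    """Coordinator: scatter the work, then gather and combine partial results."""
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(node_partial, parts))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count


average = mpp_average(list(range(1, 101)))
```

The average is computed from per-node counts and sums rather than per-node averages, so the combined answer is the same however the rows are partitioned.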
15
Q

Data Virtualisation

A

a.k.a. Enterprise Information Integration

Where an application can retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.

Data remains in place

  • unlike the traditional “ETL” process
  • real-time access is given to the source system for the data, reducing the risk of data errors and avoiding the workload of moving data around that may never be used

Abstraction techniques used

  • To resolve differences in source and consumer formats and semantics, various abstraction and transformation techniques are used. This concept and software is a subset of data integration.
  • Unlike a federated database system, it does not attempt to impose a single data model on the data (heterogeneous data).
  • The technology also supports the writing of transaction data updates back to the source systems.
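
A minimal sketch of the idea, assuming two made-up sources: a relational table (in-memory SQLite) and a CSV export. The VirtualCustomerView class and all field names are hypothetical. The consumer asks for a customer and gets one record back without knowing where or how each piece is stored, and nothing is copied into a warehouse.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a relational CRM database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, full_name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# Source 2 (hypothetical): a billing system that only exposes CSV text.
BILLING_CSV = "cust_id,balance\n1,10.50\n2,0.00\n"


class VirtualCustomerView:
    """Presents one logical record per customer; data stays in the sources."""

    def __init__(self, db, csv_text):
        self._db = db
        self._csv_text = csv_text

    def get(self, customer_id):
        """Query both sources at request time and reconcile their formats."""
        row = self._db.execute(
            "SELECT full_name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        balance = None
        for rec in csv.DictReader(io.StringIO(self._csv_text)):
            if int(rec["cust_id"]) == customer_id:
                balance = float(rec["balance"])
        return {"id": customer_id, "name": row[0], "balance": balance}


view = VirtualCustomerView(crm, BILLING_CSV)
record = view.get(1)
```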
16
Q

In-database analytics

A
  • Database vendors adding BI and analytics
  • Do analytics directly on the database
    • compute-intensive analytical processing is moved directly into a DW built on top of an analytical database
  • Reduces setup and data retrieval times
    • Faster analytics performance
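
The difference can be sketched by computing the same totals two ways. The sales table and function names are assumptions for illustration, and SQLite stands in for an analytical database. In the first version every row is retrieved and aggregated in the application; in the second the aggregation runs inside the database and only the results are returned.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 100.0), ("EU", 50.0), ("US", 70.0)],
)


def totals_client_side(conn):
    """Retrieve every row, then aggregate in the application."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Push the aggregation into the database; only the results travel."""
    return dict(
        conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    )


client_totals = totals_client_side(db)
database_totals = totals_in_database(db)
```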
17
Q

In-memory analytics

A
  • Enabled by 64-bit architectures
    • Allow for 16 exabytes of addressable memory
  • Hold most or all of database in memory
    • Rather than on slower disk
    • Balance of cost vs speed
  • SAP HANA is a fully in-memory solution
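
The 16 exabyte figure follows from simple arithmetic on the address width, counting in binary units (1 exbibyte = 2^60 bytes). This is the theoretical limit of a 64-bit address; actual processors typically implement fewer address bits. The variable names below are just for illustration.

```python
ADDRESS_BITS = 64
EXBIBYTE = 2 ** 60  # bytes in one exbibyte (binary exabyte)
GIBIBYTE = 2 ** 30  # bytes in one gibibyte (binary gigabyte)

# Each distinct address names one byte, so 64 bits can name 2**64 bytes.
addressable_bytes = 2 ** ADDRESS_BITS
addressable_eib = addressable_bytes // EXBIBYTE

# For comparison, a 32-bit address reaches only 4 GiB.
addressable_gib_32bit = (2 ** 32) // GIBIBYTE
```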
18
Q

Cloud-based BI / DW / Data Integration

A

  • Cloud vendors provide, manage shared resources
  • On-demand, fast provisioning and de-provisioning
  • Apparently unlimited resources available as needed
  • Flexible pricing; good security

19
Q

BI Appliances: Data Warehouse Appliances

A
  • Designed for high performance big data analytics
  • Delivered as an easy-to-use packaged solution
  • Hardware and Software
    • integrated set of servers, storage, OS, and DBMS
    • Example: IBM Netezza
20
Q

Why use NoSQL Databases

A
  • Massive sizes of data
  • Ease of programming
    • MapReduce, Spark etc.

NoSQL (not only SQL) – distributed databases, with “eventual consistency” and a different programming model
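
To give a feel for the MapReduce programming model mentioned above, here is a single-process word count sketch. The function names are assumptions, and a real framework would run the map and reduce phases in parallel across many servers rather than in one Python process.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (key, value) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce over a collection of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data", "Big analytics", "data data"])
```

The programmer writes only the map and reduce steps; splitting the input and grouping by key are what the framework takes care of at scale.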

21
Q

NoSQL database categories

A

Four Categories

Why do we have this relatively new class of databases?

  1. A NoSQL data model may make application programming easier
  2. To handle very large amounts of data; many NoSQL databases are a better fit for Big Data

Most of businesses’ valuable data is stored in Relational Databases

22
Q

Product Architecture

A

defines

  1. the products
  2. their configurations
  3. how they implement the technology requirements of the BI architecture
  • Decide what the business wants
    • Power users – go deep into analytics
    • Managers, executives – need a higher level view to support management decisions
    • Operational users – information and analytics to support day-to-day operations
  • Offer a portfolio of analytical styles
    • the data and technology architectures also lead to product selection
  • Build the product portfolio iteratively
    • Add or change based on changing needs
    • Will return to this in a later lecture
23
Q

How do we define requirements and priorities in BI architecture?

How do we create and implement stuff in BI architecture?

A

define requirements and priorities top-down

create and implement bottom-up