L5 - Information-oriented System Integration Flashcards
What is the goal of the lecture?
To give an overview of information-oriented system integration, its requirements, and the main existing solutions.
What are the three existing IoSI solutions for enterprise system integration?
Data Warehousing
Data Federation
Data Replication
What is metadata?
Metadata is data about data (information about the actual data).
What is a schema in databases?
A schema is a blueprint that defines the structure of database components such as tables, attributes, and data types.
What is a schema in XML?
A schema is a set of rules defining allowed elements, attributes, structures, and data types in an XML document.
What are data types in databases?
Data types define the type of data that can be stored in a column, such as char, varchar, int, tinyint, float, decimal, and money.
What is latency?
Latency is the time delay between data being sent from a source and arriving at its destination.
What is Information-oriented System Integration (IoSI)?
IoSI is an integration approach based on the exchange of simple data between systems, usually between databases.
Why is IoSI needed?
Enterprises process large amounts of data from various sources, which may use different tools, technologies, and formats, making integration necessary.
What are the advantages of IoSI?
Easy to understand and develop
Standardized API support, such as ODBC
What are the disadvantages of IoSI?
Deals only with data sharing
Does not handle business logic, states, or behaviors
Not sufficient for functional or behavior integration
What is the first requirement for IoSI?
Represent data in a canonical format, such as XML, to ensure interoperability.
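A minimal sketch of the canonical-format idea, assuming a flat record and a hypothetical <customer> element as the shared representation. The function names and the element layout are illustrative assumptions, not part of the lecture material.

```python
import xml.etree.ElementTree as ET

def to_canonical_xml(record: dict) -> str:
    """Serialize a flat record into a hypothetical canonical <customer> element."""
    root = ET.Element("customer")
    # Sort the fields so every system emits them in the same order.
    for field, value in sorted(record.items()):
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_canonical_xml(xml_text: str) -> dict:
    """Parse the canonical element back into a flat dict of strings."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

xml_doc = to_canonical_xml({"phone": "555-0100", "name": "Ada Lovelace"})
print(xml_doc)
print(from_canonical_xml(xml_doc))
```

Any system that can read and write this one format can exchange records with every other system without pairwise converters.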
What is the second requirement for IoSI?
Describe the exchanged data with metadata so that systems can interpret it and interoperate efficiently.
What is the third requirement for IoSI?
Facilitate data integration in real time.
What are the three data latency levels in IoSI?
Real-time integration
Near real-time integration
Batch processing (non-real-time)
What is a schema conflict in data integration?
A schema conflict occurs when tables containing the same concept are structured differently, e.g., one table includes an “email” attribute while another does not.
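One possible way to reconcile a schema conflict is to give every row the union of all attributes and mark the missing ones explicitly. The sketch below assumes rows are plain dicts; align_schemas and the sample data are hypothetical.

```python
def align_schemas(*tables):
    """Give every row the union of all columns, filling absent attributes with None."""
    rows = [row for table in tables for row in table]
    columns = sorted({column for row in rows for column in row})
    return [{column: row.get(column) for column in columns} for row in rows]

customers_a = [{"id": 1, "name": "Ada Lovelace", "email": "ada@example.com"}]
customers_b = [{"id": 2, "name": "Alan Turing"}]  # this source has no email attribute

merged = align_schemas(customers_a, customers_b)
for row in merged:
    print(row)
```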
What is a naming conflict in data integration?
A naming conflict occurs when different terms are used for the same metadata, such as “telephone” vs. “phone_number” vs. “TeleF”.
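A naming conflict can often be handled with a lookup table that maps each source-specific name to one agreed name. The sketch below assumes "phone_number" is chosen as the canonical name; the mapping and function are illustrative.

```python
# Hypothetical mapping from source-specific attribute names to one canonical name.
CANONICAL_NAMES = {
    "telephone": "phone_number",
    "phone_number": "phone_number",
    "TeleF": "phone_number",
}

def normalize_names(row: dict) -> dict:
    """Rename known synonyms to the canonical attribute name; keep other keys as they are."""
    return {CANONICAL_NAMES.get(key, key): value for key, value in row.items()}

rows = [
    {"id": 1, "telephone": "555-0100"},
    {"id": 2, "TeleF": "555-0101"},
]
normalized = [normalize_names(row) for row in rows]
print(normalized)
```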
What is a data representation conflict in data integration?
A data representation conflict occurs when different formats are used for the same data, such as “Donald Trump” vs. “D. Trump”.
What is a precision conflict in data integration?
A precision conflict occurs when different data types are used for the same data, such as Double vs. Money vs. Float vs. Int.
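To illustrate why mixed numeric types matter, the sketch below converts values that arrive as int, float, or string into one fixed-precision type. Choosing two decimal places and half-up rounding is an assumption made for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_money(value) -> Decimal:
    """Convert an int, float, or numeric string to a Decimal with two fractional digits."""
    # Going through str() avoids carrying a float's binary rounding error into the Decimal.
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

prices = [19, 19.999, "19.5"]  # the same kind of value stored as Int, Float, and text
unified = [to_money(price) for price in prices]
print(unified)

# Binary floats cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)
```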
What is data warehousing?
A batch information integration technique that extracts, transforms, and loads data from multiple databases into a single repository.
What is data federation?
A real-time information integration technique that allows querying disparate databases through a unified schema without copying data.
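A rough sketch of the federation idea, using two in-memory SQLite databases as stand-ins for disparate sources. The table names, the unified (name, phone) schema, and federated_customers are assumptions for illustration; real federation products do far more (query planning, pushdown, security).

```python
import sqlite3

def make_source(create_sql, insert_sql, rows):
    """Create an in-memory database that stands in for one independent source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_sql)
    conn.executemany(insert_sql, rows)
    return conn

crm = make_source("CREATE TABLE clients (full_name TEXT, telephone TEXT)",
                  "INSERT INTO clients VALUES (?, ?)",
                  [("Ada Lovelace", "555-0100")])
billing = make_source("CREATE TABLE customers (name TEXT, phone_number TEXT)",
                      "INSERT INTO customers VALUES (?, ?)",
                      [("Alan Turing", "555-0101")])

# Each entry maps a source's local columns onto the unified (name, phone) schema.
SOURCES = [
    (crm, "SELECT full_name, telephone FROM clients"),
    (billing, "SELECT name, phone_number FROM customers"),
]

def federated_customers():
    """Query every source at call time; no data is copied into a central store."""
    results = []
    for conn, query in SOURCES:
        results.extend(conn.execute(query).fetchall())
    return sorted(results)  # sorted only to make the output deterministic

print(federated_customers())
```

Because the sources are queried on every call, a row inserted into either database shows up in the next federated query without any copy step.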
What is data replication?
A near-real-time integration technique where data is copied between databases at specific intervals.
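The copy step of replication might look like the sketch below, again with in-memory SQLite databases as stand-ins. replicate_once and the customers table are hypothetical, and this simple version only inserts and updates rows; it does not propagate deletes.

```python
import sqlite3

def replicate_once(source, replica):
    """Copy all rows from source.customers into replica.customers; return the number copied."""
    rows = source.execute("SELECT id, name FROM customers").fetchall()
    with replica:  # commit the whole batch as one transaction
        replica.executemany(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)", rows
        )
    return len(rows)

source = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
for conn in (source, replica):
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

source.execute("INSERT INTO customers VALUES (1, 'Ada Lovelace')")
copied = replicate_once(source, replica)
print(copied, replica.execute("SELECT id, name FROM customers").fetchall())

# A scheduler would call replicate_once at a fixed interval (for example, once a minute).
# Between two runs the replica lags behind the source, which is the "near" in near real-time.
```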
What are the key features of data warehousing?
Extracts, transforms, and loads data (ETL).
Stores data in a single database.
Uses dimensional data models with fact and dimension tables.
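The three features above can be sketched as a tiny ETL run into one dimension table and one fact table. The source rows, the table names dim_customer and fact_sales, and the cleaning rules are all assumptions chosen for the example.

```python
import sqlite3
from decimal import Decimal

# Extract: rows as they might arrive from two hypothetical operational systems.
orders_eu = [{"customer": "ada lovelace", "amount": "19.50"}]
orders_us = [{"customer": "Alan Turing ", "amount": "5"},
             {"customer": "Ada Lovelace", "amount": "0.50"}]

def transform(row):
    """Clean the customer name and convert the amount to integer cents."""
    name = row["customer"].strip().title()
    cents = int(Decimal(row["amount"]) * 100)
    return name, cents

# Load: a single repository with one dimension table and one fact table.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE fact_sales (
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        amount_cents INTEGER
    );
""")

for row in orders_eu + orders_us:
    name, cents = transform(row)
    # Insert the customer into the dimension only if it is not there yet.
    warehouse.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (name,))
    (customer_id,) = warehouse.execute(
        "SELECT customer_id FROM dim_customer WHERE name = ?", (name,)
    ).fetchone()
    warehouse.execute("INSERT INTO fact_sales VALUES (?, ?)", (customer_id, cents))

totals = warehouse.execute("""
    SELECT d.name, SUM(f.amount_cents)
    FROM fact_sales AS f JOIN dim_customer AS d ON d.customer_id = f.customer_id
    GROUP BY d.name ORDER BY d.name
""").fetchall()
print(totals)
```

The fact table holds the measurements (sales amounts), while the dimension table holds the descriptive context (who bought), so analytical queries join the two.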
What are the advantages of data warehousing?
Standardized ETL services make it easy to implement and maintain.
Low cost and reliable, because it integrates data in one place.