Get Data Flashcards

(68 cards)

1
Q

You have 8 processes running with many different Schedules and Data Jobs. What is a good way to get a full overview of your Data Pipeline's health?

A. Set up alerts on all Data Jobs
B. Enable Custom Monitoring and work with Analyses
C. Subscribe to all your Data Models
D. Regularly check all logs for all of your processes

A

B

2
Q

A business expert tells you the number of cases in your Data Model is incorrect. What can you do? Select 2 correct answers.

A. Add a DISTINCT to all of your SELECT statements in transformations
B. Check the number of cases on the database
C. Check the foreign key links of your Data Model
D. Check filters and joins in your Data Job tasks

A

B
D

3
Q

A business expert asks you how a specific activity was defined. What information should you provide? Select 2 answers.

A. The Data Job and Schedule used
B. The timestamp used
C. The filters/joins applied
D. The parameters used

A

B
C

4
Q

A business expert reports to you that the Total Net Value of all cases seems off. What is typically the primary source of errors here?

A. Wrong Data Model Calendars
B. Wrong joins and filters
C. Wrong Currency Conversions
D. Wrong activity timestamps

A

C

5
Q

You've set up a Data Pipeline using both Data Jobs and the Replication Cockpit. What can you do to get notified if anything fails? Select 3 correct answers.

A. Set up Data Job alerts
B. Set up Replication alerts
C. Set up a regular export of the Monitor Dashboard to be sent to your email on a daily basis
D. Subscribe to your Data Model(s)

A

A
B
D

6
Q

You have three instances of the same source system running the same process in different countries. What steps would you take to bring the process data into a Data Model?

A. Create 3 Data Pools each with a Data Model, and bring the Data Models together in an Analysis
B. Create 3 Data Jobs and use templates to re-use tasks across them.
C. Create a Global Data Job to merge the Data Model tables
D. Share one Data Connection across three Data Pools

A

C

7
Q

One of your source systems contains information on 2 separate processes that you do not want correlated. How could you set up your Data Jobs to minimize system load and keep your workload to a minimum?

A. With 2 Data Jobs, each containing extractions and transformations for the separate processes
B. With 1 Data Job for all extractions and 1 per process for all table creations
C. With 1 Data Job for all extractions and transformations
D. With 1 Global Data Job handling all extractions and transformations

A

B

8
Q

You've set up multiple Activity and Case tables in your Data Model. What happens when you turn on "Eventlog automerge" and load your Data Model?

A. The Data Model automatically finds and adds foreign key relationships to your Data Model and merges all tables
B. All Activity tables are merged into the default Activity Table
C. The separate Activity tables are merged and all the different Case IDs are kept in the merged table
D. The Activity and Case tables are all merged into one table

A

B

9
Q

You would like to test and save your work in Data Integration before deploying to a productive environment. How can you do this in Data Integration?

A. With Data Pool Duplication
B. By sharing Data Connections
C. By using templates and downloading your SQL scripts regularly
D. With Data Pool Versioning

A

D

10
Q

You are merging Case Tables in a Global Data Job. Which SQL statement should you use?

A. GROUP BY
B. UNION ALL
C. MERGE ALL
D. INNER JOIN

A

B
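As a sketch of the pattern (the table and column names here are hypothetical, not from the deck): UNION ALL stacks the case tables without the deduplication overhead that a plain UNION would add.

-- Minimal sketch: merge two structurally identical case tables.
CREATE TABLE CASES_GLOBAL AS (
    SELECT CASE_KEY, COMPANY_CODE, NET_VALUE FROM CASES_DE
    UNION ALL
    SELECT CASE_KEY, COMPANY_CODE, NET_VALUE FROM CASES_US
);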

11
Q

Which statement is used in Vertica to display the query plan?

A. ANALYSE
B. SHOW_COST
C. EXPLAIN
D. SHOW_QUERY_PLAN

A

C
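For illustration (the table name is hypothetical), EXPLAIN is simply prefixed to the query; Vertica then returns the query plan, including join types, estimated cost, and any NO STATISTICS markers, without executing the query.

-- Show the plan for this query instead of running it.
EXPLAIN
SELECT ORDER_ID, NET_VALUE
FROM ORDERS
WHERE ORDER_DATE >= '2023-01-01';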

12
Q

Which database processes SQL queries in Celonis Data Integration?

A. Vertica
B. PostgreSQL
C. PQL Engine
D. Oracle DB

A

A

13
Q

You are comparing two query execution plans.
Query plan A has an estimated query cost of 1000.
Query plan B has an estimated query cost of 500.
What can we conclude from this?

A. Query B will take 50% less time to execute than query A
B. Query B is more efficient but that has no impact on query execution time
C. Query A is less efficient and most likely slower than query B

A

C

14
Q

If both tables are pre-sorted on the join columns, the optimizer chooses a HASH join, which is faster and uses considerably fewer resources.

A. True
B. False

A

B

15
Q

A query execution plan provides crucial assistance for query optimization by showing the following elements. Select 3 correct answers.

A. Join type
B. Alternative (more suitable) join condition
C. Estimated query cost
D. NO STATISTICS indicator

A

A
C
D

16
Q

When joining two large tables, which type of join should we aim for?

A. MERGE join
B. HASH join
C. LEFT join

A

A

17
Q

When should you explicitly collect table statistics?

A. After the full extraction of each raw table from a source system
B. After the creation of every temporary join table
C. For each table used in transformation scripts

A

B

18
Q

How can you check if tables are missing table statistics? Select TWO correct answers.

A. By reading the query execution plan (EXPLAIN)
B. By executing SELECT SHOW_STATISTICS('TableName')
C. By querying one of the system tables (e.g., Projections)
D. By checking the JOIN type: a HASH join indicates missing statistics

A

A
C
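One possible form of the system-table check, assuming Vertica's v_catalog.projections view (the anchor table name is just an example): projections whose has_statistics flag is false belong to tables with missing statistics.

-- List projections of a table and whether they carry statistics.
SELECT projection_name, has_statistics
FROM v_catalog.projections
WHERE anchor_table_name = 'VBAP';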

19
Q

For tables extracted using Celonis extractor (i.e., “Raw” tables such as VBAP, VBAK, EKKO, EKPO), table statistics have to be created explicitly.

A. True
B. False

A

B

20
Q

One of the ways to check if all tables in certain queries contain statistics is by reading the query execution plan (EXPLAIN function). If there is "NO STATISTICS" next to a table name, the given table has no statistics.

A. True
B. False

A

A

21
Q

In the following query, where should you add the clause that will add statistics to the table created?

A. After the table's creation
B. Before the DROP TABLE statement
C. Before the CREATE TABLE statement
D. After the SELECT keyword

A

A

22
Q

What is the exact statement you should add to a query for Vertica to gather statistics on a table?

A. SELECT ADD_STATISTICS('TABLE_NAME');
B. CREATE STATISTICS('TABLE_NAME');
C. SELECT ANALYZE_STATISTICS('TABLE_NAME');
D. CREATE ADD_STATISTICS('TABLE_NAME');

A

C
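A minimal sketch of the full pattern (TMP_ORDERS and its source table are hypothetical names): per card 21, the statistics call goes right after the table is created and populated.

DROP TABLE IF EXISTS TMP_ORDERS;
CREATE TABLE TMP_ORDERS AS (
    SELECT ORDER_ID, NET_VALUE FROM ORDERS WHERE ORDER_TYPE = 'STANDARD'
);
-- Gather statistics on the freshly created table.
SELECT ANALYZE_STATISTICS('TMP_ORDERS');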

23
Q

SQL formatting skills are an important element of SQL best practices.

A. True
B. False

A

A

24
Q

When writing your queries, for which of the following should you use uppercase? Select 3 correct answers.

A. All SQL Keywords (e.g. SELECT)
B. Only SELECT, FROM, WHERE
C. Only aggregate functions (e.g. AVG)
D. All SQL Functions (e.g. CAST, AVG)
E. All SQL Operators (e.g. LIKE, IN)

A

A
D
E
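Applied to a short example query (all names hypothetical): keywords, functions, and operators are uppercased, while table and column identifiers keep their own casing.

SELECT REGION, AVG(CAST(NET_VALUE AS FLOAT)) AS AVG_NET_VALUE
FROM ORDERS
WHERE REGION IN ('EU', 'US') AND CUSTOMER_NAME LIKE 'A%'
GROUP BY REGION;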

25
Q

Which approach is better for the following scenario? Generally, WHERE EXISTS: Select TWO correct answers.

A. is more performant than JOIN
B. is less performant than JOIN
C. is interchangeable with JOIN
D. should be used instead of JOIN for filtering purposes solely

A

A
D
26
Q

If you want to only select records from table A that have corresponding records in table B, but you don't need any columns from table B, you should use:

A. JOIN
B. WHERE EXISTS
C. Any of the two

A

B
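A sketch of the WHERE EXISTS pattern (TableA and TableB are placeholders): it filters TableA on the existence of a match without returning columns from TableB, and unlike a join it cannot duplicate records.

SELECT A.*
FROM TableA A
WHERE EXISTS (
    -- Only tests for a matching record; no columns from TableB are returned.
    SELECT 1 FROM TableB B WHERE B.KEY = A.KEY
);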
27
Q

SELECT DISTINCT is computationally expensive and causes overhead for a query, slowing it down.

A. True
B. False

A

A
28
Q

In which situations does DISTINCT often appear to be required? Select TWO correct answers.

A. Poor data quality
B. Incorrect / incomplete joins
C. More than three tables are being joined
D. Use of INNER instead of LEFT JOIN

A

A
B
29
Q

Which of the following are best practices for queries containing table joins? Select 3 correct answers.

A. Join only tables that are really required and used (e.g., in SELECT)
B. Filter tables within JOIN instead of within WHERE
C. Filter tables within WHERE instead of within JOIN
D. Use EXISTS when you want to combine columns from two tables
E. Apply a proper filter so that only relevant records are processed

A

A
B
E
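As an illustration of filtering within the JOIN (table and column names are hypothetical), the condition on the joined table sits in the ON clause so that only relevant records are processed.

SELECT O.ORDER_ID, I.ITEM_ID
FROM ORDERS O
INNER JOIN ORDER_ITEMS I
    ON I.ORDER_ID = O.ORDER_ID
    -- Filter the joined table inside the JOIN instead of in WHERE.
    AND I.DELETION_FLAG IS NULL;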
30
Q

Which elements of a temporary join table have a major impact on query performance? Select TWO correct answers.

A. Table sorting
B. Number of columns
C. Existence of table statistics

A

A
C
31
Q

Which additional statement should be added after each CREATE TABLE statement?

A. SELECT ANALYZE_STATISTICS('TableName');
B. CREATE STATISTICS('TableName');
C. UPDATE STATISTICS('TableName');

A

A
32
Q

You have several transformations that are repeatedly executing the same joins between two tables. What can you do to optimise the data pipeline?

A. Use LEFT JOINs
B. Join tables during the extraction phase
C. Create a temporary join table

A

C
33
Q

The Vertica query optimizer automatically optimizes inadequate table designs.

A. True
B. False

A

B
34
Q

Which columns should you place at the beginning when creating temporary join tables?

A. Columns containing a big number of characters
B. Columns used for joins with other tables
C. Non-key columns
D. Columns containing aggregate values

A

B
35
Q

When creating a temporary join table, you should sort it by the columns most frequently used for joins in subsequent transformations. This can be done either by placing the key columns at the beginning of the query that creates and populates the table, or by having an explicit ORDER BY clause at the end of the statement. Sorting by the join columns enables the query optimizer to perform a MERGE join, which is faster and uses considerably fewer resources.

A. True
B. False

A

A
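A minimal sketch of both options named in the card (all names hypothetical), assuming CASE_KEY is the column most frequently joined on later:

-- Option 1: place the key column first in the column list.
CREATE TABLE TMP_JOIN AS (
    SELECT CASE_KEY, ORDER_ID, NET_VALUE
    FROM ORDERS
);

-- Option 2: add an explicit ORDER BY on the join column at the end.
CREATE TABLE TMP_JOIN_SORTED AS (
    SELECT CASE_KEY, ORDER_ID, NET_VALUE
    FROM ORDERS
    ORDER BY CASE_KEY
);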
36
Q

The usage of views should be limited solely to Activity transformations.

A. True
B. False

A

B
37
Q

Using views in transformations often negatively affects overall performance and significantly increases the total transformation time.

A. True
B. False

A

A
38
Q

You should create a table instead of a view if many transformations need to access the results.

A. True
B. False

A

A
39
Q

Creating a view instead of a temporary join table...

A. is always wrong
B. is the preferred approach because it saves storage
C. should be limited to Data Model tables only

A

C
40
Q

You want to create an object used several times in other Transformations. Should you create a table or a view?

A. Table
B. View
C. Either of the two

A

A
41
Q

If the query contains complex definitions (e.g. multiple joins and conditions), what object should you create?

A. Temporary join table
B. View
C. Either of the two

A

A
42
Q

One of the ways to check if all tables in certain queries contain statistics is by reading the query execution plan (EXPLAIN function). If there is "NO STATISTICS" next to a table name, the given table has no statistics.

A. True
B. False

A

A
43
Q

In the following query, where would you insert EXPLAIN to generate a query execution plan?

A. Before CREATE TABLE
B. Before SELECT
C. At the end of the query
D. After SELECT

A

B
44
Q

Which columns should you place at the beginning when creating temporary join tables?

A. Columns containing a big number of characters
B. Columns used for joins with other tables
C. Columns containing aggregate values
D. Date-type columns

A

B
45
Q

If the query contains complex definitions (e.g. multiple joins and conditions), what object should you create?

A. Temporary join table
B. View
C. Either of the two

A

A
46
Q

Adding a JOIN solely for filtering purposes is a bad practice that could lead to record duplication and require DISTINCT. Consequently, it will slow down the query. This situation can be resolved by using:

A. GROUP BY
B. WHERE EXISTS instead of JOIN
C. LEFT JOIN instead of JOIN

A

B
47
Q

You are joining table A and table B, both of which are not sorted on the same join columns. What type of join will be performed?

A. FULL OUTER JOIN
B. MERGE join
C. NATURAL JOIN
D. HASH join

A

D
48
Q

If you want to only select records from table A that have corresponding records in table B, but you don't need any columns from table B, you should use:

A. JOIN
B. WHERE EXISTS
C. Any of the two

A

B
49
Q

A query execution plan provides crucial assistance for query optimization by showing the following elements. Select 3 correct answers.

A. Join type
B. Alternative (more suitable) join condition
C. Estimated query cost
D. NO STATISTICS indicator

A

A
C
D
50
Q

When creating a temporary join table, there are two ways of ensuring proper table sorting:

A. Placing the key columns at the beginning of the CREATE TABLE statement
B. Run ANALYZE_STATISTICS
C. Adding an explicit ORDER BY clause at the end of the CREATE TABLE statement
D. Use WHERE EXISTS instead of a JOIN

A

A
C
51
Q

Using views in transformations often negatively affects the overall performance and significantly increases the total transformation time. Generally, creating a view instead of a temporary join table...

A. is always wrong
B. is the preferred approach because it saves storage
C. should be limited to Data Model tables only

A

C
52
Q

You should add SELECT ANALYZE_STATISTICS('TABLE_NAME'); after each CREATE TABLE query statement that creates and populates a table. The database then gathers statistics when the transformation or query is run.

A. True
B. False

A

A
53
Q

Which of the following two statements are NOT true?

A. Celonis cannot map out a process that is executed in multiple systems.
B. No matter how many systems are involved in a process, it can always be reconstructed end-to-end as long as we can trace and link the cases across these systems.
C. A unique identifier is a way to recognize a case as it moves through the process.
D. It is not necessary to know the ideal flow of a process.

A

A
D
54
Q

What are the top three points to consider when selecting a unique identifier?

A. The unique identifier should be as granular as possible.
B. The unique identifier must consist of numbers only.
C. The project scope and strategic goals must be kept in mind.
D. There must be clear 1-to-1 mappings between the unique identifiers in respective systems.

A

A
C
D
55
Q

What is an event log? Select 2 correct answers.

A. A log of all activities that occur in the lifecycle of each case.
B. A system used to keep track of internal events.
C. Used by Celonis to reconstruct the process that occurs across multiple systems.
D. A log of activities for individual occurrences of events for multiple cases.

A

A
C
56
Q

When scoping data requirements for a process, which points should you consider?

A. Activities
B. System Performance
C. Dimensions
D. Key Metrics
E. Branding

A

A
C
D
57
Q

When scoping Key Metrics, what should you consider?

A. The formula used to calculate it
B. The frequency of the Key Metric calculation
C. The object used to display the Key Metric in an analysis
D. The tables and fields needed

A

A
D
58
Q

Where can the data engineer find table requirements for process connectors? Select 2 options.

A. In the app views that use the process connectors
B. In a process connector's extractions
C. In the Celonis help documentation
D. In the Celonis Academy

A

C, plus one of A or B
59
Q

Which statements are true about data job transformations and Replication Cockpit transformations? Select 2 correct answers.

A. Data job full transformations delete and recreate tables
B. Replication Cockpit real-time transformations insert new and updated records
C. Data jobs can only handle transformations
D. Replication Cockpit transformations can only handle full transformations

A

A
B
60
Q

What is the purpose of this SQL statement?

CREATE TABLE BSIK_CLEAN AS (
    SELECT ROW_NUMBER() OVER (…) AS NUM, BSIK.*
    FROM BSIK
);
DELETE FROM BSIK_CLEAN WHERE NUM > 1;

A. To identify and remove duplicate records
B. To create a table, count rows per partition and keep the most frequent records
C. To create a new table based on another and keep only records newer than a fixed date

A

A
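A reconstructed sketch of the dedup pattern behind this card (the PARTITION BY and ORDER BY columns are assumptions; the card elides them): rows are numbered within each duplicate group, and everything beyond the first row is deleted.

CREATE TABLE BSIK_CLEAN AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY MANDT, BUKRS, BELNR, BUZEI
               ORDER BY GJAHR
           ) AS NUM,
           BSIK.*
    FROM BSIK
);
-- Keep only the first row of each duplicate group.
DELETE FROM BSIK_CLEAN WHERE NUM > 1;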
61
Q

A project requires a data engineer to set up two SAP O2C data models - one operational and one analytical. How should the extraction be set up to optimize load performance?

A. All tables needed for both data models should be extracted with scheduled full loads
B. Tables needed for the operational data model should be extracted with data job delta extractions
C. Tables needed for the operational data model should be extracted in real time

A

C
62
Q

A team would like to perform a one-time upload of SAP data into the EMS. How can a data engineer achieve this?

A. With the Data Push API
B. With the Extractor Builder
C. With the ABAP generator / upload tool

A

C
63
Q

When setting up a transformation for a data model table in the Replication Cockpit, which of the following should the data engineer consider in the script? Select 2 options.

A. Replace views with tables
B. Select a delta filter approach
C. Use a drop table and create table approach
D. Use a delete and insert approach

A

A
D
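A hedged sketch of the delete-and-insert approach (table and key names are hypothetical): instead of dropping and recreating the target, only the records present in the current delta are removed and re-inserted, which keeps the table available for real-time consumption.

-- Remove the rows that arrived in this delta, then insert their new versions.
DELETE FROM ACTIVITIES_TARGET
WHERE CASE_KEY IN (SELECT CASE_KEY FROM ACTIVITIES_DELTA);

INSERT INTO ACTIVITIES_TARGET
SELECT * FROM ACTIVITIES_DELTA;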
64
Q

What would a data engineer insert to generate a query execution plan for this query?

A. EXPLAIN after SELECT
B. EXPLAIN before SELECT
C. EXPLAIN before CREATE TABLE

A

B
65
Q

What can a data engineer do with WHERE conditions to optimize the performance of transformations?

A. Change them to WHERE EXISTS statements
B. Move them directly into JOIN statements
C. Use SELECT DISTINCT statements instead

A

A ?
66
Q

What can a data engineer do to improve the performance of this query? Select 3 options.

A. Add table statistics
B. Use a LEFT JOIN instead of an INNER JOIN
C. Remove the join
D. Replace the join with a WHERE EXISTS
E. Avoid using DISTINCT

A

A
D
E
67
Q

How can a data engineer back up the work in Data Integration and revert changes if needed?

A. By exporting data connections to Celonis' dedicated SFTP server
B. By working with data pool versions
C. By scheduling a regular backup of all data pools in Admin & Settings

A

B
68
Q

What are the advantages of trigger-based schedules?

A. To continuously run full data job extractions
B. To reduce the time gap between two sequential schedules
C. To trigger schedules based on Action Flows

A

B