Get Data Flashcards
(68 cards)
You have 8 processes running with many different Schedules and Data Jobs. What is a good way to get a full overview of your Data Pipeline’s health?
A. Set up alerts on all Data Jobs
B. Enable Custom Monitoring and work with Analyses
C. Subscribe to all your Data Models
D. Regularly check all logs for all of your processes
B
A business expert tells you the number of cases in your Data Model is incorrect. What can you do? Select 2.
A. Add a “DISTINCT” to all of your SELECT statements in transformations
B. Check the number of cases on the database
C. Check the foreign key links of your Data Model
D. Check filters and joins in your Data Job tasks
B
D
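The two checks named in the answer (compare against the count on the database, then inspect joins and filters) can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the real database, and the tables `cases` and `items` and their columns are hypothetical.

```python
import sqlite3

# Stand-in database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (case_id TEXT, country TEXT);
    CREATE TABLE items (case_id TEXT, item_no INTEGER);
    INSERT INTO cases VALUES ('C1', 'DE'), ('C2', 'DE'), ('C3', 'US');
    INSERT INTO items VALUES ('C1', 10), ('C1', 20), ('C1', 30), ('C2', 10);
""")

# Check B: the reference case count taken directly from the source table.
source_count = conn.execute("SELECT COUNT(*) FROM cases").fetchone()[0]

# Check D: a transformation that joins to a one-to-many table.
# The INNER JOIN repeats C1 once per item and drops C3, which has no items.
joined_count, joined_distinct = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT c.case_id)
    FROM cases AS c
    INNER JOIN items AS i ON i.case_id = c.case_id
""").fetchone()

print(source_count, joined_count, joined_distinct)
```

Comparing the three numbers shows both failure modes at once: the row count is inflated by the join, and the distinct case count is lower than the source because one case was filtered out.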
A business expert asks you how a specific activity was defined. What information should you provide? Select 2.
A. The Data Job and Schedule used
B. The timestamp used
C. The filters/joins applied
D. The parameters used
B
C
A business expert reports to you that the Total Net Value of all cases seems off. What is typically the primary source of errors here?
A. Wrong Data Model Calendars
B. Wrong joins and filters
C. Wrong Currency Conversions
D. Wrong activity timestamps
C
You’ve set up a Data Pipeline using both Data Jobs and the Replication Cockpit. What can you do to get notified if anything fails? Select 3 correct answers.
A. Set up Data Job alerts
B. Set up Replication alerts
C. Set up a regular export of the Monitor Dashboard to be sent to your email on a daily basis
D. Subscribe to your Data Model(s)
A
B
D
You have three of the same source systems running the same process in different countries. What steps would you take to bring the process data into a Data Model?
A. Create 3 Data Pools each with a Data Model, and bring the Data Models together in an Analysis
B. Create 3 Data Jobs and use templates to re-use tasks across them.
C. Create a Global Data Job to merge the Data Model tables
D. Share one Data Connection across three Data Pools
C
One of your source systems contains information on 2 separate processes that you do not want to correlate. How could you set up your Data Jobs to minimize system load and keep your work to a minimum?
A. With 2 Data Jobs, each containing extractions and transformations for the separate processes
B. With 1 Data Job for all extractions and 1 per process for all table creations
C. With 1 Data Job for all extractions and transformations
D. With 1 Global Data Job handling all extractions and transformations
B
You’ve set up multiple Activity and Case tables in your Data Model. What happens when you turn on “Eventlog automerge” and load your Data Model?
A. The Data Model automatically finds and adds foreign key relationships to your Data Model and merges all tables
B. All Activity tables are merged into the default Activity Table
C. The separate Activity tables are merged and all the different Case IDs are kept in the merged table
D. The Activity and Case tables are all merged into one table
B
You would like to test and save your work in Data Integration before deploying to a productive environment. How can you do this in Data Integration?
A. With Data Pool Duplication
B. By sharing Data Connections
C. By using templates and downloading your SQL scripts regularly
D. With Data Pool Versioning
D
You are merging Case Tables in a Global Data Job. Which SQL statement should you use?
A. GROUP BY
B. UNION ALL
C. MERGE ALL
D. INNER JOIN
B
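As a rough illustration of why UNION ALL is the statement for stacking case tables, the sketch below merges two per-country case tables. It runs on Python's built-in sqlite3 rather than the Celonis database, and the table names `cases_de` and `cases_us`, their columns, and the added `source_system` column are assumptions made for the example.

```python
import sqlite3

# Stand-in database; the two case tables and their columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases_de (case_id TEXT, net_value REAL);
    CREATE TABLE cases_us (case_id TEXT, net_value REAL);
    INSERT INTO cases_de VALUES ('1001', 50.0), ('1002', 75.0);
    INSERT INTO cases_us VALUES ('1001', 50.0), ('2001', 20.0);
""")

# UNION ALL appends every row from both tables; the source_system column
# (an assumption here) keeps identical case IDs from different systems apart.
union_all_rows = conn.execute("""
    SELECT 'DE' AS source_system, case_id, net_value FROM cases_de
    UNION ALL
    SELECT 'US' AS source_system, case_id, net_value FROM cases_us
""").fetchall()

# Plain UNION removes duplicate rows, so the case that looks identical
# in both systems would silently be counted only once.
union_rows = conn.execute("""
    SELECT case_id, net_value FROM cases_de
    UNION
    SELECT case_id, net_value FROM cases_us
""").fetchall()

print(len(union_all_rows), len(union_rows))
```

UNION ALL also skips the de-duplication step, which is usually the cheaper operation on large tables.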
Which statement is used in Vertica to display the query plan?
A. ANALYSE
B. SHOW_COST
C. EXPLAIN
D. SHOW_QUERY PLAN
C
Which database processes SQL queries in Celonis Data Integration?
A. Vertica
B. PostgreSQL
C. PQL Engine
D. Oracle DB
A
You are comparing two query execution plans.
Query plan A has an estimated query cost of 1000
Query plan B has an estimated query cost of 500
What can we conclude from this?
A. Query B will take 50% less time to execute than query A
B. Query B is more efficient, but this has no impact on query execution time
C. Query A is less efficient and most likely slower than query B
C
If both tables are pre-sorted on the join columns, the optimizer chooses a HASH join, which is faster and uses considerably fewer resources.
A. True
B. False
B
A query execution plan provides crucial assistance for query optimization by showing the following elements. Select 3.
A. Join type
B. Alternative (more suitable) join condition
C. Estimated query cost
D. NO STATISTICS indicator
A
C
D
When joining two large tables, which type of join should we aim for?
A. MERGE JOIN
B. HASH JOIN
C. LEFT JOIN
A
When should you explicitly collect table statistics?
A. After the full extraction of each raw table from a source system
B. After the creation of every temporary join table
C. For each table used in transformation scripts
B
How can you check if tables are missing table statistics? Select TWO correct answers.
A. By reading the query execution plan (EXPLAIN)
B. By executing SELECT SHOW_STATISTICS('TableName')
C. By querying one of the system tables (e.g. Projections)
D. By checking the JOIN type. HASH join indicates missing statistics
A
C
For tables extracted using the Celonis extractor (i.e., “Raw” tables such as VBAP, VBAK, EKKO, EKPO), table statistics have to be created explicitly.
A. True
B. False
B
One of the ways to check if all tables in certain queries contain statistics is by reading the query execution plan (EXPLAIN function). If there is “NO STATISTICS” next to a table name, the given table has no statistics.
A. True
B. False
A
In the following query, where should you add the clause that will add statistics to the table created?
A. After the table’s creation
B. Before the DROP TABLE statement
C. Before the CREATE TABLE statement
D. After the SELECT keyword
A
What is the exact statement you should add to a query for Vertica to gather statistics on a table?
A. SELECT ADD_STATISTICS('TABLE_NAME');
B. CREATE STATISTICS('TABLE_NAME');
C. SELECT ANALYZE_STATISTICS('TABLE_NAME');
D. CREATE ADD_STATISTICS('TABLE_NAME');
C
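The placement rule from the last two cards (gather statistics right after the table is created) can be sketched as a small script builder. This is a hedged illustration, not a Celonis utility: the helper `build_table_script`, the table name `TMP_SALES_ORDERS`, and the SELECT body are hypothetical, and the generated script is only printed here, not executed against Vertica.

```python
def build_table_script(table_name: str, select_sql: str) -> str:
    """Return a DROP / CREATE / ANALYZE_STATISTICS script for one table.

    Hypothetical helper: it only assembles text in the order the cards describe.
    """
    return "\n".join([
        f"DROP TABLE IF EXISTS {table_name};",
        f"CREATE TABLE {table_name} AS",
        f"{select_sql};",
        # Statistics are gathered after the table exists, never before.
        f"SELECT ANALYZE_STATISTICS('{table_name}');",
    ])


# Illustrative transformation; the selected columns are an assumption.
script = build_table_script(
    "TMP_SALES_ORDERS",
    "SELECT VBAK.MANDT, VBAK.VBELN\nFROM VBAK",
)
print(script)
```

Keeping the ANALYZE_STATISTICS call as the final statement of the same task means every reload of the temporary table also refreshes its statistics.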
SQL formatting skills are an important element of SQL best practices.
A. True
B. False
A
When writing your queries, for which of the following should you use uppercase? Select 3 correct answers.
A. All SQL Keywords (e.g. SELECT)
B. Only SELECT, FROM, WHERE
C. Only aggregate functions (e.g. AVG)
D. All SQL Functions (e.g. CAST, AVG)
E. All SQL Operators (e.g. LIKE, IN)
A
D
E
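A short query that follows this convention might look like the sketch below, with keywords, functions, and operators in uppercase. It runs on Python's built-in sqlite3 as a stand-in database; the `orders` table and its columns are hypothetical, and writing identifiers in lowercase is a choice made for this example, not something the card prescribes.

```python
import sqlite3

# Stand-in database; the orders table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id TEXT, order_type TEXT, net_value TEXT);
    INSERT INTO orders VALUES
        ('A1', 'ZOR', '10.5'),
        ('A2', 'ZRE', '20.5'),
        ('B1', 'ZOR', '99');
""")

# Keywords (SELECT, FROM, WHERE, AS, AND), functions (AVG, CAST) and
# operators (LIKE, IN) are uppercase.
query = """
    SELECT
        AVG(CAST(net_value AS REAL)) AS avg_net_value
    FROM orders
    WHERE order_id LIKE 'A%'
      AND order_type IN ('ZOR', 'ZRE')
"""
avg_net_value = conn.execute(query).fetchone()[0]
print(avg_net_value)
```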