Misc Flashcards

1
Q

You can use an SSAS data source in an ADF Copy activity

A

False

2
Q

The ADF Copy activity can invoke the PolyBase feature to load an Azure Synapse Analytics SQL pool

A

True

3
Q

You can implement incremental loads from an Azure SQL Database by using change tracking combined with an ADF Copy activity

A

True

4
Q

Which type of transactional database system would work best for product data?

A

OLTP

5
Q

Suppose a retailer’s operations to update inventory and process payments run in the same transaction. A user is applying a $30 store credit (for the full amount) to an order from their laptop while submitting the exact same order, using the same store credit, from their phone. Two identical orders are received. The database behind the scenes is an ACID-compliant database. What will happen?

A

One order will be processed and use the in-store credit, and the other order won’t be processed.

6
Q

Which of the following describes a good strategy for creating storage accounts and blob containers for your application?

  • Create both your Azure Storage accounts and containers before deploying your application.
  • Create Azure Storage accounts in your application as needed. Create the containers before deploying the application.
  • Create Azure Storage accounts before deploying your app. Create containers in your application as needed.
A

Create Azure Storage accounts before deploying your app. Create containers in your application as needed.

7
Q

Which of the following can be used to initialize the Blob Storage client library within an application?

  • An Azure username and password.
  • The Azure Storage account connection string.
  • A globally-unique identifier (GUID) that represents the application.
  • The Azure Storage account datacenter and location identifiers.
A

The Azure Storage account connection string.
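
As a minimal sketch (assuming the azure-storage-blob Python package; the connection string value is a placeholder), initializing the client library from the account connection string looks like this:

```python
from azure.storage.blob import BlobServiceClient

# The connection string is copied from the storage account's "Access keys" blade.
connection_string = "<storage-account-connection-string>"

# One client object gives access to the containers and blobs in the account.
service_client = BlobServiceClient.from_connection_string(connection_string)
```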

8
Q

What happens when you obtain a BlobClient reference from BlobContainerClient with the name of a blob?

  • A new block blob is created in storage.
  • A BlobClient object is created locally. No network calls are made.
  • An exception is thrown if the blob does not exist in storage.
  • The contents of the named blob are downloaded.
A

A BlobClient object is created locally. No network calls are made.
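
A hedged illustration of that behavior with the azure-storage-blob Python SDK (container and blob names are placeholders): obtaining the BlobClient is purely local, and only a later operation makes a network call.

```python
from azure.storage.blob import BlobServiceClient

service_client = BlobServiceClient.from_connection_string("<connection-string>")
container_client = service_client.get_container_client("images")

# This only builds a client object in memory; no request is sent and no blob is created.
blob_client = container_client.get_blob_client("photo.jpg")

# A call such as this one is what actually contacts the storage service.
blob_exists = blob_client.exists()
```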

9
Q

Which is the default distribution used for a table in Synapse Analytics?

HASH.

Round-Robin.

Replicated Table.

A

Round-Robin.

10
Q

Which Index Type offers the highest compression?

Columnstore.

Rowstore.

Heap.

A

Columnstore

11
Q

How do column statistics improve query performance?

By keeping track of which columns are being queried.

By keeping track of how much data exists between ranges in columns.

By caching column values for queries.

A

By keeping track of how much data exists between ranges in columns.

12
Q

In what language can the Azure Synapse Apache Spark to Synapse SQL connector be used?

Python.

SQL.

Scala.

A

Scala

13
Q

When is it unnecessary to use import statements for transferring data between a dedicated SQL pool and an Apache Spark pool?

Use the integrated notebook experience from Azure Synapse Studio.

Use the PySpark connector.

Use token-based authentication.

A

Use the integrated notebook experience from Azure Synapse Studio.

14
Q

Which language can be used to define Spark job definitions?

Transact-SQL

PowerShell

PySpark

A

PySpark

15
Q

What Transact-SQL function verifies if a piece of text is valid JSON?

JSON_QUERY

JSON_VALUE

ISJSON

A

ISJSON

16
Q

What Transact-SQL function is used to perform a HyperLogLog function?

APPROX_COUNT_DISTINCT

COUNT_DISTINCT_APPROX

COUNT

A

APPROX_COUNT_DISTINCT
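
A rough sketch of running that function against a dedicated SQL pool from Python (pyodbc is assumed to be installed; the connection string and the dbo.FactSales table are placeholders):

```python
import pyodbc

# Placeholder connection details for a dedicated SQL pool.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<workspace>.sql.azuresynapse.net;Database=<pool>;Uid=<user>;Pwd=<password>"
)

# APPROX_COUNT_DISTINCT uses HyperLogLog to estimate distinct values
# with far less memory than COUNT(DISTINCT ...).
row = conn.cursor().execute(
    "SELECT APPROX_COUNT_DISTINCT(CustomerKey) AS approx_customers FROM dbo.FactSales"
).fetchone()
print(row.approx_customers)
```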

17
Q

Which ALTER DATABASE statement parameter allows a dedicated SQL pool to scale?

SCALE.

MODIFY

CHANGE.

A

MODIFY

18
Q

Which workload management feature influences the order in which a request gets access to resources?

Workload classification.

Workload importance.

Workload isolation.

A

Workload importance.

19
Q

Which Dynamic Management View enables you to view the active connections against a dedicated SQL pool?

sys.dm_pdw_exec_requests.

sys.dm_pdw_dms_workers.

DBCC PDW_SHOWEXECUTIONPLAN.

A

sys.dm_pdw_exec_requests.

20
Q

What would be the best approach to investigate whether the data at hand is unevenly allocated across all distributions?

Grouping the data based on partitions and counting rows with a T-SQL query.

Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions.

Monitor query speeds by testing the same query for each partition.

A

Using DBCC PDW_SHOWSPACEUSED to see the number of table rows that are stored in each of the 60 distributions.

21
Q

To achieve improved query performance, which would be the best data type for storing data that contains fewer than 128 characters?

VARCHAR(MAX)

VARCHAR(128)

NVARCHAR(128)

A

VARCHAR(128)

22
Q

Which of the following statements is a benefit of materialized views?

Reducing the execution time for complex queries with JOINs and aggregate functions.

Increased resiliency benefits.

Increased high availability.

A

Reducing the execution time for complex queries with JOINs and aggregate functions.

23
Q

You want to configure a private endpoint. You open Azure Synapse Studio, go to the Manage hub, and see that the Private endpoints option is greyed out. Why is the option not available?

Azure Synapse Studio does not support the creation of private endpoints.

A Conditional Access policy has to be defined first.

A managed virtual network has not been created.

A

A managed virtual network has not been created.

24
Q

You require an Azure Synapse Analytics workspace to access an Azure Data Lake Store while benefiting from the security provided by Azure Active Directory. What is the best authentication method to use?

Storage account keys.

Shared access signatures.

Managed identities.

A

Managed identities.

25
Which definition best describes Apache Spark? A highly scalable relational database management system. A virtual server with a Python runtime. A distributed platform for parallel data processing using multiple languages.
A distributed platform for parallel data processing using multiple languages.
26
You need to use Spark to analyze data in a parquet file. What should you do? Load the parquet file into a dataframe. Import the data into a table in a serverless SQL pool. Convert the data to CSV format.
Load the parquet file into a dataframe.
27
You want to write code in a notebook cell that uses a SQL query to retrieve data from a view in the Spark catalog. Which magic should you use? %%spark %%pyspark %%sql
%%sql
28
Which of the following descriptions best fits Delta Lake? A Spark API for exporting data from a relational database into CSV files. A relational storage layer for Spark that supports tables based on Parquet files. A synchronization solution that replicates data between SQL pools and Spark pools.
A relational storage layer for Spark that supports tables based on Parquet files.
29
You've loaded a Spark dataframe with data that you now want to use in a Delta Lake table. What format should you use to write the dataframe to storage? CSV PARQUET DELTA
DELTA
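A minimal PySpark sketch (assuming an existing dataframe df; the path and table name are placeholders) of writing in the Delta format:

```python
# Persist the dataframe as Delta files at a storage path...
df.write.format("delta").mode("overwrite").save("/delta/sales")

# ...or register it as a catalog table backed by Delta.
df.write.format("delta").mode("overwrite").saveAsTable("sales")
```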
30
What feature of Delta Lake enables you to retrieve data from previous versions of a table? Spark Structured Streaming Time Travel Catalog Tables
Time Travel
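A hedged example of Time Travel in PySpark (the path, version number, and timestamp are assumptions):

```python
# Read the table as it existed at an earlier version number...
previous_df = spark.read.format("delta").option("versionAsOf", 3).load("/delta/sales")

# ...or as of a point in time.
snapshot_df = spark.read.format("delta").option("timestampAsOf", "2024-01-01").load("/delta/sales")
```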
31
You have a managed catalog table that contains Delta Lake data. If you drop the table, what will happen? The table metadata and data files will be deleted. The table metadata will be removed from the catalog, but the data files will remain intact. The table metadata will remain in the catalog, but the data files will be deleted.
The table metadata and data files will be deleted.
32
When using Spark Structured Streaming, a Delta Lake table can be which of the following? Only a source Only a sink Either a source or a sink
Either a source or a sink
33
What is one of the possible ways to optimize an Apache Spark Job? Remove all nodes. Remove the Apache Spark Pool. Use bucketing.
Use bucketing.
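As a sketch of bucketing (the dataframe, column, and table names are hypothetical), rows are pre-hashed by the join key so that later joins and aggregations on that key avoid a full shuffle:

```python
# Bucketed tables must be written with saveAsTable; the bucket layout is recorded in the metastore.
(df.write
   .bucketBy(12, "customer_id")   # hash rows into 12 buckets by the join key
   .sortBy("customer_id")
   .saveAsTable("sales_bucketed"))
```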
34
What can cause slower performance on join or shuffle jobs? Data skew. Enablement of autoscaling. Bucketing.
Data skew.
35
Which of the following descriptions matches a hybrid transactional/analytical processing (HTAP) architecture? Business applications store data in an operational data store, which is also used to support analytical queries for reporting. Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis. Business applications store operational data in an analytical data store that is optimized for queries to support reporting and analysis.
Business applications store data in an operational data store, which is synchronized with low latency to a separate analytical store for reporting and analysis.
36
You want to use Azure Synapse Analytics to analyze operational data stored in a Cosmos DB core (SQL) API container. Which Azure Synapse Link service should you use? Azure Synapse Link for SQL Azure Synapse Link for Dataverse Azure Synapse Link for Cosmos DB
Azure Synapse Link for Cosmos DB
37
You plan to use Azure Synapse Link for Dataverse to analyze business data in your Azure Synapse Analytics workspace. Where is the replicated data from Dataverse stored? In an Azure Synapse dedicated SQL pool In an Azure Data Lake Gen2 storage container. In an Azure Cosmos DB container.
In an Azure Data Lake Gen2 storage container.
38
You have an Azure Cosmos DB core (SQL) account and an Azure Synapse Analytics workspace. What must you do first to enable HTAP integration with Azure Synapse Analytics? Configure global replication in Azure Cosmos DB. Create a dedicated SQL pool in Azure Synapse Analytics. Enable Azure Synapse Link in Azure Cosmos DB.
Enable Azure Synapse Link in Azure Cosmos DB.
39
You have an existing container in a Cosmos DB core (SQL) database. What must you do to enable analytical queries over Azure Synapse Link from Azure Synapse Analytics? Delete and recreate the container. Enable Azure Synapse Link in the container to create an analytical store. Add an item to the container.
Enable Azure Synapse Link in the container to create an analytical store.
40
You plan to use a Spark pool in Azure Synapse Analytics to query an existing analytical store in Cosmos DB. What must you do? Create a linked service for the Cosmos DB database where the analytical store enabled container is defined. Disable automatic pausing for the Spark pool in Azure Synapse Analytics. Install the Azure Cosmos DB SDK for Python package in the Spark pool.
Create a linked service for the Cosmos DB database where the analytical store enabled container is defined.
41
You're writing PySpark code to load data from a Cosmos DB analytical store into a dataframe. What format should you specify? cosmos.json cosmos.olap cosmos.sql
cosmos.olap
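A hedged PySpark sketch (the linked service and container names are placeholders) of reading a Cosmos DB analytical store from Azure Synapse:

```python
df = (spark.read
           .format("cosmos.olap")                                    # analytical store, not the transactional store
           .option("spark.synapse.linkedService", "CosmosDbLinked")  # linked service defined in the workspace
           .option("spark.cosmos.container", "sales")                # container with the analytical store enabled
           .load())
```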
42
You're writing SQL code in a serverless SQL pool to query an analytical store in Cosmos DB. What function should you use? OPENDATASET ROW OPENROWSET
OPENROWSET
43
From which of the following data sources can you use Azure Synapse Link for SQL to replicate data to Azure Synapse Analytics? Azure Cosmos DB SQL Server 2022 Azure SQL Managed Instance
SQL Server 2022
44
What must you create in your Azure Synapse Analytics workspace to implement Azure Synapse Link for Azure SQL Database? A serverless SQL pool A linked service for your Azure SQL Database A link connection for your Azure SQL Database
A link connection for your Azure SQL Database
45
You plan to use Azure Synapse Link for SQL to replicate tables from SQL Server 2022 to Azure Synapse Analytics. What additional Azure resource must you create? Azure Data Lake Storage Gen2 Azure Key Vault Azure Application Insights
Azure Data Lake Storage Gen2
46
How many drivers does a Cluster have? Only one Two, running in parallel Configurable between one and eight
Only one
47
Spark is a distributed computing environment. Therefore, work is parallelized across executors. At which two levels does this parallelization occur? The Executor and the Slot The Driver and the Executor The Slot and the Task
The Executor and the Slot
48
What type of process are the driver and the executors? Java processes Python processes C++ processes
Java processes
49
Which notebook format is used in Databricks? DBC .notebook .spark
DBC
50
When creating a new cluster in the Azure Databricks workspace, what happens behind the scenes? Azure Databricks provisions a dedicated VM that processes all jobs, based on your VM type and size selection. Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections. When an Azure Databricks workspace is deployed, you are allocated a pool of VMs. Creating a cluster draws from this pool.
Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.
51
To parallelize work, the unit of distribution is a Spark Cluster. Every Cluster has a Driver and one or more executors. Work submitted to the Cluster is split into what type of object? Stages Arrays Jobs
Jobs
52
How do you list files in DBFS within a notebook? ls /my-file-path %fs dir /my-file-path %fs ls /my-file-path
%fs ls /my-file-path
53
How do you infer the data types and column names when you read a JSON file? spark.read.option("inferSchema", "true").json(jsonFile) spark.read.inferSchema("true").json(jsonFile) spark.read.option("inferData", "true").json(jsonFile)
spark.read.option("inferSchema", "true").json(jsonFile)
54
Which DataFrame method do you use to create a temporary view? createTempView() createTempViewDF() createOrReplaceTempView()
createOrReplaceTempView()
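For illustration (the dataframe, view, and column names are assumptions): once a temporary view exists, it can be queried with spark.sql, which is what a %%sql notebook cell wraps.

```python
# Expose the dataframe to the Spark SQL catalog under a view name.
df.createOrReplaceTempView("products")

# Query the view with SQL; a %%sql notebook cell runs the same statement.
top_categories = spark.sql(
    "SELECT Category, COUNT(*) AS ProductCount FROM products GROUP BY Category ORDER BY ProductCount DESC"
)
top_categories.show()
```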
55
How do you create a DataFrame object? Introduce a variable name and equate it to something like myDataFrameDF = Use the createDataFrame() function Use the DF.create() syntax
Introduce a variable name and equate it to something like myDataFrameDF =
56
How do you cache data into the memory of the local executor for instant access? .save().inMemory() .inMemory().save() .cache()
.cache()
57
What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS? IPGeocodeDF = parquet.read("dbfs:/mnt/training/ip-geocode.parquet") IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet") IPGeocodeDF = spark.parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")
58
Which of the following statements describes a wide transformation? A wide transformation can be applied per partition/worker with no need to share or shuffle data to other workers A wide transformation requires sharing data across workers. It does so by shuffling data. A wide transformation applies data transformation over a large number of columns
A wide transformation requires sharing data across workers. It does so by shuffling data.
59
Which feature of Spark determines how your code is executed? Catalyst Optimizer Tungsten Record Format Java Garbage Collection
Catalyst Optimizer
60
You create a DataFrame that reads some data from Azure Blob Storage, and then you create another DataFrame by filtering the initial DataFrame. What feature of Spark causes these transformations to be analyzed? Tungsten Record Format Java Garbage Collection Lazy Execution
Lazy Execution
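A small sketch of lazy execution (the storage path and column are hypothetical): transformations only describe the plan, and nothing runs until an action is called.

```python
raw_df = spark.read.parquet("abfss://data@mystorageacct.dfs.core.windows.net/events")

# A transformation: recorded in the query plan, not executed yet.
errors_df = raw_df.filter(raw_df["status"] == "error")

# An action: the Catalyst Optimizer analyzes the accumulated transformations and executes them here.
error_count = errors_df.count()
```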
61
Which command orders by a column in descending order? df.orderBy("requests desc") df.orderBy("requests").desc() df.orderBy(col("requests").desc())
df.orderBy(col("requests").desc())
62
Which command specifies a column value in a DataFrame's filter? Specifically, filter by a productType column where the value is equal to book? df.filter(col("productType") == "book") df.filter("productType = 'book'") df.col("productType").filter("book")
df.filter(col("productType") == "book")
63
When using the Column Class, which command filters based on the end of a column value? For example, a column named verb and filtered by words ending with "ing". df.filter().col("verb").like("%ing") df.filter("verb like '%ing'") df.filter(col("verb").endswith("ing"))
df.filter(col("verb").endswith("ing"))
64
Which method for renaming a DataFrame's column is incorrect? df.select(col("timestamp").alias("dateCaptured")) df.alias("timestamp", "dateCaptured") df.toDF("dateCaptured")
df.alias("timestamp", "dateCaptured")
65
You need to find the average of sales transactions by storefront. Which of the following aggregates would you use? df.select(col("storefront")).avg("completedTransactions") df.groupBy(col("storefront")).avg(col("completedTransactions")) df.groupBy(col("storefront")).avg("completedTransactions")
df.groupBy(col("storefront")).avg("completedTransactions")
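Putting the preceding DataFrame-API cards together in one hedged sketch (df and its columns are placeholders): filter on a column value, aggregate per group, rename with alias, and sort descending.

```python
from pyspark.sql.functions import col

summary_df = (df.filter(col("productType") == "book")                                 # column-based filter
                .groupBy(col("storefront"))
                .avg("completedTransactions")                                         # average per storefront
                .select(col("storefront"),
                        col("avg(completedTransactions)").alias("avgTransactions"))   # rename via alias
                .orderBy(col("avgTransactions").desc()))                              # descending sort
summary_df.show()
```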
66
Which statement about the Azure Databricks Data Plane is true? The Data Plane contains the Cluster Manager and coordinates data processing jobs The Data Plane is hosted within a Microsoft-managed subscription The Data Plane is hosted within the client subscription and is where all data is processed and stored
The Data Plane is hosted within the client subscription and is where all data is processed and stored
67
In which modes does Azure Databricks provide data encryption? At-rest and in-transit At-rest only In-transit only
At-rest and in-transit
68
What does Azure Data Lake Storage (ADLS) Passthrough enable? Automatically mounting ADLS accounts to the workspace that are added to the managed resource group User security groups that are added to ADLS are automatically created in the workspace as Databricks groups Commands running on a configured cluster can read and write data in ADLS without configuring service principal credentials
Commands running on a configured cluster can read and write data in ADLS without configuring service principal credentials
69
What is an Azure Key Vault-backed secret scope? It is the Key Vault Access Key used to securely connect to the vault and retrieve secrets A Databricks secret scope that is backed by Azure Key Vault instead of Databricks It is a method by which you create a secure connection to Azure Key Vault from a notebook and directly access its secrets within the Spark session
A Databricks secret scope that is backed by Azure Key Vault instead of Databricks
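A brief illustration (scope, key, and storage account names are placeholders): once a Key Vault-backed secret scope exists, a Databricks notebook reads secrets through dbutils rather than talking to the vault directly.

```python
# "kv-backed-scope" is a Databricks secret scope that points at an Azure Key Vault.
storage_key = dbutils.secrets.get(scope="kv-backed-scope", key="storage-account-key")

# The value can be used in configuration, and is redacted if printed in notebook output.
spark.conf.set("fs.azure.account.key.mystorageacct.dfs.core.windows.net", storage_key)
```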
70
What is the Databricks Delta command to display metadata? MSCK DETAIL tablename DESCRIBE DETAIL tableName SHOW SCHEMA tablename
DESCRIBE DETAIL tableName
71
How do you perform UPSERT in a Delta dataset? Use UPSERT INTO my-table Use UPSERT INTO my-table /MERGE Use MERGE INTO my-table USING data-to-upsert
Use MERGE INTO my-table USING data-to-upsert
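A hedged upsert sketch (table and column names are placeholders) run as Spark SQL against Delta tables:

```python
spark.sql("""
    MERGE INTO customers AS target
    USING customer_updates AS source
    ON target.customer_id = source.customer_id
    WHEN MATCHED THEN UPDATE SET *       -- update existing rows
    WHEN NOT MATCHED THEN INSERT *       -- insert new rows
""")
```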
72
What optimization does the following command perform: OPTIMIZE Students ZORDER BY Grade? Creates an order-based index on the Grade field to improve filters against that field Ensures that all data backing, for example, Grade=8 is colocated, then updates a graph that routes requests to the appropriate files Ensures that all data backing, for example, Grade=8 is colocated, then rewrites the sorted data into new Parquet files
Ensures that all data backing, for example, Grade=8 is colocated, then rewrites the sorted data into new Parquet files
73
What size does OPTIMIZE compact small files to? Around 100 MB Around 1 GB Around 500 MB
Around 1 GB
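As an illustration (table and column names taken from the question), the same command can be issued from a notebook; OPTIMIZE compacts small files toward roughly 1 GB and ZORDER co-locates rows with the same Grade values in the rewritten Parquet files.

```python
# Compact small files and cluster the data on the Grade column.
spark.sql("OPTIMIZE Students ZORDER BY (Grade)")
```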
74
When doing a write stream command, what does the outputMode("append") option do? The append mode allows records to be updated and changed in place The append outputMode allows records to be added to the output sink The append mode replaces existing records and updates aggregates
The append outputMode allows records to be added to the output sink
75
In Spark Structured Streaming, what method should be used to read streaming data into a DataFrame? spark.readStream spark.read spark.stream.read
spark.readStream
76
What happens if the command option("checkpointLocation", pointer-to-checkpoint directory) is not specified? It will not be possible to create more than one streaming query that uses the same streaming source since they will conflict The streaming job will function as expected since the checkpointLocation option does not exist When the streaming job stops, all state around the streaming job is lost, and upon restart, the job must start from scratch
When the streaming job stops, all state around the streaming job is lost, and upon restart, the job must start from scratch
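A compact Structured Streaming sketch (paths are placeholders) tying these cards together: readStream as the source, append output mode at the sink, and a checkpoint location so state survives restarts.

```python
stream_df = (spark.readStream
                  .format("delta")
                  .load("/delta/events"))                  # Delta table as the streaming source

query = (stream_df.writeStream
                  .format("delta")
                  .outputMode("append")                    # new records are added to the sink
                  .option("checkpointLocation", "/delta/events_checkpoint")  # without this, state is lost on restart
                  .start("/delta/events_processed"))
```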
77
What is a lambda architecture and what does it try to solve? An architecture that defines a data processing pipeline whereby microservices act as compute resources for efficient large-scale data processing An architecture that splits incoming data into two paths - a batch path and a streaming path. This architecture helps address the need to provide real-time processing in addition to slower batch computations. An architecture that employs the latest Scala runtimes in one or more Databricks clusters to provide the most efficient data processing platform available today
An architecture that splits incoming data into two paths - a batch path and a streaming path. This architecture helps address the need to provide real-time processing in addition to slower batch computations.
78
What command should be issued to view the list of active streams? Invoke spark.streams.active Invoke spark.streams.show Invoke spark.view.active
Invoke spark.streams.active
79
What is required to specify the location of a checkpoint directory when defining a Delta Lake streaming query? .writeStream.format("delta").checkpoint("location", checkpointPath) ... .writeStream.format("delta").option("checkpointLocation", checkpointPath) ... .writeStream.format("parquet").option("checkpointLocation", checkpointPath) ...
.writeStream.format("delta").option("checkpointLocation", checkpointPath) ...
80
What's the purpose of linked services in Azure Data Factory? To represent a data store or a compute resource that can host execution of an activity To represent a processing step in a pipeline To link data stores or compute resources together for the movement of data between resources
To represent a data store or a compute resource that can host execution of an activity
81
How can parameters be passed into an Azure Databricks notebook from Azure Data Factory? Use the new API endpoint option on a notebook in Databricks and provide the parameter name Use notebook widgets to define parameters that can be passed into the notebook Deploy the notebook as a web service in Databricks, defining parameter names and types
Use notebook widgets to define parameters that can be passed into the notebook
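A small sketch (the widget name is hypothetical): the notebook declares a widget, Azure Data Factory supplies the value through the Notebook activity's base parameters, and the notebook reads it with dbutils.

```python
# Declare the parameter with a default so the notebook also runs interactively.
dbutils.widgets.text("run_date", "2024-01-01")

# Read the value; when triggered from ADF, a base parameter of the same name overrides the default.
run_date = dbutils.widgets.get("run_date")
print(f"Processing data for {run_date}")
```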
82
What happens to Databricks activities (notebook, JAR, Python) in Azure Data Factory if the target cluster in Azure Databricks isn't running when the cluster is called by Data Factory? If the target cluster is stopped, Databricks will start the cluster before attempting to execute The Databricks activity will fail in Azure Data Factory – you must always have the cluster running Simply add a Databricks cluster start activity before the notebook, JAR, or Python Databricks activity
If the target cluster is stopped, Databricks will start the cluster before attempting to execute
83
What does the CD in CI/CD mean? Continuous Delivery Continuous Deployment Both are correct
Both are correct
84
What sort of pipeline is required in Azure DevOps for creating artifacts used in releases? An Artifact pipeline A Build pipeline A Release pipeline
A Build pipeline
85
What steps are required to authorize Azure DevOps to connect to and deploy notebooks to a staging or production Azure Databricks workspace? Create an Azure Active Directory application, copy the application ID, then use that as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline In the production or staging Azure Databricks workspace, enable Git integration to Azure DevOps, then link to the Azure DevOps source code repo Create a new Access Token within the user settings in the production Azure Databricks workspace, then use the token as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline
Create a new Access Token within the user settings in the production Azure Databricks workspace, then use the token as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline
86
What are the two prerequisites for connecting Azure Databricks with Azure Synapse Analytics that apply to the Azure Synapse Analytics instance? Create a database master key and configure the firewall to enable Azure services to connect Use a correctly formatted ConnectionString and create a database master key Add the client IP address to the firewall's allowed IP addresses list and use the correctly formatted ConnectionString
Create a database master key and configure the firewall to enable Azure services to connect
87
Which is the correct syntax for overwriting data in Azure Synapse Analytics from a Databricks notebook? df.write.mode("overwrite").option("...").option("...").save() df.write.format("com.databricks.spark.sqldw").overwrite().option("...").option("...").save() df.write.format("com.databricks.spark.sqldw").mode("overwrite").option("...").option("...").save()
df.write.format("com.databricks.spark.sqldw").mode("overwrite").option("...").option("...").save()
88
What is SCIM? An optimization that removes orphaned data from a given dataset An open standard that enables users to bring their own auth key to the Databricks environment An open standard that enables organizations to import both groups and users from Azure Active Directory into Azure Databricks
An open standard that enables organizations to import both groups and users from Azure Active Directory into Azure Databricks
89
If mounting an Azure Data Lake Storage (ADLS) account to a workspace, what cluster feature must be used to have ACLS within ADLS applied to the user executing commands in a notebook? Enable ADLS Passthrough on a cluster Enable SCIM Set spark.config.adls.impersonateuser(true)
Enable ADLS Passthrough on a cluster
90
Mike is creating an Azure Data Lake Storage Gen2 account. He must configure this account to be able to process analytical data workloads for best performance. Which option should he configure when creating the storage account? On the Basic tab, set the Performance option to Standard. On the Basic Tab, set the Performance option to ON. On the Advanced tab, set the Hierarchical Namespace to Enabled
On the Advanced tab, set the Hierarchical Namespace to Enabled.
91
In which phase of big data processing is Azure Data Lake Storage located? Ingestion Store Model & Serve
Store
92
You are working on a project with a 3rd party vendor to build a website for a customer. The image assets that will be used on the website are stored in an Azure Storage account that is held in your subscription. You want to give read access to this data for a limited period of time. What security option would be the best option to use? CORS Support Storage Account Shared Access Signatures
Shared Access Signatures
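A hedged sketch (account, container, blob, and key values are placeholders) of generating a read-only, time-limited SAS for a blob with the azure-storage-blob Python package:

```python
from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

sas_token = generate_blob_sas(
    account_name="mystorageacct",
    container_name="website-images",
    blob_name="hero.png",
    account_key="<storage-account-key>",
    permission=BlobSasPermissions(read=True),        # read-only access
    expiry=datetime.utcnow() + timedelta(days=7),     # access ends after the agreed period
)

image_url = f"https://mystorageacct.blob.core.windows.net/website-images/hero.png?{sas_token}"
```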
93
When configuring network access to your Azure Storage Account, what is the default network rule? To allow all connections from all networks To allow all connections from a private IP address range To deny all connections from all networks
To allow all connections from all networks
94
Which Azure service detects anomalies in account activities and notifies you of potential harmful attempts to access your account? Azure Defender for Storage Azure Storage Account Security Feature Encryption in transit
Azure Defender for Storage
95
Which of the following technologies typically provide an ingestion point for data streaming in an event processing solution that uses static data as a source? Azure IoT Hub Azure Blob storage Azure Event Hubs
Azure Blob storage
96
To consume processed event streaming data in near-real-time to produce dashboards containing rich visualizations, which of the following services should you use? Azure Cosmos DB Event Hubs Power BI
Power BI
97
Applications that publish messages to Azure Event Hub very frequently will get the best performance using Advanced Message Queuing Protocol (AMQP) because it establishes a persistent socket. True False
True
98
By default, how many partitions will a new Event Hub have? 1 2 3 4
4
99
What is the maximum size for a single publication (individual or batch) that is allowed by Azure Event Hub? 256 KB 512 KB 1 MB 2 MB
1 MB
100
Which of the definitions below best describes a Tumbling window? A windowing function that clusters together events that arrive at similar times, filtering out periods of time in which there is no data. A windowing function that segments a data stream into a contiguous series of fixed-size, non-overlapping time segments and operates against them. Events cannot belong to more than one tumbling window. A windowing function that groups events by identical timestamp values.
A windowing function that segments a data stream into a contiguous series of fixed-size, non-overlapping time segments and operates against them. Events cannot belong to more than one tumbling window.
101
Which of the following services is an invalid input for an Azure Stream Analytics job? Blob storage Azure Cosmos DB Azure Event Hubs
Azure Cosmos DB
102
Below is a list of key benefits of using Azure Stream Analytics to process streaming data. Which of the following statements is incorrect? The ability to write and test transformation queries in the Azure portal Being able to rapidly deploy queries into production by creating and starting an Azure Stream Analytics job Integration with Azure Blob storage
Integration with Azure Blob storage
103
Which technology is typically used as a staging area in a modern data warehousing architecture? Azure Data Lake. Azure Synapse SQL Pools. Azure Synapse Spark Pools.
Azure Data Lake.
104
Which component enables you to perform code-free transformations in Azure Synapse Analytics? Studio. Copy activity. Mapping Data Flow.
Mapping Data Flow.
105
Which transformation in the Mapping Data Flow is used to route data rows to different streams based on matching conditions? Lookup. Conditional Split. Select.
Conditional Split.
106
Which transformation is used to load data into a data store or compute resource? Window. Source. Sink.
Sink.
107
In which of the following table types should an insurance company store details of customer attributes by which claims will be aggregated? Staging table Dimension table Fact table
Dimension table
108
You create a dimension table for product data, assigning a unique numeric key for each row in a column named ProductKey. The ProductKey is only defined in the data warehouse. What kind of key is ProductKey? A surrogate key An alternate key A business key
A surrogate key
109
What distribution option would be best for a sales fact table that will contain billions of records? HASH ROUND_ROBIN REPLICATE
HASH
110
You need to write a query to return the total of the UnitsProduced numeric measure in the FactProduction table aggregated by the ProductName attribute in the FactProduct table. Both tables include a ProductKey surrogate key field. What should you do? Use two SELECT queries with a UNION ALL clause to combine the rows in the FactProduction table with those in the FactProduct table. Use a SELECT query against the FactProduction table with a WHERE clause to filter out rows with a ProductKey that doesn't exist in the FactProduct table. Use a SELECT query with a SUM function to total the UnitsProduced metric, using a JOIN on the ProductKey surrogate key to match the FactProduction records to the FactProduct records and a GROUP BY clause to aggregate by ProductName.
Use a SELECT query with a SUM function to total the UnitsProduced metric, using a JOIN on the ProductKey surrogate key to match the FactProduction records to the FactProduct records and a GROUP BY clause to aggregate by ProductName.
111
You use the RANK function in a query to rank customers in order of the number of purchases they have made. Five customers have made the same number of purchases and are all ranked equally as 1. What rank will the customer with the next highest number of purchases be assigned? 2 6 1
6
112
You need to compare approximate production volumes by product while optimizing query response time. Which function should you use? COUNT NTILE APPROX_COUNT_DISTINCT
APPROX_COUNT_DISTINCT
113
How does splitting source files help maintain good performance when loading into Synapse Analytics? Optimized processing of smaller file sizes. Compute node to storage segment alignment. Reduced possibility of data corruption.
Compute node to storage segment alignment.
114
Which Workload Management capability manages minimum and maximum resource allocations during peak periods? Workload Isolation. Workload Importance. Workload Containment.
Workload Isolation.
115
Which T-SQL Statement loads data directly from Azure Storage? LOAD DATA. COPY. INSERT FROM FILE.
COPY.
119
Which Azure Data Factory component orchestrates a transformation job or runs a data movement command? Linked Services Datasets Activities
Activities
120
You are moving data from an Azure Data Lake Gen2 store to Azure Synapse Analytics. Which Azure Data Factory integration runtime would be used in a data copy activity? Azure-SSIS Azure Self-hosted
Azure
121
In Azure Data Factory authoring tool, where would you find the Copy data activity? Move & Transform Batch Service Databricks
Move & Transform
122
You want to ingest data from a SQL Server database hosted on an on-premises Windows Server. What integration runtime is required for Azure Data Factory to ingest data from the on-premises server? Azure-SSIS Integration Runtime Self-Hosted Integration Runtime Azure Integration Runtime
Self-Hosted Integration Runtime
123
By default, how long are the Azure Data Factory diagnostic logs retained for? 15 days 30 days 45 days
45 days
124
Which transformation in the Mapping Data Flow is used to route data rows to different streams based on matching conditions? Lookup. Conditional Split. Select.
Conditional Split
125
Which transformation is used to load data into a data store or compute resource? Window. Source. Sink.
Sink
126
Which SCD type would you use to keep history of changes in dimension members by adding a new row to the table for each change? Type 1 SCD. Type 2 SCD. Type 3 SCD.
Type 2 SCD.
127
Which SCD type would you use to update the dimension members without keeping track of history? Type 1 SCD. Type 2 SCD. Type 3 SCD.
Type 1 SCD.
128
What is a supported connector for built-in parameterization? Azure Data Lake Storage Gen2 Azure Synapse Analytics Azure Key Vault
Azure Synapse Analytics
129
What is an example of a branching activity used in control flows? The If-condition Until-condition Lookup-condition
The If-condition
130
In which version of SQL Server was SSIS Projects introduced? SQL Server 2008. SQL Server 2012. SQL Server 2016.
SQL Server 2012.
131
Which tool is used to perform an assessment of migrating SSIS packages to Azure SQL Database services? Data Migration Assistant. Data Migration Assessment. Data Migration Service.
Data Migration Assistant.
132
Which tool is used to create and deploy SQL Server Integration Packages on an Azure-SSIS integration runtime, or for on-premises SQL Server? SQL Server Data Tools. SQL Server Management Studio. dtexec.
SQL Server Data Tools.
133
Which version control software does Azure Data Factory integrate with? Team Foundation Server. Source Safe. Git repositories.
Git repositories.
134
Which feature merges the changes of Azure Data Factory work in a custom branch back into the main branch of a Git repository? Repo. Pull request. Commit.
Pull request.
135
Which feature in alerts can be used to determine how an alert is fired? Add rule. Add severity. Add criteria.
Add criteria.
136
Suppose you have two video files stored as blobs. One of the videos is business-critical and requires a replication policy that creates multiple copies across geographically diverse datacenters. The other video is non-critical, and a local replication policy is sufficient. Which of the following options would satisfy both the data diversity and cost sensitivity considerations? Create a single storage account that makes use of Locally redundant storage (LRS) and host both videos from here. Create a single storage account that makes use of Geo-redundant storage (GRS) and host both videos from here. Create two storage accounts. The first account makes use of Geo-redundant storage (GRS) and hosts the business-critical video content. The second account makes use of Locally redundant storage (LRS) and hosts the non-critical video content.
Create two storage accounts. The first account makes use of Geo-redundant storage (GRS) and hosts the business-critical video content. The second account makes use of Locally redundant storage (LRS) and hosts the non-critical video content.
137
The name of a storage account must be: Unique within the containing resource group. Unique within your Azure subscription. Globally unique.
Globally unique.
138
In a typical project, when would you create your storage account(s)? At the beginning, during project setup. After deployment, when the project is running. At the end, during resource cleanup.
At the beginning, during project setup.
139
How many access keys are provided for accessing your Azure storage account? 1 2 3 4
2
140
You can use either the REST API or the Azure client library to programmatically access a storage account. What is the primary advantage of using the client library? Cost Availability Localization Convenience
Convenience
141
Which of the following is a good analogy for the access keys of a storage account? IP Address REST Endpoint Username and password Cryptographic algorithm
Username and password
142
You are working on a project with a 3rd party vendor to build a website for a customer. The image assets that will be used on the website are stored in an Azure Storage account that is held in your subscription. You want to give read access to this data for a limited period of time. What security option would be the best option to use? CORS Support Storage Account Shared Access Signatures
Shared Access Signatures
143
When configuring network access to your Azure Storage Account, what is the default network rule? To allow all connections from all networks To allow all connections from a private IP address range To deny all connections from all networks
To allow all connections from all networks
144
Which Azure service detects anomalies in account activities and notifies you of potential harmful attempts to access your account? Azure Defender for Storage Azure Storage Account Security Feature Encryption in transit
Azure Defender for Storage