Ingest and transform data Flashcards
Which Microsoft Fabric Real-Time Intelligence component should you use to ingest and transform a stream of real-time data?
Eventstream
What do temporal window transformations enable you to do?
Aggregate event data in a stream based on specific time periods.
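Eventstream applies these windows through its no-code editor; as a rough illustration of the same idea, here is a minimal PySpark Structured Streaming sketch. The built-in rate test source and the column names stand in for a real event feed, and it aggregates the stream over tumbling 5-minute windows.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, avg, col

spark = SparkSession.builder.getOrCreate()

# Built-in test source standing in for an event stream; emits `timestamp` and `value`
events = (spark.readStream
          .format("rate")
          .option("rowsPerSecond", 10)
          .load())

# Tumbling (non-overlapping) 5-minute window: aggregate events per time period
windowed = (events
            .groupBy(window(col("timestamp"), "5 minutes"))
            .agg(avg("value").alias("avg_value")))

query = (windowed.writeStream
         .outputMode("complete")
         .format("console")
         .start())
```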
What are the four data ingestion options available in Microsoft Fabric for loading data into a data warehouse?
COPY (Transact-SQL) statement, data pipelines, dataflows, and cross-warehouse ingestion.
What are the supported data sources and file formats for the COPY (Transact-SQL) statement in Warehouse?
Azure Data Lake Storage (ADLS) Gen2 and Azure Blob Storage, with PARQUET and CSV file formats.
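A minimal sketch of running COPY against the warehouse's SQL endpoint from Python, assuming pyodbc is installed; the connection string, storage account, container, path, and table name are all placeholders.

```python
import pyodbc

# Hypothetical connection to the warehouse's SQL endpoint
conn = pyodbc.connect("<connection string to the warehouse SQL endpoint>")

copy_sql = """
COPY INTO dbo.Sales
FROM 'https://<account>.dfs.core.windows.net/<container>/sales/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET'
    -- add a CREDENTIAL clause here if the storage location is not publicly readable
);
"""

cursor = conn.cursor()
cursor.execute(copy_sql)
conn.commit()
```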
What is the recommended minimum file size when working with external data on files in Microsoft Fabric?
At least 4 MB.
What is a data pipeline?
A sequence of activities to orchestrate a data ingestion or transformation process
You want to use a pipeline to copy data to a folder with a specified name for each run. What should you do?
Add a parameter to the pipeline and use it to specify the folder name for each run
You have previously run a pipeline containing multiple activities. What’s the best way to check how long each individual activity took to complete?
View the run details in the run history.
You want to include data in an external Azure Data Lake Store Gen2 location in your lakehouse, without the requirement to copy the data. What should you do?
Create a shortcut.
You want to use Apache Spark to interactively explore data in a file in the lakehouse. What should you do?
Create a notebook.
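A minimal sketch of interactive exploration in a Fabric notebook, assuming the notebook is attached to the lakehouse (so a `spark` session is already available) and a hypothetical CSV file under Files.

```python
# Path is hypothetical: point it at the file you want to explore
df = (spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("Files/raw/orders.csv"))

df.printSchema()
display(df.limit(10))   # display() renders an interactive preview in Fabric notebooks
```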
What is a Dataflow Gen2?
A way to import and transform data with Power Query Online.
Which workload experience lets you create a Dataflow Gen2?
Data Factory.
You need to connect to and transform data to be loaded into a Fabric lakehouse. You aren't comfortable using Spark notebooks, so you decide to use Dataflow Gen2. How would you complete this task?
Open the Data Factory workload > create a Dataflow Gen2 to transform the data > add your lakehouse as the data destination.
Which tool is best suited for data transformation in Fabric when dealing with large-scale data that will continue to grow?
- Dataflows (Gen2)
- Pipelines
- Notebooks
Notebooks
Your company is implementing real-time data processing using Spark Structured Streaming in Microsoft Fabric. Data from IoT devices needs to be stored in a Delta table.
You need to ensure efficient processing of streaming data while preventing errors due to data changes.
What should you do?
Use ignoreChanges.
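A minimal sketch, assuming a Fabric notebook and hypothetical table paths, of reading an existing Delta table as a stream with the ignoreChanges option and appending the results to another Delta table.

```python
# ignoreChanges tolerates updates/deletes in the source Delta table by
# re-emitting rewritten files instead of failing the streaming query.
iot_stream = (spark.readStream
              .format("delta")
              .option("ignoreChanges", "true")
              .load("Tables/iot_raw"))          # hypothetical lakehouse table path

query = (iot_stream.writeStream
         .format("delta")
         .option("checkpointLocation", "Files/checkpoints/iot_clean")
         .outputMode("append")
         .start("Tables/iot_clean"))
```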
Your company has implemented a Microsoft Fabric lakehouse to store and analyze data from multiple sources. The data is used for generating Microsoft Power BI reports and requires regular updates.
You need to ensure that the data in the Microsoft Fabric lakehouse is updated incrementally to reflect changes from the source systems.
Which method should you use to achieve incremental updates?
Implement a watermark strategy.
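One hedged way a watermark strategy can look in a notebook: keep the last-loaded timestamp in a control table, load only rows modified since then, and advance the watermark afterwards. All table and column names here are hypothetical.

```python
from pyspark.sql import functions as F

# Control table holding the last loaded timestamp per source (hypothetical)
last_wm_row = (spark.table("staging_watermark")
               .filter(F.col("source") == "sales_db")
               .select("last_modified")
               .first())
last_wm = last_wm_row["last_modified"] if last_wm_row else "1900-01-01"

# Pull only rows changed since the last load from the (hypothetical) source table
changes = spark.table("source_sales").filter(F.col("ModifiedDate") > F.lit(last_wm))

# Append the delta into the lakehouse table, then advance the watermark
changes.write.format("delta").mode("append").saveAsTable("sales")

new_wm = changes.agg(F.max("ModifiedDate")).first()[0]
if new_wm is not None:
    spark.sql(
        f"UPDATE staging_watermark SET last_modified = '{new_wm}' WHERE source = 'sales_db'"
    )
```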
You are implementing a data warehouse using Microsoft Fabric. You need to integrate data from multiple sources, including Microsoft Azure Data Lake Storage Gen2 and Microsoft SQL Server databases.
You need to design a process to efficiently load data into tables while ensuring data quality and consistency.
Each correct answer presents part of the solution. Which two actions should you take?
Use Data Factory pipelines for ETL orchestration and T-SQL execution.
Use dataflows to ingest and transform data from Azure Data Lake Storage Gen2.
Your company uses a Microsoft Fabric data warehouse to store frequently updated customer transaction data.
You need to design an ETL process that minimizes load on source systems while ensuring only new or changed data is loaded.
What should you do?
Use Change Data Capture (CDC) for tracking source data changes.
You are implementing a data warehouse solution using Microsoft Fabric. The data warehouse integrates data from multiple sources, including Microsoft Azure Data Lake Storage Gen2 and Microsoft SQL Server databases. You need to transform and load the data into dimensional model tables for reporting purposes.
Design an ETL process that efficiently loads data while ensuring high data quality and consistency.
Each correct answer presents part of the solution. Which three actions should you perform?
Stage data before loading.
Transform data to match the model.
Use Data Factory pipelines.
A company uses a lakehouse architecture with Microsoft Fabric. The data engineering team needs to transform large datasets in Delta format for machine learning.
You need to perform data transformations efficiently using Microsoft Fabric tools.
What should you use?
Apache Spark notebooks.
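A minimal Spark notebook sketch, with hypothetical table paths and columns, that reads a Delta dataset, derives aggregate features for machine learning, and writes the result back in Delta format.

```python
from pyspark.sql import functions as F

# Hypothetical Delta table of raw telemetry
raw = spark.read.format("delta").load("Tables/telemetry_raw")

# Derive per-device daily features
features = (raw
            .withColumn("reading_date", F.to_date("reading_ts"))
            .groupBy("device_id", "reading_date")
            .agg(F.avg("temperature").alias("avg_temp"),
                 F.max("humidity").alias("max_humidity")))

# Write the result back as Delta, partitioned for efficient downstream reads
(features.write
 .format("delta")
 .mode("overwrite")
 .partitionBy("reading_date")
 .save("Tables/telemetry_features"))
```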
You are using Microsoft Fabric to manage data ingestion and transformation. Your task is to set up a data pipeline to ingest batch data from multiple CSV files stored in Azure Blob Storage.
You need to ensure the process is efficient and handles errors effectively.
Each correct answer presents part of the solution. Which three actions should you perform?
Specify a location for rejected rows.
Use the COPY statement to load data.
Use wildcards in the path to load multiple files.
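A hedged sketch combining those three elements in one COPY statement executed from Python; the connection string, storage account, container, and table names are placeholders.

```python
import pyodbc

# Hypothetical connection to the warehouse's SQL endpoint
conn = pyodbc.connect("<connection string to the warehouse SQL endpoint>")

copy_csv_sql = """
COPY INTO dbo.StagedOrders
FROM 'https://<account>.blob.core.windows.net/<container>/orders/2024/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2,                  -- skip the header row
    FIELDTERMINATOR = ',',
    ERRORFILE = 'https://<account>.blob.core.windows.net/<container>/rejected/'
);
"""

cursor = conn.cursor()
cursor.execute(copy_csv_sql)
conn.commit()
```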
You use Microsoft Fabric to manage data across warehouses and lakehouses.
You need to integrate data from a warehouse and a lakehouse into a single table for analysis.
What should you use?
CREATE TABLE AS SELECT (CTAS) statement.
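A minimal CTAS sketch, assuming the warehouse and lakehouse share a workspace so the lakehouse table can be referenced by a three-part name through its SQL analytics endpoint; all object names are hypothetical.

```python
import pyodbc

# Hypothetical connection to the warehouse's SQL endpoint
conn = pyodbc.connect("<connection string to the warehouse SQL endpoint>")

ctas_sql = """
CREATE TABLE dbo.CustomerSales AS
SELECT o.OrderID, o.Amount, c.CustomerName
FROM dbo.Orders AS o                      -- warehouse table
JOIN SalesLakehouse.dbo.Customers AS c    -- lakehouse table via cross-database query
    ON o.CustomerID = c.CustomerID;
"""

cursor = conn.cursor()
cursor.execute(ctas_sql)
conn.commit()
```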
Your organization uses Microsoft Fabric to process real-time IoT data monitoring environmental conditions. The data includes temperature and humidity readings streamed into a Microsoft KQL database.
You need to ensure efficient data ingestion and near real-time querying for reporting.
What should you do?
Implement Spark Structured Streaming to write data to a Delta table.
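A minimal Structured Streaming sketch, using the built-in rate source as a stand-in for the IoT feed and hypothetical paths, that lands readings in a Delta table on a short trigger so they are queryable in near real time.

```python
# Built-in test source standing in for the real IoT stream
readings = (spark.readStream
            .format("rate")
            .option("rowsPerSecond", 100)
            .load()
            .withColumnRenamed("value", "sensor_value"))

# Append the stream to a Delta table; a short trigger keeps latency low
query = (readings.writeStream
         .format("delta")
         .option("checkpointLocation", "Files/checkpoints/iot_readings")
         .trigger(processingTime="30 seconds")
         .outputMode("append")
         .start("Tables/iot_readings"))
```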