DP-600 Part 1 Flashcards
(88 cards)
How to use the on-premises data gateway
- Install the data gateway on the on-premises server
- In Fabric, create a new on-premises data gateway connection
- Use the gateway in a dataflow or data pipeline to get data into Fabric
Fabric Admin Portal - Tenant Settings
- Allow users to create Fabric items
- Allow users to create workspaces
- A whole host of security features
- Allow service principal access to Fabric APIs
- Allow Git integration
- Allow Copilot
Some settings can be enabled for the entire organization, for specific security groups, or for everyone except certain security groups. Other settings are simply enabled or disabled.
Fabric Admin Portal - Capacity Settings
- Create new capacities
- Delete capacities
- Manage capacity permissions
- Change the size of the capacity
Where to increase the SKU of the capacity
Go to Admin portal > Capacity settings and click through to Azure to update your capacity
Structure of a Fabric implementation and where admin happens
- Tenant: Fabric Admin Portal (Tenant Settings)
- Capacity: Azure Portal and Fabric Admin Portal (Capacity Settings)
- Workspace: Workspace settings (in the workspace)
- Item level: data warehouse, lakehouse, semantic model
- Object level: tables and views, e.g. dbo.Customer, vw.MySQLView
Capacity administration tasks in Fabric capacity settings
- Enable disaster recovery
- View capacity usage report
- Define who can create workspaces
- Define who is a capacity administrator
- Update Power BI connection settings from/to this capacity
- Permit workspace admins to create custom Spark pools sized to their workspace compute requirements
- Assign workspaces to the Capacity
Workspace administrator settings
- Edit license for the workspace (Pro, PPU, Fabric, Trial, etc.)
- Configure Azure connections
- Configure Azure DevOps connection (Git)
- Set up workspace identity
- Power BI settings
- Spark settings
What is the XMLA endpoint
The XMLA endpoint is essentially a gateway that lets external client tools communicate with semantic models stored in Microsoft Fabric. It is particularly useful for those who need more control or prefer working with tools they are already comfortable with, like SSMS or Excel.
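For reference, client tools connect through the workspace's XMLA connection string, which has this general shape (the workspace and model names below are placeholders):
Data Source=powerbi://api.powerbi.com/v1.0/myorg/MyWorkspace;Initial Catalog=MySemanticModel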
Difference between XMLA and Fabric loading options like mirroring, the Copy data activity, etc.
XMLA: ETL (transform data using tools you're familiar with outside of Microsoft Fabric and then load the result into Fabric)
Others: ELT (load raw data into Fabric and then transform it using tools within the platform)
Dataflow vs Data Pipeline
- Dataflows are for straightforward ETL processes and data preparation, with a focus on user-friendly transformation of data for analytics.
- Data Pipelines are for more complex orchestration and management of data workflows, handling multiple steps and dependencies, often in an ELT scenario. They are more suited for technical users who need to automate and manage comprehensive data workflows.
Shortcut vs mirroring in Fabric
Shortcut: a reference (pointer) to data that already exists in OneLake or in external storage such as ADLS Gen2 or Amazon S3; no data is copied.
Mirroring: process that creates a replicated copy of data in OneLake. Useful when you need a local copy of data for performance reasons, redundancy, or to ensure that your operations are not impacted by the availability or performance of the original data source.
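Once created, a shortcut behaves like local data. A minimal sketch of reading one from a notebook, assuming a table shortcut named "sales_shortcut" (hypothetical) in the attached Lakehouse:

```python
# A shortcut under the Lakehouse Tables section surfaces like a local table;
# reading it queries the underlying source without copying data into OneLake.
# "sales_shortcut" is a hypothetical shortcut name.
df = spark.read.table("sales_shortcut")
df.show(5)
```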
What is Azure Blob Storage
A cloud-based storage service provided by Microsoft Azure, designed to store large amounts of unstructured data. "Blob" stands for Binary Large Object and refers to data types like images, videos, documents, and other unstructured data.
What determines the number of capacities required
- Compliance with data residency regulations (e.g. if some data must stay in the EU and other data in the US, you need separate capacities in the relevant regions)
- Billing preference within the organization
- Segregating by workload type (e.g. Data Engineering vs. Business Intelligence)
- Segregating by department
What determines the required sizing of a capacity
- Intensity of expected workloads (e.g. high volume of data ingestion)
- Heavy data transformation (e.g. Spark)
- Budget: the higher the SKU, the more expensive
- Tolerance for delay: can your workloads afford to wait?
- Need for F64+ features (e.g. Copilot requires F64 or higher)
Options of data ingestion
- Shortcut: ADLS Gen2, Amazon S3, Google Cloud Storage, or Dataverse
- Database mirroring: Azure SQL, Azure Cosmos DB, Snowflake
- ETL - dataflow: on-premises SQL Server
- ETL - data pipeline: on-premises SQL Server
- ETL - notebook
- Eventstream: real-time events
Other sources: ETL via dataflow, data pipeline, or notebook
*The above shows the preferred option for each source; other routes remain possible
Data ingestion requirements
Location of the data
- On-premises data gateway: if data lives in an on-premises SQL Server or another location outside Azure
- VNet data gateway: if data lives in an Azure virtual network or behind a private endpoint
Volume of the data
- Low (megabytes per day): standard ingestion is sufficient
- Medium (gigabytes per day): fast copy and staging
- High (many GB or terabytes per day): fast copy and staging
Difference between Virtual network data gateway and On-premises data gateway
Virtual network data gateway: used when your data is stored within an Azure Virtual Network (VNet). Enables secure connections between cloud services (like Power BI) and data sources inside the VNet.
On-premises data gateway: used when data is stored outside of Azure, e.g. on your local network or in another cloud provider's environment (AWS, Google Cloud) to which you have direct network connectivity (VPN or ExpressRoute). Enables secure connections between cloud services (like Power BI) and data sources that are not within Azure.
Data Storage Options
- Lakehouse
- Warehouse
- KQL database
Deciding factors for data storage
Data type:
- Lakehouse: structured, semi-structured, and/or unstructured
- Relational/structured: lakehouse or warehouse
- Real-time/streaming: KQL database
Skills exist in the team:
- T-SQL: data warehouse
- Spark: lakehouse
- KQL: KQL database
The admin portal can only be accessed by
Someone with a Fabric license and one of the following roles:
- Global admin
- Power Platform admin
- Fabric admin
Toby creates a new workspace with some Fabric items to be used by data analysts. Toby creates a new security group called Data Analysts and includes himself as a member of this security group. Toby gives the Data Analysts security group the Viewer role in the workspace. What workspace role does Toby have?
Admin. Since he is the creator of the workspace, his Admin role supersedes the Viewer role.
Toby wants to delegate some of the management responsibilities in the workspace. He wants to give this person the ability to share content within the workspace and invite new Contributors to the workspace, but not add new Admins. Which role should Toby give this person?
Member
You have the Admin role in a workspace. Sheila is a data engineer on your team. Currently she has no access to the workspace. Sheila needs to update a data transformation script in a PySpark notebook. The script gets data from a Lakehouse table, cleans it, and then writes it to a table in the same Lakehouse. You want to adhere to the principle of least privilege. What actions should you take to enable this?
Share the lakehouse data with ReadAll Spark Data permission and share the Notebook with Edit permission
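A minimal sketch of the kind of notebook script Sheila would edit, assuming the notebook is attached to the Lakehouse and using hypothetical table and column names:

```python
from pyspark.sql import functions as F

# Read the raw table from the attached Lakehouse
# ("raw_sales" is a hypothetical table name)
df = spark.read.table("raw_sales")

# Clean: drop rows missing the key, trim text, remove duplicates
clean = (
    df.dropna(subset=["order_id"])
      .withColumn("customer_name", F.trim(F.col("customer_name")))
      .dropDuplicates(["order_id"])
)

# Write the cleaned result to a table in the same Lakehouse
clean.write.mode("overwrite").saveAsTable("clean_sales")
```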
You have the Admin role in a workspace. You want to pre-install some useful Python packages to be used across all notebooks in your workspace. How do you achieve this?
Create an environment, install the packages in the environment, then go to Workspace settings > Spark settings and set it as the default environment.
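For contrast, an inline install inside a notebook is session-scoped, which is why a default environment is the right answer for workspace-wide packages (the package below is just an example):

```python
# Session-scoped install: available only in the current notebook session,
# unlike packages baked into the workspace default environment.
%pip install beautifulsoup4
```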