2A. Get data in Power BI Flashcards
What is a “flat” file?
A type of file that has only one data table and every row of data is in the same structure. The file doesn’t contain hierarchies. For example csv, xlsx
What are some examples of locations from which to import flat files into Power BI? And what are some differences between them?
Local - only use for data that doesn’t change, since changes to the original file will not be reflected in Power BI
OneDrive for Business - better for keeping Power BI synchronized with changes in the data
OneDrive - Personal
SharePoint - Team Sites - better for keeping Power BI synchronized with changes in the data
What are three places where you can change the location of a source file?
Data source settings
Query settings
Advanced Editor
What are the two main data connectivity modes when connecting to SQL databases?
- Import (selected by default, recommended) and
- DirectQuery.
Mostly, you select Import.
What are the three login options when connecting to an SQL database?
- Windows - Use your Windows account (Azure Active Directory credentials).
- Database - Use your database credentials. For instance, SQL Server has its own sign-in and authentication system that is sometimes used. If the database administrator gave you a unique sign-in to the database, you might need to enter those credentials on the Database tab.
- Microsoft account - Use your Microsoft account credentials. This option is often used for Azure services.
What are two ways of selecting data to import from an SQL database?
- Using the navigator to click and select tables.
- Writing an SQL query (under advanced options on the SQL Server database window)
Where do you change data source settings, and why might you have to do so from time to time?
On the Home tab, select Transform data, and then select the Data source settings option. You can also change data source settings from inside power query and in advanced settings
This action is often required due to a security policy within the organization, for example, when the password needs to be updated every 90 days. You can change the data source, edit permissions or clear permissions.
You can change the data source, edit permissions or clear permissions.
What does SQL stand for?
SQL stands for Structured Query Language and is a standardized programming language that is used to manage relational databases and perform various data management operations.
What are some operations you can perform on the data before loading it into power query, when you load data using an SQL statement?
SQL is beneficial because it allows you to load only the required set of data by specifying exact columns and rows in your SQL statement and then importing them into your semantic model. You can also join different tables, run specific calculations, create logical statements, and filter data in your SQL query.
What is the structure of of an SQL statement?
The SQL query starts with a Select statement, which allows you to choose the specific fields (think of these as columns) that you want to pull from your database.
FROM specifies the name of the table that you want to pull the data from.
All queries should also have a WHERE clause. This clause will filter the rows to pick only filtered records that you want.
For example:
SELECT
ID
, NAME
, SALESAMOUNT
FROM
SALES
WHERE
OrderDate >= ‘1/1/2020’
Why is it best to avoid filtering data imports with SQL statements directly in Power BI, what is the preferred alternative, and why is it preferred?
It is preferable to write SQL statement queries as a “view,” an object in a relational database. Power BI can then retrieve data using the view object, which allows Power Query to participate in Query Folding, which optimizes data retrieval according to how the data is being used later.
What does the wildcard character (*) do in an SQL statement, and why is it not recommended?
The wildcard character (*) will import all columns within a table. This method isn’t recommended because it will lead to redundant data in your semantic model, which will cause performance issues and require extra steps to normalize your data for reporting.
What are Dynamic reports, and what are their benefits?
They are reports in which the data can be changed by a developer according to user specifications. They are valuable because a single report can be used for multiple purposes. If you use thems, you’ll have fewer individual reports to create, which will save organizational time and resources.
What are two ways of creating dynamic reports?
- Use a parameter (good if filtering tables by one value at a time)
- Use a function (for filtering on multiple values at a time)
How do you use a parameter to create a dynamic report?
On the Home tab in Power Query, select Manage parameters > New parameter.
Then right click the table query, select advanced editor, and replace the existing value in the WHERE execute statement with the parameter name (i.e. the parameter instead of the value/string)
Now you can edit the value in the parameter, and the query will show only the data matching that parameter
How do you use a function to create a dynamic report?
- create a spreadsheet with one column. The values in this column will be the ones used as a filter
- load that s/s into power query
- right click the main table query (the one you want to filter) and click Create Function…
- select the s/s you just loaded with the “filter values,” click on the Add Column tab, and then Invoke Custom Function
- then load the individual columns you want to load (it will look like you have merged two queries)
What is a NoSQL database (also referred to as non-SQL, not only SQL or non-relational)?
A flexible type of database that doesn’t use tables to store data.
What do you have to do when accessing an Azure end-point for the first time from Power BI?
Make sure that you enter your account key. You can find this key in the Primary Key box in the Read-only Keys blade of your Azure portal.
What do you need to do with data from no-SQL databases before being able to use it in Power BI?
You need to normalize it. After connecting to a database in the Navigator pane of the Get Data process, selected Edit instead of loading, and you’ll be able to turn the unstructured data into data in columns. It will likely first look like a table with only one “merged” column that you can then expand and then transform as usual.
How do you connect to data in an application, for example Share Point?
By referencing the SharePoint site URL. You only need the top-level url, and can select the specific data to load in the navigator pane in the next step.
What are the three standard types of storage modes you can choose from when loading data into Power BI? For which connection type are the options different?
For most connections:
- Import
- DirectQuery
- Dual (Composite)
For (Azure) Analysis Services
- Import
- Connect live
Which is the most common method of storing/importing data, and why is it sometimes not used?
Most often Import is used.
But DirectQuery is sometimes used, for example if security concerns means data is not allowed to be stored other than in the original data-file. Some semantic models may also be too large to load into a Power BI file. DirectQuery also ensures that you will always view up-to-date data.
What is Import storage mode, and what is its features?
This mode allows you to create a local Power BI copy of your semantic models from your data source. You can use all Power BI service features with this storage mode, including Q&A and Quick Insights. Data refreshes can be scheduled or on-demand. It is the default for creating new Power BI reports.
What is DirectQuery storage mode, and what is its features?
This option is useful when you don’t want to save local copies of your data because your data won’t be cached. Instead, you can query the specific tables that you’ll need by using native Power BI queries, and the required data will be retrieved from the underlying data source. Essentially, you’re creating a direct connection to the data source.
Using this model ensures that you’re always viewing the most up-to-date data, and that all security requirements are satisfied.
Additionally, this mode is suited for when you have large semantic models to pull data from. Instead of slowing down performance by having to load large amounts of data into Power BI, you can use this mode to create a connection to the source, solving data latency issues as well.