Prep the Data Flashcards

Extract, profile, clean, transform, and load data from different sources (52 cards)

1
Q

What is power query?

A

It is the query editor in power bi where we shape and transform data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the underlying code for the query editor?

A

M code

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

In power query what does the column profiling default to be based on?

A

Top 1000 rows

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How does power bi store data

A

in a structured column

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the 3 types of structured columns?

A

List
Record
Table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a list structured column?

A

a column where each row holds a list of values like [1,2,3] it is the simplest structured column

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a record structured column?

A

a column where each row holds a dictionary of key-value pairs like [FirstName = Adam, LastName = Smith, HireDate=1/1/21]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a table structured column?

A

a column where each row holds a table of data, normally the current row has a one to many relationship with another table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the steps for connecting to a JSON file?

A
  1. Connect to source JSON file
  2. Convert the the source to a table
  3. Use double arrow to expand the attributes into the columns
  4. Convert the data types
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are the steps for connecting to SQL Server Database?

A
  1. Enter Server Name
  2. Optionally enter database name
  3. Choose between Import and DirectQuery storage mode
  4. Optionlly open advanced options and fill out command time out and sql statement (sql statement requires database)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the steps to connecting to an Excel workbook locally?

A

Get data > from excel workbook > choose file > select sheet or table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are the steps to connect to a folder

A

Get Data > from folder > choose folder > transform data > combine files

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is a PBIDS file?

A

Power BI Data Source file

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Power BI Data Source Files

A

they are files that store data source connection settings making data sources easier to share with others

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How many data sources can a Power BI Data Source file support?

A

They only support a single data source connection per file. So think of connecting to a single db or a single excel work book

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What happens when you open a power bi data source file for the first time?

A

You will be prompted for authentication credentials since the file does not store that information

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

How do you create a PBIDS file?

A

File
Options and Settings
Data Source Settings
Choose your data source
Click Export PBIDS

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

How do you connect to a data source using a PBIDS file?

A

Just click the file and open

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is the dataverse?

A

a cloud data storage platform for the Microsoft power platform with both predefined tables for typical use cases for an organization and custom tables that you can create and populate

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

How do you connect to a dataverse instance?

A

Click Data verse icon
Type the environment domain
Choose the connectivity mode

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is SSAS?

A

Sql Server Analysis Services, it holds multiple tabular models

22
Q

What are the two methods for connecting to SSAS models?

A

Import and Live Connection

23
Q

If you have a live connection what is needed to publish to the power bi service?

A

a data gateway

24
Q

What is SharePoint?

A

An internal website to store and organize information

25
How is SharePoint structured?
At the highest level you have a sharepoint site. Inside a site you have multiple items like a document library, lists, pages, one note, etc. The document library is an item that can hold folders and files for the site. A site can have multiple document libraries.
26
What do you need to do to connect to data that is in a SharePoint document library folder. This SharePoint site has multiple document libraries
Get Data > SharePoint Folder Enter the Root URL Click Transform Filter to the correct document library
27
whats the difference between live connection and direct query?
Live Connection is a way of connecting a power bi report to an existing semantic model. (Power BI Semantic Model in the service, AAS, SSAS) DirectQuery is a method you can use to connect a datasource adding it to a semantic model you are building
28
What happens when you connect to a sharepoint folder
when you are using the connect to a sharepoint folder connector after you give it a site you access all document libraires on the site
29
How do you get a particular sharepoint folder?
After connecting to the site you hit transform data and filter to the folder path that has the files you want. You then run the combine files action on the binary column by hitting the double down arrow since you are connecting to a folder instead of a single file
30
What are the three storage modes in power bi?
Import Storage DirectQuery Storage Dual Storage
31
What is Import Storage Mode?
When data is imported and stored in-memory in power bi, queries are executed on the cached data
32
What is DirectQuery Storage?
When data is not stored in power bi rather just the connection and queries are execute on the source
33
What is Dual Storage?
When data can be either imported or can be a direct query. Power BI decides at run-time on the most efficient way to get the data
34
What is true of direct query and dual storage modes
They both have the same functional limitations
35
When to use DirectQuery Storage mode
1. when dataset is too large to store in memory (>1GB) 2. Source data changes frequently and reports must show most recent data 3. Company policy states data can only be access from the source
36
Where does RLS happen in Import Storage Mode?
User-level roles are defined in the model
37
Where does RLS happen in DirectQuery Storage Mode?
User-Level roles are defined on the data sources that support it
38
Why do you use data source parameters in power query?
to change data source values dynamically, esp if you have DEV, UAT, PROD envs for a database for exmaple
39
If you need a location for files that need to have a scheduled refresh using Microsoft 365 credentials where should you put it?
OneDrive Business - uses AD creds SharePoint Teams Site - uses AD creds note: local file uses no creds and personal one drive requires user's creds of the corresponding Microsoft account
40
If you have a timeout expired error message when connecting to SQL server what should you do?
divide the sql statement into separate data sources to reduce the amount of processing. Grouping, Aggregation and nested queries would not help in reducing the amount of processing from the query
41
What setting in power query shows the percentage of Valid, Error, and Empty values of a column?
Column quality
42
What setting in power query shows the min, max, count values of a column?
Column profile > column statistics
43
What setting in power query graphs the number of times a value is in a column?
Column profile > Value distribution
44
How should you identify outliers in a column in power query?
Look at the top and bottom values in the value distribution
45
When connecting to a file in a sharepoint folder what is the table storage mode?
Import, it can only be import
46
What is a pivot?
Going from cross tab to tall and skinny (attribute/value pair
47
What is unpivot?
Going from tall and skinny (attr/value pair) to cross tab
48
What is transpose
taking a set rows and making them column headers (like another crosstab)
49
What are the three data profiling tools in power query?
Column Quality Column Distribution Column Profile
50
What does Column Quality show?
% of Valid, Error, and Empty above the columns
51
What does Column Distribution show?
Bar chart of distribution with distinct and unique counts above each column
52
What does Column Profile show?
A column statistics and value distribution chart underneath the dataset