Terms Flashcards

1
Q

What is the heart of the lakehouse?

A

Delta Lake

2
Q

What is Delta Lake?

A

An open approach to bringing data management and governance to data lakes

3
Q

Benefits of Delta Lake

A

Better reliability
48x faster data processing with indexing
Data governance at scale with fine-grained access control lists

4
Q

Benefits of Databricks

A

Simple: data only needs to exist once
Open: based on open source
Collaborative: data engineering, data analytics, data science, and data applications can share the same data. No longer siloed

5
Q

The lakehouse exists on top of

A

Data lake

6
Q

Control plane

A

Back-end services that Databricks manages in its own cloud account

Notebook commands and workspace configurations are stored here

Encrypted at rest

7
Q

Data plane

A

Where data is processed
Resides in your own cloud account
Hooks into Databricks and other proprietary systems

8
Q

Clusters

A

A set of computational resources and configurations on which you run data engineering, data science, and data analytics workloads

9
Q

Where do clusters live?

A

In the data plane; cluster management is in the control plane

10
Q

Clusters are

A

Made up of one or more virtual machine instances

11
Q

Driver

A

Part of a cluster; coordinates the activities of the executors

Distributes the workload across worker nodes

12
Q

Executor

A

Runs the tasks that make up a Spark job
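The driver/executor split can be pictured with a toy model in plain Python. This is only an analogy, not Spark: a thread pool stands in for the executors, and the function names (`split_into_tasks`, `run_task`, `run_job`) are hypothetical. The "driver" splits the input into partitions, hands one task per partition to the pool, and combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(data, num_partitions):
    """Driver side: split the input into roughly equal partitions, one task each."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_task(partition):
    """Executor side: each task processes one partition independently."""
    return sum(x * x for x in partition)


def run_job(data, num_executors=4, num_partitions=8):
    """Driver: schedule tasks on a pool of stand-in 'executors' and combine results."""
    tasks = split_into_tasks(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_results = list(pool.map(run_task, tasks))
    return sum(partial_results)


print(run_job(list(range(10))))  # 285
```

In a real cluster the executors are processes on separate worker nodes, so the driver also has to ship data and code over the network; the toy model leaves that out.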

13
Q

All-purpose clusters

A

Analyze data collaboratively using interactive notebooks
Create clusters from the workspace or API
Retains configurations for up to 70 clusters for up to 30 days
Can manually stop and start
Multiple users can share them

14
Q

Job clusters

A

Run automated jobs
The Databricks job scheduler creates job clusters when running jobs
Created when a scheduled job runs and terminated when the job is complete
Cannot be restarted
Retains up to 30 clusters

15
Q

What is the job cluster retention period?

A

30 days unless manually pinned

16
Q

Notebooks

A

Primary way to interact with code

17
Q

Notebook languages

A

SQL
Python
R
Scala

18
Q

What can go in notebooks?

A

Plots
Images
Markdown text
Code

19
Q

Do you need to restart a cluster if you edit it?

A

Depends on the edit

20
Q

What resources are returned when a cluster is terminated?

A

Associated VMs purged
Operating memory purged
Attached volume storage deleted
Network connections between nodes removed

21
Q

When are cluster configurations deleted?

A

When the cluster has been terminated for more than 30 days

When it is no longer among the 70 most recently terminated clusters
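The retention rule on these cards (up to 70 terminated all-purpose cluster configurations, kept for up to 30 days, unless pinned) can be modeled as a small filter. This is a hypothetical sketch, not Databricks code: `TerminatedCluster` and `retained_configs` are made-up names, the limits come from the cards and may differ in current Databricks documentation, and it assumes pinned configurations are always kept and do not count toward the cap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Limits taken from the cards; current Databricks limits may differ.
MAX_RETAINED = 70
RETENTION_PERIOD = timedelta(days=30)


@dataclass
class TerminatedCluster:
    name: str
    terminated_at: datetime
    pinned: bool = False


def retained_configs(clusters, now):
    """Return the names of terminated clusters whose configurations are kept."""
    # Assumption: pinned configurations are always kept and do not count toward the cap.
    pinned = {c.name for c in clusters if c.pinned}
    # Unpinned configurations survive only if terminated within the retention period...
    recent = [
        c for c in clusters
        if not c.pinned and now - c.terminated_at <= RETENTION_PERIOD
    ]
    # ...and only the MAX_RETAINED most recently terminated of those are kept.
    recent.sort(key=lambda c: c.terminated_at, reverse=True)
    return pinned | {c.name for c in recent[:MAX_RETAINED]}


now = datetime(2024, 1, 31)
clusters = [TerminatedCluster(f"c{i}", now - timedelta(hours=i)) for i in range(75)]
clusters.append(TerminatedCluster("stale", now - timedelta(days=31)))
clusters.append(TerminatedCluster("pinned-old", now - timedelta(days=90), pinned=True))
kept = retained_configs(clusters, now)
print(len(kept))  # 71: the 70 most recent plus the pinned one
```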

22
Q

What should I do with results from a cluster job?

A

If they need to persist, move them to permanent storage; otherwise they are removed with the cluster

23
Q

Does purging a cluster affect code?

A

No

24
Q

Do notebooks need a cluster?

A

Yes, to run code

25
Q

Can you mix languages in a notebook?

A

Yes, using magic commands, even if you set a default language.

26
Q

What is the Databricks magic command symbol?

A

%

27
Q

Where can you restart a cluster

A

From the cluster menu

From the cluster drop-down in a notebook

28
Q

Magic command to run one notebook from another

A

%run
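The magic-command cards (mixing languages, `%md`, `%run`) can be seen together in a Python notebook exported in Databricks source format, where cells are separated by `# COMMAND ----------` and non-Python cells are prefixed with `# MAGIC`. A sketch, assuming a Python default language; `./shared_setup` is a hypothetical notebook path. Run locally, only the first cell executes, because the `# MAGIC` lines are plain comments; the `%sql`, `%md`, and `%run` cells only take effect inside Databricks.

```python
# Databricks notebook source
# Cell 1: Python is this notebook's default language, so no magic command is needed.
default_language = "python"
print(f"Default language: {default_language}")

# COMMAND ----------

# MAGIC %sql
# MAGIC -- Cell 2: the %sql magic switches just this cell to SQL.
# MAGIC SELECT 1 AS answer

# COMMAND ----------

# MAGIC %md
# MAGIC Cell 3: the %md magic renders this cell as Markdown text.

# COMMAND ----------

# MAGIC %run ./shared_setup
```

The `%run` cell contains nothing else, since `%run` has to be the only command in its cell.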

29
Q

Databricks Utilities name (and command to list files)

A

dbutils (e.g. dbutils.fs.ls)

30
Q

Databricks notebook versioning is immutable

A

False. Version history is attached to a notebook; if you copy the notebook, the history is lost.