spark_db Flashcards

1
Q

topandas

A

df.toPandas()

2
Q

from pandas

A

spark.createDataFrame(pandasdf, schema)

3
Q

read and write

A

spark.read      # DataFrameReader on the SparkSession
df.write        # DataFrameWriter on a DataFrame
spark.read.jdbc(url=jdbc_url, table="tablename", properties=connection_properties)
df.write.mode("overwrite").jdbc(url=jdbc_url, table="tablename", properties=connection_properties)  # or mode("append")

# csv write, and saving without changing the schema: see the sketch below and card 31
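A fuller sketch of these calls; the output path, table name, jdbc_url and credentials below are placeholders rather than values from the deck:

```
# CSV write with a header row (path is a placeholder)
df.write.format("csv").option("header", "true").mode("overwrite").save("/mnt/output/mydata")

# JDBC round trip; url, table and credentials are all placeholders
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"
connection_properties = {
    "user": "sql_user",        # placeholder
    "password": "sql_pass",    # placeholder
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}
df = spark.read.jdbc(url=jdbc_url, table="tablename", properties=connection_properties)
df.write.mode("overwrite").jdbc(url=jdbc_url, table="tablename", properties=connection_properties)
```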
4
Q

show

A

df.show(n)  # n = number of rows to show as an integer

5
Q

filter based on condition

A

df.filter("experiment_id = 1").show()
df.filter((df.col1 == condition) & (df.col2 == condition))
df.filter((df.col1 == condition) & (df.col2 == condition)).select(colname)

6
Q

replace values on condition

A

df.replace(old,new)
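For a truly conditional replacement (only on rows matching a condition), a sketch using when/otherwise; the column and value names are made up:

```
from pyspark.sql import functions as F

# Replace col1 with a new value only where col2 matches a condition
df2 = df.withColumn(
    "col1",
    F.when(F.col("col2") == "some_value", F.lit("new_value")).otherwise(F.col("col1")),
)
```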

7
Q

sparkdf select

A

sparkdf.select("col name")

8
Q

select or drop a column by index

A

df = spark.createDataFrame([[1, 2], [3, 4]], ['a', 'b'])

n = 1
df.select(df.columns[n]).show()
+---+
|  b|
+---+
|  2|
|  4|
+---+

df.drop(df.columns[n]).show()
+---+
|  a|
+---+
|  1|
|  3|
+---+

df.select(df.columns[:n] + df.columns[n+1:]).show()
+---+
|  a|
+---+
|  1|
|  3|
+---+
9
Q

import pyspark types to construct schema

A

from pyspark.sql.types import *

10
Q

run sql query

A

spark.sql("select * from table limit 100")

11
Q

connect to an external database using pyodbc

A

import pyodbc

Driver = '{ODBC Driver 17 for SQL Server}'
Server = sql_server
Database = sql_db

cnxn = pyodbc.connect('DRIVER=' + Driver + ';SERVER=' + Server.split(":")[0] + ';DATABASE=' + Database + ';UID=' + sql_user + ';PWD=' + sql_pass)

cursor = cnxn.cursor()

cursor.execute("update Experiment set experimentStatus = 'Completed' where experimentId = {}".format(experiment_id))
cnxn.commit()
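A hedged variant of the update above using a parameterized query (pyodbc `?` placeholders) instead of string formatting:

```
# Parameterized update avoids building SQL by string formatting
cursor.execute(
    "update Experiment set experimentStatus = ? where experimentId = ?",
    ("Completed", experiment_id),
)
cnxn.commit()
```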

12
Q

get secrets in databricks

A

dbutils.secrets.get('nlpexplorer-secret-scope', 'sql-username')

first create the key:value pair in key vault

13
Q

create secrets in databricks

A

https://docs.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes

14
Q

databricks jdbc url

A

Obtain the JDBC connection string from the database's connection settings (for Azure SQL, the connection strings page in the Azure portal).
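A sketch of the usual SQL Server JDBC URL shape; the server and database names are placeholders:

```
# Typical Azure SQL Server JDBC URL shape (values are placeholders)
server = "myserver.database.windows.net"
database = "mydb"

jdbc_url = f"jdbc:sqlserver://{server}:1433;database={database}"
```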

15
Q

curl commands for installing the sql driver

A

%sh
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17

16
Q

change datatype in pyspark

A

df2 = df.withColumn("age", col("age").cast(StringType()))
# requires: from pyspark.sql.functions import col and from pyspark.sql.types import StringType

https://sparkbyexamples.com/pyspark/pyspark-cast-column-type/

17
Q

change datatype in pyspark.

with withColumn
with select
with selectExpr
with spark.sql

A

from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType

# Convert String to Integer type
df.withColumn("salary", col("salary").cast(IntegerType()))
df.withColumn("salary", col("salary").cast("int"))
df.withColumn("salary", col("salary").cast("integer"))

# Using select
df.select(col("salary").cast("int").alias("salary"))

# Using selectExpr()
df.selectExpr("cast(salary as int) salary", "isGraduated")
df.selectExpr("INT(salary)", "isGraduated")

# Using spark.sql()
spark.sql("SELECT INT(salary), BOOLEAN(isGraduated), gender from CastExample")
spark.sql("SELECT cast(salary as int) salary, BOOLEAN(isGraduated), gender from CastExample")

18
Q

databricks dbutils to get parameters

A

dbutils.widgets.text("ExperimentId", "DefaultName")

experiment_id = dbutils.widgets.get("ExperimentId")

19
Q

call db notebook from another notebook

A

dbutils.notebook.run("notebook name", timeout_in_seconds, {})
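A fuller hedged sketch: run a child notebook with a parameter and capture whatever it returns via dbutils.notebook.exit; the notebook path, timeout and parameter values are made up:

```
# Caller notebook: run a child notebook for up to 600 seconds,
# passing a widget parameter and capturing its exit value
result = dbutils.notebook.run("./Casecount", 600, {"ExperimentId": "123"})
print(result)  # whatever the child passed to dbutils.notebook.exit(...)
```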

20
Q

send results out of databricks after run

A

dbutils.notebook.exit("value")

import json
dbutils.notebook.exit(json.dumps({"key": "value"}))

21
Q

add arbitrary column to a spark dataframe

A

from pyspark.sql.functions import lit

df_with_x4 = df.withColumn("x4", lit(0))
df_with_x4.show()

22
Q

get no of rows

A

sparkdf.count()

23
Q

get column

A

sparkdf.columns          # list of column names
sparkdf.colname          # or sparkdf["colname"] to reference a single column

24
Q

turn databricks notebook cell to run sql commands

A

%sql

sql query

25
Q

#http requests using job id

A

1. create a job in databricks using Jobs
2. copy the job id
3. in Postman create a POST request
4. add the authorisation token (get it from user settings)
5. add the API request https://adb-3184120056443208.8.azuredatabricks.net/api/2.0/jobs/run-now
6. set the body to JSON
7. add the JSON strings

```
{
  "job_id": 5670,
  "notebook_params": {
    "lob": "4",
    "dataSource": "CAP",
    "customerGeos": "107,1",
    "textFields": "9",
    "channel": "13",
    "cssProducts": "16",
    "issueCodes": "37056"
  },
  "notebook_task": {"notebook_path": "/Nlp_explorer/Casecount"}
}
```
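The same run-now call from Python instead of Postman; a sketch assuming a personal access token is held in a hypothetical `token` variable (only a couple of the notebook_params are repeated):

```
import requests

# Trigger the existing job by id; token is assumed to hold a personal access token
resp = requests.post(
    "https://adb-3184120056443208.8.azuredatabricks.net/api/2.0/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "job_id": 5670,
        "notebook_params": {"lob": "4", "dataSource": "CAP"},
    },
)
print(resp.json())  # response includes the run_id of the triggered run
```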
26
Q

#http requests using run id

A

https://adb-3184120056443208.8.azuredatabricks.net/api/2.0/jobs/runs/submit

```
{
  "existing_cluster_id": "0510-115652-stop848",
  "notebook_params": {
    "lob": "4",
    "dataSource": "CAP",
    "customerGeos": "107,1",
    "textFields": "9",
    "channel": "13",
    "cssProducts": "16",
    "issueCodes": "37056"
  },
  "notebook_task": {"notebook_path": "/Nlp_explorer/Casecount"}
}
```
27
Q

get run status for a run id

A

https://adb-3184120056443208.8.azuredatabricks.net/api/2.0/jobs/runs/get

{ "run_id": 5670 }
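A hedged Python sketch of that GET call, again assuming a `token` variable holding a personal access token:

```
import requests

# Poll the state of a run
resp = requests.get(
    "https://adb-3184120056443208.8.azuredatabricks.net/api/2.0/jobs/runs/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"run_id": 5670},
)
print(resp.json()["state"])  # life_cycle_state / result_state of the run
```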
28
Q

create empty dataframe

A

# an explicit schema is required; it cannot be inferred from an empty list
spark.createDataFrame([], "col1 string, col2 int")
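An equivalent sketch with an explicit StructType, assuming one string and one integer column:

```
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Empty DataFrame with an explicit schema
schema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", IntegerType(), True),
])
empty_df = spark.createDataFrame([], schema)
```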
29
Q

how to check if df is empty

A

df.rdd.isEmpty()
30
Q

read csv with header

A

spark.read.format("csv").option("header", "true").load(filepath)
# a Medium article has more on read and write with JSON as well
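Since the note mentions JSON as well, a hedged read/write sketch with placeholder paths:

```
# Read newline-delimited JSON into a DataFrame (paths are placeholders)
json_df = spark.read.json("/mnt/input/events.json")

# Write a DataFrame back out as JSON
json_df.write.mode("overwrite").json("/mnt/output/events_json")
```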
31
Q

overwrite table without changing schema

A

Most likely the truncate option on a JDBC overwrite: truncating keeps the existing table definition instead of dropping and recreating it.
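A sketch of that pattern, assuming the placeholder jdbc_url and connection_properties used in the earlier read/write sketch:

```
# Overwrite the rows but keep the existing table definition:
# truncate instead of drop-and-recreate on overwrite
df.write.mode("overwrite") \
    .option("truncate", "true") \
    .jdbc(url=jdbc_url, table="tablename", properties=connection_properties)
```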
32
Q

storing the password or key in Azure Key Vault

A

Store the password or key in Azure Key Vault as a secret instead of hard-coding it in a notebook (e.g. inside a SQL query), then read it with:

dbutils.secrets.get(scope = "bitools_secrets", key = "blobkey")
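A hedged example of putting such secrets to use in a JDBC read; the scope and "blobkey" come from the card, the other names are placeholders:

```
# Pull credentials from the secret scope instead of embedding them in the notebook
sql_user = dbutils.secrets.get(scope="bitools_secrets", key="sql-username")  # key name is a placeholder
sql_pass = dbutils.secrets.get(scope="bitools_secrets", key="blobkey")

connection_properties = {"user": sql_user, "password": sql_pass}
df = spark.read.jdbc(url=jdbc_url, table="tablename", properties=connection_properties)
```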