Load Data Flashcards
(8 cards)
… (existing content unchanged) …
How can you load data from existing Databricks tables in Lakeflow?
Use a CREATE OR REFRESH MATERIALIZED VIEW statement, or define a DLT table that reads from the existing table. This supports further transformation within the pipeline.
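A minimal sketch of the materialized-view approach, assuming a hypothetical source table my_catalog.my_schema.orders:

```sql
-- Materialized view over an existing table; the pipeline recomputes
-- it on refresh and can apply further transformations along the way.
CREATE OR REFRESH MATERIALIZED VIEW orders_cleaned AS
SELECT
  order_id,
  customer_id,
  amount
FROM my_catalog.my_schema.orders
WHERE amount > 0;  -- example transformation within the pipeline
```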
How do you load data from cloud object storage using Auto Loader in SQL?
Use the read_files function in a CREATE OR REFRESH STREAMING TABLE query, specifying the file path and format.
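A minimal sketch, assuming a hypothetical volume path containing JSON files; the STREAM keyword makes the read incremental via Auto Loader:

```sql
-- Auto Loader via read_files: new files landing in the path are
-- picked up incrementally on each pipeline update.
CREATE OR REFRESH STREAMING TABLE raw_events AS
SELECT *
FROM STREAM read_files(
  '/Volumes/my_catalog/my_schema/landing/events/',  -- hypothetical path
  format => 'json'
);
```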
How do you ingest data from Kafka into a streaming table?
Use read_kafka in a SQL CREATE OR REFRESH STREAMING TABLE statement, providing the Kafka bootstrap servers and topic name.
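A minimal sketch; the broker address and topic name are placeholders:

```sql
-- read_kafka as a streaming source; Kafka values arrive as binary,
-- so cast them to STRING for downstream parsing.
CREATE OR REFRESH STREAMING TABLE kafka_raw AS
SELECT
  CAST(value AS STRING) AS value,
  topic,
  timestamp
FROM STREAM read_kafka(
  bootstrapServers => 'broker1.example.com:9092',  -- hypothetical broker
  subscribe => 'events_topic'                      -- hypothetical topic
);
```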
Can Lakeflow ingest data from systems like PostgreSQL?
Yes. Using Python and Spark DataFrame readers (e.g., .format("postgresql") with connection options), you can read external database tables.
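A minimal Python sketch, assuming hypothetical connection details and a Databricks secret holding the password (spark and dbutils are available inside a pipeline notebook):

```python
import dlt

@dlt.table
def postgres_customers():
    # Batch read of an external PostgreSQL table with the built-in
    # "postgresql" DataFrame reader; all connection values are placeholders.
    return (
        spark.read.format("postgresql")
        .option("host", "db.example.com")    # hypothetical host
        .option("port", "5432")
        .option("database", "shop")          # hypothetical database
        .option("dbtable", "public.customers")
        .option("user", "reader")
        .option("password", dbutils.secrets.get("db-scope", "pg-password"))
        .load()
    )
```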
How can you ingest small or static files directly into a materialized view?
Use read_files in a CREATE OR REFRESH MATERIALIZED VIEW SQL command to load the file contents.
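A minimal sketch for a small CSV reference file at a hypothetical path:

```sql
-- Batch read with read_files (no STREAM keyword): the whole file is
-- reloaded on each refresh, which suits small or static data.
CREATE OR REFRESH MATERIALIZED VIEW country_codes AS
SELECT *
FROM read_files(
  '/Volumes/my_catalog/my_schema/ref/country_codes.csv',  -- hypothetical path
  format => 'csv',
  header => true
);
```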
How do you ignore updates and deletes in a streaming table source?
Use the skipChangeCommits option with spark.readStream.option() to ignore change operations (updates and deletes) committed to the source table.
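A minimal Python sketch with a hypothetical source table; skipChangeCommits tells the stream to silently drop transactions that modify existing rows:

```python
import dlt

@dlt.table
def orders_appends_only():
    # Only newly appended rows flow through; source commits containing
    # updates or deletes are skipped entirely.
    return (
        spark.readStream
        .option("skipChangeCommits", "true")
        .table("my_catalog.my_schema.orders")  # hypothetical source table
    )
```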
How do you securely load data from Azure Data Lake using secrets?
Store credentials in Databricks Secrets and reference them in the spark.hadoop config. Then define a DLT table that uses Auto Loader to read from the ADLS path.
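A minimal Python sketch, assuming a hypothetical secret scope adls and storage account mystorageacct; a storage account key is used purely for illustration (the same key can instead be set as spark.hadoop.fs.azure.account.key... in the pipeline configuration):

```python
import dlt

# Pull the storage key from a Databricks secret scope (names are
# hypothetical) and hand it to the ABFS driver.
spark.conf.set(
    "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
    dbutils.secrets.get(scope="adls", key="storage-account-key"),
)

@dlt.table
def adls_raw():
    # Auto Loader (cloudFiles) incrementally ingests new files
    # from the ADLS landing path.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("abfss://landing@mystorageacct.dfs.core.windows.net/events/")
    )
```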