Connect and Read/Write to Azure Blob Storage from Databricks
Summary
In this post I’ll demonstrate how to read from and write to Azure Blob Storage from within Databricks. The workspace can be either Azure Databricks or the Community Edition.
Cluster Details
Notebook Details
Notebook created with base language: Scala
Locate Azure Storage Details
Note that the following variables are used throughout this post. Change them where necessary (Storage Account Name, Storage Account Key and Storage Account Source Container).
%python
# Azure Storage Account Name
storage_account_name = "azurestorage"
# Azure Storage Account Key
storage_account_key = "1Vmkb3OQNgOoVI6MnhwerjhewrjhweFZVZ9w=="
# Azure Storage Account Source Container
container = "source"
# Set the configuration details to read/write
spark.conf.set("fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name), storage_account_key)
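The same configuration-key and container-URL strings recur throughout this post, so it is easy to let them drift out of sync. A minimal sketch of two hypothetical helpers (not part of any Databricks API) that build both strings from the variables above:

```python
# Hypothetical helpers: build the Spark configuration property name and the
# wasbs:// container URL from the storage details defined above.
def wasbs_conf_key(storage_account_name):
    # Property under which Spark expects the storage account key
    return "fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name)

def wasbs_url(container, storage_account_name, path=""):
    # Container URL, optionally with a path inside the container appended
    url = "wasbs://{0}@{1}.blob.core.windows.net".format(container, storage_account_name)
    return url + "/" + path.lstrip("/") if path else url

print(wasbs_conf_key("azurestorage"))
# → fs.azure.account.key.azurestorage.blob.core.windows.net
print(wasbs_url("source", "azurestorage", "b_Contacts.csv"))
# → wasbs://source@azurestorage.blob.core.windows.net/b_Contacts.csv
```

With these, the `spark.conf.set` call above becomes `spark.conf.set(wasbs_conf_key(storage_account_name), storage_account_key)`.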
Mount the Blob Storage container to the filesystem
%python
# Mount the Blob Storage container if Master.xlsm is not already visible under the mount point
if "Master.xlsm" not in [file.name for file in dbutils.fs.ls("/mnt/azurestorage")]:
    dbutils.fs.mount(
        source = "wasbs://{0}@{1}.blob.core.windows.net".format(container, storage_account_name),
        mount_point = "/mnt/azurestorage",
        extra_configs = {"fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name): storage_account_key}
    )
# Unmount filesystem if required
# dbutils.fs.unmount("/mnt/azurestorage")
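One caveat with the guard above: `dbutils.fs.ls` raises an exception when the mount point does not exist yet, so listing a path is a fragile way to decide whether to mount. A safer guard inspects the mount table instead. A minimal sketch, where `is_mounted` is a hypothetical helper (in a notebook you would pass it `[m.mountPoint for m in dbutils.fs.mounts()]`):

```python
# Hypothetical helper: check a list of mount points (as returned by
# dbutils.fs.mounts() in a notebook) for the target mount point.
def is_mounted(mount_points, mount_point):
    return mount_point in mount_points

# Example with a stand-in mount table; on Databricks you would use:
#   is_mounted([m.mountPoint for m in dbutils.fs.mounts()], "/mnt/azurestorage")
print(is_mounted(["/databricks-datasets", "/mnt/azurestorage"], "/mnt/azurestorage"))
# → True
```

Only call `dbutils.fs.mount` when this returns `False`; this avoids both the exception and a "Directory already mounted" error.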
Check if files exist
%python
# Check if all files exist
dbutils.fs.ls("dbfs:/mnt/azurestorage")
The command outputs a list of FileInfo entries for the files in the mounted container.
Install the xlrd library (optional)
I’m installing this library because I intend to manipulate Excel and CSV files. This step is optional.
%python
%pip install xlrd
Read a file (optional)
To verify that the filesystem and the file are accessible, read a file.
%python
df = spark.read.text("/mnt/azurestorage/b_Contacts.csv")
df.show()
Write back to Azure Blob Storage container
%scala
import org.apache.spark.sql.SaveMode

// Read the CSV file from the mounted container
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/mnt/azurestorage/b_Contacts.csv")
// Set the storage account key (only needed if it was not already set above)
spark.conf.set("fs.azure.account.key.azurestorage.blob.core.windows.net", "1Vmkb3OQNgOoVI6MnhwerjhewrjhweFZVZ9w==")
// Append the data as JSON to the source container
df.write.mode(SaveMode.Append).json("wasbs://source@azurestorage.blob.core.windows.net/source/")
// Display the output in a table
display(df)
Tags: blob storage databricks
Content from: https://eax360.com/read-write-to-azure-blob-storage-from-databricks/