Store Azure Databricks logs into Azure Data Lake Gen2


One of the common questions I receive from customers who use Azure Databricks (ADB) is how to store Azure Databricks notebook execution/error logs in an Azure Data Lake Storage (ADLS) Gen2 storage account. While you can view the Spark driver and executor logs in the Spark UI, Databricks can also deliver the logs to an ADLS Gen2 destination. The steps below show how.


Prerequisites

  • Azure subscription with sign-in access to the Azure Portal
  • Azure Databricks workspace
  • Azure Data Lake Storage Gen2 account

Generate Azure Databricks personal access token

  • In the Azure portal, search for Azure Databricks and open your Azure Databricks workspace.
  • Click the user profile icon in the upper right corner of your Databricks workspace.
  • Click User Settings.
  • Go to the Access Tokens tab.
  • Click the Generate New Token button.
  • Optionally enter a description (comment) and expiration period.
  • Click the Generate button.
  • Copy the generated token and store it in a secure location.
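The token authenticates every Databricks REST API call via a Bearer header. As a quick sanity check, a minimal sketch (the workspace URL below is a hypothetical placeholder, and the actual HTTP call is left commented out):

```python
import urllib.request

# Hypothetical values -- substitute your workspace URL and the token you copied.
workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"
token = "<Access Token>"

# Every Databricks REST API request carries the personal access token
# in an Authorization: Bearer header.
headers = {"Authorization": f"Bearer {token}"}

req = urllib.request.Request(f"{workspace_url}/api/2.0/clusters/list", headers=headers)
# Uncomment to run against a real workspace:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```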

Mount Azure Data Lake Storage Gen2 containers to DBFS

  • Follow the instructions provided here to mount the ADLS Gen2 container to DBFS
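As a rough sketch of what that mount looks like in a notebook cell (the storage account and container names below are hypothetical, and this assumes access-key authentication; your workspace may use a service principal instead):

```python
# Hypothetical values -- replace with your storage account, container, and key.
storage_account = "mystorageacct"
container = "logs"
access_key = "<storage-account-access-key>"  # better kept in a Databricks secret scope

# ADLS Gen2 containers are addressed with the abfss:// scheme.
source = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
configs = {
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net": access_key
}

# dbutils is only available inside a Databricks notebook, so the call is
# shown commented out here:
# dbutils.fs.mount(source=source, mount_point="/mnt/logs", extra_configs=configs)
```

Once mounted, dbfs:/mnt/logs resolves to the ADLS Gen2 container, which is what makes the log-delivery destination below land in your storage account.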

Create a cluster with logs delivered to ADLS Gen2 location

The following cURL command creates a cluster named cluster_log_dbfs and requests that Databricks deliver its logs to dbfs:/mnt/logs with the cluster ID as the path prefix.

curl -X POST \
  -H 'Authorization: Bearer <Access Token>' \
  -H 'Content-Type: application/json' \
  -d '{
    "cluster_name": "cluster_log_dbfs",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1,
    "cluster_log_conf": {
      "dbfs": {
        "destination": "dbfs:/mnt/logs"
      }
    }
  }' https://<databricks-instance>/api/2.0/clusters/create

Replace <databricks-instance> with the workspace URL of your Databricks deployment.
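The same request can be built in Python, which makes it easier to see the shape of the JSON body. A minimal sketch (the endpoint and placeholders are the same as in the cURL example; the actual POST is left commented out):

```python
import json

# The request body for the Clusters API 2.0 create endpoint; the
# cluster_log_conf block is what enables log delivery to the DBFS mount.
payload = {
    "cluster_name": "cluster_log_dbfs",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1,
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/mnt/logs"}
    },
}
body = json.dumps(payload)
# POST body to https://<databricks-instance>/api/2.0/clusters/create
# with the same Authorization and Content-Type headers as the cURL call.
```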

The response should contain the cluster ID. The deployment may take a few minutes. You can validate it in the ADB workspace under Clusters, where the cluster will initially appear in the Pending state.

After cluster creation, Databricks syncs log files to the destination every 5 minutes. It uploads driver logs to dbfs:/mnt/logs/1111-223344-abc55/driver and executor logs to dbfs:/mnt/logs/1111-223344-abc55/executor.
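In other words, the destination prefix plus the cluster ID determines where each log type lands. A small sketch of that layout, using the example cluster ID from the text:

```python
# The cluster ID from the example above; yours comes from the create response.
cluster_id = "1111-223344-abc55"

# Log delivery writes under <destination>/<cluster-id>/<log-type>.
driver_logs = f"dbfs:/mnt/logs/{cluster_id}/driver"
executor_logs = f"dbfs:/mnt/logs/{cluster_id}/executor"

# In a notebook you could browse the delivered files with:
# display(dbutils.fs.ls(driver_logs))
```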

Hope this helps you save ADB logs in an ADLS Gen2 account for future audits and troubleshooting purposes.

Disclaimer: I work for @Microsoft Azure Cloud & my opinions are my own.
