Visual Studio Code Extension for Databricks

Published: 11 October 2024
Channel: Dustin Vannoy

In this video I show how I installed, configured, and tested the VS Code extension for Azure Databricks. The extension lets you develop PySpark code in your Visual Studio Code IDE and run that code on a Databricks cluster. It works well with Databricks Git Repos, so you can keep your team in sync whether they work in VS Code or in Notebooks on the Databricks workspace.

IMPORTANT UPDATE to how I explained this in the video:
With the updated version of the extension (0.3.0+), the repo used for syncing from local will not be an existing Databricks repo. This avoids overwriting work done in the workspace. Instead, the extension creates a Databricks repo that is used only for syncing the code you develop locally with this extension.
Please see the documentation for more details. Specifically, the warning in the documentation is, "After you set the repository and then begin synchronizing, any existing files in your remote workspace repo that have the same filenames in your local code project will have their contents forcibly overwritten. This is because the Databricks extension for Visual Studio Code treats the files in your local code project as the “single source of truth” for both your local code project and its connected remote repo within your Azure Databricks workspace."
https://learn.microsoft.com/en-us/azu...
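
To make the overwrite warning concrete, here is a toy sketch of a one-way sync in which local files win. It is not the extension's code: the function name, the dictionaries standing in for files, and the filenames are all hypothetical. The quoted warning only covers files with matching names, so leaving remote-only files untouched is an assumption of this sketch.

```python
# Toy model of a one-way sync where the local project is the
# "single source of truth". Hypothetical names; not the extension's code.

def sync_local_to_remote(local: dict[str, str], remote: dict[str, str]) -> dict[str, str]:
    """Return the remote repo contents after a one-way sync from local."""
    synced = dict(remote)  # copy so the caller's dict is not modified
    for filename, contents in local.items():
        # A remote file with the same name has its contents replaced.
        synced[filename] = contents
    return synced


# Filename -> file contents, standing in for real files.
local_project = {"etl.py": "print('local version')", "utils.py": "# helpers"}
remote_repo = {"etl.py": "print('edited in workspace')", "notebook.py": "# remote only"}

result = sync_local_to_remote(local_project, remote_repo)

for name in sorted(result):
    print(name, "->", result[name])
```

In this sketch the workspace edit to etl.py is lost after the sync, which is the situation the dedicated sync repo in 0.3.0+ is meant to avoid.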

All thoughts and opinions are my own.

Databricks blog: https://www.databricks.com/blog/2023/...

Download from Marketplace: https://marketplace.visualstudio.com/...