Data Engineering with Azure Databricks
Databricks is a data analytics platform powered by Apache Spark for data engineering, data science, and machine learning. This training teaches how to use Azure Databricks to design and build a data lakehouse architecture.
Who should attend this course?
Prerequisites
No prior knowledge of Azure Databricks is required.
Getting Started with Azure Databricks
Azure Databricks lets you use the power of Apache Spark without the hassle of manually creating and configuring Spark clusters.
In this chapter you will learn how to set up an Azure Databricks environment and work with Databricks workspaces.
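As a first taste, the sketch below assumes a Databricks notebook attached to a cluster, where a SparkSession is already provided as `spark`:

```python
# In a Databricks notebook the SparkSession is already provisioned as `spark`,
# so there is no cluster or session configuration to write yourself.
df = spark.range(1_000_000)   # a simple distributed DataFrame
print(df.count())             # the count is executed on the attached cluster

# Outside Databricks you would build the session manually:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
```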
- What is Azure Databricks
- Introducing Apache Spark
- Workspaces in Azure Databricks
- Provision Azure Databricks Workspaces
- Navigating Workspaces
- Azure Databricks Configuration and Security
- Azure Databricks Pricing
- LAB: Getting started with Azure Databricks
Azure Storage and Data Lakes
Databricks does not come with its own cloud object storage. When you use Databricks on the Azure platform,
it stores its data and metadata in one or more Azure Data Lake Storage Gen2 accounts.
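As a minimal sketch, assuming access to the storage account has already been configured, reading from a Data Lake Gen2 path in Spark looks like this (the account and container names below are made up):

```python
# Hypothetical storage account and container; replace with your own values.
storage_account = "mydatalake"
container = "bronze"

# ADLS Gen2 paths use the abfss:// scheme:
# abfss://<container>@<account>.dfs.core.windows.net/<path>
path = f"abfss://{container}@{storage_account}.dfs.core.windows.net/sales/2024/"

# Read a folder of Parquet files from the data lake into a Spark DataFrame
df = spark.read.parquet(path)
df.show(5)
```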
- Storing Data in Azure Databricks
- An introduction to Azure Storage
- Accessing an Azure Storage Account
- Storing Data in a Data Lake
- The Medallion Architecture
- Storage Formats in Data Lakes
- Delta Lake
- Other Open Table Formats
- LAB: Provision an Azure Storage Account
Introduction to the Unity Catalog
Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces.
In this chapter you will learn how to set up and configure a Unity Catalog metastore for your workspaces.
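A short sketch of the three-level namespace (catalog.schema.table), assuming a metastore is attached to the workspace; the names below are hypothetical:

```python
# Unity Catalog objects are addressed as <catalog>.<schema>.<table>.
spark.sql("CREATE CATALOG IF NOT EXISTS sales_catalog")
spark.sql("CREATE SCHEMA IF NOT EXISTS sales_catalog.bronze")
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_catalog.bronze.orders (
        order_id BIGINT,
        amount   DOUBLE
    )
""")

# Query the table through its fully qualified name
spark.table("sales_catalog.bronze.orders").show()
```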
- Introduction to the Unity Catalog
- Create a Unity Catalog Metastore
- Creating Unity Catalog Artifacts
- Working with Schemas, Tables and Volumes
- LAB: Setup and configure a Unity Catalog Metastore
Configure Databricks Compute
Databricks compute refers to the selection of computing resources available in the Databricks workspace. Users need access to compute to run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning.
Learn about the different types of Compute that can be provisioned in Azure Databricks.
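As a rough sketch, provisioned compute can also be created programmatically with the Databricks SDK for Python; the runtime version and node type below are illustrative Azure values, not recommendations:

```python
# Assumes the databricks-sdk package is installed and authentication is
# configured (for example via environment variables or the Databricks CLI).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create a small all-purpose cluster; adjust the values to what is
# available in your workspace and region.
cluster = w.clusters.create(
    cluster_name="training-cluster",
    spark_version="15.4.x-scala2.12",   # example Databricks Runtime version
    node_type_id="Standard_DS3_v2",     # example Azure VM size
    num_workers=1,
    autotermination_minutes=30,
).result()                              # wait until the cluster is running

print(cluster.cluster_id, cluster.state)
```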
- Apache Spark
- The Databricks Runtime
- Databricks Compute Types
- Provisioned Compute Types
- Databricks Serverless Compute
- Attaching Notebooks to Compute
- Usage Monitoring
- LAB: Creating and Using Databricks Compute
Using Notebooks in Azure Databricks
Using popular languages such as Python, SQL and R, data can be loaded, visualized, transformed and analyzed via interactive notebooks.
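For example, a Python notebook cell might load and visualize a table (assuming the Databricks sample data is available in your workspace), while other languages are reached through magic commands:

```python
# Load a table and render it with the notebook's built-in display() helper
df = spark.table("samples.nyctaxi.trips")   # Databricks sample dataset (if available)
display(df.limit(10))                       # interactive table/chart output

# Per-cell magic commands switch languages or add documentation, e.g.:
# %sql  SELECT COUNT(*) FROM samples.nyctaxi.trips
# %md   This cell contains Markdown documentation
```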
- The Databricks File System (DBFS)
- Working with Notebooks in Databricks
- Magic Commands
- Databricks Utilities
- The Databricks Assistant
- Working with IPython Widgets
- Working with Databricks Widgets
- Notebook Dashboards
- Scheduling Notebooks
- LAB: Using Notebooks in Azure Databricks
Accessing data in Azure Databricks
There are many ways to access data in Azure Databricks, ranging from uploading small files via the portal, over ad-hoc connections, to mounting Azure Storage accounts or data lakes.
Files can also be treated as tables, providing easy access.
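A minimal sketch of that flow, with hypothetical paths and table names and assuming access to the storage account is in place:

```python
from pyspark.sql.functions import col

# Read CSV files from the data lake into a DataFrame
path = "abfss://bronze@mydatalake.dfs.core.windows.net/products/"
df = spark.read.format("csv").option("header", "true").load(path)

# Clean and transform with the DataFrame API
clean = (df
         .filter(col("price").isNotNull())
         .withColumnRenamed("price", "unit_price"))

# Save the result as a table so it can be queried with SQL
clean.write.saveAsTable("main.default.products")
spark.sql("SELECT COUNT(*) AS product_count FROM main.default.products").show()
```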
- The Spark Framework
- Introduction to Spark DataFrames
- Reading and writing data using Spark DataFrames
- Mounting Azure Blob and Data Lake Gen2 Storage
- Cleaning and Transforming data using the Spark DataFrame API
- Schemas and Tables in Databricks
- Managed vs Unmanaged Tables
- Tables in the Unity Catalog
- LAB: Working with Data in Azure Databricks
Building a Lakehouse using Azure Databricks
Delta Lake is an optimized storage layer that provides the foundation for storing data and tables in a Databricks lakehouse.
Learn how to create, query and optimize Delta Tables in a Databricks lakehouse.
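A small sketch of what that looks like in practice, using made-up table and column names:

```python
# Create a Delta table (Delta is the default table format in Databricks)
df = spark.createDataFrame(
    [(1, "EU", 120.0), (2, "US", 80.0)],
    ["order_id", "region", "amount"])
df.write.format("delta").mode("overwrite").saveAsTable("main.default.orders")

# Time travel: query an earlier version of the table
spark.sql("SELECT * FROM main.default.orders VERSION AS OF 0").show()

# Compact small files and co-locate related data for faster reads
spark.sql("OPTIMIZE main.default.orders ZORDER BY (region)")
```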
- Implementing a Delta Lake
- Working with Delta Tables
- Managing Schema change
- Version and Optimize Delta Tables
- Data skipping and Z-order
- Delta Tables and Change Data Feeds
- Delta Tables and the Unity Catalog
- Securing Tables in the Unity Catalog
- LAB: Building a Lakehouse using Delta Tables
Data Warehousing and Analysis with Databricks SQL
The lakehouse architecture and Databricks SQL Warehouse bring cloud data warehousing capabilities to your data lakes.
A SQL warehouse is a compute resource that lets you run SQL commands on objects within Databricks SQL.
Learn about the available warehouse types and how to query them.
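Besides the SQL Editor, a warehouse can also be queried from client tools; the sketch below uses the Databricks SQL Connector for Python with placeholder connection details:

```python
# Assumes the databricks-sql-connector package is installed; the hostname,
# HTTP path and token below are placeholders for your own warehouse.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_date() AS today")
        print(cursor.fetchall())
```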
- What are SQL Warehouses?
- SQL Warehouse Compute Types
- Writing queries using the SQL Editor
- Working with Query Parameters
- Add Visualizations to a Query
- Creating and using AI/BI Dashboards
- Using Databricks SQL Alerts
- Monitoring Databricks SQL
- LAB: Using SQL Warehouses
Introducing Databricks Lakeflow
Databricks Lakeflow is a new solution that contains everything you need to build and operate production data pipelines.
It includes native, highly scalable connectors for databases like SQL Server and for enterprise applications like Salesforce and SharePoint.
Users can transform data in both batch and streaming mode, using standard SQL and Python in declarative ETL pipelines.
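As a flavour of the declarative style, the sketch below uses the Python `dlt` module that is available inside a Databricks pipeline; the source path and table names are hypothetical:

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested from the data lake")
def orders_bronze():
    return spark.read.format("json").load(
        "abfss://bronze@mydatalake.dfs.core.windows.net/orders/")

@dlt.table(comment="Cleaned orders")
def orders_silver():
    # The dependency on orders_bronze is inferred from this read,
    # so the pipeline engine determines the execution order itself.
    return dlt.read("orders_bronze").filter(col("amount") > 0)
```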
- Introducing Databricks Lakeflow
- Procedural versus Declarative pipelines
- Overview of Lakeflow Connect
- Overview of Lakeflow Declarative Pipelines
- Overview of Lakeflow Jobs
Lakeflow Connect
Lakeflow Connect offers simple and efficient connectors to ingest data from local files, popular enterprise applications, databases, cloud storage, message buses, and more.
Learn how to efficiently ingest data with managed connectors for database systems like SQL Server, and with standard connectors for cloud storage or event systems using Auto Loader.
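A minimal Auto Loader sketch, with hypothetical paths and table name and assuming storage access is already configured:

```python
source = "abfss://landing@mydatalake.dfs.core.windows.net/events/"
checkpoint = "abfss://bronze@mydatalake.dfs.core.windows.net/_checkpoints/events/"

(spark.readStream
    .format("cloudFiles")                            # Auto Loader source
    .option("cloudFiles.format", "json")             # format of the incoming files
    .option("cloudFiles.schemaLocation", checkpoint) # where inferred schemas are tracked
    .load(source)
    .writeStream
    .option("checkpointLocation", checkpoint)
    .trigger(availableNow=True)                      # process available files, then stop
    .toTable("main.bronze.events"))
```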
- The Data Ingestion Landscape in Databricks
- What are Managed Connectors
- Ingesting SQL Server Data
- What are Standard Connectors
- Ingesting data from Cloud Object Storage using Auto Loader
- Lakeflow Connect Billing
- LAB: Ingesting data using Lakeflow Connect
Lakeflow Jobs
Lakeflow Jobs is workflow automation for Databricks, providing orchestration for data processing workloads so that you can coordinate and run multiple tasks as part of a larger workflow.
You can optimize and schedule the execution of frequent, repeatable tasks and manage complex workflows.
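For example, tasks in a job can exchange small values through task values; the sketch below uses hypothetical task and key names and assumes it runs inside a job:

```python
# In an upstream task (e.g. a task named "ingest"): publish a value
dbutils.jobs.taskValues.set(key="row_count", value=42)

# In a downstream task: read the value published by the "ingest" task.
# When testing interactively outside a job run, pass debugValue instead.
row_count = dbutils.jobs.taskValues.get(taskKey="ingest", key="row_count", default=0)
print(row_count)
```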
- What are Lakeflow Jobs?
- Creating Lakeflow Jobs
- Overview of Job Tasks
- Defining Precedence Constraints between Tasks
- Working with Task Parameters and Values
- Conditions and Loops in Lakeflow Jobs
- Working with Job Triggers
- Monitoring Lakeflow Jobs
- LAB: Creating and Running Lakeflow Jobs
Databricks and Power BI
Microsoft Power BI is a business analytics tool that provides interactive visualizations with self-service business intelligence capabilities,
enabling end users to create reports and dashboards.
You can connect Power BI Desktop to your Databricks clusters and Databricks SQL warehouses.
- Power BI Introduction
- Connect Power BI Desktop to Databricks using Partner Connect
- Connect Power BI Desktop to Databricks manually
- LAB: Connecting Power BI to Databricks
