Data Engineering with Azure Databricks

Databricks is a data analytics platform powered by Apache Spark for data engineering, data science, and machine learning. This training teaches how to use Azure Databricks to design and build a data lakehouse architecture.

Who should attend this course?
Prerequisites

No prior knowledge of Azure Databricks is required.

  • The Modern Data Warehouse
    Moving to the cloud requires reconsidering some of the choices made for on-premises data handling. This module introduces the Azure services that can be used for data processing and compares them to the traditional on-premises data stack. It also provides a brief introduction to Azure and the Azure portal.

  • Getting Started with Azure Databricks
    Azure Databricks lets you use the power of Apache Spark without the hassle of manually creating and configuring Apache Spark clusters. In this chapter you will learn how to set up an Azure Databricks environment and work with Databricks workspaces.

  • Using Notebooks in Azure Databricks
    Using popular languages such as Python, SQL and R, you can load, visualize, transform and analyze data in interactive notebooks.

  • Storing data in Azure
    This module discusses the different types of storage available in Azure Storage and how to configure them for Big Data Analytics. It also covers some of the tools used to load and manage files in Azure Storage.

  • Accessing data in Azure Databricks
    There are many ways to access data in Azure Databricks, from uploading small files via the portal, through ad-hoc connections, to mounting Azure Storage or data lakes. Files can also be treated as tables, providing easy access.

  • Building a Lakehouse using Azure Databricks
    Delta Lake is an optimized storage layer that provides the foundation for storing data and tables in a Databricks lakehouse. Learn how to create, query and optimize Delta tables in a Databricks lakehouse.

  • Delta Live Tables and Data Pipelines
    You can use Databricks for near real-time data ingestion and processing. Most incremental and streaming workloads on Databricks are powered by Structured Streaming, including Delta Live Tables and Auto Loader. This chapter focuses on how to load data incrementally into a lakehouse.

  • Data Warehousing and Analysis with Databricks SQL
    The lakehouse architecture and Databricks SQL Warehouse bring cloud data warehousing capabilities to your data lakes. A SQL warehouse is a compute resource that lets you run SQL commands on objects within Databricks SQL. Learn about the available warehouse types and how to query them.

  • Databricks and Power BI
    Microsoft Power BI is a business analytics tool that provides interactive visualizations with self-service business intelligence capabilities, enabling end users to create reports and dashboards. You can connect Power BI Desktop to your Databricks clusters and Databricks SQL warehouses.

Practical information

Duration

4 days

Languages

EN

Price

€ 1.750 + 21% VAT

Location

Classroom Courses

Schedule

Guaranteed to run

Sessions in English
17-20/02/2025
14-17/04/2025
30/06/2025 & 01-03/07/2025
22-25/09/2025
01-04/12/2025
Sessions in Dutch
Contact us for more info
Sessions in French
Contact us for more info
