This repository covers the core concepts and hands-on skills required for the Databricks Certified Data Engineer Associate exam. It serves as a technical sandbox for building data platforms on the Lakehouse architecture.
Its goal is to provide a structured environment for practicing ELT workflows, Delta Lake management, and data governance within the Databricks ecosystem.
- Lakehouse Architecture: Implementing the medallion architecture (Bronze, Silver, Gold).
- Delta Lake: Working with ACID transactions, time travel, and performance optimization.
- ELT Pipelines: Developing scalable pipelines using PySpark, SQL, and Delta Live Tables (DLT).
- Unity Catalog: Implementing data governance, PII masking, and fine-grained access control.
- Workflow Orchestration: Managing job clusters and automated deployments.
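As a rough illustration of the Bronze, Silver, Gold flow listed above, here is a minimal sketch in plain Python. It is not taken from the course notebooks: the sample records and the `to_silver` and `to_gold` helpers are hypothetical, and plain lists of dictionaries stand in for the Delta tables that each layer would normally be in Databricks.

```python
from collections import defaultdict

# Hypothetical raw records, standing in for data landed as-is in a Bronze table.
bronze = [
    {"order_id": "1", "customer": "alice", "amount": "10.50"},
    {"order_id": "1", "customer": "alice", "amount": "10.50"},       # duplicate
    {"order_id": "2", "customer": "bob", "amount": "not-a-number"},  # bad value
    {"order_id": "3", "customer": "alice", "amount": "4.50"},
]


def to_silver(rows):
    """Clean Bronze rows: cast types, drop unparseable rows, deduplicate by order_id."""
    seen = set()
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(row["order_id"])
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "customer": row["customer"],
                "amount": amount,
            }
        )
    return cleaned


def to_gold(rows):
    """Aggregate Silver rows into a business-level view: total amount per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'alice': 15.0}
```

The same shape carries over to PySpark or SQL: Bronze keeps the data as ingested, Silver applies casting, validation, and deduplication, and Gold holds aggregates ready for reporting.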
This repository contains resources aligned with the top-rated preparation course:
- Course: Databricks Certified Data Engineer Associate (Udemy)
- Original Reference: Based on the framework by Derar Alhussein.
To practice these labs in your own environment:
- Open your Databricks Workspace.
- Navigate to Repos in the sidebar.
- Click Add Repo and paste this repository URL.
- Clone the repo to start interacting with the notebooks directly.
Carlos Araque L. is a Senior Data Engineer with 10+ years of experience delivering enterprise-scale data platforms across Australia and internationally, and holds the Databricks Certified Data Engineer Associate certification (renewed in 2026).