Use this quick start guide to gather all the information you need about the Databricks Certified Associate Developer for Apache Spark - Python certification exam. This study guide lists the exam objectives and the resources that will help you prepare for them. The sample questions will help you gauge the type and difficulty level of the questions, and the practice exams will familiarize you with the format and environment of the actual exam. Review this guide carefully before attempting the Databricks Certified Associate Developer for Apache Spark certification exam.
The Databricks Developer for Apache Spark - Python certification is mainly targeted at candidates who want to build a career as Apache Spark developers. The Databricks Certified Associate Developer for Apache Spark exam verifies that the candidate possesses fundamental knowledge and proven skills in developing Apache Spark applications on Databricks with Python.
Databricks Developer for Apache Spark - Python Exam Summary:
| Exam Name | Databricks Certified Associate Developer for Apache Spark |
|---|---|
| Exam Code | Developer for Apache Spark - Python |
| Exam Price | $200 (USD) |
| Duration | 90 mins |
| Number of Questions | 45 |
| Passing Score | 70% |
| Books / Training | Apache Spark™ Programming with Databricks |
| Schedule Exam | Databricks Webassessor |
| Sample Questions | Databricks Developer for Apache Spark - Python Sample Questions |
| Practice Exam | Databricks Developer for Apache Spark - Python Certification Practice Exam |
Databricks Apache Spark Developer Associate Exam Syllabus Topics:
Apache Spark Architecture and Components - Weight: 20%

- Identify the advantages and challenges of implementing Spark
- Identify the role of the core components of the Apache Spark™ architecture, including the cluster, driver node, worker nodes/executors, CPU cores, and memory
- Describe the architecture of Apache Spark™, including DataFrame and Dataset concepts, the SparkSession lifecycle, caching, storage levels, and garbage collection
- Explain the Apache Spark™ architecture execution hierarchy
- Configure Spark partitioning in distributed data processing, including shuffles and partitions
- Describe the execution patterns of the Apache Spark™ engine, including actions, transformations, and lazy evaluation
- Identify the features of the Apache Spark modules, including Core, Spark SQL, DataFrames, Pandas API on Spark, Structured Streaming, and MLlib
Using Spark SQL - Weight: 20%

- Utilize common data sources such as JDBC and files to efficiently read from and write to Spark DataFrames using Spark SQL, including overwriting and partitioning by column
- Execute SQL queries directly on files, including ORC, JSON, CSV, text, and Delta files, and understand the different save modes for outputting data in Spark SQL
- Access different file formats using Spark SQL
- Save data to persistent tables while applying sorting and partitioning to optimize data retrieval
- Register DataFrames as temporary views in Spark SQL, allowing them to be queried with SQL syntax
Developing Apache Spark™ DataFrame/Dataset API Applications - Weight: 30%

- Manipulate columns, rows, and table structures by adding, dropping, splitting, and renaming columns, applying filters, and exploding arrays
- Perform data deduplication and validation operations on DataFrames
- Perform aggregate operations on DataFrames such as count, approximate count distinct, mean, and summary
- Manipulate and utilize the Date data type, such as converting a Unix epoch to a date string and extracting date components
- Combine DataFrames with operations such as inner join, left join, broadcast join, joins on multiple keys, cross join, union, and union all
- Manage input and output operations by writing, overwriting, and reading DataFrames with schemas
- Perform operations on DataFrames such as sorting, iterating, printing the schema, and converting between DataFrame and sequence/list formats
- Create and invoke user-defined functions, with or without stateful operators including StateStores
- Describe different types of shared variables in Spark, including broadcast variables and accumulators
- Describe the purpose and implementation of broadcast joins
Troubleshooting and Tuning Apache Spark DataFrame API Applications - Weight: 10%

- Implement performance tuning strategies and optimize cluster utilization, including partitioning, repartitioning, coalescing, identifying data skew, and reducing shuffling
- Describe Adaptive Query Execution (AQE) and its benefits
- Perform logging and monitoring of Spark applications: publish, customize, and analyze driver and executor logs to diagnose out-of-memory errors, cluster underutilization, etc.
Structured Streaming - Weight: 10%

- Explain the Structured Streaming engine in Spark, including its functions, programming model, micro-batch processing, exactly-once semantics, and fault-tolerance mechanisms
- Create and write streaming DataFrames and streaming Datasets, including the basic output modes and output sinks
- Perform basic operations on streaming DataFrames and streaming Datasets, such as selection, projection, windowing, and aggregation
- Perform streaming deduplication in Structured Streaming, both with and without watermarks
Using Spark Connect to Deploy Applications - Weight: 5%

- Describe the features of Spark Connect
- Describe the different deployment modes (client, cluster, local) in an Apache Spark™ environment
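The deployment modes above differ mainly in where the driver runs; the contrast can be sketched with `spark-submit` invocations. Hostnames, the Scala/Spark versions in the package coordinate, and `app.py` are placeholders, and the Spark Connect lines assume a running Spark Connect server.

```shell
# Spark Connect: a thin client talks to a remote Spark server over gRPC,
# decoupling the client process from the driver. (Placeholder version shown.)
./sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.1
# In Python, connect with: SparkSession.builder.remote("sc://localhost:15002")

# Deployment modes for a classic spark-submit application (app.py is a placeholder):
spark-submit --master local[4] app.py                                  # local: driver and executors in one JVM
spark-submit --master spark://host:7077 --deploy-mode client app.py    # client: driver runs on the submitting machine
spark-submit --master spark://host:7077 --deploy-mode cluster app.py   # cluster: driver runs inside the cluster
```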
Using Pandas API on Apache Spark - Weight: 5%

- Explain the advantages of using Pandas API on Spark
- Create and invoke Pandas UDFs
To ensure success in the Databricks Apache Spark Developer Associate certification exam, we recommend the authorized training course, practice tests, and hands-on experience to prepare for the Databricks Certified Associate Developer for Apache Spark - Python exam.