Databricks Developer for Apache Spark - Python Certification Sample Questions

The purpose of this Sample Question Set is to provide you with information about the Databricks Certified Associate Developer for Apache Spark - Python exam. These sample questions will make you familiar with both the type and the difficulty level of the questions on the Developer for Apache Spark - Python certification test. To get familiar with the real exam environment, we suggest you try our Sample Databricks Apache Spark Developer Associate Certification Practice Exam. This sample practice exam gives you the feel of the real exam and a sense of the questions asked in the actual Databricks Certified Associate Developer for Apache Spark certification exam.

These sample questions are simple, basic questions that are representative of the real Databricks Certified Associate Developer for Apache Spark - Python exam questions. To assess your readiness and performance with realistic, scenario-based questions, we suggest you prepare with our Premium Databricks Developer for Apache Spark - Python Certification Practice Exam. Solving scenario-based questions in practice exposes the difficulties that give you an opportunity to improve.

Databricks Developer for Apache Spark - Python Sample Questions:

01. Which of the following DataFrame methods is classified as a transformation?
a) DataFrame.count()
b) DataFrame.show()
c) DataFrame.select()
d) DataFrame.foreach()
e) DataFrame.first()
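For illustration, here is a minimal sketch (assuming an active SparkSession named spark) showing that select() is a lazy transformation, while count() is an action that triggers execution:

df = spark.range(10)
selected = df.select("id")  # transformation: only builds the query plan
n = selected.count()        # action: triggers the actual computation
print(n)                    # 10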
 
02. If we want to create a constant integer 1 as a new column 'new_column' in a DataFrame df, which code block should we select?
a) df.withColumnRenamed('new_column', lit(1))
b) df.withColumn(new_column, lit(1))
c) df.withColumn("new_column", lit("1"))
d) df.withColumn("new_column", 1)
e) df.withColumn("new_column", lit(1))
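As a quick illustration (a sketch assuming an active SparkSession named spark), lit() wraps the constant in a Column object, which withColumn() expects as its second argument:

from pyspark.sql.functions import lit

df = spark.range(3)
df_with_const = df.withColumn("new_column", lit(1))  # constant integer column
df_with_const.show()                                 # every row has new_column = 1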
 
03. Which three of the following DataFrame operations are classified as actions?
(Choose 3 answers)
a) printSchema()
b) show()
c) first()
d) limit()
e) foreach()
f) cache()
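A short sketch of the distinction (assuming an active SparkSession named spark): actions return results or run side effects immediately, while limit() and cache() are lazy:

df = spark.range(5)
df.show()                   # action: prints rows to the console
row = df.first()            # action: returns the first Row to the driver
df.foreach(lambda r: None)  # action: runs a function on every row
small = df.limit(2)         # transformation: nothing is executed yet
cached = df.cache()         # lazy: only marks df for caching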
 
04. The code block displayed below contains an error. The code block is intended to join DataFrame itemsDf with the larger DataFrame transactionsDf on column itemId. Find the error.
Code block: transactionsDf.join(itemsDf, "itemId", how="broadcast")
a) The syntax is wrong, how= should be removed from the code block.
b) The join method should be replaced by the broadcast method.
c) Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.
d) The larger DataFrame transactionsDf is being broadcast, rather than the smaller DataFrame itemsDf.
e) broadcast is not a valid join type.
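The broadcast hint is applied to the smaller DataFrame through the broadcast() function, not through the how= parameter. A minimal sketch, assuming itemsDf and transactionsDf exist and share an itemId column:

from pyspark.sql.functions import broadcast

# Ship the small DataFrame to every executor, then join normally on itemId.
joined = transactionsDf.join(broadcast(itemsDf), "itemId")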
 
05. If Spark is running in client mode, which of the following statements is correct?
a) The Spark driver is assigned to a random machine in the cluster
b) The Spark driver is assigned to the machine that has the most resources
c) The Spark driver remains on the client machine that submitted the application
d) The entire Spark application runs on a single machine.
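In client mode, the driver process runs on the machine that submitted the application. You can confirm the deploy mode at runtime (a sketch assuming an active SparkSession named spark):

print(spark.sparkContext.deployMode)  # 'client' or 'cluster'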
 
06. Which command can we use to get the number of partitions of a DataFrame named df?
a) df.rdd.getPartitionSize()
b) df.getPartitionSize()
c) df.getNumPartitions()
d) df.rdd.getNumPartitions()
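getNumPartitions() lives on the underlying RDD rather than on the DataFrame itself, as a short sketch shows (assuming an active SparkSession named spark):

df = spark.range(100)
print(df.rdd.getNumPartitions())  # number of partitions of the underlying RDD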
 
07. Which of the following are valid execution modes?
a) Kubernetes, Local, Client
b) Client, Cluster, Local
c) Server, Standalone, Client
d) Cluster, Server, Local
e) Standalone, Client, Cluster
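Local mode is selected through the master URL, while client versus cluster mode is chosen with spark-submit's --deploy-mode flag rather than in code. A minimal local-mode sketch:

from pyspark.sql import SparkSession

# Local mode: driver and executors run inside a single JVM on one machine.
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
print(spark.sparkContext.master)  # local[*]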
 
08. The code block below intends to join df1 with df2 with an inner join, but it contains an error. Identify the error.
df1.join(df2, "inner", df1.col("id") === df2.col("id"))
a) The arguments are not in the right order. The correct query should be df2.join(df1, df1.col("id") === df2.col("id"), "inner")
b) There should be == instead of ===, so the correct query is df1.join(df2, "inner", df1.col("id") == df2.col("id"))
c) The syntax is not correct; it should be df1.join(df2, df1.col("id") == df2.col("id"), "inner")
d) We cannot do an inner join in Spark 3.0, but it is on the roadmap.
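In PySpark, the join condition is the second argument and the join type the third, and column equality uses ==; column references are typically written df1["id"]. A minimal sketch with two toy DataFrames (assuming an active SparkSession named spark):

df1 = spark.createDataFrame([(1, "a")], ["id", "x"])
df2 = spark.createDataFrame([(1, "b")], ["id", "y"])
joined = df1.join(df2, df1["id"] == df2["id"], "inner")  # condition, then type
joined.show()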
 
09. Which of the following statements is NOT true for broadcast variables?
a) It provides a mutable variable that a Spark cluster can safely update on a per-row basis.
b) It is a way of updating a value inside of a variety of transformations and propagating that value to the driver node in an efficient and fault-tolerant way.
c) You can define your own custom broadcast class by extending org.apache.spark.util.BroadcastV2 in Java or Scala or pyspark.AccumulatorParams in Python.
d) Broadcast variables are shared, immutable variables that are cached on every machine in the cluster instead of serialized with every single task.
e) The canonical use case is to pass around a large lookup table that fits in memory on the executors.
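A minimal broadcast-variable sketch (assuming an active SparkSession named spark): the value is immutable, cached once per executor, and read through .value:

lookup = {1: "keyboard", 2: "mouse"}             # small lookup table
bc = spark.sparkContext.broadcast(lookup)        # shipped once to each executor
rdd = spark.sparkContext.parallelize([1, 2, 1])
print(rdd.map(lambda k: bc.value[k]).collect())  # ['keyboard', 'mouse', 'keyboard']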
 
10. Which of the following code blocks adds a column predErrorSqrt to DataFrame transactionsDf that is the square root of column predError?
a) transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))
b) transactionsDf.withColumn("predErrorSqrt", sqrt(predError))
c) transactionsDf.select(sqrt(predError))
d) transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())
e) transactionsDf.select(sqrt("predError"))
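sqrt() comes from pyspark.sql.functions and takes a Column (or a column name). A minimal sketch, assuming transactionsDf exists with a numeric predError column:

from pyspark.sql.functions import col, sqrt

result = transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))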

Answers:

Question: 01
Answer: c
Question: 02
Answer: e
Question: 03
Answer: b, c, e
Question: 04
Answer: e
Question: 05
Answer: c
Question: 06
Answer: d
Question: 07
Answer: b
Question: 08
Answer: c
Question: 09
Answer: a, b, c
Question: 10
Answer: a

Note: For any error in the Databricks Certified Associate Developer for Apache Spark certification exam sample questions, please update us by writing an email to feedback@certfun.com.
