The purpose of this Sample Question Set is to provide you with information about the Databricks Certified Machine Learning Associate exam. These sample questions will familiarize you with both the type and the difficulty level of the questions on the Machine Learning Associate certification test. To get familiar with the real exam environment, we suggest you try our Sample Databricks Machine Learning Associate Certification Practice Exam. This sample practice exam simulates the real exam and gives you an indication of the questions asked in the actual Databricks Certified Machine Learning Associate certification exam.
These sample questions are simple, basic questions that resemble the real Databricks Certified Machine Learning Associate exam questions. To assess your readiness and performance with realistic, scenario-based questions, we suggest you prepare with our Premium Databricks Machine Learning Associate Certification Practice Exam. Working through scenario-based questions in practice exposes the difficulties you are likely to face and gives you an opportunity to improve.
Databricks Machine Learning Associate Sample Questions:
01. A data scientist is working on a machine learning project to develop a model that predicts whether a customer will churn from a subscription service. The dataset is highly imbalanced, with only 10% of the instances representing customers who churn. They want to ensure that their model effectively identifies the minority class without being biased towards the majority class.
Which strategy directly mitigates the model’s bias towards the non-churn customers due to class imbalance?
a) Normalize the features to ensure they are on the same scale, improving model performance.
b) Increase the size of the training dataset by collecting more data on non-churn customers.
c) Use cost-sensitive learning by assigning a higher misclassification cost to the minority class during model training.
d) Use a simpler model to reduce overfitting, ensuring it generalizes better to the minority class.
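Cost-sensitive learning is commonly implemented by weighting each class inversely to its frequency, so that errors on the rare class cost more during training. The minimal sketch below reproduces, in plain Python, the formula scikit-learn documents for `class_weight="balanced"` (`n_samples / (n_classes * count)`). The 90/10 label list is a hypothetical stand-in for the churn dataset in the question, and `balanced_class_weights` is an illustrative helper, not a library function.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Same formula as scikit-learn's class_weight="balanced": rarer classes get
    larger weights, so misclassifying them contributes more to the loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical dataset: 90 non-churn (0) and 10 churn (1) customers.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)  # churn (1) gets 5.0; non-churn (0) gets about 0.56
```

A dictionary like this can be passed as the `class_weight` argument of scikit-learn estimators that support it, such as `LogisticRegression`.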
02. A data scientist wants to create a feature table to use in their models. They are working in a workspace with Unity Catalog enabled and want this feature table to be stored and governed by it.
What is the correct way of creating this feature table?
a) Use the create_table method of the FeatureEngineeringClient in Python to create the table, then write data to it.
b) Create an empty Delta table on Unity Catalog with the AS FEATURE STORE clause via SQL, then write data to it.
c) Create a Delta table with data in it, as usual, then use the register_table method from the FeatureStoreClient in Python to register it as a feature table in Unity Catalog.
d) Create a Delta table with data in it in Unity Catalog then use the ALTER TABLE command in SQL to configure it as a feature table with the SET AS FEATURE STORE clause.
03. A data scientist is tuning a Support Vector Machine (SVM) model using 5-fold cross-validation and GridSearchCV in scikit-learn. The parameter grid includes three hyperparameters to optimize: C with values [0.1, 1, 10], kernel with choices ['linear', 'rbf'], and gamma with values [0.01, 0.1, 1].
How many different models will be trained during cross-validation (not counting the final refit with the best parameters)?
a) 18
b) 90
c) 1
d) None of the above.
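The grid arithmetic can be checked directly. The sketch below enumerates the grid with `itertools.product` (scikit-learn's `ParameterGrid` yields the same 18 combinations for a single dictionary) and multiplies by the number of folds. It assumes scikit-learn's defaults, in particular `refit=True`, under which `GridSearchCV` trains one additional model on the full dataset passed to `fit` after the search finishes.

```python
from itertools import product

# Parameter grid from the question.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": [0.01, 0.1, 1],
}
n_folds = 5

# Each combination of values is one candidate. The grid is not pruned, so
# gamma values are still paired with the linear kernel (which ignores gamma).
candidates = list(product(*param_grid.values()))
n_candidates = len(candidates)        # 3 * 2 * 3 = 18
n_cv_fits = n_candidates * n_folds    # one fit per candidate per fold = 90

# With refit=True (the default), one more model is trained on the whole
# dataset passed to fit(), using the best parameters found.
n_fits_with_refit = n_cv_fits + 1     # 91

print(n_candidates, n_cv_fits, n_fits_with_refit)
```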
04. A company has a podcast platform that has thousands of users. The company has implemented an anomaly detection algorithm to detect low podcast engagement based on a 10-minute running window of user events such as listening, pausing, and exiting the podcast.
A machine learning engineer wants to deploy this model into a production data pipeline that needs to handle up to tens of thousands of events per second. As the volume of events fluctuates throughout the day, the engineer needs the pipeline compute to be resized dynamically.
Which pipeline design approach meets these requirements?
a) Create a Delta Live Tables pipeline that applies the algorithm as a Spark UDF.
b) Create a Structured Streaming Job that applies the algorithm as a Spark UDF.
c) Create a model serving endpoint, create a Delta Live Tables pipeline that calls a custom UDF which invokes the endpoint.
d) Create a model serving endpoint, create a Structured Streaming job that calls a custom UDF which invokes the endpoint.
05. A data scientist needs to impute the missing values in a continuous feature. They want to do this with the least amount of effort but with correct results. Which strategy will do this?
a) Use sklearn SimpleImputer, which automatically selects the best methodology based on the feature distribution
b) Use .mode(), which is the most appropriate imputation on continuous columns
c) Use .mean(), which is the most appropriate imputation on continuous columns
d) Examine the distribution of the values and select the appropriate imputation upon review
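Examining the distribution before choosing an imputation can be as simple as checking skewness. This minimal sketch, using only Python's `statistics` module, fills missing values (`None`) with the median when the observed values are strongly skewed and with the mean otherwise. The helper names and the `skew_threshold=1.0` cutoff are illustrative assumptions (a common rule of thumb, not a fixed standard); in practice a histogram or summary statistics would also inform the choice.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Population skewness: the mean of cubed z-scores (0 for symmetric data)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # constant column: no spread, so no skew
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def impute_continuous(column, skew_threshold=1.0):
    """Replace None with the median if |skew| > skew_threshold, else the mean."""
    observed = [v for v in column if v is not None]
    skew = skewness(observed)
    # The median resists long tails; the mean is fine for roughly symmetric data.
    fill = median(observed) if abs(skew) > skew_threshold else mean(observed)
    return [fill if v is None else v for v in column], fill, skew


# Hypothetical right-skewed feature (one extreme value) with a missing entry.
filled, fill, skew = impute_continuous([30, 32, 35, 31, 33, 200, None])
print(round(skew, 2), fill)
```

Here the single extreme value pulls the mean to about 60, while the median (32.5) stays close to the bulk of the data, which is why a skewed column is usually better served by the median.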
06. A senior machine learning engineer is developing a machine learning pipeline. They set up the pipeline to automatically transition a new version of a registered model to the Production stage in the Model Registry once it passes all tests, using the MLflow Client API.
Which operation was used to transition the model to the Production stage?
a) client.update_model_stage
b) client.transition_model_version_stage
c) client.transition_model_version
d) client.update_model_version
07. When AutoML explores the key attributes of a dataset, which of the following elements does it typically not assess?
a) The dataset's memory footprint.
b) The potential impact of outliers on model performance.
c) The balance or imbalance of classes in classification tasks.
d) The encryption level of the dataset.
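Two of the dataset properties listed above, class balance and outliers, can be profiled in a few lines. The sketch below is plain Python and is not how Databricks AutoML implements its data exploration; the helper names and toy data are hypothetical, and the outlier check uses the conventional Tukey fences (1.5 times the interquartile range beyond the quartiles).

```python
from collections import Counter
from statistics import quantiles


def class_balance(labels):
    """Return the fraction of rows that belongs to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: count / total for cls, count in counts.items()}


def iqr_outliers(values, k=1.5):
    """Return values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default "exclusive" method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical toy data for illustration.
labels = [0] * 9 + [1]
feature = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print(class_balance(labels))   # {0: 0.9, 1: 0.1}
print(iqr_outliers(feature))   # [100]
```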
08. Which of the following are key components of ML workflows in Databricks? (Select all that apply.)
a) Data ingestion
b) Model serving
c) Feature extraction
d) Manual model tuning
09. A machine learning team wants to use the Python library newpackage on all of their projects. They share a cluster for all of their projects. Which approach makes the Python library newpackage available to all notebooks run on a cluster?
a) Edit the cluster to use the Databricks Runtime for Machine Learning
b) Set the runtime-version variable in their Spark session to "ml"
c) Run %pip install newpackage once in any notebook attached to the cluster
d) Add /databricks/python/bin/pip install newpackage to the cluster’s bash init script
e) There is no way to make the newpackage library available on a cluster
10. How does MLflow help in the model development process?
a) It creates synthetic training data
b) It tracks experiments, manages models, and logs metrics
c) It optimizes hyperparameters automatically
d) It speeds up model inference
Answers:
Question: 01 Answer: c
Question: 02 Answer: a
Question: 03 Answer: b
Question: 04 Answer: a
Question: 05 Answer: d
Question: 06 Answer: b
Question: 07 Answer: d
Question: 08 Answer: a, b, c
Question: 09 Answer: d
Question: 10 Answer: b
Note: If you find any error in these Databricks Certified Machine Learning Associate certification exam sample questions, please let us know by emailing feedback@certfun.com.
