The purpose of this Sample Question Set is to provide you with information about the Databricks Certified Machine Learning Professional exam. These sample questions familiarize you with both the type and the difficulty level of the questions on the Machine Learning Professional certification test. To get familiar with the real exam environment, we suggest you try our Sample Databricks Machine Learning Professional Certification Practice Exam. This sample practice exam gives you a realistic feel for the exam and an indication of the questions asked in the actual Databricks Certified Machine Learning Professional certification exam.
These sample questions are simple, basic questions that resemble the real Databricks Certified Machine Learning Professional exam questions. To assess your readiness and performance with realistic scenario-based questions, we suggest you prepare with our Premium Databricks Machine Learning Professional Certification Practice Exam. Solving scenario-based questions hands-on exposes you to difficulties that give you an opportunity to improve.
Databricks Machine Learning Professional Sample Questions:
01. A machine learning team is tuning a model's hyperparameters with cross-validation and wants to organize their tracking so they can:
- Track performance metrics and hyperparameters for each configuration
- Associate the final model with its hyperparameter search process
- Enable easy comparison in the MLflow UI
Which approach will structure their MLflow runs to meet their needs?
a) Use flat runs with tags to indicate hyperparameter sets and fold numbers, storing all metrics and models at the same level.
b) Use nested runs with a parent run for the entire experiment, child runs for each hyperparameter set (logging aggregated CV metrics), and the final model evaluation as a separate run for the best-performing parameter set.
c) Use nested runs with a parent run for each cross-validation fold and child runs for each hyperparameter set.
d) Use nested runs with a parent run for the entire experiment and child runs only for the final model training, logging all cross-validation results as metrics in the parent run.
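For reference, option (b) describes MLflow's nested-run pattern. A minimal sketch using MLflow's public API; the parameter grid and metric values are illustrative placeholders:

```python
import mlflow

# Hypothetical hyperparameter sets for illustration.
param_sets = [{"max_depth": 3}, {"max_depth": 6}]

with mlflow.start_run(run_name="hyperparameter_search"):
    for i, params in enumerate(param_sets):
        # One child run per hyperparameter set, logging aggregated CV metrics.
        with mlflow.start_run(run_name=f"param_set_{i}", nested=True):
            mlflow.log_params(params)
            mlflow.log_metric("cv_accuracy_mean", 0.90 + 0.01 * i)  # placeholder

    # Final evaluation of the best-performing parameter set as its own run.
    with mlflow.start_run(run_name="best_model_evaluation", nested=True):
        mlflow.log_params(param_sets[1])
        mlflow.log_metric("test_accuracy", 0.93)  # placeholder
```

Because every child run shares one parent, the whole search collapses into a single expandable row in the MLflow UI, which is what makes side-by-side comparison easy.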
02. A team is developing a new version of a model that must be validated against the model currently in production. However, since development will take several months, they're concerned that the production model might be updated during that time. How should they approach this problem?
a) Use the MLflow Client's get_latest_version() method to get the most recent version of the registered model.
b) Use the Databricks Model Serving SDK to query the served model when validating against the new model.
c) Implement a shadow deployment strategy to test the new model with real-world data.
d) Update model deployment code to add an alias to the model version in production and use the model alias when validating against the new model.
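Option (d) relies on model version aliases in the MLflow Model Registry. A minimal sketch, assuming a hypothetical Unity Catalog model `main.default.demand_model` and an alias named `champion`:

```python
import mlflow
from mlflow import MlflowClient

client = MlflowClient()

# Deployment code pins the alias to whichever version is currently in production.
client.set_registered_model_alias(
    name="main.default.demand_model", alias="champion", version=3
)

# Validation code resolves the alias at load time, so it always compares against
# the current production version, even if that version changes mid-development.
prod_model = mlflow.pyfunc.load_model("models:/main.default.demand_model@champion")
```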
03. A machine learning engineer maintains a credit card fraud detection model whose predictions drive the following business rules:
- If the model predicts a fraudulent transaction with 95% confidence or higher, the credit card is immediately frozen and the transaction is declined.
- If the model predicts the transaction is non-fraudulent with 95% confidence or higher, the transaction is allowed with no action taken.
- Any other scenario allows the transaction, but notifies the owner that suspicious activity was observed on their card.
Which loss metric should the engineer use to select the top performing model during model retrains?
a) F1
b) Log Loss
c) AUROC
d) Accuracy
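Log loss, option (b), suits this workflow because the business rules act on predicted probabilities rather than hard labels, and log loss directly penalizes poorly calibrated probabilities. A quick scikit-learn illustration with made-up values:

```python
from sklearn.metrics import log_loss

y_true = [0, 0, 1, 1]
# Predicted probability of the positive (fraud) class; illustrative values.
y_prob = [0.02, 0.40, 0.97, 0.60]

print(log_loss(y_true, y_prob))  # lower is better
```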
04. A company serves a model from a Model Serving Endpoint that must handle significant traffic. New models need to be rolled out in a safe manner that minimizes the financial risk should there be an issue with the new model. The team needs to configure the Model Serving Endpoint to support the current scaling requirements while also minimizing the risk of deploying a new model.
Which set of actions will meet the requirements?
a) - Set the Compute scale-out to allow for horizontal scaling to meet traffic requirements.
- Ensure Route Optimization is enabled to further improve the endpoint's throughput.
- When deploying a new model, leverage a second Served Entity and incrementally increase traffic to that entity while monitoring that entity's performance.
b) - Set the Compute scale-out to allow for horizontal scaling to meet traffic requirements.
- When deploying a new model, leverage a second Served Entity and incrementally increase traffic to that entity while monitoring that entity's performance.
c) - Set the Compute scale-out to allow for horizontal scaling to meet traffic requirements.
- Ensure Route Optimization is enabled to further improve the endpoint's throughput.
- When deploying a new model, leverage a second Served Entity and set traffic to the new model at 100% and the original model at 0%.
d) - Set the Compute scale-out to allow for horizontal scaling to meet traffic requirements.
- When deploying a new model, leverage a second Served Entity and set traffic to the new model at 100% and the original model at 0%.
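Option (a) corresponds to an incremental traffic-split rollout between two served entities on a single endpoint. A rough sketch of the config update via the serving-endpoints REST API; the workspace URL, token, endpoint name, model name, and versions are all hypothetical:

```python
import requests

DATABRICKS_HOST = "https://<workspace-url>"  # placeholder
TOKEN = "<api-token>"                        # placeholder

config = {
    "served_entities": [
        {"name": "current", "entity_name": "main.default.risk_model",
         "entity_version": "3", "workload_size": "Small",
         "scale_to_zero_enabled": False},
        {"name": "challenger", "entity_name": "main.default.risk_model",
         "entity_version": "4", "workload_size": "Small",
         "scale_to_zero_enabled": False},
    ],
    # Start the new model at a small share of traffic and raise it gradually
    # while monitoring the challenger's performance.
    "traffic_config": {"routes": [
        {"served_model_name": "current", "traffic_percentage": 90},
        {"served_model_name": "challenger", "traffic_percentage": 10},
    ]},
}

resp = requests.put(
    f"{DATABRICKS_HOST}/api/2.0/serving-endpoints/risk-endpoint/config",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=config,
)
resp.raise_for_status()
```

Route Optimization is configured on the endpoint itself and is not shown here.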
05. A data scientist needs a model that predicts a continuous numeric target. Which of the following is designed for this task?
a) Binary Logistic Regression
b) Multinomial Naive Bayes
c) Linear Regression
d) Softmax Classifier
06. A team is preparing features for model training on a large and growing dataset. They need to encode categorical variables and combine all features for model training, ensuring the workflow scales as data volume increases. Which approach meets these requirements?
a) Use SparkML’s StringIndexer and OneHotEncoder to transform categorical features, then VectorAssembler to combine all features before model training.
b) Use pandas’ get_dummies for categorical encoding and concatenate features manually before model training.
c) Use scikit-learn’s LabelEncoder and OneHotEncoder, then combine features with ColumnTransformer.
d) Use TensorFlow’s tf.feature_column API to preprocess and combine features.
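A minimal sketch of the SparkML flow in option (a); the column names are hypothetical:

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler

# Hypothetical column names for illustration.
categorical_cols = ["country", "device_type"]
numeric_cols = ["amount", "session_length"]

# Index string categories, one-hot encode them, then assemble one feature vector.
indexers = [StringIndexer(inputCol=c, outputCol=f"{c}_idx") for c in categorical_cols]
encoder = OneHotEncoder(
    inputCols=[f"{c}_idx" for c in categorical_cols],
    outputCols=[f"{c}_ohe" for c in categorical_cols],
)
assembler = VectorAssembler(
    inputCols=[f"{c}_ohe" for c in categorical_cols] + numeric_cols,
    outputCol="features",
)

pipeline = Pipeline(stages=indexers + [encoder, assembler])
# features_df = pipeline.fit(df).transform(df)  # df is a Spark DataFrame
```

Because every stage is a Spark transformer, the same pipeline scales out with the cluster as data volume grows.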
07. An ML team maintains integration tests covering the feature engineering, training, evaluation, and deployment stages of their pipeline. After updating their code to change the model's hyperparameters, which of these integration tests, at a minimum, will need to be re-run?
a) Only the training integration test will be affected since only model parameters are being updated, while feature engineering and deployment processes remain unchanged.
b) Both training and evaluation integration tests will be affected, as changing model parameters impacts model performance metrics, but feature engineering and deployment tests remain unaffected.
c) Training, evaluation, and deployment integration tests will be affected, as changing model parameters requires retraining, re-evaluation, and redeployment of the model, but feature engineering tests remain unaffected.
d) All integration tests (feature engineering, training, evaluation, and deployment) will be affected, as changing model parameters requires validation across the entire ML pipeline to ensure end-to-end functionality.
08. The DevOps team requires that all infrastructure components be version-controlled, reproducible, and deployable across multiple environments (dev, staging, production) using a streamlined deployment process.
Which deployment strategy meets these requirements?
a) Configure all ML resources (experiments, models, endpoints) in a Databricks Asset Bundle file and deploy.
b) Use MLflow Projects to package the code and Databricks Jobs to orchestrate the deployment of each component separately.
c) Create Terraform modules for infrastructure provisioning and use the Databricks CLI for ML component configuration.
d) Use the Databricks CLI to deploy Python scripts as Job Runs.
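Option (a) refers to Databricks Asset Bundles, which declare ML resources in a `databricks.yml` file and deploy them per target environment with the Databricks CLI (`databricks bundle deploy -t <target>`). A minimal sketch; the bundle, host, and resource names are hypothetical:

```yaml
# databricks.yml -- minimal Asset Bundle sketch (names are hypothetical)
bundle:
  name: fraud-ml

targets:
  dev:
    workspace:
      host: https://<dev-workspace-url>
  prod:
    workspace:
      host: https://<prod-workspace-url>

resources:
  experiments:
    fraud_experiment:
      name: /Shared/fraud-experiment
  registered_models:
    fraud_model:
      name: fraud_model
      catalog_name: main
      schema_name: default
```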
09. An ML team wants to distribute an Optuna hyperparameter optimization across a multi-node Databricks cluster while tracking all trials in a single MLflow experiment. Which approach meets these requirements?
a) Provide the MLflow Experiment ID to Optuna’s MLflowCallback function, then use Python’s async functionality to asynchronously trigger multiple parallel optimization runs.
b) Provide the MLflow Experiment ID to MLflowStorage and initialize MLflowSparkStudy to distribute the trials across the multi-node Databricks cluster.
c) Provide the MLflow Experiment ID to Optuna's MLflowCallback function and then use MLflowSparkStudy to distribute the trials across the multi-node Databricks cluster.
d) Provide the MLflow Experiment path to Optuna’s MLflowCallback function, then distribute the trials across a multi-node Ray on Spark cluster.
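For context on the building blocks in these options: Optuna's standard MLflow integration logs each trial to an MLflow experiment through a callback. A single-node sketch using the widely available `MLflowCallback` (the `MLflowStorage` and `MLflowSparkStudy` classes named in the options are taken as given from the question); the objective is an illustrative stand-in for model training:

```python
import optuna
from optuna.integration.mlflow import MLflowCallback

# Log every trial's parameters and metric to the active MLflow experiment.
mlflc = MLflowCallback(metric_name="objective_value")

def objective(trial):
    # Illustrative quadratic objective standing in for real model training.
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20, callbacks=[mlflc])
```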
10. A model has been deployed to a Databricks Model Serving endpoint, and the team needs to query it programmatically. Which method should they use to send data to the model and retrieve predictions?
a) Use the ai_query function from the Databricks SDK with the model’s serving endpoint and input parameters.
b) Send a POST request to the model’s REST API endpoint using the requests library and pass the access token and input data as query string parameters.
c) Use the MLflow Tracking client to log the request data and then retrieve predictions from the model serving endpoint.
d) Call the predict method from the MLflow Deployments class with the endpoint name and model inputs.
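Option (d) uses the MLflow Deployments client. A minimal sketch, assuming a hypothetical endpoint named `fraud-endpoint` and Databricks authentication already configured in the environment:

```python
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

# Score two illustrative records against the serving endpoint.
response = client.predict(
    endpoint="fraud-endpoint",
    inputs={"dataframe_records": [
        {"amount": 120.50, "country": "US"},
        {"amount": 9800.00, "country": "BR"},
    ]},
)
print(response)
```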
Answers:
| Question | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 |
|----------|----|----|----|----|----|----|----|----|----|----|
| Answer   | b  | d  | b  | a  | c  | a  | c  | a  | b  | d  |
Note: If you find any error in these Databricks Certified Machine Learning Professional certification exam sample questions, please let us know by sending an email to feedback@certfun.com.
