<p>Contents</p>
<p>Design and prepare a machine learning solution (20–25%)</p>
<p>Design a machine learning solution</p>
<p>Design a data ingestion strategy for machine learning projects</p>
<p>Design a machine learning model training solution</p>
<p>Design a model deployment solution</p>
<p>Design a machine learning operations solution</p>
<p>Manage an Azure Machine Learning workspace</p>
<p>Explore and configure the Azure Machine Learning workspace</p>
<p>Explore Azure Machine Learning workspace resources and assets</p>
<p>Explore developer tools for workspace interaction</p>
<p>Make data available in Azure Machine Learning</p>
<p>Work with compute targets in Azure Machine Learning</p>
<p>Explore data, and train models (35–40%)</p>
<p>Prepare a model for deployment (20–25%)</p>
<p>Deploy and retrain a model (10–15%)</p>
<p>Study resources</p>
<p>Design and prepare a machine learning solution (20–25%)</p>
<p>Design a machine learning solution</p>
<p>· Determine the appropriate compute specifications for a training workload</p>
<p>· Describe model deployment requirements</p>
<p>· Select which development approach to use to build or train a model</p>
<p>Design a data ingestion strategy for machine learning projects</p>
<p>1. Every minute, a JSON object is extracted from an Internet of Things (IoT) device. What type of data is extracted?</p>
<p>Structured</p>
<p>Semi-structured</p>
<p>Correct. A JSON object is considered semi-structured.</p>
<p>Unstructured</p>
<p>2. When a data scientist extracts JSON objects from an IoT device and combines all transformed data into a CSV file, which data store would be best to use?</p>
<p>Azure Blob Storage</p>
<p>Azure Data Lake Storage</p>
<p>Correct. You can store CSV files in a data lake without having any capacity constraints.</p>
<p>Azure SQL Database</p>
<p>Design a machine learning model training solution</p>
<p>1.
A data scientist wants to train a machine learning model to predict the sales of supermarket items to adjust the supply to the projected demand. What type of machine learning task will the model perform?</p>
<p>Classification</p>
<p>Regression</p>
<p>Incorrect. Regression predicts a numerical value, but it doesn't account for the time component that is needed to forecast future sales.</p>
<p>Time-series forecasting</p>
<p>Correct. Time-series forecasting is used to predict future sales.</p>
<p>2. The data scientist received data to train a model to predict the sales of supermarket items. The data scientist wants to quickly iterate over several featurization and algorithm options by only providing the data and editing some configurations. Which tool is the best fit in this situation?</p>
<p>Designer</p>
<p>Automated Machine Learning</p>
<p>Correct. You only have to provide the data, and Automated Machine Learning will iterate over different featurization approaches and algorithms.</p>
<p>Azure AI Services</p>
<p>Incorrect. Azure AI Services doesn't provide an API to customize a time-series forecasting model.</p>
<p>Design a model deployment solution</p>
<p>Consider the requirements</p>
<p>· Consider the frequency. A doctor enters a patient's information, like their age and BMI, into the app and selects the Analyze button, after which the model should predict whether or not the patient is likely to have diabetes.</p>
<p>· Consider the compute. A doctor consultation typically takes less than 10 minutes. If we want doctors to use this app, we need the answers to be returned as quickly as possible. The deployed model should always be available, as we don't know when a doctor may use it.</p>
<p>· Consider the size. A doctor will only use the app to get a prediction on an individual patient's situation. There's no need to generate predictions for multiple patients at once.</p>
<p>1.
What type of predictions are needed by the mobile application?</p>
<p>Real-time predictions</p>
<p>Correct. There is a need for immediate predictions for individual patients.</p>
<p>Batch predictions</p>
<p>Local predictions</p>
<p>2. What kind of compute should be used by the deployed model?</p>
<p>Virtual machines</p>
<p>Containers</p>
<p>Correct. Containers are the more cost-effective option, as we want the model to always be available and respond immediately.</p>
<p>Local device</p>
<p>Design a machine learning operations solution</p>
<p>Consider the requirements</p>
<p>· Consider the environments. Currently, we're working in a small team and you're the only data scientist involved. We want to see whether this project is successful before actually scaling up and getting a large team involved.</p>
<p>· Consider the model. As the model is used to help doctors, accuracy is important to us. The model should only be in use when we know it's performing as expected.</p>
<p>· Consider the data. We're starting small and will mostly use the deployed model to test our application. The data the deployed model generates predictions on shouldn't be used to retrain the model, as it may be biased.</p>
<p>1. How many Azure Machine Learning workspaces should the team create?</p>
<p>One</p>
<p>Correct. With such a small team, one workspace is enough.</p>
<p>Two</p>
<p>Three</p>
<p>2. When should we retrain the model?</p>
<p>Every week.</p>
<p>Incorrect. There doesn't seem to be a reason to retrain the model every week, especially as we don't know whether new data is available each week.</p>
<p>When the model's metrics are below the benchmark.</p>
<p>Correct. The most important thing is that the model performs as expected.
When the model's performance is in jeopardy, we should retrain the model.</p>
<p>When there's data drift.</p>
<p>Manage an Azure Machine Learning workspace</p>
<p>· Create an Azure Machine Learning workspace</p>
<p>· Manage a workspace by using developer tools for workspace interaction</p>
<p>· Set up Git integration for source control</p>
<p>· Create and manage registries</p>
<p>Explore and configure the Azure Machine Learning workspace</p>
<p>Explore Azure Machine Learning workspace resources and assets</p>
<p>1. A data scientist needs access to the Azure Machine Learning workspace to run a script as a job. Which role should be used to give the data scientist the necessary access to the workspace?</p>
<p>Reader</p>
<p>Azure Machine Learning Data Scientist</p>
<p>Correct. An Azure Machine Learning Data Scientist is allowed to submit a job.</p>
<p>Azure Machine Learning Compute Operator</p>
<p>2. The data scientist wants to run a single script to train a model. What type of job is the best fit to run a single script?</p>
<p>Command</p>
<p>Correct. A command job is used to run a single script.</p>
<p>Pipeline</p>
<p>Sweep</p>
<p>Explore developer tools for workspace interaction</p>
<p>1. A data scientist wants to experiment by training a machine learning model and tracking it with Azure Machine Learning. Which tool should be used to train the model by running a script from their preferred environment?</p>
<p>The Azure Machine Learning studio</p>
<p>The Python SDK</p>
<p>Correct. The data scientist is likely to already be familiar with Python and can easily use the Python SDK to run the training script.</p>
<p>The Azure CLI</p>
<p>Incorrect. During experimentation, the data scientist may not need to automate tasks by using the CLI yet. Moreover, the data scientist is probably more familiar with Python than with the CLI.</p>
<p>2. A machine learning model to forecast sales has been developed.
Every week, new sales data is ingested and the model needs to be retrained on the newest data before generating the new forecast. Which tool should be used to retrain the model every week?</p>
<p>The Azure Machine Learning studio</p>
<p>The Python SDK</p>
<p>The Azure CLI</p>
<p>Correct. The Azure CLI is designed for automating tasks. By defining how the model must be trained in YAML files, you make the machine learning tasks repeatable, consistent, and reliable.</p>
<p>Manage data in an Azure Machine Learning workspace</p>
<p>· Select Azure Storage resources</p>
<p>· Register and maintain datastores</p>
<p>· Create and manage data assets</p>
<p>Make data available in Azure Machine Learning</p>
<p>1. A data scientist wants to read data stored in a publicly available GitHub repository. The data will be read in a Jupyter notebook in the Azure Machine Learning workspace for some quick experimentation. Which protocol should be used to read the data in the notebook?</p>
<p>azureml</p>
<p>http(s)</p>
<p>Correct. This protocol is used when accessing data stored in a publicly available http(s) location.</p>
<p>abfs(s)</p>
<p>2. What type of data asset should someone create when the schema changes frequently and the data is used in many different jobs?</p>
<p>URI file</p>
<p>URI folder</p>
<p>MLTable</p>
<p>Correct. MLTable is ideal when the schema changes frequently: you then only need to make the change in one location instead of in every job.</p>
<p>Manage compute for experiments in Azure Machine Learning</p>
<p>· Create compute targets for experiments and training</p>
<p>· Select an environment for a machine learning use case</p>
<p>· Configure attached compute resources, including Azure Synapse Spark pools and serverless Spark compute</p>
<p>· Monitor compute utilization</p>
<p>Work with compute targets in Azure Machine Learning</p>
<p>1. You're working in Visual Studio Code.
You cloned a GitHub repository in Visual Studio Code and you're editing code in a Jupyter notebook. To test the code, you want to run a cell within the notebook. Which compute should you use?</p>
<p>Compute instance</p>
<p>Correct. You can use a compute instance to run a notebook in Visual Studio Code.</p>
<p>Compute cluster</p>
<p>Azure Databricks cluster</p>
<p>2. You're experimenting with component-based pipelines in the Designer. You want to quickly iterate and experiment as you try out varying configurations of a pipeline. You're using a compute cluster. To minimize the start-up time each time you submit a pipeline, which parameter should you change?</p>
<p>Compute size</p>
<p>Idle time before scale down</p>
<p>Correct. By increasing the idle time before scale down, you can run multiple pipelines consecutively without the compute cluster resizing to zero nodes in between jobs.</p>
<p>Maximum number of instances</p>
<p>Incorrect. You shouldn't change the maximum number of instances when you want the compute cluster to remain available in between pipeline job runs.</p>
<p>Work with environments in Azure Machine Learning</p>
<p>1. A data scientist created a script that trains a machine learning model using the open-source library scikit-learn. The data scientist wants to quickly test whether the script can run on the existing compute cluster. What type of environment should the data scientist use?</p>
<p>Default</p>
<p>Curated</p>
<p>Correct. Curated environments are prebuilt and ready to use, which shortens development time.</p>
<p>Custom</p>
<p>2. A command job fails with the error message that a module isn't found. The data scientist used a curated environment and wants to add a specific Python package, create a custom environment, and successfully run the job. Which file should be created before building a custom environment that uses the curated environment as a reference?</p>
<p>Training script</p>
<p>Docker image</p>
<p>Conda specification</p>
<p>Correct.
You can list Python packages in a conda specification file.</p>
<p>Explore data, and train models (35–40%)</p>
<p>Explore data by using data assets and data stores</p>
<p>· Access and wrangle data during interactive development</p>
<p>· Wrangle interactive data with attached Synapse Spark pools and serverless Spark compute</p>
<p>Create models by using the Azure Machine Learning designer</p>
<p>· Create a training pipeline</p>
<p>· Consume data assets from the designer</p>
<p>· Use custom code components in designer</p>
<p>· Evaluate the model, including responsible AI guidelines</p>
<p>Use automated machine learning to explore optimal models</p>
<p>· Use automated machine learning for tabular data</p>
<p>· Use automated machine learning for computer vision</p>
<p>· Use automated machine learning for natural language processing</p>
<p>· Select and understand training options, including preprocessing and algorithms</p>
<p>· Evaluate an automated machine learning run, including responsible AI guidelines</p>
<p>Use notebooks for custom model training</p>
<p>· Develop code by using a compute instance</p>
<p>· Track model training by using MLflow</p>
<p>· Evaluate a model</p>
<p>· Train a model by using Python SDK v2</p>
<p>· Use the terminal to configure a compute instance</p>
<p>Tune hyperparameters with Azure Machine Learning</p>
<p>· Select a sampling method</p>
<p>· Define the search space</p>
<p>· Define the primary metric</p>
<p>· Define early termination options</p>
<p>Prepare a model for deployment (20–25%)</p>
<p>Run model training scripts</p>
<p>· Configure job run settings for a script</p>
<p>· Configure compute for a job run</p>
<p>· Consume data from a data asset in a job</p>
<p>· Run a script as a job by using Azure Machine Learning</p>
<p>· Use MLflow to log metrics from a job run</p>
<p>· Use logs to troubleshoot job run errors</p>
<p>· Configure an environment for a job run</p>
<p>· Define parameters for a job</p>
<p>Implement training pipelines</p>
<p>· Create a
pipeline</p>
<p>· Pass data between steps in a pipeline</p>
<p>· Run and schedule a pipeline</p>
<p>· Monitor pipeline runs</p>
<p>· Create custom components</p>
<p>· Use component-based pipelines</p>
<p>Manage models in Azure Machine Learning</p>
<p>· Describe MLflow model output</p>
<p>· Identify an appropriate framework to package a model</p>
<p>· Assess a model by using responsible AI principles</p>
<p>Deploy and retrain a model (10–15%)</p>
<p>Deploy a model</p>
<p>· Configure settings for online deployment</p>
<p>· Configure compute for a batch deployment</p>
<p>· Deploy a model to an online endpoint</p>
<p>· Deploy a model to a batch endpoint</p>
<p>· Test an online deployed service</p>
<p>· Invoke the batch endpoint to start a batch scoring job</p>
<p>Apply machine learning operations (MLOps) practices</p>
<p>· Trigger an Azure Machine Learning job, including from Azure DevOps or GitHub</p>
<p>· Automate model retraining based on new data additions or data changes</p>
<p>· Define event-based retraining triggers</p>
<p>Study resources</p>
<p>We recommend that you train and get hands-on experience before you take the exam. We offer self-study options and classroom training as well as links to documentation, community sites, and videos.</p>
<p>· Get trained: Choose from self-paced learning paths and modules or take an instructor-led course</p>
<p>· Find documentation: Azure Databricks; Azure Machine Learning; Azure Synapse Analytics; MLflow and Azure Machine Learning</p>
<p>· Ask a question: Microsoft Q&A | Microsoft Docs</p>
<p>· Get community support: AI - Machine Learning - Microsoft Tech Community; AI - Machine Learning Blog - Microsoft Tech Community</p>
<p>· Follow Microsoft Learn: Microsoft Learn - Microsoft Tech Community</p>
<p>· Find a video: Microsoft Learn Shows</p>
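<p>To make the YAML-based automation from the "Explore developer tools for workspace interaction" section concrete, here is a minimal sketch of an Azure Machine Learning CLI (v2) command job definition for the weekly retraining scenario. The data asset, script, environment, and compute names (sales-data, train.py, the curated sklearn environment, cpu-cluster) are placeholder assumptions, not values from this guide.</p>

```yaml
# job.yml - hypothetical command job for weekly retraining; all names are examples.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: ./src
command: python train.py --training_data ${{inputs.training_data}}
inputs:
  training_data:
    type: uri_file
    path: azureml:sales-data@latest
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster
experiment_name: weekly-sales-retraining
```

<p>Submitting this file with az ml job create --file job.yml is what makes the retraining task repeatable and easy to trigger on a schedule, for example from Azure DevOps or GitHub Actions.</p>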
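<p>An MLTable data asset, as in the second "Make data available in Azure Machine Learning" question, is defined by an MLTable file stored alongside the data, so the schema logic lives in one place. A minimal sketch follows; the file pattern and delimiter settings are assumptions for illustration.</p>

```yaml
# Hypothetical MLTable file: every job consuming the asset reads the data
# through this blueprint, so schema changes are made here only once.
paths:
  - pattern: ./*.csv
transformations:
  - read_delimited:
      delimiter: ","
      encoding: utf8
      header: all_files_same_headers
```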
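<p>The conda specification file from the "Work with environments in Azure Machine Learning" questions can be a short YAML file listing the missing Python packages. A minimal sketch; the environment name and package list are placeholder assumptions.</p>

```yaml
# conda.yml - hypothetical specification adding scikit-learn on top of a base Python,
# the kind of file that resolves the "module not found" error in the command job.
name: sklearn-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - scikit-learn
      - mlflow
```

<p>When you create a custom environment that references a curated environment's base image, Azure Machine Learning builds the conda environment from this file on top of that image.</p>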
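<p>The protocols in the "Make data available in Azure Machine Learning" questions correspond to different URI schemes. The short sketch below shows what each kind of URI looks like; the storage account, datastore, and file names are hypothetical examples.</p>

```python
from urllib.parse import urlparse

# Hypothetical example URIs for the three protocols discussed above.
EXAMPLE_URIS = {
    "http(s)": "https://raw.githubusercontent.com/example-org/example-repo/main/sales.csv",
    "abfs(s)": "abfss://container@account.dfs.core.windows.net/sales.csv",
    "azureml": "azureml://datastores/workspaceblobstore/paths/sales.csv",
}

def scheme_of(uri: str) -> str:
    """Return the URI scheme, which determines which protocol is used to read the data."""
    return urlparse(uri).scheme

if __name__ == "__main__":
    for protocol, uri in EXAMPLE_URIS.items():
        print(f"{protocol} -> {scheme_of(uri)}")
```

<p>For the publicly available GitHub file in question 1, the https:// form is the one a notebook would read directly, for example with pandas.</p>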