Session 6
Professional Machine Learning Engineer
Session 6 Content Review
Sample Questions Review
Exam Information
Next Steps
Professional Machine Learning Certification
Learning Journey organized by Google Developer Groups Surrey, co-hosted with GDG Seattle
Session 1: Feb 24, 2024
Session 2: Mar 2, 2024
Session 3: Mar 9, 2024
Session 4: Mar 16, 2024
Session 5: Mar 23, 2024
Session 6: Apr 6, 2024
Session 6 Content Review
Session 6
Study Group
Exam Tips
- Review
- Registering for the Exam
- Tips for managing your time / test taking strategy
Summarizing the four ML options
Criterion                Pre-Built APIs                   BigQuery ML      AutoML                           Vertex AI
Data type                Tabular, image, text, and video  Tabular          Tabular, image, text, and video  No limits
Training data size       No data required                 Medium to large  Medium                           Medium to large
ML and coding expertise  Low                              Medium           Low                              High
Flexibility to tune      None                             Medium           None                             High
Time to train a model    None                             Medium           Medium                           Long
Section 1: ML Problem Framing
Translate business challenge into ML use
case. Considerations include:
● Defining business problems
● Identifying non-ML solutions
● Defining output use
● Managing incorrect results
● Identifying data sources
● Mapping business problem to ML
problem. What should our label be?
What should our features be?
● Does this problem actually require
ML? (Ex: Find the avg. number of units
manufactured by month for the last
● Making sure label maps to business
● Knowing that we need labelled data
for supervised ML
Section 1: ML Problem Framing
Define ML problem. Considerations include:
● Defining problem type (classification,
regression, clustering, etc.)
● Defining outcome of model
● Defining the input (features) and
predicted output format
● General ML terminology
● Regression = numeric / continuous
● Classification = discrete class label
● Outputs of classification models are
probabilities for each class (sigmoid for
binary classification, softmax for N-class
classification)
● Features must be numeric so one-hot
encode categorical features
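As a rough illustration of the notes above, the sketch below shows how raw model scores become class probabilities and how a categorical feature becomes numeric. The helper names (`sigmoid`, `softmax`, `one_hot`) and the sample vocabulary are hypothetical, and this is plain Python for study purposes, not how any particular framework implements these functions.

```python
import math


def sigmoid(logit):
    """Map one raw score to a probability (binary classification)."""
    return 1.0 / (1.0 + math.exp(-logit))


def softmax(logits):
    """Map N raw scores to N probabilities that sum to 1 (N-class classification)."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 vector over a known vocabulary."""
    return [1 if value == item else 0 for item in vocabulary]


colors = ["red", "green", "blue"]  # hypothetical categorical vocabulary
print(one_hot("green", colors))    # the categorical feature as numbers
print(sigmoid(0.0))                # a score of 0 maps to probability 0.5
print(softmax([2.0, 1.0, 0.1]))    # one probability per class
```

The largest score always receives the largest softmax probability, which is why the predicted class is simply the index of the maximum.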
Section 1: ML Problem Framing
Define business success criteria.
Considerations include:
● Success metrics
● Key results
● Determination of when a model is
deemed unsuccessful
● Precision = TP / (TP + FP)
○ Positive predictive value: Of all
examples the model predicted
positive, what percentage were
actually positive?
● Recall = TP / (TP + FN)
○ True positive rate: Of all positive
examples, what percentage did
my model correctly predict as
positive?
● AUC ROC = Area under the ROC curve,
which plots TPR against FPR
○ Threshold independent
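The formulas above can be checked by hand on a tiny example. This is a minimal sketch with made-up labels and scores; the function names are hypothetical. AUC is computed here through its rank interpretation (the chance that a randomly chosen positive example scores higher than a randomly chosen negative one), which is equivalent to the area under the ROC curve.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for binary labels and binary predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def auc_roc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 1, 0, 0, 0, 0, 0]                        # made-up ground truth
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]       # made-up model scores

# Precision and recall change with the decision threshold...
for threshold in (0.5, 0.35):
    predictions = [1 if s >= threshold else 0 for s in scores]
    print(threshold, precision_recall(labels, predictions))

# ...while AUC is computed from the scores alone, with no threshold.
print(auc_roc(labels, scores))
```

Lowering the threshold from 0.5 to 0.35 raises recall on this data because the third positive example is now caught, which is the trade-off the threshold controls.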
Section 1: ML Problem Framing
Identify risks to feasibility and
implementation of ML solution.
Considerations include:
● Assessing and communicating
business impact
● Assessing ML solution readiness
● Assessing data readiness
● Aligning with Google AI principles and
practices (e.g. different biases)
● ML Readiness = data + infra
● Important section of AI Principles: “AI
algorithms and datasets can reflect,
reinforce, or reduce unfair biases.”
Section 2: ML Solution Architecture
Design reliable, scalable, highly available ML
solutions. Considerations include:
● Optimizing data use and storage
● Data connections
● Automation of data preparation and
model training/deployment
● SDLC best practices
● GCS = Unstructured Data (or
structured data in Parquet, Avro, etc)
● BigQuery = Structured Data
● Bigtable = Structured Data
○ Low latency & High throughput
● ML is an iterative process that
follows concrete steps:
○ Data ingest/analysis/exploration
○ Data validation
○ Feature engineering
○ Model training
○ Model evaluation/validation
○ Model deployment
Section 2: ML Solution Architecture
Choose appropriate Google Cloud software
components. Considerations include:
● A variety of component types - data
collection; data management
● Exploration/analysis
● Feature engineering
● Logging/management
● Automation
● Monitoring
● Serving
● Data Collection/Management:
○ BigQuery = Batch or Stream
■ 100,000 rows/second with
insert ID (1M without)
■ Latency ~1-2s
○ GCS = Unstructured (usually)
○ Pub/Sub = Stream Ingest
○ Dataflow = Batch or Stream
● Exploration/analysis:
○ BigQuery (SQL)
○ Vertex Workbench Notebooks
○ Dataprep (Visual ETL)
Section 2: ML Solution Architecture
Choose appropriate Google Cloud software
components. Considerations include:
● A variety of component types - data
collection; data management
● Exploration/analysis
● Feature engineering
● Logging/management
● Automation
● Monitoring
● Serving
● Feature Engineering:
○ TF Transform (Dataflow)
■ Most scalable
○ BigQuery Transform
■ BQML models
○ Keras Lambda Layers
■ Baked into TF graph
(somewhat easier to
implement than tft)
● Logging/management:
○ Understand what gets logged
when using different products.
Ex: BigQuery slot usage, AI
Platform job status, etc
Section 2: ML Solution Architecture
Choose appropriate Google Cloud software
components. Considerations include:
● A variety of component types - data
collection; data management
● Exploration/analysis
● Feature engineering
● Logging/management
● Automation
● Monitoring
● Serving
● Automation:
○ Scheduled BigQuery
○ Kubeflow scheduled runs
○ Vertex Pipelines
○ Composer scheduled runs
○ Cloud Functions for event
triggered runs
● Serverless Serving Infrastructures:
○ Vertex AI (good default choice
for batch/online)
○ Cloud Run (leverage containers,
deploy model as part of app)
○ BigQuery (batch preds on BQML models)
Section 2: ML Solution Architecture
Design architecture that complies with
regulatory and security concerns.
Considerations include:
● Building secure ML systems
● Privacy implications of data usage
● Identifying potential regulatory issues
● Data Loss Prevention (DLP) API for
identifying sensitive data
● Ways to handle sensitive data:
○ Throw it away (not great)
○ Masking / hashing to anonymize
○ Coarsen
■ Ex: Use first 3 digits of ZIP
code instead of full zip
■ Conceptually this is
bucketizing to make a
feature non-identifiable at
an individual level
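The masking, hashing, and coarsening options above can be sketched in a few lines of standard-library Python. The function names, the sample record, and the salt value are all hypothetical, and this sketch does not call the DLP API; it only shows what each transformation does to a value.

```python
import hashlib


def hash_identifier(value, salt):
    """Replace an identifier with a salted SHA-256 digest so the raw value is not stored."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_email(email):
    """Keep the first character and the domain; mask the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def coarsen_zip(zip_code, keep_digits=3):
    """Keep only the leading digits so many individuals share the same value."""
    return zip_code[:keep_digits]


record = {"email": "jane.doe@example.com", "zip": "98101"}  # hypothetical record

safe_record = {
    "email_hash": hash_identifier(record["email"], salt="demo-salt"),  # assumed salt
    "email_masked": mask_email(record["email"]),
    "zip3": coarsen_zip(record["zip"]),
}
print(safe_record)
```

Hashing keeps the value usable as a join key, while coarsening keeps some predictive signal (region) without pointing at one household.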
Section 3: Data Preparation and Processing
Data ingestion. Considerations include:
● Ingestion of various file types (e.g.
Csv, json, img, parquet or databases,
● Database migration
● Streaming data (e.g. from IoT devices)
● Serializing to TFRecords
● Analytics workloads -> BigQuery
● Hadoop/Spark migration -> Dataproc
● Streaming Data
○ Ingest with Pub/Sub
○ Process with Dataflow
Section 3: Data Preparation and Processing
Data exploration (EDA). Considerations include:
● Visualization
● Statistical fundamentals at scale
● Evaluation of data quality and
● Vertex Workbench Notebooks w/
BigQuery magic to sample and explore
your data in Python (Pandas, Matplotlib,
● Numeric input + numeric output =
Pearson Correlation
● Numeric input + categorical output =
● Categorical input + categorical output =
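For the numeric input and numeric output case, Pearson correlation is simple enough to compute by hand. The sketch below uses made-up data and a hypothetical `pearson` helper; in a notebook you would more likely call a library routine, but the arithmetic is the same.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(variance_x * variance_y)


ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical numeric feature
units_sold = [2.1, 3.9, 6.2, 8.1, 9.8]    # hypothetical numeric label

# A value near +1 or -1 suggests a strong linear relationship; near 0 suggests none.
print(pearson(ad_spend, units_sold))
```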
Section 3: Data Preparation and Processing
Design/build data pipelines. Considerations include:
● Batching and streaming data pipelines
at scale
● Data privacy and compliance
● Monitoring/changing deployed
● Handling missing data
● Handling outliers
● Managing large samples (TFRecords)
● Transformations (TensorFlow
● For scalable, production systems
leverage Dataflow (TF Transform) for
● Vertex Pipelines with Kubeflow
Pipelines or TFX (TensorFlow Extended)
● Options with missing data
○ Throw it away (not great)
○ Impute missing numeric features
○ Create a separate bucket for
missing categorical features
● Outliers
○ Clipping at a max/min value
○ Bucketize and have a “catch-all”
bucket. Try to ensure equal
number of samples in each
bucket
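The missing-data and outlier options on this slide can be sketched with the standard library. All names and the sample values are hypothetical; median imputation and quartile boundaries are just one reasonable choice for illustration.

```python
import statistics


def impute_median(values):
    """Replace missing numeric values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip(values, low, high):
    """Clip every value into the range [low, high]."""
    return [min(max(v, low), high) for v in values]


def quantile_boundaries(values, n_buckets):
    """Cut points chosen so each bucket holds roughly the same number of samples."""
    return statistics.quantiles(values, n=n_buckets)


def bucketize(value, boundaries):
    """Bucket index; anything above the last boundary lands in the final catch-all bucket."""
    return sum(1 for boundary in boundaries if value > boundary)


def categorical_bucket(value):
    """Give missing categorical values their own bucket instead of dropping the row."""
    return "MISSING" if value is None else value


ages = [23, 35, None, 41, 29, 250, 38, None, 31, 27]  # hypothetical; 250 is an outlier
filled = impute_median(ages)
clipped = clip(filled, low=0, high=100)
boundaries = quantile_boundaries(clipped, n_buckets=4)
print(clipped)
print([bucketize(v, boundaries) for v in clipped])
print([categorical_bucket(c) for c in ["gold", None, "silver"]])
```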
Section 3: Data Preparation and Processing
Feature engineering. Considerations include:
● Data leakage and augmentation
● Encoding structured data types
● Feature selection
● Class imbalance
● Feature crosses
● Dataset Augmentation (common with
images): tweak existing data in some
small way to create a larger training set
● Feature crosses
○ Capture feature interactions
○ Lead to sparsity
○ Frequently combined with an
embedding layer
● Feature selection
○ See Correlation/ANOVA
○ L1 Regularization
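To make the feature cross idea concrete, the sketch below crosses two hypothetical features (day of week and hour) and hashes the result into a fixed number of buckets. The names and the bucket count are assumptions for illustration; in a real model the bucket index would typically feed an embedding layer, as noted above.

```python
import hashlib


def feature_cross(*values):
    """Combine several feature values into one crossed categorical value."""
    return "_x_".join(str(v) for v in values)


def hash_bucket(crossed, num_buckets):
    """Stable bucket index (hashlib is used because built-in hash() varies between runs)."""
    digest = hashlib.sha256(crossed.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


rows = [("Mon", 8), ("Mon", 17), ("Sat", 8)]  # hypothetical (day_of_week, hour) pairs
for day, hour in rows:
    crossed = feature_cross(day, hour)
    print(crossed, hash_bucket(crossed, num_buckets=1000))

# 7 days x 24 hours gives 168 possible crossed values, most of them rare:
# this is the sparsity the slide refers to.
```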
Section 4: ML Model Development
Build a model. Considerations include:
● Choice of framework and model
● Modeling techniques given
interpretability requirements
● Transfer learning
● Model generalization
● Overfitting
● Model Architectures
○ Boosted Trees: Good for
structured data (frequently as
good as DNNs)
○ LSTMs: Time series data
○ Transformers: Popular with NLP
○ CNNs: Images
● Transfer Learning: Repurposing a model
trained on one task to do another task.
Take existing model and train it a bit
more with your data.
● Regularization = techniques to help
model generalize (L1/L2 Reg, Dropout)
● Overfitting = memorizing training data.
Low train loss, high test loss
Section 4: ML Model Development
Train/test a model. Considerations include:
● Productionizing
● Training a model as a job in different
● Tracking metrics during training
● Retraining/redeployment evaluation
● Model performance against baselines,
simpler models, and across the time
● Model explainability on Vertex AI
● Best practice is to use serverless,
distributed training products (like
Vertex AI)
● Model checkpointing & early stopping
are important
● Always have a common sense baseline
to compare your models with
● Model explainability
○ SHAP (good for Boosted Trees because
they are not differentiable)
○ Integrated Gradients (model must
be piecewise differentiable)
○ Both techniques are supported by
Vertex AI
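The checkpointing and early stopping point on this slide can be illustrated without any ML framework. This is a minimal sketch over a simulated list of validation losses; the function name and the `patience` parameter are hypothetical, although most training frameworks expose a similar setting.

```python
def train_with_early_stopping(val_losses, patience=2):
    """Scan per-epoch validation losses.

    Returns (best_epoch, best_loss, last_epoch_run). Remembering the best epoch
    stands in for saving a checkpoint of the best model so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # "checkpoint" the best model
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, best_loss, epoch  # stop early
    return best_epoch, best_loss, len(val_losses) - 1


# Simulated validation loss: improves, then starts rising (overfitting).
simulated_val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
print(train_with_early_stopping(simulated_val_losses, patience=2))  # (3, 0.5, 5)
```

Training stops at epoch 5 and the model from epoch 3 is the one to keep, which saves the cost of the remaining epoch and avoids shipping an overfit model.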
Section 4: ML Model Development
Scale model training and serving.
Considerations include:
● Distributed training
● Scaling prediction service (e.g. Vertex
AI Prediction, containerized serving)
● Know when to use GPUs / TPUs and
leverage distributed training (Ex:
Training an image classifier with 5M
images on a single machine would take
● Distribution strategies with TensorFlow
in Vertex AI
● Serverless Serving Infrastructures:
○ Vertex AI
○ Cloud Run (leverage containers,
deploy model as part of app)
○ BigQuery (batch preds on BQML models)
Section 5: ML Pipeline Automation and Orchestration
Designing and implementing training
pipeline. Considerations include:
● Identification of components,
parameters, triggers, and compute
needs (Cloud Build, Cloud Run)
● Orchestration framework
● Hybrid or multi-cloud strategies
● System design with TFX
components/Kubeflow DSL
● Vertex Pipelines
○ Kubeflow
■ Lightweight Python Components
■ Custom Components
■ Focused on e2e ML Ops, works great
with TF, but also with others
● Cloud Composer/Airflow (generic
○ Cloud Functions to trigger DAG runs
● Cloud Build
○ GitHub triggers to run Vertex
● Constructing a pipeline with Kubeflow
SDK or with TFX
● Vertex Metadata (TFX)
Section 5: ML Pipeline Automation and Orchestration
Implement serving pipeline. Considerations include:
● Serving (online, batch, caching)
● Google Cloud serving options
● Testing for target performance
● Configuring trigger and pipeline
● Different ways of serializing models. TF
SavedModel is the default for TF models.
scikit-learn models are typically saved with pickle (.pkl) or joblib
● Serving options (again):
○ Vertex AI predictions
○ App Engine (some customers do it)
○ Cloud Run
○ BigQuery (batch preds)
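As a small illustration of the .pkl serialization mentioned above, the sketch below saves and restores a stand-in "model" with Python's `pickle` module. The parameter dictionary and the `predict` helper are hypothetical; a real scikit-learn estimator would be dumped and loaded the same way.

```python
import os
import pickle
import tempfile

# Hypothetical "model": the learned parameters of a linear model as plain Python data.
model = {"weights": [0.4, -1.2, 3.0], "bias": 0.5}


def predict(params, features):
    """Linear prediction: dot(weights, features) + bias."""
    return sum(w * x for w, x in zip(params["weights"], features)) + params["bias"]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)        # serialize at the end of training
    with open(path, "rb") as f:
        restored = pickle.load(f)    # deserialize at serving time; only unpickle trusted files

print(restored == model)
print(predict(restored, [1.0, 0.0, 2.0]))
```

The serving side must run compatible library versions to load the file, which is one reason containerized serving is a common choice.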
Section 5: ML Pipeline Automation and Orchestration
Track and audit metadata. Considerations include:
● Organization and tracking
experiments and pipeline runs
● Hooking into model and dataset
● Model/dataset lineage
● ML Metadata (TFX)
● Model versioning with Vertex AI
● Vertex Pipelines for model/data lineage
Section 6: ML Solution Monitoring, Optimization, and Maintenance
Monitor ML solutions. Considerations include:
● Performance and business quality of
ML model predictions
● Logging strategies
● Establishing continuous evaluation
● Understand GCP permissions model
● Common training and serving errors
● ML model failure and biases
● See what is captured in Vertex AI logs
(job failures, performance metrics, etc.)
● Continuous evaluation metrics: bake
evaluation into the orchestration itself (inside
Vertex Pipelines or a Composer DAG)
● Common training/serving errors
○ Train-serve skew
○ Almost always an issue in the data
● Biases
○ Look at subgroups
○ What-if tool
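Since train-serve skew is almost always a data issue, one simple check is to compare a feature's statistics at training time with what arrives at serving time. The sketch below is a hypothetical illustration of that idea (function name, threshold, and data are all assumptions); it is not how any Google Cloud monitoring product computes skew.

```python
import statistics


def skew_report(train_values, serving_values, max_shift_in_std=0.5):
    """Flag a feature whose serving mean sits far from its training mean.

    The shift is measured in units of the training standard deviation.
    """
    train_mean = statistics.mean(train_values)
    train_std = statistics.pstdev(train_values)
    serving_mean = statistics.mean(serving_values)
    difference = abs(serving_mean - train_mean)
    if train_std > 0:
        shift = difference / train_std
    else:
        shift = 0.0 if difference == 0 else float("inf")
    return {
        "train_mean": train_mean,
        "serving_mean": serving_mean,
        "shift_in_std": shift,
        "skewed": shift > max_shift_in_std,
    }


# Hypothetical bug: training used metres, but the serving client sends centimetres.
train_height_m = [1.60, 1.70, 1.80, 1.75, 1.65]
serving_height_cm = [160.0, 170.0, 180.0]
print(skew_report(train_height_m, serving_height_cm))

# Serving data that matches the training distribution is not flagged.
print(skew_report(train_height_m, [1.62, 1.71, 1.79]))
```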
Section 6: ML Solution Monitoring, Optimization, and Maintenance
Tune performance of ML solutions for
training & serving in production.
Considerations include:
● Optimization and simplification of
input pipeline for training
● Simplification technique
● Avoid useless intermediary steps
● Start small: Sample your data,
experiment, get a baseline before using
your whole dataset
● Simulate how the model's performance
would degrade over time to influence
retraining policy
○ Retraining policy is a balance of
cost-to-retrain and business
value gained from retraining
Key facts
● Taken online or in person
● Exam length: 2 hours
● 50 multiple-choice or multiple-select questions.
● Register at
Tips and tricks
● Apply your experience.
● Read the questions
● Mark questions and
review them later.
You need to build an object detection model for a small startup company to identify if and where
the company’s logo appears in an image. You were given a large repository of images, some with
logos and some without. These images are not yet labelled. You need to label these pictures, and
then train and deploy the model. What should you do?
A. Use Google Cloud Data Labeling Service to label your data. Use AutoML Object Detection
to train and deploy the model.
B. Use Vision API to detect and identify logos in pictures and use it as a label. Use AI Platform
to build and train a convolutional neural network.
C. Create two folders: one where the logo appears and one where it doesn’t. Manually place
images in each folder. Use AI Platform to build and train a convolutional neural network.
D. Create two folders: one where the logo appears and one where it doesn’t. Manually place
images in each folder. Use AI Platform to build and train a real time object detection model.
You work for a textile manufacturer and have been asked to build a model to detect and classify
fabric defects. You trained a machine learning model with high recall based on high resolution
images taken at the end of the production line. You want quality control inspectors to gain trust
in your model. Which technique should you use to understand the rationale of your classifier?
A. Use K-fold cross validation to understand how the model performs on different test
B. Use the Integrated Gradients method to efficiently compute feature attributions for each
predicted image.
C. Use PCA (Principal Component Analysis) to reduce the original feature set to a smaller set
of easily understood features.
D. Use k-means clustering to group similar images together, and calculate the Davies-Bouldin
index to evaluate the separation between clusters
You need to write a generic test to verify whether Dense Neural Network (DNN)
models automatically released by your team have a sufficient number of
parameters to learn the task for which they were built. What should you do?
A. Train the model for a few iterations, and check for NaN values.
B. Train the model for a few iterations, and verify that the loss is constant.
C. Train a simple linear model, and determine if the DNN model outperforms
D. Train the model with no regularization, and verify that the loss function is
close to zero.
Your team is using a TensorFlow Inception-v3 CNN model pretrained on ImageNet for
an image classification prediction challenge on 10,000 images. You will use AI Platform
to perform the model training. What TensorFlow distribution strategy and AI Platform
training job configuration should you use to train the model and optimize for wall-clock
A. Default Strategy; Custom tier with a single master node and four v100 GPUs.
B. One Device Strategy; Custom tier with a single master node and four v100 GPUs.
C. One Device Strategy; Custom tier with a single master node and eight v100 GPUs.
D. MirroredStrategy; Custom tier with a single master node and four v100 GPUs.
You work for a maintenance company and have built and trained a deep learning model
that identifies defects based on thermal images of underground electric cables. Your
dataset contains 10,000 images, 100 of which contain visible defects. How should you
evaluate the performance of the model on a test dataset?
A. Calculate the Area Under the Curve (AUC) value.
B. Calculate the number of true positive results predicted by the model.
C. Calculate the fraction of images predicted by the model to have a visible defect.
D. Calculate the Cosine Similarity to compare the model’s performance on the test
dataset to the model’s performance on the training dataset.
You work for a large financial institution that is planning to use Dialogflow to create a chatbot for the
company’s mobile app. You have reviewed old chat logs and tagged each conversation for intent
based on each customer’s stated intention for contacting customer service. About 70% of customer
inquiries are simple requests that are solved within 10 intents. The remaining 30% of inquiries require
much longer and more complicated requests. Which intents should you automate first?
A. Automate a blend of the shortest and longest intents to be representative of all intents.
B. Automate the more complicated requests first because those require more of the agents’ time.
C. Automate the 10 intents that cover 70% of the requests so that live agents can handle the more
complicated requests.
D. Automate intents in places where common words such as “payment” only appear once to avoid
confusing the software.
You work for a gaming company that develops and manages a popular massively multiplayer online
(MMO) game. The game’s environment is open-ended, and a large number of positions and moves
can be taken by a player. Your team has developed an ML model with TensorFlow that predicts the
next move of each player. Edge deployment is not possible, but low-latency serving is required. How
should you configure the deployment?
A. Use a Cloud TPU to optimize model training speed.
B. Use AI Platform Prediction with a NVIDIA GPU to make real-time predictions.
C. Use AI Platform Prediction with a high-CPU machine type to get a batch prediction for the
D. Use AI Platform Prediction with a high-memory machine type to get a batch prediction for the
You work for a large retailer. You want to use ML to forecast future sales leveraging 10 years of
historical sales data. The historical data is stored in Cloud Storage in Avro format. You want to rapidly
experiment with all the available data. How should you build and train your model for the sales
A. Load data into BigQuery and use the ARIMA model type on BigQuery ML.
B. Convert the data into CSV format and create a regression model on AutoML Tables.
C. Convert the data into TFRecords and create an RNN model on TensorFlow on AI Platform
D. Convert and refactor the data into CSV format and use the built-in XGBoost algorithm on AI
Platform Training.
You are an ML engineer at a media company. You want to use machine learning to
analyze video content, identify objects, and alert users if there is inappropriate
content. Which Google Cloud products should you use to build this project?
A. Pub/Sub, Cloud Function, Cloud Vision API
B. Pub/Sub, Cloud IoT, Dataflow, Cloud Vision API, Cloud Logging
C. Pub/Sub, Cloud Function, Video Intelligence API, Cloud Logging
D. Pub/Sub, Cloud Function, AutoML Video Intelligence, Cloud Logging
You work on a team where the process for deploying a model into production starts with data
scientists training different versions of models in a Kubeflow pipeline. The workflow then stores the
new model artifact into the corresponding Cloud Storage bucket. You need to build the next steps of
the pipeline after the submitted model is ready to be tested and deployed in production on AI
Platform. How should you configure the architecture before deploying the model to production?
A. Deploy model in test environment -> Evaluate and test model -> Create a new AI Platform
model version
B. Validate model -> Deploy model in test environment -> Create a new AI Platform model version
C. Create a new AI Platform model version -> Evaluate and test model -> Deploy model in test
D. Create a new AI Platform model version -> Deploy model in test environment -> Validate model
Next steps
This week (Session 6):
● Take the PMLE Sample Questions
○ ML Ops - Getting Started
○ ML Pipelines on Google Cloud
○ Build and Deploy ML Solutions on Vertex AI
○ Perform Foundational Data, ML, and AI Tasks in Google Cloud
● Share your results in the Slack group if you want.
Next week:
● You’re ready to take the exam!
○ Register at: webassessor.com/googlecloud
