A Tour of TensorFlow’s APIs
Dean R. Wyatte
Google Developer Group Boulder
August 21, 2018
TensorFlow today
About me
• Data Scientist/Machine Learning Engineer (prototype to production)
• TensorFlow user for ~2 years
• Recently transitioned to TensorFlow higher-level APIs
TensorFlow low-level API
• In the beginning, there was the computational graph
• Lazy evaluation + automatic differentiation + Google backing = ML Win
• The TensorFlow session is required for interacting with the graph
• Resource allocation, inter-device communication, etc.
• The entire system was flexible and powerful, but verbose and
complex, likely intimidating would-be users
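As a framework-free illustration of the ideas in this list, the sketch below builds a tiny deferred-evaluation graph in plain Python. The names `Node`, `placeholder`, `run`, and `grad` are hypothetical and are not TensorFlow APIs; they only mimic how building a graph is separate from running it with fed values. The derivative is a simple recursive one, not the reverse-mode algorithm a real framework would use.

```python
class Node:
    """One operation in a deferred-evaluation graph (hypothetical, not a TensorFlow class)."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # 'placeholder', 'add', or 'mul'
        self.inputs = inputs  # parent nodes
        self.name = name

    def __add__(self, other):
        return Node('add', (self, other))

    def __mul__(self, other):
        return Node('mul', (self, other))


def placeholder(name):
    """Declare an input whose value is supplied only at run time."""
    return Node('placeholder', name=name)


def run(node, feed_dict):
    """Evaluate a node, playing the role a session plays for a real graph."""
    if node.op == 'placeholder':
        return feed_dict[node]
    a, b = (run(parent, feed_dict) for parent in node.inputs)
    return a + b if node.op == 'add' else a * b


def grad(node, wrt, feed_dict):
    """Derivative of node with respect to wrt via the sum and product rules."""
    if node is wrt:
        return 1.0
    if node.op == 'placeholder':
        return 0.0
    a, b = node.inputs
    da, db = grad(a, wrt, feed_dict), grad(b, wrt, feed_dict)
    if node.op == 'add':
        return da + db
    return da * run(b, feed_dict) + run(a, feed_dict) * db


x = placeholder('x')
w = placeholder('w')
y = x * w + w          # nothing is computed here; this only builds the graph

feeds = {x: 3.0, w: 2.0}
value = run(y, feeds)        # 3*2 + 2
dy_dw = grad(y, w, feeds)    # x + 1
```

Values exist only once `run` is called with a feed dictionary, which is the same split between graph construction and execution that the session enforces.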
TensorFlow warm-up:
Anatomy of a TensorFlow script
import tensorflow as tf
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
labels = tf.placeholder(tf.float32, [None, 1000])
W_conv1 = tf.Variable(tf.random_normal([3, 3, 3, 64]))
b_conv1 = tf.Variable(tf.zeros([64]))
W_conv2 = tf.Variable(tf.random_normal([3, 3, 64, 128]))
b_conv2 = tf.Variable(tf.zeros([128]))
W_fc = tf.Variable(tf.random_normal([56*56*128, 4096]))
b_fc = tf.Variable(tf.zeros([4096]))
W_probs = tf.Variable(tf.random_normal([4096, 1000]))
b_probs = tf.Variable(tf.zeros([1000]))
conv1 = tf.nn.relu(tf.nn.conv2d(images, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
fc = tf.nn.relu(tf.matmul(tf.reshape(pool2, [-1, 56*56*128]), W_fc) + b_fc)
probs = tf.nn.softmax(tf.matmul(fc, W_probs) + b_probs)
loss = tf.reduce_mean(-tf.reduce_sum(labels*tf.log(probs), axis=1))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
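The `56*56*128` in `W_fc` follows from the layer shapes. A small plain-Python check, assuming the usual rule that a 'SAME'-padded op with stride s maps a spatial size n to ceil(n / s); `same_pool_output` is a hypothetical helper, not a TensorFlow function:

```python
import math


def same_pool_output(size, stride):
    """Spatial size after a 'SAME'-padded op with the given stride: ceil(size / stride)."""
    return math.ceil(size / stride)


size = 224
# 'SAME' convolutions with stride 1 keep the spatial size, so only the pools shrink it.
after_pool1 = same_pool_output(size, 2)
after_pool2 = same_pool_output(after_pool1, 2)

flat_features = after_pool2 * after_pool2 * 128  # inputs to the fully connected layer
fc_weights = flat_features * 4096                # number of entries in W_fc
```

With these shapes `W_fc` alone holds about 1.6 billion values, so the script is best read as an anatomy lesson rather than a model sized for a typical GPU.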
Resulting graph
TensorFlow warm-up:
Anatomy of a TensorFlow script
def yield_batches(data_dir):
    # load/preprocess data
    yield images, labels

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # variables must be initialized before use
    for epoch in range(100):
        train_loss = test_loss = 0.0
        n_train = n_test = 0
        for batch_images, batch_labels in yield_batches(train_data_dir):
            l, _ = sess.run([loss, optimizer], feed_dict={images: batch_images, labels: batch_labels})
            n_train += len(batch_images)
            train_loss += l * len(batch_images)  # loss is a per-batch mean, so weight by batch size
        for batch_images, batch_labels in yield_batches(test_data_dir):
            l = sess.run(loss, feed_dict={images: batch_images, labels: batch_labels})
            n_test += len(batch_images)
            test_loss += l * len(batch_images)
        print('Train loss: {} Test loss: {}'.format(train_loss/n_train, test_loss/n_test))
Data generator
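Because `loss` is a `tf.reduce_mean` over the batch, an epoch-level average needs each batch mean weighted by its batch size. A minimal plain-Python sketch with made-up numbers; `epoch_mean_loss` is a hypothetical helper:

```python
def epoch_mean_loss(batch_means, batch_sizes):
    """Average per-example loss over an epoch from per-batch mean losses."""
    total = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical numbers: two full batches of 32 and a final partial batch of 8.
batch_means = [1.0, 2.0, 4.0]
batch_sizes = [32, 32, 8]
weighted = epoch_mean_loss(batch_means, batch_sizes)   # (32 + 64 + 32) / 72
unweighted = sum(batch_means) / len(batch_means)       # treats the small batch like a full one
```

The two numbers differ whenever batches are unequal in size, which is common for the last batch of an epoch.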
On Keras
• High-level neural network API “for humans”
• Originally built on top of Theano, also
supports TensorFlow and CNTK backends
• Can now import directly from TensorFlow
[Figure: Keras backends: Theano, TensorFlow, CNTK]
TensorFlow Keras
import tensorflow as tf
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
labels = tf.placeholder(tf.float32, [None, 1000])
W_conv1 = tf.Variable(tf.random_normal([3, 3, 3, 64]))
b_conv1 = tf.Variable(tf.zeros([64]))
W_conv2 = tf.Variable(tf.random_normal([3, 3, 64, 128]))
b_conv2 = tf.Variable(tf.zeros([128]))
W_fc = tf.Variable(tf.random_normal([56*56*128, 4096]))
b_fc = tf.Variable(tf.zeros([4096]))
W_probs = tf.Variable(tf.random_normal([4096, 1000]))
b_probs = tf.Variable(tf.zeros([1000]))
conv1 = tf.nn.relu(tf.nn.conv2d(images, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
fc = tf.nn.relu(tf.matmul(tf.reshape(pool2, [-1, 56*56*128]), W_fc) + b_fc)
probs = tf.nn.softmax(tf.matmul(fc, W_probs) + b_probs)
loss = tf.reduce_mean(-tf.reduce_sum(labels*tf.log(probs), axis=1))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.models import Model
images = Input(shape=[224, 224, 3])
conv1 = Conv2D(64, [3, 3], padding='same', activation='relu')(images)
pool1 = MaxPooling2D([2, 2], padding='same')(conv1)
conv2 = Conv2D(128, [3, 3], padding='same', activation='relu')(pool1)
pool2 = MaxPooling2D([2, 2], padding='same')(conv2)
fc = Dense(4096, activation='relu')(Flatten()(pool2))
probs = Dense(1000, activation='softmax')(fc)
optimizer = SGD(lr=0.01)
model = Model(images, probs)
model.compile(optimizer, loss='categorical_crossentropy')
model.fit(train_images, train_labels,
          validation_data=(test_images, test_labels))
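The hand-written loss in the low-level script takes the log of a softmax output, which can reach log(0) when one logit dominates. One common remedy is the log-sum-exp trick, sketched here in plain Python. `softmax_cross_entropy` is a hypothetical helper, and this is not a claim about how any particular library implements its loss.

```python
import math


def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy of softmax(logits) against a one-hot label, computed stably."""
    shift = max(logits)  # subtracting the max keeps every exp argument <= 0
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot, log_probs))


loss_small = softmax_cross_entropy([2.0, 1.0, 0.1], [1, 0, 0])
# With a logit of 1000, a naive exp(1000) would overflow; the shifted form stays finite.
loss_large = softmax_cross_entropy([1000.0, 0.0, 0.0], [0, 1, 0])
```

Working from logits rather than probabilities is the design choice that makes this possible: the log and the softmax are fused, so no tiny probability is ever materialized.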
TensorFlow higher-level APIs
• Try to balance simplicity and flexibility
• Provide low-level benefits while still
enabling the user to focus on the model
• This talk primarily covers Datasets and Estimators
• Rapid development over the last ~1 year
• Mostly backward compatible across
releases, but conventions are subject to change
• Standard data loading and preprocessing blocks model training
• High, consistent GPU utilization can vastly speed up training
def yield_batches(data_dir):
    # load/preprocess data
    yield images, labels

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        train_loss = test_loss = 0.0
        n_train = n_test = 0
        # the model sits idle while each batch is loaded and preprocessed
        for batch_images, batch_labels in yield_batches(data_dir):
            l, _ = sess.run([loss, optimizer],
                            feed_dict={images: batch_images,
                                       labels: batch_labels})
• TensorFlow Datasets provide a simple API for designing input pipelines
decoupled from the model without having to write threaded data-loading code
• Part of core API since TensorFlow 1.4 (November 2017)
• Functionality previously provided by QueueRunner, now deprecated
(incompatible with eager execution)
Dataset example
import glob
import os

import tensorflow as tf

def load_image(filename):
    # read and decode one PNG file into a float32 tensor
    buffer = tf.read_file(filename)
    image = tf.image.decode_png(buffer, channels=3)
    image = tf.cast(image, tf.float32)
    return image

def input_fn(data_dir, batch_size):
    # each class lives in its own subdirectory; the directory name is the class name
    filenames = glob.glob(os.path.join(data_dir, '**', '*.png'), recursive=True)
    classnames = [os.path.basename(os.path.dirname(filename)) for filename in filenames]
    lookup = {v: k for k, v in enumerate(sorted(set(classnames)))}
    labels = [lookup[classname] for classname in classnames]
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    dataset = dataset.shuffle(buffer_size=1000000)
    dataset = dataset.map(lambda filename, label: (load_image(filename), label))
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(1)
    iterator = dataset.make_one_shot_iterator()
    images, labels = iterator.get_next()
    return images, labels
Ensure at least one batch enqueued
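To make the shuffle, map, and batch stages concrete, here is a plain-Python sketch built from generators. It shows the general idea of a shuffle buffer and of batching, not TensorFlow's implementation; `shuffle`, `batch`, and the fake examples are hypothetical.

```python
import random


def shuffle(items, buffer_size, seed=0):
    """Buffered shuffle: hold up to buffer_size items and emit a random one as each new item arrives."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) > buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    while buffer:  # drain what is left once the input is exhausted
        yield buffer.pop(rng.randrange(len(buffer)))


def batch(items, batch_size):
    """Group consecutive items into lists of batch_size; the last batch may be smaller."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


# Hypothetical stand-ins for (filename, label) pairs and for load_image.
examples = [('img{}.png'.format(i), i % 3) for i in range(10)]
decoded = ((name.upper(), label) for name, label in shuffle(examples, buffer_size=4))
batches = list(batch(decoded, batch_size=4))
```

A buffer smaller than the dataset only shuffles locally, which is one reason to choose a buffer size at least as large as the number of examples when memory allows.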
Dataset best practices
• In practice, Datasets can increase
training speed by 200-300%
• prefetch overlaps the work of
a producer and consumer
• map accepts a num_parallel_calls
argument for the producer
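The producer/consumer overlap behind prefetching can be sketched with a bounded queue and a background thread. This is a conceptual plain-Python illustration; `prefetch` and `slow_batches` are hypothetical names, and error handling in the producer is left out for brevity.

```python
import queue
import threading


def prefetch(items, buffer_size=1):
    """Yield from items while a background thread keeps up to buffer_size results ready."""
    done = object()                      # sentinel marking the end of the stream
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in items:
            buffer.put(item)             # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()              # blocks only if the producer has fallen behind
        if item is done:
            return
        yield item


def slow_batches(n):
    """Hypothetical producer standing in for loading and preprocessing a batch."""
    for i in range(n):
        yield [i, i + 1]


consumed = [sum(b) for b in prefetch(slow_batches(4), buffer_size=1)]
```

While the consumer works on one batch, the producer is already preparing the next, so neither side waits unless the other is slower.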
• Directly managing the graph and session
can lead to boilerplate, inefficient code
• See Keras, Datasets
• TensorFlow Estimators expose an
interface that hides these details
while remaining flexible and efficient
• Other benefits: Distributed training,
pre-made estimators
TensorFlow Estimators: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks
• Part of core API since TensorFlow 1.1 (April 2017), but still maturing
as TensorFlow evolves
• Resemble scikit-learn estimators (uniform methods, separation of
initialization and learning)
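The scikit-learn resemblance can be illustrated without TensorFlow: an object is configured once with a model function, then driven through uniform `train` and `evaluate` methods that take an input function. Everything below is a hypothetical toy (`TinyEstimator`, a one-parameter model, and a `model_fn` signature that differs from the real one), meant only to show the shape of the interface.

```python
class TinyEstimator:
    """Hypothetical stand-in for an Estimator: configured once, then trained or evaluated."""

    def __init__(self, model_fn, learning_rate=0.05):
        self.model_fn = model_fn
        self.learning_rate = learning_rate
        self.params = {'w': 0.0}          # state persists across calls, like a checkpoint

    def train(self, input_fn):
        for features, labels in input_fn():
            _, grad = self.model_fn(features, labels, self.params)
            self.params['w'] -= self.learning_rate * grad
        return self

    def evaluate(self, input_fn):
        losses = [self.model_fn(f, l, self.params)[0] for f, l in input_fn()]
        return sum(losses) / len(losses)


def model_fn(features, labels, params):
    """Mean squared error of y = w * x over a batch, and its gradient with respect to w."""
    errors = [params['w'] * x - y for x, y in zip(features, labels)]
    loss = sum(e * e for e in errors) / len(errors)
    grad = sum(2 * e * x for e, x in zip(errors, features)) / len(errors)
    return loss, grad


def input_fn():
    """Yield (features, labels) batches for the hypothetical target y = 2x."""
    yield [1.0, 2.0], [2.0, 4.0]
    yield [3.0, 4.0], [6.0, 8.0]


estimator = TinyEstimator(model_fn)
for epoch in range(50):
    estimator.train(input_fn)
final_loss = estimator.evaluate(input_fn)
```

The caller never touches the update loop or the stored parameters, which is the separation of initialization and learning the bullet refers to.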
Estimator example
def model_fn(features, labels, mode):
    W_conv1 = tf.Variable(tf.random_normal([3, 3, 3, 64]))
    b_conv1 = tf.Variable(tf.zeros([64]))
    W_conv2 = tf.Variable(tf.random_normal([3, 3, 64, 128]))
    b_conv2 = tf.Variable(tf.zeros([128]))
    W_fc = tf.Variable(tf.random_normal([56*56*128, 4096]))
    b_fc = tf.Variable(tf.zeros([4096]))
    W_probs = tf.Variable(tf.random_normal([4096, 1000]))
    b_probs = tf.Variable(tf.zeros([1000]))
    conv1 = tf.nn.relu(tf.nn.conv2d(features, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
    pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    fc = tf.nn.relu(tf.matmul(tf.reshape(pool2, [-1, 56*56*128]), W_fc) + b_fc)
    probs = tf.nn.softmax(tf.matmul(fc, W_probs) + b_probs)
    loss = tf.reduce_mean(-tf.reduce_sum(labels*tf.log(probs), axis=1))
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn)
for epoch in range(100):
Note that we no longer define placeholders.
Features (inputs) and labels are tensors
provided by caller
Estimators return an EstimatorSpec
with ops required for training,
evaluation, etc.
Manages session
Estimators and Datasets: Better together
• estimator.train|evaluate|predict are unary methods
accepting a function that returns two tensors, inputs and target
• Or a Dataset
def input_fn(data_dir, ...):
    dataset = tf.data.Dataset(...)
    dataset = dataset.map(...).batch(batch_size).prefetch(1)
    return dataset

def model_fn(features, labels, mode):
    model = ...
    loss = ...
    train_op = ...
    return EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

estimator = Estimator(model_fn=model_fn)
for epoch in range(100):
    estimator.train(lambda: input_fn(train_data_dir, ...))
    estimator.evaluate(lambda: input_fn(test_data_dir, ...))
Resulting graph
Distributed training with Estimators and Datasets
• Early stage feature to abstract away details of distributed training
• https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/distribute
• DistributionStrategy specifies the nature of distribution
• Currently supports model-replication with synchronous updates (MirroredStrategy)
distribution = tf.contrib.distribute.MirroredStrategy(['/device:GPU:0', '/device:GPU:1'])
config = tf.estimator.RunConfig(train_distribute=distribution)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp0ar31ejr', '_tf_random_seed': None,
'_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600,
'_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000,
'_log_step_count_steps': 100, '_train_distribute':
<tensorflow.contrib.distribute.python.mirrored_strategy.MirroredStrategy object at 0x7f161d4b7b00>,
'_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec
object at 0x7f161d4b7c18>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master':
'', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Device is available but not used by distribute strategy: /device:CPU:0
INFO:tensorflow:Configured nccl all-reduce.
INFO:tensorflow:batch_all_reduce invoked for batches size = 14 with algorithm = nccl, num_packs = 1,
agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
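Model replication with synchronous updates can be pictured as follows: each replica holds a copy of the weights, computes a gradient on its own shard of the batch, and an all-reduce gives every replica the same averaged gradient before the update. The plain-Python sketch below is conceptual only and is not the MirroredStrategy implementation; all function names and the toy data are hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for y = w * x on one replica's shard of the batch."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for an all-reduce: every replica ends up with the mean of all values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def mirrored_step(replica_weights, shards, learning_rate):
    """One synchronous step: local gradients, all-reduce, identical update on every replica."""
    grads = [local_gradient(w, shard) for w, shard in zip(replica_weights, shards)]
    reduced = all_reduce_mean(grads)
    return [w - learning_rate * g for w, g in zip(replica_weights, reduced)]


# Two hypothetical replicas (think GPU:0 and GPU:1), each with half of a batch drawn from y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
weights = [0.0, 0.0]
for _ in range(100):
    weights = mirrored_step(weights, shards, learning_rate=0.05)
```

Because every replica applies the same reduced gradient, the copies never drift apart, which is what distinguishes synchronous replication from asynchronous schemes.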
TensorFlow Serving
• Set of APIs for defining how a model should handle
requests without requiring changes to server
• tensorflow-model-server is a lightweight C++ wrapper
over TensorFlow session that handles API requests via
inputs_info = tf.saved_model.utils.build_tensor_info(inputs_tensor)
inputs_def = {tf.saved_model.signature_constants.CLASSIFY_INPUTS: inputs_info}
outputs_info = tf.saved_model.utils.build_tensor_info(outputs_tensor)
outputs_def = {tf.saved_model.signature_constants.CLASSIFY_OUTPUT_SCORES: outputs_info}
signature = (
* For Estimators: https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators
• TensorFlow high-level APIs allow users to focus on the model without
sacrificing too much flexibility
• Datasets enable users to design performant input pipelines
• Estimators hide the details of the session and reduce boilerplate
The future
• High-level APIs are usable now, but still maturing as TensorFlow evolves
• Compatibility with Keras models via existing methods or conversion to Estimators
• Distributed training with minimal code changes
• Compatibility with eager execution
Thank You

