A Tour of TensorFlow’s APIs
Dean R. Wyatte
Google Developer Group Boulder
@drwyatte
August 21, 2018
TensorFlow today
(Figure: the TensorFlow API stack, including eager execution; see https://www.tensorflow.org/guide/premade_estimators)
About me
• Data Scientist/Machine Learning Engineer (prototype to production)
• TensorFlow user for ~2 years
• Recently transitioned to TensorFlow higher-level APIs
TensorFlow low-level API
• In the beginning, there was the computational graph
• Lazy evaluation + automatic differentiation + Google backing = ML Win
• The TensorFlow session is required for interacting with the graph
• Resource allocation, inter-device communication, etc.
• The entire system was flexible and powerful, but also verbose and complex, likely intimidating to would-be users
TensorFlow warm-up: Anatomy of a TensorFlow script
import tensorflow as tf
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
labels = tf.placeholder(tf.float32, [None, 1000])
W_conv1 = tf.Variable(tf.random_normal([3, 3, 3, 64]))
b_conv1 = tf.Variable(tf.zeros([64]))
W_conv2 = tf.Variable(tf.random_normal([3, 3, 64, 128]))
b_conv2 = tf.Variable(tf.zeros([128]))
W_fc = tf.Variable(tf.random_normal([56*56*128, 4096]))
b_fc = tf.Variable(tf.zeros([4096]))
W_probs = tf.Variable(tf.random_normal([4096, 1000]))
b_probs = tf.Variable(tf.zeros([1000]))
conv1 = tf.nn.relu(tf.nn.conv2d(images, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
fc = tf.nn.relu(tf.matmul(tf.reshape(pool2, [-1, 56*56*128]), W_fc) + b_fc)
probs = tf.nn.softmax(tf.matmul(fc, W_probs) + b_probs)
loss = tf.reduce_mean(-tf.reduce_sum(labels*tf.log(probs), axis=1))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
(Figure: model diagram — Image → Convolution → Subsampling → Convolution → Subsampling → Fully Connected → Probabilities — and the resulting TensorFlow graph)
TensorFlow warm-up: Anatomy of a TensorFlow script
def yield_batches(data_dir):
    # load/preprocess data (see the sketch after this slide)
    yield images, labels

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        train_loss, test_loss, n_train, n_test = 0, 0, 0, 0
        for batch_images, batch_labels in yield_batches(train_data_dir):
            l, _ = sess.run([loss, optimizer], feed_dict={images: batch_images, labels: batch_labels})
            n_train += len(batch_images)
            train_loss += l
        for batch_images, batch_labels in yield_batches(test_data_dir):
            l = sess.run(loss, feed_dict={images: batch_images, labels: batch_labels})
            n_test += len(batch_images)
            test_loss += l
        print('Train loss: {} Test loss: {}'.format(train_loss/n_train, test_loss/n_test))
Callouts: data generator (yield_batches), training loop (session block)
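For reference, a minimal sketch of what the stubbed-out yield_batches generator might look like, assuming PNG images stored in one subdirectory per class (the same layout the later Dataset example uses) and Pillow/NumPy for decoding; this is an illustration, not the speaker's actual code:

import glob
import os

import numpy as np
from PIL import Image

def yield_batches(data_dir, batch_size=32, num_classes=1000):
    filenames = sorted(glob.glob(os.path.join(data_dir, '**', '*.png'), recursive=True))
    classnames = [os.path.basename(os.path.dirname(f)) for f in filenames]
    lookup = {name: i for i, name in enumerate(sorted(set(classnames)))}
    for start in range(0, len(filenames), batch_size):
        batch_files = filenames[start:start + batch_size]
        # Decode and resize each image on the CPU, entirely outside the graph
        images = np.stack([
            np.asarray(Image.open(f).convert('RGB').resize((224, 224)), dtype=np.float32)
            for f in batch_files])
        # One-hot encode the directory-name labels
        labels = np.zeros([len(batch_files), num_classes], dtype=np.float32)
        for i, f in enumerate(batch_files):
            labels[i, lookup[os.path.basename(os.path.dirname(f))]] = 1.0
        yield images, labels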
On Keras
• High-level neural network API “for humans”
• Originally built on top of Theano; also supports TensorFlow and CNTK backends
• Can now be imported directly from TensorFlow as tf.keras
(Diagram: Keras running on Theano, TensorFlow, or CNTK backends)
TensorFlow Keras
import tensorflow as tf
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
labels = tf.placeholder(tf.float32, [None, 1000])
W_conv1 = tf.Variable(tf.random_normal([3, 3, 3, 64]))
b_conv1 = tf.Variable(tf.zeros([64]))
W_conv2 = tf.Variable(tf.random_normal([3, 3, 64, 128]))
b_conv2 = tf.Variable(tf.zeros([128]))
W_fc = tf.Variable(tf.random_normal([56*56*128, 4096]))
b_fc = tf.Variable(tf.zeros([4096]))
W_probs = tf.Variable(tf.random_normal([4096, 1000]))
b_probs = tf.Variable(tf.zeros([1000]))
conv1 = tf.nn.relu(tf.nn.conv2d(images, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
fc = tf.nn.relu(tf.matmul(tf.reshape(pool2, [-1, 56*56*128]), W_fc) + b_fc)
probs = tf.nn.softmax(tf.matmul(fc, W_probs) + b_probs)
loss = tf.reduce_mean(-tf.reduce_sum(labels*tf.log(probs), axis=1))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
...
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.models import Model
images = Input(shape=[224, 224, 3])
conv1 = Conv2D(64, [3, 3], padding='same', activation='relu')(images)
pool1 = MaxPooling2D([2, 2], padding='same')(conv1)
conv2 = Conv2D(128, [3, 3], padding='same', activation='relu')(pool1)
pool2 = MaxPooling2D([2, 2], padding='same')(conv2)
fc = Dense(4096, activation='relu')(Flatten()(pool2))
probs = Dense(1000, activation='softmax')(fc)
optimizer = SGD(lr=0.01)
model = Model(images, probs)
model.compile(optimizer, loss='categorical_crossentropy')
model.fit(train_images, train_labels,
          validation_data=[test_images, test_labels])
TensorFlow higher-level APIs
• Try to satisfy the simplicity/flexibility tradeoff
• Provide low-level benefits while still enabling the user to focus on the model
• This talk primarily covers Datasets and Estimators
• Rapid development over the last ~1 year
• Mostly backward compatible across releases, but conventions are subject to change
Datasets
• Standard (feed_dict-style) data loading and preprocessing blocks the model while each batch is prepared
• High, consistent GPU utilization can vastly speed up training
def yield_batches(data_dir):
    # load/preprocess data
    yield images, labels

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        train_loss, test_loss, n_train, n_test = 0, 0, 0, 0
        for batch_images, batch_labels in yield_batches(data_dir):
            l, _ = sess.run([loss, optimizer],
                            feed_dict={images: batch_images,
                                       labels: batch_labels})
Datasets
• TensorFlow Datasets provide a simple API for designing input pipelines decoupled from the model, without having to write a threaded data generator
• Part of the core API since TensorFlow 1.4 (November 2017)
• Functionality previously provided by QueueRunner, now deprecated (incompatible with eager execution)
https://www.tensorflow.org/guide/datasets
Dataset example
import glob
import os

import tensorflow as tf

def load_image(filename):
    buffer = tf.read_file(filename)
    image = tf.image.decode_png(buffer, channels=3)
    image = tf.cast(image, tf.float32)
    return image

def input_fn(data_dir, batch_size):
    filenames = glob.glob(os.path.join(data_dir, '**', '*.png'), recursive=True)
    classnames = [os.path.basename(os.path.dirname(filename)) for filename in filenames]
    lookup = {v: k for k, v in enumerate(sorted(set(classnames)))}
    labels = [lookup[classname] for classname in classnames]
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    dataset = dataset.shuffle(buffer_size=1000000)
    dataset = dataset.map(lambda filename, label: (load_image(filename), label))
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(1)
    iterator = dataset.make_one_shot_iterator()
    images, labels = iterator.get_next()
    return images, labels
prefetch(1) ensures at least one batch is always enqueued
Dataset best practices
• In practice, Datasets can increase training speed by 200-300%
• prefetch overlaps the work of a producer and consumer
• map accepts a num_parallel_calls argument for producer parallelism (see the sketch below)
https://www.tensorflow.org/performance/datasets_performance
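A hedged sketch of how the earlier input_fn could apply these two recommendations; the helper name make_dataset and the value num_parallel_calls=4 are illustrative assumptions, not from the talk:

import tensorflow as tf

def make_dataset(filenames, labels, batch_size, parse_fn):
    # parse_fn maps (filename, label) -> (image, label), e.g. load_image above
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
    dataset = dataset.shuffle(buffer_size=len(filenames))
    # Parallelize the producer side (decoding/preprocessing)
    dataset = dataset.map(parse_fn, num_parallel_calls=4)
    dataset = dataset.batch(batch_size)
    # Keep one batch ready so the consumer (typically the GPU) never waits on input
    return dataset.prefetch(1)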
Estimators
• Directly managing the graph and session can lead to boilerplate, inefficient code
• See Keras, Datasets
• TensorFlow Estimators expose an interface that hides these details while remaining flexible and efficient
• Other benefits: distributed training, pre-made estimators (see the sketch after the reference below)
TensorFlow Estimators: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks
https://arxiv.org/abs/1708.02637
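As an aside, a hedged sketch of a pre-made estimator; the feature shape, layer sizes, and random toy data are illustrative assumptions, not from the talk:

import numpy as np
import tensorflow as tf

# Pre-made estimator: a fully-connected classifier over flat numeric features
feature_columns = [tf.feature_column.numeric_column('x', shape=[64])]
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                        hidden_units=[128, 64],
                                        n_classes=10)

# Toy input_fn built from in-memory NumPy arrays
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x': np.random.rand(1000, 64).astype(np.float32)},
    y=np.random.randint(0, 10, size=1000),
    batch_size=32, num_epochs=None, shuffle=True)

classifier.train(input_fn=train_input_fn, steps=100)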
Estimators
• Part of the core API since TensorFlow 1.1 (April 2017), but still maturing as TensorFlow evolves
• Resemble scikit-learn estimators (uniform methods, separation of initialization and learning)
https://www.tensorflow.org/guide/estimators
Estimator example
def model_fn(features, labels, mode):
    W_conv1 = tf.Variable(tf.random_normal([3, 3, 3, 64]))
    b_conv1 = tf.Variable(tf.zeros([64]))
    W_conv2 = tf.Variable(tf.random_normal([3, 3, 64, 128]))
    b_conv2 = tf.Variable(tf.zeros([128]))
    W_fc = tf.Variable(tf.random_normal([56*56*128, 4096]))
    b_fc = tf.Variable(tf.zeros([4096]))
    W_probs = tf.Variable(tf.random_normal([4096, 1000]))
    b_probs = tf.Variable(tf.zeros([1000]))
    conv1 = tf.nn.relu(tf.nn.conv2d(features, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)
    pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    fc = tf.nn.relu(tf.matmul(tf.reshape(pool2, [-1, 56*56*128]), W_fc) + b_fc)
    probs = tf.nn.softmax(tf.matmul(fc, W_probs) + b_probs)
    loss = tf.reduce_mean(-tf.reduce_sum(labels*tf.log(probs), axis=1))
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn)
for epoch in range(100):
    estimator.train(...)
Note that we no longer define placeholders: features (inputs) and labels are tensors provided by the caller. Estimators return an EstimatorSpec with the ops required for training, evaluation, etc. The Estimator manages the session.
Estimators and Datasets: Better together
• estimator.train|evaluate|predict are unary methods accepting a function that returns two tensors (inputs and targets), or a Dataset
def input_fn(data_dir, ...):
    dataset = tf.data.Dataset(...)
    dataset = dataset.map(...).batch(batch_size).prefetch(1)
    return dataset

def model_fn(features, labels, mode):
    model = ...
    loss = ...
    train_op = ...
    return EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

estimator = Estimator(model_fn=model_fn)
for epoch in range(100):
    estimator.train(lambda: input_fn(train_data_dir, ...))
    estimator.evaluate(lambda: input_fn(test_data_dir, ...))
Resulting graph
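A hedged concrete version of the sketch above, reusing input_fn and model_fn from the earlier Dataset and Estimator examples; the model_dir and batch size are illustrative assumptions:

# Assumes model_fn and input_fn defined as in the earlier examples
estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir='/tmp/cnn_estimator')
for epoch in range(100):
    estimator.train(lambda: input_fn(train_data_dir, batch_size=32))
    estimator.evaluate(lambda: input_fn(test_data_dir, batch_size=32))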
Distributed training with Estimators and Datasets
• Early-stage feature to abstract away the details of distributed training
• https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/distribute
• DistributionStrategy specifies the nature of distribution
• Currently supports model-replication with synchronous updates (MirroredStrategy)
distribution = tf.contrib.distribute.MirroredStrategy(['/device:GPU:0', '/device:GPU:1'])
config = tf.estimator.RunConfig(train_distribute=distribution)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp0ar31ejr', '_tf_random_seed': None,
'_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600,
'_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000,
'_log_step_count_steps': 100, '_train_distribute':
<tensorflow.contrib.distribute.python.mirrored_strategy.MirroredStrategy object at 0x7f161d4b7b00>,
'_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec
object at 0x7f161d4b7c18>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master':
'', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Device is available but not used by distribute strategy: /device:CPU:0
INFO:tensorflow:Configured nccl all-reduce.
INFO:tensorflow:batch_all_reduce invoked for batches size = 14 with algorithm = nccl, num_packs = 1,
agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
TensorFlow Serving
• Set of APIs for defining how a model should handle requests without requiring changes to server architecture
• tensorflow-model-server is a lightweight C++ wrapper over a TensorFlow session that handles API requests via gRPC
inputs_info = tf.saved_model.utils.build_tensor_info(inputs_tensor)
inputs_def = {tf.saved_model.signature_constants.CLASSIFY_INPUTS: inputs_info}
outputs_info = tf.saved_model.utils.build_tensor_info(outputs_tensor)
outputs_def = {tf.saved_model.signature_constants.CLASSIFY_OUTPUT_SCORES: outputs_info}
signature = (
    tf.saved_model.signature_def_utils.build_signature_def(
        inputs=inputs_def,
        outputs=outputs_def,
        method_name=tf.saved_model.signature_constants.CLASSIFY_METHOD_NAME)
)
...
* For Estimators: https://www.tensorflow.org/guide/saved_model#using_savedmodel_with_estimators
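For Estimators specifically, a hedged export sketch using the TensorFlow 1.x export API; the placeholder name 'images', the export path, and the assumption that the model_fn reads its input from a features dict (features['images']) are all illustrative, not from the talk:

import tensorflow as tf

def serving_input_receiver_fn():
    # Raw float32 image batch fed directly by the serving client
    images = tf.placeholder(tf.float32, [None, 224, 224, 3], name='images')
    return tf.estimator.export.ServingInputReceiver(
        features={'images': images}, receiver_tensors={'images': images})

# Writes a SavedModel that tensorflow-model-server can load over gRPC
estimator.export_savedmodel('/tmp/exported_model', serving_input_receiver_fn)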
Summary
• TensorFlow high-level APIs allow users to focus on the model without sacrificing too much flexibility
• Datasets enable users to design performant input pipelines
• Estimators hide the details of the session and reduce boilerplate
The future
• High-level APIs are usable now, but still maturing as TensorFlow evolves
• Compatibility with Keras models via existing methods or conversion to Estimators (see the sketch below)
• Distributed training with minimal code changes
• Compatibility with eager execution
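A minimal sketch of the Keras-to-Estimator conversion path, assuming the compiled Keras model from the earlier slide and the tf.keras.estimator.model_to_estimator helper:

import tensorflow as tf

# `model` is the compiled tf.keras Model from the earlier example
estimator = tf.keras.estimator.model_to_estimator(keras_model=model)
# The converted estimator exposes the standard interface, e.g.:
# estimator.train(input_fn=lambda: input_fn(train_data_dir, batch_size=32))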
Thank You
Dean R. Wyatte
Google Developer Group Boulder
@drwyatte
August 21, 2018
