Debugging TensorFlow Guide
---
## .author[Jongwook Choi]
### .small[.white[Feb 17th, 2017] <br/> .green[Initial Version: June 18th, 2016]]
### .x-small[https://github.com/wookayin/tensorflow-talk-debugging]
---
layout: false
.right.img-33[]
[snuvl-web]: http://vision.snu.ac.kr
[wookayin-gh]: https://github.com/wookayin
---
## About
This talk aims to share some practical guidelines and tips for writing and
debugging TensorFlow code.
--
... because you might find that debugging TensorFlow code is something like ...
---
class: center, middle, no-number, bg-full
background-image: url(images/meme-doesnt-work.jpg)
background-repeat: no-repeat
background-size: contain
---
## Welcome!
### .green[Contents]
- Introduction: Why debugging in TensorFlow is difficult
- Basic and advanced methods for debugging TensorFlow code
- General tips and guidelines for easy-to-debug code
- .dogdrip[Benchmarking and profiling TensorFlow code]
---
template: inverse
# Debugging?
---
--
- Difficult!
--
---
.center.img-33[]
---
```python
W1, b1, W2, b2, W3, b3 = init_parameters()
all_params = [W1, b1, W2, b2, W3, b3]

def train():
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples // batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
*           y_pred, loss, gradients = multilayer_perceptron(batch_x, batch_y)
            for v, grad_v in zip(all_params, gradients):
                v -= learning_rate * grad_v  # in-place update (assumes NumPy arrays)
            epoch_loss += loss / batch_steps
        print("Epoch %02d, Loss = %.6f" % (epoch, epoch_loss))
```
---
```python
def multilayer_perceptron(x):
*   fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
*   fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

def train(session):
    batch_size = 200
    session.run(tf.global_variables_initializer())
    # ...
```
---
```python
def multilayer_perceptron(x):
*   fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
*   fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out
```
```python
* _, c = session.run([train_op, loss], {x: batch_x, y: batch_y})
```
---
## [`Session.run()`][apidocs-sessionrun]
The most important method in TensorFlow --- where every computation is performed!
[apidocs-sessionrun]: https://www.tensorflow.org/versions/master/api_docs/python/train.html#scalar_summary
---
template: inverse
---
## Debugging Scenarios
--
---
.blue[**Basic ways:**]
.blue[**Advanced ways:**]
---
## (1) Fetch tensors via `Session.run()`
```python
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
bias = tf.Variable(1.0)

# Assumed definition (elided on the slide): any y_pred with y_pred(3.0) == 10 fits
y_pred = x ** 2 + bias

# OK, prints 10.000; to evaluate y_pred only, no input to y is required
*print('pred_y(x) = %.3f' % session.run(y_pred, {x: 3.0}))
```
---
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
*   return out, fc1, fc2

net = {}
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
*pred, net['fc1'], net['fc2'] = multilayer_perceptron(x)
```
---
## (1) Fetch tensors via `Session.run()`
.green[**The Good:**]
--
.red[**The Bad:**]
---
```python
def alexnet(x):
    assert x.get_shape().as_list() == [224, 224, 3]
    conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
    pool1 = max_pool_2d(conv1, 3, strides=2)
    conv2 = conv_2d(pool1, 256, 5, activation='relu')
    pool2 = max_pool_2d(conv2, 3, strides=2)
    conv3 = conv_2d(pool2, 384, 3, activation='relu')
    conv4 = conv_2d(conv3, 384, 3, activation='relu')
    conv5 = conv_2d(conv4, 256, 3, activation='relu')
    pool5 = max_pool_2d(conv5, 3, strides=2)
    fc6 = fully_connected(pool5, 4096, activation='relu')
    fc7 = fully_connected(fc6, 4096, activation='relu')
    output = fully_connected(fc7, 1000, activation='softmax')
    return conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5, fc6, fc7
```
---
```python
# assumes a variant alexnet(x, net) that stores each layer into the `net` dict
net = {}
output = alexnet(images, net)
# access intermediate layers like net['conv5'], net['fc7'], etc.
```
---
```python
class AlexNetModel():
    # ...
    def build_model(self, x):
        assert x.get_shape().as_list() == [224, 224, 3]
        self.conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
        self.pool1 = max_pool_2d(self.conv1, 3, strides=2)
        self.conv2 = conv_2d(self.pool1, 256, 5, activation='relu')
        self.pool2 = max_pool_2d(self.conv2, 3, strides=2)
        self.conv3 = conv_2d(self.pool2, 384, 3, activation='relu')
        self.conv4 = conv_2d(self.conv3, 384, 3, activation='relu')
        self.conv5 = conv_2d(self.conv4, 256, 3, activation='relu')
        self.pool5 = max_pool_2d(self.conv5, 3, strides=2)
        self.fc6 = fully_connected(self.pool5, 4096, activation='relu')
        self.fc7 = fully_connected(self.fc6, 4096, activation='relu')
        self.output = fully_connected(self.fc7, 1000, activation='softmax')
        return self.output

model = AlexNetModel()
output = model.build_model(images)
# access intermediate layers like model.conv5, model.fc7, etc.
```
---
[tf-partial-run]: https://github.com/tensorflow/tensorflow/blob/v1.0.0/tensorflow/python/client/session.py#L777
---
## (2) TensorBoard
[tf-tensorboard]: https://www.tensorflow.org/versions/master/how_tos/summaries_and_tensorboard/index.html
<br/><br/>
[apidocs-summary]: https://www.tensorflow.org/versions/master/api_docs/python/summary/generation_of_summaries_#scalar
[apidocs-summarywriter]: https://www.tensorflow.org/versions/master/api_docs/python/summary/generation_of_summaries_#FileWriter
---
```python
def multilayer_perceptron(x):
    # inside this, variables 'fc1/weights' and 'fc1/biases' are defined
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
                                 scope='fc1')
*   tf.summary.histogram('fc1', fc1)
*   tf.summary.histogram('fc1/sparsity', tf.nn.zero_fraction(fc1))
    # ...
```
---
```python
*global_step = tf.Variable(0, dtype=tf.int32, trainable=False)
train_op = tf.train.AdamOptimizer(learning_rate=0.001)\
.minimize(loss, global_step=global_step)
```
```python
def train(session):
    batch_size = 200
    session.run(tf.global_variables_initializer())
*   merged_summary_op = tf.summary.merge_all()
*   summary_writer = tf.summary.FileWriter(FLAGS.train_dir, session.graph)
    # ...
```
---
Scalar Summary
.center.img-100[]
---
## TensorBoard: A Quick Tutorial (Demo)
.center.img-66[]
---
```python
current_loss = eval_ret[self.loss]
if self.merged_summary_op in eval_tensors:
    self.summary_writer.add_summary(
        eval_ret[self.merged_summary_op], current_step)
```
- I recommend recording *only* simple, essential scalar summaries (e.g.
train/validation loss, overall accuracy), and adding debugging summaries
only on demand
---
.center.img-50[]
---
## (3) [`tf.Print()`][tf-print]
[tf-print]: https://www.tensorflow.org/versions/r0.8/api_docs/python/control_flow_ops.html#Print
```python
tf.Print(input, data, message=None,
first_n=None, summarize=None, name=None)
```
- It creates .blue[an **identity** op] with the side effect of printing `data`,
when this op is evaluated.
---
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
*   out = tf.Print(out, [tf.argmax(out, 1)],
*                  'argmax(out) = ', summarize=20, first_n=7)
    return out
```
```x-small
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 6 4 4 6 4 4 6 6 4 0 6 4 6 4 4 6 0 4...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 0 0 3 6 4 3 6 6 3 4 4 4 4 4 3 4 6 7...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 4 0 6 6 6 0 7 3 0 6 7 3 6 0 3 4 3 3 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 1 0 0 0 3 3 7 0 8 1 2 0 9 9 0 3 4 6 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 0 9 0 4 9 9 0 8 2 7 3 9 1 8 3 9 7 8...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 1 1 9 0 8 3 0 9 9 0 2 6 7 7 3 3 3 9...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 6 9 8 3 9 1 0 1 1 9 3 2 3 9 9 3 0 6 6...]
[2016-06-03 00:11:08.661563] Epoch 00, Loss = 0.332199
```
---
.red[Cons:]
---
## (3) [`tf.Assert()`][tf-assert]
[tf-assert]: https://www.tensorflow.org/versions/r0.8/api_docs/python/control_flow_ops.html#Assert
* Asserts that the given condition is true, *when evaluated* (during the
  computation)
* If the condition evaluates to `False`, the list of tensors in `data` is printed
  and an error is raised.
  `summarize` determines how many entries of the tensors to print.
```python
tf.Assert(condition, data, summarize=None, name=None)
```
---
## `tf.Assert`: Examples
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
    # NOTE: nothing depends on this op, so it is never evaluated as written
*   tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_positive')
    return out
```
--
## `tf.Assert`: Examples
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
    assert_op = tf.Assert(tf.reduce_all(out > 0), [out],
                          name='assert_out_positive')
*   with tf.control_dependencies([assert_op]):
*       out = tf.identity(out, name='out')
    return out
```
```python
from tensorflow.python.ops import control_flow_ops
# ... same as above ...
*   out = control_flow_ops.with_dependencies([assert_op], out)
    return out
```
---
## `tf.Assert`: Examples
Another good approach: store all the created assertion ops in a collection,
merge them into a single op, and explicitly evaluate that op using `Session.run()`
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
*   tf.add_to_collection('Asserts',
*       tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_gt_0')
*   )
    return out
```
```python
# merge all assertion ops in the collection into a single op
assert_op = tf.group(*tf.get_collection('Asserts'))
... = session.run([train_op, assert_op], feed_dict={...})
```
---
## Some built-in useful Assert ops
```python
tf.assert_negative(x, data=None, summarize=None, name=None)
tf.assert_positive(x, data=None, summarize=None, name=None)
tf.assert_proper_iterable(values)
tf.assert_non_negative(x, data=None, summarize=None, name=None)
tf.assert_non_positive(x, data=None, summarize=None, name=None)
tf.assert_equal(x, y, data=None, summarize=None, name=None)
tf.assert_integer(x, data=None, summarize=None, name=None)
tf.assert_less(x, y, data=None, summarize=None, name=None)
tf.assert_less_equal(x, y, data=None, summarize=None, name=None)
tf.assert_rank(x, rank, data=None, summarize=None, name=None)
tf.assert_rank_at_least(x, rank, data=None, summarize=None, name=None)
tf.assert_type(tensor, tf_type)
tf.is_non_decreasing(x, name=None)
tf.is_numeric_tensor(tensor)
tf.is_strictly_increasing(x, name=None)
```
---
- [`pdb`][pdb]
- [`ipdb`][ipdb]
- [`pudb`][pudb]
- set breakpoint
- pause, continue
- inspect stack-trace upon exception
- watch variables and evaluate expressions interactively
[pdb]: https://docs.python.org/2/library/pdb.html
[ipdb]: https://pypi.python.org/pypi/ipdb
[pudb]: https://pypi.python.org/pypi/pudb
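
To illustrate "inspect stack-trace upon exception", here is a minimal sketch using only the standard library. `run_with_post_mortem` is a hypothetical helper name (not part of `pdb` or TensorFlow), and the `post_mortem` parameter is an assumption added so that `ipdb.post_mortem` or `pudb.post_mortem` could be swapped in.

```python
import pdb
import sys
import traceback


def run_with_post_mortem(fn, *args, post_mortem=pdb.post_mortem, **kwargs):
    """Call fn; on an exception, open a debugger at the failing frame."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        traceback.print_exc()            # show the stack trace first
        post_mortem(sys.exc_info()[2])   # then inspect the frames interactively
        raise                            # re-raise once the debugger exits
```

Wrapping the training entry point, e.g. `run_with_post_mortem(train, session)`, would drop into the debugger at the frame where the error was raised instead of just terminating.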
---
## Debugger: Usage
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
*   import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
    return out
```
.img-75.center[

]
---
## Debugger: Usage
```python
for i in range(batch_steps):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
*   if (np.argmax(batch_y, axis=1)[:7] == [4, 9, 6, 2, 9, 6, 5]).all():
*       import pudb; pudb.set_trace()  # XXX BREAKPOINT
    _, c = session.run([train_op, loss],
                       feed_dict={x: batch_x, y: batch_y})
```
---
To get .green[variables]:
---
## `IPython.embed()`
.green[ipdb/pudb:]
```python
import pudb; pudb.set_trace()
```
.green[embed:]
```python
from IPython import embed; embed()
```
<!--
If using [`%pdb` magic][pdb-magic] in IPython notebook:
```python
oops
```
[pdb-magic]: http://ipython.readthedocs.io/en/stable/interactive/magics.html?
highlight=magic#magic-pdb
-->
---
Our debugging tools so far can be used for debugging outside `Session.run()`.
--
- A custom operation can be written for logging or debugging purposes (like
[PrintOp][tf-code-printop])
- ... but this is very burdensome (you need to define the op interface, compile it, and use it ...)
[docs-custom-ops]: https://www.tensorflow.org/versions/master/how_tos/adding_an_op/index.html#implement-the-kernel-for-the-op
[tf-code-printop]: https://github.com/tensorflow/tensorflow/blob/v1.0.0/tensorflow/core/kernels/logging_ops.cc#L53
---
```python
tf.py_func(func, inp, Tout, stateful=True, name=None)
```
```python
def my_func(x):
    # x will be a numpy array with the contents of the placeholder below
    return np.sinh(x)

inp = tf.placeholder(tf.float32)
y = tf.py_func(my_func, [inp], tf.float32)
```

[docs-py-func]: https://www.tensorflow.org/versions/r1.0/api_docs/python/script_ops.html#py_func
---
In other words, we are now able to use the following (hacky) .green[**tricks**]
by intercepting the computation being executed on the graph:
---
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # ...
```
---
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # ...
```
---
.img-90.center[
![](https://camo.githubusercontent.com/4c671d2b359c9984472f37a73136971fd60e76e4/687474703a2f2f692e696d6775722e636f6d2f6e30506d58516e2e676966)
]
---
```python
import tensorflow.python.debug as tf_debug
sess = tf.Session()
```
<!--
Although it is not yet fully functional and has some bugs, it is quite usable!
(Google Brain team will complete and announce it soon)
-->
---
.img-100.center[

]
---
.img-100.center[

]
<!--
## Teaser: `tfdb`
A new TensorFlow debugger and helper library will be published soon :-)
```python
import tfdb
def multilayer_perceptron(x):
fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
-->
---
.img-80.center[

]
---
```python
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
```
Tensor filters are just Python functions `(datum, tensor) -> bool`:
```python
def has_inf_or_nan(datum, tensor):
    _ = datum  # Datum metadata is unused in this predicate.
    if tensor is None:
        # Uninitialized tensor doesn't have bad numerical values.
        return False
    elif (np.issubdtype(tensor.dtype, np.floating) or
          np.issubdtype(tensor.dtype, np.complexfloating) or
          np.issubdtype(tensor.dtype, np.integer)):
        return np.any(np.isnan(tensor)) or np.any(np.isinf(tensor))
    else:
        return False
```
Since they run in Python, evaluating tensor filters is quite slow.
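
A custom filter can follow the same `(datum, tensor) -> bool` signature. The sketch below is only an illustration and is not part of `tfdbg`: `make_magnitude_filter` and its `threshold` parameter are hypothetical names, and the registration call in the final comment assumes a wrapped `sess` like the one above.

```python
import numpy as np


def make_magnitude_filter(threshold):
    """Build a (datum, tensor) -> bool predicate (hypothetical helper)."""
    def has_large_magnitude(datum, tensor):
        _ = datum  # datum metadata is unused in this predicate
        if tensor is None:
            # An uninitialized tensor has no values to check.
            return False
        tensor = np.asarray(tensor)
        if not np.issubdtype(tensor.dtype, np.number):
            # Skip strings, booleans and other non-numeric dtypes.
            return False
        return bool(np.any(np.abs(tensor) > threshold))
    return has_large_magnitude


# Assumed usage with a debug-wrapped session:
# sess.add_tensor_filter("has_large_magnitude", make_magnitude_filter(1e3))
```

Using a closure keeps the two-argument signature that the debugger expects while still letting the threshold be chosen per experiment.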
---
In tensor dump mode (the **run-end UI**), the debugger shows the list of tensors
dumped in the `session.run()` call:
.img-100.center[

]
---
Commands:
---
.img-80.center[

]
---
## `tfdbg`: Stepper
.img-100.center[

]
---
<div class="center">
<iframe width="672" height="378" src="https://www.youtube.com/embed/CA7fjRfduOI"
frameborder="0" allowfullscreen></iframe>
</div>
<p>
.small[
<br/>
See also: [Debug TensorFlow Models with tfdbg (@Google Developers Blog)](https://developers.googleblog.com/2017/02/debug-tensorflow-models-with-tfdbg.html)
]
---
## Debugging: Summary
.green[There is no silver bullet; one might need to choose the most convenient and
suitable debugging tool, depending on the case]
---
template: inverse
---
- Learn to use debugging tools, but do not solely rely on them when a problem
occurs.
- Sometimes, just sitting down and reading through 👀 your code with ☕ (a careful
code review!) would be greatly helpful.
---
Almost .red[all] rule-of-thumb tips and guidelines for writing good, neat, and
defensive code also apply to TensorFlow code :)
[fail-fast]: https://en.wikipedia.org/wiki/Fail-fast
[dry-principle]: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
<p>
[debug-tip-matloff]: http://heather.cs.ucdavis.edu/~matloff/UnixAndC/CLanguage/Debug.html
---
```python
net['fc7'] = tf.nn.xw_plus_b(net['fc6'], vars['fc7/W'], vars['fc7/b'])
```
---
<p>
---
```python
ValueError: Cannot feed value of shape (200,)
for Tensor u'Placeholder_1:0', which has shape '(?, 10)'
ValueError: Tensor conversion requested dtype float32 for Tensor with
* dtype int32: 'Tensor("Variable_1/read:0", shape=(256,), dtype=int32)'
```
A better stacktrace:
```python
ValueError: Cannot feed value of shape (200,)
for Tensor u'placeholder_y:0', which has shape '(?, 10)'
ValueError: Tensor conversion requested dtype float32 for Tensor with
* dtype int32: 'Tensor("fc1/weights/read:0", shape=(256,), dtype=int32)'
```
---
```python
def multilayer_perceptron(x):
    W_fc1 = tf.Variable(tf.random_normal([784, 256], 0, 1))
    b_fc1 = tf.Variable([0] * 256)  # wrong here!!
    fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1)
    # ...
```
```python
>>> fc1
<tf.Tensor 'xw_plus_b:0' shape=(?, 256) dtype=float32>
```
Better:
```python
def multilayer_perceptron(x):
    W_fc1 = tf.Variable(tf.random_normal([784, 256], 0, 1), name='fc1/weights')
    b_fc1 = tf.Variable(tf.zeros([256]), name='fc1/bias')
    fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1, name='fc1/linear')
    fc1 = tf.nn.relu(fc1, name='fc1/relu')
    # ...
```
```python
>>> fc1
<tf.Tensor 'fc1/relu:0' shape=(?, 256) dtype=float32>
```
---
```python
def multilayer_perceptron(x):
*   with tf.variable_scope('fc1'):
        W_fc1 = tf.get_variable('weights', [784, 256])  # fc1/weights
        b_fc1 = tf.get_variable('bias', [256])          # fc1/bias
        fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1)          # fc1/xw_plus_b
        fc1 = tf.nn.relu(fc1)                           # fc1/relu
    # ...
```
```python
import tensorflow.contrib.layers as layers
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
*                                scope='fc1')
    # ...
```
---
.small[https://github.com/wookayin/TensorFlowKR-2017-talk-bestpractice]
</div>
- Make sure that your GPU utilization is *always* non-zero (and ideally near 100%)
- Watch and monitor using `nvidia-smi` or [`gpustat`][gpustat]
.img-75.center[

]
.img-50.center[

]
[gpustat]: https://github.com/wookayin/gpustat
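
For scripted monitoring, `nvidia-smi` can also be queried in CSV form. The sketch below is a rough illustration: the helper names (`parse_gpu_stats`, `query_gpu_stats`, `idle_gpus`) and the 10% threshold are assumptions, and the parser assumes every queried field is reported as an integer (some devices report `[N/A]`, which this sketch does not handle).

```python
import subprocess

QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem_used, mem_total = [f.strip() for f in line.split(",")]
        stats.append({
            "index": int(index),
            "utilization": int(util),        # percent
            "memory_used": int(mem_used),    # MiB
            "memory_total": int(mem_total),  # MiB
        })
    return stats


def query_gpu_stats():
    """Run nvidia-smi once; requires an NVIDIA driver on this machine."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"])
    return parse_gpu_stats(out.decode("utf-8"))


def idle_gpus(stats, min_utilization=10):
    """Return the indices of GPUs whose utilization is below the threshold."""
    return [s["index"] for s in stats if s["utilization"] < min_utilization]
```

Calling `idle_gpus(query_gpu_stats())` periodically during training is one way to notice an input pipeline that is starving the GPU.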
---
<p>
- What we can do
  - Use `tfdbg` !!!
  - Use [`cProfile`][python-profilers], [`line_profiler`][line-profiler] or [`%prun`][ipython-profile] in IPython
  - Use [`nvprof`][nvprof] for profiling CUDA operations
  - Use CUPTI (CUDA Profiling Tools Interface) [tools][tf-issue-1824] for TF
.img-66.center[]
[python-profilers]: https://docs.python.org/2/library/profile.html
[line-profiler]: https://pypi.python.org/pypi/line_profiler/
[ipython-profile]: https://ipython.org/ipython-doc/3/interactive/magics.html
[nvprof]: http://docs.nvidia.com/cuda/profiler-users-guide/
[tf-issue-1824]: https://github.com/tensorflow/tensorflow/issues/1824
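
As a minimal sketch of the `cProfile` option, the helper below profiles a single call and returns the report as text. `profile_call` and `slow_step` are hypothetical names; in practice the profiled function would be whatever wraps one training step (including its `session.run()` call).

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=10, **kwargs):
    """Profile one call of fn and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def slow_step(n):
    # Stand-in for one training step (hypothetical workload).
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(slow_step, 100000)
    print(report)
```

This only measures time spent on the Python side; time inside GPU kernels still needs `nvprof` or the CUPTI-based tools listed above.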
---
## Concluding Remarks
.img-66[]
---
name: last-page
class: center, middle, no-number
## Thank You!
#### [@wookayin][wookayin-gh]