TBD is a new benchmark suite for DNN training that currently covers six major application domains and eight different state-of-the-art models.
The applications in this suite are selected based on extensive conversations with ML developers and users from both industry and academia.
For all application domains, we select recent models capable of delivering state-of-the-art results.
We intend to continually expand TBD with new applications and models based on feedback and support from the community.
Image classification is the archetypal
deep learning application, as this was the first domain where a deep
neural network (AlexNet) proved to be a watershed, beating
all prior traditional methods. In our work, we use two very recent
models, Inception-v3 and ResNet, which follow a structure
similar to AlexNet’s CNN model but improve accuracy through
novel algorithmic techniques that enable extremely deep networks.
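The key technique behind ResNet's extreme depth is the residual (shortcut) connection, where each block computes x + F(x) instead of F(x) alone, so gradients can flow through the identity path. A minimal numpy sketch of this idea (dense layers stand in for the real convolution and batch-norm stages; all shapes and weights are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Schematic residual block: output is relu(x + F(x)), so the
    identity shortcut lets gradients bypass the transformation F."""
    h = relu(x @ w1)    # first transformation (stands in for conv + batch norm)
    f = h @ w2          # second transformation
    return relu(x + f)  # identity shortcut added before the final activation

d = 8
x = rng.standard_normal((1, d))
w1 = rng.standard_normal((d, d)) * 0.1
w2 = rng.standard_normal((d, d)) * 0.1
y = residual_block(x, w1, w2)
assert y.shape == x.shape  # the shortcut requires matching shapes
```

With the transformation weights zeroed out, the block reduces to the identity (up to the final ReLU), which is why very deep stacks of such blocks remain trainable.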
Unlike image processing, machine
translation involves the analysis of sequential data and typically
relies on RNNs using LSTM cells as its core algorithm. We select
NMT and Sockeye, developed by the TensorFlow and Amazon
Web Services teams, respectively, as representative RNN-based
models in this area. We also include an implementation of the recently
introduced Transformer model, which achieves a new
state-of-the-art in translation quality using attention layers as an
alternative to recurrent layers.
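The attention layers that replace recurrence in the Transformer are built on scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, which lets every output position attend to all input positions in one step rather than sequentially. A self-contained numpy sketch (single head, no masking, toy dimensions):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    Each query position mixes all value rows at once, with no
    recurrence across time steps."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                            # weighted sum of values

rng = np.random.default_rng(1)
n, d = 4, 8
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
out = scaled_dot_product_attention(Q, K, V)
assert out.shape == (n, d)
```

Because all positions are processed in parallel, this formulation exposes far more parallelism to the hardware than an LSTM's step-by-step recurrence.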
[Figure: training curves, plotted against time in hours and against epochs]
Object detection applications, such as
face detection, are another popular deep learning application and
can be thought of as an extension of image classification, where
an algorithm usually first breaks down an image into regions of
interest and then applies image classification to each region. We
chose to include Faster R-CNN, which achieves state-of-the-art
results on the Pascal VOC datasets. A training iteration
consists of the forward and backward passes of two networks (one
for identifying regions and one for classification), weight sharing
and local fine-tuning. The convolution stack in a Faster R-CNN network is usually a standard image classification network, in our
work: a 101-layer ResNet.
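The two-network structure described above can be sketched schematically: a shared backbone computes features once, a region proposal network picks candidate regions, and a classifier head scores each one. Everything below is a hypothetical stand-in (random linear maps instead of real conv layers, row indices instead of boxes, and no losses, backward passes, or weight-sharing details):

```python
import numpy as np

rng = np.random.default_rng(2)

n_positions, d, n_classes = 16, 8, 5
W_backbone = rng.standard_normal((d, d)) * 0.1   # stand-in for the ResNet stack
w_rpn = rng.standard_normal(d)                   # stand-in for the RPN head
W_cls = rng.standard_normal((d, n_classes)) * 0.1  # stand-in for the classifier head

def backbone(image):
    """Shared convolutional stack; here a fixed linear map."""
    return image @ W_backbone

def region_proposal_network(features):
    """Scores candidate regions; returns the indices of the top-3
    feature rows as a stand-in for proposed boxes."""
    scores = features @ w_rpn
    return np.argsort(scores)[-3:]

def classify_region(feature):
    """Per-region classifier head: softmax over object classes."""
    logits = feature @ W_cls
    e = np.exp(logits - logits.max())
    return e / e.sum()

image = rng.standard_normal((n_positions, d))  # flattened "image" positions
features = backbone(image)                     # 1. shared features, computed once
proposals = region_proposal_network(features)  # 2. RPN picks regions of interest
class_probs = [classify_region(features[i]) for i in proposals]  # 3. classify each region
```

The point of the sketch is the dataflow: the expensive backbone runs once per image, while the classification head runs once per proposed region.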
[Figure: Faster R-CNN training metrics]
Panel 1 — training quality: 70+% mAP so far; throughput: 2.29 samples/sec; GPU compute utilization: 90.29%; FP32 utilization: 70.9%
Panel 2 — training quality: 65+% mAP so far; throughput: 2.32 samples/sec; GPU compute utilization: 89.41%; FP32 utilization: 58.9%
Deep Speech 2 is an end-to-end
speech recognition model from Baidu Research. It is able to accurately
recognize both English and Mandarin Chinese, two very
distant languages, with a unified model architecture and shows
great potential for deployment in industry. The Deep Speech 2
model contains two convolutional layers, plus seven regular recurrent
layers or Gated Recurrent Units (GRUs), unlike the machine translation RNN
models included in our benchmark suite, which use LSTM layers.
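A GRU is a lighter-weight alternative to the LSTM: it keeps a single hidden state controlled by two gates (update and reset) rather than a separate cell state with three gates. A minimal numpy sketch of one GRU time step, following the standard gate equations (weight shapes and the toy unrolled loop are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step (biases omitted for brevity)."""
    z = sigmoid(x @ Wz + h @ Uz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    return (1.0 - z) * h + z * h_tilde        # blend old and candidate state

rng = np.random.default_rng(3)
d_in, d_h = 4, 6
params = [rng.standard_normal(s) * 0.1
          for s in [(d_in, d_h), (d_h, d_h)] * 3]
h = np.zeros(d_h)
for t in range(5):                            # unroll over a short input sequence
    h = gru_step(rng.standard_normal(d_in), h, *params)
```

Since the new state is a convex combination of the old state and a tanh candidate, its entries stay bounded in [-1, 1], one reason gated units train more stably than plain recurrent layers.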
Generative Adversarial Networks
A generative adversarial
network (GAN) trains two networks, one generator network and
one discriminator network. The generator is trained to generate
data samples that mimic the real samples, and the discriminator
is trained to distinguish whether a data sample is genuine or synthesized.
GANs are used, for example, to synthetically generate
photographs that look at least superficially authentic to human observers.
While GANs are powerful generative models, GAN training
suffers from instability. The WGAN is a milestone as it makes
great progress towards stable training. Recently, Gulrajani et al.
proposed an improvement to WGAN that enables stable
training on a wide range of GAN architectures. We include this
model in our benchmark suite, since it is one of the leading DNN
algorithms for unsupervised learning.
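The improvement by Gulrajani et al. replaces WGAN's weight clipping with a gradient penalty on the critic: the critic loss is E[D(fake)] - E[D(real)] + lambda * E[(||grad D(x_hat)|| - 1)^2], where x_hat is sampled on straight lines between real and fake samples. The sketch below uses a deliberately tiny linear critic so the gradient is analytic (grad D = w everywhere); a real implementation would use a deep critic and autodiff:

```python
import numpy as np

rng = np.random.default_rng(4)

def critic(x, w):
    """A tiny linear critic D(x) = <w, x>; its input gradient is just w."""
    return x @ w

def wgan_gp_critic_loss(real, fake, w, lam=10.0):
    """WGAN-GP critic objective:
    E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)||_2 - 1)^2]."""
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1 - eps) * fake  # interpolates where the penalty is
                                           # evaluated; unused below only because
                                           # a linear critic's gradient is constant
    grad_norm = np.linalg.norm(w)          # analytic gradient norm
    penalty = lam * np.mean((grad_norm - 1.0) ** 2)
    return critic(fake, w).mean() - critic(real, w).mean() + penalty

d = 8
real = rng.standard_normal((32, d)) + 2.0  # "real" samples, shifted mean
fake = rng.standard_normal((32, d))        # "generated" samples
w = rng.standard_normal(d)
loss = wgan_gp_critic_loss(real, fake, w)
```

The penalty pushes the critic towards unit gradient norm (a 1-Lipschitz function), which is what makes training stable across architectures without the brittleness of weight clipping.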
Deep Reinforcement Learning
Deep neural networks also drive the recent advances in reinforcement learning,
which have contributed to the creation of the first artificial agents
to achieve human-level performance across challenging domains,
such as the game of Go and various classical computer games. We
include the A3C algorithm in our benchmark suite, as it has
become one of the most popular deep reinforcement learning techniques,
surpassing the DQN training algorithm, and works in both single-machine and distributed settings. A3C relies on asynchronously
updated policy and value function networks trained in
parallel over several processing threads.
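At the core of each A3C worker's update is the computation of discounted n-step returns and advantages (return minus the value network's baseline), which drive the policy and value gradients. A small numpy sketch of that bookkeeping (the rollout values here are made up for illustration):

```python
import numpy as np

def n_step_advantages(rewards, values, bootstrap, gamma=0.99):
    """Discounted n-step returns and advantages for one rollout.
    `bootstrap` is the value estimate of the state after the last step;
    advantage = return - value baseline."""
    R = bootstrap
    returns = np.empty(len(rewards))
    for t in reversed(range(len(rewards))):  # accumulate from the end of the rollout
        R = rewards[t] + gamma * R
        returns[t] = R
    return returns, returns - values

rewards = np.array([0.0, 0.0, 1.0])          # toy 3-step rollout
values = np.array([0.2, 0.4, 0.6])           # value-network estimates
returns, adv = n_step_advantages(rewards, values, bootstrap=0.0, gamma=0.9)
# returns = [0.81, 0.9, 1.0]
```

In A3C, each worker thread computes these quantities on its own rollout and asynchronously applies the resulting gradients to the shared policy and value networks.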