Hardware Sensitivity
Frameworks and GPU Specifications
Here we present our performance study across different types of GPUs. The following table shows the specifications of the GPUs we tested:
GPU | # of Multiprocessors | Core Count | Max Clock Rate (MHz) | Memory Size (GB) | LLC Size (MB) | Memory Bus Type | Memory BW (GB/s) | Bus Interface |
---|---|---|---|---|---|---|---|---|
TitanXp | 30 | 3840 | 1582 | 12 | 3 | GDDR5X | 547.6 | PCIe 3.0 |
1080 Ti | 30 | 3584 | 1582 | 11 | 2.75 | GDDR5X | 352.2 | PCIe 3.0 |
P4000 | 14 | 1792 | 1480 | 8 | 2 | GDDR5 | 243 | PCIe 3.0 |
Titan V | 80 | 5120 | 1455 | 12 | 4.5 | HBM2 | 652.8 | PCIe 3.0 |
V100-PCIe | 80 | 5120 | 1370 | 32 | 6 | HBM2 | 900 | PCIe 3.0 |
V100-SXM2 | 80 | 5120 | 1455 | 16 | 6 | HBM2 | 900 | NVLink |
P100[12GB] | 56 | 3584 | 1303 | 12 | 4 | HBM2 | 549 | PCIe 3.0 |
P100[16GB] | 56 | 3584 | 1480 | 16 | 4 | HBM2 | 720 | PCIe 3.0 |
2080 Ti | 68 | 4352 | 1635 | 11 | 6 | GDDR6 | 616 | PCIe 3.0 |
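
When comparing these GPUs, it can help to hold the specifications in a small data structure and derive ratios from them. The following is a minimal sketch, not part of the original study: the `GpuSpec` class and the helpers `bandwidth_per_core` and `rank_by` are hypothetical names introduced for illustration, and the values are copied from a subset of the table's columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """One row of the specification table (subset of columns)."""
    name: str
    multiprocessors: int
    cores: int
    max_clock_mhz: int
    memory_gb: float
    llc_mb: float
    memory_bw_gbs: float


# Values copied verbatim from the table above.
GPUS = [
    GpuSpec("TitanXp", 30, 3840, 1582, 12, 3, 547.6),
    GpuSpec("1080 Ti", 30, 3584, 1582, 11, 2.75, 352.2),
    GpuSpec("P4000", 14, 1792, 1480, 8, 2, 243),
    GpuSpec("Titan V", 80, 5120, 1455, 12, 4.5, 652.8),
    GpuSpec("V100-PCIe", 80, 5120, 1370, 32, 6, 900),
    GpuSpec("V100-SXM2", 80, 5120, 1455, 16, 6, 900),
    GpuSpec("P100[12GB]", 56, 3584, 1303, 12, 4, 549),
    GpuSpec("P100[16GB]", 56, 3584, 1480, 16, 4, 720),
    GpuSpec("2080 Ti", 68, 4352, 1635, 11, 6, 616),
]


def bandwidth_per_core(gpu: GpuSpec) -> float:
    """Memory bandwidth divided by core count, in GB/s per core."""
    return gpu.memory_bw_gbs / gpu.cores


def rank_by(gpus, key):
    """Return the GPUs sorted from largest to smallest by the given key."""
    return sorted(gpus, key=key, reverse=True)


if __name__ == "__main__":
    for gpu in rank_by(GPUS, bandwidth_per_core):
        print(f"{gpu.name:12s} {bandwidth_per_core(gpu):.4f} GB/s per core")
```

A ratio such as bandwidth per core is only a rough indicator of how memory-bound a workload may be on a given GPU; it is an illustrative derived metric, not one reported in the study.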
The following table lists the framework and batch size used for each benchmark in our hardware sensitivity study.
Setting | ResNet-50 | InceptionV3 | Sockeye | NMT | Transformer | A3C | Faster RCNN |
---|---|---|---|---|---|---|---|
Framework | MXNet v1.1.0 | MXNet v1.1.0 | MXNet v1.1.0 | TensorFlow v1.8.0 | TensorFlow v1.8.0 | MXNet v1.1.0 | MXNet v1.1.0 |
Batch Size | 32 images | 32 images | 32 sentences | 128 sentences | 2048 sentences | 32 snapshots | 1 image + 128 ROIs |
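
The benchmark settings above can also be written down as a configuration mapping, which makes it easy to group benchmarks by framework when scripting runs. This is a minimal sketch under our own assumptions: the `BENCHMARKS` dictionary layout and the `by_framework` helper are hypothetical and are not taken from the benchmark suite's code; only the framework versions and batch sizes come from the table.

```python
# Framework and batch size per benchmark, copied from the table above.
BENCHMARKS = {
    "ResNet-50": {"framework": "MXNet v1.1.0", "batch_size": "32 images"},
    "InceptionV3": {"framework": "MXNet v1.1.0", "batch_size": "32 images"},
    "Sockeye": {"framework": "MXNet v1.1.0", "batch_size": "32 sentences"},
    "NMT": {"framework": "TensorFlow v1.8.0", "batch_size": "128 sentences"},
    "Transformer": {"framework": "TensorFlow v1.8.0", "batch_size": "2048 sentences"},
    "A3C": {"framework": "MXNet v1.1.0", "batch_size": "32 snapshots"},
    "Faster RCNN": {"framework": "MXNet v1.1.0", "batch_size": "1 image + 128 ROIs"},
}


def by_framework(benchmarks):
    """Group benchmark names by framework, preserving table order."""
    groups = {}
    for name, settings in benchmarks.items():
        groups.setdefault(settings["framework"], []).append(name)
    return groups


if __name__ == "__main__":
    for framework, names in by_framework(BENCHMARKS).items():
        print(f"{framework}: {', '.join(names)}")
```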