Q: Detector timing?
A: The timing reported in the paper is incorrect. The total runtime of PIXOR is 35 ms on an NVIDIA TITAN Xp GPU: 1 ms for data voxelization, 31 ms for the network forward pass, and 3 ms for oriented NMS. Both voxelization and oriented NMS are implemented on the GPU for efficiency.

Q: How many residual layers are in Res_block_5 in Figure 2?
A: There is a typo in the text; it should be 3.

Q: log(dx), log(dy) in the regression targets?
A: This is also a typo; they should be dx and dy.

Q: Network optimization details on KITTI?
A: We train the network with stochastic gradient descent with momentum for 35 epochs on 4 NVIDIA 1080Ti GPUs, with each GPU taking 4 frames (16 frames per batch in total). The initial learning rate is 0.01, and we decay it by a factor of 10 after 20 and 30 epochs. Training takes less than 4 hours.

Q: How do you evaluate on KITTI without proper 2D detection boxes?
A: We manually set the 2D box height according to the BEV detection's distance from the ego-car. Specifically, if the distance is larger than 60 meters, we set the 2D box height to 10 pixels; if it is between 30 and 60 meters, we set it to 30 pixels; and if it is less than 30 meters, we set it to 50 pixels.
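A minimal Python sketch of the distance-based 2D box height rule used for KITTI evaluation. It assumes the detection center is given as (x, y) in meters in an ego-centred BEV frame and that "distance" means the planar Euclidean distance; the FAQ does not say how exactly 30 m or 60 m is handled, so the boundary choice here is an assumption. The function name kitti_2d_box_height is hypothetical, not taken from the authors' code.

```python
import math


def kitti_2d_box_height(x: float, y: float) -> int:
    """Return a placeholder 2D box height in pixels for a BEV detection.

    Assumption: (x, y) is the detection center in meters relative to the
    ego-car, and distance is the Euclidean distance in the BEV plane.
    Assumption: exactly 60 m maps to 30 px and exactly 30 m maps to 50 px,
    since the FAQ only specifies strict inequalities.
    """
    distance = math.hypot(x, y)
    if distance > 60.0:   # far detections
        return 10
    if distance > 30.0:   # mid-range detections (30 m, 60 m]
        return 30
    return 50             # near detections (<= 30 m)


if __name__ == "__main__":
    for center in [(10.0, 5.0), (40.0, 0.0), (0.0, 65.0)]:
        print(center, kitti_2d_box_height(*center))
```

Using fixed heights per distance band keeps the rule trivial to reproduce; only the box height is synthesized here, and the rest of the 2D box is not addressed by this sketch.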
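The KITTI optimization recipe (initial learning rate 0.01, divided by 10 after epochs 20 and 30, 35 epochs, 4 GPUs with 4 frames each) can be sketched as a step schedule. This is an illustration under stated assumptions, not the authors' training code: the 0-indexed epoch convention and the names learning_rate, BASE_LR and MILESTONES are hypothetical, and the momentum value is not given in the FAQ, so it is left out.

```python
# Hypothetical constants mirroring the recipe in the FAQ.
NUM_GPUS = 4
FRAMES_PER_GPU = 4
NUM_EPOCHS = 35
BASE_LR = 0.01
MILESTONES = (20, 30)  # epochs after which the learning rate drops by 10x
DECAY_FACTOR = 0.1


def learning_rate(epoch: int) -> float:
    """Return the learning rate for a 0-indexed epoch (assumed convention)."""
    drops = sum(1 for m in MILESTONES if epoch >= m)  # milestones already passed
    return BASE_LR * DECAY_FACTOR ** drops


if __name__ == "__main__":
    effective_batch = NUM_GPUS * FRAMES_PER_GPU  # frames per SGD step across GPUs
    print("effective batch size:", effective_batch)
    for epoch in (0, 19, 20, 29, 30, NUM_EPOCHS - 1):
        print(epoch, f"{learning_rate(epoch):g}")
```

In PyTorch, the same schedule could be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 30], gamma=0.1)`, stepped once per epoch.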