Piotr Wojciechowski: Inference optimization techniques

2019-03-20 · YouTube


Contributed talk at the PL in ML: Polish View on Machine Learning 2018 conference (plinml.mimuw.edu.pl).

Abstract: GPUs are frequently used for the compute-intensive task of training deep neural networks. However, some in the industry recommend CPUs for inference because of concerns about additional communication overhead. In this presentation, we will dive deep into deep neural network inference. You will learn how the CUDA programming model supports high-performance inference on GPUs, and how NVIDIA's TensorRT platform optimizes networks for inference at low latency and high throughput. Finally, we will discuss various optimization strategies and common benchmarking criteria.
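For context, a minimal sketch of the kind of TensorRT build flow the abstract refers to, using the TensorRT 7-era Python API; the ONNX file name, FP16 flag, and workspace size below are illustrative assumptions, not details taken from the talk:

```python
import tensorrt as trt

# TensorRT reports build and runtime diagnostics through a Logger.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path="model.onnx"):  # placeholder model file
    builder = trt.Builder(TRT_LOGGER)
    # ONNX models require an explicit-batch network definition.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    # Allow reduced-precision kernels: one of the latency/throughput
    # optimizations discussed in the talk.
    config.set_flag(trt.BuilderFlag.FP16)
    config.max_workspace_size = 1 << 30  # 1 GiB scratch space for tactic search

    # The builder fuses layers, selects kernels, and emits an optimized engine.
    return builder.build_engine(network, config)

if __name__ == "__main__":
    engine = build_engine()
    print("engine built" if engine else "build failed")
```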
