Keynote | Technical
Friday 17th | 13:20 - 13:50 | Theatre 19
One-liner summary:
Training Deep Learning Models on Multiple GPUs in the Cloud
Keywords defining the session:
- Deep Learning
- GPUs
- Scalability
GPUs in the cloud as Infrastructure as a Service (IaaS) may seem a commodity. However, efficiently distributing deep learning tasks across several GPUs is challenging. Even though some frameworks offer ways to benefit from data parallelism, the devil is in the details. Deep learning experts care about learning rates and batch sizes, but engineering details can ruin the scalability or efficiency of their training. Beyond software implementation issues, communication latency, GPUDirect support, driver configuration, and even cloud provider pricing can have a big impact on training times and costs. Results on this topic will be shared.
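The data parallelism the abstract refers to can be sketched as follows: each device computes gradients on its own shard of the batch, and an all-reduce averages them so every replica applies the same update. This is a minimal pure-Python toy (no real GPUs or framework calls; the function names and the linear model are illustrative assumptions, not part of the session material):

```python
def grad_mse_linear(w, xs, ys):
    # Gradient of mean squared error for the model y ~ w * x on one shard.
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_step(w, batch_x, batch_y, n_devices, lr):
    # Split the global batch into equal shards, one per simulated device.
    shard = len(batch_x) // n_devices
    grads = []
    for d in range(n_devices):
        xs = batch_x[d * shard:(d + 1) * shard]
        ys = batch_y[d * shard:(d + 1) * shard]
        grads.append(grad_mse_linear(w, xs, ys))  # per-device gradient
    g = sum(grads) / n_devices  # "all-reduce": average across devices
    return w - lr * g           # identical update on every replica

# Toy data generated from y = 3x; training should move w toward 3.
xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, xs, ys, n_devices=4, lr=0.01)
```

In a real framework the averaging step is a collective operation (e.g. NCCL's all-reduce), and its latency and bandwidth, not the math above, are where scalability is typically lost.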