Furio Valerio Sordini and Pavel Sulimov
With the increasing popularity of large machine learning models capable of solving complicated tasks in natural language processing, computer vision, and beyond, the need for distributed computation has grown dramatically. We would like to perform a 'surgery' on the parallelization methods of one of the most popular deep learning frameworks, PyTorch. In particular, we will demonstrate two main approaches: data parallelization (where a single module is trained asynchronously across parallel streams of data) and model parallelization (both horizontal, where several models are trained simultaneously, and vertical, where the model's parameters are split into groups across devices). We will also walk through different resource-availability scenarios, i.e. what can be done with only CPUs, a single GPU, or multiple GPUs. The demonstration is built around an urban planning problem, in which we create synthetic cities with deep convolutional generative adversarial networks. These models have a complicated architecture and billions of parameters when generating images starting from mid-resolution such as 256x256, which makes them ideal subjects for demonstrating distributed computation.
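To make the first approach concrete, below is a minimal sketch of data parallelization with PyTorch's DistributedDataParallel, where each process trains a full replica of the module and gradients are averaged across processes. The toy model, hyperparameters, and single-node rendezvous settings are illustrative assumptions, not the DCGAN setup from the talk.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank: int, world_size: int) -> None:
    # Single-node rendezvous; the address and port are placeholder values.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)

    device = torch.device(f"cuda:{rank}") if torch.cuda.is_available() else torch.device("cpu")
    model = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 1)).to(device)
    # Each process holds a full replica; gradients are all-reduced after backward().
    ddp_model = DDP(model, device_ids=[rank] if device.type == "cuda" else None)
    optimizer = torch.optim.Adam(ddp_model.parameters(), lr=2e-4)

    for _ in range(10):  # stand-in for a real DataLoader loop over a sharded dataset
        x = torch.randn(32, 100, device=device)
        y = torch.randn(32, 1, device=device)
        loss = nn.functional.mse_loss(ddp_model(x), y)
        optimizer.zero_grad()
        loss.backward()  # gradient all-reduce across processes happens here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    # One process per GPU, or two CPU processes when no GPU is available.
    world_size = torch.cuda.device_count() if torch.cuda.is_available() else 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)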
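The second approach, vertical model parallelization, can likewise be sketched by splitting the layers of a small DCGAN-style generator into two groups that live on different GPUs, with activations moved between devices by hand. The architecture, layer sizes, and device names below are illustrative assumptions, not the exact generator discussed in the talk.

import torch
import torch.nn as nn


class SplitGenerator(nn.Module):
    """A toy generator whose parameters are split into two groups across devices."""

    def __init__(self, dev0: str = "cuda:0", dev1: str = "cuda:1") -> None:
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # First group of layers lives on the first device.
        self.stage0 = nn.Sequential(
            nn.ConvTranspose2d(100, 256, 4, 1, 0, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
        ).to(dev0)
        # Second group of layers lives on the second device.
        self.stage1 = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 3, 4, 2, 1, bias=False),
            nn.Tanh(),
        ).to(dev1)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Activations are copied between devices explicitly; autograd tracks the copy.
        h = self.stage0(z.to(self.dev0))
        return self.stage1(h.to(self.dev1))


if __name__ == "__main__" and torch.cuda.device_count() >= 2:
    gen = SplitGenerator()
    noise = torch.randn(8, 100, 1, 1)
    fake = gen(noise)  # 8 x 3 x 16 x 16 images in this toy configuration
    print(fake.shape)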