17 November. 17.30 - 18.10 | Garage

Semantic segmentation is the classification of every pixel in an image or video. The segmentation partitions a digital image into multiple objects to simplify or change the representation of the image into something more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications, ranging from perception in autonomous driving scenarios to cancer-cell segmentation for medical diagnosis. Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data, and is further compounded by advances in cloud technologies that expand the storage and compute available for such applications. Semantically segmented datasets are a key requirement for improving the accuracy of the inference engines built upon them, and improving the accuracy and efficiency of these systems directly affects the business value for organizations developing such functionality as part of their AI strategy.

This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance and accuracy. Scientists and engineers leverage domain-specific features and tools that support the entire workflow: labeling the ground truth, handling data from a wide variety of sources and formats, developing models, and finally deploying those models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized so that engineers can develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.

These techniques are illustrated using MATLAB and its application-specific toolboxes and domain-specific functionality, accelerating the development of advanced driver-assistance systems (ADAS).
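The core of the workflow discussed in the session can be sketched in MATLAB as follows. This is an illustrative outline only, assuming the Deep Learning Toolbox, Computer Vision Toolbox, and GPU Coder are installed and that a local copy of CamVid sits under a hypothetical `dataDir`; the class list and label mapping are simplified.

```matlab
% Illustrative sketch: train FCN-8s on CamVid-style data and evaluate it.
% Assumes Deep Learning Toolbox + Computer Vision Toolbox; dataDir is a
% hypothetical location for images and pixel labels.
dataDir  = fullfile(tempdir, 'CamVid');
imds     = imageDatastore(fullfile(dataDir, 'images'));
classes  = ["Sky" "Building" "Pole" "Road" "Pavement" "Tree" ...
            "SignSymbol" "Fence" "Car" "Pedestrian" "Bicyclist"];
labelIDs = 1:numel(classes);   % simplified label-ID mapping
pxds     = pixelLabelDatastore(fullfile(dataDir, 'labels'), classes, labelIDs);

% FCN-8s layer graph initialized with VGG-16 weights.
imageSize = [360 480 3];
lgraph    = fcnLayers(imageSize, numel(classes));

opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 30, ...
    'MiniBatchSize', 4, ...
    'ExecutionEnvironment', 'gpu');

net = trainNetwork(combine(imds, pxds), lgraph, opts);

% Run inference and measure accuracy on the labeled data.
pxdsResults = semanticseg(imds, net, 'WriteLocation', tempdir);
metrics     = evaluateSemanticSegmentation(pxdsResults, pxds);

% Deployment sketch: GPU Coder can generate CUDA code from an entry-point
% function that loads the trained network and calls semanticseg, e.g.:
%   cfg = coder.gpuConfig('mex');
%   codegen -config cfg segmentImage -args {ones(360,480,3,'uint8')}
% (segmentImage is a hypothetical entry-point function name.)
```

In practice the training data would be split into training, validation, and test sets, and class weighting would be applied to counter the class imbalance typical of road-scene data; those steps are omitted here for brevity.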
The example uses the CamVid dataset [3] to create and train a fully convolutional neural network, FCN-8s [4], initialized with VGG-16 [5] weights. The model is then deployed at scale to execute optimally on NVIDIA® GPUs by leveraging GPU Coder™. The same approach can be used for other fully convolutional networks such as SegNet [6] or U-Net [7]. The trained models can be operationalized to scale against large datasets by leveraging Spark-based systems on the cloud while measuring performance and accuracy. The end outcome is faster semantic labeling of large datasets and better-performing labeling and inference pipelines.

The architecture of a typical system is discussed both from the perspective of the domain and subject-matter experts who develop the convolutional neural network models, and from that of the IT/OT persona responsible for deploying the models, ETL pipelines, storage, and other underlying infrastructure. DevOps/MLOps maturity in such applications is discussed, along with a high-level business viewpoint on the need for, and value of, well-architected systems and practices. Finally, the need for governance and lifecycle management of the models and underlying data for such applications is also discussed.

The performance benchmarks presented for these end-to-end workflows underline how engineers can scale their semantic segmentation workloads against large datasets of image and video data.

References:
[1] Shapiro, L. G., and G. C. Stockman. Computer Vision. New Jersey: Prentice-Hall, 2001, pp. 279–325. ISBN 0-13-030796-3.
[2] Barghout, L., and L. W. Lee. "Perceptual Information Processing System." Paravue Inc. U.S. Patent Application 10/618,543, filed July 11, 2003.
[3] Brostow, G. J., J. Fauqueur, and R. Cipolla. "Semantic Object Classes in Video: A High-Definition Ground Truth Database." Pattern Recognition Letters, Vol. 30, Issue 2, 2009, pp. 88–97.
[4] Long, J., E. Shelhamer, and T. Darrell. "Fully Convolutional Networks for Semantic Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[5] Simonyan, K., and A. Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition." International Conference on Learning Representations (ICLR), 2015.
[6] Badrinarayanan, V., A. Kendall, and R. Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 12, 2017, pp. 2481–2495.
[7] Ronneberger, O., P. Fischer, and T. Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.