Abstract:
Recent advances in deep learning have produced remarkable results in computer vision, enabling machines to perform tasks that once required human vision and cognition. Classification, object detection, and semantic segmentation have all seen substantial architectural progress in the last few years, and semantic segmentation of both still images and video has advanced markedly. In autonomous driving, however, efficient semantic segmentation of video remains a complex task, owing to the need for high accuracy, the computational cost of convolutional neural networks (CNNs), and strict latency requirements. To address these issues, this work develops a specialized machine-learning framework that applies two deep learning architectures, SegNet and FlowNet 2.0, to the CamVid dataset, enabling precise pixel-wise semantic segmentation of video while maintaining low latency. The approach is well suited to practical applications, leveraging the complementary strengths of the SegNet and FlowNet architectures. A key component is a decision network that evaluates each frame and, based on a computed confidence score, routes it either to the segmentation network or to the optical-flow network; adaptive key-frame scheduling further refines this decision process and improves efficiency. In performance testing, the ResNet50 SegNet model achieved a mean Intersection over Union (IoU) of 54.27% at an average rate of 19.57 frames per second (fps). Integrating FlowNet 2.0, optimized for GPU execution, raised throughput to 30.19 fps at a mean IoU of 47.65%. This improvement was supported by a 47.65% GPU utilization rate, yielding a notable speed-up in video semantic segmentation without compromising quality.
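The routing scheme described above (run the full segmentation CNN on key frames, propagate labels via optical flow on the others, and fall back to full segmentation when confidence drops) can be sketched as follows. This is a minimal illustration only: `segnet_forward` and `warp_labels` are placeholder stand-ins for the SegNet and FlowNet 2.0 models, and the confidence score is synthetic; none of these function names or thresholds come from the paper.

```python
import numpy as np

def segnet_forward(frame, n_classes=12):
    """Stand-in for the SegNet forward pass (hypothetical): returns
    per-pixel class probabilities. Real use would call a trained CNN."""
    h, w = frame.shape[:2]
    scores = np.full((h, w, n_classes), 1.0 / n_classes)
    scores[..., 0] += 0.5  # pretend one class dominates
    scores /= scores.sum(-1, keepdims=True)
    return scores

def warp_labels(labels, flow):
    """Propagate the previous label map along a dense optical-flow field
    (what FlowNet 2.0 would estimate), using nearest-neighbour sampling."""
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(np.rint(ys - flow[..., 1]).astype(int), 0, h - 1)
    sx = np.clip(np.rint(xs - flow[..., 0]).astype(int), 0, w - 1)
    return labels[sy, sx]

def process_video(frames, flows, conf_threshold=0.2):
    """Decision step with adaptive key-frame scheduling: re-run full
    segmentation when the confidence score falls below `conf_threshold`,
    otherwise take the cheap flow-warp path."""
    out, labels, conf = [], None, 0.0
    for i, frame in enumerate(frames):
        if labels is None or conf < conf_threshold:
            scores = segnet_forward(frame)          # key frame: full CNN pass
            labels = scores.argmax(-1)
            conf = scores.max(-1).mean()            # stand-in decision score
        else:
            labels = warp_labels(labels, flows[i])  # non-key frame: warp labels
            conf *= 0.9                             # confidence decays over time
        out.append(labels)
    return out
```

The decay factor mimics the intuition that propagated labels degrade with distance from the last key frame, which is what forces a new key frame to be scheduled adaptively rather than at a fixed interval.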
Page(s):
209-225
DOI:
DOI not available
Published:
International Journal of Communication Networks and Information Security, Volume 15, Issue 3, 2023