ISBN: 9798350368741 (electronic)
ISBN: 9798350368758 (print)
YOLO-based models are widely used for personal protective equipment (PPE) compliance detection because of their strong detection performance and efficiency. However, most YOLO models perform poorly on detection tasks in complex industrial scenarios such as remote surveillance and extremely small targets. In addition, effective model lightweighting and knowledge transfer approaches for industrial deployment are lacking. To this end, this paper proposes a Multi-scale and Knowledge-Distilling YOLO (MKD-YOLO) based on YOLOv8n for efficient PPE compliance detection. Specifically, in the backbone stage, we design an Efficient Multi-Scale Enhanced Convolution (C2f-EMSEC) module and a Large Spatial Pyramid Pooling-Fast (LSPPF) module for multi-scale and global-contextual feature learning while reducing model complexity. Then, in the neck stage, a refined Bidirectional feature Pyramid Network (BPNet) is designed to capture fine-grained details for detecting extremely small objects. Moreover, we apply channel-wise knowledge distillation to support model lightweighting and domain-specific knowledge transfer. Experiments on our proposed dataset and on public datasets show that MKD-YOLO achieves new state-of-the-art (SOTA) detection performance and efficiency for practical PPE compliance detection tasks. Code and the dataset are available at https://***/z1Zjt/MKD-YOLO.
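The abstract names channel-wise knowledge distillation but does not give its loss. One widely used formulation turns each channel's activation map into a probability distribution over spatial positions with a temperature softmax, then penalizes the KL divergence from the teacher's distribution to the student's, averaged over channels and scaled by the squared temperature. The plain-Python sketch below illustrates that formulation only; it assumes teacher and student feature maps with the same shape (C x H x W) and a hypothetical temperature of 4.0, and it is not MKD-YOLO's code.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # one channel, H x W
EPS = 1e-12  # guards log(0) if a probability underflows

def spatial_softmax(channel: FeatureMap, tau: float) -> List[float]:
    """Softmax over all H*W positions of one channel at temperature tau."""
    flat = [v / tau for row in channel for v in row]
    m = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in flat]
    total = sum(exps)
    return [e / total for e in exps]

def channel_wise_distill_loss(student: List[FeatureMap],
                              teacher: List[FeatureMap],
                              tau: float = 4.0) -> float:
    """tau**2 times the mean over channels of KL(teacher || student)."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of channels")
    total_kl = 0.0
    for s_ch, t_ch in zip(student, teacher):
        p_s = spatial_softmax(s_ch, tau)
        p_t = spatial_softmax(t_ch, tau)
        # KL divergence between the two spatial distributions of this channel
        total_kl += sum(pt * (math.log(pt + EPS) - math.log(ps + EPS))
                        for pt, ps in zip(p_t, p_s))
    return tau ** 2 * total_kl / len(teacher)
```

A student network normally has fewer channels than its teacher, so in practice a 1x1 convolution is often used to match channel counts before a loss like this is applied; that adapter is omitted here.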
A convolutional neural network (CNN) is a class of deep neural networks that has achieved major breakthroughs in image recognition. CNNs are commonly used for detection and classification in visual applications, so they are frequently em...
ISBN: 9781728173221 (print)
With the development of stereoscopic imaging technology, stereoscopic image quality assessment (SIQA) has become increasingly important, and designing a method consistent with human visual perception is challenging because of the complex relationship between the two binocular views. In this article, we first build a convolutional neural network (CNN) based on the visual pathway of the human visual system (HVS), which simulates different parts of the pathway such as the optic chiasm, the lateral geniculate nucleus (LGN), and the visual cortex. Second, the two pathways of our method simulate the ‘what’ and ‘where’ visual pathways, respectively, and are endowed with different feature extraction capabilities. Finally, we apply 3D convolution in a different way, employing it to fuse information from the left and right views rather than only to extract temporal features from video. Experimental results show that our method is more consistent with subjective scores and generalizes well.
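One way to read "employing 3D convolution to fuse the left and right views" is to stack the two views along the depth axis and use a kernel whose depth equals the number of views, so each output value mixes a spatial neighborhood from both eyes. The sketch below assumes single-channel maps, stride 1, no padding, and a kernel depth of 2; the function name is hypothetical and this is not the authors' network.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, H x W

def fuse_views_conv3d(left: Grid, right: Grid, kernel: List[Grid]) -> Grid:
    """Valid (no padding), stride-1 3D cross-correlation over the depth-2
    volume [left, right] with a depth-2 kernel. The depth axis indexes the
    view, so every output value mixes a kh x kw neighborhood from BOTH views."""
    if len(left) != len(right) or len(left[0]) != len(right[0]):
        raise ValueError("left and right views must have the same shape")
    volume = [left, right]
    if len(kernel) != len(volume):
        raise ValueError("kernel depth must equal the number of views (2)")
    h, w = len(left), len(left[0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    out: Grid = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            acc = 0.0
            for d, view in enumerate(volume):  # sum over the view (depth) axis
                for a in range(kh):
                    for b in range(kw):
                        acc += kernel[d][a][b] * view[i + a][j + b]
            row.append(acc)
        out.append(row)
    return out

# Usage: a 2x1x1 kernel with weights +1 / -1 yields a per-pixel left-right difference map.
left = [[1.0, 2.0], [3.0, 4.0]]
right = [[3.0, 2.0], [1.0, 0.0]]
print(fuse_views_conv3d(left, right, [[[1.0]], [[-1.0]]]))  # [[-2.0, 0.0], [2.0, 4.0]]
```

Because the kernel depth equals the number of views and the depth axis is not padded, the view axis collapses in a single step; a real network would learn many such kernels, one per output channel.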
ISBN: 9781728173221 (print)
Learning-based compression systems have shown great potential for multi-task inference from their latent-space representation of the input image. In such systems, the decoder is expected to perform various analyses of the input image, such as object detection or segmentation, besides decoding the image. At the same time, privacy concerns around visual analytics have grown in response to the increasing ability of such systems to reveal private information. In this paper, we propose a method to make latent-space inference more privacy-friendly using mutual information-based criteria. In particular, we show how organizing and compressing the latent representation of the image according to task-specific mutual information allows the model to maintain high analytics accuracy while becoming less able to reconstruct the input image and thereby reveal private information.
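To make "organizing the latent representation according to task-specific mutual information" concrete, the sketch below computes a plug-in estimate of the mutual information I(X;Y) between a quantized latent channel and a task label, then ranks channels so that the most task-informative ones can be kept and the rest compressed harder. The discrete channel values, per-sample labels, and function names are all hypothetical; the paper's actual criterion is not given in the abstract.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need equally long, non-empty sample sequences")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def rank_channels_by_task_mi(channels: List[Sequence[int]],
                             labels: Sequence[int]) -> List[int]:
    """Indices of latent channels, most to least informative about the task."""
    scores = [mutual_information(ch, labels) for ch in channels]
    return sorted(range(len(channels)), key=lambda k: scores[k], reverse=True)

# Usage
labels = [0, 0, 1, 1]
channels = [[0, 1, 0, 1],  # independent of the label: I = 0 bits
            [0, 0, 1, 1]]  # copies the label: I = 1 bit
print(rank_channels_by_task_mi(channels, labels))  # [1, 0]
```

Plug-in estimates are biased upward on small samples, so a practical system would use many samples or a more careful estimator.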
ISBN: 9781728173221 (print)
In recent years, with the popularization of 3D technology, stereoscopic image quality assessment (SIQA) has attracted extensive attention. In this paper, we propose a two-stage binocular fusion network for SIQA that takes binocular fusion, binocular rivalry, and binocular suppression into account to imitate the complex binocular visual mechanism in the human brain. In addition, saliency generating layers (SGLs) are applied in the network to extract spatial saliency features of the left view, the right view, and the fusion view. The SGLs apply multi-scale dilated convolution to emphasize essential spatial information in the input features. Experimental results on four public stereoscopic image databases demonstrate that the proposed method outperforms state-of-the-art SIQA methods on both symmetrically and asymmetrically distorted stereoscopic images.
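Multi-scale dilated convolution applies a kernel at several dilation rates so that the same number of weights sees neighborhoods of different sizes. The sketch below shows the mechanism on one single-channel map with zero padding and "same" output size, averaging the responses at dilation rates 1, 2, and 3. For brevity it shares a single kernel across rates, whereas a real SGL would likely learn one per branch; the rates, the averaging, and the function names are assumptions, not the paper's design.

```python
from typing import List, Sequence

Grid = List[List[float]]

def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """'Same'-size 2D cross-correlation with zero padding and the given
    dilation. The kernel must have odd height and width."""
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2  # kernel center
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r = i + (a - ch) * dilation
                    c = j + (b - cw) * dilation
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the map
                        acc += kernel[a][b] * x[r][c]
            out[i][j] = acc
    return out

def multi_scale_dilated(x: Grid, kernel: Grid,
                        dilations: Sequence[int] = (1, 2, 3)) -> Grid:
    """Average the responses of one kernel applied at several dilation rates."""
    maps = [dilated_conv2d(x, kernel, d) for d in dilations]
    h, w = len(x), len(x[0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(w)]
            for i in range(h)]
```

A 3x3 kernel at dilation d covers a (2d+1) x (2d+1) window, so larger rates enlarge the receptive field without adding weights.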
We demonstrate a new capture system that generates virtual views for a virtual camera placed between the players on a sports field. Our depth estimation and segmentation pipeline can reduce 2K-resolution views from 16 cameras to patches in a single 4K-resolution texture atlas. We have created a real-time, WebGL 2-based playback application that renders an arbitrary view from the 4K atlas and lets a user change the viewpoint in real time. Additionally, to help interpret the scene, a user can remove objects such as a player or the ball. At the conference, we will demonstrate both the automatic multi-camera conversion pipeline and real-time rendering with object removal on a smartphone.
ISBN: 9781728173221 (print)
In this paper, we study techniques for accurate detection, localization, and tracking of multiple people in an indoor scene covered by multiple top-view fisheye cameras. This setting is rarely studied within multi-camera object tracking. Experimental results on test videos show performance good enough for practical use. We also propose methods to account for occlusion by scene objects at different stages of the algorithm, which lead to improved results.
Today, a wave of digitization, networking, and informatization is sweeping the world, making the visual image an important medium for communicating and transmitting global culture. The purpose of this p...
ISBN: 9781728173221 (print)
The ever-higher quality and wide diffusion of fake images have spawned a quest for reliable forensic tools. Many GAN image detectors have recently been proposed. In real-world scenarios, however, most of them show limited robustness and generalization ability. Moreover, they often rely on side information that is not available at test time; that is, they are not universal. We investigate these problems and propose a new GAN image detector based on a limited sub-sampling architecture and a suitable contrastive learning paradigm. Experiments carried out under challenging conditions show the proposed method to be a first step towards universal GAN image detection, while also ensuring good robustness to common image impairments and good generalization to unseen architectures.
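The abstract does not specify the contrastive objective. As an illustration of the general paradigm only, the sketch below implements an InfoNCE-style loss: an anchor embedding should be more cosine-similar to its positive (for example, another image of the same class, an assumed pairing) than to a set of negatives. The temperature of 0.1 and all names are hypothetical, not the detector's actual training setup.

```python
import math
from typing import Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]], temperature: float = 0.1) -> float:
    """-log( exp(s_pos / t) / sum over {pos} + negatives of exp(s / t) )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

The loss is small when the positive is the most similar candidate and grows as negatives become more similar to the anchor than the positive is, which pushes same-class embeddings together and different-class embeddings apart.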
ISBN: 9781728173221 (print)
In the age of digital content creation and distribution, steganography, that is, the hiding of secret data within other data, is needed in many applications, such as secret communication between two parties and piracy protection. In image steganography, secret data is generally embedded in the image through an additional step after a mandatory image enhancement process. In this paper, we propose embedding the data during the image enhancement process itself, which saves the additional work of separately encoding the data inside the cover image. We use the alpha-trimmed mean filter for image enhancement and the XOR of the 6 MSBs to embed two bits of the bitstream in the 2 LSBs; extraction is the reverse process. Our quantitative and qualitative results are better than those of a methodology presented in a very recent paper.
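The abstract does not spell out how the XOR of the 6 MSBs combines with the two message bits. The sketch below assumes one simple, reversible reading. After the alpha-trimmed mean (3x3 window, trimming parameter d = 2, both assumed) produces an enhanced 8-bit pixel, the XOR (parity) of its six most significant bits serves as a key bit, and that key bit is XORed into each of the two message bits written to the 2 LSBs. Because the MSBs are never changed, the extractor can recompute the key bit and undo the XOR. All function names are hypothetical, and this is not the authors' implementation.

```python
from typing import List, Sequence

Image = List[List[int]]  # 8-bit grayscale, values 0..255

def alpha_trimmed_mean(values: Sequence[int], d: int) -> int:
    """Drop the d//2 smallest and d//2 largest values, then average the rest."""
    s = sorted(values)
    trim = d // 2
    if len(s) <= 2 * trim:
        raise ValueError("window too small for the requested trimming")
    kept = s[trim:len(s) - trim]
    return min(255, max(0, round(sum(kept) / len(kept))))

def msb_parity(pixel: int) -> int:
    """XOR of the 6 most significant bits (bits 7..2) of an 8-bit value."""
    bits, parity = pixel >> 2, 0
    while bits:
        parity ^= bits & 1
        bits >>= 1
    return parity

def enhance_and_embed(img: Image, message: Sequence[int], d: int = 2) -> Image:
    """Alpha-trimmed 3x3 filtering that hides 2 message bits per pixel
    (row-major order). Border pixels use only the in-image part of the window."""
    h, w = len(img), len(img[0])
    if len(message) % 2 or len(message) > 2 * h * w:
        raise ValueError("need an even number of bits, at most 2 per pixel")
    out = [[0] * w for _ in range(h)]
    pos = 0
    for i in range(h):
        for j in range(w):
            window = [img[r][c] for r in range(max(0, i - 1), min(h, i + 2))
                      for c in range(max(0, j - 1), min(w, j + 2))]
            v = alpha_trimmed_mean(window, d)  # enhancement step
            if pos < len(message):             # embedding in the same pass
                k = msb_parity(v)              # key bit from the 6 MSBs
                b1, b0 = message[pos], message[pos + 1]
                v = (v & 0b11111100) | ((b1 ^ k) << 1) | (b0 ^ k)
                pos += 2
            out[i][j] = v
    return out

def extract(stego: Image, n_bits: int) -> List[int]:
    """Reverse process: recompute the key bit from the MSBs, XOR it out of the LSBs."""
    bits: List[int] = []
    for row in stego:
        for p in row:
            if len(bits) >= n_bits:
                return bits
            k = msb_parity(p)
            bits += [((p >> 1) & 1) ^ k, (p & 1) ^ k]
    return bits
```

Embedding changes only the two LSBs of the filtered value, so each pixel moves by at most 3 gray levels from its enhanced value.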