Identifying and locating objects in images and videos, including elements like traffic signs, vehicles, buildings, and people, constitutes a fundamental and demanding task in computer vision, known as object detection...
ISBN: 9798350368741 (digital)
ISBN: 9798350368758 (print)
YOLO-based models are widely used for personal protective equipment (PPE) compliance detection because of their strong detection performance and efficiency. However, most YOLO models struggle in complex industrial scenarios such as remote surveillance and extremely small targets. In addition, effective model lightweighting and knowledge transfer approaches for industrial deployment are lacking. To this end, this paper proposes a Multi-scale and Knowledge-Distilling YOLO (MKD-YOLO) based on YOLOv8n for efficient PPE compliance detection. Specifically, in the backbone stage, we design an Efficient Multi-Scale Enhanced Convolution (C2f-EMSEC) module and a Large Spatial Pyramid Pooling-Fast (LSPPF) module for multi-scale and global-contextual feature learning while reducing model complexity. Then, in the neck stage, a refined Bidirectional Feature Pyramid Network (BPNet) is designed to capture fine-grained details for extremely small object detection. Moreover, we apply channel-wise knowledge distillation to facilitate model lightweighting and domain-specific knowledge transfer. Experiments on our proposed dataset and on public datasets show that MKD-YOLO achieves new state-of-the-art (SOTA) detection performance and efficiency for practical PPE compliance detection tasks. Code and the dataset are available at https://***/z1Zjt/MKD-YOLO.
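The abstract does not spell out the distillation loss, so the following is only a minimal sketch of one common formulation of channel-wise knowledge distillation, not the MKD-YOLO implementation. It assumes each channel's activations are turned into a distribution over spatial positions with a temperature-scaled softmax, and that the student is penalized by the KL divergence from the teacher, averaged over channels and scaled by the squared temperature. The function names, the flat-list feature layout, and the default temperature are all hypothetical.

```python
import math

def log_spatial_softmax(channel, temperature):
    """Log-probabilities over the spatial positions of one flattened channel."""
    scaled = [v / temperature for v in channel]
    peak = max(scaled)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - log_norm for v in scaled]

def channel_wise_kd_loss(student, teacher, temperature=4.0):
    """KL(teacher || student) per channel, averaged over channels, times T^2.

    student, teacher: lists of channels; each channel is a flat list of
    H*W activations (hypothetical layout chosen for this sketch).
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher need the same number of channels")
    total = 0.0
    for s_ch, t_ch in zip(student, teacher):
        log_t = log_spatial_softmax(t_ch, temperature)
        log_s = log_spatial_softmax(s_ch, temperature)
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return (temperature ** 2) * total / len(student)

# Toy feature maps: 2 channels, 4 spatial positions each.
teacher_feats = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 5.0, 0.0]]
student_feats = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
loss_value = channel_wise_kd_loss(student_feats, teacher_feats)
```

Because the softmax is taken per channel, the loss in this sketch compares where each channel is active rather than how large its activations are. Adding a constant to every activation in a student channel leaves the loss unchanged.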
ISBN: 9789819637546 (print)
Background: With the rapid development of information technology and the digitization of medical devices, many diseases now require medical imaging equipment for diagnosis. At present, medical imaging equipment such as CT and nuclear magnetic resonance can provide two-dimensional planar images of diseases. Doctors urgently need to determine accurately the spatial location, size, and geometry of a lesion and its spatial relationship with the surrounding tissue. It is therefore very important to use computer technology to segment 3D MRI images, determine the location of lesions, and then perform 3D reconstruction. Method: At present, automatic recognition and marking of brain images are displayed in two dimensions, so 3D visualization technology is needed for reconstruction. In addition, virtual and real content can be combined, with additional information superimposed on the brain image for integrated display. The research focus of this paper includes two main parts: disease segmentation and 3D reconstruction visualization. First, a disease segmentation method based on 3D MRI brain image files was designed, and then the feature extraction and 3D reconstruction functions were designed, forming a complete process of disease region segmentation and three-dimensional reconstruction. Results: This study is based on a three-dimensional MRI brain image segmentation algorithm. The algorithm is technically advanced and highly accurate, and it can effectively identify the location of the disease. This study then used the Unity tool to implement a three-dimensional reconstruction and visual display program for brain image disease segmentation. As a result, the doctor can quickly and intuitively grasp the spatial information inside the brain and the information of the lesion.
ISBN: 081941638X; 9780819416384 (print)
One of the keys to obtaining acceptable-quality imagery and video encoded at very low bit rates is to transmit only the information that is critical to human perception. To achieve this goal, one must not only understand the human visual system but also be able to use that understanding in codec design. This paper presents an overview of the properties associated with color science and human visual perception, and how they could affect very low bit-rate image coding.
ISBN: 081941638X; 9780819416384 (print)
Image coding schemes are described from the point of view of their associated image models. Among the work related to these paradigms at the University of Tokyo, 3-D model-based coding and 2-D deformable-triangle-based motion compensation are outlined.
ISBN: 081941638X (print)
The guiding principle of this study is to find an optimum way to simplify the contours produced by a second-generation coding scheme based on morphological segmentation. For this purpose, existing methods for contour simplification are evaluated first. Drawing on properties of human visual perception, a new nonlinear filter based on a majority operation is then designed to simplify the contours and reach an optimum compromise between the cost of contour coding and visual quality. Applications to region-based still image coding and video coding are demonstrated. Experimental results show an average 20% reduction in the bits needed for contour coding while maintaining good visual quality.
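The abstract does not give the exact form of the majority-based filter, so the sketch below illustrates only the general idea with a plain majority (mode) filter on a segmentation label map. Each pixel takes the most frequent label in its square neighbourhood, which removes isolated labels and smooths jagged region boundaries. The function name, the window shape, and the tie-breaking rule are assumptions made for this illustration.

```python
from collections import Counter

def majority_filter(labels, radius=1):
    """Relabel each pixel with the most frequent label in its (2r+1)x(2r+1) window.

    labels: 2-D list of region labels. Windows are clipped at the image border.
    """
    height, width = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for y in range(height):
        for x in range(width):
            window = [
                labels[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            counts = Counter(window)
            best = max(counts.values())
            # Keep the current label on ties so stable regions are not disturbed.
            if counts[labels[y][x]] < best:
                out[y][x] = counts.most_common(1)[0][0]
    return out

# A single stray label inside a uniform region is removed.
noisy = [[0] * 5 for _ in range(5)]
noisy[2][2] = 1
cleaned = majority_filter(noisy)

# A straight vertical boundary between two regions is left untouched.
straight = [[0, 0, 1, 1, 1] for _ in range(4)]
unchanged = majority_filter(straight)
```

Smoother boundaries generally need fewer bits to encode as contours, which is the trade-off between coding cost and visual quality that the abstract describes.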
ISBN: 9781665475921 (print)
This demo paper presents a real-time learned image codec on an FPGA. Running on a Xilinx VCU128, the proposed codec reaches 720p at 30 fps, which is 7.76x faster than prior work.
The paper describes an algorithm for the assessment of image fidelity. The algorithm includes an image-processing model of the human visual system for luminance still imagery. The major components of the algorithm are...
ISBN: 9781728180687 (print)
Perceptual organization is the process of assigning each part of a scene to a group of associated features that belong to the same organization. In the twentieth century, Gestalt psychologists formalized how image features tend to be grouped by giving a set of organizing principles. In this paper, we propose an approach for detecting perceptual groups in an image. We are mainly interested in features grouped by the Gestalt law of proximity. We design an object-based model within a stochastic framework using a marked point process (MPP), and we use a Bayesian learning method to extract perceptual groups in a scene. Tests on synthetic images show that the proposed model detects perceptual groups efficiently in noisy images.
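To make the proximity law concrete, here is a deliberately simple deterministic baseline. It is not the marked point process or the Bayesian method of the paper. It links any two features whose Euclidean distance is at most a threshold and reports the connected components as groups. The function name, the point representation, and the threshold are hypothetical.

```python
import math

def group_by_proximity(points, max_distance):
    """Group point indices connected by chains of neighbours at most max_distance apart."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_distance:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Two tight clusters and one isolated feature.
features = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (50, 50)]
groups = group_by_proximity(features, max_distance=1.5)
```

A fixed threshold like this is sensitive to noise and to the scale of the scene, which is one reason a stochastic model may be preferred for noisy images.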
ISBN: 9781728185514 (print)
This paper demonstrates a model-based reinforcement learning framework for training a self-flying drone. We implement Dreamer, proposed in prior work, as an environment model that responds to the action taken by the drone by predicting the next video frame as a new state signal. Dreamer is a conditional video sequence generator. This model-based environment avoids time-consuming interactions between the agent and the real environment, which substantially speeds up training. This demonstration showcases, for the first time, the use of Dreamer to train an agent that can finish the racing task in the AirSim simulator.