Memes have evolved into a powerful tool for social interaction on platforms like Twitter, Instagram, Facebook, and Pinterest, where they communicate complex emotions through a blend of images, text, and emojis. In this re...
This Volume 5150, Part 2 of 2 parts of the conference proceedings, contains 73 papers. Topics discussed include coding standards, image and video security and watermarking, the MPEG video coding standard, error-resilient coding, image and video segmentation, visualization, systems and architectures, three-dimensional image processing, object-based coding, image compression beyond wavelets, semantic characterization of multimedia documents, image-based rendering and related technologies, image and video enhancement, and image and video coding.
Identifying and locating objects in images and videos, including elements like traffic signs, vehicles, buildings, and people, constitutes a fundamental and demanding task in computer vision, known as object detection...
ISBN: (digital) 9798350368741
ISBN: (print) 9798350368758
YOLO-based models are widely used for personal protective equipment (PPE) compliance detection because of their strong detection performance and efficiency. However, most YOLO models perform poorly in complex industrial scenarios such as remote surveillance and extremely small targets. In addition, effective model lightweighting and knowledge transfer approaches for industrial deployment are lacking. To this end, this paper proposes a Multi-scale and Knowledge-Distilling YOLO (MKD-YOLO) based on YOLOv8n for efficient PPE compliance detection. Specifically, in the backbone stage, we design an Efficient Multi-Scale Enhanced Convolution (C2f-EMSEC) module and a Large Spatial Pyramid Pooling-Fast (LSPPF) module for multi-scale and global-contextual feature learning while reducing model complexity. Then, in the neck stage, a refined Bidirectional feature Pyramid Network (BPNet) is designed to capture fine-grained details for extremely small object detection. Moreover, we apply channel-wise knowledge distillation to facilitate model lightweighting and domain-specific knowledge transfer. Experiments on our proposed dataset and on public datasets show that MKD-YOLO achieves new state-of-the-art (SOTA) detection performance and efficiency for practical PPE compliance detection tasks. Code and the dataset are available at https://***/z1Zjt/MKD-YOLO.
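The abstract above mentions channel-wise knowledge distillation without giving the loss. As a rough illustration only, the sketch below shows one common formulation: each channel's activation map is turned into a spatial probability distribution with a temperature softmax, and the student is penalized by the KL divergence from the teacher. The function names, the temperature parameter, and the T² scaling are assumptions made for this example, not details taken from MKD-YOLO.

```python
import math


def log_spatial_softmax(channel, temperature=1.0):
    """Log-probabilities over the spatial positions of one flattened channel."""
    scaled = [v / temperature for v in channel]
    peak = max(scaled)  # subtract the max for numerical stability
    log_total = math.log(sum(math.exp(v - peak) for v in scaled))
    return [v - peak - log_total for v in scaled]


def channel_wise_kd_loss(teacher, student, temperature=1.0):
    """Mean over channels of KL(teacher || student), scaled by temperature**2.

    teacher, student: lists of channels, each a flat list of activations.
    This is an assumed, generic formulation, not the paper's exact loss.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student need the same number of channels")
    total = 0.0
    for t_channel, s_channel in zip(teacher, student):
        log_p = log_spatial_softmax(t_channel, temperature)
        log_q = log_spatial_softmax(s_channel, temperature)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * total / len(teacher)
```

Because the softmax is taken per channel, adding a constant offset to a student channel leaves the loss unchanged; only the spatial pattern of each channel is compared.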
This Volume 5150, Part 1 of 3 parts of the conference proceedings, contains 73 papers. Topics discussed include object-based video, extraction tools, evaluation metrics and applications, immersive three-dimensional video communication, stereo imaging, scalable coding, rate control, object tracking, image and video segmentation, image retrieval, image and video indexing, internet video and streaming, video quality assessment, and motion-compensated wavelet coding.
ISBN: (print) 9789819637546
Background: With the rapid development of information technology and the digitization of medical devices, the diagnosis of many diseases relies on medical imaging equipment. At present, diagnostic imaging equipment such as CT and nuclear magnetic resonance provides two-dimensional planar images of diseases. Doctors urgently need to determine accurately the spatial location, size, and geometry of a lesion and its spatial relationship with the surrounding tissue. Therefore, it is very important to use computer technology to segment 3D MRI images, determine the location of lesions, and then perform 3D reconstruction. Method: At present, automatic recognition and marking of brain images are displayed in two dimensions, so 3D visualization technology is needed for reconstruction. In addition, virtual and real content can be combined, with additional information superimposed on the brain image for integrated display. The research focus of this paper includes two main parts: disease segmentation and 3D reconstruction visualization. First, a disease segmentation method based on 3D MRI brain image files was designed, and then the feature extraction and 3D reconstruction functions were designed, forming a complete process of disease region segmentation and three-dimensional reconstruction. Results: This study is based on a three-dimensional MRI brain image segmentation algorithm. The algorithm is technically advanced, highly accurate, and can effectively identify the location of the disease. This study then used the Unity tool to implement a three-dimensional reconstruction and visual display program for brain image disease segmentation. As a result, the doctor can quickly and intuitively grasp the spatial information inside the brain and the information of the lesion
ISBN: (print) 9781665475921
This demo paper presents a real-time learned image codec on FPGA. Using a Xilinx VCU128, the proposed system achieves 720P@30fps coding, which is 7.76x faster than prior work.
ISBN: (print) 9781728180687
Perceptual organization is the process of assigning each part of a scene to a group of associated features that belong to the same organization. In the twentieth century, Gestalt psychologists formalized how image features tend to be grouped by giving a set of organizing principles. In this paper, we propose an approach for the detection of perceptual groups in an image. We are mainly interested in features grouped by the Gestalt law of proximity. We design an object-based model within a stochastic framework using a marked point process (MPP). We use a Bayesian learning method to extract perceptual groups in a scene. Tests on synthetic images show that the proposed model detects perceptual groups efficiently in noisy images.
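To make the proximity law concrete, the sketch below groups 2D points with a plain distance threshold (single-linkage grouping via union-find). This is a deliberately simpler stand-in for illustration and is not the marked point process or the Bayesian learning method described in the abstract; the function name and the `max_gap` parameter are hypothetical.

```python
import math


def group_by_proximity(points, max_gap):
    """Single-linkage grouping: points within max_gap of each other share a group."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_gap:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two groups

    groups = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    return sorted(groups.values())


points = [(0, 0), (1, 0), (10, 10), (11, 10), (50, 50)]
print(group_by_proximity(points, max_gap=2))
```

A fixed threshold like this is sensitive to noise and to the choice of `max_gap`, which is one reason a stochastic model can be attractive for noisy images.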
ISBN: (print) 9781728185514
This paper demonstrates a model-based reinforcement learning framework for training a self-flying drone. We implement the Dreamer, proposed in prior work, as an environment model that responds to the action taken by the drone by predicting the next video frame as a new state signal. The Dreamer is a conditional video sequence generator. This model-based environment avoids time-consuming interactions between the agent and the real environment, greatly speeding up the training process. This demonstration showcases, for the first time, the application of the Dreamer to train an agent that can finish the racing task in the AirSim simulator.
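The idea of training against a learned environment model can be sketched as an "imagined rollout": the agent queries the model for the next state instead of stepping the real simulator. The toy below uses a scalar position in place of predicted video frames, and every name in it (`imagined_rollout`, `toy_model`, `toy_policy`, `toy_reward`, `GOAL`) is a hypothetical placeholder rather than part of the Dreamer or AirSim.

```python
GOAL = 5  # hypothetical target position for the toy task


def toy_model(state, action):
    """Stand-in for a learned model: predicts the next state from (state, action)."""
    return state + action


def toy_policy(state):
    """Move one step toward the goal, or stay put once there."""
    if state < GOAL:
        return 1
    if state > GOAL:
        return -1
    return 0


def toy_reward(state):
    """Negative distance to the goal, so closer states score higher."""
    return -abs(state - GOAL)


def imagined_rollout(model, policy, reward_fn, start_state, horizon):
    """Roll the policy forward inside the model, never touching a real environment."""
    state = start_state
    trajectory = [state]
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(state)
        state = model(state, action)  # predicted next state
        total_reward += reward_fn(state)
        trajectory.append(state)
    return trajectory, total_reward


trajectory, total_reward = imagined_rollout(toy_model, toy_policy, toy_reward, 0, 7)
print(trajectory, total_reward)
```

In a real setup the model would be learned from recorded experience and the policy would be updated from many such imagined trajectories; both steps are omitted here.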