ISBN (print): 9798331529543; 9798331529550
The structural similarity of point clouds presents challenges in accurately recognizing and segmenting semantic information at the demarcation points of complex scenes or objects. In this study, we propose a multi-scale graph transformer network (MGTN) for 3D point cloud semantic segmentation. First, a multi-scale graph convolution (MSG-Conv) is devised to address the limitations faced by existing methods when extracting local and global features of point cloud data with varying densities simultaneously. Subsequently, we employ a graph-transformer (G-T) module to enhance edge details and spatial position information in the point cloud, thereby improving recognition accuracy for small objects and confusing elements such as columns and beams. Extensive testing on ShapeNet parts and S3DIS datasets was conducted to demonstrate the effectiveness of MGTN. Compared to the baseline network DGCNN, our proposed MGTN achieves substantial performance improvements, as evidenced by notable increases in mIoU of 1.5% and 18.5% on the ShapeNet parts and S3DIS datasets respectively. Additionally, MGTN outperforms the recent CFSA-Net by 2.3% and 3.4% on OA and mIoU respectively.
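The idea of gathering neighbourhood information at several scales can be illustrated with a minimal NumPy sketch. This is not the authors' MSG-Conv: the function names, the choice of k-nearest-neighbour graphs, the scale values and the max-pooling over raw edge offsets are all assumptions made for illustration, and no learned weights are involved.

```python
import numpy as np

def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours of every point (the point itself excluded)."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)   # squared pairwise distances
    np.fill_diagonal(dist2, np.inf)                # never pick a point as its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]

def multi_scale_edge_features(points: np.ndarray, scales=(4, 8, 16)) -> np.ndarray:
    """Hypothetical multi-scale descriptor: one max-pooled edge feature per scale."""
    points = np.asarray(points, dtype=float)
    feats = []
    for k in scales:
        idx = knn_indices(points, k)               # (N, k)
        edges = points[idx] - points[:, None, :]   # (N, k, 3) offsets to neighbours
        feats.append(edges.max(axis=1))            # (N, 3) pooled over the neighbourhood
    return np.concatenate(feats, axis=1)           # (N, 3 * number of scales)
```

Small values of k capture fine local geometry, while larger values cover a wider context; a learned network would typically transform the edge offsets before pooling.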
ISBN (print): 9798331529543; 9798331529550
Because 4D medical images require substantial storage, compressing them efficiently is a crucial topic. Traditional image/video coding methods achieve remarkable results in most compression tasks, but their performance in encoding 4D medical images remains poor because they cannot fully exploit the spatio-temporal correlations in 4D images. Recently, implicit neural representation (INR) based image/video compression methods have made significant progress, with coding performance comparable to traditional methods. However, like traditional methods, they suffer significant performance losses in 4D medical image compression. In this paper, we propose an efficient hybrid representation framework that consists of six learnable feature planes and a tiny MLP decoder. Unlike previous methods, this framework can exploit the spatio-temporal correlations in 4D medical images and therefore captures this information more effectively. We also introduce a novel adaptive plane scaling strategy that allocates the number of parameters in each plane based on the resolution of the image. This design further enhances the reconstruction quality at the same compression ratio. Extensive experiments show that our model achieves better RD performance than traditional and INR-based methods, and that it encodes faster than INR-based methods.
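A hybrid representation of this kind can be sketched as follows. The abstract does not say how the six planes are laid out; the sketch assumes one plane per pair of the four axes (x, y, z, t), which gives exactly six planes, and it assumes bilinear sampling, feature concatenation and a two-layer ReLU MLP. The class name `HybridField`, the plane sizes and all hyperparameters are hypothetical, and the weights are random rather than trained.

```python
import itertools
import numpy as np

# Assumption: one feature plane per unordered pair of the axes (x, y, z, t).
AXIS_PAIRS = list(itertools.combinations(range(4), 2))

def sample_plane(plane: np.ndarray, u: float, v: float) -> np.ndarray:
    """Bilinearly sample a (H, W, C) plane at continuous coordinates in [0, H-1] x [0, W-1]."""
    h, w, _ = plane.shape
    u0, v0 = min(int(u), h - 2), min(int(v), w - 2)
    du, dv = u - u0, v - v0
    top = (1 - dv) * plane[u0, v0] + dv * plane[u0, v0 + 1]
    bottom = (1 - dv) * plane[u0 + 1, v0] + dv * plane[u0 + 1, v0 + 1]
    return (1 - du) * top + du * bottom

class HybridField:
    """Six feature planes followed by a tiny MLP that decodes one intensity value."""

    def __init__(self, shape, channels=4, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # In this sketch each plane simply takes the resolution of its two axes.
        self.planes = {
            (a, b): rng.normal(0.0, 0.1, (shape[a], shape[b], channels))
            for a, b in AXIS_PAIRS
        }
        self.w1 = rng.normal(0.0, 0.1, (len(AXIS_PAIRS) * channels, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, coord) -> float:
        feats = [sample_plane(self.planes[(a, b)], coord[a], coord[b])
                 for a, b in AXIS_PAIRS]
        hidden = np.maximum(np.concatenate(feats) @ self.w1 + self.b1, 0.0)  # ReLU
        return float((hidden @ self.w2 + self.b2)[0])

field = HybridField(shape=(8, 8, 8, 4))
value = field((3.5, 2.0, 7.0, 1.25))   # decoded intensity at one (x, y, z, t) sample
```

In such a design the stored parameters are dominated by the planes, so choosing each plane's size from the image resolution directly controls the compression ratio.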
ISBN (print): 9798331529543; 9798331529550
In streaming media services, video transcoding is a common practice to alleviate bandwidth demands. Unfortunately, traditional methods that apply a uniform rate factor (RF) to all videos often result in significant inefficiencies. Content-adaptive encoding (CAE) techniques address this by dynamically adjusting encoding parameters based on video content characteristics. However, existing CAE methods are often tightly coupled with specific encoding strategies, which makes them inflexible. In this paper, we propose a model that predicts both the RF-quality and RF-bitrate curves, from which a complete bitrate-quality curve can be derived. This approach allows the encoding strategy to be adjusted flexibly without retraining the model. The model leverages codec features, content features, and anchor features to predict the bitrate-quality curve accurately. Additionally, we introduce an anchor suspension method to enhance prediction accuracy. Experiments confirm that the actual quality metric (VMAF) of the compressed video stays within ±1 of the target in 99.14% of cases. By combining our quality improvement strategy with the rate-quality curve prediction model, we conducted online A/B tests and obtained improvements of +0.107% in both video views and video completions and +0.064% in app duration time. Our model has been deployed on the Xiaohongshu App.
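How two per-RF curves yield a bitrate-quality curve and an operating point can be shown with a short sketch. The sample values below are invented for illustration and are not from the paper; the function names are hypothetical, quality is assumed to decrease monotonically as RF increases, and plain linear interpolation stands in for whatever curve model the authors use.

```python
import numpy as np

def bitrate_quality_curve(quality, bitrate):
    """Pair the two per-RF predictions into (bitrate, quality) points sorted by bitrate."""
    quality = np.asarray(quality, dtype=float)
    bitrate = np.asarray(bitrate, dtype=float)
    order = np.argsort(bitrate)
    return bitrate[order], quality[order]

def operating_point(rf, quality, bitrate, target_quality):
    """Return the (RF, bitrate) at which the predicted quality meets the target."""
    rf = np.asarray(rf, dtype=float)
    quality = np.asarray(quality, dtype=float)
    bitrate = np.asarray(bitrate, dtype=float)
    # np.interp needs increasing x values, so reverse the decreasing quality curve.
    chosen_rf = float(np.interp(target_quality, quality[::-1], rf[::-1]))
    chosen_bitrate = float(np.interp(chosen_rf, rf, bitrate))
    return chosen_rf, chosen_bitrate

# Hypothetical predictions for one video (illustrative numbers only).
rf_grid = [18, 23, 28, 33]
vmaf_pred = [97.0, 93.0, 86.0, 75.0]
kbps_pred = [4200.0, 2400.0, 1300.0, 700.0]

curve_kbps, curve_vmaf = bitrate_quality_curve(vmaf_pred, kbps_pred)
chosen_rf, chosen_kbps = operating_point(rf_grid, vmaf_pred, kbps_pred, target_quality=90.0)
```

Because the strategy (here, "hit a target VMAF") is applied after prediction, it can be swapped for another rule, such as a bitrate cap, without touching the prediction model.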
ISBN (print): 9798331529543; 9798331529550
Neural Radiance Fields (NeRF) have demonstrated exceptional performance in generating novel views of scenes by learning implicit volumetric representations from calibrated RGB images, without depth information. A major limitation of neural network-based view synthesis frameworks is their need for large training datasets, and effective data augmentation for view synthesis remains an open problem. NeRF models require extensive scene coverage from multiple views to accurately estimate radiance and density. Insufficient coverage reduces the model's ability to interpolate or extrapolate unseen parts of the scene effectively. In this paper, we propose a novel pipeline that addresses this data augmentation issue using depth map information. We use depth image-based rendering (DIBR) to compensate for the shortage of views available for training NeRF. Experimental results indicate that our approach enhances the quality of images rendered with the NeRF framework, achieving an average peak signal-to-noise ratio (PSNR) increase of 7.2 dB, with a maximum improvement of 12 dB.
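The core of DIBR is a depth-guided forward warp, sketched below under simplifying assumptions: a pinhole camera whose intrinsics `K` are shared by both views, a pose `(R, t)` that maps source-camera coordinates to target-camera coordinates, nearest-pixel splatting and a z-buffer. The function name is hypothetical, and the hole filling that a practical pipeline would need is omitted.

```python
import numpy as np

def dibr_warp(image, depth, K, R, t):
    """Forward-warp an (H, W, C) image with per-pixel depth into a target camera.

    Returns the warped image and a boolean mask of the pixels that received a value.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])   # homogeneous pixels, (3, N)
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()           # back-project to 3D points
    proj = K @ (R @ pts + np.reshape(t, (3, 1)))             # project into the target view
    z = proj[2]

    warped = np.zeros_like(image)
    zbuf = np.full((h, w), np.inf)
    colours = image.reshape(-1, image.shape[-1])
    for i in np.flatnonzero(z > 1e-6):                       # skip points behind the camera
        x = int(round(proj[0, i] / z[i]))
        y = int(round(proj[1, i] / z[i]))
        if 0 <= x < w and 0 <= y < h and z[i] < zbuf[y, x]:  # keep the nearest surface
            zbuf[y, x] = z[i]
            warped[y, x] = colours[i]
    return warped, np.isfinite(zbuf)
```

Pixels left unset in the mask correspond to disocclusions, which is why synthesized views of this kind are usually inpainted or masked before being used as extra training data.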
ISBN (print): 9798350388978; 9798350388961
In computer vision applications, image enhancement is important for improving image quality and extracting meaningful information. Noise removal is a commonly used technique in image enhancement. In this study, the Batch Renormalization Denoising Network (BRDNet), which performs well in noise removal, is used as the base model and extended with the Bottleneck Attention Module (BAM) to improve performance. The proposed method is tested on different datasets at different noise levels, and the results are compared. In quantitative experiments, the PSNR value increased, and the visual results were closer to the target images.
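For reference, PSNR follows from the mean squared error between a reference image and a test image as 10·log10(peak² / MSE). The sketch below assumes 8-bit images by default (`peak=255`); the function name and signature are illustrative.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")                # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```

A higher value means that the denoised output is numerically closer to the clean target.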
ISBN (print): 9783031861611
The proceedings contain 21 papers. The special focus in this conference is on ICT Innovations. The topics include: Aligning Food Ingredients with Multiple Semantic Resources; Crossword Generation as a Constraint Satisfaction Problem Using Parallel Processing and Lemmatization; Comprehensive Examination of Network Access, Logging, and Auditing Strategies in Public and Private Institutions: Safeguarding Information Security, Resilience, and Compliance in the Digital Era; Benefits of Parallelization in CPU Rendering: Quantitative Analysis Using a Custom 3D Rendering Engine; Simulation of the Quasigroup Redundancy Check Code’s Ability to Detect Errors; YOLOv8 Oriented Bounding Box (OBB) Model for Waymo Open Dataset; Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data; Transfer Learning with Yolo for Object Detection in Remote Sensing; Comparison of On-Board and Off-Board Processing Power Consumption for Drone Camera Images; Classification of Some Cosmological Images Using Deep Learning and Persistent Homology; Mushroom Classification Using Machine Learning; Towards a Framework for Promoting Student Engagement to Maximize Learning in Higher Education: A Case Study; Blood Oxygen Saturation Estimation Using PPG Signals from the MIMIC-III Database; Novel Methodology for Gaining New Insights Into the Pharmacological Mechanisms of Cannabis sativa and Alzheimer’s Disease Through Signaling Pathway Analysis Using Bioinformatics Tools; AI Cardiologist: Arrhythmia Detection by Transformer-Based Language Model; Ambient Assisted Living Sensor-Based Solution for Elderly Self-monitoring; Classification of Autism and Typical Development Children Based on EEG Signals; Detecting the Unseen: Exploiting Radar-Sonar Sensor Fusion for Visual Detection of Low-Profile Naval Drones; Evaluating Killer Drone Defense: NATO SPS Project "Anti-Drones" Field Trials.
ISBN (print): 9781665475921
This demo paper presents a real-time learned image codec on FPGA. Using a Xilinx VCU128, the proposed system achieves 720p coding at 30 fps, which is 7.76x faster than prior work.
ISBN (print): 9781728180687
Perceptual organization is the process of grouping the parts of a scene into associations of features that belong to the same organization. In the twentieth century, Gestalt psychologists formalized how image features tend to be grouped by giving a set of organizing principles. In this paper, we propose an approach for the detection of perceptual groups in an image. We are mainly interested in features grouped by the Gestalt law of proximity. We design an object-based model within a stochastic framework using a marked point process (MPP), and we use a Bayesian learning method to extract perceptual groups in a scene. Tests on synthetic images show that the proposed model detects perceptual groups efficiently in noisy images.
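The proximity law itself can be illustrated with a much simpler, deterministic stand-in for the stochastic model: points closer than a chosen radius are linked, and linked points form one group. This is not the marked point process or the Bayesian method of the paper; the function name, the fixed `radius` parameter and the union-find grouping are assumptions made only to show what "grouping by proximity" means.

```python
import numpy as np

def proximity_groups(points, radius):
    """Label points so that any two points within `radius` share a group (transitively)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group representative, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= radius:
                parent[find(i)] = find(j)          # merge the two groups

    roots = [find(i) for i in range(n)]
    labels = {root: k for k, root in enumerate(dict.fromkeys(roots))}
    return [labels[root] for root in roots]
```

A fixed threshold like this is brittle in noisy images, which is one motivation for a probabilistic formulation.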
This study addresses the challenges of information extraction and complete tumor registry coding for lymphoma [1], a malignancy originating from the lymphatic system. Lymphoma encompasses various subtypes, primarily c...
ISBN (print): 9781728185514
This paper demonstrates a model-based reinforcement learning framework for training a self-flying drone. We implement the Dreamer, proposed in prior work, as an environment model that responds to the action taken by the drone by predicting the next video frame as a new state signal. The Dreamer is a conditional video sequence generator. This model-based environment avoids time-consuming interactions between the agent and the environment, greatly speeding up the training process. This demonstration showcases, for the first time, the application of the Dreamer to train an agent that can finish the racing task in the AirSim simulator.