ISBN (Print): 9783031723346; 9783031723353
Lidars and cameras play essential roles in autonomous driving, offering complementary information for 3D detection. State-of-the-art fusion methods integrate them at the feature level, but they mostly rely on a learned soft association between point clouds and images, which lacks interpretability and neglects the hard association between them. In this paper, we combine feature-level fusion with point-level fusion, using the hard association established by the calibration matrices to guide the generation of object queries. Specifically, in the early fusion stage, we use the 2D CNN features of images to decorate the point cloud data, and employ two independent sparse convolutions to extract the decorated point cloud features. In the mid-level fusion stage, we initialize the queries with a center heatmap and embed the predicted class labels as auxiliary information into the queries, making the initial positions closer to the actual centers of the targets. Extensive experiments conducted on two popular datasets, i.e., KITTI and Waymo, demonstrate the superiority of DecoratingFusion.
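The mid-level fusion step described above can be pictured concretely. The following is a minimal sketch, not the paper's implementation: it assumes a per-class center heatmap and a hypothetical learned class-embedding table, picks the strongest peaks as initial query positions, and appends the class embedding as auxiliary information.

```python
import numpy as np

def init_queries(heatmap, class_embed, num_queries=4):
    """Pick the top-scoring heatmap cells as initial object queries and
    decorate each with its predicted class embedding.

    heatmap:     (C, H, W) per-class center heatmap
    class_embed: (C, D) embedding per class label (assumed learned elsewhere)
    Returns (num_queries, 2 + D) rows of [x, y, class-embedding...].
    """
    C, H, W = heatmap.shape
    flat = heatmap.reshape(-1)
    top = np.argsort(flat)[::-1][:num_queries]   # indices of strongest peaks
    cls, rem = np.divmod(top, H * W)             # recover (class, cell) per peak
    ys, xs = np.divmod(rem, W)
    # Each query starts at a predicted center, tagged with its class embedding.
    return np.concatenate(
        [np.stack([xs, ys], axis=1).astype(float), class_embed[cls]], axis=1)
```

In the actual model these positions would seed transformer decoder queries; the sketch only shows why heatmap-initialized queries start near object centers rather than at arbitrary locations.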
ISBN (Digital): 9783031587764
ISBN (Print): 9783031587757; 9783031587764
Due to the capability of abdominal images to accurately represent the spatial distribution and size relationships of lesion components in the body, precise segmentation of these images can significantly assist doctors in diagnosing illnesses. To address issues such as high computational resource consumption and inaccurate boundary delineation, we propose a two-stage segmentation framework with multi-scale feature fusion. This approach aims to enhance segmentation accuracy while reducing computational complexity. In the initial stage, a coarse segmentation network is employed to identify the location of segmentation targets with minimal computational overhead. Subsequently, in the second stage, we introduce a multi-scale feature fusion module that incorporates cross-layer connectivity. This method enhances the network's context-awareness capabilities and improves its ability to capture boundary information of intricate medical structures. Our proposed method has achieved notable results, with an average Dice Similarity Coefficient (DSC) of 85.60% for organs and 37.26% for lesions on the validation set. Additionally, the average running time and area under the GPU memory-time curve are reported as 11 s and 24,858.1 megabytes, respectively, demonstrating the efficiency and effectiveness of our approach in both accuracy and resource utilization.
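The compute saving of the coarse-to-fine design comes from running the expensive fine network only on the region the coarse stage localizes. A minimal sketch of that hand-off, under the assumption that stage 1 emits a binary coarse mask (the paper's networks themselves are not reproduced):

```python
import numpy as np

def crop_roi(image, coarse_mask, margin=8):
    """Use the stage-1 coarse mask to crop a region of interest, so the
    stage-2 fine network processes far fewer pixels than the full volume.
    Returns the cropped image and the (row, col) offset of the crop."""
    ys, xs = np.nonzero(coarse_mask)
    if len(ys) == 0:                      # nothing found: fall back to full image
        return image, (0, 0)
    y0 = max(ys.min() - margin, 0)        # pad by a margin so boundaries survive
    x0 = max(xs.min() - margin, 0)
    y1 = min(ys.max() + margin + 1, image.shape[0])
    x1 = min(xs.max() + margin + 1, image.shape[1])
    return image[y0:y1, x0:x1], (y0, x0)
```

The returned offset lets the fine prediction be pasted back into full-resolution coordinates afterwards.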
ISBN (Digital): 9783031127007
ISBN (Print): 9783031126994; 9783031127007
In many works, the problem of extractive summarisation has been framed as extracting the best summary from a given document. Many popular recent works aim to solve this by employing neural networks, and many of them are trained with a very limited scope; for example, the vast majority of neural models are trained using only the best summary. Some also prune useless summaries using other models. In this work, we show the problems that can arise when training neural models with such methods, analyse those problems in some major milestones of neural extractive summarisation, and experimentally demonstrate ways to overcome them.
ISBN (Print): 9783031600111; 9783031600128
XR technology heralds the arrival of spatial computing and outlines a clearer form for the immersive image experience -- the "ultimate cinema". From the birth of film to interactive film, and on to VR films and games, it traces a revolutionary lineage of image technology, media, and immersive experience that ultimately points toward immersive theater. From Minimalist space to public art, the physical liberation of performance art traces a parallel art-historical lineage in which the audience's role keeps expanding and bodily experience advances. In the development of games, Positive Psychology and Maslow's Hierarchy of Needs provide theoretical support for the emergence of immersive experience and have achieved remarkable results in practice. Finally, as technology and theory converge, the holodeck based on XR -- the ultimate form of immersive theater and VR -- comes increasingly into view, letting us glimpse the opportunities and difficulties that future immersive experiences may face. At the same time, we should remain mindful of humanity's original motivation for pursuing immersive experience, and guard against technology's domination over people.
ISBN (Print): 9783031524479; 9783031524486
Accurate coronary vessel segmentation from invasive coronary angiography (ICA) is essential for diagnosis and treatment planning in patients with coronary stenosis. Current machine learning-based approaches primarily utilise convolutional neural networks (CNNs), which focus heavily on local vessel features and ignore geometric structures such as the shapes and directions of vessels. This limits the machine understandability of ICA images and creates a bottleneck for improving computer-generated segmentation quality, including unstable generalisation in low-contrast areas and disconnections in vascular structures. To address these issues, we propose a fusion of a Graph Attention Network (GAT) and a CNN to assist in learning global geometric information during coronary vessel segmentation. We train and evaluate the proposed method on a large-scale ICA dataset and demonstrate that combining GAT into a unified network yields improved segmentation performance. Additionally, we utilise specific metrics to quantify the achieved improvements, as they offer greater potential for future research and exploration.
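To make the GAT side of such a fusion concrete, here is a minimal single-head graph-attention pass over vessel-graph nodes, written from the standard GAT formulation rather than the paper's architecture; the node features, adjacency, and weight shapes are illustrative assumptions (in the paper's setting, node features would come from the CNN branch).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_aggregate(feats, adj, W, a):
    """One single-head graph-attention pass.
    feats: (N, F) node features (e.g. pooled CNN features per vessel segment)
    adj:   (N, N) 0/1 adjacency encoding vessel connectivity
    W:     (F, D) linear projection; a: (2*D,) attention vector
    Each node re-weights its neighbours by learned attention, which is how
    global geometric context (vessel connectivity) enters the features."""
    h = feats @ W
    out = np.zeros_like(h)
    for i in range(h.shape[0]):
        nbrs = np.nonzero(adj[i])[0]
        # attention logits over neighbours, LeakyReLU as in standard GAT
        logits = np.array([np.concatenate([h[i], h[j]]) @ a for j in nbrs])
        logits = np.where(logits > 0, logits, 0.2 * logits)
        alpha = softmax(logits)
        out[i] = (alpha[:, None] * h[nbrs]).sum(axis=0)
    return out
```

With zero attention parameters this degenerates to plain neighbourhood averaging, which is a handy sanity check when wiring such a layer into a CNN.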
ISBN (Print): 9783031612800; 9783031612817
Dating and traditional matchmaking practices have now been replaced by the transient and evolving landscape of online dating. Applications such as Bumble, Hinge, and Tinder strive to outdo each other in introducing features and design elements to cater to user needs. However, the utilisation of these apps differs significantly from how individuals engage in other interpersonal computer-mediated interactions, such as social media platforms. Through this study, we set out to understand the practices and perceptions of university students regarding online dating platforms and how these platforms can be used to model effective interactions. The user study employs a mixed-method approach, including an online survey with 72 respondents and 11 semi-structured interviews. The paper also encompasses a competitive study in which the researchers compared existing dating platforms against a set of parameters.
ISBN (Print): 9783031538292; 9783031538308
The underwater communication module demands a substantial amount of power to execute its myriad functions, depleting the energy source at a rapid pace. This research work introduces a resource-optimized encoding algorithm for multi-hop communication, denoted as "resource-efficient communication", which strategically employs optimal pulse signals to encode sensor data generated by the underwater node. This significantly mitigates bandwidth usage during transmission and enhances payload security, consequently reducing power consumption for energy-sensitive sensor nodes. The efficacy of the resource-efficient communication algorithm is assessed by inputting various sensor data over a specific time interval. The evaluation results demonstrate a promising outcome, with a 100% run-time achievement when the sensor data exhibited gradual changes, while it still achieved a commendable 75% run-time in the case of non-deterministic variations in sensor data. The proposed algorithm accomplishes a transmission time of 100 s for steady sensor values and 127 s for fluctuating ones, using a packet size of 10,000 bytes. In contrast, the OOK modulation method requires 160 s for the same task. These results emphasize a significant enhancement in resource utilization efficiency provided by the proposed algorithm compared to conventional communication methods.
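The abstract does not give the exact pulse mapping, but the gain for slowly varying sensor data can be illustrated with a simple delta-encoding sketch: transmit the first reading in full, then only the changes, so long steady runs become streams of zeros that can be mapped to the shortest (cheapest) pulse. This is an illustrative assumption, not the paper's actual codec.

```python
def delta_encode(samples):
    """First reading in full, then the per-sample change. For gradually
    changing data most entries are 0, which a pulse scheme can send with
    minimal airtime and hence minimal power."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(encoded):
    """Invert delta_encode by running a cumulative sum."""
    vals, acc = [], 0
    for d in encoded:
        acc += d
        vals.append(acc)
    return vals
```

A steady temperature trace like [20, 20, 20, 21] encodes to [20, 0, 0, 1]; only two entries carry information, which is the intuition behind the reported run-time gap between steady and fluctuating inputs.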
ISBN (Print): 9783031500688; 9783031500695
Specular highlights are ubiquitous in daily life. Their strong brightness hampers the recognition of text and graphic patterns in images, especially in documents and cards. In this paper, we propose a coarse-to-fine dynamic association learning method for specular highlight detection and removal. Specifically, based on the dichromatic reflection model, we first use a sub-network to separate the specular highlight layer and locate the highlight regions. Instead of directly subtracting the estimated specular highlight component from the raw image to obtain the highlight removal result, we design an associated learning module (ALM), together with a second-stage sub-network, to restore the color distortion introduced by removing the specular highlight layer. Our ALM extracts features from the specular highlight part and the non-specular highlight part separately to improve color restoration. We conducted extensive evaluation experiments and an ablation study on a synthetic dataset and a real-world dataset. Our method achieved 36.09 PSNR and 97% SSIM on the SHIQ dataset, along with 28.90 PSNR and 94% SSIM on the SD1 dataset, outperforming state-of-the-art methods.
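For readers unfamiliar with the dichromatic reflection model, here is a crude non-learned baseline of the separation step, not the paper's sub-network: assuming white illumination, the specular term is achromatic and bounded by the per-pixel channel minimum, so bright achromatic excess above a threshold is treated as highlight (the threshold `tau` is an illustrative assumption).

```python
import numpy as np

def separate_specular(img, tau=0.6):
    """Dichromatic-model heuristic: img = diffuse + specular, with the
    specular layer achromatic. img: (H, W, 3) floats in [0, 1].
    Returns (specular, diffuse) where specular is (H, W, 1)."""
    m = img.min(axis=2, keepdims=True)          # achromatic lower bound per pixel
    spec = np.clip(m - tau, 0.0, None)          # highlight strength above tau
    diffuse = np.clip(img - spec, 0.0, 1.0)     # subtract the specular layer
    return spec, diffuse
```

Subtracting `spec` this way is exactly the naive removal the abstract argues against: it leaves color distortion in the highlight region, which motivates the paper's ALM and second-stage restoration network.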
ISBN (Print): 9783031560651; 9783031560668
Item fairness of recommender systems aims to evaluate whether items receive a fair share of exposure according to different definitions of fairness. Raj and Ekstrand [26] study multiple fairness metrics under a common evaluation framework and test their sensitivity with respect to various configurations. They find that fairness metrics show varying degrees of sensitivity towards position weighting models and parameter settings under different information access systems. Although their study considers various domains and datasets, their findings do not necessarily generalize to next basket recommendation (NBR), where users exhibit more repeat-oriented behavior than in other recommendation domains. This paper investigates fairness metrics in the NBR domain under a unified experimental setup. Specifically, we directly evaluate the item fairness of various NBR methods. These fairness metrics rank NBR methods in different orders, while most of the metrics agree that repeat-biased methods are fairer than explore-biased ones. Furthermore, we study the effect of unique characteristics of the NBR task on the sensitivity of the metrics, including the basket size, position weighting models, and user repeat behavior. Unlike the findings in [26], Inequity of Amortized Attention (IAA) is the most sensitive metric across multiple of our experiments. Our experiments lead to novel findings in the field of NBR and fairness. We find that Expected Exposure Loss (EEL) and Expected Exposure Disparity (EED) are the most robust and adaptable fairness metrics for use in the NBR domain.
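The position-weighting idea these metrics share can be sketched simply. The following computes per-item exposure under a DCG-style log discount, averaged over users' ranked baskets; it is a simplified stand-in for the exposure models compared in the paper, not the exact EEL/EED definitions.

```python
import math
from collections import defaultdict

def item_exposure(rankings):
    """Exposure of each item under position weighting w(k) = 1 / log2(k + 1),
    averaged over all users' ranked lists. Items placed higher (smaller k)
    accumulate more exposure, which fairness metrics then compare against
    a target (e.g. relevance-proportional) exposure distribution."""
    exp = defaultdict(float)
    for ranked in rankings:
        for k, item in enumerate(ranked, start=1):
            exp[item] += 1.0 / math.log2(k + 1)
    n = len(rankings)
    return {item: v / n for item, v in exp.items()}
```

For two users who rank ["a", "b"] and ["b", "a"], both items end up with equal average exposure; a repeat-biased recommender that always ranks "a" first would instead concentrate exposure on "a", which is the quantity these fairness metrics penalize or tolerate in different ways.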
ISBN (Print): 9789819985425; 9789819985432
Medical image segmentation plays an essential role in developing computer-assisted diagnosis and treatment systems, yet it still faces numerous challenges. In the past few years, Convolutional Neural Networks (CNNs) have been successfully applied to the task of medical image segmentation. Regrettably, due to the locality of convolution operations, these CNN-based architectures are limited in learning the global context of images, which can be crucial to the success of medical image segmentation. Meanwhile, vision Transformer (ViT) architectures have a remarkable ability to extract long-range semantic features, at the cost of high computational complexity. To make medical image segmentation more efficient and accurate, we present a novel light-weight architecture named LeViT-UNet, which integrates multi-stage Transformer blocks into the encoder via LeViT, aiming to fuse local and global features effectively. Our experiments on two challenging segmentation benchmarks indicate that the proposed LeViT-UNet achieves competitive performance compared with various state-of-the-art methods in terms of efficiency and accuracy, suggesting that LeViT can serve as a faster feature encoder for medical image segmentation. LeViT-UNet-384, for instance, achieves Dice similarity coefficients (DSC) of 78.53% and 90.32% with a segmentation speed of 85 frames per second (FPS) on the Synapse and ACDC datasets, respectively. The proposed architecture could therefore be beneficial for prospective clinical trials conducted by radiologists. Our source codes are publicly available at https://***/apple1986/LeViT_UNet.
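Since several abstracts above report Dice Similarity Coefficient (DSC) scores, a reference implementation of the metric itself may be useful; this is the standard definition, not code from any of the papers.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks:
    DSC = 2 * |P intersect T| / (|P| + |T|).
    eps guards against division by zero when both masks are empty."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)
```

DSC is 1.0 for a perfect overlap and 0.0 for disjoint masks; reported percentages (e.g. 90.32%) are this value averaged over cases and classes, scaled by 100.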