ISBN:
(Print) 9789819620531;9789819620548
Traditional Chinese Medicine (TCM) has gained prominence in clinical practice, and tongue diagnosis, one of its key techniques, is now being integrated with Artificial Intelligence (AI) to achieve more objective and quantifiable results, thereby mitigating reliance on subjective judgment. However, challenges such as poor lighting conditions and limited imaging equipment often compromise image clarity, complicating tongue detection and identification. To address these issues, we propose a Dual-Task Feedback Learning (DTFL) framework designed to enhance tongue detection in patient images by improving image quality. In our approach, Super-Resolution (SR) serves as a preliminary task preceding Tongue Detection (TD), enabling the TD network to process high-quality images for more accurate results. To further improve the interaction between the SR and TD tasks, we incorporate a Feature Alignment (FA) loss, which establishes a feedback connection that allows the SR network to acquire task-specific knowledge from the TD network. Additionally, we introduce a quality-fusion augmentation and an alternate training strategy to address potential challenges associated with the FA loss during training. To the best of our knowledge, we are the first to integrate SR into TD. Experiments demonstrate that DTFL significantly improves performance by generating SR images that are optimally suited for TD.
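To make the feedback idea concrete, here is a minimal PyTorch sketch of a feature-alignment-style loss: the detector backbone's features of the super-resolved image are pulled towards its features of the ground-truth high-resolution image, so gradients from the detection side guide the SR branch. The module names, shapes, and the use of an MSE objective are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def feature_alignment_loss(detector_backbone, sr_image, hr_image):
    """L2 distance between detector features of the SR image and of the HR image."""
    feat_sr = detector_backbone(sr_image)      # features from the super-resolved input
    with torch.no_grad():                      # HR features act as a fixed target
        feat_hr = detector_backbone(hr_image)
    return F.mse_loss(feat_sr, feat_hr)

# Toy backbone and random tensors just to show the flow of gradients.
backbone = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU())
sr = torch.rand(2, 3, 64, 64, requires_grad=True)
hr = torch.rand(2, 3, 64, 64)
loss = feature_alignment_loss(backbone, sr, hr)
loss.backward()  # gradients flow back into the SR branch only
```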
ISBN:
(Print) 9783031753794;9783031753800
We motivate and summarise the track SpecifyThis - Bridging gaps between program specification paradigms, taking place at the International Symposium on Leveraging Applications of Formal Methods, ISoLA 2024.
ISBN:
(Print) 9789819785100;9789819785117
Many studies have addressed handwritten mathematical expression recognition (HMER) with encoder-decoder architectures. However, previous methods fail to predict accurate results on low-quality images exhibiting blur, complex backgrounds, and distortion. In addition, ambiguous or subtle symbols caused by different handwriting styles are often recognized incorrectly. In this paper, we propose an efficient method for HMER to deal with the above issues. Specifically, we propose a Dual-branch Refinement Module (DRM) to handle these challenging disturbances. For ambiguous or subtle symbols, we believe that the combination of local and global information is beneficial to recognizing them. Therefore, we design a Local Feature Enhancement Module (LFEM) to enhance local features, which cooperates with the global information extracted by the following transformer decoder. Extensive experimental results on the CROHME and HME100K datasets verify the effectiveness of our method.
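As a rough illustration of enhancing local features before a transformer decoder consumes global context, the following PyTorch sketch applies a depthwise convolution with channel gating as a residual refinement; the block design is an assumption for illustration and not the paper's exact LFEM.

```python
import torch
import torch.nn as nn

class LocalFeatureEnhancement(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise conv captures local spatial context per channel.
        self.local = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Channel gate re-weights which local responses are emphasised.
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels, channels, 1),
                                  nn.Sigmoid())

    def forward(self, x):
        return x + self.local(x) * self.gate(x)  # residual enhancement of local detail

feats = torch.rand(1, 256, 16, 64)                # encoder feature map (B, C, H, W)
enhanced = LocalFeatureEnhancement(256)(feats)
tokens = enhanced.flatten(2).transpose(1, 2)      # (B, H*W, C) tokens for the transformer decoder
```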
ISBN:
(Print) 9783031732317;9783031732324
The costly and time-consuming annotation process required to produce large training sets for semantic LiDAR segmentation has motivated the development of semi-supervised learning (SSL) methods. However, such SSL approaches often apply consistency learning only to individual LiDAR representations. This narrow focus results in limited perturbations that generally fail to enable effective consistency learning. Additionally, these SSL approaches employ contrastive learning based on samples drawn from a limited set of positive and negative embeddings. This paper introduces a novel semi-supervised LiDAR semantic segmentation framework called ItTakesTwo (IT2). IT2 is designed to ensure consistent predictions from peer LiDAR representations, thereby improving the perturbation effectiveness in consistency learning. Furthermore, our contrastive learning employs informative samples drawn from a distribution of positive and negative embeddings learned from the entire training set. Results on public benchmarks show that our approach achieves remarkable improvements over previous state-of-the-art (SOTA) methods in the field. The code is available at: https://***/yyliu01/IT2.
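A minimal sketch of cross-representation consistency on unlabelled scans, assuming point-wise aligned predictions from two branches (for example a range-view and a voxel branch): each branch is supervised by the other's pseudo-labels. The exact formulation in IT2 may differ.

```python
import torch
import torch.nn.functional as F

def peer_consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """logits_*: (N_points, num_classes) predictions for the same points from two representations."""
    pseudo_a = logits_a.argmax(dim=1).detach()
    pseudo_b = logits_b.argmax(dim=1).detach()
    # Each branch learns from the peer branch's hard pseudo-labels.
    return F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)

la = torch.randn(1024, 20, requires_grad=True)   # e.g. range-view branch logits
lb = torch.randn(1024, 20, requires_grad=True)   # e.g. voxel branch logits
loss = peer_consistency_loss(la, lb)
loss.backward()
```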
ISBN:
(Print) 9789819784929;9789819784936
3D object detection is a crucial task in computer vision and autonomous systems, widely utilized in robotics, autonomous driving, and augmented reality. With the advancement of input devices, researchers have proposed using multimodal information to improve detection accuracy. However, effectively integrating 2D and 3D features to harness their complementary nature for detection tasks remains a challenge. In this paper, we note that the complementary nature of geometric and visual texture information can effectively strengthen feature fusion, which plays a key role in detection. To this end, we propose the Cross-Dimensional Attention Fusion-based indoor 3D object detection method (CDAF3D). This method dynamically learns geometric information together with the corresponding 2D image texture details through a cross-dimensional attention mechanism, enabling the model to capture and integrate spatial and textural information effectively. Additionally, because intersecting entities with different labels are unrealistic in 3D object detection, we further propose a Preventive 3D Intersect Loss (P3DIL), which enhances detection accuracy by addressing intersections between objects of different labels. We evaluate the proposed CDAF3D on the SUN RGB-D and ScanNet v2 datasets. Our results achieve 78.2 mAP@0.25 and 66.5 mAP@0.50 on ScanNet v2, and 70.3 mAP@0.25 and 54.1 mAP@0.50 on SUN RGB-D. The proposed CDAF3D outperforms all multi-sensor-based methods at 3D IoU thresholds of 0.25 and 0.5.
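The cross-dimensional attention idea can be sketched with standard multi-head attention, where 3D features act as queries over 2D image tokens so that geometric features absorb texture cues; the dimensions and the use of nn.MultiheadAttention are illustrative assumptions, not the CDAF3D implementation.

```python
import torch
import torch.nn as nn

d_model = 256
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)

feat_3d = torch.rand(2, 128, d_model)    # (B, num_3d_tokens, C): point/voxel features
feat_2d = torch.rand(2, 400, d_model)    # (B, num_2d_tokens, C): image patch features

# 3D queries gather texture information from 2D keys/values.
fused, _ = attn(query=feat_3d, key=feat_2d, value=feat_2d)
feat_3d = feat_3d + fused                # residual fusion before the detection head
```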
ISBN:
(Print) 9783031753794;9783031753800
Multilingual programs, whose implementations are made of different languages, are gaining traction especially in domains, such as web programming, that particularly benefit from the additional flexibility brought by using multiple languages. In this paper, we discuss the impact that the features commonly used in multilingual programming have on our capability of specifying and analyzing them. To this end, we first outline a few broad categories of multilingual programming, according to the mechanisms that are used for inter-language communication. Based on these categories, we describe several instances of multilingual programs, as well as the intricacies that formally reasoning about their behavior would entail. We also summarize the state of the art in multilingual program analysis, including the challenges that remain open. These contributions can help understand the lay of the land in multilingual program specification and analysis, and motivate further work in this area.
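As a small, concrete instance of one inter-language communication mechanism discussed here, a foreign function interface, the snippet below shows Python calling a C library routine through ctypes; reasoning about even this tiny program requires specifications spanning both C's integer semantics and Python's object model. It assumes a standard C library can be located on the host system.

```python
import ctypes
import ctypes.util

# Load the platform's C library and declare the C signature explicitly.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # 42: the value crosses the Python/C language boundary and back
```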
ISBN:
(Print) 9789819785018;9789819785025
Fine-grained image recognition (FGIR) aims to distinguish visual objects belonging to different subclasses within the same category. Existing methods mainly focus on identifying discriminative regions and extracting the most prominent features. However, this approach leads to a scale imbalance between the foreground and background of an image, and it tends to extract features from salient foreground regions while neglecting valuable information present in the background. To address these two challenges, we propose a weakly supervised foreground-background partitioning and feature fusion framework. Specifically, a foreground-background image partition module separates the foreground and background regions to resolve the scale imbalance in the image. We incorporate a feature similarity calculation module to weigh the foreground and background features. To leverage the background information while capturing discriminative regions, we introduce a selective mask feature module. Comprehensive experiments on four popular and competitive datasets demonstrate the superiority of the proposed method in comparison with state-of-the-art methods.
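One plausible way to weigh foreground and background features, sketched below in PyTorch, is to score each against a global image feature with cosine similarity and fuse them with softmax weights; the module split and fusion rule are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def fuse_fg_bg(global_feat, fg_feat, bg_feat):
    """All inputs: (B, C). Returns a similarity-weighted fusion of foreground and background."""
    w_fg = F.cosine_similarity(global_feat, fg_feat, dim=1, eps=1e-6)
    w_bg = F.cosine_similarity(global_feat, bg_feat, dim=1, eps=1e-6)
    w = torch.softmax(torch.stack([w_fg, w_bg], dim=1), dim=1)   # (B, 2) fusion weights
    return w[:, 0:1] * fg_feat + w[:, 1:2] * bg_feat

g, f, b = torch.rand(4, 512), torch.rand(4, 512), torch.rand(4, 512)
fused = fuse_fg_bg(g, f, b)   # (4, 512) feature fed to the classifier
```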
ISBN:
(Print) 9789819784899;9789819784905
Large Language Models (LLMs) have found extensive use across different applications due to their diverse capabilities and proficiency in executing instructions. When used as chatbots in the context of emotional support, they are frequently required to show empathy. However, to date their performance is still not satisfactory because they lack a deep understanding of user-related issues. Hence, we introduce Empathizing Before Generation (EBG), a two-step learning framework that allows LLMs to perform chain-of-thought (COT) analysis prior to generating a response. The framework enables the inference of 24 emotions conveyed in the user's text and facilitates the generation of empathetic, high-quality, and appropriate responses. We create a COT version of a publicly accessible sentiment dialogue dataset for emotion inference, which is then used to support the training of the two stages of EBG. Experiments indicate that models integrated with EBG outperform other models in demonstrating empathy, achieving 98.2% and 92.8% accuracy on emotional attributes and labels, respectively. Additionally, there is a notable enhancement in the model's capacity to comprehend COT instructions, infer emotions, and generate answers that are more satisfactory than those of other models.
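The two-step flow can be sketched as two chained model calls: first elicit a chain-of-thought emotion analysis, then condition the reply on it. The call_llm function below is a hypothetical placeholder for whatever model client is used; it is not part of EBG or any specific library.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: plug in your own LLM client here.
    raise NotImplementedError("connect an LLM client")

def empathize_before_generate(user_message: str) -> str:
    # Step 1: reason about the user's emotional state (chain of thought + emotion label).
    analysis = call_llm(
        "Analyse the user's message step by step and name the emotion it conveys.\n"
        f"Message: {user_message}\nAnalysis:"
    )
    # Step 2: generate the reply conditioned on the inferred emotion analysis.
    return call_llm(
        f"Emotion analysis: {analysis}\n"
        f"User: {user_message}\n"
        "Write an empathetic, supportive response:"
    )
```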
ISBN:
(Print) 9783031732928;9783031732904
While recent advances in deep learning (DL) for surgical scene segmentation have yielded promising results on single-centre, single-imaging-modality data, these methods usually do not generalise to unseen distributions or unseen modalities. Even though human experts can identify visual appearances across such shifts, DL methods often fail to do so when data samples do not follow a similar data distribution. Prior work on tackling domain gaps due to modality changes has mostly targeted natural scene data. However, these methods cannot be directly applied to endoscopic data, where the visual cues are very limited compared to natural scenes. In this work, we exploit the style and content information in the image by performing instance normalization and feature covariance mapping to preserve robust and generalizable feature representations. Further, to eliminate the risk of removing salient feature representations associated with the objects of interest, we introduce a restitution module within the ResNet feature-learning backbone that allows the retention of useful task-relevant features. Our proposed method obtained a 13.7% improvement over the DeepLabv3+ baseline and nearly 8% improvement over recent state-of-the-art (SOTA) methods on the target (different modality) set of the EndoUDA polyp dataset. Similarly, our method achieved a 19% improvement over the baseline and 6% over the best-performing SOTA on the EndoUDA Barrett's esophagus (BE) data.
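A rough PyTorch sketch of the normalisation-with-restitution idea: instance normalisation suppresses style, and a learned gate adds back part of the removed residual so task-relevant content is not lost. The gate design is an assumption for illustration rather than the paper's exact module.

```python
import torch
import torch.nn as nn

class INWithRestitution(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.inorm = nn.InstanceNorm2d(channels, affine=True)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        normed = self.inorm(x)           # style-invariant features
        removed = x - normed             # residual discarded by IN (may still carry content)
        restored = self.gate(removed) * removed
        return normed + restored         # restitute the task-relevant part of the residual

feat = torch.rand(2, 64, 32, 32)
out = INWithRestitution(64)(feat)        # same shape, style-normalised with restitution
```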
ISBN:
(Print) 9789819785070;9789819785087
Few-shot point cloud 3D object detection (FS3D) aims to identify and localise objects of novel classes from point clouds, using knowledge learnt from annotated base classes and from novel classes with very few annotations. Thus far, this challenging task has been approached with prototype learning, but performance remains far from satisfactory. We find that in existing methods the prototypes are only loosely constrained and lack fine-grained awareness of the semantic and geometrical correlations embedded within the point cloud space. To mitigate these issues, we propose to leverage the inherent contrastive relationships within the semantic and geometrical subspaces to learn more refined and generalisable prototypical representations. To this end, we first introduce contrastive semantics mining, which enables the network to extract discriminative categorical features by constructing positive and negative pairs within training batches. Meanwhile, since point features representing local patterns can be clustered into geometric components, we further propose to impose a contrastive relationship at the primitive level. Through refined primitive geometric structures, the transferability of feature encoding from base to novel classes is significantly enhanced. The above designs and insights lead to our novel Contrastive Prototypical VoteNet (CP-VoteNet). Extensive experiments on two FS3D benchmarks, FS-ScanNet and FS-SUNRGBD, demonstrate that CP-VoteNet surpasses current state-of-the-art methods by considerable margins across different FS3D settings. Further ablation studies corroborate the rationale and effectiveness of our designs.
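The contrastive semantics mining idea can be illustrated with an InfoNCE-style loss over a batch of semantic features, where embeddings sharing a class label form positive pairs and all others act as negatives; this sketch conveys the general mechanism, not CP-VoteNet's exact loss.

```python
import torch
import torch.nn.functional as F

def batch_contrastive_loss(features: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """features: (N, D) embeddings; labels: (N,) class ids."""
    features = F.normalize(features, dim=1)
    sim = features @ features.t() / tau                        # (N, N) scaled similarities
    mask_pos = (labels[:, None] == labels[None, :]).float()
    mask_pos.fill_diagonal_(0)                                 # a sample is not its own positive
    logits = sim - torch.eye(len(labels)) * 1e9                # exclude self-pairs from the denominator
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    denom = mask_pos.sum(dim=1).clamp(min=1)
    return -(mask_pos * log_prob).sum(dim=1).div(denom).mean()

feats = torch.randn(16, 128, requires_grad=True)
labels = torch.randint(0, 4, (16,))
loss = batch_contrastive_loss(feats, labels)
loss.backward()
```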