As the Metaverse emerges as the next-generation Internet paradigm, the ability to generate content efficiently is paramount. AI-Generated Content (AIGC) has become a key solution, yet the resource-intensive nature of larg...
The application of artificial intelligence (AI) in three-dimensional (3D) agricultural research, particularly for maize, has been limited by the scarcity of large-scale, diverse datasets. While 2D image datasets are a...
ISBN (digital): 9798350371499
ISBN (print): 9798350371505
The Barcelona Clinic Liver Cancer (BCLC) staging system plays a crucial role in clinical planning, offering valuable insights for effectively managing hepatocellular carcinoma. Accurate prediction of BCLC stages can significantly ease the workload on radiologists. However, few datasets are explicitly designed for discerning BCLC stages. Although BCLC labels are commonly appended to the clinical data in existing datasets, the inherent imbalance in the distribution of BCLC stages is further amplified by the diverse purposes for which datasets are curated. In this study, we aim to develop a BCLC staging system using the Swin Transformer model. Additionally, we explore the integration of two datasets, each originally intended for a separate objective, highlighting the critical challenge of preserving class distribution in practical study designs. This exploration is pivotal for ensuring that our staging system is applicable in its intended clinical settings. Our resulting BCLC staging system achieves an accuracy of 55.81% (±7.8%), contributing to medical image-based research on predicting BCLC stages.
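The abstract stresses preserving class distribution when two datasets curated for different purposes are merged. One common way to do this, shown here only as a minimal sketch and not as the authors' procedure, is stratified fold assignment: group cases by the joint key (source dataset, BCLC stage) and deal each group round-robin across folds, so every fold receives roughly the same stage mix from each source. The `(case_id, source, stage)` tuple format, the dataset names, and the function name `stratified_folds` are hypothetical.

```python
from collections import Counter, defaultdict
import random


def stratified_folds(cases, n_folds=5, seed=0):
    """Assign each case to a fold so every (source, stage) stratum is spread
    as evenly as possible across folds.

    cases: list of (case_id, source, stage) tuples (hypothetical format).
    Returns a dict mapping case_id -> fold index.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case_id, source, stage in cases:
        strata[(source, stage)].append(case_id)

    fold_of = {}
    position = 0
    for key in sorted(strata):
        ids = strata[key][:]
        rng.shuffle(ids)  # randomize which cases land in which fold
        for i, case_id in enumerate(ids):
            # Continue the round-robin where the previous stratum stopped,
            # so overall fold sizes also stay balanced.
            fold_of[case_id] = (position + i) % n_folds
        position += len(ids)
    return fold_of


# Illustrative cases: 10 stage-B cases from one source, 7 stage-C from another.
cases = ([(f"a{i}", "dataset_A", "B") for i in range(10)]
         + [(f"b{i}", "dataset_B", "C") for i in range(7)])
folds = stratified_folds(cases, n_folds=5)
print(sorted(Counter(folds.values()).items()))  # → [(0, 4), (1, 4), (2, 3), (3, 3), (4, 3)]
```

Stratifying on the joint key rather than on stage alone also keeps one source dataset from dominating a fold. When a stratum has fewer cases than folds, some folds will necessarily contain none of it.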
ISBN (digital): 9798350351552
ISBN (print): 9798350351569
Deep learning has revolutionized medical imaging, offering advanced methods for accurate diagnosis and treatment planning. The BCLC staging system is crucial for staging Hepatocellular Carcinoma (HCC), a high-mortality cancer. An automated BCLC staging system could significantly improve the efficiency of diagnosis and treatment planning. We observe that BCLC staging, which depends directly on the size and number of liver tumors, aligns well with the principles of the Multiple Instance Learning (MIL) framework. To exploit this, we propose a new preprocessing technique called Masked Cropping and Padding (MCP), which addresses the variability in liver volumes and ensures consistent input sizes. This technique preserves the structural integrity of the liver, facilitating more effective learning. Furthermore, we introduce Re ViT, a novel hybrid model that integrates the local feature extraction capabilities of Convolutional Neural Networks (CNNs) with the global context modeling of Vision Transformers (ViTs). Re ViT leverages the strengths of both architectures within the MIL framework, enabling a robust and accurate approach to BCLC staging. We will further explore the trade-off between performance and interpretability by employing TopK pooling strategies, which focus the model on the most informative instances within each bag.
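The two mechanisms named in this abstract can be illustrated with NumPy. This is a minimal sketch under stated assumptions, not the MCP or Re ViT implementation: the hypothetical `masked_crop_and_pad` zeroes voxels outside a liver mask, crops to the mask's bounding box, and zero-pads (centered) to a fixed shape without resampling; the hypothetical `topk_mil_pool` treats each row of a score matrix as one instance of the bag (for example, a slice or patch) and averages the k highest scores per class. The per-class top-k rule is also an assumption.

```python
import numpy as np


def masked_crop_and_pad(volume, liver_mask, target_shape):
    """Hypothetical masked-cropping-and-padding step.

    Zeroes voxels outside the mask, crops to the mask's bounding box, then
    zero-pads (centered) to target_shape. No resampling is done, so the
    liver keeps its original voxel geometry.
    """
    coords = np.argwhere(liver_mask > 0)
    if coords.size == 0:
        raise ValueError("liver mask is empty")
    masked = np.where(liver_mask > 0, volume, 0)
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1
    cropped = masked[tuple(slice(a, b) for a, b in zip(lo, hi))]

    pad = []
    for size, target in zip(cropped.shape, target_shape):
        if size > target:
            raise ValueError("cropped liver is larger than target_shape")
        total = target - size
        pad.append((total // 2, total - total // 2))
    return np.pad(cropped, pad, mode="constant", constant_values=0)


def topk_mil_pool(instance_scores, k):
    """Bag-level score per class = mean of the k highest instance scores.

    instance_scores: array of shape (num_instances, num_classes).
    k=1 gives max pooling; k >= num_instances gives mean pooling.
    """
    k = min(k, instance_scores.shape[0])
    # Sort each class column in descending order and keep the top k rows.
    top = -np.sort(-instance_scores, axis=0)[:k]
    return top.mean(axis=0)
```

With `k=1` the pooling reduces to max pooling and with `k` equal to the bag size to mean pooling; a small `k` ties each prediction to a few instances that can be inspected directly, which is the interpretability side of the trade-off.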
Contrastive learning-based video-language representation learning approaches, e.g., CLIP, have achieved outstanding performance by pursuing semantic interaction over predefined video-text pairs. To go beyond this coarse-grained global interaction, fine-grained cross-modal learning must tackle challenging shell-breaking interactions. In this paper, we model video and text as players in a multivariate cooperative game to handle the uncertainty of fine-grained semantic interaction, which involves diverse granularity, flexible combinations, and vague intensity. Concretely, we propose Hierarchical Banzhaf Interaction (HBI) to value possible correspondences between video frames and text words for sensitive and explainable cross-modal contrast. To realize the cooperative game among multiple video frames and multiple text words efficiently, the proposed method clusters the original video frames (text words) and computes the Banzhaf Interaction between the merged tokens. By stacking token merge modules, we achieve cooperative games at different semantic levels. Extensive experiments on commonly used text-video retrieval and video question answering benchmarks show superior performance, justifying the efficacy of HBI. More encouragingly, HBI can also serve as a visualization tool that promotes understanding of cross-modal interaction, which may have a far-reaching impact on the community. The project page is available at https://***/HBI/.
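For reference, the Banzhaf interaction index between players i and j in a cooperative game with value function v over a player set N of size n averages, over all coalitions S ⊆ N \ {i, j}, the quantity v(S ∪ {i, j}) - v(S ∪ {i}) - v(S ∪ {j}) + v(S), i.e. it divides the sum by 2^(n-2). The sketch below computes it exactly by enumeration. The toy value function (a sum of made-up frame-word similarities inside a coalition) and the names `SIM`, `toy_value`, and `PLAYERS` are hypothetical illustrations, not HBI's learned formulation. Because enumeration costs 2^(n-2) coalitions, it also shows why reducing the number of players, as HBI does by merging frame and word tokens, matters.

```python
from itertools import combinations


def banzhaf_interaction(players, value, i, j):
    """Exact Banzhaf interaction index between players i and j.

    players: list of hashable player ids (e.g. frame and word tokens).
    value: function mapping a frozenset of players to a float.
    Enumerates all 2^(n-2) coalitions, so it is only feasible for small n.
    """
    others = [p for p in players if p not in (i, j)]
    total = 0.0
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            s = frozenset(subset)
            total += value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
    return total / 2 ** len(others)


# Toy game (hypothetical): two frames, two words, made-up similarities.
SIM = {("f1", "w1"): 0.9, ("f1", "w2"): 0.1, ("f2", "w1"): 0.2, ("f2", "w2"): 0.7}


def toy_value(coalition):
    """Sum of similarities for every frame-word pair present in the coalition."""
    return sum(s for (f, w), s in SIM.items() if f in coalition and w in coalition)


PLAYERS = ["f1", "f2", "w1", "w2"]
print(banzhaf_interaction(PLAYERS, toy_value, "f1", "w1"))
```

For this pairwise-additive toy game, the interaction between a frame and a word equals their similarity and the interaction between two frames is zero; a non-additive value function is where the index captures coalition effects beyond pairwise similarity.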
This article describes a framework for modeling and executing training in augmented and virtual reality environments. The framework was designed based on characteristics observed in existing training applications for complex tasks, such as the use of tools and control panels and the need for step-by-step instructions. Unlike other frameworks, the proposed system allows training to be created entirely within the virtual/augmented environment, avoiding constant switching between 2D and 3D environments. The framework allows for the definition of steps in a training program, each of which includes textual instructions, videos, and 3D objects, static or animated, anchored in the real world. To demonstrate the capabilities of the framework, a training program for operating a Universal Testing Machine was created as a case study. Overall, the proposed framework allows for the creation of effective and efficient AR training programs for a variety of tasks and industries.
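The step structure described in this abstract (textual instructions, videos, and static or animated 3D objects anchored in the real world) maps naturally onto a small data model. The sketch below is one possible encoding under assumed field names (`WorldAnchor`, `StepContent`, `TrainingStep`, `TrainingProgram.walk`, all hypothetical); it is not the framework's actual schema, and the Universal Testing Machine steps and asset names are illustrative placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Optional, Tuple


class ContentKind(Enum):
    TEXT = "text"
    VIDEO = "video"
    OBJECT_3D = "object_3d"


@dataclass
class WorldAnchor:
    """Hypothetical reference to a real-world anchor plus an offset in meters."""
    anchor_id: str
    offset: Tuple[float, float, float] = (0.0, 0.0, 0.0)


@dataclass
class StepContent:
    kind: ContentKind
    source: str                           # instruction text, or a video/model URI
    anchor: Optional[WorldAnchor] = None  # None: not anchored in the real world
    animated: bool = False                # meaningful only for 3D objects

    def __post_init__(self):
        if self.animated and self.kind is not ContentKind.OBJECT_3D:
            raise ValueError("only 3D objects can be animated")


@dataclass
class TrainingStep:
    title: str
    contents: List[StepContent] = field(default_factory=list)


@dataclass
class TrainingProgram:
    name: str
    steps: List[TrainingStep] = field(default_factory=list)

    def walk(self) -> Iterator[Tuple[int, TrainingStep]]:
        """Yield (step number, step) in order. A real runtime would wait for
        the trainee to finish each step before advancing."""
        yield from enumerate(self.steps, start=1)


# Illustrative program; step titles and asset names are made up.
program = TrainingProgram(
    name="Universal Testing Machine",
    steps=[
        TrainingStep("Mount the specimen", [
            StepContent(ContentKind.TEXT, "Clamp the specimen in the grips."),
            StepContent(ContentKind.OBJECT_3D, "grips.glb",
                        WorldAnchor("machine-base"), animated=True),
        ]),
        TrainingStep("Start the test", [
            StepContent(ContentKind.VIDEO, "start_test.mp4"),
        ]),
    ],
)

for number, step in program.walk():
    print(number, step.title, [c.kind.value for c in step.contents])
```

Checking constraints at construction time means an authoring tool can reject an invalid step (for example, an animated video) as soon as it is created.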
Recently, self-supervised large-scale visual pre-training models have shown great promise in representing pixel-level semantic relationships, significantly promoting the development of unsupervised dense prediction tasks, e.g., unsupervised semantic segmentation (USS). The relationships extracted among pixel-level representations typically contain rich class-aware information, in that semantically identical pixel embeddings gather together in the representation space to form sophisticated concepts. However, using the learned models to identify semantically consistent pixel groups or regions in an image is non-trivial, since over-/under-clustering overwhelms the conceptualization procedure under the varying semantic distributions of different images. In this work, we investigate pixel-level semantic aggregation in self-supervised pre-trained ViT models as image segmentation and propose the Adaptive Conceptualization approach for USS, termed ACSeg. Concretely, we explicitly encode concepts into learnable prototypes and design the Adaptive Concept Generator (ACG), which adaptively maps these prototypes to informative concepts for each image. Meanwhile, considering the scene complexity of different images, we propose the modularity loss, which optimizes ACG independently of the number of concepts by estimating the intensity of pixel pairs belonging to the same concept. Finally, we turn the USS task into classifying the discovered concepts in an unsupervised manner. Extensive experiments with state-of-the-art results demonstrate the effectiveness of the proposed ACSeg.
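The modularity loss builds on the graph-clustering notion of modularity, which compares the affinity observed within a group of pixels with what a degree-preserving random graph would predict. Assuming a soft pixel-to-concept assignment matrix S, the standard soft modularity is Q = tr(Sᵀ B S) / 2m with B = A - k kᵀ / 2m, where A is the pixel affinity matrix, k the degree vector, and 2m the total affinity. The NumPy sketch below computes -Q from pixel embeddings and prototypes; the cosine-affinity graph, the softmax temperature, and the function names are assumptions, and ACSeg's exact loss may differ. It is forward-only; training would need an autodiff framework.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def soft_modularity_loss(pixel_emb, prototypes, temperature=0.1):
    """Negative soft modularity of a pixel-to-concept assignment.

    pixel_emb: (n, d) pixel embeddings, e.g. from a frozen ViT backbone.
    prototypes: (c, d) concept vectors (stand-ins for the adapted prototypes).
    """
    x = l2_normalize(pixel_emb)
    p = l2_normalize(prototypes)

    # Affinity graph: non-negative cosine similarity between pixels, no self-loops.
    a = np.clip(x @ x.T, 0.0, None)
    np.fill_diagonal(a, 0.0)
    degree = a.sum(axis=1)
    two_m = degree.sum()

    # Soft assignment of each pixel to each concept (rows sum to 1).
    logits = (x @ p.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    s = np.exp(logits)
    s /= s.sum(axis=1, keepdims=True)

    # Modularity matrix: observed affinity minus degree-based expectation.
    b = a - np.outer(degree, degree) / two_m
    q = np.trace(s.T @ b @ s) / two_m
    return -q
```

Assigning every pixel to a single concept (or splitting every pixel evenly across identical prototypes) gives Q = 0, so the loss offers no reward for under-clustering; prototypes that receive almost no assignment mass contribute nearly nothing to the trace, so the number of concepts need not be fixed in advance.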
Monitoring the quality of river water is of fundamental importance in hydrological research. In this context, the concentration of the dissolve...
Ease of access to transportation is one of the most critical concerns in a city with a population as large as Jakarta's, and Jakarta's population is growing rapidly. The wage that Many transportati...