News text is an important branch of natural language processing. Compared to ordinary texts, news text has significant economic and scientific value. The characteristics of news text include structural hierarchy, dive...
This paper focuses on the task of few-shot 3D point cloud semantic *** some progress, this task still encounters many issues due to the insufficient samples given, e.g., incomplete object segmentation and inaccurate semantic *** tackle these issues, we first leverage part-whole relationships into the task of 3D point cloud semantic segmentation to capture semantic integrity, which is empowered by the dynamic capsule routing with the module of 3D Capsule Networks (CapsNets) in the embedding ***, the dynamic routing amalgamates geometric information of the 3D point cloud data to construct higher-level feature representations, which capture the relationships between object parts and their ***, we designed a multi-prototype enhancement module to enhance the prototype ***, the single-prototype enhancement mechanism is expanded to the multi-prototype enhancement version for capturing rich ***, the shot-correlation within the category is calculated via the interaction of different samples to enhance the intra-category *** studies prove that the involved part-whole relations and proposed multi-prototype enhancement module help to achieve complete object segmentation and improve semantic ***, under the integration of these two modules, quantitative and qualitative experiments on two public benchmarks, including S3DIS and ScanNet, indicate the superior performance of the proposed framework on the task of 3D point cloud semantic segmentation, compared to some state-of-the-art methods.
Thyroid nodules, a common disorder in the endocrine system, require accurate segmentation in ultrasound images for effective diagnosis and ***, achieving precise segmentation remains a challenge due to various factors, including scattering noise, low contrast, and limited resolution in ultrasound *** existing segmentation models have made progress, they still suffer from several limitations, such as high error rates, low generalizability, overfitting, limited feature learning capability, *** address these challenges, this paper proposes a Multi-level Relation Transformer-based U-Net (MLRT-UNet) to improve thyroid nodule *** MLRT-UNet leverages a novel Relation Transformer, which processes images at multiple scales, overcoming the limitations of traditional encoding *** transformer integrates both local and global features effectively through self-attention and cross-attention units, capturing intricate relationships within the *** approach also introduces a Co-operative Transformer Fusion (CTF) module to combine multi-scale features from different encoding layers, enhancing the model’s ability to capture complex patterns in the ***, the Relation Transformer block enhances long-distance dependencies during the decoding process, improving segmentation *** results show that the MLRT-UNet achieves high segmentation accuracy, reaching 98.2% on the Digital Database Thyroid Image (DDT) dataset, 97.8% on the Thyroid Nodule 3493 (TG3K) dataset, and 98.2% on the Thyroid Nodule3K (TN3K) *** findings demonstrate that the proposed method significantly enhances the accuracy of thyroid nodule segmentation, addressing the limitations of existing models.
As the application of Industrial Robots (IRs) scales and related participants increase, the demands for intelligent Operation and Maintenance (O&M) and multi-tenant collaboration *** methods could no longer cover the requirements, while the Industrial Internet of Things (IIoT) has been considered a promising ***, there’s a lack of IIoT platforms dedicated to IR O&M, including IR maintenance, process optimization, and knowledge *** this context, this paper puts forward the multi-tenant-oriented ACbot platform, which attempts to provide the first holistic IIoT-based solution for O&M of *** on an information model designed for the IR field, ACbot has implemented an application architecture with resource and microservice management across the cloud and multiple *** this basis, we develop four vital applications including real-time monitoring, health management, process optimization, and knowledge *** have deployed the ACbot platform in real-world scenarios that contain various participants, types of IRs, and *** date, ACbot has been accessed by 10 organizations and managed 60 industrial robots, demonstrating that the platform fulfills our ***, the application results also showcase its robustness, versatility, and adaptability for developing and hosting intelligent robot applications.
Video colorization aims to add color to grayscale or monochrome *** existing methods have achieved substantial and noteworthy results in the field of image colorization, video colorization presents more formidable obstacles due to the additional necessity for temporal ***, there is rarely a systematic review of video colorization *** this paper, we aim to review existing state-of-the-art video colorization *** addition, maintaining spatial-temporal consistency is pivotal to the process of video *** gain deeper insight into the evolution of existing methods in terms of spatial-temporal consistency, we further review video colorization methods from a novel *** colorization methods can be categorized into four main categories: optical-flow based methods, scribble-based methods, exemplar-based methods, and fully automatic ***, optical-flow based methods rely heavily on accurate optical-flow estimation, scribble-based methods require extensive user interaction and modifications, exemplar-based methods face challenges in obtaining suitable reference images, and fully automatic methods often struggle to meet specific colorization *** also discuss the existing challenges and highlight several future research opportunities worth exploring.
This paper proposes a novel method for early action prediction based on 3D skeleton data. Our method combines the advantages of graph convolutional networks (GCNs) and adversarial learning to avoid the problems of insufficient spatio-temporal feature extraction and difficulty in predicting actions in the early execution stage of actions. In our method, GCNs, which have outstanding performance in the field of action recognition, are used to extract the spatio-temporal features of the skeleton. The model learns how to optimize the feature distribution of partial videos from the features of full videos through adversarial learning. Experiments on two challenging action prediction datasets show that our method performs well on skeleton-based early action prediction. State-of-the-art performance is reported in some observation ratios.
Task offloading is an important concept for edge computing and the Internet of Things (IoT) because computation-intensive tasks must be offloaded to more resource-powerful remote *** has several advantages, including increased battery life, lower latency, and better application performance. A task offloading method determines whether sections of the full application should be run locally or offloaded for execution *** offloading choice problem is influenced by several factors, including application properties, network conditions, hardware features, and mobility, influencing the offloading system’s operational *** study provides a thorough examination of current task offloading and resource allocation in edge computing, covering offloading strategies, algorithms, and factors that influence *** offloading and partial offloading strategies are the two types of offloading *** algorithms for task offloading and resource allocation are then categorized into two parts: machine learning algorithms and non-machine learning *** examine and elaborate on algorithms like Supervised Learning, Unsupervised Learning, and Reinforcement Learning (RL) under machine *** the non-machine learning algorithm, we elaborate on algorithms like non(convex) optimization, Lyapunov optimization, Game theory, Heuristic Algorithm, Dynamic Voltage Scaling, Gibbs Sampling, and Generalized Benders Decomposition (GBD). Finally, we highlight and discuss some research challenges and issues in edge computing.
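As a rough illustration of the offloading choice described in the abstract above, the following minimal sketch compares estimated latency for running a task locally against uploading it and running it remotely. It is not taken from the surveyed paper: the `Task` fields, the function names, and the simple latency model (compute time plus upload time, ignoring energy, queueing, result download, and mobility) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description (illustrative fields only)."""
    cycles: float      # CPU cycles the task needs
    data_bits: float   # input data that must be uploaded, in bits


def local_latency(task: Task, local_hz: float) -> float:
    """Seconds to run the task on the local device."""
    return task.cycles / local_hz


def offload_latency(task: Task, uplink_bps: float, remote_hz: float) -> float:
    """Seconds to upload the input and run the task on the remote server."""
    return task.data_bits / uplink_bps + task.cycles / remote_hz


def should_offload(task: Task, local_hz: float,
                   uplink_bps: float, remote_hz: float) -> bool:
    """Offload only when the estimated remote path is strictly faster."""
    return offload_latency(task, uplink_bps, remote_hz) < local_latency(task, local_hz)


if __name__ == "__main__":
    task = Task(cycles=2e9, data_bits=8e6)
    # Fast uplink: the remote path is faster, so offloading is chosen.
    print(should_offload(task, local_hz=1e9, uplink_bps=1e7, remote_hz=1e10))
    # Slow uplink: the upload dominates, so the task stays local.
    print(should_offload(task, local_hz=1e9, uplink_bps=1e6, remote_hz=1e10))
```

A latency-only rule like this corresponds to the simplest full-offloading decision; partial offloading would apply a similar comparison per section of the application.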
The growing computing power, easy acquisition of large-scale data, and constantly improved algorithms have led to a new wave of artificial intelligence (AI) applications, which change the ways we live, manufacture, and do *** with this development, a rising concern is the relationship between AI and human intelligence, namely, whether AI systems may one day overtake, manipulate, or replace *** this paper, we introduce a novel concept named hybrid human-artificial intelligence (H-AI), which fuses human abilities and AI capabilities into a unified *** presents a challenging yet promising research direction that prompts secure and trusted AI innovations while keeping humans in the loop for effective *** scientifically define the concept of H-AI and propose an evolution road map for the development of AI toward *** then examine the key underpinning techniques of H-AI, such as user profile modeling, cognitive computing, and human-in-the-loop machine ***, we discuss H-AI’s potential applications in the area of smart homes, intelligent medicine, smart transportation, and smart ***, we conduct a critical analysis of current challenges and open gaps in H-AI, upon which we elaborate on future research issues and directions.
With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the *** limitation restricts the interpretative capacity of the VQA models and their ability to explore specific image *** address this issue, this study proposes a grounded VQA model for robotic surgery, capable of localizing a specific region during answer *** inspiration from prompt learning in language models, a dual-modality prompt model was developed to enhance precise multimodal information ***, two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model. A visual complementary prompter merges visual prompt knowledge with visual information features to guide accurate *** textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding textual information towards a more accurate inference of the ***, a multiple iterative fusion strategy was adopted for comprehensive answer reasoning, to ensure high-quality generation of textual and grounded *** experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.
Despite the effectiveness of vision-language supervised fine-tuning in enhancing the performance of vision large language models (VLLMs), existing visual instruction tuning datasets have the following limitations. (1) Instruction annotation quality: although existing VLLMs exhibit strong performance, instructions generated by those advanced VLLMs may still suffer from inaccuracies, such as hallucinations. (2) Instruction and image diversity: the limited range of instruction types and the lack of diversity in image data may impair the model's ability to generate diversified outputs that are closer to real-world scenarios. To address these challenges, we construct a high-quality, diverse visual instruction tuning dataset, MMInstruct, which consists of 973k instructions from 24 domains. There are four instruction types: judgment, multiple-choice, long visual question answering, and short visual question answering. To construct MMInstruct, we propose an instruction generation data engine that leverages GPT-4V, GPT-3.5, and manual correction. Our instruction generation engine enables semi-automatic, low-cost, and multi-domain instruction generation at 1/6 the cost of manual construction. Through extensive experimental validation and ablation experiments, we demonstrate that MMInstruct can significantly improve the performance of VLLMs, e.g., the model fine-tuned on MMInstruct achieves new state-of-the-art performance on 10 out of 12 benchmarks. The code and data shall be available at https://***/yuecao0119/MMInstruct.