In recent years, depression, as a serious mental illness, has received widespread attention from various sectors of society. How to identify depressive emotions in a timely manner and detect depression has become an u...
Transformer-like architectures, which are the model of choice in the field of natural language processing, have recently been adapted to computer vision (CV) fields and demonstrated remarkable effectiveness on various...
ISBN (print): 9781665475921
Human-object interaction (HOI) detection is an important research topic in human activity understanding. Recent works have made significant progress by focusing on efficient triplet matching and leveraging image-wide features with encoder-decoder architectures. However, previous methods have limited ability to gather relevant contextual information about humans, and they do not decouple the different sub-tasks of HOI detection. To this end, we propose a new transformer-based method for HOI detection, the Mask-Guided Transformer (MGT). Our model, which is composed of five parallel decoders with a shared encoder, not only emphasizes interactive regions by applying body features but also disentangles the prediction of instances and interactions. We achieve a favorable result of 63.3 mAP on the well-known HOI detection dataset V-COCO.
5G Core network (5GC) employs a Service Based Architecture (SBA). This architecture decomposes the control plane into multiple independent Network Functions (NFs). NFs open interfaces to provide services to other NFs,...
ISBN (digital): 9798331519872
ISBN (print): 9798331519889
Deep learning models for food image classification rely on vast amounts of data to effectively recognize and differentiate between various food items. However, training these models on such extensive datasets presents significant computational challenges. Big data and distributed training are inherently complementary in addressing these challenges. Big data involves large volumes of information that traditional single-machine processing cannot handle efficiently. Distributed training offers a solution by enabling machine learning models to scale across multiple machines or nodes, so that they can process and learn from extensive datasets more effectively. By partitioning tasks into smaller chunks that are processed in parallel, distributed training optimizes computational resources, ensuring efficient analysis of big data while maintaining fault tolerance and performance scalability. This paper explores various strategies for distributed training and communication between GPUs. We compare the performance and utilization of a single P100 GPU versus two T4 GPUs on the Food-11 dataset, focusing on accuracy and training time. Our results indicate that using two GPUs yields slightly better performance than a single GPU, reducing the training time by 125 seconds, decreasing the validation loss by 0.02, and increasing the accuracy by 1%. The overall accuracy achieved is 92%, with a corresponding loss of 0.28.
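The partition-and-process-in-parallel idea in this abstract can be illustrated with a minimal sketch of synchronous data-parallel training. It is not the paper's implementation: the linear model, the NumPy-only simulation of two workers, and the names `shard_gradient` and `data_parallel_step` are all assumptions chosen for illustration. On real GPUs the averaging step would be a collective operation such as all-reduce.

```python
import numpy as np

def shard_gradient(w, X, y):
    """Mean-squared-error gradient of a linear model on one worker's shard."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

def data_parallel_step(w, X, y, num_workers, lr):
    """One synchronous step: split the batch, compute per-shard gradients, average them."""
    X_shards = np.array_split(X, num_workers)
    y_shards = np.array_split(y, num_workers)
    grads = [shard_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    avg_grad = np.mean(grads, axis=0)  # stands in for an all-reduce across devices
    return w - lr * avg_grad

# Toy problem (hypothetical data): recover true_w from noiseless observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
for _ in range(300):
    w = data_parallel_step(w, X, y, num_workers=2, lr=0.05)
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so the simulated two-worker run follows the same trajectory as single-device training; the benefit on real hardware comes from computing the shard gradients concurrently.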
ISBN (print): 9781450392495
Operator fusion is essential and widely used in many matrix computation systems in science and industry. Existing distributed operator fusion methods achieve either low communication cost at the risk of running out of memory, or large-scale processing at high communication cost. We propose a distributed elastic fused operator called Cuboid-based Fused Operator (CFO) that achieves both low communication cost and large-scale processing. We also propose a novel fusion plan generator called Cuboid-based Fusion plan Generator (CFG) that finds a fusion plan to fuse more operators, including large-scale matrix multiplication. We implement a fast distributed matrix computation engine called FuseME by seamlessly integrating CFO and CFG. FuseME outperforms state-of-the-art systems, including SystemDS, by orders of magnitude.
ISBN (digital): 9798350391954
ISBN (print): 9798350391961
To accurately evaluate a patient's condition, medical workers usually need to register multiple pathological images of samples from the lesion site. Computer-assisted registration can effectively improve the efficiency with which doctors analyze pathological images. One of the most advanced methods currently available is the Virtual Alignment of Pathology image Series method, a multi-staining digital pathology image registration method that combines global and local calculations. However, this method may introduce bias when processing images with significant angle differences. Based on a detailed analysis of the method, this article proposes an improvement that optimizes the acquisition of non-rigid registration mask images, enabling the method to obtain mask images more reasonably and to achieve better registration results for images with significant angle differences. This provides a more accurate basis for judgment and helps doctors diagnose and develop treatment plans more accurately.
With the rapid development of the power industry, the scale of our country's power grid continues to expand, and transmission lines spread all over the country. The stability of transmission lines is one of the im...
ISBN (digital): 9798331521349
ISBN (print): 9798331521356
In image processing, "image fusion" is the amalgamation of attributes and requirements from many images into a single, more comprehensive representation. Multi-modal medical image fusion is a significant category of image fusion. It entails the integration of medical images acquired from several modalities. This work uses computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI) as modalities. This study (M3IF-SBTS) aims to construct a multi-modal medical image fusion approach that combines Optimal Sub-band Tree Structuring (SBTS) and Principal Component Analysis (PCA) with MRI and CT images in a manner that maximizes the information content of the fused image. SBTS is an advanced variant of the Discrete Wavelet Transform (DWT) in which the signal is filtered more times. PCA provides dimensionality reduction and retains the relevant features. The wavelet coefficients are fused using a PCA-based fusion method. This combination of SBTS and PCA provides superior results compared to using SBTS, DWT, or PCA individually. It also improves the overall visual and parametric quality of the fusion results relative to many of the compared methods.
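The PCA-based fusion step mentioned in this abstract can be sketched as follows. This is one common formulation of PCA fusion, not necessarily the exact rule used in M3IF-SBTS: the weights come from the principal eigenvector of the 2x2 covariance between the two inputs. The function names `pca_fusion_weights` and `pca_fuse` are hypothetical, the inputs are assumed to be same-shaped coefficient arrays (for example, matching sub-bands of an MRI and a CT image), and the SBTS decomposition itself is not shown.

```python
import numpy as np

def pca_fusion_weights(a, b):
    """Fusion weights from the principal eigenvector of the 2x2 covariance of a and b."""
    data = np.stack([a.ravel(), b.ravel()])   # shape (2, N): one row per source
    cov = np.cov(data)                        # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues returned in ascending order
    # Eigenvector of the largest eigenvalue; abs() is a simplification that
    # assumes the two sources are not strongly negatively correlated.
    principal = np.abs(eigvecs[:, -1])
    return principal / principal.sum()        # normalize so the weights sum to 1

def pca_fuse(a, b):
    """Weighted sum of two same-shaped coefficient arrays using PCA weights."""
    w_a, w_b = pca_fusion_weights(a, b)
    return w_a * a + w_b * b

# Hypothetical stand-ins for two sub-bands; the first has higher variance.
rng = np.random.default_rng(0)
band_a = 3.0 * rng.normal(size=(8, 8))
band_b = rng.normal(size=(8, 8))
fused = pca_fuse(band_a, band_b)
```

The source with more variance receives the larger weight, which is the sense in which PCA "retains the relevant features" in this kind of scheme. In a full pipeline the fused coefficients would then be passed through the inverse transform to produce the fused image.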
Dynamic channel pruning is a technique aimed at reducing the theoretical computational complexity and inference latency of convolutional neural networks. Dynamic channel pruning methods introduce complex additional modules that dynamically select channels for each image. Because of these additional modules, dynamic channel pruning methods fail to achieve optimal acceleration in real-world use. To address this problem, we propose Consecutive Dynamic Channel Pruning (CDCP), a novel, unified dynamic channel pruning framework designed for continuous image processing that applies to almost all dynamic pruning methods. The core idea of CDCP stems from our observation that adjusting the network for every frame in semantically continuous scenes is unnecessary, since adjacent frames often share similar network structures in dynamic channel pruning. CDCP introduces a simple binary classifier to determine whether the network structure needs to be adjusted for a new frame. Our method can also be used for semantically non-continuous image processing tasks, with a slightly lower probability of model reuse. We validate the effectiveness of CDCP on three dynamic channel pruning methods; combining them with CDCP achieves better acceleration on the semantically continuous Waymo dataset, the nuScenes dataset, and the semantically discontinuous COCO dataset.
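The reuse decision described in this abstract can be sketched as a small caching wrapper. Everything here is a hypothetical illustration rather than the CDCP implementation: the class `MaskReuseController`, the callback signatures, and especially the toy decision rule and toy channel selector are assumptions. In the paper the decision is made by a learned binary classifier whose inputs are not specified in the abstract.

```python
import numpy as np

class MaskReuseController:
    """Caches a channel mask and recomputes it only when a decision function says so."""

    def __init__(self, needs_update, select_channels):
        self.needs_update = needs_update        # (reference_frame, new_frame) -> bool
        self.select_channels = select_channels  # frame -> boolean channel mask
        self.reference = None                   # frame the cached mask was computed for
        self.cached_mask = None
        self.recomputed = 0                     # how often the costly selector ran

    def mask_for(self, frame):
        if self.reference is None or self.needs_update(self.reference, frame):
            self.cached_mask = self.select_channels(frame)
            self.reference = frame
            self.recomputed += 1
        return self.cached_mask

def toy_needs_update(reference, frame, threshold=0.5):
    """Stand-in for the binary classifier: flags a large change in mean intensity."""
    return abs(float(frame.mean()) - float(reference.mean())) > threshold

def toy_select_channels(frame, num_channels=8):
    """Stand-in for a dynamic pruning module: keeps more channels for brighter frames."""
    k = 1 + int(min(max(float(frame.mean()), 0.0), 1.0) * (num_channels - 1))
    mask = np.zeros(num_channels, dtype=bool)
    mask[:k] = True
    return mask

# Three similar frames, then a scene change, then another similar frame.
frames = [np.full((4, 4), v) for v in (0.10, 0.12, 0.11, 0.90, 0.88)]
controller = MaskReuseController(toy_needs_update, toy_select_channels)
masks = [controller.mask_for(f) for f in frames]
```

In this toy sequence the selector runs only for the first frame and at the scene change, so the per-frame cost of the pruning module is replaced by the much cheaper decision function on the other frames.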