ISBN (digital): 9798350355543
ISBN (print): 9798350355550
This paper addresses the challenges of optimizing task scheduling for a distributed, task-based execution model in OpenMP for cluster computing environments. Traditional OpenMP implementations are primarily designed for shared-memory parallelism and offer limited control over task scheduling. However, improved scheduling mechanisms are critical to achieving performance and portability in distributed and heterogeneous environments. OpenMP Cluster (OMPC) was introduced to overcome these limitations, extending OpenMP with the Heterogeneous Earliest Finish Time (HEFT) task scheduling algorithm tailored to large-scale systems. To improve scheduling and enable better system utilization, the runtime system must cope with challenges such as changes in application balance, the amount of available parallelism, and varying communication costs. This work presents three key contributions: first, the refactoring of the OMPC runtime to unify task scheduling across devices and hosts; second, the optimization of the HEFT-based scheduling algorithm to ensure efficient task execution in distributed environments; and third, an extensive evaluation of Work Stealing and HEFT scheduling mechanisms on real-world clusters. While the HEFT implementation in OMPC is not yet fully optimized, this work represents a significant step toward improving distributed task scheduling in cluster computing, offering insights and incremental advancements that support the development of scalable and high-performance applications. Results show improvements of up to 24% in scheduling time while opening the door to further extensions of the scheduling methods.
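For reference, the list-scheduling idea behind HEFT can be summarized in a short sketch: tasks are prioritized by their upward rank and then assigned to the processor that yields the earliest finish time. The sketch below is a generic, simplified Python version (append-only placement, no insertion slots) and does not reflect the actual OMPC runtime API; the names `comp`, `comm`, and `succ` are illustrative assumptions.

```python
# Minimal sketch of HEFT list scheduling, assuming a DAG of tasks with
# per-processor computation costs and pairwise communication costs.
# Names (succ, comp, comm) are illustrative, not the OMPC runtime API.

def heft(tasks, succ, comp, comm, procs):
    """tasks: list of task ids
    succ[t]: list of successor task ids of t
    comp[t][p]: execution time of task t on processor p
    comm[(t, s)]: transfer time from t to s if placed on different processors
    procs: list of processor ids"""
    # 1) Upward rank: average cost plus the most expensive path to an exit task.
    rank = {}
    def upward_rank(t):
        if t in rank:
            return rank[t]
        avg_comp = sum(comp[t][p] for p in procs) / len(procs)
        rank[t] = avg_comp + max(
            (comm.get((t, s), 0) + upward_rank(s) for s in succ[t]),
            default=0.0,
        )
        return rank[t]
    for t in tasks:
        upward_rank(t)

    # 2) Schedule tasks by decreasing rank onto the processor giving the
    #    earliest finish time (append-only, no insertion slots for brevity).
    proc_free = {p: 0.0 for p in procs}          # when each processor is free
    finish, placement = {}, {}
    pred = {t: [u for u in tasks if t in succ[u]] for t in tasks}
    for t in sorted(tasks, key=lambda x: rank[x], reverse=True):
        best = None
        for p in procs:
            ready = max(
                (finish[u] + (comm.get((u, t), 0) if placement[u] != p else 0)
                 for u in pred[t]),
                default=0.0,
            )
            eft = max(ready, proc_free[p]) + comp[t][p]
            if best is None or eft < best[0]:
                best = (eft, p)
        finish[t], placement[t] = best
        proc_free[best[1]] = best[0]
    return placement, finish
```

Sorting by decreasing upward rank is a valid topological order for positive costs, so every predecessor is already placed when a task is scheduled.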
Different from intelligent analysis applications in traditional small-scale data scenarios, intelligent analysis applications in big data scenarios are no longer a single AI algorithm model problem, but a fusion of bi...
ISBN (print): 9781665423694
Writing efficient, scalable, and portable HPC synthetic aperture radar (SAR) applications is increasingly challenging due to the growing diversity and heterogeneity of distributed systems. Considerable developer and computational resources are often spent porting applications to new HPC platforms and architectures, which is both time consuming and expensive. Domain-specific languages have been shown to be highly productive for development effort, but additionally achieving both scalable computational efficiency and platform portability remains challenging. The Halide programming language is both productive and efficient for dense data processing, supports common CPU architectures and heterogeneous resources like GPUs, and has previously been extended for distributed processing. We propose to use a distributed Halide implementation for scalable and heterogeneous HPC SAR processing. We implement a backprojection algorithm for SAR image reconstruction and demonstrate scalability on the OLCF Summit supercomputer up to 1,024 compute nodes (43,008 cores, each with 4 hardware threads) with a large 32,768x32,768 dataset, and up to 8 distributed GPUs with an 8,192x8,192 dataset. Our results show excellent scaling and portability to heterogeneous resources, and motivate additional improvements in Halide to better support distributed high-performance signal processing.
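For context, time-domain backprojection accumulates, for every image pixel, the range-compressed pulse samples at the pixel's slant range with a phase correction. A minimal NumPy sketch of this idea (not the paper's Halide pipeline) is shown below; the geometry, array names, and the flat-scene assumption are illustrative simplifications.

```python
# A minimal NumPy sketch of time-domain backprojection for SAR image formation,
# assuming range-compressed pulse data and a flat scene at z = 0. All names and
# the geometry are illustrative simplifications of the described algorithm.
import numpy as np

def backproject(pulses, plat_pos, range_bins, grid_x, grid_y, fc, c=3e8):
    """pulses: (n_pulses, n_bins) complex range-compressed data
    plat_pos: (n_pulses, 3) platform position per pulse
    range_bins: (n_bins,) slant ranges sampled by each pulse
    grid_x, grid_y: 1-D image grid coordinates
    fc: center frequency in Hz"""
    image = np.zeros((grid_y.size, grid_x.size), dtype=np.complex128)
    X, Y = np.meshgrid(grid_x, grid_y)
    for p in range(pulses.shape[0]):
        # Range from the platform to every pixel for this pulse.
        dx, dy, dz = X - plat_pos[p, 0], Y - plat_pos[p, 1], -plat_pos[p, 2]
        r = np.sqrt(dx * dx + dy * dy + dz * dz)
        # Linear interpolation of the range-compressed pulse at each pixel range.
        real = np.interp(r, range_bins, pulses[p].real, left=0.0, right=0.0)
        imag = np.interp(r, range_bins, pulses[p].imag, left=0.0, right=0.0)
        # Phase correction to coherently accumulate contributions across pulses.
        image += (real + 1j * imag) * np.exp(1j * 4 * np.pi * fc * r / c)
    return image
```

The per-pulse loop is embarrassingly parallel, which is what makes the algorithm a natural fit for distributed scheduling across nodes and GPUs.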
Medical image analysis plays a crucial role in modern medicine, serving as an indispensable tool for clinicians to enhance diagnostic precision and establish personalized treatment programs. Our team has devised an innovative method for medical image analysis, employing distributed deep learning techniques on both COVID-19 and brain tumor datasets. Traditional approaches to medical image analysis often demand significant computational resources and extensive data, rendering their implementation challenging in real-world scenarios. To overcome these obstacles, we propose a distributed deep learning framework that enables parallel processing of medical images across multiple nodes. Our framework leverages convolutional neural networks and transfer learning to achieve exceptional accuracy in the detection of COVID-19 and brain tumors from input images. Through evaluation on an extensive dataset of medical images, we demonstrate the efficiency of our approach in accurately detecting these conditions.
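As a rough illustration of the transfer-learning component described above, the sketch below builds a pretrained CNN with a new classification head and wraps it for simple data-parallel training in PyTorch. The choice of ResNet-50, the frozen backbone, and the single-node DataParallel wrapper are assumptions for illustration, not the authors' exact setup.

```python
# A minimal PyTorch sketch of a transfer-learning classifier: a pretrained CNN
# backbone is frozen, a new head is trained, and the model is wrapped for simple
# data-parallel execution. Class count and parallelism choice are assumptions.
import torch
import torch.nn as nn
from torchvision import models

def build_model(num_classes=2, device="cuda"):
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    for p in model.parameters():        # freeze the pretrained backbone
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head
    model = model.to(device)
    if torch.cuda.device_count() > 1:   # simple single-node data parallelism
        model = nn.DataParallel(model)
    return model

def train_step(model, images, labels, optimizer, criterion=nn.CrossEntropyLoss()):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A multi-node setting would typically replace the DataParallel wrapper with DistributedDataParallel plus a distributed sampler, but the training step itself is unchanged.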
ISBN (digital): 9798331544577
ISBN (print): 9798331544584
Alignment of deformed medical images is a challenging task in which the images undergo large deformations due to constraints in the image acquisition process. In the field of deformable image alignment, convolutional neural network-based methods have a limited local receptive field and struggle to handle large deformations. In this regard, we propose a new multi-scale parallel Transformer-CNN architecture that exploits the Transformer's strength in modeling long-range dependencies while simplifying the module and reducing the number of model parameters. The multi-scale iterative CNN structure captures local information and connects feature information across scales. Experimental results on a medical ear canal dataset show that the method achieves excellent alignment accuracy, outperforming existing deep learning methods and some traditional alignment methods, while reducing the number of model parameters to some extent, making it lightweight.
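The following sketch illustrates, in PyTorch, what a parallel Transformer/CNN block of the kind described might look like: an attention branch models long-range dependencies while a convolutional branch captures local detail, and the two are fused with a 1x1 convolution. Channel sizes, normalization, and the fusion rule are illustrative assumptions rather than the paper's architecture.

```python
# A hypothetical parallel Transformer/CNN block: a convolutional branch for
# local features runs alongside a lightweight attention branch for long-range
# dependencies, and the outputs are fused. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ParallelTransformerCNNBlock(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.cnn = nn.Sequential(                 # local receptive-field branch
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.TransformerEncoderLayer(   # long-range dependency branch
            d_model=channels, nhead=heads, dim_feedforward=2 * channels,
            batch_first=True,
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)  # 1x1 fusion

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        local = self.cnn(x)
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence
        glob = self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))
```

Stacking such blocks at several resolutions would give the multi-scale iterative behaviour described, with the attention branch kept small to limit the parameter count.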
The main function of muscles is to contract when receiving electrical signals from nerves, resulting in the movement or deformation of related bones and tissues. Muscles possess the characteristics of flexibility and ...
We address the problem of learning new classes for semantic segmentation models from few examples, which is challenging because of the following two reasons. Firstly, it is difficult to learn from limited novel data to capture the underlying class distribution. Secondly, it is challenging to retain knowledge for existing classes and to avoid catastrophic forgetting. For learning from limited data, we propose a pseudo-labeling strategy to augment the few-shot training annotations in order to learn novel classes more effectively. Given only one or a few images labeled with the novel classes and a much larger set of unlabeled images, we transfer the knowledge from labeled images to unlabeled images with a coarse-to-fine pseudo-labeling approach in two steps. Specifically, we first match each labeled image to its nearest neighbors in the unlabeled image set at the scene level, in order to obtain images with a similar scene layout. This is followed by obtaining pseudo-labels within this neighborhood by applying classifiers learned on the few-shot annotations. In addition, we use knowledge distillation on both labeled and unlabeled data to retain knowledge on existing classes. We integrate the above steps into a single convolutional neural network with a unified learning objective. Extensive experiments on the Cityscapes and KITTI datasets validate the efficacy of the proposed approach in the self-driving domain. Code is available from https://***/ChasonJiang/FSCILSS.
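The two ingredients described above, scene-level nearest-neighbour retrieval for pseudo-labeling and knowledge distillation on old classes, can be sketched compactly. The snippet below is a hypothetical PyTorch illustration; the feature descriptors, confidence threshold, and distillation temperature are assumptions, not the values used in the paper.

```python
# Hypothetical sketch of (1) scene-level retrieval of unlabeled images,
# (2) confidence-thresholded pseudo-labels from few-shot classifiers, and
# (3) a distillation loss that preserves old-class predictions.
import torch
import torch.nn.functional as F

def nearest_scenes(labeled_feat, unlabeled_feats, k=5):
    """labeled_feat: (D,) global descriptor of one labeled image
    unlabeled_feats: (N, D) descriptors of the unlabeled pool
    Returns indices of the k most similar unlabeled images."""
    sims = F.cosine_similarity(unlabeled_feats, labeled_feat.unsqueeze(0), dim=1)
    return sims.topk(k).indices

def pseudo_labels(novel_logits, threshold=0.8):
    """novel_logits: (B, C_novel, H, W) scores from classifiers learned on the
    few-shot annotations; low-confidence pixels are marked ignore (-1)."""
    prob, label = novel_logits.softmax(dim=1).max(dim=1)
    return torch.where(prob > threshold, label, torch.full_like(label, -1))

def distillation_loss(new_logits_old_classes, old_logits, T=2.0):
    """KL distillation on old-class logits to reduce catastrophic forgetting."""
    p_old = F.log_softmax(new_logits_old_classes / T, dim=1)
    q_old = F.softmax(old_logits / T, dim=1)
    return F.kl_div(p_old, q_old, reduction="batchmean") * T * T
```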
RGBT tracking has attracted increasing attention since RGB and thermal infrared data have strong complementary advantages, which enables trackers to work all day and in all weather conditions. Existing works usually focus on extracting modality-shared or modality-specific information, but the potential of these two cues is not well explored and exploited in RGBT tracking. In this paper, we propose a novel multi-adapter network to jointly perform modality-shared, modality-specific and instance-aware target representation learning for RGBT tracking. To this end, we design three kinds of adapters within an end-to-end deep learning framework. Specifically, we use the modified VGG-M as the generality adapter to extract modality-shared target representations. To extract modality-specific features while reducing computational complexity, we design a modality adapter, which adds a small block to the generality adapter in each layer and each modality in a parallel manner. Such a design can learn multilevel modality-specific representations with a modest number of parameters, as the vast majority of parameters are shared with the generality adapter. We also design an instance adapter to capture the appearance properties and temporal variations of a certain target. Moreover, to enhance the shared and specific features, we employ a multiple kernel maximum mean discrepancy loss to measure the distribution divergence of different modal features and integrate it into each layer for more robust representation learning. Extensive experiments on two RGBT tracking benchmark datasets demonstrate the outstanding performance of the proposed tracker against state-of-the-art methods.
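As an illustration of the multiple kernel maximum mean discrepancy term mentioned above, the sketch below computes an MK-MMD loss between two sets of modality features using a bank of Gaussian kernels. The bandwidth heuristic and kernel count are illustrative assumptions.

```python
# A minimal PyTorch sketch of a multiple-kernel MMD (MK-MMD) loss of the kind
# used to measure the divergence between RGB and thermal feature distributions.
# Bandwidth heuristic and kernel count are illustrative assumptions.
import torch

def mk_mmd(x, y, num_kernels=5, mul=2.0):
    """x: (n, d) features from one modality, y: (m, d) features from the other."""
    z = torch.cat([x, y], dim=0)
    dists = torch.cdist(z, z) ** 2                      # pairwise squared distances
    bandwidth = dists.detach().mean()                   # simple bandwidth heuristic
    bandwidths = [bandwidth * (mul ** (i - num_kernels // 2))
                  for i in range(num_kernels)]
    k = sum(torch.exp(-dists / b) for b in bandwidths)  # sum of Gaussian kernels
    n = x.size(0)
    k_xx, k_yy, k_xy = k[:n, :n], k[n:, n:], k[:n, n:]
    return k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()
```

Minimizing this term pulls the shared representations of the two modalities toward a common distribution, which is the stated purpose of integrating it into each layer.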
ISBN (digital): 9798350368109
ISBN (print): 9798350368116
Recommender systems for social networking have recently used machine learning (ML) methods. This paper describes a multi-agent system that simulates a Twitter recommender system to give users a list of helpful suggestions, i.e., a list of users they follow or may want to follow. A simulator is used to evaluate the scalability of a machine learning technique (such as a neural network or multilayer perceptron) when data is processed in parallel on multi-node distributed systems. A multi-agent model simulates the distributed environment. The number of nodes, the method used within the model recommender framework, and the real follower and followee data are the parameters that must first be configured in the simulator. The experimental findings were obtained on three different datasets to evaluate the speed and precision of the simulated recommender system when evaluating the machine learning method under various conditions.
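A minimal sketch of the recommender component being simulated might look like the following: a multilayer perceptron scores (user, candidate) pairs and the top-scoring candidates are suggested. The scikit-learn model, feature construction, and data layout are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of an MLP-based followee recommender: the model scores
# (user, candidate) feature pairs and the highest-scoring candidates are
# returned as suggestions. Feature design and data format are assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_recommender(pair_features, followed):
    """pair_features: (N, D) features for (user, candidate) pairs
    followed: (N,) 1 if the user follows the candidate, else 0."""
    model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300)
    model.fit(pair_features, followed)
    return model

def recommend(model, user_candidate_features, candidate_ids, top_k=10):
    scores = model.predict_proba(user_candidate_features)[:, 1]
    order = np.argsort(scores)[::-1][:top_k]     # highest scores first
    return [candidate_ids[i] for i in order]
```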
With the recent improvement in machine learning and deep learning technologies due to an increase in computation power, their use for image processing and computer vision has also increased. Traditionally, the style of an image has been changed with filters that can only apply styles from a small set of options, with no consideration for semantic or contextual data. Such style transfer is also limited to the entire image, with no provision for applying style to only a selected portion of it. Through advanced deep learning technologies, style transfer can be achieved with semantic and contextual accuracy while also providing the ability to apply it to a selected portion of the image. The style can also be generated from images synthesized from text using deep learning.
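As a concrete illustration of region-restricted style transfer, the sketch below computes a Gram-matrix style loss only over a masked portion of the feature map. The VGG-style feature input and the mask handling are illustrative assumptions rather than a specific published method.

```python
# Hypothetical sketch of masked style transfer: the Gram-matrix style statistics
# are computed only over a selected region of the content features, so the
# style is applied to that portion of the image alone.
import torch
import torch.nn.functional as F

def gram(feat):                          # feat: (B, C, H, W)
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def masked_style_loss(content_feat, style_feat, mask):
    """mask: (B, 1, H_img, W_img) float 0/1 region to stylize; it is resized to
    the feature map so the style statistics only cover the selected portion."""
    m = F.interpolate(mask, size=content_feat.shape[-2:], mode="nearest")
    return F.mse_loss(gram(content_feat * m), gram(style_feat))
```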