Edge computing is a rapidly developing research area known for its ability to reduce latency and improve energy efficiency, and it also holds potential for green computing. Many geographically distributed edge servers...
ISBN: (Print) 9783030975494; 9783030975487
The minimum spanning tree is a critical problem in many applications in network analysis, communication network design, and computer science. The parallel implementation of minimum spanning tree algorithms increases the simulation performance of large graph problems using high-performance computational resources. Minimum spanning tree algorithms generally use traditional parallel programming models for distributed and shared memory systems, such as the Message Passing Interface or OpenMP. Furthermore, the partitioned global address space model offers new capabilities in the form of asynchronous computations on distributed shared memory, positively affecting the performance and scalability of the algorithms. The paper aims to present a new minimum spanning tree algorithm implemented in a partitioned global address space model. Experiments with diverse parameters were conducted to study the efficiency of the asynchronous implementation of the algorithm.
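For readers unfamiliar with the underlying problem, the following is a minimal sequential sketch of a minimum spanning tree computation (Kruskal's algorithm with union-find) in Python. It is only a baseline illustration, not the paper's PGAS implementation, which distributes the edge list and performs the component merging asynchronously across processes.

```python
# Minimal sequential MST baseline (Kruskal's algorithm with union-find).
# Illustrative only: the paper's PGAS variant partitions the edges and
# merges components asynchronously across processes.

def find(parent, v):
    # Path-compressing find for the union-find structure.
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v

def kruskal_mst(num_vertices, edges):
    """edges: iterable of (weight, u, v) tuples; returns the list of MST edges."""
    parent = list(range(num_vertices))
    mst = []
    for weight, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:            # edge connects two different components
            parent[ru] = rv     # union the components
            mst.append((u, v, weight))
    return mst

if __name__ == "__main__":
    edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3)]
    print(kruskal_mst(4, edges))   # expected: 3 edges spanning all 4 vertices
```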
The digital transformation opens new opportunities for enterprises to optimize their business processes by applying data-driven analysis techniques. For storing and organizing the required huge amounts of data, differ...
ISBN: (Digital) 9798350317152
ISBN: (Print) 9798350317169
With the rapid development of storage and network technology, emerging high-performance hardware is being widely applied to distributed storage clusters. However, existing distributed storage systems employ multi-layer abstractions to provide table data services, leaving high-speed hardware under-exploited. In this paper, we propose TEngine, a native distributed table storage engine designed for NVMe SSD and RDMA. The key is that TEngine removes the file abstraction to construct table structures on the device directly. For metadata service, TEngine designs a decoupled single metadata server, reducing distributed coordination, easing the burden on the metadata node, and enabling localized data node access. For data service, TEngine optimizes the parallel processing capability of NVMe devices by integrating upper-level multi-thread parallel operations with the lower-level parallel I/O processing of NVMe devices. Moreover, TEngine introduces a periodic pull-based data synchronization approach that transforms data pushing into periodic data pulling, which offloads the synchronization burden from the leader to the followers. The experimental results show that TEngine outperforms state-of-the-art distributed storage systems in the same hardware environment.
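The pull-based synchronization idea can be pictured with a short sketch (illustrative Python, not TEngine's code): the leader only appends to its local log, and each follower periodically pulls the entries it has not yet applied, so the synchronization work shifts to the followers.

```python
import threading
import time

# Toy illustration of pull-based replication (not TEngine's implementation):
# the leader only appends to its local log; each follower periodically pulls
# the entries it has not yet applied, in order.

class Leader:
    def __init__(self):
        self.log = []
        self.lock = threading.Lock()

    def append(self, record):
        with self.lock:
            self.log.append(record)

    def entries_since(self, index):
        with self.lock:
            return self.log[index:]

class Follower:
    def __init__(self, leader, pull_interval=0.1):
        self.leader = leader
        self.pull_interval = pull_interval
        self.applied = []                 # locally applied records

    def run(self, rounds):
        for _ in range(rounds):
            new = self.leader.entries_since(len(self.applied))
            self.applied.extend(new)      # apply pulled entries in order
            time.sleep(self.pull_interval)

if __name__ == "__main__":
    leader = Leader()
    follower = Follower(leader)
    t = threading.Thread(target=follower.run, args=(5,))
    t.start()
    for i in range(3):
        leader.append({"row": i})
        time.sleep(0.05)
    t.join()
    print(follower.applied)               # eventually holds all leader records
```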
Automated Guided Vehicles are mobile robots designed for transportation purposes, and one of the most important problems in intelligent logistics is job scheduling. The goal is to find the optimal allocation of job execution across the available devices. The problem can be solved with a simulation in which different scenarios are evaluated. However, creating such a simulation model requires a statistical description of the problem. In this paper, we implement a simulation model of the AGV environment. Based on the mathematical description of the model, a discrete event simulation is created using the Python programming language and the SimPy library. We use the simulation to compare solutions of the job scheduling problem obtained with the simulated annealing and genetic algorithms.
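Since the abstract names SimPy explicitly, a minimal discrete-event sketch in that style is shown below; the fleet size, job arrival pattern, and travel times are placeholder assumptions, not the paper's model, which layers simulated annealing and genetic algorithms on top of such a simulation.

```python
import random
import simpy

# Minimal SimPy sketch of AGVs serving transportation jobs.
# Fleet size, arrival pattern, and travel times are illustrative assumptions.

RANDOM_SEED = 42
NUM_AGVS = 2
NUM_JOBS = 6

def job(env, name, fleet, travel_time):
    """A transportation job waits for a free AGV, then occupies it."""
    with fleet.request() as agv:
        yield agv                          # wait for an available AGV
        start = env.now
        yield env.timeout(travel_time)     # AGV drives and delivers the load
        print(f"{name}: waited {start - arrival[name]:.1f}, done at {env.now:.1f}")

def source(env, fleet):
    """Generate jobs with random inter-arrival and travel times."""
    for i in range(NUM_JOBS):
        name = f"job{i}"
        arrival[name] = env.now
        env.process(job(env, name, fleet, travel_time=random.uniform(2, 5)))
        yield env.timeout(random.expovariate(1.0))  # next job arrives later

random.seed(RANDOM_SEED)
arrival = {}
env = simpy.Environment()
fleet = simpy.Resource(env, capacity=NUM_AGVS)
env.process(source(env, fleet))
env.run()
```

Different scheduling policies can then be compared by replacing the first-come-first-served queueing of `simpy.Resource` with an order produced by an optimizer and comparing the resulting completion times.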
Existing multi-FPGA architectures often leverage high-speed interconnect technologies to achieve higher performance by exploiting ample communication bandwidth. In this paper, we propose an effective mapping approach for accelerating CNNs on bandwidth-constrained distributed multi-FPGA architectures. We formulate the system-level mapping problem and then introduce a method based on Genetic Algorithm (GA) and Mixed-Integer Nonlinear Programming (MINLP) to attain optimal solutions.
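A rough flavour of the GA half of such a mapping search is sketched below in Python; the chromosome encoding (one FPGA index per CNN layer), the cost terms, and all constants are illustrative assumptions, not the formulation used in the paper, which also couples the GA with an MINLP solver.

```python
import random

# Toy GA that assigns CNN layers to FPGAs; the fitness penalizes compute
# imbalance and inter-FPGA traffic. All numbers are illustrative assumptions.

LAYER_COMPUTE = [4, 8, 8, 16, 4, 2]      # relative work per layer
LAYER_TRAFFIC = [2, 4, 4, 2, 1, 1]       # data passed to the next layer
NUM_FPGAS = 3

def fitness(assign):
    # Load of the most loaded FPGA (lower is better).
    loads = [0.0] * NUM_FPGAS
    for layer, fpga in enumerate(assign):
        loads[fpga] += LAYER_COMPUTE[layer]
    # Traffic crossing FPGA boundaries between consecutive layers.
    cross = sum(LAYER_TRAFFIC[i] for i in range(len(assign) - 1)
                if assign[i] != assign[i + 1])
    return max(loads) + 0.5 * cross      # weighted cost to minimize

def evolve(pop_size=30, generations=50, mutation_rate=0.1):
    n = len(LAYER_COMPUTE)
    pop = [[random.randrange(NUM_FPGAS) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation_rate:   # random layer reassignment
                child[random.randrange(n)] = random.randrange(NUM_FPGAS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print("mapping:", best, "cost:", fitness(best))
```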
With the digital transformation and increasing demand for informatization in the healthcare industry, traditional centralized information systems can no longer meet the requirements of high-concurrency access. Microservices architecture, in contrast, offers a flexible and scalable solution that effectively addresses the complexity and high-concurrency access demands of healthcare information systems. In this paper, we propose a microservices system that adopts a distributed architecture combined with high-concurrency processing mechanisms. The system distributes its modules across different servers as microservices, and each microservice can independently handle requests. This distributed and parallel processing approach improves system responsiveness and throughput while reducing the risk of single-point failures. To validate the feasibility and performance of the system, we conducted a series of experiments and evaluations. The results demonstrate that the distributed healthcare information system based on microservices architecture performs exceptionally well in handling large-scale data and high-concurrency access. The system not only provides efficient data storage and retrieval capabilities but also exhibits good scalability and fault tolerance.
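As a minimal illustration of the "each microservice handles requests independently" idea, the sketch below shows a single stand-alone service that processes requests concurrently on its own threads; the endpoint, port, and data are hypothetical, and the sketch is not the system described in the paper.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Minimal stand-alone "microservice" sketch (illustrative only):
# ThreadingHTTPServer serves each request on its own thread, so one slow
# request does not block the others. A real deployment would add service
# discovery, persistence, and load balancing.

PATIENTS = {"1": {"id": "1", "name": "example patient"}}   # placeholder data

class PatientService(BaseHTTPRequestHandler):
    def do_GET(self):
        # Route /patients/<id> to the in-memory store.
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "patients" and parts[1] in PATIENTS:
            body = json.dumps(PATIENTS[parts[1]]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Each microservice instance would listen on its own host and port.
    ThreadingHTTPServer(("localhost", 8080), PatientService).serve_forever()
```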
In this work, we introduce and study a set of tree-based algorithms for resource allocation considering group dependencies between their parameters. Real-world distributed and high-performance computing systems often...
ISBN: (Print) 9781450397339
Deep learning (DL) is being widely used to solve complex problems in scientific applications from diverse domains, such as weather forecasting, medical diagnostics, and fluid dynamics simulation. DL applications consume a large amount of data using large-scale high-performance computing (HPC) systems to train a given model. These workloads have large memory and storage requirements that typically go beyond the limited amount of main memory available on an HPC server. This significantly increases the overall training time as the input training data and model parameters are frequently swapped to slower storage tiers during the training process. In this paper, we use the latest advancements in the memory subsystem, specifically Compute Express Link (CXL), to provide additional memory and fast scratch space for DL workloads to reduce the overall training time while enabling DL jobs to efficiently train models using data that is much larger than the installed system memory. We propose a framework, called DeepMemoryDL, that manages the allocation of additional CXL-based memory, introduces a fast intermediate storage tier, and provides intelligent prefetching and caching mechanisms for DL workloads. We implement and integrate DeepMemoryDL with a popular DL platform, TensorFlow, to show that our approach reduces read and write latencies, improves the overall I/O throughput, and reduces the training time. Our evaluation shows a performance improvement of up to 34% and 27% compared to the default TensorFlow platform and CXL-based memory expansion approaches, respectively.
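The prefetching and caching idea at the heart of such an approach can be pictured with the stock tf.data primitives shown below; this is plain TensorFlow, not the DeepMemoryDL framework, and the cache path, file pattern, and batch size are placeholder assumptions.

```python
import tensorflow as tf

# Plain tf.data prefetching/caching sketch (not DeepMemoryDL itself):
# cache() materializes decoded records on a fast scratch tier after the first
# pass, and prefetch() overlaps input preparation with training steps.
# "/mnt/fast_scratch/train.cache" is a placeholder path standing in for a
# fast tier such as CXL-backed memory or NVMe scratch space.

def make_pipeline(file_pattern, batch_size=64):
    files = tf.data.Dataset.list_files(file_pattern)
    ds = tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    ds = ds.cache("/mnt/fast_scratch/train.cache")   # reuse decoded records
    ds = ds.shuffle(10_000)
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.AUTOTUNE)               # overlap I/O with compute
    return ds

# Example usage (assumes TFRecord shards exist at this placeholder path):
# train_ds = make_pipeline("/data/imagenet/train-*.tfrecord")
# model.fit(train_ds, epochs=10)
```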
Distributed deep learning is becoming increasingly important due to the size of deep neural networks. The sheer volume of the input datasets used in the process can have a significant negative effect on the training t...