ISBN: 9781665435741 (print)
The Yin-He global spectral model (YHGSM) embodies a parallel semi-Lagrangian solver and implements two communication schemes: a maximum wind speed scheme and an on-demand scheme. Maximum wind speed communication adopts a single, fixed data structure, which incurs a large communication overhead. On-demand communication reduces this overhead, but it remains substantial. In this paper, a novel adaptive approach is proposed in which a monthly maximum wind speed is used in the YHGSM. This approach reduces the difference between the actual wind speed and the maximum wind speed used in the model; in turn, the communication overhead in the trajectory computation is further reduced. Experiments show that in both the maximum wind speed and on-demand schemes, the communication overheads with the adaptive maximum wind speed are significantly reduced. In addition, in a ten-day forecast with the on-demand communication scheme, the total overhead of the semi-Lagrangian computation and the total parallel execution time are both reduced, and the reduction ratio increases as the number of nodes increases.
ISBN: 9780769546766 (print)
Analyzing parallel programs has become increasingly difficult due to the immense amount of information collected on large systems. In this scenario, cluster analysis has proved to be a useful technique for reducing the amount of data to analyze. A good example is the use of the density-based clustering algorithm DBSCAN to identify similar single program multiple data (SPMD) computing phases in message-passing applications. This structure detection simplifies the analyst's work, as all the available information is reduced to a small set of clusters. However, DBSCAN presents two major problems: it is very sensitive to its parametrization, and it cannot correctly detect clusters when the data set has different densities across the data space. In this paper, we introduce Aggregative Cluster Refinement, an iterative algorithm that produces more accurate structure detections of SPMD phases than DBSCAN. In addition, it is able to detect clusters with different densities.
ISBN: 9781728168760 (print)
Container technologies are seeing wider use at advanced computing facilities for managing highly complex applications that must execute at multiple sites. However, in a distributed high throughput computing setting, the unrestricted use of containers can result in the container explosion problem. If a new container image is generated for each variation of a job dispatched to a site, shared storage is soon exceeded. On the other hand, if a single large container image is used to meet multiple needs, the size of that container may become a problem for storage and transport. To address this problem, we observe that many containers have an internal structure generated by a structured package manager, and this information could be used to strategically combine and share container images. We develop LANDLORD to exploit this property and evaluate its performance through a combination of simulation studies and empirical measurement of high energy physics applications.
ISBN: 9781665435741 (print)
The linear structure of a blockchain ensures data security and credibility. At the same time, it has become the performance bottleneck of the entire system, limiting the growth of the transaction processing rate. The inherent concurrency of directed acyclic graph (DAG) technology removes this bottleneck, but it also brings new problems: total ordering of blocks and ledger consistency. In this paper, we propose a Layer-based DAG (L-DAG) blockchain, which avoids complex total ordering algorithms by keeping blocks in order between and within layers during the generation process. We also introduce a proportional-integral-derivative (PID) controller that dynamically controls the width of layers, using the in-degree and out-degree of blocks, to achieve ledger consistency. We extend the Practical Byzantine Fault Tolerance (PBFT) protocol to run in parallel on the L-DAG structure and successfully apply it to consortium blockchain scenarios. The L-DAG blockchain structure is implemented through Hyperledger Fabric. Experimental results show that as the number of consensus threads increases, the transactions per second (TPS) of L-DAG-based PBFT grows with near-linear efficiency.
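The abstract does not state the control law, so the following is only a minimal sketch of a textbook discrete PID controller of the kind it mentions. The gains, the setpoint, the measured signal (for example a mean block in-degree), and the `next_layer_width` mapping are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller; gains are illustrative assumptions."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, setpoint: float, measured: float, dt: float = 1.0) -> float:
        """Return the control output for one sampling interval of length dt."""
        error = setpoint - measured
        self.integral += error * dt                   # accumulated (I) term
        derivative = (error - self.prev_error) / dt   # rate-of-change (D) term
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def next_layer_width(width: int, correction: float, lo: int = 1, hi: int = 64) -> int:
    """Hypothetical mapping: add the controller output to the width and clamp it."""
    return max(lo, min(hi, round(width + correction)))


if __name__ == "__main__":
    pid = PID(kp=0.5, ki=0.1, kd=0.05)
    u = pid.step(setpoint=4.0, measured=2.0)  # assumed target vs. observed degree
    print(next_layer_width(8, u))
```

Clamping the width keeps a large transient error from producing a degenerate layer; the bounds used here are placeholders.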
ISBN: 9781665435741 (print)
With the development of artificial intelligence (AI) applications, a large amount of data is generated by mobile and IoT devices at the edge of the network. Deep learning tasks are executed to extract useful information from the user data. However, the edge nodes are heterogeneous and the network bandwidth is limited, which makes general distributed deep learning inefficient. In this paper, we propose Group Synchronous Parallel (GSP), which uses a density-based algorithm to group edge nodes with similar training speeds together. To eliminate stragglers, group parameter servers coordinate the communication of the nodes in each group with Stale Synchronous Parallel and aggregate the gradients of these nodes. A global parameter server aggregates the gradients from the group parameter servers to update the global model. To save network bandwidth, we further propose Grouping Dynamic Sparsification (GDS). It dynamically adjusts the gradient sparsification rate of nodes on top of GSP so as to differentiate their communication volume and bring the training speeds of all nodes closer together. We evaluate the performance of GSP and GDS on LeNet-5, ResNet, VGG, and Seq2Seq with Attention. The experimental results show that GSP speeds up training by 45% to 120% with 16 nodes. GDS on top of GSP can make up for some test accuracy loss, up to 0.82% for LeNet-5.
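To illustrate what a per-node sparsification rate means in practice, here is a minimal sketch of top-k magnitude sparsification, a common technique that is swapped in here because the abstract does not say which sparsifier GDS uses. The `rate_for_speed` rule and its constants are hypothetical and only show how a slower node could be assigned a higher rate so that it sends less data.

```python
def sparsify_top_k(grad, rate):
    """Zero all but the largest-magnitude fraction (1 - rate) of the entries."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = max(1, round(len(grad) * (1.0 - rate)))
    # Indices sorted by absolute value, largest first.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    kept = set(order[:keep])
    return [g if i in kept else 0.0 for i, g in enumerate(grad)]


def rate_for_speed(speed, fastest, base_rate=0.5, max_rate=0.99):
    """Hypothetical rule: the slower the node, the higher its sparsification rate."""
    slowdown = 1.0 - speed / fastest  # 0 for the fastest node, approaches 1 for slow ones
    return base_rate + (max_rate - base_rate) * slowdown


if __name__ == "__main__":
    grad = [0.1, -3.0, 0.5, 2.0]
    print(sparsify_top_k(grad, rate_for_speed(speed=10.0, fastest=10.0)))
```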
ISBN: 9781538637906 (print)
Real-time systems are commonly found in safety-critical fields, which require the system to be predictable in order to reduce validation overheads. However, the tension between the need for high throughput and the need for predictability in these systems has sharpened with the rise of concurrent applications. In this paper, we propose a Predictable Servant-based Execution Model (PSEM) that makes both the communication and the computation of tasks predictable in an efficient way. In PSEM, by extending the Logical Execution Time (LET) model with the Servant concept, periodic responsiveness is improved without eroding the foundation of predictability. Evaluation results on the implementation of the runtime system demonstrate that PSEM achieves a speedup of 7.2X compared to an existing runtime and can provide time-aware applications with a more precise timing service.
ISBN: 9781538637906 (print)
Modern sensor technologies, the internet, and advanced irrigation equipment allow relatively precise control of agricultural irrigation, which leads to high water-use efficiency. However, the core control algorithms that make use of these technologies have not been well studied. In this work, a reinforcement learning based irrigation control technique is investigated. The delayed reward of crop yield is handled by the temporal difference technique. The learning process can be based on both off-line simulation and real data from sensors and crop yield. Neural network based fast models for soil water level and crop yield are developed to improve the scalability of learning. Simulations for various geographic locations and crop types show that the proposed method can significantly increase net return considering both crop yield and water expense.
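The way a temporal difference method propagates a delayed reward back to earlier decisions can be sketched with the standard tabular TD(0) update, V(s) ← V(s) + α [r + γ V(s') − V(s)]. The abstract does not say which TD variant is used; the state names, the rewards (a water cost per step and a yield payoff at harvest), and the step size below are illustrative assumptions.

```python
def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.95, terminal=False):
    """Apply one tabular TD(0) update in place and return the TD error."""
    bootstrap = 0.0 if terminal else gamma * values.get(next_state, 0.0)
    td_error = reward + bootstrap - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * td_error
    return td_error


if __name__ == "__main__":
    values = {}
    # Hypothetical two-step season: irrigating costs water now (-1.0),
    # and the yield reward (+10.0) arrives only at harvest.
    for _ in range(3):
        td0_update(values, "early_season", -1.0, "late_season")
        td0_update(values, "late_season", 10.0, None, terminal=True)
    print(values)
```

After a few passes the value of the early state rises because it bootstraps from the later state, which is how the harvest reward reaches earlier irrigation decisions.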
ISBN: 9781538637906 (print)
In previous research, the assessment of an author's influence has mainly been based on the historical information of the literature, such as the author's number of publications, citation counts, and reference relationships. However, author influence is reflected not only in such static data but also in how the author's point of view is noticed and communicated. Moreover, influence spreads along the relational paths of cooperation and citation between authors, and the authors on such a path should have similar academic interests. Therefore, this paper proposes an influence spreading model based on the authors' co-citation interest similarity and on the paths of citation and cooperation. On this basis, a novel algorithm for influence spreading prediction is designed and experimentally verified using public literature information resources. The AUC results show the effectiveness of the proposed method.
ISBN: 9781538637906 (print)
The Travelling Salesman Problem (TSP) is a typical combinatorial optimization problem that is easy to describe but hard to solve. In this work, we present a novel solution that integrates a genetic algorithm, local-search heuristics, and a greedy algorithm. For the genetic algorithm, we keep the evolutionary technique of generating children from parents with operators such as mutation, selection of the fittest element, and crossover, but the crossover is enhanced with a local-search heuristic. We also use the local-search heuristic for its strong hill-climbing ability and to find local optima efficiently in the TSP search space. The greedy algorithm is used to generate new greedy children from parents. The experimental evaluation shows that the presented optimization algorithm provides higher-quality solutions for the TSP than previous genetic algorithms, within reasonable computational time.
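The abstract does not name the specific heuristics, so the sketch below uses two standard stand-ins to show the greedy and local-search ingredients: nearest-neighbour construction as the greedy step and 2-opt as the local search. It is not the authors' algorithm and omits the genetic operators; the point set is an arbitrary example.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour visiting points in the given index order."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def nearest_neighbour_tour(points, start=0):
    """Greedy construction: always move to the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(points, tour):
    """Local search: reverse segments while doing so shortens the tour."""
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best, improved = candidate, True
    return best


if __name__ == "__main__":
    cities = [(0, 0), (1, 1), (1, 0), (0, 1)]
    crossing = [0, 1, 2, 3]  # this order crosses the unit square diagonally
    print(tour_length(cities, two_opt(cities, crossing)))
```

In a hybrid scheme like the one described, a routine such as `two_opt` would be applied to children after crossover; recomputing the full tour length per candidate keeps the sketch short but is slower than an incremental delta.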
ISBN: 0818684038 (print)
Interprocessor communication times can be a significant fraction of the overall execution time required for data parallel applications. Large communication-to-computation ratios of the tasks performed by these applications result in suboptimal performance when executed on data parallel architectures. We present an alternate architectural framework, referred to as concurrently communicating SIMD (CCSIMD), which maintains the SIMD execution model while introducing a small degree of task parallelism to exploit communication concurrency. We introduce three different implementations of our architectural framework and illustrate their effect on a suite of data parallel applications. Results show that CCSIMD architectures can provide a cost-effective way to hide communication latency in data parallel applications, which can increase the performance of these applications.