ISBN: 9783030389611; 9783030389604 (print)
With the advent of next-generation sequencing technology, sequencing costs have fallen sharply compared with previous sequencing technologies. Genomic data has become a significant big data application. As genomic data grows, its storage and migration face enormous challenges. Researchers have therefore proposed a variety of genome compression algorithms, but these algorithms cannot meet the requirements of processing large amounts of biological data at high speed. This manuscript proposes a parallel and distributed referential genome compression algorithm, Fast Distributed Referential Compression (FastDRC). The algorithm compresses a large number of genomic sequences in parallel under the Apache Hadoop distributed computing framework. Experiments show that the compression efficiency of FastDRC is greatly improved when it compresses large quantities of genomic data. Moreover, FastDRC is, to our knowledge, the only distributed computing method in the field of genome compression. The source code for FastDRC can be obtained from this link: https://***/GhostCCCathenry/FastDRC.
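The idea of referential compression mentioned in this abstract can be illustrated with a minimal single-machine sketch: a target sequence is encoded as matches against a reference plus literal bases. This is not the FastDRC implementation; the function names, the k-mer index, and the token format are assumptions made for illustration only. Because each target is encoded independently of the others, work of this kind can in principle be spread across workers.

```python
from typing import Dict, List, Tuple, Union

# A token is either a (reference_position, length) match or a single literal base.
Token = Union[Tuple[int, int], str]


def build_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    return index


def compress(target: str, ref: str, k: int = 4) -> List[Token]:
    """Greedily encode target as matches against ref (hypothetical scheme)."""
    index = build_index(ref, k)
    tokens: List[Token] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Try every reference position that shares the next k-mer and extend it.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target) and pos + length < len(ref)
                   and target[i + length] == ref[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            tokens.append((best_pos, best_len))
            i += best_len
        else:
            tokens.append(target[i])  # no usable match: emit a literal base
            i += 1
    return tokens


def decompress(tokens: List[Token], ref: str) -> str:
    """Rebuild the target from its tokens and the same reference."""
    parts = []
    for token in tokens:
        if isinstance(token, tuple):
            pos, length = token
            parts.append(ref[pos:pos + length])
        else:
            parts.append(token)
    return "".join(parts)


reference = "ACGTACGTTTGACCA"
target = "ACGTACGATTGACCA"          # differs from the reference by one base
encoded = compress(target, reference)
print(encoded)                       # [(0, 7), 'A', (8, 7)]
```

A target that is close to the reference collapses into a few long matches, which is why referential schemes suit collections of genomes from the same species.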
ISBN: 9783030389918; 9783030389901 (print)
Graph partitioning plays a fundamental role in a distributed graph computing (DGC) framework because it determines the communication cost and workload balance among computing nodes. Existing solutions are mainly heuristic-based and cannot achieve partitioning quality, load balance, and speed at the same time. In this paper, we propose Sliding-Window Reordering (SWR), a streaming vertex-cut graph partitioning algorithm that introduces a pre-partitioning window to reorder incoming edges, making it much easier for a greedy strategy to maintain balance while optimizing edge assignment at minimal computational cost. We analytically and experimentally evaluate SWR on several real-world and synthetic graphs and show that it achieves the best overall performance. Compared with HDRF, the current state of the art, SWR increases partitioning speed by 3 to 20 times and improves partitioning quality by 15% to 30% on average while achieving a balanced load among all nodes.
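The general shape of a streaming vertex-cut partitioner with a small reordering buffer can be sketched as follows. This is not the SWR algorithm itself: the abstract does not give its reordering rule or scoring function, so the rule used here (serve the buffered edge with the most already-placed endpoints) and the locality-plus-balance score are assumptions chosen only to make the idea concrete. The function name, the `window` and `lam` parameters, and the assumption that edges are distinct are likewise hypothetical.

```python
from collections import defaultdict


def partition_stream(edges, k, window=4, lam=1.0):
    """Assign each (distinct) edge to one of k partitions while reading a stream."""
    replicas = defaultdict(set)   # vertex -> partitions that hold a replica of it
    loads = [0] * k               # number of edges assigned to each partition
    assignment = {}               # edge -> partition index
    buffer = []                   # the pre-partitioning window

    def place(edge):
        u, v = edge
        lo, hi = min(loads), max(loads)

        def score(p):
            # Locality: reuse partitions that already hold an endpoint.
            locality = (p in replicas[u]) + (p in replicas[v])
            # Balance: favour lightly loaded partitions (value in [0, lam)).
            balance = lam * (hi - loads[p]) / (1 + hi - lo)
            return locality + balance

        p = max(range(k), key=score)
        assignment[edge] = p
        replicas[u].add(p)
        replicas[v].add(p)
        loads[p] += 1

    def flush_one():
        # Assumed reordering rule: serve the buffered edge about which the
        # greedy step knows the most, i.e. with the most placed endpoints.
        def known(j):
            return sum(1 for x in buffer[j] if x in replicas)

        place(buffer.pop(max(range(len(buffer)), key=known)))

    for edge in edges:
        buffer.append(edge)
        if len(buffer) > window:
            flush_one()
    while buffer:
        flush_one()

    # Replication factor: average number of partitions per vertex (lower is better).
    rf = sum(len(s) for s in replicas.values()) / len(replicas) if replicas else 0.0
    return assignment, loads, rf


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (4, 5), (5, 6), (6, 4)]
assignment, loads, rf = partition_stream(edges, k=2, window=3)
print(loads, round(rf, 2))
```

The replication factor and the spread of `loads` correspond to the quality and balance criteria the abstract refers to; the window size trades buffering cost against how much information the greedy step sees.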
Power semiconductor installations in medium-voltage power grids are usually built from numerous individual power converter units containing a large number of power modules. Such large installations place high demands on control systems, including precise timing and signal distribution across the whole system. This paper deals with a proposed distributed modular control system designed for a STATCOM installed in a 22 kV MV power grid. The proposed control system is designed for real-time control of the converter with high-speed, interference-immune communication, reliable synchronization of numerous PWM outputs, and synchronous immediate blocking of firing pulses for a safe central stop during error handling.
ISBN: 9783030389611; 9783030389604 (print)
To speed up access to massive amounts of data, heterogeneous architectures have been widely adopted in mainstream storage systems. In such systems, load imbalance and scheduler overhead are the primary factors that slow down I/O performance. In this paper, we propose HSPP, an effective file scheduling strategy that includes statistics-based file classification, partitioning with erasure coding, and adaptive data placement to optimize load balance and read latency on distributed heterogeneous storage systems. The experimental results show that HSPP is superior to existing strategies in terms of load balance, read latency, and scheduling overhead.
ISBN: 9781665423021 (print)
Hash tables are a key component in a number of AI algorithms, such as Graph Convolutional Neural Networks, Approximate Nearest Neighbor Search, and Bag-of-Words based Text Mining algorithms. Efficient implementation of hash tables is needed for a wide range of AI applications. High bandwidth memory (HBM), which provides significantly higher memory bandwidth than traditional DDR, has recently gained popularity. In this work, we propose a high-throughput parallel hash table targeting HBM-enabled FPGAs. Our design is tailored for the HBM architecture, allowing flexible and balanced mapping between processing engines and HBM channels at design time given the query distribution and hash table properties (key/value length, collision handling, and hash table size). We further develop a novel data organization and query flow that enable our accelerator to scale up to 16 processing engines (PEs). The proposed design supports parallel search, insert, and delete queries. Experimental results demonstrate that our hash table accelerator can achieve up to 3575 million operations per second (MOPS) for search-only queries and up to 1470 MOPS for 50%/50% distributed search/update queries on HBM-enabled FPGAs. It achieves better throughput than the state-of-the-art GPU and FPGA designs by up to 3.5x and 3.2x, respectively.
The task of 3D IC layout design involves the assembly of millions of components, taking into account many different requirements and constraints, such as topological, wiring, or manufacturability ones. It is an NP-hard p...
With the increasing number of cores in modern systems, dynamic concurrency throttling (DCT) and turbo-boosting techniques are becoming a way to make better use of hardware resources. While DCT techniques tune the number of running threads, boosting techniques speed up sequential phases or unbalanced threads. However, as each region of an application may behave differently, optimizing both knobs is not straightforward. Hence, we propose two strategies that apply DCT and turbo-boosting: DBF, which aims to find an ideal configuration for each parallel/sequential region, and DBC, which considers the combination of parallel/sequential regions during the optimization. We show that DBF and DBC improve the energy-delay product (EDP) by up to 19% and 27% compared to a DCT-only strategy and by up to 95% and 96% compared to a Boost-only technique. We also show that DBF is more suitable for applications with high variability in the CPU workload, while DBC is better when there is low workload variability.
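The metric behind these percentages, the energy-delay product (energy multiplied by execution time), and the idea of picking one configuration per region can be sketched as below. This is only an illustration of the metric: the exhaustive selection is not the search procedure of DBF or DBC, and the configurations and measurements are hypothetical values invented for the example, not results from the paper.

```python
def edp(energy_joules, time_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * time_seconds


def best_config(measurements):
    """Return the (threads, boost) configuration with the lowest EDP.

    measurements maps a configuration to an (energy_joules, time_seconds) pair.
    """
    return min(measurements, key=lambda cfg: edp(*measurements[cfg]))


def improvement_percent(baseline_edp, new_edp):
    """Relative EDP reduction with respect to a baseline, in percent."""
    return 100.0 * (baseline_edp - new_edp) / baseline_edp


# Hypothetical measurements for one region: (threads, boost) -> (energy J, time s).
region = {
    (4, False): (40.0, 2.0),   # EDP 80
    (8, False): (60.0, 1.2),   # EDP 72
    (8, True): (90.0, 1.0),    # EDP 90
}

choice = best_config(region)
gain = improvement_percent(edp(*region[(8, True)]), edp(*region[choice]))
print(choice, round(gain, 1))
```

In this made-up region the fastest configuration is not the one with the lowest EDP, which is the kind of trade-off that makes tuning thread count and boosting together non-trivial.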
ISBN: 9788993215182 (print)
In this paper, we aim to alleviate traffic congestion on each link of a local road transport network online through traffic signal control. We first propose a transportation model with time delay that accounts for vehicle travel time, and we calculate the optimal signal phases with variable cycle length and offset using distributed model predictive control on that model to reduce the congestion rate on local roads. We then explain distributed control algorithms in which each intersection solves its optimization problem in parallel, sharing information only with adjacent intersections. Finally, the effectiveness of the proposed method is confirmed by numerical simulation.
ISBN: 9783030389918; 9783030389901 (print)
In this paper, we propose Strark-H, a storage and query strategy for large-scale spatial data based on Spark, to improve the response speed of spatial queries by considering the spatial location and category keywords of spatial objects. First, we define a custom InputFormat class so that Spark natively understands the content of Shapefile, a common file format for storing spatial data. Then, we put forward a partitioning and indexing method for spatial storage, under which spatial data is partitioned unevenly according to spatial position. This ensures that the size of each partition does not exceed the HDFS block size and preserves the spatial proximity of spatial objects in the cluster. Moreover, a secondary index is generated, comprising a global index based on spatial position for all partitions and a local index based on the category of spatial objects. Finally, we design a new data loading and query scheme based on Strark-H for spatial queries, including range queries, K-NN queries, and spatial join queries. Extensive experiments on OSM show that Strark-H can be applied to Spark to natively support spatial query and storage with efficiency and scalability.
ISBN: 9781728146034 (print)
The bulk synchronous parallel (BSP) is a celebrated synchronization model for general-purpose parallel computing that has successfully been employed for distributed training of machine learning models. A prevalent shortcoming of the BSP is that it requires workers to wait for the straggler at every iteration. To ameliorate this shortcoming of classic BSP, we propose ELASTICBSP, a model that aims to relax its strict synchronization requirement. The proposed model offers more flexibility and adaptability during the training phase without sacrificing the accuracy of the trained model. We also propose an efficient method that materializes the model, named ZIPLINE. The algorithm is tunable and can effectively balance the tradeoff between quality of convergence and iteration throughput in order to accommodate different environments or applications. A thorough experimental evaluation demonstrates that our proposed ELASTICBSP model converges faster and to a higher accuracy than the classic BSP. It also achieves accuracy comparable to (if not higher than) that of the other sensible synchronization models.
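The straggler cost of classic BSP described here can be made concrete with a small timing model: under a barrier at every iteration, each iteration lasts as long as its slowest worker. The sketch below models only classic BSP; it does not implement ELASTICBSP or ZIPLINE, and the function names and per-worker times are hypothetical.

```python
def bsp_wall_time(step_times):
    """Total wall-clock time when every iteration ends with a barrier.

    step_times[i][w] is the compute time of worker w in iteration i.
    """
    return sum(max(step) for step in step_times)


def barrier_waiting_time(step_times):
    """Total time workers spend idle waiting for the slowest worker."""
    return sum(max(step) - t for step in step_times for t in step)


# Hypothetical compute times: two iterations, three workers.
times = [
    [1.0, 1.0, 3.0],   # worker 2 is the straggler in iteration 0
    [1.0, 2.0, 1.0],   # worker 1 is the straggler in iteration 1
]

print(bsp_wall_time(times))         # 5.0
print(barrier_waiting_time(times))  # 6.0
```

The idle time grows with both the number of workers and the variance of their iteration times, which is the overhead that relaxed synchronization models try to reduce.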