ISBN (digital): 9798331533991
ISBN (print): 9798331534004
Cloud storage is a vital component of cloud architecture, often utilizing distributed key-value stores like Amazon S3 and Google Cloud Storage for managing data and metadata. These systems distribute data across nodes using key-range partitioning or consistent hashing, but they face challenges such as load imbalance and limited parallelism due to uneven data distribution and varying node performance. Current implementations, such as MongoDB, address these imbalances by migrating data between nodes but often neglect the characteristics of the underlying data structures, leading to increased overhead from costly delete and insert operations. To address these issues, this design leverages the properties of the LSM tree, a commonly used storage engine, to optimize data migration. The approach introduces hot-zone prediction using nonlinear regression to accurately identify data hotspots based on key characteristics, insertion time, and TTL. A storage-engine-aware migration system is developed to migrate grouped SSTable files rather than individual key-value pairs, significantly reducing migration overhead. Additionally, the data-migration I/O process is offloaded using the NVMe-oF protocol, minimizing CPU involvement and preserving node performance. Implemented on mongo-rocks, this solution improves load balancing by directly moving SSTable files across nodes, enhancing efficiency and reducing performance degradation in distributed key-value stores.
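The core idea above, moving whole SSTable files whose key ranges fall inside a predicted hot zone rather than re-inserting individual key-value pairs, can be illustrated with a minimal sketch. This is not the paper's implementation: the SSTableMeta fields, the lexicographic key comparison, and the hot-zone boundaries are assumptions made for illustration.

```cpp
// Minimal sketch (not the paper's code): pick whole SSTable files whose key
// ranges overlap a predicted hot zone and hand them off as one migration unit,
// instead of deleting/re-inserting individual key-value pairs.
#include <iostream>
#include <string>
#include <vector>

struct SSTableMeta {            // hypothetical per-file metadata
    std::string path;           // on-disk SSTable file
    std::string min_key, max_key;
};

// Key ranges are compared lexicographically here (an assumption).
static bool overlaps(const SSTableMeta& f,
                     const std::string& hot_lo, const std::string& hot_hi) {
    return !(f.max_key < hot_lo || hot_hi < f.min_key);
}

// Group every SSTable that intersects the predicted hot zone [hot_lo, hot_hi].
std::vector<SSTableMeta> select_migration_group(
        const std::vector<SSTableMeta>& files,
        const std::string& hot_lo, const std::string& hot_hi) {
    std::vector<SSTableMeta> group;
    for (const auto& f : files)
        if (overlaps(f, hot_lo, hot_hi)) group.push_back(f);
    return group;
}

int main() {
    std::vector<SSTableMeta> files = {
        {"000012.sst", "a", "f"}, {"000013.sst", "g", "m"}, {"000014.sst", "n", "z"}};
    // Suppose the regression model predicts keys "h".."p" are becoming hot.
    for (const auto& f : select_migration_group(files, "h", "p"))
        std::cout << "migrate whole file: " << f.path << '\n';   // e.g. via NVMe-oF
}
```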
The binning of metagenomic sequences is one of the crucial steps in metagenomic projects, which allow the study of uncultured organisms. Although these projects need to analyze a huge amount of data, most available binning m...
ISBN (print): 9783030602451; 9783030602444
The B+-tree is an important index in the fields of data warehousing and database management systems. With the development of new hardware technologies, the B+-tree needs to be revisited to fully take advantage of hardware resources. In this paper, we focus on optimization techniques to increase the search performance of B+-trees on the coupled CPU-GPU architecture. First, we propose a hierarchical searching approach on the single coupled GPU to efficiently deal with the leaf nodes of B+-trees. It adopts a flexible strategy to determine the number of work items in a work group used to search one key, in order to reduce irregular memory accesses and divergent branches in the work group. Second, we present a co-processing pipeline method on the coupled architecture. The CPU and the integrated GPU process the sorting and searching tasks simultaneously to hide sorting and partial searching latencies. A distribution model is designed to support the workload-balance strategy based on real-time performance. Our performance study shows that the hierarchical searching scheme provides an improvement of up to 36% on the GPU compared to the baseline algorithm with a fixed number of work items, and the co-processing pipeline method further increases the throughput by a factor of 1.8. To the best of our knowledge, this paper is the first study to consider both the CPU and the coupled GPU to optimize B+-tree searches.
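The distribution model for workload balance is described only at a high level. The minimal sketch below, assuming the split is simply proportional to the throughputs measured in the previous round, shows the kind of calculation such a model might perform; the function and parameter names are placeholders, not the paper's API.

```cpp
// Minimal sketch (assumed, not the paper's code): split a batch of search keys
// between the CPU and the integrated GPU in proportion to their most recently
// measured throughputs, in the spirit of the distribution model described above.
#include <cstddef>
#include <iostream>

struct Split { std::size_t cpu_keys, gpu_keys; };

Split distribute(std::size_t batch, double cpu_tp, double gpu_tp) {
    // cpu_tp / gpu_tp: keys per second observed in the previous round.
    double total = cpu_tp + gpu_tp;
    std::size_t cpu = static_cast<std::size_t>(batch * (cpu_tp / total) + 0.5);
    if (cpu > batch) cpu = batch;
    return {cpu, batch - cpu};
}

int main() {
    // Example: the GPU searched 1.8x faster last round, so it gets ~64% of the keys.
    Split s = distribute(100000, 1.0e6, 1.8e6);
    std::cout << "CPU: " << s.cpu_keys << "  GPU: " << s.gpu_keys << '\n';
}
```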
The paper is devoted to the investigation of the generic complexity of the algorithmic problem of solving systems of equations in finite groups, finite semigroups, and finite fields. We show that if this problem is intracta...
Parallel column models were first proposed in the 1950s for investigating the sensitivity of the HETP in a packed column to maldistribution. Although it is possible to develop a parallel column model using a process simulator, it may take considerable effort to arrange and specify the columns. Recently, we described what we call a Parallel Column Model (PCM) with the aim of making it easier to model Dividing Wall Columns (DWCs); we realized that it should also be possible to use the PCM to model maldistribution in packed columns. Both equilibrium-stage and rate-based column models can easily be used within this framework. In this paper we review the literature on simulating packed columns with maldistribution. We also show how easily our PCM may be used to describe maldistribution in packed columns and how our results match those obtained in earlier papers. We propose a simple bed-effectiveness approximation that can use the results from a regular column simulation and assign stages to specific packed beds such that the resulting column suffers less from liquid maldistribution. We illustrate the use of this approximation with two practical examples involving the design of commercial-scale columns.
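The bed-effectiveness approximation itself is not given in the abstract, so the sketch below only illustrates the general idea of assigning equilibrium stages to individual packed beds, assuming each bed's HETP is degraded by an effectiveness factor below one when liquid maldistribution is present. All names and numbers are illustrative assumptions, not the authors' method.

```cpp
// Illustrative sketch only (the paper's bed-effectiveness approximation is not
// reproduced here): assign equilibrium stages to each packed bed from its
// height, base HETP, and an assumed effectiveness factor < 1 that accounts for
// liquid maldistribution.
#include <cmath>
#include <iostream>
#include <vector>

struct Bed { double height_m, hetp_m, effectiveness; };

int stages_for(const Bed& b) {
    // The effective HETP grows as effectiveness drops, so fewer stages per bed.
    double effective_hetp = b.hetp_m / b.effectiveness;
    return static_cast<int>(std::floor(b.height_m / effective_hetp));
}

int main() {
    std::vector<Bed> beds = {{6.0, 0.5, 0.95}, {8.0, 0.5, 0.85}};  // assumed values
    for (std::size_t i = 0; i < beds.size(); ++i)
        std::cout << "bed " << i + 1 << ": " << stages_for(beds[i]) << " stages\n";
}
```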
This paper aims to study the efficiency of various seq2seq deep learning architectures for toxic speech classification and for performing efficient sentiment analysis using unilingual, publicly available da...
ISBN (print): 9781665423144
With its strong floating-point operation capability and high memory bandwidth for data parallelism, the graphics processing unit (GPU) has been widely used in general-purpose computing. GPU-based computations have been extensively applied in the field of computational fluid dynamics (CFD). This paper aims to design a highly efficient double-precision GPU-accelerated parallel algorithm for supersonic flow computations on hybrid grids. Compute Unified Device Architecture (CUDA) is used as a general-purpose parallel computing platform and programming model to run the parallel computing code on GPUs. The cell-centered finite volume method based on unstructured grids is used for the spatial discretization of the governing equations, whereas the three-stage explicit Runge-Kutta scheme with second-order accuracy is used for temporal discretization. Turbulence is modeled using the k-omega SST two-equation model. Three test cases are studied to validate the computational accuracy of the proposed algorithm. The numerical results agree well with the experimental data, suggesting that the GPU-accelerated parallel algorithm has good accuracy.
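For readers unfamiliar with multistage explicit schemes, the sketch below shows the shape of a three-stage explicit Runge-Kutta update of the kind referred to above; the residual function is a toy placeholder for the finite-volume flux sum, and the stage coefficients are assumed, not taken from the paper.

```cpp
// Minimal sketch of a three-stage explicit Runge-Kutta time step; the residual
// and the stage coefficients are illustrative placeholders, not the paper's scheme.
#include <iostream>
#include <vector>

using State = std::vector<double>;   // one conserved value per cell, for brevity

// Placeholder residual R(U): in a real solver this is the finite-volume flux sum.
State residual(const State& u) {
    State r(u.size());
    for (std::size_t i = 0; i < u.size(); ++i) r[i] = -u[i];   // toy decay model
    return r;
}

// U^(k) = U^n + alpha_k * dt * R(U^(k-1)), k = 1..3 (alpha values assumed).
State rk3_step(const State& un, double dt) {
    const double alpha[3] = {1.0 / 3.0, 0.5, 1.0};
    State u = un;
    for (double a : alpha) {
        State r = residual(u);
        for (std::size_t i = 0; i < u.size(); ++i) u[i] = un[i] + a * dt * r[i];
    }
    return u;
}

int main() {
    State u = {1.0, 2.0, 3.0};
    u = rk3_step(u, 0.01);
    for (double v : u) std::cout << v << ' ';
    std::cout << '\n';
}
```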
ISBN (print): 9783030602451; 9783030602444
The recent prevalence of positioning sensors and mobile devices generates a massive amount of spatial-temporal data from moving objects in real time. As one of the fundamental processes in data analysis, clustering of spatial-temporal data enables various applications, such as event detection and travel-pattern extraction. However, most existing works focus only on the offline scenario, which is not applicable to online, time-sensitive applications due to their low efficiency and ignorance of temporal features. In this paper, we propose a distributed streaming framework for spatial-temporal data clustering, which accepts various clustering algorithms while ensuring low resource consumption and result correctness. The framework includes a dynamic partitioning strategy for continuous load balancing and a cluster-merging algorithm based on convex hulls [10], which guarantees result correctness. Extensive experiments on a real dataset demonstrate the effectiveness of our proposed framework and its advantage over existing solutions.
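A convex-hull-based merge of the kind mentioned above can be sketched as follows: when two partial clusters from neighbouring partitions are combined, only their hull vertices need to be exchanged and re-hulled. The code uses Andrew's monotone chain algorithm; it is an illustrative sketch under that assumption, not the framework's implementation.

```cpp
// Minimal sketch (assumed, not the framework's code): merge two partial
// clusters by recomputing the convex hull of their combined hull points.
#include <algorithm>
#include <iostream>
#include <vector>

struct Pt { double x, y; };

static double cross(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain: returns hull vertices in counter-clockwise order.
std::vector<Pt> convex_hull(std::vector<Pt> p) {
    std::sort(p.begin(), p.end(), [](const Pt& a, const Pt& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    if (p.size() < 3) return p;
    std::vector<Pt> h(2 * p.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {                  // lower hull
        while (k >= 2 && cross(h[k - 2], h[k - 1], p[i]) <= 0) --k;
        h[k++] = p[i];
    }
    for (std::size_t i = p.size() - 1, t = k + 1; i-- > 0;) {     // upper hull
        while (k >= t && cross(h[k - 2], h[k - 1], p[i]) <= 0) --k;
        h[k++] = p[i];
    }
    h.resize(k - 1);
    return h;
}

int main() {
    std::vector<Pt> a = {{0, 0}, {1, 2}, {2, 0}};      // hull of cluster A
    std::vector<Pt> b = {{1.5, 1}, {3, 1}, {2.5, 3}};  // hull of cluster B
    a.insert(a.end(), b.begin(), b.end());
    for (const Pt& p : convex_hull(a)) std::cout << '(' << p.x << ',' << p.y << ") ";
    std::cout << '\n';
}
```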
ISBN (print): 9783030389611; 9783030389604
Here, we describe a method for handling large graphs with data sizes exceeding memory capacity using minimal hardware resources. This method (called Pimiento) is a vertex-centric graph-processing framework on a single machine and represents a semi-external graph-computing system, where all vertices are stored in memory and all edges are stored externally in the compressed sparse row (CSR) storage format. Pimiento uses a multi-core CPU, memory, and multi-threaded data preprocessing to optimize disk I/O in order to reduce random-access overhead during graph-algorithm execution. An on-the-fly update-accumulation mechanism was designed to reduce the time the graph algorithm spends accessing disks during execution. Our experiments compared Pimiento with other graph-processing systems, including GraphChi, X-Stream, and FlashGraph, revealing that Pimiento achieved 7.5x, 4x, and 1.6x better performance, respectively, on large real-world and synthetic graphs in the same experimental environment.
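The semi-external layout, with vertex state and CSR offsets in memory and the edge array streamed from disk, can be sketched as follows. This is an assumed, self-contained illustration, not Pimiento's code; the file name, data types, and the toy vertex-centric pass are placeholders.

```cpp
// Minimal sketch of a semi-external CSR layout: per-vertex state and CSR
// offsets stay in memory, while the edge array is read from a binary file on
// disk, one vertex's adjacency slice at a time.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

int main() {
    // CSR offsets for 4 vertices; offsets[v]..offsets[v+1] index into edges.bin.
    std::vector<std::uint64_t> offsets = {0, 2, 3, 5, 6};
    std::vector<std::uint32_t> value(4, 0);            // in-memory vertex state

    // Write a tiny edge file so the example is self-contained.
    {
        std::vector<std::uint32_t> edges = {1, 2, 2, 0, 3, 0};
        std::ofstream out("edges.bin", std::ios::binary);
        out.write(reinterpret_cast<const char*>(edges.data()),
                  edges.size() * sizeof(std::uint32_t));
    }

    // One vertex-centric pass: each vertex adds 1 to every out-neighbour,
    // reading only its own adjacency slice from disk.
    std::ifstream in("edges.bin", std::ios::binary);
    for (std::size_t v = 0; v + 1 < offsets.size(); ++v) {
        std::uint64_t deg = offsets[v + 1] - offsets[v];
        std::vector<std::uint32_t> nbrs(deg);
        in.seekg(static_cast<std::streamoff>(offsets[v] * sizeof(std::uint32_t)));
        in.read(reinterpret_cast<char*>(nbrs.data()), deg * sizeof(std::uint32_t));
        for (std::uint32_t u : nbrs) ++value[u];
    }
    for (std::size_t v = 0; v < value.size(); ++v)
        std::cout << "vertex " << v << " received " << value[v] << " messages\n";
}
```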
ISBN (print): 9781728160443
Manycore architectures are mainly composed of a very large number of computing nodes interconnected by a multiplicity of links, usually forming a NoC-like mesh architecture. High-speed links provide higher throughput but are much more expensive than normal links, making the interconnection of the system a cost/performance trade-off. Simulating such architectures is very important in order to characterise the optimal network topology for a given problem. In this work we introduce SCALPsim, a simulation framework for evaluating routing algorithms and network properties in 1-D, 2-D and 3-D regular mesh topologies that simultaneously use links with different latency and throughput characteristics. These features are particularly interesting in large-scale systems with processing elements grouped into clusters, where communication properties differ greatly within and between clusters. This paper presents the framework and an application based on Cellular Self-Organizing Maps (CSOM).
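A minimal sketch of the kind of evaluation such a simulator performs is given below: estimating the latency of an XY-routed packet on a 2-D mesh in which links inside a cluster are fast and links crossing a cluster boundary are slower. The routing rule, cluster tiling, and latency values are assumptions for illustration, not SCALPsim's actual model.

```cpp
// Minimal sketch (assumed, not SCALPsim itself): latency of an XY-routed packet
// on a 2-D mesh with fast intra-cluster links and slower inter-cluster links.
#include <iostream>

struct Node { int x, y; };

// Cluster index of a coordinate when the mesh is tiled into fixed-size blocks.
static int cluster_of(int coord, int cluster_dim) { return coord / cluster_dim; }

// XY routing: move along X first, then along Y; sum per-hop link latencies.
double xy_latency(Node src, Node dst, int cluster_w, int cluster_h,
                  double intra_lat, double inter_lat) {
    double total = 0.0;
    Node cur = src;
    while (cur.x != dst.x || cur.y != dst.y) {
        Node next = cur;
        if (cur.x != dst.x) next.x += (dst.x > cur.x) ? 1 : -1;
        else                next.y += (dst.y > cur.y) ? 1 : -1;
        bool crosses =
            cluster_of(cur.x, cluster_w) != cluster_of(next.x, cluster_w) ||
            cluster_of(cur.y, cluster_h) != cluster_of(next.y, cluster_h);
        total += crosses ? inter_lat : intra_lat;
        cur = next;
    }
    return total;
}

int main() {
    // 8x8 mesh tiled into 4x4 clusters; inter-cluster links are 4x slower.
    std::cout << xy_latency({1, 1}, {6, 5}, 4, 4, 1.0, 4.0) << " cycles\n";
}
```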