ISBN: 9781538637906 (print)
The data in scientific and engineering computations is usually endowed with physical meaning in the presence of nonnegativity. For high performance and effectiveness, it is both necessary and beneficial to exploit this prior nonnegativity when processing nonnegative information. In this paper, we propose a Projective Hard Thresholding Pursuit (PHTP) method for nonnegative sparse signal recovery. It reconstructs a sparse signal by combining nonnegative projection with HTP. Theoretically, we prove that the proposed algorithm can find all nonnegative s-sparse signals provided the sensing matrix has a suitable restricted isometry property. Moreover, we extend this result to its fast version. In addition, we verify PHTP's convergence when the measurement error is nonzero. Numerical experiments demonstrate that PHTP outperforms HTP and Nonnegative Least Squares (NNLS) for nonnegative sparse recovery and denoising.
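The abstract does not spell out the PHTP iteration, so the following is only a minimal sketch of how a nonnegative projection can be combined with a standard HTP step. The function name `phtp_sketch`, the unit step size, and the final clipping of the least-squares coefficients are assumptions made for illustration, not the authors' algorithm.

```python
import numpy as np

def phtp_sketch(A, y, s, max_iter=100):
    """Nonnegative s-sparse recovery: HTP iteration with a projection onto x >= 0.

    Illustrative only; step size and clipping rule are assumptions.
    """
    n = A.shape[1]
    x = np.zeros(n)
    for _ in range(max_iter):
        # Gradient step on 0.5 * ||y - A x||^2 with unit step size (assumption).
        g = x + A.T @ (y - A @ x)
        # Nonnegative projection: negative entries cannot belong to the signal.
        g = np.maximum(g, 0.0)
        # Hard thresholding: keep the indices of the s largest entries.
        support = np.argsort(g)[-s:]
        # Pursuit step: least squares restricted to the chosen support,
        # then clip so the iterate stays nonnegative.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x_new = np.zeros(n)
        x_new[support] = np.maximum(coef, 0.0)
        if np.allclose(x_new, x):
            break
        x = x_new
    return x
```

By construction, every iterate is nonnegative and has at most `s` nonzero entries; whether it converges to the true signal depends on the sensing matrix, which is what the restricted isometry condition in the abstract addresses.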
ISBN: 9781479986484 (print)
In this paper, we study parallel data access on distributed file systems, e.g., the Hadoop file system. Our experiments show that parallel data read requests are often served remotely and in an imbalanced fashion. This results in serious disk access and data transfer contention on certain cluster/storage nodes. We conduct a complete analysis of how remote and imbalanced read patterns occur and how they are affected by the size of the cluster. We then propose a novel method to Optimize Parallel Data Access on Distributed File Systems, referred to as Opass. The goal of Opass is to reduce remote parallel data accesses and achieve a better balance of data read requests between cluster nodes. To achieve this goal, we represent the data read requests that are issued by parallel applications to cluster nodes as a graph data structure in which edge weights encode the demands of data locality and load capacity. We then propose new matching-based algorithms that match processes to data based on the configuration of the graph so as to compute the maximum degree of data locality and balanced access. Our proposed method can benefit parallel data-intensive analysis with various parallel data access strategies. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results from both benchmarks and well-known parallel applications show the performance benefits and scalability of Opass.
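The matching idea can be illustrated with a small bipartite assignment. The sketch below is not the Opass algorithm itself: it assumes one reader process per node, a uniform per-node `quota` standing in for load capacity, and an unweighted graph, and it uses a textbook augmenting-path matching. The names `match_chunks_to_nodes`, `replicas`, and `quota` are hypothetical.

```python
def match_chunks_to_nodes(replicas, quota):
    """Assign each chunk to a node storing a replica, at most `quota` chunks per node.

    replicas: dict mapping chunk id -> list of nodes holding a copy (hypothetical input).
    Returns (local, remote): node -> chunks it reads locally, and the chunks
    that could not be placed and must be read remotely.
    """
    local = {node: [] for nodes in replicas.values() for node in nodes}

    def try_assign(chunk, visited):
        for node in replicas[chunk]:
            if node in visited:
                continue
            visited.add(node)
            if len(local[node]) < quota:
                local[node].append(chunk)
                return True
            # Node is full: try to move one of its chunks to another replica holder.
            for i, other in enumerate(local[node]):
                if try_assign(other, visited):
                    local[node][i] = chunk
                    return True
        return False

    remote = [c for c in replicas if not try_assign(c, set())]
    return local, remote
```

With `replicas = {"c2": ["n0", "n1"], "c0": ["n0"], "c1": ["n0"], "c3": ["n1"]}` and `quota=2`, a first-fit assignment would fill `n0` with `c2` and `c0` and leave `c1` to be read remotely, while the augmenting step moves `c2` to `n1` so that all four chunks are read locally.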
Authors: Wang, Yan; Wang, Xin. Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
ISBN: 9780769546766 (print)
Distributed storage systems (DSS) play an important role in data storage applications, since they provide high reliability for huge data storage requirements. As node failures are frequent in a large distributed storage system, the performance of repairing failed nodes has attracted the interest of many researchers. In this paper, we propose a distributed storage code that minimizes the coding complexity during the repair process, at the cost of larger redundancy. Our code construction is based on regular graphs and exploits simple look-up repair. We analyze the performance of the proposed code and compare it with existing distributed storage codes. Analytical results show that the proposed code outperforms the others in terms of repair complexity and disk I/O overhead.
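The abstract does not give the code construction, so the sketch below shows only one plausible reading of "regular graph plus look-up repair": every vertex is a storage node, every edge is a block stored on both of its endpoints, and a failed node is rebuilt by copying each lost block from the other endpoint with no decoding. The complete graph and the names `build_placement` and `repair_plan` are assumptions chosen for illustration.

```python
from itertools import combinations

def build_placement(num_nodes):
    """Complete graph K_n (an (n-1)-regular graph): one block per edge, stored on both endpoints."""
    edges = list(combinations(range(num_nodes), 2))
    placement = {v: {} for v in range(num_nodes)}
    for block_id, (u, v) in enumerate(edges):
        placement[u][block_id] = v   # block -> the other node holding the same block
        placement[v][block_id] = u
    return placement

def repair_plan(placement, failed):
    """Look-up repair: each lost block is copied verbatim from its single surviving holder."""
    return dict(placement[failed])
```

In this layout every block is stored twice, which is the redundancy paid for a repair that reads exactly one block from each helper node and performs no arithmetic.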
ISBN: 9781728168760 (digital)
ISBN: 9781728168760 (print)
High performance computing applications, runtimes, and platforms are becoming more configurable to enable applications to obtain better performance. As a result, users are increasingly presented with a multitude of options to configure application-specific as well as platform-level parameters. The combined effect of different parameter choices on application performance is difficult to predict, and an exhaustive evaluation of this combinatorial parameter space is practically infeasible. One approach to parameter selection is a user-guided exploration of a part of the space. However, such an ad hoc exploration of the parameter space can result in suboptimal choices. Therefore, an automatic approach that can efficiently explore the parameter space is needed. In this paper, we propose HiPerBOt, a Bayesian-optimization-based configuration selection framework to identify application and platform-level parameters that result in high-performing configurations. We demonstrate the effectiveness of HiPerBOt in tuning parameters that include compiler flags, runtime settings, and application-level options for several parallel codes, including Kripke, Hypre, LULESH, and OpenAtom.
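To make the idea of Bayesian optimization over a configuration space concrete, here is a generic sketch. It is not HiPerBOt: the abstract does not describe the surrogate model or acquisition rule, so this example assumes a Gaussian-process surrogate with an RBF kernel and a lower-confidence-bound acquisition over a finite candidate list. The names `bayes_opt`, `rbf`, and `kappa` are hypothetical, and `objective` stands for a measured runtime to be minimized.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two sets of configuration vectors."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def bayes_opt(objective, candidates, n_init=3, budget=10, kappa=2.0, seed=0):
    """Minimize `objective` over a finite list of numeric configurations."""
    rng = np.random.default_rng(seed)
    X = np.asarray(candidates, dtype=float)
    tried = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    values = [objective(X[i]) for i in tried]
    while len(tried) < budget:
        y = np.array(values)
        y_norm = (y - y.mean()) / (y.std() + 1e-9)
        # GP posterior mean and variance at every candidate (unit prior variance).
        K = rbf(X[tried], X[tried]) + 1e-6 * np.eye(len(tried))
        Ks = rbf(X, X[tried])
        mean = Ks @ np.linalg.solve(K, y_norm)
        var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
        # Lower confidence bound: prefer low predicted runtime or high uncertainty.
        lcb = mean - kappa * np.sqrt(np.clip(var, 0.0, None))
        lcb[tried] = np.inf  # never re-evaluate a configuration
        nxt = int(np.argmin(lcb))
        tried.append(nxt)
        values.append(objective(X[nxt]))
    best = int(np.argmin(values))
    return X[tried[best]], values[best], tried
```

The budget caps the number of expensive application runs, which is the point of replacing exhaustive search with a model-guided one.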
ISBN: 9781479942930 (print)
In the era of smart cities, huge data volumes are continuously generated and collected, thus prompting the need for efficient and distributed data mining approaches. Generalized itemset mining is an established data mining technique, which entails the discovery of multiple-level patterns hidden in the analyzed data by exploiting analyst-provided taxonomies. Among the generalized itemsets, the most peculiar high-level patterns are those with many contrasting correlations among items at different abstraction levels. They represent misleading situations that are worth analyzing separately by experts during manual inspection. This paper proposes a novel cloud-based service, named MGI-CLOUD, to efficiently mine misleading multiple-level patterns, i.e., the Misleading Generalized Itemsets, in a distributed computing environment. MGI-CLOUD consists of a set of distributed MapReduce jobs running in the cloud. As a case study, the system has been contextualized in a real-life scenario, i.e., the analysis of traffic law infractions committed in a smart city environment. The experiments, performed on real datasets, demonstrate the efficiency and effectiveness of MGI-CLOUD.
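As a rough illustration of how a taxonomy-aware support count can be phrased as a map and a reduce phase, the sketch below counts item pairs at the leaf level and at one generalized level. It is plain Python rather than an actual MapReduce job, it covers only the support-counting step, and it does not implement the test that marks an itemset as misleading, since the abstract does not define it. The names `map_transaction`, `reduce_counts`, and `taxonomy` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def map_transaction(items, taxonomy):
    """Emit ((level, itemset), 1) for every item pair at leaf level and at the generalized level."""
    leaf = sorted(set(items))
    general = sorted({taxonomy[i] for i in leaf})
    for level, names in (("leaf", leaf), ("general", general)):
        for pair in combinations(names, 2):
            yield (level, pair), 1

def reduce_counts(emitted):
    """Sum the emitted ones per key, as a reducer would after the shuffle."""
    counts = Counter()
    for key, one in emitted:
        counts[key] += one
    return counts
```

Comparing the count of a generalized pair with the counts of its leaf-level descendants is the kind of cross-level comparison that multiple-level pattern mining builds on.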
ISBN: 9783540680673 (print)
Although parallel systems with high peak performance are exciting, high peak performance often means high power consumption. In this paper, power-aware parallel systems are investigated, where each node can perform dynamic voltage scaling (DVS). Based on the characteristics of communication and memory access in MPI programs, a compiler is used to automatically form communication and computation regions, and to optimally assign frequency and voltage to the regions. The frequency and voltage of each node are dynamically adjusted, and energy consumption is minimized within the limit of performance loss. The results from simulations and experiments show that the compiler-directed energy-time tradeoff can save 20-40% of energy consumption with less than 5% performance loss.
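The region-level tradeoff can be sketched with a toy cost model. Everything in the model is an assumption made for illustration and is not taken from the paper: compute regions slow down in proportion to 1/f, communication regions are insensitive to CPU frequency, and power scales with the cube of the frequency. The names `region_cost` and `best_assignment` are hypothetical, and the exhaustive search is practical only for a handful of regions.

```python
from itertools import product

def region_cost(kind, seconds, f, f_max):
    """Return (time, energy) of one region at frequency f, in normalized units."""
    t = seconds * (f_max / f) if kind == "compute" else seconds
    energy = t * (f / f_max) ** 3   # assumed power model: P proportional to f^3
    return t, energy

def best_assignment(regions, levels, max_slowdown=0.05):
    """regions: list of (kind, seconds_at_full_speed); levels: available frequencies."""
    f_max = max(levels)
    base_time = sum(sec for _, sec in regions)
    best = None
    for choice in product(levels, repeat=len(regions)):
        costs = [region_cost(k, s, f, f_max) for (k, s), f in zip(regions, choice)]
        total_t = sum(t for t, _ in costs)
        total_e = sum(e for _, e in costs)
        within_limit = total_t <= base_time * (1 + max_slowdown)
        if within_limit and (best is None or total_e < best[1]):
            best = (choice, total_e, total_t)
    return best
```

Under these assumptions the search lowers the frequency in communication regions first, because doing so saves energy without lengthening the run.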
ISBN: 9780769546766 (print)
OLAP (online analytical processing) applications are based on a variety of aggregate queries on large-scale data. As aggregation is always performed on columns, traditional row-oriented storage, in which all the columns of a data row are stored together, seriously restricts aggregation performance. This paper proposes a dimension-oriented storage model based on HBase, and a new parallel aggregation technique, which accomplishes aggregation operations with parallel MapReduce jobs. Finally, in a comparison with Hive on the standard TPC-H data set, our technique is shown to significantly improve the performance of core aggregate operations.
ISBN: 9781665435741 (print)
The Yin-He global spectral model (YHGSM) embodies a parallel semi-Lagrangian solver and has two schemes implemented: a maximum wind speed scheme and an on-demand communication scheme. Maximum wind speed communication adopts a single, fixed data structure, which has a large communication overhead. Although the overhead of on-demand communication is reduced, it is still considerable. In this paper, a novel adaptable approach is proposed in which a monthly maximum wind speed is used in the YHGSM. This approach reduces the difference between the actual wind speed and the maximum wind speed used in the model; in turn, the communication overhead in the trajectory computation is further reduced. Experiments show that in both the maximum wind speed scheme and the on-demand scheme, the communication overheads with the adaptive maximum wind speed are significantly reduced. In addition, in a ten-day forecast with the on-demand communication scheme, the total overhead for the semi-Lagrangian computation and the total parallel execution time are both reduced, and the reduction ratio increases as the number of nodes increases.
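Why a smaller assumed maximum wind speed reduces communication can be seen from a back-of-the-envelope halo calculation. The sketch assumes that the halo exchanged for trajectory computation must cover the farthest distance an air parcel can travel in one time step; the function name and all numeric values are hypothetical and are not taken from YHGSM.

```python
import math

def halo_width(max_wind_mps, dt_s, dx_m):
    """Number of grid points a parcel can cross in one step at the assumed maximum wind."""
    return math.ceil(max_wind_mps * dt_s / dx_m)

# Hypothetical values: 600 s time step, 25 km grid spacing.
fixed = halo_width(100.0, 600.0, 25_000.0)    # single conservative maximum
monthly = halo_width(60.0, 600.0, 25_000.0)   # tighter monthly maximum
```

With these illustrative numbers the halo shrinks from 3 grid points to 2, so each exchange moves less data.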
ISBN: 9780769546759 (print)
Overlapping computation and communication is key to accelerating stencil applications on parallel computers, especially on GPU clusters. However, such programming is a time-consuming part of stencil application development. To address this problem, we developed an automatic code generation tool that produces a parallel stencil application with latency hiding automatically from its dataflow model. With this tool, users visually construct the workflows of stencil applications in a dataflow programming model. Our dataflow compiler determines a data decomposition policy for each application, and generates source code that overlaps the stencil computations and communication (MPI and PCIe). We demonstrate two types of overlapping models, a CPU-GPU hybrid execution model and a GPU-only model. We use a CFD benchmark computing 19-point 3D stencils to evaluate our scheduling performance, which results in 1.45 TFLOPS in single precision on a cluster with 64 Tesla C1060 GPUs.
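The overlap pattern itself is simple to show on a 1D three-point stencil: interior points need no neighbor data, so they can be updated while the halo exchange is in flight, and only the two boundary points wait for it. In this sketch a Python thread stands in for the MPI or PCIe transfer, and `step_with_overlap` and `fetch_halos` are hypothetical names; it is not the code produced by the tool described above.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def stencil(u):
    """3-point average on the interior points of u (output has len(u) - 2 entries)."""
    return (u[:-2] + u[1:-1] + u[2:]) / 3.0

def step_with_overlap(local, fetch_halos):
    """local: this rank's cells without halos; fetch_halos() returns (left, right) neighbor cells."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_halos)   # communication starts
        interior = stencil(local)            # needs no halo data, runs meanwhile
        left, right = pending.result()       # wait for communication to finish
    # Boundary points are the only ones that depend on the received halos.
    first = (left + local[0] + local[1]) / 3.0
    last = (local[-2] + local[-1] + right) / 3.0
    return np.concatenate(([first], interior, [last]))
```

The result is identical to applying the stencil to the subdomain padded with its halos; only the ordering of the work changes.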
ISBN: 9780769544151 (print)
The demand for network packet processing is increasing as applications require more and more bandwidth and computing capability. In this paper, the characteristics of packet processing and multi-core systems are analyzed. To analyze the differences between serial and parallel packet processing, a packet processing model is proposed. Finally, a DPI experiment on a multi-core system is carried out to verify the analysis. The results show that the proportion of the parallel part and the load balancing of computing resources affect performance.
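The abstract does not give the model's equations. As a hedged illustration of how the two factors named in the results can enter a performance estimate, the standard Amdahl-style relation can be extended with a load-imbalance term. Here $p$ is the parallel fraction of the processing work, $n$ the number of cores, and $w_i$ the share of the parallel work assigned to core $i$; all symbols are assumptions introduced for this example.

```latex
% Assumed model: the serial part runs on one core, the parallel part
% finishes when the most heavily loaded core finishes.
\[
  S(n) \;=\; \frac{T(1)}{T(n)}
       \;=\; \frac{1}{(1 - p) + p \,\max_{1 \le i \le n} w_i},
  \qquad \sum_{i=1}^{n} w_i = 1 .
\]
% With perfect load balancing, w_i = 1/n for every core, and the
% expression reduces to Amdahl's law:
\[
  S(n) \;=\; \frac{1}{(1 - p) + \dfrac{p}{n}} .
\]
```

Under this model, both a smaller parallel fraction $p$ and a larger maximum share $\max_i w_i$ lower the achievable speedup.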