ISBN:
(Print) 9781509060580
Recent advances in computing and sensor technologies have facilitated the emergence of increasingly sophisticated and complex cyber-physical systems and wireless sensor networks. Moreover, integrating cyber-physical systems and wireless sensor networks with other contemporary technologies, such as unmanned aerial vehicles (i.e. drones) and fog computing, enables the creation of completely new smart solutions. Building upon the concept of a Smart Mobile Access Point (SMAP), a key element of a smart network, we propose a novel hierarchical placement strategy for SMAPs that improves the scalability of SMAP-based monitoring systems. SMAPs predict communication behavior based on information collected from the network and select the best approach to support the network at any given time. To improve network performance, they can autonomously change their positions; the placement of SMAPs therefore plays an important role in such systems. The initial placement of SMAPs is an NP-hard problem, which we solve using a parallel implementation of a genetic algorithm with an efficient evaluation phase. The adopted hierarchical placement approach is scalable: it enables the construction of arbitrarily large SMAP-based systems.
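The genetic-algorithm placement step can be sketched as follows. This is a minimal sequential sketch in Python with toy sensor positions and a plain nearest-SMAP-distance fitness; the paper's actual SMAP model, parallel evaluation phase, and hierarchical strategy are not reproduced here.

```python
import random

# Toy instance: 40 sensors on a 100 x 100 field, 3 SMAPs to place.
# These numbers and the fitness function are illustrative assumptions.
random.seed(1)
SENSORS = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(40)]
N_SMAPS = 3

def fitness(placement):
    """Total distance from each sensor to its nearest SMAP (lower is better)."""
    return sum(min(((sx - px) ** 2 + (sy - py) ** 2) ** 0.5
                   for px, py in placement)
               for sx, sy in SENSORS)

def random_placement():
    return [(random.uniform(0, 100), random.uniform(0, 100))
            for _ in range(N_SMAPS)]

def evolve(pop_size=30, generations=60):
    pop = [random_placement() for _ in range(pop_size)]
    initial_best = min(fitness(p) for p in pop)
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]           # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_SMAPS)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(x + random.gauss(0, 5), y + random.gauss(0, 5))
                     if random.random() < 0.2 else (x, y)  # mutation
                     for x, y in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness), initial_best

best, initial_best = evolve()
```

Because the top half of each generation survives unchanged, the best placement can only improve over the initial random population.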
ISBN:
(Print) 9781509049318
Data-race-free (DRF) parallel programming is becoming the standard, as the newly adopted memory models of mainstream programming languages such as C++ and Java impose data-race freedom as a requirement. We propose compiler techniques that automatically delineate extended data-race-free (xDRF) regions, namely regions of code that provide the same guarantees as synchronization-free regions (in the context of DRF code). xDRF regions stretch across synchronization boundaries, function calls, and loop back-edges while preserving data-race-free semantics, thus increasing the optimization opportunities exposed to the compiler and to the underlying architecture. Our compiler techniques precisely analyze the threads' memory-access behavior and data sharing in shared-memory, general-purpose parallel applications and can therefore infer the limits of xDRF code regions. We evaluate the potential of our technique by employing the xDRF region classification in a state-of-the-art, dual-mode cache coherence protocol. Larger xDRF regions reduce coherence bookkeeping and enable improvements in performance (6.8%) and energy efficiency (11.7%) compared to a standard directory-based coherence protocol.
ISBN:
(Print) 9783319629322; 9783319629315
In this paper, a parallel implementation of the cellular-automata interference algorithm for two waves is proposed, using the fragmented programming technology and the LuNA system based on it. The technology is based on a data-flow control strategy. Unlike existing systems and technologies, LuNA provides a unified technology for implementing parallel programs on a heterogeneous multicomputer. A LuNA program contains a description of data fragments, computational fragments, and the information dependencies between them. In this work, the LuNA program was executed on a computational cluster with homogeneous nodes. A comparison of the LuNA and MPI implementations showed that the execution time of the LuNA program exceeded that of the MPI program; this is due to the particulars of the algorithms used for distributing, searching for, and transferring data and computation fragments between the nodes of a cluster. The complexity of writing the LuNA program, however, is much lower than for the MPI program.
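To make the fragmented programming model concrete, here is a minimal Python sketch of the idea. The decorator API and the scheduler below are invented for illustration and are not LuNA's actual syntax: computational fragments declare which data fragments they consume and produce, and a tiny scheduler runs any fragment whose inputs are ready.

```python
from collections import deque

data = {}       # data fragments produced so far, by name
fragments = []  # registered computational fragments: (inputs, outputs, fn)

def fragment(inputs, outputs):
    """Register a computational fragment with its data dependencies."""
    def register(fn):
        fragments.append((inputs, outputs, fn))
        return fn
    return register

@fragment(inputs=[], outputs=["a"])
def make_a():
    data["a"] = 2

@fragment(inputs=[], outputs=["b"])
def make_b():
    data["b"] = 3

@fragment(inputs=["a", "b"], outputs=["sum"])
def add():
    data["sum"] = data["a"] + data["b"]

def run():
    """Run fragments in data-flow order: a fragment executes only
    once all of its input data fragments exist."""
    pending = deque(fragments)
    while pending:
        ins, outs, fn = pending.popleft()
        if all(name in data for name in ins):
            fn()
        else:
            pending.append((ins, outs, fn))  # inputs not ready yet; retry

run()
```

Registration order does not matter here; `add` simply waits until both of its inputs have been produced, which is the essence of data-flow control.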
ISBN:
(Print) 9783319589435; 9783319589428
Global-scale human simulations have applications in diverse fields such as economics, anthropology, and marketing. The sheer number of agents, however, makes them extremely sensitive to variations in algorithmic complexity, resulting in potentially prohibitive computational resource costs. In this paper we show that the computational capability of modern servers has increased to the point where billions of individual agents can be modeled on moderate institutional resources and (in a few years) on high-end consumer systems. We close by proposing future frameworks to enable collaborative modelling of the global human population.
ISBN:
(Print) 9781538619964
Bayesian Network algorithms are widely applied in the fields of bioinformatics, document classification, big data, and marketing informatics. In this paper, several Bayesian Network algorithms are evaluated, including Naive Bayes, Tree Augmented Naive Bayes, k-BAN, and k-BAN with Order Swapping. The algorithms are implemented using Scala and compared with the bnlearn library in R and Weka. Several datasets with varying numbers of attributes and instances are used to test the accuracy and efficiency of the implementations of the algorithms provided by the three packages. When handling huge datasets, issues involving accuracy, efficiency, and serial vs. parallel execution become more critical and should be addressed. We implemented several parallel algorithms as well as an efficient way to perform cross-validations, resulting in significant speedups.
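As a reference point for the simplest of the evaluated algorithms, a categorical Naive Bayes classifier with Laplace smoothing can be sketched in a few lines. The paper's implementations are in Scala; this Python sketch, the tiny weather-style dataset, and the smoothing denominator are illustrative choices only.

```python
from collections import defaultdict, Counter
import math

def train(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    classes = Counter(labels)
    counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(i, y)][v] += 1
    return classes, counts

def predict(model, row):
    """Pick the class maximizing log P(class) + sum of log P(value | class)."""
    classes, counts = model
    total = sum(classes.values())
    best, best_lp = None, -math.inf
    for y, ny in classes.items():
        lp = math.log(ny / total)
        for i, v in enumerate(row):
            c = counts[(i, y)]
            # Laplace smoothing so unseen values get nonzero probability
            lp += math.log((c[v] + 1) / (ny + len(c) + 1))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)
```

With this toy data, `predict(model, ("rain", "mild"))` returns `"yes"`: the class prior is uniform, but "rain" has only been seen with label "yes".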
ISBN:
(Print) 9783319619828; 9783319619811
This paper introduces aspect libraries, a unit of modularity in parallel programs with compositional properties. Aspects address the complexity of parallel programs by enabling the composition of (multiple) parallelism modules with a given (sequential) base program. This paper illustrates the introduction of parallelism using reusable parallel libraries, coded in AspectJ. These libraries provide performance comparable to traditional parallel programming techniques and enable the composition of multiple parallelism modules (e.g., shared memory with distributed memory) with a given base program.
ISBN:
(Print) 9781538626849
Because of the wide use of randomized scheduling in concurrency testing research, it is important to understand randomized scheduling and its limitations. This work analyzes how randomized scheduling discovers concurrency bugs by focusing on the probabilities of the two possible orders of a pair of events. Analysis shows that the disparity between probabilities can be large for programs that encounter a large number of events during execution. Because sets of ordered event pairs define conditions for discovering concurrency bugs, this disparity can make some concurrency bugs highly unlikely. The complementary nature of the two possible orders also indicates a potential trade-off between the probability of discovering frequently-occurring and infrequently-occurring concurrency bugs. To help address this trade-off in a more balanced way, randomized-stride scheduling is proposed, where scheduling granularity for each thread is adjusted using a randomized stride calculated based on thread length. With some assumptions, strides can be calculated to allow covering the least likely event pair orders. Experiments confirm the analysis results and also suggest that randomized-stride scheduling is more effective for discovering concurrency bugs compared to the original randomized scheduling implementation, and compared to other algorithms in recent literature.
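The disparity the analysis describes is easy to reproduce with a toy Monte Carlo experiment in Python. The thread lengths and event positions below are invented values, and the scheduler is a simple fair coin rather than any particular tool's implementation.

```python
import random

random.seed(0)

def order_probability(len_a, len_b, event_a, event_b, trials=20000):
    """Estimate P(step event_a of thread A happens before step event_b of
    thread B) when a scheduler repeatedly picks a random runnable thread."""
    wins = 0
    for _ in range(trials):
        pa = pb = 0  # steps executed so far by each thread
        while True:
            run_a = pa < len_a and (pb >= len_b or random.random() < 0.5)
            if run_a:
                pa += 1
                if pa == event_a:
                    wins += pb < event_b  # A's event came first
                    break
            else:
                pb += 1
                if pb == event_b:
                    break                 # B's event came first
    return wins / trials

# A late event racing an early one: the two orders are far from 50/50,
# so a bug requiring the unlikely order is almost never observed.
p_late_first = order_probability(50, 50, 45, 5)   # close to 0
p_early_first = order_probability(50, 50, 5, 45)  # close to 1
```

This is exactly the trade-off the abstract describes: the common order of the pair is covered almost every run, while the opposite order is vanishingly rare under plain randomized scheduling.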
ISBN:
(Digital) 9783319522777
ISBN:
(Print) 9783319522777; 9783319522760
High Efficiency Video Coding (HEVC) is able to reduce the bit-rate by up to 50% compared to H.264/AVC, using increasingly complex computational processes for motion estimation. In this paper, some motion estimation operations are parallelised using the Open Computing Language (OpenCL) on a Graphics Processing Unit (GPU). The parallelisation strategy is three-fold: calculation of the distortion measure over 4 x 4 blocks, accumulation of the distortion measure values for different block sizes, and calculation of local minima. Moreover, we use 3D arrays to store the distortion measure values and the motion vectors; two 3D arrays are used for transferring data from the GPU to the CPU to continue the encoding process. The proposed parallelisation reduces the execution time by 52.5% on average compared to the HEVC Test Model, with a negligible impact on compression efficiency: an increase in BD-BR of 2.044% on average and a reduction in BD-PSNR of 0.062% on average.
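The first two stages of the strategy can be illustrated in plain Python; the paper's kernels are OpenCL on a GPU, and the frame contents below are toy data. The point is that per-4x4 distortion values are computed once and then accumulated, so larger block sizes reuse the 4x4 results instead of re-reading pixels.

```python
def sad4x4(cur, ref, bx, by):
    """Sum of absolute differences for the 4x4 block at (bx, by)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y][bx + x])
               for y in range(4) for x in range(4))

# Toy 8x8 "current" and "reference" frames.
cur = [[(x + y) % 7 for x in range(8)] for y in range(8)]
ref = [[(x * y) % 5 for x in range(8)] for y in range(8)]

# Stage 1: distortion for each of the four 4x4 blocks of the 8x8 region.
sads = {(bx, by): sad4x4(cur, ref, bx, by) for bx in (0, 4) for by in (0, 4)}

# Stage 2: the 8x8 distortion is just the accumulation of its 4x4 parts.
sad8x8 = sum(sads.values())
```

Because SAD is a sum over pixels, accumulating the four 4x4 values gives exactly the 8x8 SAD, and the same scheme extends to 16x16 and larger partitions.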
ISBN:
(Print) 9781450354943
We present ParCube, which finds the pairwise intersections in a set of millions of congruent cubes. This operation is required when computing boolean combinations of meshes or polyhedra in CAD/CAM and additive manufacturing, and in determining close points in a 3D set. ParCube is very compact because it is uses a uniform grid with a functional programming API. ParCube is very fast;even single threaded it usually beats CGAL's elapsed time, sometimes by a factor of 3. Also because it is FP, PARCUBE parallelizes very well. On an Nvidia GPU, processing 10M cubes to find 6M intersections, it took 0.33 elapsed seconds, beating CGAL by a factor of 131. PARCUBE is independent of the specific parallel architecture, whether shared memory multicore Intel Xeon using either OpenMP or TBB, or Nvidia GPUs with thousands of cores. We expect the principles used in PARCUBE to apply to other computational geometry problems. Efficiently finding all bipartite intersections would be an easy extension.
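The uniform-grid idea can be sketched sequentially in Python (ParCube itself is a parallel implementation; the cube coordinates below are toy data). Because the cubes are congruent with side S, a grid with cell size S guarantees that two intersecting cubes sit in cells differing by at most one along each axis, so only those candidate pairs need an exact test.

```python
from collections import defaultdict

S = 1.0  # common cube side; the grid cell size matches it

def intersects(a, b):
    """Open-interval overlap test for two axis-aligned cubes of side S,
    each given by its minimum corner."""
    return all(abs(a[i] - b[i]) < S for i in range(3))

def pairwise_intersections(cubes):
    # Bucket each cube by the grid cell containing its minimum corner.
    grid = defaultdict(list)
    for idx, c in enumerate(cubes):
        cell = tuple(int(c[i] // S) for i in range(3))
        grid[cell].append(idx)
    # Test each cube only against cubes in its 3x3x3 cell neighborhood.
    found = set()
    for (cx, cy, cz), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        for i in members:
                            if i < j and intersects(cubes[i], cubes[j]):
                                found.add((i, j))
    return found

cubes = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (3.0, 3.0, 3.0)]  # min corners
```

The per-cell work is independent, which is what makes this formulation parallelize so naturally on multicore CPUs and GPUs.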
ISBN:
(Print) 9781538625859
Convolution is one of the most important and fundamental operations in multimedia processing; in digital image processing, for example, 2D convolution is used for various filtering operations. It involves many arithmetic operations and is applied to every image pixel, making it a compute-intensive kernel. In order to improve its performance, in this paper we apply two approaches to vectorize it, broadcasting of coefficients and repetition of coefficients, using the Intrinsic Programming Model (IPM) and AVX technology. Our experimental results on an Intel Skylake microarchitecture show that the performance of broadcasting of coefficients is much higher than that of repetition of coefficients for different filter sizes and different image sizes. In addition, to evaluate the performance of Compiler Automatic Vectorization (CAV) and of the OpenCV library for this kernel, we use the GCC and LLVM compilers. Our experimental results show that both IPM implementations are faster than GCC's and LLVM's auto-vectorization.
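The "broadcasting of coefficients" idea can be sketched with NumPy broadcasting instead of AVX intrinsics: each filter coefficient is a scalar multiplied across a whole shifted window of the image, so the explicit loops run over the small filter rather than over pixels. This is a correlation-style filter on toy values, not the paper's intrinsic-level implementation.

```python
import numpy as np

def conv2d_broadcast(image, kernel):
    """Valid-mode 2D filtering: one broadcast multiply-accumulate
    per coefficient over a shifted view of the image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=image.dtype)
    for i in range(kh):
        for j in range(kw):
            # scalar coefficient broadcast over a whole window (a "vector" op)
            out += kernel[i, j] * image[i:i + oh, j:j + ow]
    return out

img = np.arange(16.0).reshape(4, 4)
box = np.ones((2, 2)) / 4.0  # 2x2 box (averaging) filter
result = conv2d_broadcast(img, box)
```

Each iteration mirrors one AVX broadcast-multiply-accumulate step: the coefficient is replicated across a vector of pixels, which is the essence of the broadcasting approach evaluated in the paper.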