Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their use in stream processing applications has increased as well. Efficient utilization of GPU accelerators in streaming scenarios requires batching input elements into microbatches, whose computation is offloaded to the GPU to exploit data parallelism within the same batch of data. Since data elements arrive continuously at the input rate, the larger the microbatch, the higher the latency to buffer it completely and to start processing on the device. Unfortunately, stream processing applications often have strict latency requirements, which makes it necessary to find the best microbatch size and to adapt it dynamically based on workload conditions as well as on the characteristics of the underlying device and network. In this work, we aim to implement latency-aware adaptive microbatching techniques and algorithms for streaming compression applications targeting GPUs. The evaluation is conducted using the Lempel-Ziv-Storer-Szymanski compression application with different input workloads. As a general result of our work, we observed that algorithms with elastic adaptation factors respond better to stable workloads, while algorithms with narrower targets respond better to highly unbalanced workloads.
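The abstract does not spell out the adaptation algorithms themselves; the following is a minimal, hypothetical sketch (C++/CUDA host-side logic) of a latency-aware controller that resizes the microbatch to track a latency budget, in the spirit of the elastic adaptation factors mentioned above. The struct, field names, and multiplicative adjustment are illustrative assumptions, not the paper's algorithms.

#include <cstddef>

// Hypothetical latency-aware microbatch controller (illustrative only).
// After each offloaded batch, compare the observed end-to-end latency
// (buffering + GPU processing) against the latency budget and resize
// the next microbatch accordingly.
struct BatchController {
    std::size_t batch_size;        // current microbatch size (elements)
    std::size_t min_size, max_size;
    double target_latency_ms;      // latency budget per microbatch
    double factor;                 // elastic adaptation factor, e.g. 1.25

    void adapt(double observed_latency_ms) {
        if (observed_latency_ms > target_latency_ms) {
            // Too slow: smaller batches buffer (and finish) sooner.
            batch_size = static_cast<std::size_t>(batch_size / factor);
        } else {
            // Headroom left: larger batches improve GPU utilization.
            batch_size = static_cast<std::size_t>(batch_size * factor);
        }
        if (batch_size < min_size) batch_size = min_size;
        if (batch_size > max_size) batch_size = max_size;
    }
};

A "narrower target" variant would only resize when the observed latency leaves a tight band around the budget, rather than after every batch.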
This paper aimed to implement both sequential and parallel (CUDA) versions of matrix multiplication to examine the differences and effects, followed by an analysis of the results. We used the algorithm as ...
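Since the abstract is truncated and the exact kernels are not given, the following shows the standard comparison it alludes to: a sequential triple-loop matrix multiplication next to a naive CUDA kernel in which one thread computes one output element. Square, row-major N x N matrices are assumed.

// Sequential reference: classic triple loop over row-major N x N matrices.
void matmul_seq(const float* A, const float* B, float* C, int N) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < N; ++k)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

// Parallel version: one CUDA thread computes one output element.
__global__ void matmul_kernel(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}

// Launch example (device buffers d_A, d_B, d_C already allocated and copied):
//   dim3 block(16, 16);
//   dim3 grid((N + 15) / 16, (N + 15) / 16);
//   matmul_kernel<<<grid, block>>>(d_A, d_B, d_C, N);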
Nowadays, parallel applications are used every day in high performance computing and scientific computing, as well as in everyday tasks, due to the pervasiveness of multi-core architectures. However, several implementation c...
The sustainability of service and manufacturing operations relies heavily on the availability of equipment and assets. High availability of assets can be achieved with effective maintenance strategies. In this direction, we study a multi-skilled workforce planning problem to establish a resilient maintenance service network for high-value assets. We improve the efficiency of the maintenance network by optimising the workforce capacity in repair shops and achieving workforce heterogeneity through cross-training. As a solution strategy, we develop a two-stage iterative heuristic algorithm. In the first stage, the set of all feasible cross-training policies is searched effectively and systematically via a state-of-the-art multi-thread simulated annealing (MTSA) metaheuristic to find a policy (or policies) that achieves the minimum cost. The developed MTSA algorithm is further enhanced with a multi-neighbourhood feature to escape from local optima and is implemented via parallel programming techniques. In the second stage, workforce capacity and spare parts inventory levels are optimised for the cross-training policy found in the first stage by a queuing approximation and a greedy heuristic. MTSA obtains the lowest cost in 91 out of 128 cases compared to genetic algorithm (GA), variable neighbourhood search (VNS), an improved single-thread simulated annealing (SA), and integer programming-based clustering (IPBC) algorithms.
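As an illustration of the first-stage idea only, here is a generic multi-thread simulated annealing skeleton in C++: each thread anneals its own copy of a cross-training policy (encoded here as a hypothetical 0/1 vector) and the globally best policy is kept under a mutex. The cost function is a placeholder; the paper's multi-neighbourhood moves, queuing approximation, and greedy second stage are not reproduced.

#include <atomic>
#include <cmath>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

using Policy = std::vector<int>;   // hypothetical encoding: cross-train technician i or not

// Placeholder cost model; the real evaluation would use the queuing approximation.
double cost(const Policy& p) {
    double c = 0.0;
    for (int x : p) c += x ? 2.0 : 5.0;  // stub: training cost vs. expected downtime cost
    return c;
}

// Generic multi-thread simulated annealing skeleton (illustrative only).
void mtsa(Policy init, int n_threads, Policy& best, double& best_cost) {
    std::mutex m;
    best = init;
    best_cost = cost(init);
    auto worker = [&](unsigned seed) {
        std::mt19937 rng(seed);
        Policy cur = init;
        double cur_cost = cost(cur), T = 1000.0;
        while (T > 1e-3) {
            Policy cand = cur;                // neighbour move: flip one training decision
            std::uniform_int_distribution<std::size_t> pick(0, cand.size() - 1);
            cand[pick(rng)] ^= 1;
            double c = cost(cand);
            std::uniform_real_distribution<double> u(0.0, 1.0);
            if (c < cur_cost || u(rng) < std::exp((cur_cost - c) / T)) {
                cur = cand; cur_cost = c;
                std::lock_guard<std::mutex> lk(m);
                if (cur_cost < best_cost) { best = cur; best_cost = cur_cost; }
            }
            T *= 0.995;                       // geometric cooling schedule
        }
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t) pool.emplace_back(worker, 1234u + t);
    for (auto& th : pool) th.join();
}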
Parallel computers are everywhere. Over the last few years, a paradigm shift occurred in the computer industry. Mainly due to power dissipation constraints and memory access time limitations, rather than increasin...
With the widespread use of multicore systems with smaller transistor sizes, soft errors have become an important issue for parallel program execution. Fault injection is a prevalent method to quantify the soft error rates of applications. However, performing detailed fault injection experiments is very time consuming. Therefore, prediction-based techniques have been proposed to evaluate soft error vulnerability in a faster way. In this work, we present a soft error vulnerability prediction approach for parallel applications using machine learning algorithms. We define a set of features including thread communication, data sharing, parallel programming, and performance characteristics, and train our models based on three ML algorithms. This study uses parallel programming features, as well as the combination of all features, for the first time in vulnerability prediction of parallel programs. We propose two models for soft error vulnerability prediction: (1) a regression model with rigorous feature selection analysis that estimates correct execution rates, and (2) a novel classification model that predicts the vulnerability level of the target programs. We obtain a maximum prediction accuracy of 73.2% for the regression-based model and achieve an F-score of 89% for our classification model.
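The abstract does not name the three ML algorithms or any trained parameters; the sketch below only illustrates the two prediction views with a hypothetical feature vector and an untrained linear scorer, thresholded into coarse vulnerability levels. All names, weights, and thresholds are placeholders, not the paper's models.

#include <string>
#include <vector>

// Hypothetical feature vector for one parallel program, following the
// feature groups named in the abstract (illustrative names and scales).
struct ProgramFeatures {
    double thread_comm;     // e.g., inter-thread communication intensity
    double data_sharing;    // e.g., fraction of shared-data accesses
    double parallel_api;    // e.g., synchronization-construct density
    double perf;            // e.g., instructions per cycle
};

// Regression view: estimate the correct-execution rate with a linear model.
// The weights would come from training (the paper uses three ML algorithms);
// here they are simply a caller-supplied placeholder vector of size 5.
double predict_correct_rate(const ProgramFeatures& f, const std::vector<double>& w) {
    double y = w[0] + w[1] * f.thread_comm + w[2] * f.data_sharing
             + w[3] * f.parallel_api + w[4] * f.perf;
    if (y < 0.0) y = 0.0;
    if (y > 1.0) y = 1.0;
    return y;  // estimated probability of correct execution under soft errors
}

// Classification view: bucket the estimate into coarse vulnerability levels.
std::string vulnerability_level(double correct_rate) {
    if (correct_rate > 0.9) return "low";
    if (correct_rate > 0.7) return "medium";
    return "high";
}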
The Consultative Committee for Space Data Systems (CCSDS)-123 is a standard for lossless compression of multispectral and hyperspectral images with applications in on-board power-constrained systems, such as satellites and military drones. This letter explores the low-power heterogeneous architecture of the Nvidia Jetson TX2 by proposing a parallel implementation of the CCSDS-123 compressor on embedded systems, reducing development effort compared with the production of dedicated circuits while maintaining low energy consumption. This solution parallelizes the predictor on a low-power graphics processing unit (GPU), while the encoders exploit the heterogeneous multiple CPU cores and the GPU concurrently. We report more than 16.6 Gb/s for the predictor and 1.4 Gb/s for the whole system, requiring less than 6.3 W and providing an efficiency of 245.6 Mb/s/W.
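The letter's actual predictor and encoders are not reproduced here; the following is a simplified sketch of the host-side orchestration pattern it describes, with a placeholder kernel standing in for the CCSDS-123 predictor on a CUDA stream while a CPU thread encodes the previously predicted block. Function names, buffer layout, and block structure are assumptions.

#include <cstddef>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the CCSDS-123 predictor (not the real one).
__global__ void predict_block(const unsigned short* in, unsigned short* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];   // real code would compute prediction residuals
}

// Stub CPU encoder; a real implementation would entropy-code the residuals.
void encode_block_cpu(const unsigned short* residuals, int n) { (void)residuals; (void)n; }

// Simplified heterogeneous pipeline: while the GPU predicts block b, a CPU
// thread encodes block b-1, overlapping prediction and encoding.
// d_in/d_out are preallocated device buffers; h_residuals are host buffers.
void compress(const std::vector<unsigned short*>& blocks, int block_len,
              unsigned short* d_in, unsigned short* d_out,
              std::vector<unsigned short*>& h_residuals) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        cudaMemcpyAsync(d_in, blocks[b], block_len * sizeof(unsigned short),
                        cudaMemcpyHostToDevice, stream);
        predict_block<<<(block_len + 255) / 256, 256, 0, stream>>>(d_in, d_out, block_len);
        cudaMemcpyAsync(h_residuals[b], d_out, block_len * sizeof(unsigned short),
                        cudaMemcpyDeviceToHost, stream);
        if (b > 0) {  // encode the previous block on the CPU while the GPU works
            std::thread enc(encode_block_cpu, h_residuals[b - 1], block_len);
            enc.join();  // a real pipeline would use a thread pool instead
        }
        cudaStreamSynchronize(stream);
    }
    encode_block_cpu(h_residuals.back(), block_len);
    cudaStreamDestroy(stream);
}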
The RSA algorithm is an asymmetric encryption algorithm used to ensure the confidentiality and integrity of data as it travels across networks. Security has grown in importance over time, resulting in more data requ...
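To ground the description of RSA, here is a textbook-scale worked example using square-and-multiply modular exponentiation with the classic toy parameters p = 61, q = 53 (so n = 3233, e = 17, d = 2753). Real deployments use 2048-bit or larger moduli and padding schemes; the numbers below are purely illustrative.

#include <cstdint>
#include <cstdio>

// Square-and-multiply modular exponentiation: computes (base^exp) mod mod.
std::uint64_t powmod(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    std::uint64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

int main() {
    // Textbook-sized keys (p = 61, q = 53): n = 3233, e = 17, d = 2753.
    const std::uint64_t n = 3233, e = 17, d = 2753;
    std::uint64_t m = 65;                    // plaintext message (must be < n)
    std::uint64_t c = powmod(m, e, n);       // encrypt: c = m^e mod n
    std::uint64_t r = powmod(c, d, n);       // decrypt: r = c^d mod n
    std::printf("plaintext=%llu cipher=%llu decrypted=%llu\n",
                (unsigned long long)m, (unsigned long long)c, (unsigned long long)r);
    return 0;
}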
This paper describes the GR1 algorithm, which provides feasible execution times for the subgraph isomorphism problem. It is a parallel algorithm that uses a variant of the producer–consumer pattern. It was desig...
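GR1's actual search strategy is not given in the truncated abstract; the skeleton below only illustrates the producer–consumer pattern it mentions, with a shared work queue into which a producer would push candidate partial mappings and from which consumer threads would pop and extend them. The WorkQueue type and the item encoding are illustrative assumptions.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <vector>

// Generic producer-consumer work queue (not GR1 itself).
// A work item here is a partial vertex mapping encoded as a vector of indices.
struct WorkQueue {
    std::queue<std::vector<int>> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    void push(std::vector<int> item) {            // producer side
        { std::lock_guard<std::mutex> lk(m); q.push(std::move(item)); }
        cv.notify_one();
    }
    bool pop(std::vector<int>& item) {            // consumer side
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !q.empty() || done; });
        if (q.empty()) return false;              // finished: no more items
        item = std::move(q.front());
        q.pop();
        return true;
    }
    void finish() {                                // producer signals completion
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_all();
    }
};

// Usage sketch: consumer threads loop on `while (wq.pop(item)) { /* extend mapping */ }`
// while the producer enumerates candidates and calls wq.finish() when done.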
Since its first release in 2015, OpenTimer v1 has been used in many industrial and academic projects for analyzing the timing of custom designs. After four years of research and development, we have announced OpenTimer v2, a major release that efficiently supports: 1) a new task-based parallel incremental timing analysis engine to break through the performance bottleneck of existing loop-based methods; 2) a new application programming interface (API) concept to exploit high degrees of parallelism; and 3) enhanced support for industry-standard design formats to improve user experience. Compared with OpenTimer v1, we rearchitected v2 with a modern C++ programming language and advanced parallel computing techniques to largely improve the tool's performance and usability. As a particular example, OpenTimer v2 achieved up to a 5.33x speedup over v1 in incremental timing and scaled higher with increasing core counts. Our contributions include both technical innovations and engineering knowledge that are open and accessible to promote timing research in the community.
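OpenTimer v2's engine is not reproduced here; the sketch below only illustrates the task-based idea, assuming a Taskflow-style task-graph library (tf::Taskflow, tf::Executor) is available: each gate's timing update is a task that precedes the tasks of its fanout gates, so independent logic cones are updated in parallel by worker threads. The Gate struct and the update_timing stub are assumptions, not OpenTimer's data model.

#include <taskflow/taskflow.hpp>   // assumes the open-source Taskflow library
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Gate { std::string name; std::vector<std::string> fanout; };

// Stub: a real update would propagate arrival/required times through the gate.
void update_timing(const Gate& g) { (void)g; }

// Minimal sketch of task-based timing propagation over a circuit graph.
void propagate(const std::vector<Gate>& gates) {
    tf::Taskflow taskflow;
    tf::Executor executor;
    std::unordered_map<std::string, tf::Task> tasks;
    for (std::size_t i = 0; i < gates.size(); ++i)
        tasks.emplace(gates[i].name,
                      taskflow.emplace([&gates, i] { update_timing(gates[i]); }));
    for (const Gate& g : gates)
        for (const std::string& out : g.fanout)
            tasks.at(g.name).precede(tasks.at(out));   // respect circuit topology
    executor.run(taskflow).wait();
}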