Many of today's computing and communication models are distributed systems composed of autonomous computational entities that communicate with each other, usually by passing messages. Distributed systems encompass a variety of applications, and wireless sensor networks (WSNs) are an important one among them. In a WSN, tiny, multi-functional, low-power sensor nodes are interconnected to aggregate data and transmit it efficiently to the base station. Clustering-based schemes organize the network around a specially designated node, termed the cluster head, for the purposes of energy conservation and data aggregation. The cluster head is responsible for collecting the information gathered by the cluster member nodes and aggregating it before transmitting it to the base station. In this paper, a Reliable Cluster Head Selection Technique using Integrated Energy and Trust-based Semi-Markov Prediction (RCHST-IETSMP) is proposed with the aim of extending the lifetime of sensor networks. RCHST-IETSMP incorporates two significant parameters, energy and trust, for effective cluster head selection, facilitated by Semi-Markov prediction integrated with a Hyper-Erlang distribution process. Simulation results show that the proposed RCHST-IETSMP scheme sustains the residual energy of the network and the throughput at levels up to 23% and 19% higher, respectively, than the trust- and energy-based clustering schemes considered for comparison.
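As a minimal illustration of the kind of combined metric such schemes rely on, the sketch below scores each node by a weighted sum of normalized residual energy and trust and picks the top scorer as cluster head. The weight `alpha`, the node fields, and the linear scoring rule are illustrative assumptions, not the RCHST-IETSMP formulation.

```python
# Hypothetical energy- and trust-weighted cluster-head selection.
# alpha, the node attributes, and the linear score are illustrative
# assumptions, not the paper's Semi-Markov-based method.

def cluster_head_score(residual_energy, trust, alpha=0.6):
    """Combine normalized residual energy and trust into one score."""
    return alpha * residual_energy + (1 - alpha) * trust

def select_cluster_head(nodes):
    """Pick the node with the highest combined score as cluster head."""
    return max(nodes, key=lambda n: cluster_head_score(n["energy"], n["trust"]))

nodes = [
    {"id": 1, "energy": 0.9, "trust": 0.4},
    {"id": 2, "energy": 0.7, "trust": 0.9},
    {"id": 3, "energy": 0.5, "trust": 0.8},
]
head = select_cluster_head(nodes)  # node 2: high trust offsets lower energy
```

A real scheme would also rotate the role as energy drains, but the scoring idea is the same.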
Editor's notes: Controller synthesis using formal specifications has shown considerable promise in recent years. However, it is computationally very expensive. This article shows how cloud computing can come to the rescue. -Samarjit Chakraborty, University of North Carolina at Chapel Hill
In the TrAdaBoost method, which is based on sample transfer learning, the weak classifier is the kernel of the design. To address the high time cost of the serial weak-classifier algorithm, a parallel algorithm o...
With continuing advances in high-performance parallel computing platforms, parallel algorithms have become powerful tools for developing faster-than-real-time power system dynamic simulations. In particular, it has been demonstrated in recent years that parallel-in-time (Parareal) algorithms have the potential to achieve this ambitious goal. The selection of a fast and reasonably accurate coarse operator for the Parareal algorithm is crucial for its effective utilization and performance. This paper examines semi-analytical solution (SAS) methods as coarse operators of the Parareal algorithm and compares their performance with that of standard numerical time-integration methods. Two promising time-power-series-based SAS methods were considered: the Adomian decomposition method and the Homotopy analysis method, each with a windowing approach for improving convergence. Numerical case studies on a 10-generator 39-bus system and a 327-generator 2383-bus system were performed for these coarse operators over different disturbances, evaluating the number of Parareal iterations, the computational time, and the stability of convergence. All the coarse operators tested converged to the same corresponding true solution (when convergent), and the SAS methods provide comparable computational speed while exhibiting more stable convergence to the true solution in many cases.
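For readers unfamiliar with the method, the following sketch shows the basic Parareal iteration on a scalar ODE, using a single explicit-Euler step as the coarse operator G and many Euler sub-steps as the fine operator F. The operator choices and step counts are illustrative assumptions; the coarse operators studied in the paper are SAS methods, not Euler.

```python
# Minimal Parareal sketch for a scalar ODE y' = f(t, y).
# G = one Euler step per time slice; F = many Euler sub-steps.
# Illustrative only: the paper uses SAS methods as G.

def euler(f, y, t, dt, n):
    """n explicit-Euler steps of total length dt starting from (t, y)."""
    h = dt / n
    for i in range(n):
        y = y + h * f(t + i * h, y)
    return y

def parareal(f, y0, t0, t1, slices=10, fine_sub=100, iters=5):
    dt = (t1 - t0) / slices
    ts = [t0 + k * dt for k in range(slices + 1)]
    # Initial guess: one serial sweep of the coarse operator G.
    y = [y0]
    for k in range(slices):
        y.append(euler(f, y[k], ts[k], dt, 1))
    for _ in range(iters):
        # Fine solves over each slice are independent -> parallel in time.
        fine = [euler(f, y[k], ts[k], dt, fine_sub) for k in range(slices)]
        coarse_old = [euler(f, y[k], ts[k], dt, 1) for k in range(slices)]
        y_new = [y0]
        for k in range(slices):
            g_new = euler(f, y_new[k], ts[k], dt, 1)
            # Parareal correction: G_new + F_old - G_old
            y_new.append(g_new + fine[k] - coarse_old[k])
        y = y_new
    return y

# Decaying exponential y' = -y, y(0) = 1; exact y(1) = e^-1 ~ 0.3679.
sol = parareal(lambda t, y: -y, 1.0, 0.0, 1.0)
```

After enough iterations the Parareal iterate matches the serial fine solution, which is the behavior the coarse-operator choice is meant to accelerate.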
We present a multi-GPU design, implementation, and performance evaluation of the Halevi-Polyakov-Shoup (HPS) variant of the Fan-Vercauteren (FV) levelled Fully Homomorphic Encryption (FHE) scheme. Our design follows a data-parallelism approach and uses partitioning methods to distribute the workload of FV primitives evenly across the available GPUs. The design addresses the space and runtime requirements of FHE computations. It is also suitable for distributed-memory architectures and includes efficient GPU-to-GPU data-exchange protocols. Moreover, it is user-friendly, as no user intervention is required for task decomposition, scheduling, or load balancing. We implement and evaluate the performance of our design on two NVIDIA GPU clusters, one homogeneous and one heterogeneous: K80 and a customized P100. We also provide a comparison with a recent shared-memory-based multi-core CPU implementation using two homomorphic circuits as workloads: vector addition and vector multiplication. Moreover, we use our multi-GPU levelled FHE to implement the inference circuits of two Convolutional Neural Networks (CNNs) to perform image classification homomorphically on encrypted images from the MNIST and CIFAR-10 datasets. Our implementation provides 1 to 3 orders of magnitude speedup over the CPU implementation on vector operations. In terms of scalability, our design shows reasonable scalability curves when the GPUs are fully connected.
Sparse Triangular Solve (SpTRSV) is an important and extensively used kernel in scientific computing. Parallelism within SpTRSV depends upon the matrix sparsity pattern and, in many cases, is non-uniform from one computational step to the next. In cases where the SpTRSV computational steps have contrasting parallelism characteristics (some steps are more parallel, others more sequential in nature), the performance of an SpTRSV algorithm may be limited by this contrast. In this work, we propose a split-execution model for SpTRSV that automatically divides the SpTRSV computation into two sub-SpTRSV systems and an SpMV, such that one of the sub-SpTRSVs has more parallelism than the other. Each sub-SpTRSV is then computed using a different SpTRSV algorithm, possibly executed on a different platform (CPU or GPU). By analyzing the SpTRSV Directed Acyclic Graph (DAG) and matrix sparsity features, we use a heuristics-based approach to (i) automatically determine the suitability of an SpTRSV for split execution, (ii) find the appropriate split point, and (iii) execute SpTRSV in a split fashion using two SpTRSV algorithms while managing any required inter-platform communication. Experimental evaluation of the execution model on two CPU-GPU machines with a dataset of 327 matrices from the SuiteSparse Matrix Collection shows that our approach correctly selects the fastest SpTRSV method (split or unsplit) for 88 percent of matrices on the Intel Xeon Gold 6148 + NVIDIA Tesla V100 platform and 83 percent on the Intel Core i7 + NVIDIA GTX 1080 Ti platform, achieving speedups of up to 10x and 6.36x, respectively.
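The level-set structure underlying this kind of analysis can be sketched directly: in a lower-triangular system, row i depends on every row j < i with a nonzero in column j, rows in the same level are mutually independent, and the widths of successive levels expose the contrast between parallel and sequential phases. The dependency encoding and toy pattern below are illustrative; the paper's split heuristics are considerably more involved.

```python
# Level-set analysis of a lower-triangular sparse pattern.
# rows: dict mapping each row index to the rows it depends on
# (the column indices of its off-diagonal nonzeros).
# Wide levels -> parallel phase; long runs of narrow levels -> sequential.

def level_sets(rows):
    """Group rows into levels; rows in one level can be solved in parallel."""
    level = {}
    for i in sorted(rows):                      # deps always have smaller index
        level[i] = 1 + max((level[j] for j in rows[i]), default=-1)
    levels = [[] for _ in range(max(level.values()) + 1)]
    for i, l in level.items():
        levels[l].append(i)
    return levels

# Toy pattern: rows 0-3 are independent (parallel phase),
# then rows 4 and 5 form a sequential chain.
deps = {0: [], 1: [], 2: [], 3: [], 4: [0, 1, 2, 3], 5: [4]}
levels = level_sets(deps)  # [[0, 1, 2, 3], [4], [5]]
```

A split heuristic in this spirit would cut between the wide first level and the narrow tail, solving each part with the algorithm (and platform) that suits it.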
The multiprocessor task scheduling problem is a pressing problem that affects system performance and is still being investigated by researchers. Finding optimal schedules is considered a computationally hard problem. Recently, researchers have used fuzzy logic in the field of task scheduling to achieve optimal performance, but this area of research is still not well investigated. In addition, various scheduling algorithms have used fuzzy logic, but most of them target uniprocessor systems. This article presents a new algorithm in which the priorities of the tasks are derived from fuzzy logic and the bottom-level parameter. The approach is designed to find task schedules of optimal or sub-optimal length in order to achieve high performance in a multiprocessor environment. In the proposed algorithm, the precedence constraints between the non-preemptive tasks and their execution times are known and described by a directed acyclic graph. The number of processors is fixed, the communication costs are negligible, and the processors are homogeneous. The suggested technique is tested and compared using the Prototype Standard Task Graph Set.
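A plain (non-fuzzy) version of bottom-level list scheduling can be sketched as follows: each task's priority is its b-level (the longest path from the task to an exit task, including its own execution time), and ready tasks are assigned greedily to the earliest-available homogeneous processor with zero communication cost, matching the abstract's assumptions. The fuzzy priority component of the proposed algorithm is omitted here.

```python
# List scheduling on homogeneous processors, priorities = b-level.
# Communication costs are assumed negligible, as in the abstract.

def b_level(task, succ, cost, memo=None):
    """Longest path from task to an exit task, including its own cost."""
    if memo is None:
        memo = {}
    if task not in memo:
        memo[task] = cost[task] + max(
            (b_level(s, succ, cost, memo) for s in succ[task]), default=0)
    return memo[task]

def list_schedule(succ, cost, n_procs):
    pred = {t: [u for u in succ if t in succ[u]] for t in succ}
    prio = {t: b_level(t, succ, cost) for t in succ}
    free = [0.0] * n_procs                      # next free time per processor
    finish = {}
    ready = [t for t in succ if not pred[t]]
    while ready:
        t = max(ready, key=lambda x: prio[x])   # highest b-level first
        ready.remove(t)
        est = max((finish[p] for p in pred[t]), default=0.0)
        p = min(range(n_procs), key=lambda i: max(free[i], est))
        finish[t] = max(free[p], est) + cost[t]
        free[p] = finish[t]
        for s in succ[t]:                       # release newly ready tasks
            if all(q in finish for q in pred[s]):
                ready.append(s)
    return max(finish.values())                 # schedule length (makespan)

# Fork-join DAG a -> {b, c} -> d, unit costs, 2 processors: makespan 3.
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": 1, "b": 1, "c": 1, "d": 1}
makespan = list_schedule(succ, cost, 2)
```

The fuzzy variant would replace the crisp `prio` values with priorities inferred from fuzzy rules over task attributes.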
ISBN (Print): 9781450380706
In task-parallel code, a determinacy race occurs when two logically parallel instructions access the same memory location in a conflicting way. A determinacy race tends to be a bug, as it leads to non-deterministic program behavior. Researchers have studied algorithms for detecting determinacy races in task-parallel code, with most prior work focusing on computations with nice structural properties (e.g., fork-join or pipeline parallelism). For such computations, one can devise provably efficient algorithms with constant overhead, leading to an asymptotically optimal running time of O(T_1 / P + T_∞) on P cores for a computation with T_1 work and T_∞ span. More recently, researchers have begun to address the problem of race detecting computations with fewer structural properties, such as ones that arise from the use of futures. Due to the lack of structural properties, the race detection algorithm incurs higher overhead. Given a computation with work T_1 and span T_∞, the state-of-the-art parallel algorithm for race detecting programs with futures runs in time O((T_1 lg k̂ + k^2) / P + T_∞(lg k̂ + lg r lg k̂)) on P cores (Xu et al., 2020), where k is the total number of futures used, k̂ is the maximum number of future operations per "future task," and r is the maximum number of readers between two consecutive writes to a given memory location. Interestingly, it has been shown that when one imposes certain restrictions on the use of futures, referred to as structured futures, one can race detect such programs more efficiently than programs with general futures (i.e., no restrictions), even though the restrictions do not entirely eliminate arbitrary dependences among subcomputations. The improved efficiency has only been demonstrated for a sequential algorithm (Utterback et al., 2019) that race detects while executing the computation sequentially, however. The algorithm requires sequential execution because its correctness relies on updating the neces...
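The definition being checked can be sketched directly: two accesses to the same location race if at least one is a write and the accessing nodes are logically parallel, i.e., neither reaches the other in the computation DAG. The quadratic reachability test below is only the definition; the algorithms discussed above replace it with series-parallel or order-maintenance structures to get the stated bounds.

```python
# Toy determinacy-race check via reachability on the computation DAG.
# This is the definition only, not an efficient race detector.

from itertools import combinations

def reaches(dag, a, b):
    """True if there is a directed path a -> b (dag: node -> successors)."""
    stack, seen = [a], set()
    while stack:
        n = stack.pop()
        if n == b:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(dag.get(n, []))
    return False

def find_races(dag, accesses):
    """accesses: list of (node, location, mode) with mode 'r' or 'w'."""
    races = []
    for (n1, loc1, m1), (n2, loc2, m2) in combinations(accesses, 2):
        if loc1 == loc2 and "w" in (m1, m2):
            # Logically parallel = neither access node reaches the other.
            if not reaches(dag, n1, n2) and not reaches(dag, n2, n1):
                races.append((n1, n2, loc1))
    return races

# s spawns a and b; a writes x while b reads x -> a determinacy race on x.
dag = {"s": ["a", "b"], "a": [], "b": []}
accesses = [("a", "x", "w"), ("b", "x", "r")]
races = find_races(dag, accesses)
```

With futures the DAG can contain arbitrary cross-edges, which is exactly why the cheap series-parallel reasoning of fork-join detectors no longer applies.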
With the rapid development of virtual reality and collision detection technology, virtual reality has become an active research field in China. Collision detection is one of its most important technologies,...
The obnoxious p-median problem consists of selecting p locations, considered facilities, such that the sum of the distances from each non-facility location, called a customer, to its nearest facility is maximized. This is an NP-hard problem that can be formulated as an integer linear program. In this paper, we propose the application of a variable neighborhood search (VNS) method to tackle this problem effectively. First, we develop new and fast local search procedures to be integrated into the basic VNS methodology. Then, some parameters of the algorithm are tuned in order to improve its performance. The best VNS variant is parallelized and compared with the best previous methods, namely branch and cut, tabu search, and GRASP, over a wide set of instances. Experimental results show that the proposed VNS outperforms the previous state-of-the-art methods. This is further confirmed by nonparametric statistical tests.
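The objective and a basic interchange move can be sketched as follows: with p open facilities, every remaining location is a customer, and a swap replaces one open facility with a closed candidate whenever it increases the sum of customer-to-nearest-facility distances. This plain local search is only a stand-in for the paper's VNS and its new neighborhood procedures.

```python
# Obnoxious p-median objective and a simple swap (interchange) local search.
# Note: the customer set is whatever is not currently a facility.

def objective(facilities, customers, dist):
    """Sum over customers of the distance to the nearest open facility."""
    return sum(min(dist[c][f] for f in facilities) for c in customers)

def swap_local_search(candidates, p, dist):
    open_f = set(candidates[:p])                # arbitrary initial solution
    best = objective(open_f, [c for c in candidates if c not in open_f], dist)
    improved = True
    while improved:
        improved = False
        for f in sorted(open_f):
            for g in candidates:
                if g in open_f:
                    continue
                trial = (open_f - {f}) | {g}
                cust = [c for c in candidates if c not in trial]
                val = objective(trial, cust, dist)
                if val > best:                  # maximization problem
                    open_f, best, improved = trial, val, True
                    break
            if improved:
                break
    return open_f, best

# Four locations on a line at 0, 1, 2, 10; p = 1.
# The most "obnoxious" (remote) single facility is the one at 10.
coords = [0, 1, 2, 10]
n = len(coords)
dist = [[abs(coords[i] - coords[j]) for j in range(n)] for i in range(n)]
sol, val = swap_local_search(list(range(n)), 1, dist)
```

A VNS would perturb such local optima with increasingly large shakes before re-running the local search, escaping the basins this simple loop gets stuck in.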