The issues of semantic information processing are considered. The central idea of information in the modern world is still an elusive concept. It is known that information must be quantified, at least in terms of part...
ISBN: 9781450362955 (print)
Demand for High Performance Computing (HPC) is on the rise, particularly for large distributed computing. HPC systems have, by design, very heterogeneous architectures, both in computation and in communication bandwidth, resulting in wide variations in the cost of communication between compute units. If large distributed applications are to take full advantage of HPC, the physical communication capabilities must be taken into consideration when allocating workload. Hypergraphs are good at modelling the total volume of communication in parallel and distributed applications. To the best of our knowledge, there are no hypergraph partitioning algorithms to date that are architecture-aware. We propose a novel restreaming hypergraph partitioning algorithm (HyperPRAW) that takes advantage of peer-to-peer physical bandwidth profiling data to improve the performance of distributed applications in HPC systems. Our results show not only that the quality of the partitions achieved by our algorithm is comparable with state-of-the-art multilevel partitioning, but also that the runtime in a synthetic benchmark is significantly reduced in the 10 hypergraph models tested, with speedup factors of up to 14x.
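To make the idea of architecture-aware restreaming concrete, here is a minimal sketch of a greedy restreaming partitioner driven by a pairwise communication-cost matrix. It is not the HyperPRAW objective, which the abstract does not spell out: the function name `restream_partition`, the scoring rule (summed pairwise cost to the partitions of co-members plus a load penalty), and the `balance` weight are all assumptions made for illustration.

```python
def restream_partition(hyperedges, num_vertices, cost, passes=5, balance=1.0):
    """Greedy restreaming sketch (hypothetical scoring, not the published one).

    hyperedges: list of vertex lists; cost[p][q]: assumed cost of sending
    data between partitions p and q (e.g. derived from bandwidth profiling).
    """
    k = len(cost)
    part = [v % k for v in range(num_vertices)]  # round-robin starting point
    incident = [[] for _ in range(num_vertices)]
    for e, verts in enumerate(hyperedges):
        for v in verts:
            incident[v].append(e)
    load = [part.count(p) for p in range(k)]

    for _ in range(passes):  # each pass re-streams every vertex once
        for v in range(num_vertices):
            load[part[v]] -= 1  # take v out before re-placing it

            def score(p):
                # Cost of talking to every co-member of v's hyperedges,
                # plus a penalty that discourages overloaded partitions.
                comm = sum(cost[p][part[u]]
                           for e in incident[v]
                           for u in hyperedges[e] if u != v)
                return comm + balance * load[p]

            best = min(range(k), key=score)
            part[v] = best
            load[best] += 1
    return part
```

With a uniform cost matrix this reduces to a plain communication-volume heuristic; a non-uniform matrix is what lets the placement favour partitions connected by cheaper links.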
ISBN: 9781728136134 (print)
In this paper, we propose a distributed, unordered, label-correcting distance-1 Grundy (vertex) coloring algorithm, namely the Distributed Control (DC) coloring algorithm. Our algorithm eliminates the need for vertex-centric barriers and global synchronization for color refinement, relying only on atomic operations and local termination detection to update vertex colors. DC proceeds optimistically, correcting the colors asynchronously as the algorithm progresses, and depends on local ordering of tasks to minimize the execution of sub-optimal work. We implement our DC coloring algorithm and the well-known Jones-Plassmann algorithm and compare their performance on 4 different types of standard RMAT graphs and on real-world graphs. We show that eliminating the waiting time of global and vertex-centric barriers and investing this time in local ordering leads to improved scaling for graphs with prominent power-law characteristics and densely interconnected local subgraphs.
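For readers unfamiliar with the term, a distance-1 Grundy coloring is a proper coloring in which every vertex with color c has neighbors using every color below c. The sketch below is the sequential first-fit procedure that produces such a coloring, together with a checker. It is not the distributed DC algorithm or Jones-Plassmann; the function names and the adjacency-dict input format are assumptions for illustration.

```python
def greedy_coloring(adj):
    """Sequential first-fit coloring: smallest color unused by colored neighbors.

    adj: dict mapping each vertex to an iterable of its neighbors.
    """
    color = {}
    for v in adj:  # visiting order is arbitrary; any order yields a Grundy coloring
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def is_grundy(adj, color):
    """Check properness and the Grundy property for a complete coloring."""
    for v, c in color.items():
        neighbor_colors = {color[u] for u in adj[v]}
        if c in neighbor_colors:  # a neighbor shares v's color
            return False
        if not set(range(c)) <= neighbor_colors:  # some smaller color is missing
            return False
    return True
```

A label-correcting scheme can be thought of as repeatedly repairing vertices for which a check like `is_grundy` would fail, rather than fixing a global order up front.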
ISBN: 9781665473125 (digital)
ISBN: 9781665473132 (print)
The current approach to marking attendance in colleges is tedious and time-consuming. I propose AttenFace, a standalone system to analyze, track and grant attendance in real time using face recognition. Using snapshots of the class from a live camera feed, the system identifies students and marks them as present in a class based on their presence in multiple snapshots taken throughout the class duration. Face recognition for each class is performed independently and in parallel, ensuring that the system scales with the number of concurrent classes. Further, the separation of the face recognition server from the back-end server for attendance calculation allows the face recognition module to be integrated with existing attendance tracking software like Moodle. The face recognition algorithm runs at 10-minute intervals on classroom snapshots, significantly reducing computation compared to direct processing of the live camera feed. This method also gives students the flexibility to leave class for a short duration (such as for a phone call) without losing attendance for that class. Attendance is granted to a student if they remain in class for a number of snapshots above a certain threshold. The system is fully automatic and requires no professor intervention, manual attendance or even camera set-up, since the back-end interfaces directly with in-class cameras. AttenFace is a first-of-its-kind one-stop solution for face-recognition-enabled attendance in educational institutions that prevents proxy attendance, handling all aspects from students checking attendance to professors deciding their own attendance policy, to college administration enforcing default attendance rules.
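The snapshot-threshold rule described above can be sketched in a few lines. This is an illustrative reading only: the function name `grant_attendance`, the input format, and the choice of "at least `min_snapshots`" as the comparison are assumptions, since the abstract does not state the exact rule.

```python
def grant_attendance(presence, min_snapshots):
    """Decide attendance from per-snapshot recognition results.

    presence: dict mapping a student id to a list of booleans, one per
    snapshot (True if the student was recognized in that snapshot).
    min_snapshots: assumed threshold, e.g. set by the professor's policy.
    """
    return {
        student: sum(flags) >= min_snapshots
        for student, flags in presence.items()
    }
```

For example, with five snapshots and a threshold of three, a student who misses a single snapshot while stepping out still receives attendance, while a student seen only at the start and end does not.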
ISBN: 9789897583537 (print)
Many modern sequence alignment tools implement fast string matching using a space-efficient data structure called an FM-index. The succinct nature of this data structure presents unique challenges for algorithm designers. In this paper, we explore the opportunities for parallelization of exact and inexact matching, and present an efficient solution for the Occ portion of the algorithm that utilizes the instruction-level parallelism of modern CPUs. Our implementation computes all eight Occ values required for the inexact match algorithm step in a single pass. We showcase the algorithm's performance in a multi-core genome aligner and discuss the effects of memory prefetch.
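As background for the Occ step, the sketch below shows textbook FM-index backward search for exact matching, where each step needs Occ at both ends of the current interval. It uses a naive suffix sort and an uncompressed Occ table, so it illustrates the role of Occ only, not the succinct layout or the instruction-level optimizations of the paper; all names are illustrative assumptions.

```python
def bwt(text):
    """Burrows-Wheeler transform via a naive suffix sort (fine for small inputs)."""
    text += "$"  # sentinel, assumed absent from the text and smaller than all letters
    order = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in order)


def build_fm(text):
    last = bwt(text)
    alphabet = sorted(set(last))
    # C[c]: number of characters in the text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += last.count(c)
    # occ[c][i]: number of occurrences of c in last[:i]
    occ = {c: [0] for c in alphabet}
    for ch in last:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    return C, occ, len(last)


def count_matches(pattern, fm):
    """Count exact occurrences of pattern by backward search."""
    C, occ, n = fm
    lo, hi = 0, n  # half-open interval of matching suffixes
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]  # one Occ lookup per interval end
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

In an inexact-matching step over a four-letter DNA alphabet, the same two lookups are needed for each candidate letter, which is one plausible reading of the eight Occ values mentioned above.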
Most existing optimization methods for neural architecture search (NAS), including evolutionary algorithms, reinforcement learning and gradient-based approaches, have not employed memory strategies explicitly, which may lead to a lack of efficiency when searching neural architectures. To address this issue, we propose a new NAS approach using an evolutionary algorithm that employs a tabu mechanism to improve search efficiency. More specifically, the individuals of the parent population are selected by tournament selection and a tabu list. Tournament selection selects the parent population according to the accuracy of each individual, and the tabu mechanism builds a tabu list to record the operations chosen in the previous search process, providing an explicit search memory that improves efficiency. To confirm the superior performance of our approach, a well-designed surrogate model is used to accelerate performance evaluation on CIFAR-10. Comprehensive experimental results show that the proposed method reaches a 2.48% error rate in about 2 GPU days, which demonstrates the superiority of the suggested method.
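The combination of tournament selection and a tabu list can be sketched as follows. This is one possible reading of the mechanism, not the authors' implementation: the encoding of an architecture as a list of operation names, the fixed-length tabu memory, and the function names are all assumptions for illustration.

```python
import random
from collections import deque


def tournament_select(population, fitness, k, rng):
    """Pick k random individuals and return the fittest (e.g. highest accuracy)."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)


def mutate_with_tabu(individual, operations, tabu, rng):
    """Replace one operation, avoiding operations recorded in the tabu list."""
    allowed = [op for op in operations if op not in tabu]
    if not allowed:  # everything is tabu: fall back to the full set
        allowed = list(operations)
    position = rng.randrange(len(individual))
    new_op = rng.choice(allowed)
    child = list(individual)
    child[position] = new_op
    tabu.append(new_op)  # a bounded deque forgets the oldest entry by itself
    return child
```

A typical call would keep `tabu = deque(maxlen=3)` alive across generations, so recently tried operations are skipped until they age out of the memory.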
IoT, being a field of great interest and importance for the coming generations, presents certain challenging aspects, open to improvement, for IoT application developers and researchers to work on. A wireless sensor mes...
ISBN: 9781713845393 (print)
Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient. We study whether we can use deep learning to directly predict these parameters by exploiting the past knowledge of training other networks. We introduce a large-scale dataset of diverse computational graphs of neural architectures, DEEPNETS-1M, and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50, achieving 60% accuracy on CIFAR-10. On ImageNet, the top-5 accuracy of some of our networks approaches 50%. Our task, along with the model and results, can potentially lead to a new, more computationally efficient paradigm of training networks. Our model also learns a strong representation of neural architectures, enabling their analysis.
ISBN: 9781450362955 (print)
Triangle counting is a fundamental graph analytic operation used extensively in network science and graph mining. As the size of the graphs that need to be analyzed continues to grow, there is a need to develop scalable algorithms for distributed-memory parallel systems. To this end, we present a distributed-memory triangle counting algorithm, which uses a 2D cyclic decomposition to balance the computations and reduce the communication overheads. The algorithm structures its communication and computational steps so as to reduce its memory overhead, and it includes key optimizations that leverage the sparsity of the graph and the way the computations are structured. Experiments on synthetic and real-world graphs show that our algorithm obtains an average relative speedup ranging from 3.24 to 7.22, out of 10.56, across the datasets when using 169 MPI ranks relative to the performance achieved with 16 MPI ranks. Moreover, we obtain an average speedup of 10.2 times in comparison with previously developed distributed-memory parallel algorithms.
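For reference, the underlying operation can be sketched sequentially: orient each edge from the lower to the higher vertex id and count common out-neighbors, so each triangle is counted exactly once. This is a single-process illustration, not the 2D cyclic distributed algorithm; the function names are assumptions. The helper at the end reproduces the ideal speedup implied by the rank counts, 169 / 16 ≈ 10.56, which is presumably the "out of 10.56" bound quoted above.

```python
def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    out = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        lo, hi = min(u, v), max(u, v)
        out.setdefault(lo, set()).add(hi)  # orient edge lo -> hi
        out.setdefault(hi, set())
    # For each oriented edge (u, v), common out-neighbors close a triangle.
    return sum(len(out[u] & out[v]) for u in out for v in out[u])


def ideal_relative_speedup(ranks, baseline_ranks):
    """Upper bound on relative speedup under perfect scaling."""
    return ranks / baseline_ranks
```

Both rank counts are perfect squares (13 × 13 and 4 × 4), which fits a 2D processor grid.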
In recent years, there has been increasing interest in utilising differential power processing (DPP) converters in photovoltaic (PV) applications to achieve maximum power point tracking (MPPT), minimum losses and high efficiency under unequal lighting conditions. This paper presents a novel series and parallel (SP) DPP converter scheme, with a proper control technique to optimise the system output power under mismatch conditions compared with that of a conventional 2×2 SP array protected with bypass diodes. The simulation results show significant improvements in the total power of the SP-DPP system under partial shading conditions (PSCs).