ISBN: 9781509036820 (print)
As the number of cores in HPC systems grows, so does the effect of system noise on the applications running on them. Given that future large-scale parallel computer systems, including exascale systems, will operate under an overall power bound, we claim to have found a solution that can counter the effects of noise. We present two methods that estimate the effects of noise on an application and then optimally redistribute power among nodes, such that the effects of noise are "hidden".
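The abstract does not describe the two methods, so the following is only a minimal sketch of the general idea of redistributing a fixed power budget to hide noise. It assumes, purely for illustration, that a node's speed is proportional to its power allocation and that each node's noise slowdown has already been estimated; the function names and the linear power model are hypothetical and are not taken from the paper.

```python
def redistribute_power(noise_slowdown, total_power):
    """Split total_power so that all nodes finish at the same time.

    noise_slowdown[i] is the estimated fractional slowdown of node i
    (0.1 means noise makes the node 10% slower).
    Assumption (hypothetical): node speed is proportional to node power.
    """
    # Work each node effectively has to do, in units of noise-free work.
    effective_work = [1.0 + s for s in noise_slowdown]
    total_work = sum(effective_work)
    # Power proportional to effective work equalizes the finish times.
    return [total_power * w / total_work for w in effective_work]


def finish_times(noise_slowdown, power):
    """Finish time of each node under the same linear power model."""
    return [(1.0 + s) / p for s, p in zip(noise_slowdown, power)]


slowdown = [0.0, 0.1, 0.3]
uniform = [100.0, 100.0, 100.0]
balanced = redistribute_power(slowdown, total_power=300.0)
print(max(finish_times(slowdown, uniform)))   # slowest node, uniform power
print(max(finish_times(slowdown, balanced)))  # slowest node, rebalanced power
```

Under this toy model the noisiest node receives the most power, and the application's completion time (set by the slowest node) drops while total power stays within the bound.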
ISBN: 9781538637906 (print)
In this paper, we propose an accurate contactless method for estimating the pose of a target in the earth coordinate frame using a mobile phone equipped with a camera and an inertial measurement unit (IMU). Under the Manhattan world assumption, we first estimate the relative pose between the target and the camera using an existing line-based pose estimation method, and then obtain the target's pose in the earth frame by combining the visual pose with the IMU orientation. To improve the overall estimation accuracy, we propose a new camera-IMU relative pose calibration method that takes into account the different stochastic natures of the accelerometer and magnetometer measurements, together with a new consistency measure defined on elevation and azimuth angles. Compared to state-of-the-art methods, it shows a strong advantage in both elevation and azimuth estimation accuracy while offering a convenient calibration setup. We also report improved overall pose estimation accuracy with our method, relative to an uncalibrated phone, based on evaluations on a real-world dataset.
ISBN: 9781424437511 (print)
In recent years, the performance and capabilities of Graphics Processing Units (GPUs) have improved drastically, mostly due to the demands of the entertainment market, where consumers and companies alike push for higher visual fidelity that can only be achieved with high-performing GPU solutions. Besides the entertainment market, there is an ongoing global research effort to use this immense computing power for applications beyond graphics, such as general-purpose computing. Efficiently combining these GPU resources with existing CPU resources is also an important and open research task. This paper contributes to that effort by analyzing the performance factors involved in combining both resource types and by introducing a novel job scheduler that manages the two resources. Through experimental performance evaluation, this paper reports the most important factors and design considerations that must be taken into account when designing such a job scheduler.
ISBN: 9781538637906 (print)
In previous research, the assessment of an author's influence has mainly been based on historical information about the literature, such as the author's number of publications, citation counts, and reference relationships. However, author influence is reflected not only in such static data but also in how the author's point of view is noticed and communicated. Moreover, influence spreads along the relational paths of cooperation and citation between authors, and the authors on such paths should have similar academic interests. Therefore, this paper proposes an influence spreading model based on authors' co-citation interest similarity and the paths of citation and cooperation. On this basis, a novel influence spreading prediction algorithm is designed and verified experimentally using public literature information resources. The AUC results show the effectiveness of the proposed method.
ISBN: 9781538637906 (print)
Synchronizing transmission and reception opportunities for different sender-receiver pairs and providing traffic-priority-aware access to different channels are challenging problems in Directional Sensor Networks (DSNs). We propose a quality-aware multi-channel medium access control protocol, Q-DMAC, for DSNs that increases the reusability of multi-channel resources using nodes with directional antennas. Q-DMAC gives high-priority access to the medium to nodes holding important data packets and allocates more slots in a data window to nodes with a larger number of packets awaiting transmission. Simulation results show that Q-DMAC improves network performance in terms of throughput, end-to-end delay, and network lifetime.
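The two allocation rules named in the abstract (priority ordering, and more slots for longer queues) can be illustrated with a small sketch. This is not the Q-DMAC protocol itself: the proportional split, the largest-remainder rounding, and the function name `allocate_slots` are all assumptions made for illustration, and directional antennas and multiple channels are not modelled.

```python
def allocate_slots(nodes, window_slots):
    """Divide the slots of one data window among contending nodes.

    nodes: list of (name, priority, queued_packets); a larger priority
           value means more important traffic.
    Returns a schedule as a list of (name, slots), highest priority first.
    Assumption (hypothetical): slots are split in proportion to queue
    length, with leftovers assigned by largest fractional remainder.
    """
    total_queued = sum(queued for _, _, queued in nodes)
    if total_queued == 0:
        return []
    shares = {name: window_slots * queued / total_queued
              for name, _, queued in nodes}
    slots = {name: int(share) for name, share in shares.items()}
    leftover = window_slots - sum(slots.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - slots[n],
                          reverse=True)
    for name in by_remainder[:leftover]:
        slots[name] += 1
    # Nodes with more important packets access the medium first.
    ordered = sorted(nodes, key=lambda node: node[1], reverse=True)
    return [(name, slots[name]) for name, _, _ in ordered if slots[name] > 0]


print(allocate_slots([("a", 1, 2), ("b", 3, 6), ("c", 2, 2)], window_slots=10))
```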
ISBN: 9781665435741 (print)
As the parallel scale of HPC applications such as earth system models grows and their computing cost rises, the performance of HPC applications is increasingly critical. Profiling HPC applications accurately helps to model the applications and find their performance bottlenecks. However, due to the complexity of HPC applications, the diversity of programming languages, differences in individual programming habits, and the variety of architectures, accurate profiling is very difficult. In this paper, we propose LPerf, a low-overhead and high-accuracy profiler for HPC applications. To reduce profiling overhead and improve profiling accuracy, we propose three techniques: a preprocessing method that automatically instruments code with tunable granularity, significantly reducing the run-time overhead of profiling; an aggregated caller-callee relationship that is used to locate the relationships between functions efficiently; and a profiling-aware method that precisely calculates the running time of functions. Experimental results on the earth system model CAS-ESM show a profiling error rate of 0.02% and an overhead of 1.6%. Compared with the baselines, the precision, accuracy, and overhead of LPerf reach the state of the art.
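To make the notions of instrumentation, aggregated caller-callee relationships, and per-function running time concrete, here is a minimal sketch of an instrumenting profiler. It is not LPerf and makes no claim about how LPerf works: the `Profiler` class, its decorator-based instrumentation, and the exclusive-time bookkeeping are hypothetical illustrations of the general concepts.

```python
import functools
import time
from collections import defaultdict


class Profiler:
    """Toy instrumenting profiler (hypothetical, for illustration only)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.stack = []                      # frames: [name, time spent in callees]
        self.exclusive = defaultdict(float)  # name -> time excluding callees
        self.edges = defaultdict(int)        # (caller, callee) -> call count

    def instrument(self, func):
        name = func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            caller = self.stack[-1][0] if self.stack else "<root>"
            self.edges[(caller, name)] += 1  # aggregate per caller-callee pair
            frame = [name, 0.0]
            self.stack.append(frame)
            start = self.clock()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = self.clock() - start
                self.stack.pop()
                # Subtract callee time so each function is charged only
                # for its own work.
                self.exclusive[name] += elapsed - frame[1]
                if self.stack:
                    self.stack[-1][1] += elapsed

        return wrapper
```

Choosing which functions receive the `instrument` decorator plays the role of instrumentation granularity here: instrumenting fewer, coarser functions lowers run-time overhead at the cost of detail.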
ISBN: 9781509035052 (print)
Developing a decent parallel simulator is challenging work: it must achieve sufficient performance, scalability, and fault tolerance. We propose using general-purpose data processing engines, such as MapReduce implementations, for parallel simulation. Widely used and mature engines remove a large part of the development effort and provide scalability and fault tolerance. We demonstrate that a parallel discrete-event simulator can be implemented on such engines, namely Apache Hadoop and Apache Spark, by modeling the message passing of distributed systems on the MapReduce key-value processing model. The implemented simulators could handle 10^8 nodes with 10 computers. A preliminary evaluation showed that our Spark-based simulator is about 20 times as fast as an existing simulator thanks to Time Warp.
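The idea of modeling message passing on a key-value processing model can be sketched in plain Python, with the destination node ID as the key. This is only an illustration under stated assumptions: it uses no Hadoop or Spark API, it omits Time Warp and any notion of simulated time, and the names `map_phase`, `shuffle_phase`, `ring_handler`, and `run` are hypothetical.

```python
from collections import defaultdict


def map_phase(inboxes, handler):
    """Each node handles its messages and emits (destination, message) pairs."""
    emitted = []
    for node_id, messages in inboxes.items():
        for message in messages:
            emitted.extend(handler(node_id, message))
    return emitted


def shuffle_phase(pairs):
    """Group emitted messages by destination key, as a MapReduce shuffle would."""
    grouped = defaultdict(list)
    for destination, message in pairs:
        grouped[destination].append(message)
    return dict(grouped)


def ring_handler(num_nodes):
    """Example node behaviour: forward a hop counter around a ring."""
    def handler(node_id, hops_left):
        if hops_left == 0:
            return []  # message is consumed
        return [((node_id + 1) % num_nodes, hops_left - 1)]
    return handler


def run(inboxes, handler, max_rounds):
    """Iterate map + shuffle rounds until no messages remain."""
    rounds = 0
    while inboxes and rounds < max_rounds:
        inboxes = shuffle_phase(map_phase(inboxes, handler))
        rounds += 1
    return inboxes, rounds


# Node 0 starts with one message that travels three hops around a 4-node ring.
print(run({0: [3]}, ring_handler(4), max_rounds=10))
```

Because each round depends only on key-grouped messages, the per-node work in `map_phase` is independent across keys, which is the property a data processing engine would exploit to parallelize it.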
ISBN: 9781538637906 (print)
Convolutional Neural Networks (CNNs) have become increasingly powerful in the computer vision domain, where they achieve state-of-the-art accuracy. Despite this, it is generally difficult to apply CNNs on mobile platforms. The client-server paradigm is a straightforward way to deploy CNNs on mobile phones, but studies have shown that it suffers from serious problems, such as privacy leaks. Recently, researchers have focused on using heterogeneous local processors (e.g., GPUs, CPUs) to accelerate CNN inference. Utilizing all available local processors can achieve the highest performance, but it may be energy-inefficient. Unlike previous work, this paper is concerned with the energy efficiency of CNN-based mobile applications. We present an adaptive strategy that computes the energy efficiency of all local processors and then selects the most energy-efficient combination of device processors to perform CNN inference in parallel. The strategy is implemented on the ODROID platform, where the evaluation results show that our approach provides 3.67x higher energy efficiency with only 9.7% performance degradation on average, compared with a greedy strategy that tries to use all available local processors.
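Selecting an energy-efficient processor combination can be sketched as a small search over subsets. The abstract does not give the paper's energy model, so this sketch assumes that throughput and power add up across processors and that the device draws a fixed base power; the processor names and all numbers below are made up for illustration.

```python
from itertools import combinations


def best_combination(processors, base_power):
    """Pick the processor subset with the highest throughput per watt.

    processors: dict mapping name -> (throughput in frames/s, power in W).
    base_power: power drawn by the device regardless of the subset used.
    Assumption (hypothetical): throughput and power are additive.
    Returns (subset, efficiency in frames/J, throughput in frames/s).
    """
    best = None
    names = sorted(processors)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            throughput = sum(processors[n][0] for n in subset)
            power = base_power + sum(processors[n][1] for n in subset)
            efficiency = throughput / power
            if best is None or efficiency > best[1]:
                best = (subset, efficiency, throughput)
    return best


# Hypothetical measurements, not taken from the paper.
measured = {
    "gpu": (20.0, 4.0),
    "big_cpu": (8.0, 6.0),
    "little_cpu": (4.0, 1.0),
}
print(best_combination(measured, base_power=2.0))
```

With these made-up numbers the most efficient subset leaves out the power-hungry big CPU cluster, trading some throughput for efficiency relative to the greedy choice of using every processor.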
ISBN: 9781538637906 (print)
Given the performance gap between CPU and memory, employing novel memory techniques in embedded systems is a feasible way to narrow that gap. On an MPSoC equipped with hybrid scratchpad memories (SPMs) based on SRAM and STT-RAM, data can be accessed effectively in parallel. This paper explores data allocation, task assignment, and scheduling on MPSoCs with SRAM and STT-RAM based hybrid SPMs. We propose a mixed integer quadratically constrained programming (MIQCP) formulation and a heuristic algorithm (HA) to generate optimal and near-optimal solutions, respectively, for data allocation, task assignment, and scheduling. Experimental results show that MIQCP and HA reduce schedule length by 32.6% and 20.1% on average, respectively.
ISBN: 9781728174457 (print)
Since their introduction, Java streams have been quickly embraced by industry and are now used at a large scale. The parallelism they enable is very easy to achieve, but it is constrained either by the parallelism model used (in some cases) or by the set of operations that can be specified using streams. In this paper, we investigate the possibility of extending the types of computation that can be defined with the Java streams API by introducing PowerList-theory-based computation into this infrastructure. PowerLists are recursive data structures that, together with their associated algebraic theory, offer both abstractions that ease the development of parallel applications and a methodology for designing parallel algorithms. The Java streaming infrastructure can be adapted to support them to a large extent. We present such an adaptation here, and we analyse and discuss its advantages and constraints. The analysis is illustrated with application examples.
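For readers unfamiliar with PowerLists, the sketch below illustrates their two constructors, tie (concatenation of two equal-length lists) and zip (interleaving), and how functions are defined recursively by splitting a list in half. It is written in Python with plain lists rather than the Java streams API, and it is not the adaptation presented in the paper; the helper names are chosen for illustration only.

```python
import operator


def tie(p, q):
    """tie: concatenate two PowerLists of equal length."""
    assert len(p) == len(q)
    return p + q


def zip_(p, q):
    """zip: interleave two PowerLists of equal length."""
    assert len(p) == len(q)
    out = []
    for a, b in zip(p, q):
        out.extend((a, b))
    return out


def untie(p):
    """Inverse of tie: first half and second half."""
    half = len(p) // 2
    return p[:half], p[half:]


def unzip(p):
    """Inverse of zip: even-indexed and odd-indexed elements."""
    return p[0::2], p[1::2]


def reduce_pl(op, p):
    """Reduction defined on the tie decomposition (length must be 2**n)."""
    if len(p) == 1:
        return p[0]
    left, right = untie(p)
    # The two recursive calls are independent and could run in parallel.
    return op(reduce_pl(op, left), reduce_pl(op, right))


def rev(p):
    """Reversal: rev(p | q) = rev(q) | rev(p)."""
    if len(p) == 1:
        return p
    left, right = untie(p)
    return tie(rev(right), rev(left))


print(reduce_pl(operator.add, [1, 2, 3, 4, 5, 6, 7, 8]))
print(rev([1, 2, 3, 4]))
```

Each function splits its argument into two halves of equal size, so the recursion forms a balanced tree, which is what makes the structure a natural fit for divide-and-conquer parallelism.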