To obtain the efficiency of a DBMS, HadoopDB combines Hadoop with a DBMS and claims superior performance over Hadoop. However, HadoopDB simply places MapReduce on top of unmodified single-machine DBMSs, an approach with several obvious weaknesses. In essence, HadoopDB is a parallel DBMS with fault tolerance, and it incurs unnecessary overhead due to the DBMS legacy. Instead of augmenting a DBMS with Hadoop techniques, we propose a new system architecture that integrates modified DBMS engines into Hadoop as a read-only execution layer, where the DBMS provides efficient read-only operators rather than managing the data. Besides the efficiency gained from the DBMS engine, there are other advantages. The modified DBMS engine can directly process data from HDFS (Hadoop Distributed File System) files at the block level, so data replication is handled naturally by HDFS and block-level parallelism is easily achieved. A global index access mechanism is added in accordance with the MapReduce paradigm. Data loading speed is also guaranteed by writing the data directly into HDFS with simplified logic. Experiments show that our system outperforms both the original Hadoop and a HadoopDB-style system.
MPI all-to-all communication is widely used in many high-performance computing (HPC) applications. In all-to-all communication, each process sends a distinct message to every other participating process. In multicore clusters, the processes within a node simultaneously contend for the node's network resource during all-to-all communication. In addition, all-to-all communication of large messages requires many small synchronization messages. Under contention, the latency of these messages is orders of magnitude larger than without contention. As a result, the synchronization overhead increases significantly and accounts for a large proportion of the total latency of all-to-all communication. In this paper, we analyze the considerable overhead of synchronization messages. Based on the analysis, an optimization is presented that reduces the number of synchronization messages from 3N to 2√N. Evaluations on a 240-core cluster show that performance improves by an almost constant ratio, which is mainly determined by message size and is independent of system scale. The performance of all-to-all communication improves by 25% for 32 KB and 64 KB messages. For an FFT application, performance improves by 20%.
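As a rough illustration of the reduction quoted above, the following sketch compares the two message counts. It assumes that N is the number of participating processes (the abstract does not define N) and that the counts are exactly 3N and 2√N; the function names are hypothetical and not taken from the paper.

```python
import math


def sync_messages_baseline(n: int) -> int:
    """Synchronization messages before the optimization: 3N (figure from the abstract)."""
    return 3 * n


def sync_messages_optimized(n: int) -> float:
    """Synchronization messages after the optimization: 2*sqrt(N) (figure from the abstract)."""
    return 2 * math.sqrt(n)


def reduction_factor(n: int) -> float:
    """How many times fewer synchronization messages are sent; equals 1.5*sqrt(N)."""
    return sync_messages_baseline(n) / sync_messages_optimized(n)


if __name__ == "__main__":
    # N is assumed here to be the process count; the values are arbitrary examples.
    for n in (16, 64, 256):
        print(n, sync_messages_baseline(n), sync_messages_optimized(n), reduction_factor(n))
```

Under these assumptions the saving grows with the square root of N, for example a factor of 12 at N = 64.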
ISBN: (digital) 9781728131061
ISBN: (print) 9781728131078
In this paper, the problem of trajectory design for energy harvesting unmanned aerial vehicles (UAVs) is studied. In the considered model, the UAV acts as a moving base station that serves ground users while collecting energy from charging stations located at the center of each user group. Meanwhile, to keep serving ground users and harvesting energy, the UAV must be examined and repaired regularly. Consequently, the trajectory of the UAV must be optimized while jointly considering the maintenance costs, the number of users served by the UAV, and the energy consumed and harvested. To capture the relationship among these factors, we first model the completion of service and the harvested energy as the reward, and the energy consumption during the deployment as the cost. The deployment profitability is then defined as the ratio of the reward to the cost of the UAV trajectory. Based on this definition, trajectory design is formulated as an optimization problem whose goal is to maximize the deployment profitability of the UAV. To solve this problem, a foraging algorithm is proposed that finds the optimal trajectory with polynomial time complexity. Fundamental analysis shows that the proposed algorithm can achieve the maximal deployment profitability. Simulation results show that the proposed algorithm can effectively reduce the operation time and achieve up to a 25.6% gain in deployment profitability compared to a Q-learning algorithm.
Dynamic voltage/frequency scaling (DVFS) has been widely applied to reduce the power dissipation of multi-core processors. However, when DVFS is applied, signals must be synchronized between asynchronous clock domains at a cost of several cycles, which results in a performance penalty, and the circuit cannot work while the frequency is being scaled. This paper proposes a novel variable-frequency clock scheme for chip multiprocessors. In our scheme, processor cores running at different frequencies can communicate with each other without the overhead of synchronizing signals. Simulation results show that our scheme improves the energy-delay product (EDP) by 16.8%, with only 3.6% performance degradation.
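To make the two reported figures concrete, the sketch below works out the energy change they would imply. It assumes that EDP is the product of energy and execution time and that the 3.6% performance degradation means a 3.6% longer execution time; the abstract states neither, so the result is only an illustration, and the function name is hypothetical.

```python
def implied_energy_ratio(edp_improvement: float, slowdown: float) -> float:
    """Return E_new / E_old implied by an EDP improvement and a slowdown.

    Since EDP = energy * delay:
        E_new / E_old = (EDP_new / EDP_old) / (delay_new / delay_old)
    """
    edp_ratio = 1.0 - edp_improvement  # 0.832 for a 16.8% EDP improvement
    delay_ratio = 1.0 + slowdown       # 1.036 for an assumed 3.6% slowdown
    return edp_ratio / delay_ratio


# Figures from the abstract, interpreted under the assumptions stated above.
ratio = implied_energy_ratio(0.168, 0.036)
print(f"implied energy ratio: {ratio:.3f}")
```

Under this reading the energy ratio is about 0.80, that is, roughly a 20% reduction in energy for the same work.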
The characteristics of advanced integrated circuit technologies require architects to look for new ways to utilize large numbers of gates and to mitigate the effects of high interconnect delays. Chip multiprocessors (CMPs) exploit increasing transistor counts by placing multiple processors on a single die. As CMPs have become the trend in high-performance microprocessors, their target workloads have become increasingly diverse. Because of the wire delay problem and the diversity of applications, neither private nor shared caches can provide both large capacity and fast access in CMPs. A novel CMP cache design, the heterogeneous CMP cache (HCC), is presented, in which chips are built from tiles of two different categories. The L2 caches of private tiles provide the lowest hit latency, and the L2 cache of shared tiles increases the effective cache capacity for shared data. By incorporating indirect-index cache technology to share capacity between different hierarchies, HCC provides an on-chip memory subsystem that is both capacity-effective and fast to access. Detailed full-system simulations are used to analyze HCC performance for various programs, including SPEC CPU2000, SPLASH2 and commercial workloads. The results show that HCC improves performance by 16% for single-threaded benchmarks and by 9% for multi-threaded benchmarks. HCC is easy to implement, and its design ideas will be used in future multi-core processors of the Godson series.
This paper proposes a hybrid dimming scheme based on joint LED selection and precoding design (TASP-HD) for multiple-user (MU) multiple-cell (MC) visible light communications (VLC) systems. In TASP-HD, both the LED se...
Providing seamless connectivity for wireless virtual reality (VR) users has emerged as a key challenge for future cloud-enabled cellular networks. In this paper, the problem of wireless VR resource management is investigated for a wireless VR network in which VR content is sent by a cloud to cellular small base stations (SBSs). The SBSs collect tracking data from the VR users over the uplink in order to generate the VR content and transmit it to the end users over downlink cellular links. In this model, the data requested or transmitted by the users can exhibit correlation, since the VR users may engage in the same immersive virtual environment with different locations and orientations. The proposed resource management framework factors this spatial data correlation into the resource allocation problem so as to better manage, and reduce the load of, both uplink and downlink traffic. In the downlink, the cloud can transmit 360° content, or specific visible content (e.g., the user's field of view) extracted from the original 360° content, to the users according to the users' data correlation so as to reduce the backhaul traffic load. In the uplink, each SBS can associate with the users that have similar tracking information so as to reduce the tracking data size. This data-correlation-aware resource management problem is formulated as an optimization problem whose goal is to maximize the users' successful transmission probability, defined as the probability that the content transmission delay of each user satisfies an instantaneous VR delay target. To solve this problem, a machine learning algorithm that uses echo state networks (ESNs) with transfer learning is introduced. By smartly transferring information on the SBS's utility, the proposed transfer-based ESN algorithm can quickly cope with changes in the wireless networking environment due to users' conten
2-D projective moment invariants were first proposed by Suk and Flusser in [12]. We point out that one of the projective moment invariants in their paper is useless because it is identically zero. 3-D projective moment invariants are generated theoretically by investigating the properties of the signed volume of a tetrahedron. The main part of the work is the selection of permutation-invariant cores for multiple integrals, in order to generate independent and nonzero 3-D projective moment invariants. We conclude that, strictly speaking, projective moment invariants do not exist because of their convergence problem.
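The signed volume of a tetrahedron mentioned above can be computed from a 3×3 determinant. The sketch below uses the standard formula V = det[b − a, c − a, d − a] / 6; the function name and the sample points are illustrative and not taken from the paper, and the sketch does not construct the invariants themselves.

```python
from typing import Sequence

Point = Sequence[float]


def signed_volume(a: Point, b: Point, c: Point, d: Point) -> float:
    """Signed volume of the tetrahedron with vertices a, b, c, d.

    The sign flips when two vertices are swapped, and the value is zero
    when the four points are coplanar.
    """
    # Edge vectors from vertex a.
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # 3x3 determinant with rows u, v, w (cofactor expansion along the first row).
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return det / 6.0


if __name__ == "__main__":
    origin, e1, e2, e3 = (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)
    print(signed_volume(origin, e1, e2, e3))  # positive orientation
    print(signed_volume(origin, e2, e1, e3))  # two vertices swapped: sign flips
```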
With the widespread adoption of embedded microprocessor-based systems in safety-critical applications such as aircraft, spaceships and nuclear power plants, how to evaluate their fault-tolerant mechanisms rapidly, conveniently and at low cost has become an important problem. The traditional method requires a detailed hardware protocol to perform the evaluation, which lengthens the evaluation period and increases the cost. A new dependability evaluation technique based on a microprocessor function model is proposed, which can evaluate fault-tolerant mechanisms more rapidly, more conveniently and more economically than conventional systems. As a case study, the new system evaluates three fault-tolerant techniques: software redundancy, assertion validation, and instruction re-fetching and re-execution. The results show that the evaluation is reasonable.
Memory bandwidth has become the major bottleneck to performance improvement in modern microprocessors. By investigating cache store misses, a cache adaptive write allocate policy that significantly improves microprocessor bandwidth is proposed. The policy collects fully modified blocks in the miss queue. Fully modified blocks are written to lower-level memory under a non-write-allocate policy, which can switch to a write-allocate policy adaptively. Compared with other cache store miss policies, the cache adaptive write allocate policy avoids unnecessary memory traffic, reduces cache pollution, and decreases the rate at which the load and store queues fill up, without increasing hardware overhead. Experimental results indicate that the policy improves memory bandwidth on the STREAM benchmarks by 62.6% on average. The performance of the SPEC CPU 2000 benchmarks also improves, with an average IPC speedup of 5.9%.
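The decision described above can be sketched as follows. This is a simplified software model, not the paper's hardware design: it assumes a 32-byte block, a miss queue entry that tracks which bytes have been written, and a choice made when the entry is retired. The names `MissQueueEntry`, `choose_policy` and `BLOCK_SIZE` are hypothetical.

```python
BLOCK_SIZE = 32  # bytes per cache block (hypothetical value)


class MissQueueEntry:
    """Collects the bytes written by store misses to one cache block."""

    def __init__(self, block_addr: int) -> None:
        self.block_addr = block_addr
        self.written = [False] * BLOCK_SIZE

    def record_store(self, offset: int, size: int) -> None:
        """Mark `size` bytes starting at `offset` within the block as written."""
        if offset < 0 or size <= 0 or offset + size > BLOCK_SIZE:
            raise ValueError("store does not fit inside the block")
        for i in range(offset, offset + size):
            self.written[i] = True

    def fully_modified(self) -> bool:
        """True when every byte of the block has been overwritten."""
        return all(self.written)


def choose_policy(entry: MissQueueEntry) -> str:
    """Pick the store miss policy for an entry leaving the miss queue."""
    if entry.fully_modified():
        # Every byte is new, so fetching the old block would be wasted traffic:
        # write the block to the lower level without allocating it in the cache.
        return "no-write-allocate"
    # Only part of the block is new: fetch the rest of it (write allocate).
    return "write-allocate"


if __name__ == "__main__":
    streaming = MissQueueEntry(0x1000)
    for offset in range(0, BLOCK_SIZE, 8):  # four 8-byte stores cover the block
        streaming.record_store(offset, 8)
    partial = MissQueueEntry(0x2000)
    partial.record_store(0, 8)              # only the first 8 bytes are written
    print(choose_policy(streaming), choose_policy(partial))
```

In this model a streaming write that covers a whole block skips the fetch, which is the kind of unnecessary memory traffic the abstract says the policy avoids.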