ISBN (print): 0769524052
To improve the performance of applications on OpenMP/JIAJIA, we present a new abstraction, the Array Relation Vector (ARV), which describes the relation between the data elements of two consistent shared arrays accessed in the same computation phase. Based on ARV, we use array grouping to eliminate the pseudo data distribution of small shared data and to improve page locality. Experimental results show that ARV-based array grouping can greatly improve the performance of applications with non-contiguous data access and strict access affinity on an OpenMP/JIAJIA cluster. For applications with small shared arrays, array grouping noticeably improves performance when the number of processors is small.
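A minimal Python sketch of why grouping helps page locality, assuming the simplest possible relation between two arrays (element a[i] is always accessed together with b[i]), 4 KB pages and 8-byte elements. The function names are hypothetical, and the ARV computation itself is not shown.

```python
PAGE_SIZE = 4096  # bytes per page (assumed)
ELEM_SIZE = 8     # bytes per element, e.g. a double (assumed)

def grouped_layout(a, b):
    """Interleave two related arrays into one: [a0, b0, a1, b1, ...]."""
    if len(a) != len(b):
        raise ValueError("arrays must have equal length")
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out

def pages_separate(n, i):
    """Pages touched by a[i] and b[i] when b (length n) is laid out right after a."""
    a_addr = i * ELEM_SIZE
    b_addr = (n + i) * ELEM_SIZE
    return {a_addr // PAGE_SIZE, b_addr // PAGE_SIZE}

def pages_grouped(i):
    """Pages touched by the pair (a[i], b[i]) in the grouped (interleaved) layout."""
    base = 2 * i * ELEM_SIZE
    return {base // PAGE_SIZE, (base + ELEM_SIZE) // PAGE_SIZE}
```

With n = 1024, `pages_separate(1024, 0)` touches two pages (0 and 2), whereas `pages_grouped(i)` touches a single page for every i, because each 16-byte pair never straddles a 4 KB boundary. On a page-based DSM such as JIAJIA, fewer pages per access pattern means fewer remote page fetches.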
ISBN (print): 9783540884781
This paper presents a novel algorithm to detect null pointer dereference errors. The algorithm uses both must-alias and may-alias information in a compact way to improve detection precision. Using may-alias information obtained by a fast flow- and context-insensitive analysis, we compute the must aliases generated by assignment statements, and the must-alias information is in turn used to improve the precision of the may-alias information. The must-alias information lets us perform strong updates on more expressions, which reduces false positives in null pointer dereference detection. We have implemented the algorithm in the SUIF2 compiler infrastructure, and the experimental results are as expected.
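The effect of must-alias information on strong versus weak updates can be sketched as follows. This is a hypothetical, simplified nullness lattice in Python, not the paper's SUIF2 implementation; `store_through` assumes the alias analysis has already produced the set of locations `p` may point to and whether that set is a definite (must) alias.

```python
from enum import Enum

class Nullness(Enum):
    NULL = "null"
    NONNULL = "nonnull"
    MAYBE = "maybe"

def join(x, y):
    """Least upper bound: equal facts stay, differing facts become MAYBE."""
    return x if x == y else Nullness.MAYBE

def store_through(state, targets, is_must, value):
    """Model `*p = value`, where `targets` are the locations p may point to.

    With must-alias information (exactly one definite target) the old fact
    is overwritten (strong update). Otherwise every possible target keeps
    the join of its old fact and the new one (weak update).
    """
    new_state = dict(state)
    if is_must and len(targets) == 1:
        (t,) = targets
        new_state[t] = value  # strong update
    else:
        for t in targets:
            new_state[t] = join(state.get(t, Nullness.MAYBE), value)  # weak update
    return new_state

def deref_warning(state, var):
    """Flag a possible null dereference unless var is known to be non-null."""
    return state.get(var, Nullness.MAYBE) != Nullness.NONNULL
```

Starting from `x` known to be NULL, a store of a non-null value through a must alias of `x` clears the warning on a later `*x`, while the same store through a two-target may alias leaves `x` as MAYBE and the warning stays: this is the false positive that additional strong updates remove.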
Deep reinforcement learning (RL) has become one of the most popular topics in artificial intelligence research. It has been widely used in various fields, such as end-to-end control, robotic control, recommendation systems, and natural language dialogue systems. In this survey, we systematically categorize deep RL algorithms and applications, and provide a detailed review of existing deep RL algorithms by dividing them into model-based methods, model-free methods, and advanced RL methods. We thoroughly analyze advances including exploration, inverse RL, and transfer RL. Finally, we outline current representative applications and analyze four open problems for future research.
Broadcast authentication is a critical security service in wireless sensor networks. A protocol named μTESLA [1] has been proposed to provide efficient authentication for such networks. However, when applied to applications such as time synchronization and fire alarms, in which broadcast messages are sent infrequently, μTESLA suffers from wasted key resources and slow message verification. This paper presents a new protocol, GBA (Generalized Broadcast Authentication), for efficient broadcast authentication in these applications. GBA uses the one-way key chain mechanism of μTESLA, but modifies the association between keys and time intervals and changes the key disclosure mechanism to match the message transmission model of these applications. The proposed technique makes full use of key resources and shortens message verification time to an acceptable level. Analysis and experiments show that GBA is more efficient and practical than μTESLA in applications with various message transmission models.
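The one-way key chain that GBA inherits from μTESLA can be sketched as below, assuming SHA-256 as the one-way function F. The function names are hypothetical, and GBA's modified key/interval association and disclosure schedule are not modelled; the sketch only shows chain generation and how a receiver authenticates a disclosed key against the pre-distributed commitment.

```python
import hashlib

def F(key: bytes) -> bytes:
    """One-way function (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256(key).digest()

def make_key_chain(seed: bytes, n: int) -> list:
    """Return [K_0, K_1, ..., K_n] with K_n = seed and K_i = F(K_{i+1}).

    The sender uses keys in order K_1, K_2, ...; K_0 is the commitment
    distributed to receivers in advance.
    """
    chain = [seed]
    for _ in range(n):
        chain.append(F(chain[-1]))
    chain.reverse()  # now chain[0] = F^n(seed) and chain[n] = seed
    return chain

def verify_disclosed_key(commitment: bytes, key: bytes, i: int) -> bool:
    """Authenticate a key disclosed for interval i: applying F i times
    must reproduce the commitment K_0."""
    k = key
    for _ in range(i):
        k = F(k)
    return k == commitment
```

Because F cannot be inverted, disclosing K_i reveals nothing about later keys, while any receiver holding K_0 (or any more recent authenticated key) can check K_i by hashing forward.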
With the rapid development of computing and networking technologies, researchers have proposed building harmonious, trusted, and transparent Internet-based virtual computing environments (iVCE). Overlay-based organization of dynamic Internet resources is an important approach for iVCE to realize efficient resource sharing. DHT-based overlays are scalable, low-latency, and highly available; however, the current DHT overlay in iVCE (SKY) cannot satisfy the "trust" requirements of Internet applications. To address this problem, in this paper we modify SKY and propose TrustedSKY, an embedded DHT overlay technique for iVCE that allows applications to select trusted nodes to form a "trusted subgroup" within the base overlay and thereby achieve secure and trusted DHT routing.
Reconfigurable computing seeks a balance between the high efficiency of custom computing and the flexibility of general-purpose computing. This paper presents the implementation techniques of LEAP, a coarse-grained reconfigurable array. It proposes a speculative execution mechanism for dynamic loop scheduling, with the goal of one iteration per cycle, together with implementation techniques that support decoupled synchronization between the token generator and the collector. The paper also introduces techniques for exploiting both intra- and inter-iteration data dependences, with the help of two instructions for special data reuse in loop-carried dependences. Experimental results show that the number of memory accesses is on average 3% of that of a RISC processor simulator without memory optimization. In a practical image matching application, the LEAP architecture achieves a speedup of about 34x in execution cycles compared with general-purpose processors.
Many proposed P2P networks are based on traditional interconnection topologies. Given a static topology, the maintenance mechanism for node join/departure is critical to designing an efficient P2P network. Kautz graphs have many good properties, such as constant degree, low congestion, and optimal diameter. However, because of the complexity of topology maintenance, no effective P2P network has yet been proposed based on Kautz graphs with base > 2. To address this problem, this paper presents distributed Kautz (D-Kautz) graphs, which adapt Kautz graphs to the characteristics of P2P networks. Using D-Kautz graphs, we further propose SKY, the first effective P2P network based on Kautz graphs with arbitrary base. The effectiveness of SKY is demonstrated through analysis and simulations.
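For reference, the static Kautz graph K(d, k) can be sketched as follows. Nodes are strings of length k over d + 1 symbols with no two adjacent symbols equal, each node has out-degree d (the base), and shift-based routing moves in the remaining destination symbols one hop at a time, so no route needs more than k hops. This is the textbook construction with assumed helper names, not the D-Kautz maintenance protocol used by SKY.

```python
from itertools import product

def kautz_nodes(d, k):
    """All nodes of K(d, k): length-k strings over symbols 0..d
    in which no two adjacent symbols are equal."""
    return [s for s in product(range(d + 1), repeat=k)
            if all(s[j] != s[j + 1] for j in range(k - 1))]

def kautz_successors(node, d):
    """Out-neighbours: drop the first symbol and append any symbol that
    differs from the current last symbol (d choices, so constant degree)."""
    return [node[1:] + (c,) for c in range(d + 1) if c != node[-1]]

def kautz_route(src, dst):
    """Shift-based routing: find the longest suffix of src that is also a
    prefix of dst, then shift in the remaining symbols of dst one per hop."""
    k = len(src)
    overlap = next(m for m in range(k, -1, -1) if src[k - m:] == dst[:m])
    path, cur = [src], src
    for c in dst[overlap:]:
        cur = cur[1:] + (c,)
        path.append(cur)
    return path
```

K(2, 3) has (d + 1)·d^(k-1) = 12 nodes, each with 2 successors, and `kautz_route((0, 1, 2), (2, 0, 1))` returns `[(0, 1, 2), (1, 2, 0), (2, 0, 1)]`, a two-hop route, because the suffix `(2,)` of the source already matches the prefix of the destination.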
The contribution of parasitic bipolar amplification to single-event transients (SETs) is experimentally verified using two P-hit target chains, one in the normal layout and one in a special layout. For PMOSs in the normal layout, single-event charge collection consists of diffusion, drift, and the parasitic bipolar effect, whereas for PMOSs in the special layout the parasitic bipolar junction transistor cannot turn on. Heavy-ion experimental results show that PMOSs without parasitic bipolar amplification exhibit a 21.4% decrease in average SET pulse width and a roughly 40.2% reduction in SET cross-section.
The key to large-scale parallel solution of deterministic particle transport problems is single-node computational performance. Hence, single-node computation is often parallelized on multi-core or many-core architectures. However, the number of on-chip cores is growing quickly as semiconductor feature sizes shrink, which makes scalability a central concern. In this paper, we present a scalability investigation of single-energy-group, time-independent, deterministic discrete ordinates neutron transport in 3D Cartesian geometry (Sweep3D) on Intel's Many Integrated Core (MIC) architecture, which currently provides up to 62 cores with four hardware threads per core and is expected to provide up to 72 cores in the future. The OpenMP parallel programming model and vector intrinsic functions are used to exploit thread parallelism and vector parallelism, respectively, in the discrete ordinates method. Results on a 57-core MIC coprocessor show that the Sweep3D implementation on MIC scales well. In addition, we report an assessment of the implementation using the Roofline model and a performance comparison between MIC and a Tesla K20C Graphics Processing Unit (GPU).
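The Roofline model used in this assessment bounds attainable performance by min(peak compute, arithmetic intensity × memory bandwidth). A minimal Python sketch follows; the function names are illustrative, and the peak and bandwidth figures in the usage note are placeholders, not measured MIC or K20C values.

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

def roofline(peak_gflops, bandwidth_gbs, ai):
    """Attainable GFLOP/s: limited either by the compute roof or by
    memory bandwidth times arithmetic intensity."""
    return min(peak_gflops, ai * bandwidth_gbs)

def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being
    memory-bound and becomes compute-bound."""
    return peak_gflops / bandwidth_gbs
```

On a hypothetical device with a 1000 GFLOP/s peak and 200 GB/s bandwidth, the ridge point is 5 FLOP/byte, so a kernel with an intensity of 0.5 FLOP/byte is capped at 100 GFLOP/s regardless of how many cores or vector lanes it uses. Placing a kernel on this plot shows whether further threading and vectorization can help or whether memory traffic is the limit.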
Recently, correlation filter-based trackers have attracted considerable attention for their high computational efficiency. However, they do not handle occlusion and scale variation well. This paper aims to prevent tracker failure in these two situations by integrating depth information into a correlation filter-based tracker. Using RGB-D data, we construct a depth context model that captures the spatial correlation between the target and its surrounding regions. Furthermore, we adopt a region growing method to make the tracker robust to occlusion and scale variation. Additional optimizations, such as a model updating scheme, are applied to improve performance on longer video sequences. Both qualitative and quantitative evaluations on challenging benchmark image sequences demonstrate that the proposed tracker performs favourably against state-of-the-art algorithms.
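The abstract does not detail the region growing method, so the following is only a hypothetical illustration of a generic depth-based variant: flood-fill from a seed pixel inside the target and keep 4-connected neighbours whose depth is close to the seed's, so that a nearer occluder or a farther background is excluded and the region's bounding box gives a scale estimate. The tolerance, names, and growth criterion are assumptions.

```python
from collections import deque

def grow_region(depth, seed, tol):
    """Flood-fill from seed (row, col), adding 4-connected pixels whose depth
    differs from the seed depth by at most tol. Returns a set of (row, col)."""
    rows, cols = len(depth), len(depth[0])
    d0 = depth[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region
                    and abs(depth[nr][nc] - d0) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

def bounding_box(region):
    """(min_row, min_col, max_row, max_col) of a non-empty pixel set."""
    rs = [r for r, _ in region]
    cs = [c for _, c in region]
    return min(rs), min(cs), max(rs), max(cs)
```

Comparing every candidate against the seed depth, rather than against its immediate neighbour, keeps the region from drifting gradually onto a surface at a different depth. On a depth map where the target sits at about 2 m, the background at 5 m, and an occluder at 1 m, growing from a target pixel with a 0.2 m tolerance stops at both the occluder and the background.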