The key to large-scale parallel solutions of deterministic particle transport problem is single-node computation performance. Hence, single-node computation is often parallelized on multi-core or many-core computer ar...
详细信息
The key to large-scale parallel solutions of deterministic particle transport problem is single-node computation performance. Hence, single-node computation is often parallelized on multi-core or many-core computer architectures. However, the number of on-chip cores grows quickly with the scale-down of feature size in semiconductor technology. In this paper, we present a scalability investigation of one energy group time-independent deterministic discrete ordinates neutron transport in 3D Cartesian geometry(Sweep3D) on Intel's Many Integrated Core(MIC) architecture, which can provide up to 62 cores with four hardware threads per core now and will own up to 72 in the future. The parallel programming model, Open MP, and vector intrinsic functions are used to exploit thread parallelism and vector parallelism for the discrete ordinates method, respectively. The results on a 57-core MIC coprocessor show that the implementation of Sweep3 D on MIC has good scalability in performance. In addition, the application of the Roofline model to assess the implementation and performance comparison between MIC and Tesla K20 C Graphics processing Unit(GPU) are also reported.
To reduce the access latencies of end hosts,latency-sensitive applications need to choose suitably close service machines to answer the access requests from end *** K nearest neighbor search locates K service machines...
详细信息
To reduce the access latencies of end hosts,latency-sensitive applications need to choose suitably close service machines to answer the access requests from end *** K nearest neighbor search locates K service machines closest to end hosts,which can efficiently optimize the access latencies for end *** work has weakness in terms of the accuracy and *** to the scalable and accurate K nearest neighbor search problem,we propose a distributed K nearest neighbor search method called DKNNS in this *** machines are organized into a locality-aware multilevel *** first locates a service machine that starts the search process based on a farthest neighbor search scheme,then discovers K nearest service machines based on a backtracking approach within the proximity region containing the target in the latency *** analysis,simulation results and deployment experiments on the PlanetLab show that,DKNNS can determine K approximately optimal service machines,with modest completion time and query ***,DKNNS is also quite stable that can be used for reducing frequent searches by caching found nearest neighbors.
Service-oriented systems are widely-employed in e-business, e-government, finance, management systems, and so on. Service fault tolerance is one of the most important techniques for building highly reliable service-or...
详细信息
Service-oriented systems are widely-employed in e-business, e-government, finance, management systems, and so on. Service fault tolerance is one of the most important techniques for building highly reliable service-oriented systems. In this paper, we provide an overview of various service fault tolerance techniques,including sections on fault tolerance strategy design, fault tolerance strategy selection, and Byzantine fault tolerance. In the first section, we introduce the design of static and dynamic fault tolerance strategies, as well as the major problems when designing fault tolerance strategies. After that, based on various fault tolerance strategies, in the second section, we identify significant components from a complex service-oriented system, and investigate algorithms for optimal fault tolerance strategy selection. Finally, in the third section, we discuss a special type of service fault tolerance techniques, i.e., the Byzantine fault tolerance.
There is an increasing need to build scalable distributed systems over the Internet infrastructure. However the development of distributed scalable applications suffers from lack of a wide accepted virtual computing e...
详细信息
There is an increasing need to build scalable distributed systems over the Internet infrastructure. However the development of distributed scalable applications suffers from lack of a wide accepted virtual computing environment. Users have to take great efforts on the management and sharing of the involved resources over Internet, whose characteristics are intrinsic growth, autonomy and diversity. To deal with this challenge, Internet-based Virtual Computing Environment (iVCE) is proposed and developed to serve as a platform for distributed scalable applications over the open infrastructure, whose kernel mechanisms are on-demand aggregation and autonomic collaboration of resources. In this paper, we present a programming language for iVCE named Owlet. Owlet conforms with the conceptual model of iVCE, and exposes the iVCE to application developers. As an interaction language based on peer-to-peer content-based publish/subscribe scheme, Owlet abstracts the Internet as an environment for the roles to interact, and uses roles to build a relatively stable view of resources for the on-demand resource aggregation. It provides language constructs to use 1) distributed event driven rules to describe interaction protocols among different roles, 2) conversations to correlate events and rules into a common context, and 3) resource pooling to do fault tolerance and load balancing among networked nodes. We have implemented an Owlet compiler and its runtime environment according to the architecture of iVCE, and built several Owlet applications, including a peer-to-peer file sharing application. Experimental results show that, with iVCE, the separation of resource aggregation logic and business logic significantly eases the process of building scalable distributed applications.
Concurrency bugs widely exist in concurrent programs and have caused severe failures in the real world. Researchers have made significant progress in detecting concurrency bugs, which improves software reliability. In...
详细信息
Concurrency bugs widely exist in concurrent programs and have caused severe failures in the real world. Researchers have made significant progress in detecting concurrency bugs, which improves software reliability. In this paper, we survey the most up-to-date and well-known concurrency bug detectors. We categorize the existing detectors based on the types of concurrency bugs. Consequently, we analyze data race detectors, atomicity violation detectors, order violation detectors, and deadlock detectors, respectively. We also discuss some other techniques which are mostly related to concurrency bug detection, including schedule bounding techniques, interleaving optimizing techniques, path expanding techniques, and deterministic replay techniques. Additionally, we statistically analyze the reviewed detectors and get some interesting findings, for instance, nearly 86% of previous detectors focus on data races and atomicity violations, and dynamic approaches are popular(74%). We also discuss the limitations of previous detectors, finding that 91% of previous detectors suffer from false negatives and 64% of previous detectors suffer from runtime overhead. Based on the reviewed detectors and statistical analysis, we conclude some future research directions, including accuracy, performance,applicability, and integrality.
The performance of IO-intensive applications is determined by the hit ratio of local disk cache and the IO latency of missed disk accesses. To improve the IO performance, in this paper we propose PIB, a peer-to-peer I...
详细信息
Broadcast authentication is a critical security service in wireless sensor networks. A protocol named μTESLA[1] has been proposed to provide efficient authentication service for such networks. However, when applied t...
详细信息
Broadcast authentication is a critical security service in wireless sensor networks. A protocol named μTESLA[1] has been proposed to provide efficient authentication service for such networks. However, when applied to applications such as time synchronization and fire alarm in which broadcast messages are sent infrequently, μTESLA encounters problems of wasted key resources and slow message verification. This paper presents a new protocol named GBA (Generalized broadcast authentication), for efficient broadcast authentication in these applications. GBA utilises the one-way key chain mechanism of μTESLA, but modifies the keys and time intervals association, and changes the key disclosure mechanism according to the message transmission model in these applications. The proposed technique can take full use of key resources, and shorten the message verification time to an acceptable level. The analysis and experiments show that GBA is more efficient and practical than μESLA in appli ations with various message transmission models.
Reconfigurable computing tries to achieve the balance between high efficiency of custom computing and flexibility of general-purpose computing. This paper presents the implementation techniques in LEAP, a coarse-grain...
详细信息
Reconfigurable computing tries to achieve the balance between high efficiency of custom computing and flexibility of general-purpose computing. This paper presents the implementation techniques in LEAP, a coarse-grained reconfigurable array, and proposes a speculative execution mechanism for dynamic loop scheduling with the goal of one iteration per cycle and implementation techniques to support decoupling synchronization between the token generator and the collector. This paper also in- troduces the techniques of exploiting both data dependences of intra- and inter-iteration, with the help of two instructions for special data reuses in the loop-carried dependences. The experimental results show that the number of memory accesses reaches on average 3% of an RISC processor simulator with no memory optimization. In a practical image matching application, LEAP architecture achieves about 34 times of speedup in execution cycles, compared with general-purpose processors.
Many proposed P2P networks are based on traditional interconnection topologies. Given a static topology, the maintenance mechanism for node join/departure is critical to designing an efficient P2P network. Kautz graph...
详细信息
Many proposed P2P networks are based on traditional interconnection topologies. Given a static topology, the maintenance mechanism for node join/departure is critical to designing an efficient P2P network. Kautz graphs have many good properties such as constant degree, low congestion and optimal diameter. Due to the complexity in topology maintenance, however, to date there have been no effective P2P networks that are proposed based on Kautz graphs with base ~ 2. To address this problem, this paper presents the "distributed Kautz (D-Kautz) graphs", which adapt Kautz graphs to the characteristics of P2P networks. Using the D-Kautz graphs we further propose SKY, the first effective P2P network based on Kautz graphs with arbitrary base. The effectiveness of SKY is demonstrated through analysis and simulations.
Hierarchical Automata has been widely used in modeling dynamic aspects of reactive software, such as in UML Statecharts. At the same time, model checking is an automatic technique to ensure the correctness of software...
详细信息
暂无评论