ISBN: 9781424452910 (print)
This paper proposes a data race prevention scheme for the View-Oriented Parallel Programming (VOPP) model. VOPP is a novel shared-memory, data-centric parallel programming model that uses views to bundle mutual exclusion with data access. We have implemented the data race prevention scheme with a memory protection mechanism. Experimental results show that the extra overhead of memory protection is trivial in our applications. We also present a new VOPP implementation, Maotai 2.0, which has advanced features such as deadlock avoidance, producer/consumer view and system queues, in addition to the data race prevention scheme. The performance of Maotai 2.0 is evaluated and compared with modern programming models such as OpenMP and Cilk.
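To illustrate the idea of bundling mutual exclusion with data access, the following Python sketch models a view as an object whose data can only be reached while the view is acquired. It is an analogy under stated assumptions, not the Maotai 2.0 API: the `View` class, `add_many`, and `run_demo` are hypothetical names, and the ownership check merely stands in for the memory protection mechanism mentioned in the abstract.

```python
import threading


class View:
    """Hypothetical view: shared data that may only be touched while acquired."""

    def __init__(self, data):
        self._data = data
        self._lock = threading.Lock()
        self._owner = None  # thread id of the current holder, if any

    def __enter__(self):
        # Acquiring the view takes the lock, so access is mutually exclusive.
        self._lock.acquire()
        self._owner = threading.get_ident()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Releasing the view clears ownership before dropping the lock.
        self._owner = None
        self._lock.release()
        return False

    @property
    def data(self):
        # Stand-in for memory protection: reject access outside acquisition.
        if self._owner != threading.get_ident():
            raise RuntimeError("view accessed without being acquired")
        return self._data


def add_many(view, n):
    """Increment the shared counter n times, acquiring the view each time."""
    for _ in range(n):
        with view:
            view.data["count"] += 1


def run_demo(workers=4, n=1000):
    """Run several threads against one view and return the final count."""
    view = View({"count": 0})
    threads = [
        threading.Thread(target=add_many, args=(view, n)) for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with view:
        return view.data["count"]
```

Because every access goes through the acquired view, the increments cannot interleave, and an access outside a `with view:` block fails immediately instead of racing silently.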
ISBN: 9780769536989 (print)
The proceedings contain 54 papers. The topics discussed include: an experimental study of diversity with off-the-shelf antivirus engines; simulating fixed virtual nodes for adapting wireline protocols to MANET; seed scheduling for peer-to-peer networks; proximity-aware distributed mutual exclusion for effective peer-to-peer replica management; towards improved overlay simulation using realistic topologies; analysis of round-robin implementations of processor sharing, including overhead; comparison of price-based static and dynamic job allocation schemes for grid computing systems; a rule based co-operative approach for cell selection in high speed cellular networks; sharing private information across distributed databases; energy-aware prefetching for parallel disk systems; attribute-based prevention of phishing attacks; TTM based security enhancement for inter-domain routing protocol; and introducing probability in RFID reader-to-reader anti-collision.
ISBN: 9781424452910 (print)
Fault tolerance is an important requirement for long-running parallel programs. This paper presents a different approach to fault-tolerance support in message-passing parallel programs, based on their structural and behavioral characteristics, commonly known as patterns. A classification of these patterns and their applicable fault-tolerance strategies is intended to help application developers incorporate appropriate fault-tolerance strategies into an application. Fault-tolerance strategies for two of the patterns are discussed, and one specific strategy is elaborated and analyzed. The presented strategies have been incorporated into a fault-tolerance support framework called FT-PAS. One objective of the framework is to separate the fault-tolerance-related details from an application developer's main objectives (separation of concerns). The paper presents the additional key features of the framework and concludes with a discussion of current and future research directions.
ISBN: 9781424452910 (print)
Parallel programming is notoriously difficult. This becomes even more critical as multicore processors bring parallel computing into the mainstream. To ease the difficulty, tools have been designed that help the programmer with some aspects of parallelisation. Unfortunately, the programmer is mostly left alone when it comes to the difficult task of dependence analysis among the subtasks to be executed concurrently. This paper presents a new visual tool that supports the programmer with the dependence analysis in loops. This is very useful in combination with an automatically parallelising compiler or when loops are parallelised with OpenMP. The tool displays on the fly the dependences between the statements of the loop nest on which the developer is currently working. To maximise its usefulness, the tool is unobtrusive, customisable and flexible, and is based on dependence analysis theory. A prototype was implemented for the Eclipse IDE as a plug-in that integrates seamlessly into the normal development process. The evaluation of the tool, including an evaluation against cognitive dimensions, demonstrates its usability and usefulness.
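As a small illustration of the kind of loop dependence analysis referred to above, the Python sketch below classifies the dependence in a single-statement loop of the form `a[i + write_offset] = f(a[i + read_offset])`. It is a textbook-style distance test for this one simple loop shape, not the tool's algorithm; the function names `classify_dependence` and `parallelisable` are hypothetical.

```python
def classify_dependence(write_offset, read_offset, trip_count):
    """Classify the dependence in the loop

        for i in range(trip_count):
            a[i + write_offset] = f(a[i + read_offset])

    Returns a (kind, distance) pair.
    """
    # Iteration i writes element i + write_offset; iteration j reads element
    # j + read_offset. Both touch the same element when j - i == distance.
    distance = write_offset - read_offset
    if abs(distance) >= trip_count:
        # The two iterations would be too far apart to both lie in the loop.
        return ("none", 0)
    if distance == 0:
        # Read and write hit the same element within one iteration only.
        return ("loop-independent", 0)
    if distance > 0:
        # A later iteration reads what an earlier one wrote (flow dependence).
        return ("flow", distance)
    # An earlier iteration reads what a later one overwrites (anti dependence).
    return ("anti", -distance)


def parallelisable(write_offset, read_offset, trip_count):
    """True when no dependence is carried across iterations of the loop."""
    kind, _ = classify_dependence(write_offset, read_offset, trip_count)
    return kind in ("none", "loop-independent")
```

For example, `a[i] = a[i - 1] + 1` corresponds to `classify_dependence(0, -1, n)` and carries a flow dependence of distance 1, so its iterations cannot simply be run concurrently, whereas `a[i] = 2 * a[i]` has no loop-carried dependence.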
ISBN: 9781424452910 (print)
In a ubiquitous computing environment, contexts are initially acquired and stored on nodes scattered over the environment. However, traditional context reasoning applies a centralized approach, which increases the load on the reasoning server and the cost of communicating contexts. Rule-based reasoning, an important approach to context reasoning, can easily be decomposed into independent propositions. Therefore, rule-based context reasoning can be decomposed and distributed over the nodes of the ubiquitous environment to decrease the computing load of the reasoning server. In this paper, we propose a distributed fuzzy reasoning Petri net model (dFRPN) for the decomposition of distributed fuzzy reasoning. The dFRPN model is able to formalize the distribution of fuzzy reasoning over every node of the whole environment. It has a hierarchical structure, which makes the description of reasoning on nodes clearer. Considering the limited capabilities of some mobile nodes such as PDAs and smart phones, we add a special migrating transition to define load detection and the corresponding actions. At the end of the paper, the feasibility of the dFRPN model is validated through a case study of a context-awareness-based personalized recommendation system.
ISBN: 9781424452910 (print)
We propose a new method that accelerates existing Byzantine Fault Tolerance (BFT) protocols for asynchronous distributed systems by parallelizing the consensuses involved. BFT realizes a system that is reliable against Byzantine failures and is usually achieved by repeatedly executing a consensus for a set of requests. Our method consistently parallelizes the consensus by introducing an extra consensus on the order in which agreed requests are processed. We show the correctness of our method and analyze its performance in comparison with an existing non-parallelizing method and a naively parallelizing method. The results indicate that our parallelizing method is approximately 20% faster than those methods in configurations where many replicas are running in order to increase reliability.
ISBN: 9781424452910 (print)
Though XML is used intensively in many applications, XML parsing is impractical in many fields because of its poor performance. Parallel XML parsing on multi-core processors is a promising choice. Previous methods all adopt a data-parallel approach to XML parsing. Because of the semi-structured nature of XML, they are obliged to divide the data into well-formed XML chunks and then parse these chunks in parallel. This division process is called preparsing. As preparsing is serial, it becomes the bottleneck of parallel XML parsing. Related work, the Simultaneous Finite Transducer approach (SFTXP), parallelized the preparsing stage. It maintains multiple preparser results for each equal-sized chunk by enumerating all possible parsing states. Although the number of states for each XML document is finite, the overhead of SFTXP is tremendous, including the CPU time and memory needed to generate and store the multiple results, respectively. In this work, we address parallel XML parsing with the Key Element Parse Tracing (KEPT) method, which parallelizes preparsing and parsing at the element level. It recasts preparsing as a key element extraction process and schedules the processing of key elements in the framework of KEPT. The parsing process is then parallelized as a whole. To demonstrate its effectiveness, we implement it on libxml2 and obtain good scalability on both an 8-core Linux machine and an 8-core 24 SMT Sun machine running Solaris.
ISBN: 9781424452910 (print)
General Purpose computing over Graphical Processing Units (GPGPUs) is a huge paradigm shift in parallel computing that promises a dramatic increase in performance. But GPGPUs also bring an unprecedented level of complexity in algorithmic design and software development. In this paper we describe the challenges and design choices involved in parallelizing the Bayesian Optimization Algorithm (BOA) to solve complex combinatorial optimization problems on nVidia commodity graphics hardware using the Compute Unified Device Architecture (CUDA). BOA is a well-known multivariate Estimation of Distribution Algorithm (EDA) that incorporates methods for learning a Bayesian Network (BN). It then uses the BN to sample new promising solutions. Our implementation is fully compatible with modern commodity GPUs, and we therefore call it gBOA (BOA on GPU). In the results section, we show several numerical tests and performance measurements obtained by running gBOA on an nVidia Tesla C1060 GPU. We show that in the best case we can obtain a speedup of up to 13x.
ISBN: 9781424452910 (print)
We have designed and implemented the Blue Whale File System (BWFS), a scalable distributed file system for large distributed data-intensive applications. Sharing many features with previous distributed file systems, BWFS has successfully met our storage needs and is widely deployed in many fields. Although excellent for high-bandwidth access to large files, BWFS's out-of-band data transfer mode is inefficient under small-file-intensive workloads. To improve the overall performance of the file system, we propose a novel data transfer scheme. In this scheme, BWFS transfers data with a hybrid data transfer policy: small files are transferred in in-band mode, while large files are transferred in out-of-band mode. The prototype design and implementation are described, and experiments are presented to demonstrate the significant performance benefits of our prototype implementation under small-file-intensive workloads. For small-file-intensive applications, BWFS achieves significantly higher throughput, with an increase of 60%.
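The hybrid transfer policy described above can be sketched as a simple size-based dispatch. This is a minimal illustration, not the BWFS implementation: the abstract does not state the cutoff between small and large files, so `SMALL_FILE_THRESHOLD` is an assumed value, and `choose_transfer_mode` and `plan_transfers` are hypothetical names.

```python
# Assumed cutoff in bytes; the abstract does not give the real threshold.
SMALL_FILE_THRESHOLD = 64 * 1024


def choose_transfer_mode(file_size, threshold=SMALL_FILE_THRESHOLD):
    """Pick a transfer mode for one file based on its size in bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # Small files go in-band; large files use the out-of-band data path.
    return "in-band" if file_size <= threshold else "out-of-band"


def plan_transfers(files, threshold=SMALL_FILE_THRESHOLD):
    """Group a {name: size} mapping of files by the mode each would use."""
    plan = {"in-band": [], "out-of-band": []}
    for name, size in files.items():
        plan[choose_transfer_mode(size, threshold)].append(name)
    return plan
```

With the assumed 64 KiB cutoff, `plan_transfers({"log.txt": 2048, "video.bin": 500_000_000})` places `log.txt` in the in-band group and `video.bin` in the out-of-band group.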
ISBN: 9781424452910 (print)
We propose an approximation algorithm for the Fault-Tolerant Facility Location problem, implemented in a distributed and asynchronous manner within O(n) rounds of communication, where n is the number of vertices in the network. As far as we know, the performance guarantee of similar (centralized) algorithms remains unknown except in the special case where all cities have a uniform connectivity requirement. In this paper, we assume that a shortest-path routing scheme is deployed and that the set R, which represents the distinct levels of fault-tolerant capability provided by the system (i.e., the distinct connectivity requirements), has a given constant size. We prove that the cost of our solution is no more than |R| · F* + C* in the general case, where F* and C* are respectively the facility cost and the connection cost in an optimal solution. Furthermore, extensive numerical experiments show that the quality of our solutions is comparable to that of the optimal solutions when |R| is no more than 10.