ISBN: (print) 9781538655559
Performance fluctuations are common in software such as databases and software networking stacks. A fluctuation refers to differing performance (latency, throughput) for similar or identical data-items (e.g., requests, queries, packets) due to non-functional states such as cache warmth. While tail latency caused by fluctuations badly affects user experience, diagnosing fluctuations is difficult because reproducing non-functional states in a controlled environment is not feasible. To this end, we estimate the elapsed time of each function for each data-item individually, observing a single fluctuation occurrence online so that reproducing non-functional states is no longer needed. The issue is that instrumentation-based tracing methods are too heavy, because a function takes only a few microseconds in high-throughput software systems of the multi-core age. We propose a hybrid approach of instrumentation and hardware-based sampling. It enables the diagnosis of performance fluctuations in high-throughput software systems with acceptable and adjustable overhead. Our evaluations show that it can clearly reveal a performance fluctuation caused by differing cache warmth in a sample application, and that it can also be applied to realistic software.
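The hybrid idea can be illustrated with a minimal sketch. It assumes details the abstract does not state: that lightweight instrumentation records only the start and end timestamp of each data-item, that a hardware sampler delivers (timestamp, function) pairs, and that a function's elapsed time for one data-item is estimated as the span between its first and last sample inside that item's interval. All names and numbers are hypothetical.

```python
def estimate_elapsed(items, samples):
    """Estimate per-item, per-function elapsed time.

    items:   {item_id: (start_ts, end_ts)} from instrumentation (assumed)
    samples: [(timestamp, function_name)] from hardware sampling (assumed)
    Returns  {item_id: {function_name: last_sample_ts - first_sample_ts}}
    """
    estimates = {}
    for item_id, (start, end) in items.items():
        first_last = {}
        for ts, func in samples:
            if start <= ts <= end:
                lo, hi = first_last.get(func, (ts, ts))
                first_last[func] = (min(lo, ts), max(hi, ts))
        estimates[item_id] = {f: hi - lo for f, (lo, hi) in first_last.items()}
    return estimates


# Two similar requests; the second spends far longer in "lookup"
# (for example, because of a cold cache).
items = {"req-1": (0, 100), "req-2": (101, 260)}
samples = [(10, "parse"), (30, "parse"), (40, "lookup"), (90, "lookup"),
           (110, "parse"), (130, "parse"), (140, "lookup"), (250, "lookup")]
result = estimate_elapsed(items, samples)
```

In this sketch a higher sampling rate tightens the estimate at the cost of more overhead, which is one way to read the "adjustable overhead" mentioned in the abstract.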
The market for parallel and distributed computing systems keeps growing. Technological advances in processor power, networking, telecommunication and multimedia are stimulating the development of applications requiring parallel and distributed computing. An important research problem in this area is the need to find a robust bridge between the decentralisation of knowledge sources in information-based systems and the distribution of computational power. Consequently, the attention of the research community has been directed towards high-level, concurrent, distributed programming. This work proposes a new hypermedia framework based on the metaphor of the actor model. The storage and run-time layers are represented entirely as communities of independent actors that cooperate in order to accomplish common goals, such as version management or user adaptivity. These goals involve fundamental and complex hypermedia issues, which, thanks to the distribution of tasks, are treated in an efficient and simple way.
ISBN: (print) 9781424437511
This paper presents a new debugging methodology for applications targeting reconfigurable platforms. The key idea is that bringing the advantages of software engineering techniques to hardware design would reduce design cycles and hence time-to-market. Our high-level synthesis framework supports probe insertion both in the behavioural description of the application and in its hierarchical netlist. Probe status can control the execution, and traced signals can be read back from software. Probe conditions can be reassigned at runtime, which avoids the main disadvantage of modification through re-synthesis and favours short debugging cycles similar to those of software development.
ISBN: (print) 9780769541105
The domain decomposition method is a popular algorithm adopted for the parallel finite element method (FEM). The formulation for solving sparse linear systems of equations is presented. The TAU performance analysis software is used to analyze and understand the execution behavior of the parallel algorithm, including communication patterns, processor load balance, computation versus communication ratios, timing characteristics, and processor idle time. This is all done through displays of post-mortem trace files. Performance bottlenecks can easily be identified at the appropriate level of detail. A large-scale mechanical calculation of a dam was carried out with the parallel FEM program on the Dawning 5000A parallel computer at the Henan Technical University Supercomputer Center. Statistics show that the formulation is efficient in parallel computing environments and that the formulation is significantly faster and consumes less memory.
ISBN: (print) 9780769546766
Modern reconfigurable devices such as FPGAs can be reconfigured at run time. Some of them can be dynamically partially reconfigured, which means part of the FPGA is changed without interrupting other parts. This feature adds tremendous flexibility to the Reconfigurable Computing (RC) field but also introduces challenges. Reconfigurable operating systems aim to ease application development and, most importantly, application verification and maintenance. In this paper we propose novel scheduling algorithms for reconfigurable computing that can handle both hardware and software tasks. The proposed algorithms reuse hardware tasks to reduce reconfiguration overhead, migrate tasks between software and hardware, and give priority to hardware tasks. The results indicate that adding a software processing element not only adds flexibility but also increases system performance. Two on-line schedulers were designed and implemented. RCSched-I is a simple implementation that nominates the first available free Partial Reconfigurable Region (PRR) for new tasks. RCSched-II, on the other hand, nominates any free PRR. Both schedulers check the nominated PRR(s) against the ready task for a match, then decide whether reconfiguration is needed. RCSched-II reconfigures the least recently configured PRR, which increases hardware task reuse and decreases total processing time.
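The reuse-then-reconfigure policy described for RCSched-II can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the `PRR` record, the logical clock, and the `schedule` function are hypothetical, and software-task migration and task priorities are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PRR:
    name: str
    loaded_task: Optional[str] = None  # task type currently configured
    busy: bool = False
    configured_at: int = -1            # logical time of last reconfiguration


def schedule(prrs, task_type, now):
    """Place a ready hardware task; return (prr, reconfigured).

    Returns (None, False) when no PRR is free.
    """
    free = [p for p in prrs if not p.busy]
    if not free:
        return None, False
    # Reuse: a free PRR that already holds this task needs no reconfiguration.
    for p in free:
        if p.loaded_task == task_type:
            p.busy = True
            return p, False
    # Otherwise reconfigure the least recently configured free PRR.
    victim = min(free, key=lambda p: p.configured_at)
    victim.loaded_task = task_type
    victim.configured_at = now
    victim.busy = True
    return victim, True


prrs = [
    PRR(name="prr0", loaded_task="fir", configured_at=5),
    PRR(name="prr1", loaded_task="fft", configured_at=2),
    PRR(name="prr2", loaded_task="aes", configured_at=1),
]
reused, reused_reconf = schedule(prrs, "fft", now=10)    # matches prr1
evicted, evicted_reconf = schedule(prrs, "sha", now=11)  # no match, oldest free PRR
```

Evicting the oldest configuration keeps recently loaded tasks resident, which is the property the abstract credits with increasing hardware task reuse.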
ISBN: (print) 9781538609415
Parallel and distributed computing have enabled the development of much more scalable software. However, developing concurrent software requires the programmer to be aware of non-determinism, data races, and deadlocks. MPI (Message Passing Interface) is a popular standard for writing message-oriented distributed applications. Some messages in MPI systems can be processed by any one of many machines and in many possible orders. This non-determinism can affect the result of an MPI application. The alternate results may or may not be correct. To verify MPI applications, we need to check all these possible orderings and use an application-specific oracle to decide whether these orderings give correct output. MPJ Express is an open-source Java implementation of the MPI standard. Model checking of MPI Java programs is a challenging task due to their parallel nature. We developed a Java-based model of MPJ Express, in which processes are modeled as threads and which can run unmodified MPI Java programs on a single system. This model enabled us to adapt the Java PathFinder explicit-state software model checker (JPF), using a custom listener, to verify our model running real MPI Java programs. The evaluation of our approach shows that model checking reveals incorrect system behavior that results in very intricate message orderings.
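The verification idea of checking every possible message ordering against an application-specific oracle can be illustrated with a small brute-force sketch. It does not use MPJ Express or JPF, and a model checker explores the state space far more cleverly than plain enumeration; the function names and the toy receivers below are hypothetical.

```python
from itertools import permutations


def explore_orderings(messages, receive_all, oracle):
    """Run the receiver on every arrival order; return the orders the oracle rejects."""
    return [order for order in permutations(messages)
            if not oracle(receive_all(order))]


def commutative(order):
    # Result does not depend on arrival order.
    return sum(order)


def order_sensitive(order):
    # Result depends on arrival order: messages are appended as decimal digits.
    acc = 0
    for m in order:
        acc = acc * 10 + m
    return acc


messages = [1, 2, 3]  # payloads from three senders, received in any order
sum_failures = explore_orderings(messages, commutative, lambda r: r == 6)
digit_failures = explore_orderings(messages, order_sensitive, lambda r: r == 123)
```

The commutative receiver passes under all six orderings, while the order-sensitive one is correct only when the messages happen to arrive in sender order, which is the kind of defect that a single test run can easily miss.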
The ability to check whether the modeled system satisfies certain properties is a very important aspect in the software development process. Many object-oriented methods do not pay enough attention to behavioral descr...
ISBN: (print) 9781479944309
Spaceflight software continues to experience exponential growth as functionality migrates from hardware to software. The resulting complexity of these mission-critical systems demands new approaches to software systems engineering in order to effectively manage the development efforts and ensure that reliability is not compromised. Model-based systems/software engineering (MBE) approaches present attractive solutions to address the size and complexity through abstraction and analytical models. However, there are many challenges that must be addressed before MBE approaches can be effectively adopted on a large scale across an entire system. In this paper, we discuss some of the key motivators and challenges based on our experiences with flight software programs employing elements of MBE.
ISBN: (print) 9780769549712
Reductions matter and they are here to stay. Wide adoption of parallel processing hardware in a broad range of computer applications has encouraged recent research efforts on their efficient parallelization. Furthermore, trends towards high-productivity languages in mainstream computing increase the demand for efficient programming support. In this paper we present a new approach to parallel reductions for distributed memory systems that provides both scalability and programmability. Using OmpSs, a task-based parallel programming model, the developer can express scalable reductions through a single pragma annotation. This pragma annotation is applicable to tasks as well as to work-sharing constructs (with implicit tasking) and instructs the compiler to generate the required runtime calls. The supporting runtime handles data and task distribution, parallel execution, and data reduction. Scalability is achieved through a software cache that maximizes local and temporal data reuse and allows overlapped computation and communication. Results confirm scalability for up to 32 12-core cluster nodes.
Contemporary activities at CSCS/SCSC have resulted in two complementary software systems for practical parallel programming. Both developments are user-oriented and application-driven, efficiently exploiting and reusing demonstrated portable technologies at multiple levels that we have proven to scale to systems with large numbers of processors. The Annai parallel application engineering environment supports existing standards for portable program development (HPF, Fortran, C, MPI) and offers convenient program browsing and navigation, execution control, and interaction mechanisms. Specific functionality for parallel programming includes high-level language support for unstructured computations, interactive source-level symbolic debugging with deadlock detection and deterministic execution replay, SPMD/data-parallel debugging with distributed breakpoints and array visualization, scalable profile summary displays of execution statistics accumulated at runtime, and detailed program evolution and processor interaction charts. The intelligent program development environment (PDE) is typified by four characteristics supporting programming at a very abstract level, closer to the scientist's perspective: application-oriented problem description formalisms, the use of design skeletons and templates, an interactive user guidance mechanism, and automatic program synthesis techniques. Ongoing work aims at the integration and further development of the results demonstrated so far with additional advanced technologies into comprehensive application engineering and problem-solving environments for productive parallel computing with distributed resources.