ISBN: 9781931971799 (print)
In classical machine virtualization, a hypervisor runs multiple operating systems simultaneously, each on its own virtual machine. In nested virtualization, a hypervisor can run multiple other hypervisors with their associated virtual machines. As operating systems gain hypervisor functionality (Microsoft Windows 7 already runs Windows XP in a virtual machine), nested virtualization will become necessary in hypervisors that wish to host them. We present the design, implementation, analysis, and evaluation of high-performance nested virtualization on Intel x86-based systems. The Turtles project, which is part of the Linux/KVM hypervisor, runs multiple unmodified hypervisors (e.g., KVM and VMware) and operating systems (e.g., Linux and Windows). Despite the lack of architectural support for nested virtualization in the x86 architecture, it can achieve performance that is within 6-8% of single-level (non-nested) virtualization for common workloads, through multi-dimensional paging for MMU virtualization and multi-level device assignment for I/O virtualization.
ISBN: 9781931971850 (print)
We present algorithms for shrinking and expanding a hash table while allowing concurrent, wait-free, linearly scalable lookups. These resize algorithms allow Read-Copy Update (RCU) hash tables to maintain constant-time performance as the number of entries grows, and reclaim memory as the number of entries decreases, without delaying or disrupting readers. We call the resulting data structure a relativistic hash table. Benchmarks of relativistic hash tables in the Linux kernel show that lookup scalability during resize improves 125x over reader-writer locking, and 56% over Linux's current state of the art. Relativistic hash lookups experience no performance degradation during a resize. Applying this algorithm to memcached removes a scalability limit for get requests, allowing memcached to scale linearly and service up to 46% more requests per second. Relativistic hash tables demonstrate the promise of a new concurrent programming methodology known as relativistic programming. Relativistic programming makes novel use of existing RCU synchronization primitives, namely the wait-for-readers operation that waits for unfinished readers to complete. This operation, conventionally used to handle reclamation, here allows ordering of updates without read-side synchronization or memory barriers.
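The reader-side property described above (lookups that never take a lock, even during a resize) can be illustrated with a toy sketch. This is not the algorithm from the abstract, which resizes in place: the sketch assumes a simpler copy-and-publish scheme in which a writer builds a new immutable bucket array and publishes it with a single reference assignment. The class name `ResizableTable` and its methods are hypothetical. In CPython, the garbage collector frees the old array once no reader still holds it, which loosely stands in for the wait-for-readers step.

```python
import threading
from typing import Any, Optional


class ResizableTable:
    """Toy hash table with lock-free lookups (hypothetical illustration).

    Assumption: writers are serialized by a lock and replace the whole
    bucket array by copy-and-publish. This is a simplification, not the
    in-place resize that the abstract describes.
    """

    def __init__(self, nbuckets: int = 4) -> None:
        # Buckets are immutable tuples of (key, value) pairs.
        self._buckets = tuple(() for _ in range(nbuckets))
        self._write_lock = threading.Lock()

    def lookup(self, key: Any) -> Optional[Any]:
        # Take one snapshot of the array; it is immutable, so the reader
        # sees a consistent table even if a resize happens concurrently.
        buckets = self._buckets
        for k, v in buckets[hash(key) % len(buckets)]:
            if k == key:
                return v
        return None

    def insert(self, key: Any, value: Any) -> None:
        with self._write_lock:
            buckets = list(self._buckets)
            i = hash(key) % len(buckets)
            chain = tuple((k, v) for k, v in buckets[i] if k != key)
            buckets[i] = chain + ((key, value),)
            self._buckets = tuple(buckets)  # publish with one assignment

    def resize(self, nbuckets: int) -> None:
        with self._write_lock:
            new = [[] for _ in range(nbuckets)]
            for chain in self._buckets:
                for k, v in chain:
                    new[hash(k) % nbuckets].append((k, v))
            self._buckets = tuple(tuple(c) for c in new)  # publish


table = ResizableTable()
for n in range(100):
    table.insert(n, n * n)
table.resize(64)   # expand
table.resize(2)    # shrink
```

Copying the array on every update costs time proportional to the table size, which is exactly the overhead an in-place resize avoids; the sketch trades that efficiency for brevity.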
Read-Copy Update (RCU) is a scalable, high-performance Linux-kernel synchronization mechanism that runs low-overhead readers concurrently with updaters. Production-quality RCU implementations are decidedly non-trivial...
Developers often take a proactive approach to software design, especially those from cultures valuing industriousness over procrastination. Lazy approaches, however, have proven their value, with examples including reference counting, garbage collection, and lazy evaluation. This structured deferral takes the form of synchronization via procrastination, specifically reference counting, hazard pointers, and RCU (read-copy-update).
Embedded systems are increasingly present in many electronic devices and are often related to critical applications. Therefore, the need for a well-planned and well-executed testing procedure is even higher. We intend to co...
Effectiveness and quality are fundamental characteristics for the development of a product. To support them, one needs to ensure that an application's optimization level is at its best. The most widely used metric for evaluating an application's performance is CPI (Cycles Per Instruction), i.e., the average number of clock cycles spent per executed instruction. We have developed a CPI Breakdown Model Plug-in that automates the profiling of an application on the Power architecture, breaking it down into several groups of CPI events and metrics in order to identify possible bottlenecks. By analyzing these events and metrics, the user can learn which operations are causing the processor to stall, and consequently improve the application source code. We discuss the adaptation of some command-line tools for use by our CPI plug-in, which is integrated into the IBM Software Development Toolkit for PowerLinux, an Eclipse-based IDE comprising a set of mainstream C/C++ development tools along with several in-house IBM ones. We present a case study that shows the usefulness of our approach and discuss details on how to optimize an application.
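As a rough illustration of the CPI metric described above, the following sketch computes CPI as total cycles divided by completed instructions, and splits it into per-group contributions. The group names and counter values are hypothetical and are not the plug-in's actual event groups or Power performance-counter names.

```python
from typing import Dict


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction: clock cycles divided by completed instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def cpi_breakdown(cycles_by_group: Dict[str, int], instructions: int) -> Dict[str, float]:
    """Attribute CPI to each group; the contributions sum to the total CPI."""
    return {group: cpi(c, instructions) for group, c in cycles_by_group.items()}


# Hypothetical counter readings for one profiled run.
sample_cycles = {
    "completion": 400_000,     # cycles in which instructions completed
    "stall_memory": 450_000,   # cycles stalled waiting on memory
    "stall_branch": 150_000,   # cycles stalled on branch mispredictions
}
sample_instructions = 500_000

parts = cpi_breakdown(sample_cycles, sample_instructions)
total_cpi = sum(parts.values())
worst_group = max(parts, key=parts.get)  # largest contributor to CPI
```

In this made-up run the memory-stall group contributes the most cycles per instruction, so it would be the first place to look for a bottleneck.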
With the growing importance of the cloud computing paradigm, it is a challenge for cloud providers to keep the operational costs of the data centers in check, especially in the emerging markets, alongside catering to ...
Over the past 25 years the Brazilian Symposium on Software Engineering (SBES) has evolved to become the most important event on software engineering in Brazil. Throughout these years, SBES has gathered a large body of...
My parallel-programming education began in earnest when I joined Sequent Computer Systems in late 1990. This education was both brief and effective: within a few short years, my co-workers and I were breaking new grou...
Maze solving has been used as an example parallel-programming problem for some years. Suggested solutions are often based on a sequential program, using work queues to allow multiple threads to explore different portio...