ISBN: (Print) 9781509021413
Tightly-coupled parallel applications in cloud systems may suffer significant performance degradation because of resource over-commitment. In this paper, we propose a dynamic approach based on adaptive control of the time slice for virtual clusters, in order to mitigate the performance degradation of parallel applications in the cloud while effectively avoiding negative impact on other, non-parallel applications. The key idea is to reduce the synchronization overhead inside and across virtual machines (VMs) in cloud systems by dynamically adjusting the time slices of VMs according to the spinlock latency observed at runtime. This design is motivated by our experimental finding that a VM's time slice is a key factor determining both the synchronization overhead and parallel execution performance. We perform the evaluation on a real cluster environment deployed with Xen, using five well-known benchmarks comprising more than ten applications. Experiments show that our approach obtains a 1.5-10× performance gain for parallel applications over other state-of-the-art solutions (including Xen's Credit scheduler and well-known methods such as Co-Scheduling and Balance Scheduling), with nearly no impact on the performance of non-parallel applications.
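The abstract does not give the control law used to adjust the time slice; the following is a minimal illustrative sketch, assuming a simple threshold controller and Xen's `xl sched-credit -s -t` interface for setting the credit scheduler's time slice. The spinlock-latency probe and all numeric thresholds are hypothetical placeholders, not the paper's method:

```python
import subprocess
import time

# Hypothetical threshold and bounds; the paper derives its values from
# runtime measurement, these are placeholders only.
SPIN_LATENCY_THRESHOLD_US = 50
MIN_TSLICE_MS, MAX_TSLICE_MS = 1, 30

def read_spinlock_latency_us():
    # Hypothetical probe: the paper measures guest spinlock waits at
    # runtime; a fixed placeholder keeps this sketch runnable.
    return 40

def set_credit_tslice(ms):
    # Assumes Xen's credit scheduler time slice is settable via
    # `xl sched-credit -s -t <ms>` (Xen 4.2+).
    subprocess.run(["xl", "sched-credit", "-s", "-t", str(ms)], check=True)

tslice = 30  # the credit scheduler's default time slice, in ms
while True:
    if read_spinlock_latency_us() > SPIN_LATENCY_THRESHOLD_US:
        tslice = max(MIN_TSLICE_MS, tslice // 2)  # shrink: less lock-holder preemption
    else:
        tslice = min(MAX_TSLICE_MS, tslice + 1)   # relax back toward the default
    set_credit_tslice(tslice)
    time.sleep(1)
```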
ISBN: (Print) 9781509032051
Real-time personalization is increasingly referred to as the next major technological milestone of the Internet age. Digital assistants such as Siri, Cortana, and Google Now are the first steps in this direction. The Internet of Things (IoT) is expected to drive hyper-personalization going forward, i.e., to enable personalization across the board, for example, but not limited to, home, health care, marketing, transportation, energy, and infrastructure. There exists a large body of prior research on real-time systems; for instance, real-time systems have been researched and developed in domains such as control systems, avionics, and operations research for over 50 years. With Big Data analytics becoming mainstream since the turn of the century and the growing need to leverage real-time data to grow business, several real-time systems are being researched and deployed in the consumer space. Although much emphasis is being placed on real-time processing, the notion of real-time is not well defined. This stems, in part, from the fact that the notion of real-time is a function of the target application. In this paper, we walk the reader through a brief history of prior work on real-time systems and give an in-depth view of the classification of real-time systems. Further, we give an overview of the various systems in use today for real-time processing.
ISBN: (Print) 9781509036837
Analyzing large dynamic networks is an important problem with applications in a wide range of disciplines. A key operation is updating the network properties as its topology changes. In this paper we present graph sparsification as an efficient abstraction for updating the properties of dynamic networks. We demonstrate the applicability of graph sparsification to updating the connected components in random and scale-free networks on shared-memory systems. Our results show that the updating is scalable (a 10× speedup on 16 processors for larger networks). To the best of our knowledge, this is the first parallel implementation of graph sparsification. Based on these initial results, we discuss how the current implementation can be further improved and how graph sparsification can be applied to updating other network properties.
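The sparsification scheme itself is not detailed in the abstract; as a minimal single-threaded stand-in for the update operation it targets, the sketch below maintains connected components under edge insertions using a plain union-find structure (swapped in here for the paper's parallel sparsification approach):

```python
class DisjointSet:
    """Union-find with path halving; supports incremental edge insertions."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv

ds = DisjointSet(6)
for u, v in [(0, 1), (1, 2), (4, 5)]:  # initial topology
    ds.union(u, v)
ds.union(2, 4)                          # dynamic update: a new edge arrives
print(ds.find(0) == ds.find(5))         # True: components merged incrementally
```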
ISBN: (Print) 9781509036837
The algorithms foundational to visualization are central to the production visualization tools running at computing centers around the world, and they consume tremendous amounts of finite, limited resources. We believe that understanding the performance characteristics of these algorithms is critical to being good stewards of computing centers' resources. In this paper, we report initial studies on one such foundational algorithm: parallel particle advection. We have performed an extensive parameter study of the de facto standard algorithm commonly used both in parallel production visualization tools and in in situ visualization environments. Our study shows that the default parameters used in this algorithm lead to generally poor results, and it identifies settings that optimize performance on the system in our parameter sweep.
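For readers unfamiliar with the kernel under study, the sketch below shows serial particle advection with a fourth-order Runge-Kutta integrator over a toy analytic vector field; the step size and step count stand in for the kind of parameters such a sweep would vary (the paper's actual parameter set is not listed in the abstract):

```python
import numpy as np

def velocity(p):
    # Toy steady 2-D vector field (a circular flow); production tools
    # interpolate velocities from simulation data instead.
    x, y = p
    return np.array([-y, x])

def advect(p, dt, steps):
    """Advance one particle with fourth-order Runge-Kutta, the integrator
    typically used for particle advection."""
    for _ in range(steps):
        k1 = velocity(p)
        k2 = velocity(p + 0.5 * dt * k1)
        k3 = velocity(p + 0.5 * dt * k2)
        k4 = velocity(p + dt * k3)
        p = p + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return p

print(advect(np.array([1.0, 0.0]), dt=0.01, steps=100))
```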
ISBN: (Print) 9781509036837
We present an evaluation of the performance of a Spark implementation of a classification algorithm in the domain of High Energy Physics (HEP). Spark is a general engine for in-memory, large-scale data processing, designed for applications where similar, repeated analysis is performed on the same large data sets. Classification problems are among the most common and critical data processing tasks across many domains. Many of these tasks are both computation- and data-intensive, involving complex numerical computations on extremely large data sets. We evaluated the performance of the Spark implementation on Cori, a NERSC resource, and compared the results to an untuned MPI implementation of the same algorithm. While the Spark implementation scaled well, it is not competitive in speed with our MPI implementation, even when using significantly greater computational resources.
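The abstract does not name the classifier or the HEP feature set; as a minimal sketch of the kind of Spark-based classification being evaluated, the following PySpark fragment fits a logistic regression model on a stand-in dataset:

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("hep-classify").getOrCreate()

# Stand-in training set; the paper's HEP features and classifier are
# not specified in the abstract.
df = spark.createDataFrame(
    [(1.0, Vectors.dense([2.3, 1.1])),
     (0.0, Vectors.dense([0.2, 0.4]))],
    ["label", "features"])

model = LogisticRegression(maxIter=10).fit(df)  # distributed, in-memory fit
model.transform(df).select("label", "prediction").show()
```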
ISBN: (Print) 9781509036837
The ability to rapidly identify a given protein from small subsamples (i.e., peptides) underlies fundamental applications in the medical field. At the core of protein identification is a string matching problem, which is computationally intensive considering that the complexity of the algorithm scales with the length of the string and the number of sweeps of the database that are needed. In this paper we present an improvement over the FPGA-based string matching solutions available in the literature: by increasing the amount of parallelism exploited, our solution achieves a 1.63× reduction in the energy needed for the task compared with the literature and a 5.75× reduction compared with a high-end workstation.
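The FPGA architecture itself is not described in the abstract; to fix ideas, the sketch below shows the underlying multi-pattern string matching problem in serial form, with hypothetical peptide and protein strings (the FPGA design evaluates many such comparisons in parallel during each database sweep):

```python
def match_peptides(database, peptides):
    """Naive multi-pattern scan: report which proteins contain which
    peptides. Serial version for illustration only; the FPGA solution
    performs many of these comparisons in parallel per sweep."""
    hits = {}
    for name, sequence in database.items():
        for peptide in peptides:
            if peptide in sequence:  # substring match
                hits.setdefault(peptide, []).append(name)
    return hits

# Hypothetical toy database and peptide queries.
db = {"P1": "MKTAYIAKQRQISFVKSHFSRQ", "P2": "GAVLKMKTAY"}
print(match_peptides(db, ["MKTAY", "SHFSR"]))
# {'MKTAY': ['P1', 'P2'], 'SHFSR': ['P1']}
```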
ISBN: (Print) 9781450343145
The field of data analytics is currently going through a renaissance as a result of ever-increasing dataset sizes, the value of the models that can be trained from those datasets, and a surge in flexible, distributed programming models. In particular, the Apache Hadoop [1] and Spark [5] programming systems, as well as their supporting projects (e.g., HDFS, SparkSQL), have greatly simplified the analysis and transformation of datasets whose size exceeds the capacity of a single machine. While these programming models facilitate the use of distributed systems to analyze large datasets, they have been plagued by performance issues. The I/O performance bottlenecks of Hadoop are partially responsible for the creation of Spark. Performance bottlenecks in Spark due to the JVM object model, garbage collection, interpreted/managed execution, and other abstraction layers are responsible for the creation of additional optimization layers, such as Project Tungsten [4]. Indeed, the Project Tungsten issue tracker states that the "majority of Spark workloads are not bottlenecked by I/O or network, but rather CPU and memory" [20]. In this work, we address the CPU and memory performance bottlenecks that exist in Apache Spark by accelerating user-written computational kernels using accelerators. We refer to our approach as Spark With Accelerated Tasks (SWAT). SWAT is an accelerated data analytics (ADA) framework that enables programmers to natively execute Spark applications on high-performance hardware platforms with co-processors, while continuing to write their applications in a JVM-based language like Java or Scala. Runtime code generation creates OpenCL kernels from JVM bytecode, which are then executed on OpenCL accelerators. In our work we emphasize 1) full compatibility with a modern, existing, and accepted data analytics platform, 2) an asynchronous, event-driven, and resource-aware runtime, 3) multi-GPU memory management and caching, and 4) ease of use and programmability.
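SWAT generates its OpenCL kernels automatically from JVM bytecode; as a hand-written analogue of the offload pattern involved, the following pyopencl sketch runs a map-style kernel on an OpenCL device (the kernel and data are illustrative, not SWAT's generated code):

```python
import numpy as np
import pyopencl as cl

# A map-style square() kernel, written by hand here; SWAT derives such
# kernels from JVM bytecode at runtime.
KERNEL = """
__kernel void square(__global const float *in, __global float *out) {
    int i = get_global_id(0);
    out[i] = in[i] * in[i];
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prog = cl.Program(ctx, KERNEL).build()

host_in = np.arange(8, dtype=np.float32)
mf = cl.mem_flags
buf_in = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_in)
buf_out = cl.Buffer(ctx, mf.WRITE_ONLY, host_in.nbytes)

prog.square(queue, host_in.shape, None, buf_in, buf_out)
host_out = np.empty_like(host_in)
cl.enqueue_copy(queue, host_out, buf_out)
print(host_out)  # squares computed on the OpenCL device
```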
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.