ISBN: 9781538666142 (print)
Sparse triangular solver (SpTRSV) is an indispensable building block for many scientific applications. The literature exploits the parallelism of SpTRSV with the level-set method; however, this method still suffers from high synchronization cost and irregular global memory access, especially on many-core architectures such as Sunway. In this paper, we propose an efficient implementation of SpTRSV that uses the massive computing resources of the Sunway architecture. Specifically, we divide the 64 CPEs in a core group into three roles: worker, router, and storer. We also build a logical shared memory by carefully manipulating the scratchpad memory located in each storer, and we synchronize using the register communication unique to the Sunway architecture. We partition the sparse matrix into multiple bands and replace the irregular global memory accesses with shared memory accesses, which significantly improves data locality during the calculation of a band. Our experiments with 12 representative datasets demonstrate that our approach achieves up to 5.14x (2.65x on average) speedup.
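For readers unfamiliar with the level-set method that the abstract above uses as its baseline, the following is a minimal sequential sketch in Python. It is not the paper's Sunway implementation: the CSR layout, the function names `build_levels` and `sptrsv_level_set`, and the small test matrix are assumptions made for illustration only. The sketch groups rows of a lower-triangular matrix into levels so that rows in the same level have no dependencies on each other, which is where a parallel implementation would need one synchronization per level.

```python
def build_levels(indptr, indices):
    """Group the rows of a lower-triangular CSR matrix into dependency levels."""
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # x[i] depends on x[j], so row i sits at least one level later
                level[i] = max(level[i], level[j] + 1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def sptrsv_level_set(indptr, indices, data, b):
    """Solve L x = b for lower-triangular L (CSR), sweeping one level at a time."""
    x = [0.0] * len(b)
    for rows in build_levels(indptr, indices):
        # Rows within one level are mutually independent; a parallel version
        # could process them concurrently and synchronize once per level.
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            x[i] = acc / diag
    return x


# Hypothetical 4x4 example:
# L = [[2, 0, 0, 0],
#      [0, 4, 0, 0],
#      [1, 0, 1, 0],
#      [0, 3, 2, 5]]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 4.0, 1.0, 1.0, 3.0, 2.0, 5.0]
b = [2.0, 8.0, 4.0, 32.0]

print(build_levels(indptr, indices))            # [[0, 1], [2], [3]]
print(sptrsv_level_set(indptr, indices, data, b))  # [1.0, 2.0, 3.0, 4.0]
```

In this example rows 0 and 1 form the first level, while rows 2 and 3 each occupy a level of their own, so three synchronization points are needed for four rows. Long chains of this kind are one source of the synchronization cost the abstract mentions.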
ISBN: 9781509012336 (print)
The development of new technologies is ushering in a new era characterized, among other factors, by the rise of sophisticated mobile devices containing CPUs and GPUs. This emerging scenario of heterogeneous mobile architectures raises challenging issues regarding the use of the available computing resources. Such issues are mainly related to the intrinsic complexity of coordinating these processors in order to increase application performance. To address this, the paper presents a high-level programming model for implementing parallel patterns that can be executed in a coordinated way by heterogeneous mobile architectures. A comparative analysis of performance and programming complexity is presented, contrasting code generated automatically from the proposed programming model with low-level, manually optimized implementations.
ISBN: 9781538677698 (print)
The increasing performance needs of critical real-time embedded systems (CRTES), such as those in the automotive domain, push for the adoption of high-performance hardware from the consumer electronics domain. However, the time-predictability features of such hardware remain largely unexplored. The ARM *** architecture is a good candidate for adoption in the CRTES market (it is already in use in the automotive market). In this paper we study ARM ***'s capabilities to meet CRTES requirements. In particular, we perform a qualitative and quantitative assessment of its timing characteristics, focusing on shared multicore resources, and of how this architecture can be reliably used in CRTES.
GPU programmers suffer from programmer-managed GPU memory because both performance and programmability heavily depend on GPU memory allocation and CPU-GPU data transfer mechanisms. To improve performance and programma...
ISBN: 0769522750 (print)
The last decade has seen several changes in the structure and emphasis of enterprise IT systems. Specific infrastructure trends have included the emergence of large consolidated data centers, the adoption of virtualization and modularization, and an increased commoditization of hardware. At the application level, both the workload mix and usage patterns have evolved toward an increased emphasis on service-centric computing and SLA-driven performance tuning. These often dramatic changes in the enterprise IT landscape motivate equivalent changes in the emphasis of architecture research. In this paper, we summarize some recent trends in enterprise IT systems and discuss the implications for architecture research, suggesting some high-level challenges and open questions for the community to address.
Computer architects need a deep understanding of clients' workloads in order to design and tune the architecture. Unfortunately, many important clients will not share their software with computer architects due to th...
While more cores fit within a unit of chip area with every technology generation, excessive growth in power density prevents simultaneous utilization of all of them. Due to the lower operating voltage, Near-threshold Voltage...
Future high-performance computing will undoubtedly reach Petascale and beyond. Today's HPC is tomorrow's personal computing. What are the evolving processor architectures towards Multi-core and Many-core for t...
ISBN: 9781450305525 (print)
We present HyFlow - a distributed software transactional memory (D-STM) framework for distributed concurrency control. HyFlow is a Java framework for D-STM, with pluggable support for directory lookup protocols, transactional synchronization and recovery mechanisms, contention management policies, cache coherence protocols, and network communication protocols. HyFlow exports a simple distributed programming model that excludes locks: using (Java 5) annotations, atomic sections are defined as transactions, in which reads and writes to shared, local and remote objects appear to take effect instantaneously. No changes are needed to the underlying virtual machine or compiler. We describe HyFlow's architecture and implementation, and report on experimental studies comparing HyFlow against competing models including Java remote method invocation (RMI) with mutual exclusion and read/write locks, distributed shared memory (DSM), and directory-based D-STM. Our studies show that HyFlow outperforms competitors by as much as 40-190% on a broad range of transactional workloads on a 72-node system, with more than 500 concurrent transactions.
ISBN: 9781450305525 (print)
Cloud computing systems are becoming an important platform for science applications. Infrastructure as a Service (IaaS) clouds provide the capability to provision virtual machines (VMs) on demand with a specific configuration of hardware resources, but they do not provide functionality for managing those resources once provisioned. For such clouds to be used effectively for parallel and distributed scientific applications, tools need to be developed that help users deploy their applications in the cloud. This paper describes a system we have developed to provision, configure, and manage clusters of virtual machines.