ISBN: (print) 9781467308243; 9781467308267
Modern applications, from database systems to mobile phones, widely use text files to store data in a readable format. Traditional text processing tools are built around a byte-at-a-time sequential processing model that introduces significant branch and cache miss penalties. Recent work has explored an alternative, transposed representation of text, Parabix (Parallel Bit Streams), to accelerate scanning and parsing using SIMD facilities. This paper advocates and develops Parabix as a general framework and toolkit, describing the software toolchain and run-time support that allow applications to exploit modern SIMD instructions for high-performance text processing. The goal is to generalize the techniques so that they apply across a wide variety of applications and architectures. The toolchain enables the application developer to write constructs assuming unbounded character streams, and Parabix's code translator generates code based on machine specifics (e.g., SIMD register widths). The general argument in support of Parabix technology is made by a detailed performance and energy study of XML parsing across a range of processor architectures. Parabix exploits intra-core SIMD hardware and demonstrates 2x-7x speedup and 4x improvement in energy efficiency when compared with two widely used conventional software parsers, Expat and Apache-Xerces. SIMD implementations across three generations of x86 processors are studied, including the new SandyBridge. The 256-bit AVX technology in Intel SandyBridge is compared with the well-established 128-bit SSE technology to analyze the benefits and challenges of 3-operand instruction formats and wider SIMD hardware. Finally, the XML program is partitioned into pipeline stages to demonstrate that thread-level parallelism enables the application to exploit SIMD units scattered across the different cores, achieving improved performance (2x on 4 cores) while maintaining single-threaded energ
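The bit-stream idea in the abstract above can be illustrated with a minimal sketch. It is not the Parabix toolchain or its generated code: Python integers stand in for the "unbounded character streams" the abstract mentions, the stream is built with a plain loop rather than SIMD transposition, and the helper names `class_stream`, `scan_thru`, and `positions` are hypothetical. Bit i of each stream corresponds to byte i of the input, so one addition advances every marker past a run of class characters at once.

```python
def class_stream(data: bytes, members: bytes) -> int:
    """Return a bit stream whose bit i is 1 when data[i] is in `members`."""
    stream = 0
    for i, byte in enumerate(data):
        if byte in members:
            stream |= 1 << i
    return stream


def scan_thru(markers: int, char_class: int) -> int:
    """Move each marker forward past the run of class characters it sits on.

    The carry produced by the addition ripples through each run of 1 bits,
    which replaces a per-byte loop with a single arithmetic operation.
    """
    return (markers + char_class) & ~char_class


def positions(stream: int) -> list:
    """List the bit positions that are set in a stream."""
    return [i for i in range(stream.bit_length()) if (stream >> i) & 1]


data = b"<ab><cde>"
name_chars = class_stream(data, b"abcdefghijklmnopqrstuvwxyz")
# Place a marker on the byte that follows each '<'.
starts = class_stream(data, b"<") << 1
ends = scan_thru(starts, name_chars)
print(positions(ends))  # [3, 8], the positions of the two '>' bytes
```

In this sketch both tag names are scanned by the same addition, which is the property that lets a SIMD implementation process many input positions per instruction.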
The proceedings contain 36 papers. The topics discussed include: the network adapter: the missing link between MPI applications and network performance; on the efficiency of register file versus broadcast interconnect for collective communications in data-parallel hardware accelerators; network endpoints for clusters of SMPs; assessing energy efficiency of fault tolerance protocols for HPC systems; using heterogeneous networks to improve energy efficiency in direct coherence protocols for many-core CMPs; energy savings via dead sub-block prediction; scalable thread scheduling in asymmetric multicores for power efficiency; divergence analysis with affine constraints; exploiting concurrent GPU operations for efficient work stealing on multi-GPUs; sparse fast Fourier transform on GPUs and multi-core CPUs; cloud workload analysis with SWAT; and scalable algorithms for distributed-memory adaptive mesh refinement.
Recommender systems are mechanisms that filter information and predict a user's preference for an item. Parallel implementations of recommender systems improve scalability and can be applied to internet-base...
The goal of our investigation was to evaluate the mechanical properties of skin wounds during the first seven days of primary healing. We made two parallel symmetrical skin incisions (on the left and right side of ...
As the foundation of cloud computing, server consolidation allows multiple computer infrastructures to run as virtual machines on a single physical node. It improves the utilization of most kinds of resources but memo...