Flash SSDs are increasingly being incorporated into many enterprise storage platforms. However, the characteristics of flash SSDs are quite different from those of hard disks, so the IO strategies in the existing systems should be ...
A VLSI parallel and distributed computation algorithm has been proposed and mapped onto a VLSI architecture for a 1-D discrete cosine transform (DCT) involving the symmetry property. In this 1-D DCT processor architecture, log2(2N) DCT processor units (PUs) are required for computation of a frame of N-point data with a time complexity of O(N). Further, a proposed 2-D DCT processor architecture requires M log2(2N) + N log2(2M) PUs with a time complexity of O(M + N). An optimal architecture for computation of a multidimensional DCT has been proposed. The 3-D DCT processor architecture requires NL log2(2M) + LM log2(2N) + MN log2(2L) PUs with a time complexity of O(M + N + L). All architectures can be controlled by firmware; hence they are more flexible, efficient, and fault-tolerant, and therefore very suitable for VLSI implementation.
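As a rough illustration (not taken from the paper itself), the PU counts stated in the abstract can be evaluated numerically; the function names and example sizes below are illustrative:

```python
from math import log2

def pus_1d(n):
    # 1-D DCT: log2(2N) processor units for an N-point frame
    return int(log2(2 * n))

def pus_2d(m, n):
    # 2-D DCT: M*log2(2N) + N*log2(2M) processor units
    return int(m * log2(2 * n) + n * log2(2 * m))

def pus_3d(l, m, n):
    # 3-D DCT: NL*log2(2M) + LM*log2(2N) + MN*log2(2L) processor units
    return int(n * l * log2(2 * m) + l * m * log2(2 * n) + m * n * log2(2 * l))

print(pus_1d(8))      # log2(16) = 4 PUs for an 8-point 1-D DCT
print(pus_2d(8, 8))   # 8*4 + 8*4 = 64 PUs for an 8x8 2-D DCT
```

For power-of-two frame sizes these counts grow only logarithmically per dimension, which is what makes the O(M + N + L) time complexity attainable with a modest number of PUs.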
ISBN:
(Print) 9781479981281
Since the dawn of the big data era, the search giant Google has been in the lead in meeting the challenges of the new era. Results from Google's big data projects in the past decade have inspired the development of many other big data technologies such as Apache Hadoop and NoSQL databases. This study examines ten major milestone papers on big data management published by Google, from the Google File System (GFS), MapReduce, Bigtable, Chubby, Percolator, Pregel, and Dremel, to Megastore, Spanner, and finally Omega. The purpose of the study is to help provide a high-level understanding of the concepts behind many popular big data solutions and to derive insights on building robust and scalable systems for handling big data.
ISBN:
(Print) 9783540772194
HDTV-based applications require full-search block matching (FSBM) to maintain their significantly higher resolution than traditional broadcasting formats (NTSC, SECAM, PAL). This paper proposes some techniques to increase the speed and reduce the area requirements of an FSBM hardware. These techniques are based on modifications of the Sum-of-Absolute-Differences (SAD) computation and the macroblock (MB) searching strategy. The design of an FSBM architecture based on the proposed approaches has also been outlined. The highlight of the proposed architecture is its split pipelined design to facilitate parallel processing of macroblocks (MBs) in the initial stages. The proposed hardware has high throughput and low silicon area, and compares favorably with other existing FPGA architectures.
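For readers unfamiliar with FSBM, the reference computation that such hardware accelerates can be sketched in a few lines. This is a generic software model of SAD-based full search, not the paper's modified design; the function names are illustrative:

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized macroblocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )

def full_search(ref_frame, block, top, left, radius):
    """Exhaustively test every candidate position within +/- radius pixels
    of (top, left) and return the minimum-SAD match (full search)."""
    best = None
    bh, bw = len(block), len(block[0])
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(ref_frame) - bh and 0 <= x <= len(ref_frame[0]) - bw:
                cand = [row[x:x + bw] for row in ref_frame[y:y + bh]]
                cost = sad(block, cand)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best  # (minimum SAD, motion vector dy, dx)
```

The inner SAD loop is exactly the arithmetic that hardware modifications target: at HDTV resolutions the number of candidate positions per macroblock makes this exhaustive search the dominant cost of motion estimation.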
Almost all applications containing indirect array addressing (irregular accesses) have a substantial number of direct array accesses (regular accesses) too. A conspicuous percentage of these direct array accesses usually require interprocessor communication for the applications to run on a distributed memory multicomputer. This study highlights how the lack of a uniform representation, and of a uniform scheme to generate communication structures and parallel code for regular and irregular accesses in a mixed regular-irregular application, prevents sophisticated optimizations. Furthermore, we also show that code generated for regular accesses using compile-time schemes is not always compatible with code generated for irregular accesses using run-time schemes. In our opinion, existing schemes handling mixed regular-irregular applications either incur unnecessary preprocessing costs or fail to perform the best communication optimization. This study presents a uniform scheme to handle both regular and irregular accesses in a mixed regular-irregular application. While this allows sophisticated communication optimizations, such as message coalescing and message aggregation, to be made across regular and irregular accesses, the preprocessing costs incurred are likely to be minimal. Experimental comparisons for various benchmarks on a 16-processor IBM SP-2 show that our scheme is feasible and better than existing schemes.
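The idea behind coalescing messages across regular and irregular accesses can be sketched as follows. This is a simplified model under an assumed block distribution, not the paper's actual scheme: remote indices from both access kinds are merged into a single duplicate-free request per owning processor, so one message replaces several:

```python
def owner(i, block, nprocs):
    # Assumed block distribution: element i lives on processor i // block
    return min(i // block, nprocs - 1)

def coalesced_requests(my_rank, regular_idx, irregular_idx, block, nprocs):
    """Merge remote indices from regular (compile-time known) and irregular
    (run-time resolved) accesses into one request list per owning processor,
    eliminating duplicates -- a simple form of message coalescing."""
    requests = {}
    for i in set(regular_idx) | set(irregular_idx):
        p = owner(i, block, nprocs)
        if p != my_rank:                      # local elements need no message
            requests.setdefault(p, set()).add(i)
    return {p: sorted(s) for p, s in requests.items()}
```

Because both access kinds feed the same request table, an element needed by a direct access and by an indirect access is fetched only once, which is exactly the optimization a non-uniform representation would miss.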
Deep learning developed over the last decade and has been established as a recent, modern, and very promising technique with large potential to be successfully applied in various domains. Although deep learning outperforms ...
ISBN:
(Print) 9783642233968
The proceedings contain 82 papers. The topics discussed include: run-time automatic performance tuning for multicore applications; exploiting cache traffic monitoring for run-time race detection; accelerating data race detection with minimal hardware support; event log mining tool for large scale HPC systems; reducing the overhead of direct application instrumentation using prior static analysis; reducing energy usage with memory and computation-aware dynamic frequency scaling; using the last-mile model as a distributed scheme for available bandwidth prediction; self-stabilization versus robust self-stabilization for clustering in ad-hoc network; multilayer cache partitioning for multiprogram workloads; parallel inexact constraint preconditioners for saddle point problems; hardware and software tradeoffs for task synchronization on manycore architectures; and progress guarantees when composing lock-free objects.
ISBN:
(Digital) 9783319614281
ISBN:
(Print) 9783319614281; 9783319614274
The volume of data, one of the five "V" characteristics of Big Data, grows at a rate that is much higher than the increase in the ability of existing systems to manage it within an acceptable time. Several technologies have been developed to approach this scalability issue. For instance, MapReduce has been introduced to cope with the problem of processing a huge amount of data, by splitting the computation into a set of concurrently executed tasks. Saving even a marginal amount of time in the processing of all the tasks of a set can bring valuable benefits to the execution of the whole application and to the management costs of the entire data center. To this end, we propose a technique to minimize the global processing time of a set of tasks, having different service requirements, concurrently executed on two or more heterogeneous systems. The validity of the proposed technique is demonstrated using a multiformalism model that consists of a combination of Queueing Networks and Petri Nets. Application of this technique to an Apache Hive case study shows that the described allocation policy can lead to performance gains in both total execution time and energy consumption.
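To make the allocation problem concrete, here is a minimal greedy sketch, not the paper's multiformalism-derived policy: each task (a service demand) is placed on whichever heterogeneous system (a speed factor) would finish it earliest, approximating minimum global processing time:

```python
def allocate(tasks, speeds):
    """Greedy sketch: place the largest remaining task on the system that
    would complete it earliest. tasks = service demands; speeds = relative
    processing rates of the heterogeneous systems (both illustrative)."""
    finish = [0.0] * len(speeds)          # current finish time per system
    placement = []
    for demand in sorted(tasks, reverse=True):
        best = min(range(len(speeds)),
                   key=lambda s: finish[s] + demand / speeds[s])
        finish[best] += demand / speeds[best]
        placement.append((demand, best))
    return placement, max(finish)         # (assignments, global makespan)
```

Even this simple heuristic shows why heterogeneity matters: a task placed on a system twice as fast halves its own contribution to the makespan, so the global processing time depends on the pairing of demands with rates, not just on total load.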
Isolating computation and communication concerns into separate pure computation and pure coordination modules enhances modularity, understandability, and reusability of parallel and/or distributed software. MANIFOLD i...
ISBN:
(Print) 9783030105495; 9783030105488
The dynamic provisioning of virtual machines (VMs) supported by many cloud computing infrastructures eases the scalability of software applications. Unfortunately, VMs are relatively slow to boot, and public cloud providers do not allow users to vary their resources (vertical scalability) dynamically. To tackle both problems, a few years ago we presented a solution that combines the management of VMs with the use of containers specifically targeted at the efficient runtime management of the resources provisioned to Web applications. This paper borrows from this solution and addresses the problem of provisioning resources to big data Spark applications at runtime. Spark does not allow for runtime scalability of the resources associated with its executors; resources must be provisioned statically. To tackle this problem, the paper describes a container-based version of Spark that supports the dynamic resizing of the memory and CPU cores associated with the different executors. The evaluation demonstrates the feasibility of the approach and identifies the trade-offs involved.
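A vertical-scaling decision of the kind described might look like the following sketch. All names and thresholds here are illustrative assumptions, not Spark's API or the paper's policy: containers hosting executors grow their limits when running close to them and shrink when underused:

```python
def resize_plan(executors, mem_pressure, cpu_util, mem_step=512, max_mem=8192):
    """Illustrative vertical-scaling policy for container-hosted executors.
    executors: {name: (memory_mb, cores)}; mem_pressure / cpu_util: {name:
    utilization in [0, 1]}. Returns only the executors whose limits change."""
    plan = {}
    for name, (mem_mb, cores) in executors.items():
        new_mem, new_cores = mem_mb, cores
        if mem_pressure[name] > 0.9 and mem_mb + mem_step <= max_mem:
            new_mem = mem_mb + mem_step        # grow before the executor OOMs
        elif mem_pressure[name] < 0.3 and mem_mb - mem_step >= mem_step:
            new_mem = mem_mb - mem_step        # reclaim underused memory
        if cpu_util[name] > 0.9:
            new_cores = cores + 1              # relieve CPU saturation
        elif cpu_util[name] < 0.2 and cores > 1:
            new_cores = cores - 1              # release an idle core
        if (new_mem, new_cores) != (mem_mb, cores):
            plan[name] = (new_mem, new_cores)
    return plan
```

A plan like this could then be applied to running containers, for example with `docker update --memory --cpus`, which is precisely the per-container resizing that plain VMs on public clouds do not offer.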