ISBN: 0769510108; 0769510116 (print)
Clusters of workstations are a popular alternative to integrated parallel systems designed and built by a vendor. Besides their huge cumulative processing power, they also provide a large data storage capacity, which allows efficient implementations of large-scale, I/O-intensive applications. This paper proposes a language, compiler, and runtime software solution to the problem of parallel I/O on clusters. The proposal is presented in the context of High Performance Fortran and its compilation and runtime environment that is being developed at the University of Vienna. The system provides efficient support for explicit I/O operations on parallel files, accesses to sections of multi-dimensional arrays stored in parallel files, checkpoint/restart operations, and time step output and input operations. The paper also presents experimental performance results using the implementation of the developed software on a Beowulf-class cluster system.
Performing distributed software transactional memory (DSTM) applications in a public cloud is investigated in this paper. Transactions are introduced in DSTM for simplifying parallel programming in distributed environ...
ISBN: 0780376676 (print)
Analysis of tissue using image processing techniques is essential for dealing with a number of problems in cancer research. The identification of normal and cancerous colonic mucosa is one such problem. In this paper, texture analysis techniques are used to measure certain characteristics of normal and cancerous tissue images. A genetic algorithm analyses those results in order to determine which operations are useful for the given problem and which combination of operations maximises the classification accuracy. The system developed for these tasks has been implemented on a cluster of Linux workstations using distributed computing techniques. A distributed-programming message-passing library, PVM (Parallel Virtual Machine), provides the basis for building this system.
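As a rough illustration of the selection step described above, the following is a minimal sketch of a genetic algorithm that searches for a useful combination of operations, encoded as a bit mask. It is not the authors' system: the number of operations, the `USEFUL` set, and the toy `fitness` function (a stand-in for measured classification accuracy) are all assumptions made for the example, and it runs on a single machine rather than on PVM.

```python
import random

N_FEATURES = 8          # hypothetical number of texture operations
USEFUL = {1, 4, 6}      # hypothetical: operations that actually help


def fitness(mask):
    """Toy stand-in for classification accuracy of an operation subset."""
    score = 0.0
    for i, bit in enumerate(mask):
        if bit:
            score += 1.0 if i in USEFUL else -0.5
    return score


def evolve(rng, pop_size=20, generations=30, mutation_rate=0.1):
    """Elitist GA: keep the better half, refill by crossover and mutation."""
    pop = [tuple(rng.randint(0, 1) for _ in range(N_FEATURES))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, N_FEATURES - 1)      # one-point crossover
            child = a[:cut] + b[cut:]
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)           # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


best = evolve(random.Random(0))
print(best, fitness(best))
```

In a real setting the fitness evaluation (training and testing a classifier per candidate) dominates the cost, which is the part a message-passing cluster would naturally distribute.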
ISBN: 9781728116440 (print)
Cloud applications are increasingly playing a crucial role in big data analytics. New use cases such as autonomous cars and edge computing call for novel approaches mixing heterogeneous computing and machine learning. These applications typically process petabyte-scale datasets and therefore require low-power, scalable storage that provides low-latency, high-throughput data access. While data centers have been focusing on migrating from legacy HDDs and SATA SSDs by deploying high-throughput and low-latency NVMe SSDs, data bottlenecks appear as capacity scales. One approach to tackle this problem is to enable processing to happen within the storage device, known as in-storage processing (ISP), eliminating the need to move the data. In this paper, we investigated the deployment of storage units with embedded low-power application processors along with FPGA-based reconfigurable hardware accelerators to address both performance and energy efficiency. To this end, we developed a high-capacity solid-state drive (SSD) named Catalina, equipped with a quad-core ARM A53 processor running a Linux operating system along with a highly efficient FPGA accelerator for running applications in-place. We evaluated our proposed approach on a case study application for a similarity search library called Faiss.
ISBN: 081864222X (print)
In distributed application domains where data change rapidly, it is often desirable for programs to obtain the latest available data values to achieve accurate computations. Example applications are financial services and network management. Such data are logically shared by a network of programs. Unlike data in traditional databases, rapidly changing data usually cannot be locked by (client) programs, and it is crucial to the computations to access their values in a timely manner. In these application domains, a typical program usually performs computations based on recently available data values obtained from the network. However, these data values may be inconsistent or obsolete, since the real data are external to the system and may change more rapidly than can be reflected by their copies within the system. Decision making based on such inaccurate computations can lead to substantial penalties. In this paper, we propose an approach to delaying data value retrieval until needed in distributed programming, given that data and configurations change rapidly. This approach offers the advantage of obtaining more recent data values, resulting in more accurate computations and decision making.
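The idea of delaying retrieval until a value is needed can be illustrated with a small sketch. This is only an assumption-laden toy, not the approach proposed in the paper: the `Deferred` wrapper, the `make_feed` helper, and the price figures are hypothetical, and the "network" is a local variable.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class Deferred(Generic[T]):
    """Wraps a fetch function and calls it only when the value is needed."""

    def __init__(self, fetch: Callable[[], T]) -> None:
        self._fetch = fetch

    def get(self) -> T:
        return self._fetch()


def make_feed():
    """Hypothetical external data source whose value can change at any time."""
    state = {"price": 100.0}

    def update(price: float) -> None:
        state["price"] = price

    def read() -> float:
        return state["price"]

    return update, read


update, read = make_feed()

eager = read()            # value retrieved immediately
lazy = Deferred(read)     # retrieval postponed until use

update(105.0)             # the source changes before the computation runs

eager_total = 10 * eager       # computed from the stale copy
lazy_total = 10 * lazy.get()   # computed from the value at time of use
print(eager_total, lazy_total)
```

The eager copy reflects the price at the time it was read, while the deferred read picks up the later update, which is the benefit the abstract attributes to late retrieval.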
ISBN: 9781457700361 (print)
We present a method for reconstruction of the visual hull (VH) of an object in real time from multiple video streams. A state-of-the-art polyhedral reconstruction algorithm is accelerated by implementing it for parallel execution on a multi-core graphics processor (GPU). The time taken to reconstruct the VH is measured for both the accelerated and non-accelerated implementations of the algorithm, over a range of image resolutions and numbers of cameras. The results presented are of relevance to researchers in the field of 3D reconstruction at interactive frame rates (real time), for applications such as telepresence.
ISBN: 0818634421 (print)
The proceedings contain 128 papers. The topics discussed include: C parallelizing compiler on local-network-based computer environment; OCCAM prototyping of massively parallel applications from colored Petri-nets; performance characteristics of the iPSC/860 and CM-2 I/O systems; automatic parallelization of LINPACK routines on distributed memory parallel processors; transformation of doacross loops on distributed memory systems; an efficient atomic multicast protocol for client-server models; a new horizon for sorting on mesh architectures; mapping of uniform dependence algorithm onto fixed size processor arrays; and towards understanding block partitioning for sparse Cholesky factorization.
ISBN: 1424407281 (print)
A network of (wireless smart) cameras can analyse the scene from different views. Wireless smart cameras challenge the hardware for low power consumption and high imaging performance. In this paper we introduce a wireless smart camera based on an SIMD video-analysis processor and an 8051 microcontroller as a local host. Wireless communication is through the IEEE 802.15.4 standard. The camera constructed in this paper is intended to enable application research into distributed smart camera systems.
We evaluate the average-case performance of three approximation algorithms for online non-clairvoyant scheduling of parallel tasks with precedence constraints. We show that for a class of wide task graphs, when task s...
ISBN: 0780312813 (print)
Wire routing has always been a very compute-bound phase in the physical design of Very Large Scale Integration (VLSI) circuits. Some of the software solutions to this problem entail divide-and-conquer methods, such as hierarchical routing, in order to reduce its time complexity. Recently, hardware accelerators have been employed to achieve a further increase in the speed of this process. In this paper, implementation aspects of a reduced array architecture (RAA) for hardware acceleration of the cut-and-paste hierarchical routing algorithm are detailed. Several macros have been defined to implement the algorithm in hardware. The architecture has been implemented in double-metal 2 µm CMOS technology.