ISBN: 0769515126 (print)
In this paper, we present the design and implementation of a new cluster file system, th-CluFS, which is based on the standard NFS protocol and is implemented entirely in user space. Such an open-platform file system becomes important as clusters grow larger and more heterogeneous. To take advantage of the accumulated resources and high-speed network in clusters, th-CluFS adopts a serverless architecture, hybrid distributed metadata management, and file-granular data distribution, and it uses a distributed metadata cache and a unique cache to optimize performance. To make th-CluFS more flexible, we plan to employ file migration to balance I/O load across nodes dynamically. Based on the experimental results, we conclude that th-CluFS meets the requirements of a consistent file system view, performance, and scalability.
ISBN: 0769515126 (print)
The external selection problem is to select the record with the K-th smallest key from N given records that are distributed evenly across the D disks of a parallel machine with D processors. Each processor has its own primary memory of size M records and one disk, where N/D > M. The processors are connected in a √D × √D mesh architecture. Based on a two-stage approach, this paper presents an efficient parallel external selection algorithm for distributed-memory parallel systems. First, all processors execute local external sorting in parallel: each processor sorts the N/D records on its own disk. Next, they execute parallel external selection over the D sorted sub-files on the D disks. The algorithm is asymptotically optimal and has a small constant factor in its time complexity.
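The two-stage structure can be illustrated with a small in-memory sketch. It assumes every partition fits in memory and replaces the mesh communication with a sequential lazy merge, so it shows only the shape of the approach and is not the asymptotically optimal algorithm of the paper. The function name `kth_smallest` and the sample data are hypothetical.

```python
import heapq
from itertools import islice


def kth_smallest(partitions, k):
    """Return the k-th smallest key (1-indexed) across all partitions.

    Each inner list stands in for the records on one disk. This is an
    in-memory, single-process stand-in for the two-stage idea only.
    """
    # Stage 1: every "processor" sorts its own partition independently.
    sorted_runs = [sorted(p) for p in partitions]

    total = sum(len(run) for run in sorted_runs)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the number of records")

    # Stage 2: select from the D sorted runs. A lazy k-way merge lets us
    # stop as soon as the k-th key has been produced.
    merged = heapq.merge(*sorted_runs)
    return next(islice(merged, k - 1, None))


disks = [[9, 1, 5], [4, 8, 2], [7, 3, 6]]
print(kth_smallest(disks, 4))
```

The lazy merge touches only the first k keys, which is why sorting locally first makes the selection stage cheap in this sketch.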
ISBN: 3540003037 (print)
The proceedings contain 70 papers. The special focus of this conference is on High Performance Computing. The topics include: high-performance computing and visualization; 2-D wavelet transform enhancement on general-purpose microprocessors; a general data layout for distributed consistency in data parallel applications; duplication-based scheduling algorithm for interconnection-constrained distributed memory machines; evaluating arithmetic expressions using tree contraction; a mechanism to reduce I-cache power consumption in high performance microprocessors; exploiting web document structure to improve storage management in proxy caches; high performance multiprocessor architecture design methodology for application-specific embedded systems; a low latency messaging infrastructure for Linux clusters; low-power high-performance adaptive computing architectures for multimedia processing; a technique to construct high performance CORBA applications; automatic search for performance problems in parallel and distributed programs by using multi-experiment analysis; an adaptive value-based scheduler and its RT-Linux implementation; effective selection of partition sizes for moldable scheduling of parallel jobs; runtime support for multigrain and multiparadigm parallelism; a fully compliant OpenMP implementation on software distributed shared memory; a fast connection-time redirection mechanism for internet application scalability; an efficient resource sharing scheme for dependable real-time communication in multihop networks; improving web server performance by network aware data buffering and caching; WRAPS scheduling and its efficient implementation on network processors; performance comparison of pipelined hash joins on workstation clusters and iterative algorithms on heterogeneous network computing.
In this paper, a high-accuracy analog random access memory (ARAM) for programmable vision chips is addressed. In the context of a large retina, the main sources of error are analyzed. Several architectures are presented and compared. The design of an improved ARAM is detailed.
Query cost models are widely used, both for performance analysis and for comparing execution plans during query *** essence, a cost model predicts where time is being spent during query evaluation. Although many cost ...
In the past few years, cluster computing has been widely accepted as a parallel platform because of its high performance at an affordable cost. To maximize the use of available resources, resource monitoring for cluster computing is required. The collected resource information can be used by any parallel application, e.g. parallel motion estimation, to handle variation in available resources on typical time-sharing computers, so that the computing load can be distributed properly among n processors. In this paper, we present the development of resource monitoring for cluster computing using the MPI programming model to achieve efficient parallel motion estimation. Results show the effectiveness of our method, which achieves a faster parallel execution time.
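One way monitored resource information could drive load distribution is a proportional split of work units. The abstract does not describe the actual policy, so the following is only a sketch under assumptions: `split_load`, the integer "available capacity" scores, and the idea of counting work in units such as macroblock rows are all hypothetical, and the MPI communication is omitted.

```python
def split_load(total_units, available):
    """Split total_units work items in proportion to each node's capacity.

    `available` holds one non-negative integer score per node (for example
    percent of idle CPU reported by a monitor). Hypothetical policy, not
    taken from the paper.
    """
    capacity = sum(available)
    if capacity <= 0:
        raise ValueError("at least one node must report available capacity")

    # Integer share per node, rounded down.
    shares = [total_units * a // capacity for a in available]
    # Remainders decide who receives the units lost to rounding.
    remainders = [total_units * a % capacity for a in available]

    leftover = total_units - sum(shares)
    by_remainder = sorted(range(len(available)),
                          key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


# Three nodes reporting 90%, 50% and 40% idle CPU, 36 work units in total.
print(split_load(36, [90, 50, 40]))
```

Integer arithmetic is used throughout so that the shares always add up to the total number of work units.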
In this paper, discrete state-space recursive filters are implemented in the form of parallel array processors. The state-space description permits the straightforward application of systolic architectures to realize recursive filters of 1D and 2D types. We show that the recursivity inherent in the filtering algorithm introduces a latency proportional to the filter order. Moreover, we show that using the CTP decomposition technique together with cylindrical-type structures significantly reduces this latency and improves the computation throughput of these arrays. The processing cells of the systolic array are designed using switched-capacitor techniques.
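For readers unfamiliar with the state-space form, the sketch below evaluates a 1D recursive filter in software. It assumes the standard single-input, single-output description x[n+1] = A·x[n] + B·u[n], y[n] = C·x[n] + D·u[n]; the abstract does not give the equations, and neither the CTP decomposition nor the systolic mapping is modelled. The function name and coefficient values are hypothetical.

```python
def state_space_filter(A, B, C, D, u):
    """Run a discrete state-space filter over the input sequence u.

    A is an n x n list of lists, B and C are length-n lists, D is a scalar.
    The state starts at zero.
    """
    n = len(A)
    x = [0.0] * n
    y = []
    for sample in u:
        # Output equation: y[n] = C x[n] + D u[n]
        y.append(sum(C[i] * x[i] for i in range(n)) + D * sample)
        # State update: x[n+1] = A x[n] + B u[n]. Each new state depends on
        # the previous one, which is the recursion the abstract refers to.
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * sample
             for i in range(n)]
    return y


# First-order example: impulse response of y[n] = 0.5 * y[n-1] + u[n].
print(state_space_filter([[0.5]], [1.0], [0.5], 1.0, [1.0, 0.0, 0.0, 0.0]))
```

The matrix-vector products inside the loop are the regular, local computations that make this formulation a natural fit for array processors.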
ISBN: 0769514413 (print)
In this paper, we present an efficient 3-bit-scan multiplier without overlapping bits which has good power-delay-area trade-offs. Generation of partial product terms in this multiplier is performed in parallel with the multiplication operation. Parallel partial product generation results in a multiplier that is faster than conventional sequential multipliers. The architecture of the 3-bit-scan multiplier without overlapping bits is therefore suitable for synchronous sequential multipliers that are required to operate at low power and at relatively high speed for their area.
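A behavioural model may help make the scanning scheme concrete. The sketch assumes that "without overlapping bits" means the multiplier is consumed in disjoint 3-bit groups (digits 0 to 7), unlike Booth-style recoding where adjacent groups share a bit. The abstract does not describe the circuit, so the function name, the `width` parameter, and the table of precomputed multiples are hypothetical.

```python
def multiply_3bit_scan(multiplicand, multiplier, width=12):
    """Multiply two unsigned integers by scanning the multiplier 3 bits per step.

    `width` is the multiplier width in bits. Behavioural sketch only; it does
    not model timing, power, or area.
    """
    if multiplicand < 0 or not 0 <= multiplier < (1 << width):
        raise ValueError("operands must be unsigned and fit the given width")

    # Multiples 0x..7x of the multiplicand. Hardware would build these with
    # shifts and adds; in this model they are simply precomputed.
    multiples = [digit * multiplicand for digit in range(8)]

    product = 0
    for shift in range(0, width, 3):
        digit = (multiplier >> shift) & 0b111   # next non-overlapping group
        product += multiples[digit] << shift    # accumulate shifted partial product
    return product


print(multiply_3bit_scan(45, 203))
```

Scanning three bits per step means a 12-bit multiplier needs four accumulation steps instead of twelve in a bit-serial design.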
ISBN: 0769514413 (print)
This paper presents a new method of performing division in hardware and explores different ways of implementing it. The method computes a preliminary estimate of the quotient by splitting the dividend, dividing each of the parts in parallel, and merging the results. The estimate is then refined iteratively to obtain the final quotient. The method is fast because it computes the preliminary quotient with parallel operations and uses a fast multiplier to refine the result. The execution of the unit can also be pipelined, yielding a further increase in throughput. Speed estimates show that this method yields a much higher throughput than other fast methods, while area and latency are comparable.
The efficiency of a meeting has a lot to do with the attitudes the participants have towards the meeting goals. The outcome of a meeting depends heavily on how the participants behave, i.e., how they assume expected roles. We can select participants before the meeting based on their ability to play the desired roles, or we can later examine the participants' behavior by analyzing their interventions during the meeting. This paper discusses the latter approach and suggests a semantic network analysis to map interventions onto roles defined as essential to good team performance.