ISBN: (print) 0769511538
In this paper, we present an adaptive version of the parallel Distributive Join (DJ) algorithm that we proposed in [1]. The adaptive parallel DJ algorithm can handle data skew in the operand relations efficiently. We implemented the original and adaptive parallel DJ algorithms on a network of Alpha workstations using the Parallel Virtual Machine (PVM). We analyzed the performance of the algorithms and compared it with that of the parallel Hybrid-Hash (HH) join algorithms. Our results show that the parallel DJ algorithms perform comparably with the parallel HH join algorithms over the entire range of the number of processors used and for different join selectivities. A significant advantage of the parallel DJ algorithms is that they can easily support non-equijoin operations.
ISBN: (print) 0769511538
Techniques for scheduling parallel I/O are presented, both for uniprogrammed systems that run single jobs in isolation and for multiprogrammed environments that execute multiple parallel jobs simultaneously. The performance of the scheduling algorithms is evaluated on a network of workstations. A new scheduling algorithm proposed in this paper is observed to perform very well for systems running single jobs in isolation. The algorithms that use knowledge of job characteristics are observed to produce superior performance in multiprogrammed parallel environments.
ISBN: (print) 0780370570
Architectural synthesis is an efficient design process that reduces the gap between algorithms and architectures by raising the abstraction level. However, this process currently does not take the VLSI circuit interconnection cost into account, even though this cost becomes predominant in submicron technologies. In this paper, an interconnection cost analysis is performed at the behavioural level in order to provide rapid prototyping results and to direct the synthesis process with additional path constraints. Results are presented that demonstrate the value of this approach.
ISBN: (print) 0769511538
PIM (Processor-In-Memory) architectures have been proposed in recent years. One major objective of PIM is to reduce the performance gap between the CPU and memory. To exploit the potential benefits of PIM, we designed a statement-based parallelizing system, SAGE, in [1][2]. In this paper, we extend this system to achieve better performance by devising several comprehensive optimizing techniques, which include IMOP (Intelligent Memory Operation) recognition, tiling for PIM, and a precise mechanism for obtaining a load-balanced execution schedule. The experimental results are also presented and discussed.
ISBN: (print) 0769511538
In this paper we propose an efficient parallel implementation of Edmonds' algorithm for finding optimum branchings on a model of the SIMD type with vertical data processing (the STAR-machine). To this end, for a directed graph given as a list of triples (edge vertices and the weight), we construct a new associative version of Edmonds' algorithm. This version is represented as the corresponding STAR procedure, whose correctness is proved. We show that on vertical processing systems Edmonds' algorithm takes O(n log n) time, where n is the number of graph vertices.
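For orientation, the following is a minimal sequential sketch of the classical Chu-Liu/Edmonds contraction scheme, taking the graph as a list of (u, v, weight) triples as in the abstract. It is not the associative STAR procedure from the paper: the function name `min_arborescence_cost`, the fixed root, and the choice to compute a minimum-weight spanning arborescence (rather than a general optimum branching) are all assumptions made for illustration, and this version runs in O(nm) time on a conventional machine.

```python
def min_arborescence_cost(n, root, edges):
    """Total weight of a minimum spanning arborescence rooted at `root`.

    n     -- number of vertices, labelled 0..n-1
    edges -- list of (u, v, w) triples for directed edges u -> v
    Returns None if some vertex is unreachable from the root.
    Hypothetical helper: a sequential illustration only.
    """
    total = 0
    while True:
        inf = float("inf")
        min_in = [inf] * n   # cheapest incoming edge weight per vertex
        pre = [-1] * n       # tail of that cheapest incoming edge
        for u, v, w in edges:
            if u != v and w < min_in[v]:
                min_in[v] = w
                pre[v] = u
        for v in range(n):
            if v != root and min_in[v] == inf:
                return None  # v has no incoming edge at all
        min_in[root] = 0

        comp = [-1] * n      # id of the contracted component
        visited = [-1] * n   # which start vertex last walked through here
        count = 0
        for v in range(n):
            total += min_in[v]
            x = v
            # Follow cheapest-incoming edges backwards until we hit the
            # root, an already contracted vertex, or our own trail.
            while visited[x] != v and comp[x] == -1 and x != root:
                visited[x] = v
                x = pre[x]
            if x != root and comp[x] == -1:
                # x lies on a cycle: contract the whole cycle.
                y = pre[x]
                while y != x:
                    comp[y] = count
                    y = pre[y]
                comp[x] = count
                count += 1
        if count == 0:
            return total     # no cycles left: selected edges form a tree

        # Vertices outside any cycle become their own component.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = count
                count += 1
        # Re-weight edges entering each vertex and drop intra-cycle edges.
        edges = [(comp[u], comp[v], w - min_in[v])
                 for u, v, w in edges if comp[u] != comp[v]]
        root = comp[root]
        n = count


if __name__ == "__main__":
    triples = [(0, 1, 10), (1, 2, 1), (2, 1, 1), (0, 2, 5)]
    print(min_arborescence_cost(3, 0, triples))
```

In the example, vertices 1 and 2 form a cycle under the cheapest-incoming-edge rule; the cycle is contracted once and the re-weighted edge from 0 to 2 is then selected.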
ISBN: (print) 0780370570
In this study, a unicast routing algorithm based on the parallel branching method has been developed for the faulty hypercube parallel processing system. The developed method has been compared with the cube algebra method developed by us and with related studies in the literature. With the developed routing algorithm, routing from the source node to the destination node is completed in the minimum number of available steps, without any restriction on the number of faulty nodes. The algorithm assumes a circuit-switched system, and the results obtained have been visually simulated using the hypercube routing simulation program we developed. The performance of the simulator has been evaluated by comparing the number of completed processes against the number of faulty nodes for the two methods developed by us.
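To make the routing problem concrete, here is a minimal sketch that finds a shortest fault-free path in a hypercube with a plain breadth-first search. It is a sequential stand-in, not the parallel branching method or the cube algebra method from the paper; the function name `route` and the encoding of nodes as integers whose bits are the hypercube coordinates are assumptions for illustration.

```python
from collections import deque


def route(dim, src, dst, faulty):
    """Shortest fault-free path from src to dst in a dim-dimensional hypercube.

    Nodes are integers in [0, 2**dim); two nodes are adjacent when they
    differ in exactly one bit. Returns the path as a list of nodes, or
    None if the faulty nodes disconnect src from dst.
    Hypothetical helper: a sequential illustration only.
    """
    faulty = set(faulty)
    if src in faulty or dst in faulty:
        return None
    parent = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for bit in range(dim):
            nxt = node ^ (1 << bit)   # flip one coordinate
            if nxt not in faulty and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # 3-cube, route 000 -> 111 while nodes 001 and 010 are faulty.
    print(route(3, 0b000, 0b111, {0b001, 0b010}))
```

Because breadth-first search explores nodes in order of distance, the path it returns uses the fewest hops available given the faults, regardless of how many nodes are faulty.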
ISBN: (print) 0780370570
This paper presents a high-speed RSA encryption processor employing a highly parallel architecture based on redundant binary number arithmetic and table look-up, and also presents a defect-tolerant design suited to the processor to solve the low-yield problem. It is demonstrated that the gate delay through the critical path, which determines the operation speed of the processor, is 1/60 that of the conventional processor. It is also demonstrated that introducing defect tolerance increases the chip size by 6.4% and that the increase in delay is minimized.
ISBN: (print) 0769511538
Given the properties of mobile computing environments, push-based data dissemination systems have lately attracted considerable interest. However, the skewed access patterns of mobile clients worsen the average wait time, and clients may want to request data objects from the server explicitly through a backchannel. We call the broadcast model that supports a backchannel adaptive broadcast. In this paper, we devise new algorithms for adaptive broadcast based on our previous work; that is, we divide the data objects that the server maintains into push_data and pull_data. Clients have to explicitly request data objects in pull_data. Maintaining transactional consistency in both pure-push and adaptive broadcast environments is our main concern. We also evaluate the performance behavior through a simulation study.
ISBN: (print) 0769511538
Efficient utilization of processing resources in a large, multi-user parallel computer system depends on reliable processor allocation algorithms. This paper presents an LSSA (L-Shaped Submesh Allocation) strategy to reduce external fragmentation and job response time simultaneously. LSSA manipulates the shape of the required submesh to fit into the fragmented mesh system and accommodates incoming jobs faster than other strategies. LSSA can be applied to mesh-connected parallel systems with faulty processors. The basic idea is to reconfigure a faulty mesh system into a maximum convex system, using the fault-free upper or lower boundary nodes to compensate for the non-boundary faulty nodes. To utilize the non-rectangular system parts, LSSA tries to allocate L-shaped submeshes instead of signaling an allocation failure. Extensive simulations show that LSSA performs more efficiently than other strategies in terms of external fragmentation, job response time, and system utilization.
ISBN: (print) 0769512607
In this paper we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. In such a cluster the scheduler can take advantage of this network's unique capabilities, including a network interface card-based processor and memory and efficient user-level communication libraries. We developed a micro-benchmark to test the scheduler's performance under various aspects of parallel job workloads: memory usage, bandwidth- and latency-bound communication, number of processes, timeslice quantum, and multiprogramming levels. Our experiments show that the gang scheduler performs relatively well under most workload conditions, is largely insensitive to the number of concurrent jobs in the system, and scales almost linearly with the number of nodes. On the other hand, the scheduler is very sensitive to the timeslice quantum, and values under 30 seconds can incur large overheads and fairness problems.