ISBN: 0769523129 (print)
We describe new features of FG that are designed to improve performance and extend the range of computations that fit into its framework. FG (short for Framework Generator) is a programming environment for parallel programs running on clusters. It was originally designed to mitigate latency in accessing data by running a program as a series of asynchronous stages that operate on buffers in a linear pipeline. To improve performance, FG now allows stages to be replicated, either statically by the programmer or dynamically by FG itself. FG also now alters thread priorities to use resources more efficiently; again, this action may be initiated by either the programmer or FG. To extend the range of computations that fit into its framework, FG now incorporates fork-join and DAG structures. Not only do fork-join and DAG structures allow more programs to be designed for FG, but they can also enable significant performance improvements over linear pipeline structures.
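As a rough sketch of the linear-pipeline model described above (an illustration in Python, not FG's actual API, which the abstract does not show), the following code drives a fixed pool of buffers through asynchronous read/compute/write stages connected by queues; replicating a stage would amount to running several threads on the same input queue.

    # Minimal sketch of a linear pipeline of asynchronous stages acting on a
    # fixed pool of buffers; illustrative only, not FG's real interface.
    import queue, threading

    NUM_BUFFERS = 4
    free = queue.Queue()              # recycled buffers flow back to the first stage
    for _ in range(NUM_BUFFERS):
        free.put(bytearray(1024))

    q_read, q_compute = queue.Queue(), queue.Queue()

    def read_stage(n_blocks):
        for block in range(n_blocks):
            buf = free.get()                              # block until a buffer is available
            buf[:] = bytes([block % 256]) * len(buf)      # stand-in for disk/network I/O
            q_read.put(buf)
        q_read.put(None)                                  # poison pill ends the pipeline

    def compute_stage():
        while (buf := q_read.get()) is not None:
            buf[:] = bytes(b ^ 0xFF for b in buf)         # stand-in for real work
            q_compute.put(buf)
        q_compute.put(None)

    def write_stage():
        while (buf := q_compute.get()) is not None:
            free.put(buf)                                 # write out, then recycle the buffer

    stages = [threading.Thread(target=read_stage, args=(16,)),
              threading.Thread(target=compute_stage),
              threading.Thread(target=write_stage)]
    for t in stages: t.start()
    for t in stages: t.join()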
ISBN: 0769523129 (print)
This paper provides a comprehensive performance evaluation of job scheduling policies for parallel systems on which large jobs require the maximum, or close to the maximum, resources available on the system. A wide range of policies is evaluated, including nonpreemptive Backfill policies, time-sharing with Gang Scheduling, and dynamic Equi-spatial policies. Through detailed performance analysis, key problems of each class of policies are identified. As a simpler alternative to Gang Scheduling and Equi-spatial, we propose using a short runtime limit (e.g., one hour as opposed to tens of hours) for Backfill policies. Our simulation results show that applying a short runtime limit to jobs that request a sufficiently large number of processors has the potential to significantly improve FCFS-Backfill policies for all job classes in most of the workloads studied.
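To make the backfilling idea concrete, here is a toy EASY-style backfill test (a simplification of mine, not the simulator used in the paper): a waiting job may start out of order only if, under its requested runtime, it cannot delay the reservation held by the job at the head of the queue. The proposed short runtime limit simply caps the runtime used in this test for jobs that request many processors; the 64-processor threshold below is an arbitrary placeholder.

    # Toy EASY-backfill test; simplified illustration, not the paper's simulator.

    def can_backfill(job, head_job, now, free_procs, head_start_time, free_at_head_start,
                     runtime_limit=None, large_job_threshold=64):
        """job/head_job: dicts with 'procs' and 'runtime' (requested, in seconds)."""
        runtime = job['runtime']
        # The proposed policy: cap the runtime of jobs requesting many processors.
        if runtime_limit is not None and job['procs'] >= large_job_threshold:
            runtime = min(runtime, runtime_limit)
        if job['procs'] > free_procs:
            return False                        # not enough idle processors right now
        # Case 1: the candidate finishes before the head job's reserved start time.
        if now + runtime <= head_start_time:
            return True
        # Case 2: it may run past the reservation only if it leaves enough
        # processors for the head job at its reserved start time.
        return job['procs'] <= free_at_head_start - head_job['procs']

    # Example: 10 idle CPUs now, 70 idle at the reservation, head job needs 64.
    head = {'procs': 64, 'runtime': 36000}
    cand = {'procs': 8, 'runtime': 7200}
    print(can_backfill(cand, head, now=0, free_procs=10,
                       head_start_time=3600, free_at_head_start=70))
    # False: it would run past the reservation and take processors the head job needs.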
ISBN: 0769523129 (print)
An algorithm has been developed to dynamically schedule heterogeneous tasks on heterogeneous processors in a distributed system. The scheduler operates in an environment with dynamically changing resources and adapts to variable system resources. It operates in a batch fashion and utilises a genetic algorithm to minimise the total execution time. We have compared our scheduler to six other schedulers, three batch-mode and three immediate-mode schedulers. We have performed simulations with randomly generated task sets, using uniform, normal, and Poisson distributions, whilst varying the communication overheads between the clients and scheduler. We have achieved more efficient results than all other schedulers across a range of different scenarios while scheduling 10,000 tasks on up to 50 heterogeneous processors.
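The abstract does not detail the genetic algorithm, so the following sketch is only a generic batch-mode GA for mapping tasks onto heterogeneous processors: a chromosome assigns each task a processor, fitness is the makespan computed from an assumed expected-time-to-compute (ETC) matrix, and tournament selection, one-point crossover, and point mutation evolve the population.

    # Generic GA for task-to-processor mapping (illustrative; not the paper's algorithm).
    import random

    def makespan(assignment, etc):
        """assignment[i] = processor for task i; etc[i][p] = runtime of task i on p."""
        loads = [0.0] * len(etc[0])
        for task, proc in enumerate(assignment):
            loads[proc] += etc[task][proc]
        return max(loads)

    def ga_schedule(etc, pop_size=50, generations=200, mutation_rate=0.05, seed=0):
        rng = random.Random(seed)
        n_tasks, n_procs = len(etc), len(etc[0])
        pop = [[rng.randrange(n_procs) for _ in range(n_tasks)] for _ in range(pop_size)]
        for _ in range(generations):
            nxt = [min(pop, key=lambda a: makespan(a, etc))]                  # elitism
            while len(nxt) < pop_size:
                p1 = min(rng.sample(pop, 3), key=lambda a: makespan(a, etc))  # tournament
                p2 = min(rng.sample(pop, 3), key=lambda a: makespan(a, etc))
                cut = rng.randrange(1, n_tasks)
                child = p1[:cut] + p2[cut:]                                   # one-point crossover
                child = [rng.randrange(n_procs) if rng.random() < mutation_rate else g
                         for g in child]                                      # point mutation
                nxt.append(child)
            pop = nxt
        best = min(pop, key=lambda a: makespan(a, etc))
        return best, makespan(best, etc)

    # Example: 20 tasks on 4 heterogeneous processors with random ETC values.
    rng = random.Random(1)
    etc = [[rng.uniform(1, 10) for _ in range(4)] for _ in range(20)]
    print(ga_schedule(etc)[1])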
ISBN: 0769523129 (print)
We consider the problem of securing communication between sensor nodes in large-scale sensor networks. We propose a distributed, deterministic key management protocol designed to satisfy authentication and confidentiality without the need for a key distribution center. Our scheme is scalable, since every node only needs to hold a small number of keys independent of the network size, and it is resilient against node capture and replication because keys are localized; keys that appear in some part of the network are not used again. Another important property of our protocol is that it is optimized for message broadcast: each node shares one pairwise key with all of its immediate neighbors, so only one transmission is necessary to broadcast a message. Furthermore, our scheme is suited for data fusion and aggregation processing; if necessary, nodes can "peek" at encrypted data using their cluster key and decide whether to forward or discard redundant information. Finally, we describe a mechanism for evicting compromised nodes as well as adding new nodes. A security analysis is discussed and simulation experiments are presented.
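As a simplified picture of localized pairwise and cluster keys, the sketch below derives link keys from an initial network-wide secret that would be erased after deployment, in the style of LEAP-like schemes; this is my own illustration and not necessarily the establishment mechanism proposed in the paper.

    # Sketch of localized pairwise and cluster keys (LEAP-style illustration,
    # not necessarily the exact protocol proposed in this paper).
    import hmac, hashlib, os

    def kdf(key, *parts):
        return hmac.new(key, b'|'.join(parts), hashlib.sha256).digest()

    initial_secret = os.urandom(32)   # pre-loaded before deployment, erased afterwards

    def node_master_key(node_id):
        return kdf(initial_secret, b'master', node_id)

    def pairwise_key(a_id, b_id):
        # Both neighbors can compute this link key locally during setup; it never
        # appears elsewhere in the network, so capturing other nodes does not reveal it.
        lo, hi = sorted([a_id, b_id])
        return kdf(node_master_key(lo), b'pair', hi)

    def cluster_key(node_id, neighbor_ids):
        # One key shared with all immediate neighbors: a single encrypted
        # transmission suffices for local broadcast, and neighbors can inspect
        # ("peek" at) the plaintext to decide whether to forward or aggregate.
        k = kdf(node_master_key(node_id), b'cluster')
        return k, {n: pairwise_key(node_id, n) for n in neighbor_ids}   # keys used to distribute k

    # Example: node A shares one cluster key with neighbors B and C.
    ck, wrap_keys = cluster_key(b'A', [b'B', b'C'])
    assert pairwise_key(b'A', b'B') == pairwise_key(b'B', b'A')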
ISBN: 0769523129 (print)
The growth of load balancing systems raises the issue of scalability, and decentralized load balancing architectures have been proposed to address it. In this paper, we investigate how a load balancing architecture can be built on decentralized policies based on CORBA and enhanced by a predictive algorithm. The L2E predictive filtering model was used to supply workstations with robust cluster load information, which allows them to make more accurate independent allocation decisions. Experimental results showed that our decentralized load balancing approach was able to suppress thrashing and oscillations compared to other load monitoring and prediction techniques, and that it achieved a more highly balanced system than Sun Grid Engine.
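The L2E filtering model itself is not spelled out in the abstract, so the sketch below uses plain exponential smoothing as a stand-in predictor: each workstation smooths the load samples it receives and dispatches work to the host with the lowest predicted load, which conveys the general shape of prediction-driven decentralized allocation.

    # Prediction-driven host selection; exponential smoothing is used here as a
    # simple stand-in for the L2E filtering model described in the paper.

    class LoadPredictor:
        def __init__(self, alpha=0.3):
            self.alpha = alpha
            self.estimate = {}            # host -> smoothed load estimate

        def observe(self, host, load_sample):
            prev = self.estimate.get(host, load_sample)
            # Smoothing damps transient spikes, which helps avoid the thrashing
            # and oscillation that raw instantaneous readings can cause.
            self.estimate[host] = self.alpha * load_sample + (1 - self.alpha) * prev

        def pick_host(self):
            return min(self.estimate, key=self.estimate.get)

    predictor = LoadPredictor()
    samples = {'ws1': [0.9, 0.2, 0.8], 'ws2': [0.5, 0.5, 0.5], 'ws3': [0.1, 0.95, 0.1]}
    for i in range(3):
        for host, series in samples.items():
            predictor.observe(host, series[i])
    print(predictor.pick_host())          # each workstation could run this logic locally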
ISBN: 0769523129 (print)
FROST (Fold Recognition-Oriented Search Tool) [6] is a software package whose purpose is to assign a 3D structure to a protein sequence. It is based on a series of filters and uses a database of about 1200 known 3D structures, each one associated with empirically determined score distributions. FROST uses these distributions to normalize the score obtained when a protein sequence is aligned with a particular 3D structure. Computing these distributions is extremely time consuming; it requires solving about 1,200,000 hard combinatorial optimization problems and takes about 40 days on a 2.4 GHz computer. This paper describes how FROST has been successfully redesigned and structured into modules and independent tasks. The new package organization allows these tasks to be distributed and executed in parallel using a centralized dynamic load balancing strategy. On a cluster of 12 PCs, computing the score distributions now takes about 3 days, which represents a parallelization efficiency of about 1.
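A minimal sketch of the centralized dynamic load-balancing pattern described above: a master holds the queue of independent score-distribution tasks and each worker pulls the next task as soon as it finishes the previous one, so faster machines naturally take on more work. The queue contents and the simulated work are placeholders, not FROST's actual modules.

    # Master-worker dynamic load balancing with a shared task queue (illustration only).
    import queue, threading, random, time

    tasks = queue.Queue()
    for seq_structure_pair in range(100):        # stand-in for the ~1,200,000 alignments
        tasks.put(seq_structure_pair)

    results = []
    results_lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task = tasks.get_nowait()        # pull the next task as soon as we are free
            except queue.Empty:
                return
            time.sleep(random.uniform(0.001, 0.005))   # stand-in for a hard optimization problem
            with results_lock:
                results.append((task, worker_id))

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(12)]  # e.g. 12 PCs
    for t in threads: t.start()
    for t in threads: t.join()
    print(len(results), "tasks completed")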
ISBN: 0769523129 (print)
Although thoroughly investigated, job scheduling for high-end parallel systems remains an inexact science, requiring significant experience and intuition from system administrators to properly configure batch schedulers. Production schedulers provide many parameters for their configuration, but tuning these parameters appropriately can be very difficult - their effects and interactions are often nonintuitive. In this paper, we introduce a methodology for automating the difficult process of job scheduler parameterization. Our proposed methodology is based on using past workload behavior to predict future workload, and on online simulations of a model of the actual system to provide on-the-fly suggestions to the scheduler for automated parameter adjustment. Detailed performance comparisons via simulation using actual supercomputing traces indicate that our methodology consistently outperforms other workload-aware methods for scheduler parameterization.
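To illustrate the tuning loop in miniature (a toy model of my own, not the authors' simulator or parameter set), the sketch below replays a sliding window of recently submitted jobs through a simplified single-processor simulation for each candidate value of a priority weight w and suggests the value that would have served the recent workload best.

    # Toy parameter-tuning loop: replay a window of recent jobs through a simplified
    # simulator for each candidate parameter value and adopt the best one
    # (illustration only; the parameter 'w' and the model are mine, not the authors').

    def simulate_avg_wait(jobs, w):
        """One-processor model: when idle, run the waiting job with the highest
        priority waited_time - w * runtime (w trades FCFS fairness for throughput)."""
        pending = sorted(jobs)                    # (arrival, runtime)
        clock, waiting, total_wait, done = 0.0, [], 0.0, 0
        while done < len(jobs):
            while pending and pending[0][0] <= clock:
                waiting.append(pending.pop(0))
            if not waiting:                       # idle until the next arrival
                clock = pending[0][0]
                continue
            arrival, runtime = max(waiting, key=lambda j: (clock - j[0]) - w * j[1])
            waiting.remove((arrival, runtime))
            total_wait += clock - arrival
            clock += runtime
            done += 1
        return total_wait / len(jobs)

    def retune(recent_jobs, candidates):
        # Pick the parameter value that would have served the recent window best.
        return min(candidates, key=lambda w: simulate_avg_wait(recent_jobs, w))

    recent = [(0, 5000), (10, 60), (20, 120), (30, 30)]    # (arrival, runtime) in seconds
    print(retune(recent, candidates=[0.0, 0.5, 2.0]))      # suggested w for the scheduler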
ISBN: 0769523129 (print)
Achieving good scalability for large simulations based on structured adaptive mesh refinement is non-trivial. Performance is limited by the partitioner's ability to efficiently use the underlying parallel computer's resources. Domain-based partitioners serve as a foundation for techniques designed to improve scalability, and they have traditionally been designed on the basis of an independence assumption regarding the computational flow among grid patches at different refinement levels. But this assumption does not hold in practice, and hence the effectiveness of these techniques is significantly impaired. This paper introduces a partitioning method that does not rely on this independence assumption. The method is tested on four different applications exhibiting different behaviors. The results show that synchronization costs can be reduced by 75 percent on average. The conclusion is that the method is suitable as a foundation for general hierarchical methods designed to improve the scalability of structured adaptive mesh refinement applications.
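One simple way to avoid treating refinement levels as independent, shown below purely as an illustration of the general idea rather than the paper's method, is to assign each fine patch to the processor that owns the coarse region it overlaps most, so that inter-level synchronization stays local.

    # Sketch: keep refined patches on the processor that owns the underlying
    # coarse region, so inter-level synchronization stays local (illustration of
    # the general idea, not the paper's partitioning method).

    def overlap_1d(a, b):
        return max(0, min(a[1], b[1]) - max(a[0], b[0]))

    def overlap_area(p, q):
        return overlap_1d(p[0], q[0]) * overlap_1d(p[1], q[1])

    def co_partition(fine_patches, coarse_patches, coarse_owner):
        """Assign each fine patch to the owner of the coarse patch it overlaps most."""
        assignment = {}
        for fid, fbox in fine_patches.items():
            best = max(coarse_patches, key=lambda cid: overlap_area(fbox, coarse_patches[cid]))
            assignment[fid] = coarse_owner[best]
        return assignment

    # Boxes are ((x0, x1), (y0, y1)) in coarse-grid index space.
    coarse = {'c0': ((0, 8), (0, 8)), 'c1': ((8, 16), (0, 8))}
    owner = {'c0': 0, 'c1': 1}
    fine = {'f0': ((2, 6), (2, 6)), 'f1': ((7, 12), (1, 5))}
    print(co_partition(fine, coarse, owner))    # f0 -> rank 0, f1 -> rank 1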
ISBN: 0769523129 (print)
The frequency and intensity of Internet attacks are rising at an alarming pace. Several technologies and concepts have been proposed for fighting distributed denial of service (DDoS) attacks: traceback, pushback, i3, SOS and Mayday. This paper shows that, in the case of DDoS reflector attacks, they are either ineffective or even counterproductive. We then propose a novel concept and system that extends network users' control over their traffic into the Internet using adaptive traffic processing devices. We safely delegate partial network management capabilities from network operators to network users. All network packets with a source or destination address owned by a network user can now also be controlled within the Internet, instead of only at the network user's Internet uplink. By limiting the traffic control features and by restricting the realm of control to the "owner" of the traffic, we can rule out misuse of this system. Applications of our system are manifold: prevention of source address spoofing, DDoS attack mitigation, distributed firewall-like filtering, new ways of collecting traffic statistics, traceback, distributed network debugging, support for forensic analyses, and many more.
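The heart of the delegation argument is that a user may only control traffic they "own". The sketch below shows such an ownership check; the rule format, names, and prefixes are illustrative and not the system's actual interface.

    # Sketch of the ownership check behind delegated traffic control: a network
    # user may only install rules that match addresses within prefixes they own
    # (illustrative; names and rule format are mine, not the system's actual API).
    import ipaddress

    owned_prefixes = {
        'customerA': [ipaddress.ip_network('192.0.2.0/24')],
        'customerB': [ipaddress.ip_network('198.51.100.0/25')],
    }

    def rule_allowed(user, rule):
        """Accept a rule only if its source or destination address lies in a
        prefix owned by the requesting user, ruling out control over others' traffic."""
        targets = [ipaddress.ip_address(rule[k]) for k in ('src', 'dst') if k in rule]
        return any(addr in net
                   for addr in targets
                   for net in owned_prefixes.get(user, []))

    # customerA may rate-limit traffic towards its own address ...
    print(rule_allowed('customerA', {'dst': '192.0.2.10', 'action': 'rate-limit'}))   # True
    # ... but cannot install filters for traffic belonging to someone else.
    print(rule_allowed('customerA', {'dst': '203.0.113.5', 'action': 'drop'}))        # False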
ISBN: 0769523129 (print)
The Self Distributing Virtual Machine (SDVM) is a parallel computing machine that consists of a cluster of ordinary computers. The participating machines may have different computing speeds or even different operating systems, and any network topology between them is supported. As the SDVM supports dynamic entry and exit at runtime, the cluster may even grow or shrink without disturbing the program flow. Among other features, the SDVM provides decentralized scheduling and automatic distribution of data and program code throughout the cluster. Machines therefore need no prerequisites apart from the SDVM daemon itself to join the cluster. The concept of the SDVM is implemented for computer clusters (and is extensible to grid environments such as the Internet), but it can be adapted to multiprocessor systems or systems-on-chip (SoC) as well. This paper presents the properties and the structure of the SDVM.
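To give a flavor of data-driven, locally scheduled execution (a generic sketch, not the SDVM's actual interface or terminology), the code below fires a task as soon as all of its operands have arrived and forwards the result to its consumers, which in a cluster setting could live on other machines.

    # Sketch of data-driven execution on one cluster participant: a task runs as
    # soon as all of its input values have arrived, and its result is forwarded
    # to its consumers (generic illustration, not the SDVM API).

    class Task:
        def __init__(self, func, n_inputs, consumers):
            self.func, self.consumers = func, consumers
            self.inputs, self.n_inputs = {}, n_inputs

    class Node:
        def __init__(self):
            self.tasks = {}

        def deliver(self, task_id, slot, value):
            task = self.tasks[task_id]
            task.inputs[slot] = value
            if len(task.inputs) == task.n_inputs:              # all operands present: fire
                result = task.func(*[task.inputs[i] for i in range(task.n_inputs)])
                for target_id, target_slot in task.consumers:
                    self.deliver(target_id, target_slot, result)   # could be a remote send

    # Example dataflow: (2 + 3) * 10, with the final task printing its value.
    node = Node()
    node.tasks['add'] = Task(lambda a, b: a + b, 2, [('mul', 0)])
    node.tasks['mul'] = Task(lambda a, b: a * b, 2, [('out', 0)])
    node.tasks['out'] = Task(print, 1, [])
    node.deliver('add', 0, 2)
    node.deliver('add', 1, 3)
    node.deliver('mul', 1, 10)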