Discovering frequent fragments as patterns in large sequence databases generated by a variety of applications is an important task. In general, the patterns to be discovered may occur partially and asynchronously in sequences, and may even contain gaps. In addition, it is necessary to collect information about the locations and frequencies of the patterns. How to enumerate candidate patterns for evaluation without exponentially increasing the computation is another problem. In this paper, a modified periodicity transform is proposed to meet these requirements. A distributed computing framework is also implemented to perform the mining task more efficiently. Both synthetic and biological sequences are used to examine the approach. The experimental results demonstrate the efficiency and effectiveness of the system.
Many legacy code applications cannot be run in a grid environment without significant modifications. To avoid reengineering legacy code, we developed the Grid Execution Management for Legacy Code Architecture (GEMLCA), which enables legacy code applications to be deployed as grid services. GEMLCA is an OGSI grid service layer that supports submitting jobs and retrieving their results and status. Security is essential to any grid application in order to preserve the confidentiality and integrity of data. To meet these requirements, the GT3 security model was implemented in GEMLCA. The paper introduces GEMLCA and describes how Grid Security Infrastructure (GSI) components have been added to it to enable secure execution of jobs in the grid. The paper also presents how a legacy code traffic simulator was transformed into a grid service using GEMLCA and gives some simulation results.
In this paper, a framework for replacing missing values in a database is proposed since a real-world database is seldom complete. Good data quality in a database can directly improve the performance of any data mining...
ISBN: 0769522491 (print)
Internet computing and grid technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data and other resources across institutional boundaries. As grid computing becomes a reality, there is a need to manage and monitor the resources available worldwide, and to convey these resources to the everyday user. This paper describes a resource broker whose main function is to match the available resources to the user's needs. The resource broker provides a uniform interface for accessing any available and appropriate resource using the user's credentials. Because it runs on top of the Globus toolkit, it provides security and current information about the available resources, and serves as a link to the diverse systems available in the grid.
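The matching step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the broker's data model or ranking policy, so the `Resource` and `Request` fields and the "most CPUs first" ordering below are hypothetical assumptions for illustration only, and no Globus API is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    # Hypothetical description of one grid resource.
    name: str
    cpus: int
    memory_gb: int
    os: str


@dataclass(frozen=True)
class Request:
    # Hypothetical description of what the user needs.
    min_cpus: int
    min_memory_gb: int
    os: str


def match_resources(request, resources):
    """Return the resources that satisfy the request, largest CPU count first."""
    suitable = [
        r for r in resources
        if r.cpus >= request.min_cpus
        and r.memory_gb >= request.min_memory_gb
        and r.os == request.os
    ]
    # Assumed ranking policy: prefer the resource with the most CPUs.
    return sorted(suitable, key=lambda r: r.cpus, reverse=True)
```

A real broker would also consult current load and the user's credentials before returning a match; both are omitted here.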
Jobs submitted to a cluster have varying requirements depending on user-specific needs and expectations. Therefore, in utility-driven cluster computing, cluster resource management systems (RMSs) need to be aware of these requirements in order to allocate resources effectively. Service level agreements (SLAs) can be used to differentiate the value of jobs, as they define the service conditions that the cluster RMS agrees to provide for each job. The SLA acts as a contract between a user and the cluster, whereby the user is entitled to compensation whenever the cluster RMS fails to deliver the required service. In this paper, we present a proportional share allocation technique called LibraSLA that takes into account the utility of accepting new jobs into the cluster based on their SLAs. We study how LibraSLA performs with respect to several SLA requirements: (i) deadline type, that is, whether the job can be delayed; (ii) deadline by which the job needs to be finished; (iii) budget to be spent on finishing the job; and (iv) penalty rate for compensating the user for failure to meet the deadline.
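The four SLA parameters listed above suggest a simple utility calculation, sketched below. The abstract does not give LibraSLA's actual formula, so this is an assumption: utility equals the budget when the deadline is met, decreases linearly at the penalty rate for a delayed soft-deadline job, and is zero for a hard-deadline job that finishes late. The function name and signature are hypothetical.

```python
def job_utility(budget, deadline, finish_time, penalty_rate, hard_deadline):
    """Utility earned by the cluster for one job (illustrative assumption)."""
    delay = max(0.0, finish_time - deadline)
    if delay == 0.0:
        # Deadline met: the cluster earns the full budget.
        return budget
    if hard_deadline:
        # Assumption: a late hard-deadline job earns nothing.
        return 0.0
    # Soft deadline: compensation grows linearly with the delay,
    # so utility can become negative for long delays.
    return budget - penalty_rate * delay
```

For example, a soft-deadline job with a budget of 100, a deadline of 10, and a penalty rate of 5 that finishes at time 14 yields a utility of 80 under these assumptions; an admission policy could then accept a new job only if the total expected utility does not decrease.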
ISBN: 9780780390379 (print)
We present an algorithm for scheduling distributed data intensive bag-of-task applications on data grids that have costs associated with requesting, transferring and processing datasets. We evaluate the algorithm on a data grid testbed and present the results.
Parameter sweeps have been widely adopted in a large number of scientific applications. Parameter-sweep features need to be incorporated into grid workflows to increase the scale and scope of such applications. New scheduling mechanisms and algorithms are required to provide an optimized policy for resource allocation and task arrangement in this setting. This paper addresses the scheduling of sequential parameter-sweep tasks in a fine-grained manner. The optimization is achieved by pipelining the subtasks and dispatching each of them to well-selected resources. Two types of scheduling algorithms are discussed and customized to adapt to the characteristics of parameter sweeps, and their effectiveness is compared under a variety of scenarios.
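The benefit of pipelining subtasks can be sketched with a small makespan calculation. This is not the paper's algorithm. It assumes, for illustration, that every sweep instance is the same chain of subtasks, that each subtask stage runs on its own dedicated resource, and that transfer times are negligible; the function names are hypothetical.

```python
def pipelined_makespan(stage_times, n_instances):
    """Finish time when each stage has its own resource and instances overlap."""
    stage_free_at = [0.0] * len(stage_times)
    for _ in range(n_instances):
        previous_stage_done = 0.0
        for s, duration in enumerate(stage_times):
            # A subtask starts once its resource is free and the
            # preceding subtask of the same instance has finished.
            start = max(stage_free_at[s], previous_stage_done)
            stage_free_at[s] = start + duration
            previous_stage_done = stage_free_at[s]
    return stage_free_at[-1]


def sequential_makespan(stage_times, n_instances):
    """Finish time when instances run one after another without overlap."""
    return n_instances * sum(stage_times)
```

With stage times of 2, 3 and 1 time units and four sweep instances, the pipelined schedule finishes at time 15 while the non-overlapping schedule finishes at time 24. The slowest stage bounds the pipeline's throughput, which is why placing it on a well-selected resource matters.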
ISBN: 0769522491 (print)
In this paper, a parallel loop self-scheduling scheme for heterogeneous PC cluster systems is proposed. Although the earlier scheme allows users to choose parameters before the execution initialization phase, it still has weaknesses that motivate further improvement. For instance, fixing the parameter once, on the basis of previous input information, can easily lead to an invalid schedule. This paper therefore proposes a new scheme in which the scheduling parameter can be adjusted dynamically to fit most widely available computer systems, in order to provide higher overall performance.
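As a rough illustration of what a scheduling parameter can control in loop self-scheduling, the sketch below splits a loop into a static part and a dynamic part. It is not the scheme proposed in the paper: the parameter `alpha`, the speed-proportional static split, and the guided self-scheduling rule (each chunk is the remaining iterations divided by the number of nodes, rounded up) are assumptions chosen for illustration.

```python
import math


def plan_chunks(n_iters, speeds, alpha):
    """Split a loop into static shares and dynamically dispatched chunks.

    n_iters: total loop iterations
    speeds:  relative integer speed of each node (hypothetical input)
    alpha:   fraction of iterations assigned statically, 0 <= alpha <= 1
    """
    static_total = int(n_iters * alpha)
    total_speed = sum(speeds)
    # Static part: each node gets a share proportional to its speed.
    static = [static_total * s // total_speed for s in speeds]
    # Dynamic part: idle nodes request chunks that shrink over time.
    remaining = n_iters - sum(static)
    dynamic = []
    while remaining > 0:
        chunk = math.ceil(remaining / len(speeds))
        dynamic.append(chunk)
        remaining -= chunk
    return static, dynamic
```

Adjusting `alpha` at run time, for example lowering it when measured node speeds drift from the assumed ones, is one way a parameter could be adapted dynamically; how the paper actually does this is not stated in the abstract.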
ISBN: 0780389328 (print)
The computing power provided by high-performance, low-cost PC-based clusters and grid computing platforms is attractive, and it equals or exceeds that of supercomputers and mainframes. At the same time, how to obtain more computing power from these platforms has become an interesting issue. Developing applications for these high-performance computing platforms is complicated for several reasons: the complexity of the applications themselves, which combine aspects of supercomputing and distributed computing, and the need to achieve higher performance. This paper describes the design rationale and implementation of a Web-based parallel programming toolkit that eases the process of learning parallel programming through a Web-based interface. The toolkit has been widely used in MPI parallel programming courses (at both graduate and undergraduate levels) and in industry training.
The low cost and availability of networks of workstations have made them an attractive solution for high-performance computing. Striking progress in network technology is enabling high-performance global computing through cluster and grid technologies, in which computational and data resources in a local or wide area network are transparently employed to solve large-scale problems. In this paper, we present the implementation and design rationale of the Visuel toolkit for performance measurement and analysis of parallel applications in cluster and grid environments. Most performance visualization tools available today for high-performance platforms show solely system performance data (e.g., CPU load, memory usage, network bandwidth, server average load), and are thus suitable for visualizing the system activities of the computing platform. The Visuel toolkit is a Web-based interface designed to show the performance activities of all computing nodes of a cluster or grid computing platform involved in the execution of a parallel application, such as the CPU load level and memory usage of each computing node. In addition, the toolkit can display comparative visualizations of performance data generated from a number of executions of an application under investigation, allowing the performance of different implementations to be analyzed. Evaluations using the toolkit show that it effectively eases the process of investigating and implementing parallel applications.