Classification is an essential task in data mining, machine learning, and pattern recognition *** classification models focus on distinctive samples from different categories. There are fine-grained differences between data instances within a particular category. These differences form the preference information that is essential for human learning and, in our view, could also be helpful for classification models. In this paper, we propose a preference-enhanced support vector machine (PSVM) that incorporates preference-pair data as a specific type of supplementary information into SVM. Additionally, we propose a two-layer heuristic sampling method to obtain effective preference pairs, and an extended sequential minimal optimization (SMO) algorithm to fit PSVM. To evaluate our model, we use the task of knowledge base acceleration-cumulative citation recommendation (KBA-CCR) on the TREC-KBA-2012 dataset and seven other datasets from UCI, StatLib and ***. The experimental results show that our proposed PSVM exhibits high performance with official evaluation metrics.
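The abstract does not reproduce PSVM's formulation, but a common way to fold preference pairs into a linear classifier is the pairwise-difference transform (as in RankSVM-style learning): for each pair where x_i is preferred over x_j, require w · (x_i − x_j) > 0. The sketch below illustrates that idea with a minimal perceptron on difference vectors; the function name and toy data are invented for illustration and are not the paper's PSVM.

```python
# Minimal sketch: learning from preference pairs via difference vectors.
# A pair (xi, xj) means xi is preferred over xj, so we want w . (xi - xj) > 0.
# This is a RankSVM-style pairwise transform, NOT the paper's actual PSVM.

def train_on_pairs(pairs, dim, epochs=50, lr=0.1):
    """Perceptron-style updates on preference pairs (hypothetical example)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for xi, xj in pairs:
            d = [a - b for a, b in zip(xi, xj)]           # difference vector
            score = sum(wk * dk for wk, dk in zip(w, d))
            if score <= 0:                                # pair violated: update
                w = [wk + lr * dk for wk, dk in zip(w, d)]
    return w

# Toy preference data: the first feature drives the preference.
pairs = [([2.0, 0.1], [1.0, 0.2]), ([3.0, 0.0], [0.5, 0.1])]
w = train_on_pairs(pairs, dim=2)
# After training, preferred items score higher than their counterparts.
assert sum(a * b for a, b in zip(w, [2.0, 0.1])) > sum(a * b for a, b in zip(w, [1.0, 0.2]))
```

In an SVM setting these same difference vectors would become margin constraints alongside the ordinary classification constraints, which is the kind of coupling a modified SMO solver would handle.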
The Message Passing Interface (MPI) is a widely accepted standard for parallel computing on distributed ***, MPI implementations can contain defects that impact the reliability and performance of parallel applications. Detecting and correcting these defects is crucial, yet there is a lack of published models specifically designed for correcting MPI defects. To address this, we propose a model for detecting and correcting MPI defects (DC_MPI), which aims to detect and correct defects in various types of MPI communication, including blocking point-to-point (BPTP), nonblocking point-to-point (NBPTP), and collective communication (CC). The defects addressed by the DC_MPI model include illegal MPI calls, deadlocks (DL), race conditions (RC), and message mismatches (MM). To assess the effectiveness of the DC_MPI model, we performed experiments on a dataset consisting of 40 MPI codes. The results indicate that the model achieved a detection rate of 37 out of 40 codes, resulting in an overall detection accuracy of 92.5%. Additionally, the execution duration of the DC_MPI model ranged from 0.81 to 1.36 s. These findings show that the DC_MPI model is useful in detecting and correcting defects in MPI implementations, thereby enhancing the reliability and performance of parallel applications. The DC_MPI model fills an important research gap and provides a valuable tool for improving the quality of MPI-based parallel computing systems.
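The abstract does not detail DC_MPI's internal algorithms, but a standard way to detect the blocking point-to-point deadlocks (DL) it targets is to build a wait-for graph of pending blocking calls and search it for a cycle: the classic failure mode is two ranks that each issue a blocking send to the other before posting a receive. The sketch below models that check in plain Python; the representation and names are invented for illustration.

```python
# Hedged sketch: deadlock detection for blocking point-to-point communication
# via a wait-for graph.  A blocking send on rank a destined for rank b with no
# matching posted receive makes a wait on b; a cycle in the resulting graph
# indicates a deadlock.

def has_deadlock(waits):
    """waits maps a rank to the rank it is blocked on; detect a cycle."""
    for start in waits:
        seen, cur = set(), start
        while cur in waits:
            if cur in seen:
                return True          # revisited a rank: cycle found
            seen.add(cur)
            cur = waits[cur]
    return False

# Classic deadlock: rank 0 blocks sending to 1 while rank 1 blocks sending to 0.
assert has_deadlock({0: 1, 1: 0}) is True
# Safe ordering: rank 0 sends to 1, and rank 1 has already posted its receive.
assert has_deadlock({0: 1}) is False
```

A corrector in this style would then break the cycle, for example by reordering one rank's send and receive or by replacing the pair with a combined send-receive operation.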
Apache Storm is a distributed processing engine that can reliably process unbounded streams of data for real-time applications. While recent research activities mostly focused on devising a resource allocation and tas...
ISBN: (print) 0769509819
Workflow techniques are emerging as an important approach for the specification and management of complex processing tasks. This approach is especially powerful for utilising distributed data and processing resources in widely-distributed heterogeneous systems. We describe our DISCWorld distributed workflow environment for composing complex processing chains, which are specified as a directed acyclic graph of operators. Users of our system can formulate processing chains using either graphical or scripting tools. We have deployed our system for image processing applications and decision support systems. We describe the technologies we have developed to enable the execution of these processing chains across wide-area computing systems. In particular, we present our distributed Job Placement Language (based on XML) and various Java interface approaches we have developed for implementing the workflow metaphor. We outline a number of key issues for implementing a high-performance, reliable, distributed workflow management system.
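The abstract describes processing chains as directed acyclic graphs of operators. As a minimal illustration of scheduling such a chain in dependency order (with operator names invented for this sketch, not drawn from DISCWorld's actual API or Job Placement Language), Python's standard library can topologically sort the graph directly:

```python
# Sketch: executing a DAG of operators in dependency order.
# Operator names and the graph are hypothetical, not DISCWorld's interfaces.
from graphlib import TopologicalSorter

# Each operator maps to the set of operators it depends on.
dag = {
    "load_image": set(),
    "georectify": {"load_image"},
    "cloud_mask": {"load_image"},
    "composite":  {"georectify", "cloud_mask"},
}

order = list(TopologicalSorter(dag).static_order())
# Every operator appears after all of its dependencies.
assert order[0] == "load_image"
assert order.index("composite") > order.index("georectify")
assert order.index("composite") > order.index("cloud_mask")
```

A distributed scheduler would additionally decide which server runs each operator and where intermediate results live, which is what a placement language like the XML-based one described above has to express.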
ISBN: (print) 0769509819
Problem solving environments are an attractive approach to the integration of calculation and management tools for various scientific and engineering applications. These applications often require high-performance computing components in order to be computationally feasible. It is therefore a challenge to construct integration technology, suitable for problem solving environments, that allows both flexibility as well as the embedding of parallel and high-performance computing systems. Our DISCWorld system is designed to meet these needs and provides a Java-based middleware to integrate component applications across wide-area networks. Key features of our design are the abilities to: access remotely stored data; compose complex processing requests either graphically or through a scripting language; execute components on heterogeneous and remote platforms; reconfigure task subgraphs to run across multiple servers. Operators in task graphs can be slow (but portable) pure Java implementations or wrappers to fast (platform specific) supercomputer implementations.
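The last sentence describes a dual-implementation pattern: each operator carries a portable implementation and, optionally, a wrapper to a fast platform-specific one, chosen at run time. A minimal sketch of that dispatch pattern (all class and parameter names invented here, not DISCWorld's real interfaces) might look like:

```python
# Sketch: an operator that prefers a fast platform-specific implementation
# when available, falling back to a portable one.  Both must produce the
# same result so the choice is transparent to the rest of the task graph.
# Names are illustrative only.

class Operator:
    def __init__(self, portable_impl, native_impl=None):
        self.portable_impl = portable_impl
        self.native_impl = native_impl

    def run(self, data, native_available=False):
        use_native = native_available and self.native_impl is not None
        impl = self.native_impl if use_native else self.portable_impl
        return impl(data)

op = Operator(
    portable_impl=lambda xs: sorted(xs),   # slow but runs anywhere
    native_impl=lambda xs: sorted(xs),     # stand-in for a fast native wrapper
)
assert op.run([3, 1, 2]) == [1, 2, 3]
assert op.run([3, 1, 2], native_available=True) == [1, 2, 3]
```

The key design property is that both paths are interchangeable, so a scheduler can reconfigure a task subgraph onto whichever servers happen to offer the fast implementations.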
ISBN: (print) 9781728126166
A typical scenario in a stream data-flow processing engine is that users submit continuous queries in order to receive the computational result once a new stream of data arrives. The focus of the paper is to design a dynamic CPU cap controller for stream data-flow applications with real-time constraints, in which the result of computations must be available within a short time period, specified by the user, once a recent update in the input data occurs. It is common that the stream data-flow processing engine is deployed over a cluster of dedicated or virtualized server nodes, e.g., a Cloud or Edge platform, to achieve faster data processing. However, the attributes of the incoming stream data-flow might fluctuate in an irregular way. To effectively cope with such unpredictable conditions, the underlying resource manager needs to be equipped with a dynamic resource provisioning mechanism to ensure the real-time requirements of different applications. The proposed solution uses control theory principles to achieve good utilization of computing resources and a reduced average response time. The proposed algorithm dynamically adjusts the required quality of service (QoS) in an environment where multiple stream data-flow processing applications run concurrently with unknown and volatile workloads. Our study confirms that such an unpredictable demand can degrade system performance, mainly due to adverse interference in the utilization of shared resources. Unlike prior research studies, which assume a static or zero correlation in performance variability among consolidated applications, we presume the prevalence of shared-resource interference among collocated applications as a key performance-limiting parameter and confront it in scenarios where several applications have different QoS requirements with unpredictable workload demands. We design a low-overhead controller to achieve two natural optimization objectives of minimizing QoS violation amount an
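The abstract does not give the controller's equations, but the core feedback idea it describes can be sketched as a proportional control step on a per-application CPU cap: grow the cap when measured response time exceeds the user's target, shrink it when there is slack. The gain, bounds, and function name below are invented for illustration, not the paper's actual design.

```python
# Hedged sketch of a feedback controller for a per-application CPU cap.
# The cap is a normalized share in [cap_min, cap_max]; kp is an illustrative
# proportional gain, not a tuned value from the paper.

def adjust_cap(cap, measured_rt, target_rt, kp=0.5, cap_min=0.1, cap_max=1.0):
    """One proportional-control step driven by relative latency error."""
    error = (measured_rt - target_rt) / target_rt   # >0 means too slow
    new_cap = cap * (1.0 + kp * error)              # grow cap when too slow
    return max(cap_min, min(cap_max, new_cap))

# An app 40% over its latency target gets a larger cap.
assert adjust_cap(0.5, measured_rt=1.4, target_rt=1.0) > 0.5
# An app well under target gives capacity back for collocated apps.
assert adjust_cap(0.5, measured_rt=0.6, target_rt=1.0) < 0.5
```

In a real deployment this step would run periodically per application, with the freed capacity redistributed among collocated applications according to their QoS priorities.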
ISBN: (print) 9783030692438
The concept of stream data processing is becoming challenging in most business sectors, where businesses try to improve their operational efficiency by deriving valuable information from unstructured yet continuously generated high-volume raw data within expected time spans. A modern streamlined data processing platform is required to execute analytical pipelines over a continuous flow of data items that might arrive at a high rate. In most cases, the platform is also expected to adapt dynamically to the changing characteristics of the incoming traffic rates and the ever-changing condition of the underlying computational resources while fulfilling the tight latency constraints imposed by the end-users. Apache Storm has emerged as an important open source technology for performing stream processing with very tight latency constraints over a cluster of computing nodes. To increase the overall resource utilization, however, the service provider might be tempted to use a consolidation strategy to pack as many applications as possible into a (cloud-centric) cluster with a limited number of working nodes. However, collocated applications can compete with each other for resource capacity in a shared platform, which in turn may lead to severe performance degradation across all running applications. The main objective of this work is to develop an elastic solution, in a modern stream processing ecosystem, for addressing the shared-resource contention problem among collocated applications. We propose a mechanism, based on design principles of Model Predictive Control theory, for coping with the extreme conditions in which the collocated analytical applications have different quality of service (QoS) levels while shared-resource interference is considered as a key performance-limiting parameter. Experimental results confirm that the proposed controller can successfully enhance the p99 latency of high priority applications by 67%, compared to the default round r
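The abstract names Model Predictive Control as the design principle but does not spell out the optimization. The essence of MPC here can be sketched as: at each control step, pick the resource allocation that minimizes a predicted, priority-weighted QoS violation cost, given a model of how latency responds to CPU share. Everything below (the toy latency model, the cost, the grid search) is invented for illustration and is not the paper's controller.

```python
# Minimal sketch of the MPC idea for consolidated stream applications:
# brute-force the next allocation that minimizes weighted predicted QoS
# violations.  The latency model and all parameters are hypothetical.
from itertools import product

def predicted_latency(share, load):
    """Toy plant model: latency grows with load, shrinks with CPU share."""
    return load / max(share, 1e-9)

def mpc_step(loads, targets, weights, total=1.0, grid=10):
    """Choose per-app CPU shares (summing to <= total) minimizing the
    priority-weighted predicted latency violations, over a coarse grid."""
    best, best_cost = None, float("inf")
    steps = [i / grid for i in range(1, grid)]
    for shares in product(steps, repeat=len(loads)):
        if sum(shares) > total:
            continue                       # infeasible allocation
        cost = sum(w * max(0.0, predicted_latency(s, l) - t)
                   for s, l, t, w in zip(shares, loads, targets, weights))
        if cost < best_cost:
            best, best_cost = shares, cost
    return best

# Two apps with equal load; the first has a much higher QoS priority.
shares = mpc_step(loads=[1.0, 1.0], targets=[2.0, 2.0], weights=[10.0, 1.0])
assert shares[0] >= shares[1]   # the high-priority app gets at least as much CPU
```

A real controller would replace the brute-force search with a proper optimizer, predict over a multi-step horizon, and crucially include a learned interference term, since shared-resource contention is exactly what the paper identifies as the key performance-limiting factor.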
In the present study, THUBioGrid, an experimental distributed computing application for bioinformatics (BioGrid) is proposed. THUBioGrid incorporates directory services (data and software), grid computing methods (sec...