Proposes a method for data clustering in an n-dimensional space using the elastic net algorithm, which is a variant of the Kohonen topographic map learning algorithm. The elastic net algorithm is a mechanical metaphor in which an elastic ring is attracted by points in a two-dimensional space while its internal elastic forces resist the expansion. The different weights associated with these two kinds of forces lead the ring to expand gradually toward the two-dimensional points. In this method, the elastic net algorithm is employed with the help of a heuristic framework that improves its performance when applied to the n-dimensional space of cluster analysis. Tests were made with two types of data sets: (1) simulated data sets with up to 1000 points randomly generated in linearly separable groups in up to 10 dimensions and (2) the Fisher Iris Plant database, a well-known database referred to in the pattern recognition literature. The advantages of the method presented are its simplicity and its fast and stable convergence, in addition to its efficiency in cluster analysis.
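The attraction and tension forces described above can be illustrated with a minimal sketch of a generic elastic ring update in the style of Durbin and Willshaw. It is not the heuristic framework of the paper, which the abstract does not detail. The function names, the parameters `alpha`, `beta` and `K`, and the annealing schedule are all assumptions chosen for illustration.

```python
import numpy as np

def elastic_net_step(points, ring, K, alpha, beta):
    """One update of a closed elastic ring (illustrative, not the paper's method)."""
    # Offsets from every ring node (axis 1) to every data point (axis 0).
    diff = points[:, None, :] - ring[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    # Normalised Gaussian weights: each point spreads one unit of pull over the nodes.
    logits = -d2 / (2.0 * K ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # avoid underflow in exp
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    # Attraction: weighted pull of the data points on each node.
    attraction = np.einsum("ij,ijd->jd", w, diff)
    # Tension: each node is pulled toward its two neighbours on the ring.
    tension = np.roll(ring, 1, axis=0) - 2.0 * ring + np.roll(ring, -1, axis=0)
    return ring + alpha * attraction + beta * K * tension

def fit_ring(points, n_nodes=12, K0=0.5, K_min=0.05, decay=0.9, radius=0.1):
    """Anneal K from K0 down to K_min, starting from a small ring at the centroid."""
    points = np.asarray(points, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_nodes, endpoint=False)
    ring = np.tile(points.mean(axis=0), (n_nodes, 1))
    ring[:, 0] += radius * np.cos(angles)  # the ring starts in the first two axes
    ring[:, 1] += radius * np.sin(angles)
    # Assumed step sizes: small enough that each update is a convex combination
    # of the node, its neighbours and the data points, so the ring cannot diverge.
    alpha = 0.5 / len(points)
    beta = 0.25 / K0
    K = K0
    while K > K_min:
        for _ in range(5):
            ring = elastic_net_step(points, ring, K, alpha, beta)
        K *= decay
    return ring
```

Lowering `K` sharpens the Gaussian weights, so each point increasingly pulls only its nearest nodes while the tension term weakens. This is one way the balance between the two forces can produce a gradual expansion toward the data.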
In this work we investigate how Distributed Shared Memory (DSM) architectures affect performance of or-parallel logic programming systems and how this performance approaches that of conventional C systems. Our work co...
In this paper we propose a simple extension to the optical network of a scalable multiprocessor that optimizes page swap-outs significantly. More specifically, we propose to extend the network with an optical ring tha...
The recent improvements in workstation and interconnection network performance have popularized clusters of off-the-shelf workstations. However, the usefulness of these clusters is yet to be fully exploited, mostly due to the inadequate management of cluster resources implemented by current distributed operating systems. In order to eliminate this problem and approach the computational power of large clusters of workstations, in this paper we propose Nomad, an efficient operating system for clusters of uniprocessors and/or multiprocessors. Nomad includes several important characteristics for modern cluster-oriented operating systems: scalability, efficient resource management across the cluster, efficient scheduling of parallel and distributed applications, distributed I/O, fault detection and recovery, protection, and backward compatibility. Some of the mechanisms used by Nomad, such as process checkpointing and migration, can be found in previously proposed systems. However, our system stands out for its strategy for disseminating information across the cluster and its efficient management of all cluster resources. In addition, Nomad is highly scalable, as it uses neither centralized control nor extra messages to implement its functionality, instead taking advantage of the I/O traffic associated with its distributed file system. Our preliminary evaluation of the load balancing aspect of Nomad shows that the pattern of file accesses in our distributed file system allows for efficient and scalable load balancing. Our main conclusion is that the complete implementation of Nomad will most likely be efficient and will be a useful platform for future research on operating systems for clusters of workstations.
Carnival is a performance measurement and analysis tool that assists users in understanding the performance of DSM applications and protocols. Using traces of program executions, Carnival presents performance data as ...
Categorizes the coherence traffic in update-based protocols and shows that, for most applications, more than 90% of all updates generated by such a protocol are unnecessary. We identify application characteristics that generate useless update traffic, and compare the isolated and combined effects of several software and hardware techniques for eliminating useless updates. These techniques include dynamic and static hybrid protocols, a data re-mapping strategy, and coalescing write buffers. Our simulations show that these techniques are effective for different types of useless updates. Overall, software caching (where dynamic data re-mapping is performed under programmer or compiler control) has the potential to significantly increase the percentage of useful traffic in applications. When software caching is not applicable, either the static or the dynamic protocol generates the least useless traffic. Although coalescing write buffers provide great reductions in the total number of messages transferred, these buffers do not necessarily increase the percentage of useful traffic.
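The message reduction attributed to coalescing write buffers can be illustrated with a small software model. This is a sketch of the general idea only, not the hardware simulated in the paper. The class name, block size, capacity and FIFO eviction policy are assumptions made for the example.

```python
from collections import OrderedDict

class CoalescingWriteBuffer:
    """Toy model: writes to the same block merge into one pending update."""

    def __init__(self, block_size=4, capacity=2):
        self.block_size = block_size
        self.capacity = capacity
        self.entries = OrderedDict()  # block number -> {offset: value}
        self.messages = []            # update messages actually sent

    def write(self, address, value):
        block, offset = divmod(address, self.block_size)
        # A write to a new block may force the oldest pending block out.
        if block not in self.entries and len(self.entries) == self.capacity:
            self._evict_oldest()
        # Writes to a block that is already buffered are merged in place.
        self.entries.setdefault(block, {})[offset] = value

    def _evict_oldest(self):
        block, words = self.entries.popitem(last=False)
        self.messages.append((block, dict(words)))

    def flush(self):
        while self.entries:
            self._evict_oldest()
```

In this model, five writes that touch only two blocks leave the buffer as two update messages. Whether the receivers needed those updates is a separate question, which is consistent with the observation that fewer messages does not necessarily mean a higher share of useful traffic.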
ISBN (print): 9780897917674
In this paper we propose the use of a PCI-based programmable protocol controller for hiding communication and coherence overheads in software DSMs. Our protocol controller provides three different types of overhead tolerance: a) moving basic communication and coherence tasks away from computation processors; b) prefetching of diffs; and c) generating and applying diffs with hardware assistance. We evaluate the isolated and combined impact of these features on the performance of TreadMarks. We also compare performance against two versions of the Shrimp-based AURC protocol. Using detailed execution-driven simulations of a 16-node network of workstations, we show that the greatest performance benefits provided by our protocol controller come from our hardware-supported diffs. Reducing the burden of communication and coherence transactions on the computation processor is also beneficial, but to a smaller extent. Prefetching is not always profitable. Our results show that our protocol controller can reduce running time by up to 50% for TreadMarks, which means that it can double the TreadMarks speedups. The overlapping implementation of TreadMarks performs as well as or better than AURC for 5 of our 6 applications. We conclude that the simple hardware support we propose allows for the implementation of high-performance software DSMs at low cost. Based on this conclusion, we are building the NCP2 parallel system at COPPE/UFRJ.
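For readers unfamiliar with diffs in software DSMs, the following is a minimal sketch of the twin-and-diff idea: a page is copied (the twin) before it is first written, and the diff later records only the runs that changed. The byte-level granularity, the run encoding and the function names are assumptions for illustration. This is neither the TreadMarks implementation nor the hardware-assisted mechanism proposed in the paper.

```python
from typing import List, Tuple

Diff = List[Tuple[int, bytes]]  # (offset, changed bytes) runs

def make_diff(twin: bytes, page: bytes) -> Diff:
    """Compare a modified page with its pristine twin and encode the changes."""
    if len(twin) != len(page):
        raise ValueError("twin and page must have the same size")
    runs: Diff = []
    start = None  # offset where the current run of changed bytes began
    for i, (old, new) in enumerate(zip(twin, page)):
        if old != new:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, page[start:i]))
            start = None
    if start is not None:  # a run that reaches the end of the page
        runs.append((start, page[start:]))
    return runs

def apply_diff(page: bytearray, diff: Diff) -> None:
    """Patch another copy of the page in place with the recorded runs."""
    for offset, data in diff:
        page[offset:offset + len(data)] = data
```

A short usage example: after a writer changes bytes 2-3 and byte 10 of a 16-byte page, `make_diff` yields two runs, and `apply_diff` brings a stale copy up to date without transferring the whole page.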
ISBN (digital): 9783031130786
ISBN (print): 9783031130779; 9783031130809
This monograph presents a comprehensive treatment of the maximum-entropy sampling problem (MESP), which is a fascinating topic at the intersection of mathematical optimization and data science. The text situates MESP in information theory, as the algorithmic problem of calculating a sub-vector of pre-specified size from a multivariate Gaussian random vector, so as to maximize Shannon's differential entropy. The text collects and expands on state-of-the-art algorithms for MESP, and addresses its application in the field of environmental monitoring. While MESP is a central optimization problem in the theory of statistical designs (particularly in the area of spatial monitoring), this book largely focuses on the unique challenges of its algorithmic side. From the perspective of mathematical-optimization methodology, MESP is rather unique (a 0/1 nonlinear program having a nonseparable objective function), and the algorithmic techniques employed are highly non-standard. In particular, successful techniques come from several disparate areas within the field of mathematical optimization; for example: convex optimization and duality, semidefinite programming, Lagrangian relaxation, dynamic programming, approximation algorithms, 0/1 optimization (e.g., branch-and-bound), extended formulation, and many aspects of matrix theory. The book is mainly aimed at graduate students and researchers in mathematical optimization and data analytics.
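To make the problem statement concrete: for a Gaussian vector with covariance matrix C, the differential entropy of the sub-vector indexed by S is an increasing function of log det C[S,S], so MESP amounts to choosing the size-s index set with the largest log-determinant. The sketch below solves tiny instances by exhaustive enumeration. It is meant only to illustrate the objective and is not one of the algorithms treated in the book. The function name and interface are assumptions.

```python
from itertools import combinations

import numpy as np

def mesp_brute_force(C, s):
    """Return the size-s index set maximising log det C[S, S], and that value."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    best_subset, best_value = None, -np.inf
    for subset in combinations(range(n), s):
        idx = list(subset)
        # slogdet is numerically safer than log(det(...)) for larger blocks.
        sign, logdet = np.linalg.slogdet(C[np.ix_(idx, idx)])
        if sign > 0 and logdet > best_value:
            best_subset, best_value = subset, logdet
    return best_subset, best_value
```

Enumeration visits every one of the "n choose s" subsets, so it is practical only for very small n. That combinatorial growth is why the problem is treated with bounding and branch-and-bound techniques instead.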
ISBN (digital): 9783540881902
ISBN (print): 9783540881896
This book constitutes the refereed proceedings of the 19th Brazilian Symposium on Artificial Intelligence, SBIA 2008, held in Salvador, Brazil, in October 2008.
The 27 revised full papers, presented together with 3 invited lectures and 3 tutorials, were carefully reviewed and selected from 142 submissions. The papers are organized in topical sections on computer vision and pattern recognition; distributed AI: autonomous agents, multi-agent systems and games; knowledge representation and reasoning; machine learning and data mining; natural language processing; and robotics.