Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one-versus-all queries that, given a point s ∈ S, return the distances between s and all other points. We show that, given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. Our algorithm uses an active selection strategy to choose a small set of points that we call landmarks, and considers only the distances between landmarks and other points to produce a clustering. We use our procedure to cluster proteins by sequence similarity. This setting fits our model well because we can use a fast sequence database search program to query a sequence against an entire data set. We conduct an empirical study showing that, even though we query only a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.
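The landmark idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the selection rule, so this example substitutes a simple farthest-first rule and assigns each point to its nearest landmark. The function names `one_versus_all` and `landmark_clustering` are hypothetical, and the Euclidean distance merely stands in for the unknown metric d.

```python
import math
import random


def one_versus_all(points, i):
    """Simulated one-versus-all query: distances from point i to every point.

    Assumption: Euclidean distance stands in for the unknown metric d.
    """
    return [math.dist(points[i], p) for p in points]


def landmark_clustering(points, k, seed=0):
    """Pick k landmarks by farthest-first selection, then assign each point
    to its nearest landmark. Uses exactly k one-versus-all queries.

    This is an illustrative stand-in, not the selection strategy of the paper.
    """
    rng = random.Random(seed)
    n = len(points)

    # First landmark is chosen at random; one query gives its distance row.
    first = rng.randrange(n)
    landmarks = [first]
    rows = [one_versus_all(points, first)]

    # min_dist[j] = distance from point j to its closest landmark so far.
    min_dist = list(rows[0])

    while len(landmarks) < k:
        # Next landmark: the point farthest from all current landmarks.
        nxt = max(range(n), key=lambda j: min_dist[j])
        landmarks.append(nxt)
        row = one_versus_all(points, nxt)
        rows.append(row)
        min_dist = [min(a, b) for a, b in zip(min_dist, row)]

    # Only landmark-to-point distances are used for the final assignment.
    labels = [min(range(k), key=lambda c: rows[c][j]) for j in range(n)]
    return landmarks, labels, len(rows)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0),
           (10, 10), (10, 11), (11, 10),
           (20, 0), (20, 1), (21, 0)]
    landmarks, labels, queries = landmark_clustering(pts, k=3)
    print("landmarks:", landmarks)
    print("labels:", labels)
    print("one-versus-all queries used:", queries)
```

The sketch only ever reads the k distance rows returned by the queries, which mirrors the abstract's point that distances between non-landmark points are never requested.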
In this paper we consider the problem of clustering snippets returned from search engines. We propose a technique to incorporate semantic similarity into the clustering process. Our technique improves on the well-known STC method, which is a highly efficient heuristic for clustering web search results. However, a weakness of STC is that it cannot cluster semantically similar documents. To solve this problem, we propose a new data structure to represent the suffixes of a single string, called a Semantic Suffix Net (SSN). A generalized semantic suffix net is created to represent the suffixes of a set of strings by using a new operator to partially combine nets. A key feature of this new operator is that it finds a joint point by using semantic similarity and string matching; the combination of a pair of nets then begins at that joint point. This reduces the number of nodes and branches of a generalized semantic suffix net. The operator then uses the line of suffix links as a boundary to separate the net. A generalized semantic suffix net is then incorporated into the STC algorithm so that it can cluster semantically similar snippets. Experimental results show that the proposed algorithm improves upon conventional STC.
In this paper, we analyze in detail the state characteristics that a rotor of a rotating mechanism exhibits under different fault types and different fault strengths. The research result shows that a fault roto...
The rate of human death and morbidity due to malaria is increasing in many parts of the developing world. Thus, there is a great need to understand the critical pathways in the malaria parasite in order to develop eff...
We report in-situ Fourier transform infrared (FTIR) reflection-absorption studies of the curing chemistry of polyimide thin films on Cr and Cu surfaces, and of the thermal stability of the resulting thin film interfaces when exposed to air at elevated temperatures. The polyimide investigated is based on 4,4'-(hexafluoroisopropylidene)bis(phthalic anhydride)-4,4'-bis(4-aminophenoxy)-biphenyl. The imidization process takes place at temperatures higher than 90°C and a small amount of anhydride is generated during curing. This by-product is converted to imide at temperatures above 250°C. Complete imidization is achieved after curing at 300°C on Cr substrates, while evidence for incomplete curing on Cu is observed under the same conditions. Thermal stability studies with Cr and Cu substrates show that thermal decomposition of thin (∼1000 Å) polyimide films occurs on Cu when the film is exposed to air at 200°C, while the polyimide is stable on Cr.
The 19th robotics program at the annual AAAI conference was held in Atlanta, Georgia, in July 2010. In this article we give a summary of three components of the exhibition: the Small-Scale Manipulation Challenge: Robo...
The success of an e-Learning system depends on several factors: a supportive infrastructure, high-quality content, an effective format, and high availability to satisfy ongoing user needs. In this paper, we perform exploratory data analysis and data mining on an e-Learning web log that spans one academic year. The study uncovers how e-Learning users access the content. It also reveals the popularity and usage patterns of e-Learning media, and helps the institution fine-tune future courseware, from strategic changes down to fine-grained improvements to lesson content.
In the current era of technology, the Internet and web technologies have become the central source of information. Due to the huge amount of content, one of the main challenges of modern information technology is how to reduce and manage information in a structured way while guiding users to similar kinds of relevant ***, any intelligent system should be able to understand a person's interest in a particular type of information and automatically guide them to similar kinds of available information. The idea of high-level Activity Streams, along with its standardized format, can play a vital role in solving this problem in the broader *** paper introduces a novel system called CoASGen (Consolidation and Activity Streams Generator), which is able to automatically generate high-level Activity Streams after aggregating and consolidating activities from different independent systems (*** a software company context: version management systems, wikis, bug trackers, etc.). It retrieves life time information as heterogeneous web feeds by sensing user activities from those independent systems, and then it transforms several similar types of atomic activities into high-level Activity Streams using semantic technologies along with its specific standardized ***, it shows these high-level Activity Streams in the user interface, which is able to automatically motivate users to find relevant information easily without either missing any data or losing valuable *** system solves the "data silos" problem by reducing and managing information in a structured way.
Organic Thin Film Transistors (OTFTs) are promising devices for the future development of a variety of low-cost and large-area electronics applications such as flexible displays. This paper analyzes the performance of OTFT ...
Query-based diagnostics (Agosta, Gardos, & Druzdzel, 2008) offers passive, incremental construction of diagnostic models that rest on the interaction between a diagnostician and a computer-based diagnostic system. Effectively, this approach minimizes knowledge engineering, the main bottleneck in the practical application of Bayesian networks. While this idea is appealing, it has undergone only limited testing in practice. We describe a series of experiments that subject a prototype implementing passive, incremental model construction to a rigorous practical test. We show that the prototype's diagnostic accuracy reaches reasonable levels after merely tens of cases and continues to increase with the number of cases, comparing favorably to state-of-the-art approaches based on learning.