ISBN (print): 9781424430536; 9780769531519
The next generation of the Web, the Semantic Web, aims to enrich Web pages with semantic annotations that enable knowledge-level querying and search. However, manual construction of the required ontologies is a time-consuming and difficult task. In this paper, we describe an automatic extraction method that learns domain ontologies for the Semantic Web from the Deep Web. Our approach first learns a base ontology from Deep Web query interfaces, then grows the current ontology by probing the sources and discovering additional concepts and instances in the result pages. We have evaluated our approach in several real-world domains. Preliminary results indicate that the proposed extraction method discovers concepts and instances with high accuracy.
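The two-stage idea in this abstract can be illustrated with a small sketch. The Python below is not the authors' system: it harvests a toy base ontology from the select fields of a hypothetical query-interface form and then grows it with candidate instances spotted on a probed result page; the element names, toy HTML, and growth heuristic are assumptions made for illustration only.

```python
# Minimal sketch (not the authors' system) of the two-stage idea:
# (1) harvest a base ontology from a Deep Web query interface,
# (2) grow it with instances found on probed result pages.
from html.parser import HTMLParser
from collections import defaultdict

class QueryInterfaceParser(HTMLParser):
    """Collects <select name=...> fields as concepts and their <option> values as instances."""
    def __init__(self):
        super().__init__()
        self.ontology = defaultdict(set)   # concept -> set of instances
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and "name" in attrs:
            self._current = attrs["name"].lower()          # e.g. "make"
        elif tag == "option" and self._current and attrs.get("value"):
            self.ontology[self._current].add(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None

def grow_with_result_page(ontology, concept, page_text):
    """Very naive probing step: if a known instance appears on the result page,
    co-occurring capitalised tokens become new candidate instances."""
    if any(inst.lower() in page_text.lower() for inst in ontology[concept]):
        for token in page_text.split():
            token = token.strip(",.:;")
            if token.istitle() and len(token) > 2:
                ontology[concept].add(token)

# toy usage with an invented automobile query form
form_html = ('<select name="make"><option value="Ford">Ford</option>'
             '<option value="Honda">Honda</option></select>')
parser = QueryInterfaceParser()
parser.feed(form_html)
grow_with_result_page(parser.ontology, "make", "Results: Ford Focus, Toyota Corolla ...")
print(dict(parser.ontology))
```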
This paper proposes a unified dynamic relation tree (DRT) span for tree kernel-based semantic relation extraction between entity names. The basic idea is to apply a variety of linguistics-driven rules to dynamically prune noisy information from a syntactic parse tree while retaining the necessary contextual information. In addition, different kinds of entity-related semantic information are unified into the syntactic parse tree. Evaluation on the ACE RDC 2004 corpus shows that the unified DRT span outperforms other widely used tree spans, and our system achieves performance comparable to state-of-the-art kernel-based systems. This indicates that our method not only models structured syntactic information well but also effectively captures entity-related semantic information.
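As a rough illustration of what a pruned tree span looks like, the sketch below keeps only the parse-tree nodes whose leaves fall between the two entity mentions, roughly the classic shortest-path-enclosed tree that the DRT span refines with linguistics-driven rules and entity features. The nested-list parse and index ranges are invented for the example; this is not the paper's actual pruning procedure.

```python
# Keep only nodes whose leaf span overlaps the interval between the two
# entity mentions (leaf indices lo..hi). Trees are nested lists: the first
# element is the label, the rest are children; leaves are plain strings.
def prune_between(tree, lo, hi, offset=0):
    if isinstance(tree, str):                          # a leaf token
        return (tree, 1) if lo <= offset <= hi else (None, 1)
    label, kept, width = tree[0], [], 0
    for child in tree[1:]:
        sub, w = prune_between(child, lo, hi, offset + width)
        width += w
        if sub is not None:
            kept.append(sub)
    return ([label] + kept if kept else None), width

parse = ["S", ["NP", "John"],
              ["VP", "works", ["PP", "for", ["NP", "ACME"]]],
              [".", "."]]
pruned, _ = prune_between(parse, 0, 3)   # leaves 0..3 span "John ... ACME"
print(pruned)  # the trailing punctuation subtree is pruned away
```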
With the rapid development of the World Wide Web, the amount of Web information is growing at an unprecedented pace. A large portion of it is generated dynamically from back-end databases and cannot be indexed by traditional search engines; this portion is called the Deep Web. Because Deep Web sources are heterogeneous and dynamic, classifying them effectively by domain is an important precondition for Deep Web source integration. In this paper, we combine the visible features of Deep Web sources with the Maximum Entropy approach and, building on binary classification, propose a new Maximum Entropy-based multi-class classification approach for Deep Web sources. In addition, we propose a Feedback algorithm to improve classification accuracy. An experimental evaluation over real Web data shows that our approach provides an effective and general solution to the multi-class classification of Deep Web sources.
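Since a maximum entropy classifier over textual features is equivalent to multinomial logistic regression, the core classification step (though not the paper's Feedback algorithm) can be sketched as follows. The attribute labels, domain names, and scikit-learn stand-ins are illustrative assumptions, not the authors' feature set.

```python
# Treat the visible attribute labels of a query interface as a bag of words
# and train a maximum-entropy (multinomial logistic regression) classifier
# to assign each Deep Web source to a domain. Toy data, for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sources = [
    "title author isbn publisher",           # books
    "make model year price mileage",         # autos
    "departure arrival airline passengers",  # airfares
    "author title format price",             # books
]
domains = ["books", "autos", "airfares", "books"]

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(sources, domains)
print(clf.predict(["isbn title publisher price"]))  # expected: ['books']
```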
Chaos and artificial neural networks have been providing a new route for investigating complicated nonlinear time series. As traditional neural networks are prone to slow convergence and produce large redund...
With the start of the Golden Agriculture Project, the informatization of agriculture is accelerating, and data exchange and sharing are indispensable to it. Implementing data combination, data transformation, and data-receiving applications is an important means of sharing information safely and improving efficiency. The paper starts by surveying methods for implementing data interchange and introduces several of them along with their key techniques. On this basis, it also presents the detailed requirement analysis, system design, and implementation of the system. According to the requirements and characteristics of the project, a data interchange system was designed and built, and a data interchange model based on message-oriented middleware (MOM) is presented, which places a middleware layer between the province and the ministry taking part in the data interchange. The system has the following characteristics: (1) it keeps data safe and reliable during transfer; (2) it is highly portable and widely applicable; (3) it requires no manual intervention during the interchange; (4) it supports data interchange between databases with different structures; (5) it is simple to develop and deploy. The MOM TongLink/Q offers interfaces for application development and transfers the data over the Internet. The integration adapters, developed on the application-integration framework TongIntegrator, handle data management. This method offers a new approach to the data interchange problem, and the system has been successfully applied in the data interchange project of the Ministry of Agriculture.
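A minimal sketch of the message-oriented pattern described above is given below. TongLink/Q's real API is not reproduced here: a standard in-process queue stands in for the middleware channel, plain functions stand in for the adapters, and the record contents are invented.

```python
# Producer/consumer adapters decoupled by a message channel (MOM pattern).
# queue.Queue is only a stand-in for the real middleware.
import json, queue, threading

mom = queue.Queue()          # stand-in for the TongLink/Q message channel

def province_adapter(records):
    """Packs database rows into messages and hands them to the middleware."""
    for rec in records:
        mom.put(json.dumps(rec))
    mom.put(None)            # end-of-transfer marker

def ministry_adapter(store):
    """Receives messages, unpacks them, and writes them to the target store."""
    while True:
        msg = mom.get()
        if msg is None:
            break
        store.append(json.loads(msg))   # schema mapping would happen here

received = []
consumer = threading.Thread(target=ministry_adapter, args=(received,))
consumer.start()
province_adapter([{"crop": "wheat", "yield_t": 120}, {"crop": "rice", "yield_t": 95}])
consumer.join()
print(received)
```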
The essential characteristic of DNA computation is its massive parallelism in obtaining and managing information. With the development of molecular biology techniques, the field of DNA computation has made great progress. Using an advanced biochip technique, the lab-on-a-chip, a new DNA computing model is presented in this paper to solve a simple timetabling problem, a special version of the optimization problem that also plays an important role in education and other industries. The results of a simulated biological experiment suggest that DNA computation with a lab-on-a-chip has the potential to solve real, complex timetabling problems.
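DNA computation typically follows a generate-and-filter scheme: all candidate solutions exist in parallel as strands, and invalid ones are filtered out chemically. The toy sketch below only mimics that scheme in software for an invented three-class, two-slot timetabling instance; it illustrates the paradigm, not the lab-on-a-chip protocol.

```python
# Generate every candidate assignment ("all strands at once"), then discard
# the ones that violate a constraint -- the software analogue of the
# generate-and-filter steps performed chemically in DNA computation.
from itertools import product

classes = ["math", "physics", "english"]
slots = ["mon_9", "mon_10"]
conflicts = {("math", "physics")}        # these two share a teacher

def valid(assignment):
    return all(assignment[a] != assignment[b] for a, b in conflicts)

candidates = (dict(zip(classes, combo))
              for combo in product(slots, repeat=len(classes)))
solutions = [c for c in candidates if valid(c)]
print(solutions[0])  # e.g. {'math': 'mon_9', 'physics': 'mon_10', 'english': 'mon_9'}
```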
This paper proposes an adaptive shadow detection algorithm based on a Gaussian Mixture Model to improve the performance of video object segmentation. The method uses a luminance weight to model the image background and obtains a primary segmentation in the CIE Luv color space, which improves the real-time performance of detection. It is also more efficient than existing shadow detection algorithms, which often require thresholds to be set manually or learned through a training process; by using the Gaussian distribution, it achieves adaptive shadow detection. At the same time, the authors handle noise and the uneven distribution of target points with horizontal and vertical filling, which improves segmentation accuracy. The experimental results show that the method achieves adaptive shadow detection with strong robustness and high segmentation accuracy.
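One simple way to realize the shadow test implied above is sketched below: keep a per-pixel Gaussian of the background luminance in CIE Luv and call a pixel a shadow when its luminance drops well below the background mean while its chromaticity stays close. The thresholds and function signature are illustrative assumptions, not the authors' exact model.

```python
# Per-pixel shadow test in CIE Luv: a shadow pixel is markedly darker than
# the background (relative to the background's own std) but keeps a similar
# chromaticity (u, v). Thresholds k and chroma_tol are illustrative values.
import cv2
import numpy as np

def detect_shadows(frame_bgr, bg_mean_luv, bg_std_l, k=2.5, chroma_tol=10.0):
    luv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2Luv).astype(np.float32)
    d_l = bg_mean_luv[..., 0] - luv[..., 0]                 # luminance drop
    d_uv = np.linalg.norm(luv[..., 1:] - bg_mean_luv[..., 1:], axis=-1)
    darker = d_l > k * np.maximum(bg_std_l, 1e-3)           # adaptive, per pixel
    same_chroma = d_uv < chroma_tol
    return darker & same_chroma                             # boolean shadow mask

# typical use: bg_mean_luv (H, W, 3) and bg_std_l (H, W) come from a running
# Gaussian or mixture-of-Gaussians background model updated frame by frame.
```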
System modeling is a complex nonlinear procedure, and traditional methods are prone to slow convergence and low efficiency. Nonlinear forecast modeling based on wavelet analysis is proposed and used to forecast the...
ISBN (print): 9781424439027
The K-Nearest Neighbor (KNN) algorithm for text categorization is applied to CET4 essays. In this paper, each essay is represented by the vector space model (VSM). After removing stop words, we chose words, phrases, and arguments as essay features, and each vector component is weighted by term frequency-inverse document frequency (TF-IDF). The TF and information gain (IG) methods are used to select features with predetermined thresholds. We compute the similarity of essays with the cosine measure in the KNN algorithm. Experiments on CET4 essays in the Chinese Learner English Corpus (CLEC) show that an accuracy above 76% is achieved.
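The pipeline in this abstract (stop-word removal, TF-IDF weighting, cosine-similarity KNN) can be sketched with scikit-learn stand-ins; the toy essays and score labels below are invented for illustration.

```python
# TF-IDF vectorization with stop-word removal, then KNN with cosine distance.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

essays = [
    "nowadays many students think that ...",
    "with the development of society ...",
    "in my opinion the advantages outweigh ...",
]
scores = ["band_8", "band_11", "band_8"]     # hypothetical category labels

knn = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    KNeighborsClassifier(n_neighbors=1, metric="cosine"),
)
knn.fit(essays, scores)
print(knn.predict(["many students nowadays think that ..."]))
```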
The decision tree is a good model for classification. Recently, there has been much interest in mining streaming data. Because streaming data is large and unbounded, passing over the entire data more than once is impractical, so a one-pass online algorithm is necessary. One of the most successful algorithms for mining data streams is VFDT (Very Fast Decision Tree). We extend the VFDT system to EVFDT (Efficient VFDT) in two directions: (1) we present an Uneven Interval Numerical Pruning (UINP) approach for efficiently processing numerical attributes; (2) we use naive Bayes classifiers associated with each node to detect outlying samples and reduce the size of the trees. Experimental comparison shows that the two techniques significantly improve the efficiency and accuracy of decision tree construction on streaming data.
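The split decision at the heart of VFDT-style learners rests on the Hoeffding bound; a small sketch of that test is given below (EVFDT's UINP pruning and naive-Bayes leaves are not reproduced), with the delta and tie-breaking tau values chosen only for illustration.

```python
# A leaf splits only when the observed gain difference between the best and
# second-best attribute exceeds the Hoeffding bound for the examples seen.
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)) for n examples at a leaf."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n, value_range=1.0, delta=1e-7, tau=0.05):
    eps = hoeffding_bound(value_range, delta, n)
    # split if the best attribute is clearly better, or if the two are so
    # close that the tie-breaking threshold tau applies
    return (best_gain - second_gain > eps) or (eps < tau)

print(should_split(best_gain=0.32, second_gain=0.25, n=2000))  # True for these numbers
```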