Online social media has become one of the most important ways people communicate, and finding valuable information in the huge amounts of data it produces has become a key problem. We present a novel topic extraction method that builds on multi-document summarization and uses the topic value of each word and social-model attributes as additional features. Experimental results show that adding topic and social features to multi-document summarization helps extract topics from social media.
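Here is a minimal sketch of an extractive score-and-rank step that makes the "additional features" idea concrete. It assumes each word has a precomputed topic value and each post carries repost and comment counts as its social attributes. The `Post` fields, the weights `w_topic` and `w_social`, and the linear combination are hypothetical choices for illustration, not the method described above. The redundancy control that real multi-document summarizers also need is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    reposts: int   # hypothetical social attribute
    comments: int  # hypothetical social attribute


def score_post(post, topic_value, w_topic=1.0, w_social=0.5):
    """Combine the average per-word topic value with a log-scaled social score.

    The feature set and weights are illustrative assumptions.
    """
    words = post.text.lower().split()
    if not words:
        return 0.0
    # Average topic value of the words; words without a value contribute 0.
    topic_score = sum(topic_value.get(w, 0.0) for w in words) / len(words)
    # Log scaling keeps a single viral post from dominating the ranking.
    social_score = math.log1p(post.reposts + post.comments)
    return w_topic * topic_score + w_social * social_score


def extract_summary(posts, topic_value, k=2):
    """Return the texts of the k highest-scoring posts as an extractive summary."""
    ranked = sorted(posts, key=lambda p: score_post(p, topic_value), reverse=True)
    return [p.text for p in ranked[:k]]


posts = [
    Post("earthquake hits the city", reposts=120, comments=30),
    Post("nice weather today", reposts=2, comments=1),
    Post("rescue teams reach earthquake zone", reposts=80, comments=15),
]
topic_value = {"earthquake": 0.9, "rescue": 0.7, "zone": 0.4, "city": 0.3}
print(extract_summary(posts, topic_value, k=2))
# ['earthquake hits the city', 'rescue teams reach earthquake zone']
```

The off-topic, rarely shared post scores lowest on both features, so it drops out of the summary.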
As one of the challenging issues in natural language processing (NLP), metaphor has attracted substantial attention from researchers in recent years. Many models and methods have been proposed for properly understanding metaphors, but automatic metaphor identification has received less attention. This paper presents a preliminary rule-based study of metaphor identification and reports results on a small-scale corpus.
We present a forced decoding approach for the tuning process in statistical machine translation. Unlike traditional discriminative approaches, the forced decoding system can take advantage of the reference of deve...
In the past few years, much attention has been paid to extending phrase-based statistical machine translation with syntactic structures. In this paper, we introduce a novel syntax encapsulated phrase (SEP) model, in which treebank tag sequences are used to decorate bilingual phrase pairs. We use tag sequences, instead of phrase pairs, to train the lexicalized reordering model. Since the number of treebank tags is much smaller than the number of words, the tag-sequence-based reordering model is smaller and more accurate than the phrase-based reordering model. Experiments were carried out on four types of models: the phrase model, the hierarchical phrase model, the POS tag encapsulated phrase (PTEP) model, and the syntactic tag encapsulated phrase (STEP) model. The STEP model obtained a higher BLEU-4 score than the other models on the NIST 2005 MT task.
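The following sketch shows the idea of keying a lexicalized reordering model on tag sequences rather than on phrase pairs. It is a minimal illustration, not the SEP implementation, and it makes several assumptions:

- Phrase segmentations in target order are already available.
- It uses the common monotone/swap/discontinuous (MSD) orientation classes, computed from phrase spans rather than word alignments.
- Each estimate is keyed on the (source tags, target tags) pair.

`PhrasePair`, `segment`, and the smoothing constant `alpha` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Optional, Tuple


class PhrasePair(NamedTuple):
    src_phrase: str              # surface words (deliberately not used as the key)
    src_start: int               # first source position covered
    src_end: int                 # last source position covered (inclusive)
    src_tags: Tuple[str, ...]    # treebank tags of the source words
    tgt_tags: Tuple[str, ...]    # treebank tags of the target words


def orientation(prev: Optional[PhrasePair], cur: PhrasePair) -> str:
    """Monotone (M), swap (S) or discontinuous (D) w.r.t. the previous target phrase."""
    # At sentence start, compare against a virtual phrase covering position -1.
    prev_start, prev_end = (-1, -1) if prev is None else (prev.src_start, prev.src_end)
    if prev_end + 1 == cur.src_start:
        return "M"
    if cur.src_end + 1 == prev_start:
        return "S"
    return "D"


def train_tag_reordering(corpus, alpha=0.5):
    """Estimate p(orientation | src_tags, tgt_tags) by add-alpha relative frequency.

    `corpus` holds one phrase segmentation per sentence, listed in target order.
    """
    counts = defaultdict(Counter)
    for pairs in corpus:
        prev = None
        for cur in pairs:
            # Key on tag sequences (an assumption) instead of the phrase pair itself.
            counts[(cur.src_tags, cur.tgt_tags)][orientation(prev, cur)] += 1
            prev = cur
    model = {}
    for key, c in counts.items():
        total = sum(c.values()) + 3 * alpha
        model[key] = {o: (c[o] + alpha) / total for o in "MSD"}
    return model


# Hypothetical helper: two sentences with different words but the same tag pattern,
# e.g. "the red car" -> "la voiture rouge" and "a blue house" -> "une maison bleue".
def segment(det, adj, noun):
    return [
        PhrasePair(det, 0, 0, ("DT",), ("DT",)),
        PhrasePair(noun, 2, 2, ("NN",), ("NN",)),
        PhrasePair(adj, 1, 1, ("JJ",), ("JJ",)),
    ]


corpus = [segment("the", "red", "car"), segment("a", "blue", "house")]
model = train_tag_reordering(corpus)
print(len(model))                      # 3 tag-sequence keys
print(model[(("JJ",), ("JJ",))]["S"])  # (2 + 0.5) / (2 + 1.5), about 0.714
```

The two sentences share one tag pattern, so their orientation counts pool into three table entries rather than six phrase-pair entries. This illustrates why a tag-sequence table can be smaller and less sparse than a phrase-based one.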
In this paper, we introduce an integrated algorithm for detecting humans in thermal imagery. In recent years, the histogram of oriented gradients (HOG) has become a popular algorithm for person detection in visible imagery. We apply this algorithm to pedestrian detection in infrared imagery by adjusting its parameters. We also add further geometric characteristics, such as mean contrast, as detection features. These were chosen after analyzing the properties of infrared imagery, to make up for the shortcomings of HOG on such images. The combined feature vectors are fed to a linear SVM for object/non-object classification, which yields the detector. The detection window is then scanned across the image at multiple positions and scales, and overlapping detections are combined. Finally, each pedestrian is represented by a single final detection. Experimental results on the OSU Thermal Pedestrian Database demonstrate the excellent performance of our algorithm.
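The detection stage described above, multi-scale window scanning followed by combining overlapping detections, can be sketched as below. This is an illustration under stated assumptions:

- `score_fn` is a hypothetical stand-in for the trained linear SVM's decision value w·x + b on a window's combined HOG and geometric feature vector, which is not computed here.
- The window size, stride, and scales are hypothetical.
- Overlapping detections are merged with greedy non-maximum suppression, a simpler substitute for whatever combination scheme the paper uses.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w, img_h, win_w=32, win_h=64, stride=8, scales=(1.0, 1.5, 2.0)):
    """Yield candidate windows at multiple positions and scales (parameters hypothetical)."""
    for s in scales:
        w, h = int(win_w * s), int(win_h * s)
        step = max(1, int(stride * s))
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                yield (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def merge_detections(dets: List[Tuple[Box, float]], overlap=0.3):
    """Greedy non-maximum suppression: keep the best box, drop boxes overlapping it."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) <= overlap for k, _ in kept):
            kept.append((box, score))
    return kept


def detect(img_w, img_h, score_fn: Callable[[Box], float], threshold=0.0):
    """Score every window and merge overlapping positive detections.

    score_fn is a hypothetical stand-in for the SVM decision function w.x + b.
    """
    dets = [(b, score_fn(b)) for b in sliding_windows(img_w, img_h)]
    return merge_detections([d for d in dets if d[1] > threshold])


# Toy score: windows that overlap a known pedestrian box strongly score positive.
person = (40, 32, 32, 64)
print(detect(128, 128, lambda b: iou(b, person) - 0.5))
# [((40, 32, 32, 64), 0.5)]
```

Every positive window in the toy example overlaps the best window by more than the suppression threshold. As a result, the many overlapping hits collapse into one final detection per pedestrian.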
Event recognition and temporal information analysis are important subtasks of information extraction (IE). In this paper, we propose an event recognition method based on time-series characteristics. In the pipeline of event re...
Optimizing search results has long been a research hotspot in the search engine field. More concretely, partitioning results into topics by clustering has proved to be a good approach. However, the clusters, some of which still ...
Namesakes are very common both in the real world and on the Internet. This paper combines the problem of name disambiguation with clustering techniques and aims to achieve person name disambig...
The application of word sense disambiguation (WSD) methods based on supervised machine learning is limited by the difficulties in defining sense tags and acquiring labeled training data. In this paper, the two pr...
This paper presents a novel combination forecasting model that combines single models based on machine learning. The model has been applied to predicting the elections of five cities in Taiwan with combinin...