This paper presents Ellogon, a multi-lingual, cross-platform, general-purpose text engineering environment. Ellogon was designed to aid both researchers in natural language processing and companies that produce language engineering systems for end users. Ellogon provides a powerful TIPSTER-based infrastructure for managing, storing and exchanging textual data, for embedding and managing text processing components, and for visualising textual data and their associated linguistic information. Its key features include full Unicode support, an extensive multi-lingual graphical user interface, a modular architecture and reduced hardware requirements.
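In the TIPSTER architecture that Ellogon builds on, documents carry stand-off annotations. Each annotation has a type, one or more character spans, and a set of attributes. The Python sketch below shows such an annotation store under that simplified model. The class and method names are hypothetical and are not Ellogon's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)


@dataclass
class Annotation:
    ann_id: int
    ann_type: str                     # e.g. "token", "sentence"
    spans: list                       # one or more Span objects
    attributes: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, ann_type, start, end, **attributes):
        """Attach a single-span annotation to the document and return it."""
        ann = Annotation(len(self.annotations), ann_type, [Span(start, end)], attributes)
        self.annotations.append(ann)
        return ann

    def covered_text(self, ann):
        """Return the text covered by an annotation (spans joined with spaces)."""
        return " ".join(self.text[s.start:s.end] for s in ann.spans)

    def by_type(self, ann_type):
        """Return all annotations of the given type."""
        return [a for a in self.annotations if a.ann_type == ann_type]


doc = Document("Ellogon supports Unicode.")
tok = doc.annotate("token", 0, 7, pos="NNP")
doc.annotate("token", 8, 16, pos="VBZ")
print(doc.covered_text(tok))        # Ellogon
print(len(doc.by_type("token")))    # 2
```

Because the annotations only point into the text by offsets, the raw document is never modified. Several components can therefore add their own annotation types to the same text independently.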
The proceedings contain 70 papers. Topics discussed include machine learning, data mining and knowledge discovery, constraint satisfaction, intelligent information retrieval, planning and scheduling, intelligent real-time systems, logic and reasoning, natural language processing, multimedia and image processing, internet software, multi-agent systems, neural networks and applications, and software engineering and knowledge sharing.
The main goal of this work is to describe a new model for a large vocabulary continuous speech recognition system using a phonetic-phonological approach. This work proposes a statistical phonetic structure, applied at...
ISIS (Intelligent Speech for Information Systems) is a trilingual spoken dialog system in the stocks domain. It supports the three languages commonly used in Hong Kong (Cantonese, Putonghua and English) and serves as a test-bed for our research in various speech and language technologies. ISIS also features combined interaction and delegation dialogs, and automatic assimilation of newly listed stock names into the system's knowledge base. This paper focuses on the architecture and multi-modality of ISIS. We use the CORBA middleware to implement a distributed system that is interoperable across platforms. We also describe the incorporation of KQML (Knowledge Query and Manipulation Language) software agents in ISIS to handle delegation dialogs. The latest enhancement supports multi-modal and mixed-modal input. This suits the natural affordances of certain interactions and thereby improves usability. Input modalities include speaking, typing and mouse-clicking. Output media include synthesized speech, text, tables and graphics.
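A KQML message is an s-expression. It consists of a performative (such as ask-one, tell or subscribe) followed by keyword parameters such as :sender, :receiver, :reply-with, :language, :ontology and :content. The abstract does not show the messages that ISIS agents actually exchange. The Python sketch below composes a KQML-style delegation request under that assumption. The agent names, ontology name and stock query are all hypothetical.

```python
def kqml_message(performative, **params):
    """Render a KQML-style message as an s-expression string.

    Python keyword names use underscores; they are emitted with hyphens,
    so reply_with becomes :reply-with.
    """
    fields = "".join(f" :{name.replace('_', '-')} {value}" for name, value in params.items())
    return f"({performative}{fields})"


# Hypothetical delegation: the dialog manager asks a stock agent to keep
# reporting a price; subscribe wraps another performative as its content.
msg = kqml_message(
    "subscribe",
    sender="dialog-manager",
    receiver="stock-agent",
    reply_with="alert-1",
    language="KQML",
    ontology="stocks",
    content="(ask-one :content (price 0005 ?p))",
)
print(msg)
```

This sketch uses subscribe because a delegation dialog hands a standing task to an agent rather than asking a one-off question. A plain query would use ask-one directly.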
This review of medical imaging informatics surveys current developments in an exciting field. The focus is on informatics issues rather than on traditional data processing and information systems, such as picture archiving and communication systems (PACS) and image processing and analysis systems. In this review, we address imaging informatics issues within the requirements of an informatics system defined by the American Medical Informatics Association. With these requirements as a framework, we review four topics in four sections. (1) Methods to present imaging and associated data without causing information overload, including image study summarization, content-based medical image retrieval, and natural language processing of text data. (2) Data modeling techniques to represent clinical data, with a focus on an image data model, including general-purpose time-based multimedia data models, healthcare-specific data models, knowledge models, and problem-centric data models. (3) Methods to integrate medical data from heterogeneous clinical data sources. We review advances in centralized databases and mediated architectures and discuss our own efforts at data integration based on peer-to-peer networking and shared file systems. (4) Visualization schemas to present imaging and clinical data, since the large volume of medical data presents a daunting challenge for an efficient visualization paradigm. In this section we review current multimedia visualization methods, including temporal modeling and problem-specific data organization, as well as our own problem-centric, context- and user-specific visualization interface.
ISBN: (print) 1581135947
Microfilm documents contain a wealth of information, but extracting and organizing this information by hand is slow, error-prone, and tedious. As an initial step toward automating access to this information, this paper describes an algorithmic process that automatically identifies record patterns in microfilm tables for pre-specified application domains. Our table-processing algorithm accepts an XML input file that describes the individual cells of a table taken from a microfilm document. For each record in the document, it finds the cells that together comprise the record. Two key features drive the algorithm: (1) geometric layout and (2) label matching with respect to a given domain-specific application ontology. The algorithm achieved an accuracy of 92% on our test corpus of genealogical microfilm tables.
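The Python sketch below shows how the two driving features, geometric layout and ontology label matching, can work together. It makes several simplifying assumptions that the abstract does not state. Cell positions are taken to be already parsed from the XML input. The first row is treated as the header, and each later row is treated as one record. The ontology, the tolerances and all function names are hypothetical. The published algorithm handles records that span several cells, which this sketch does not.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    x: float    # left edge of the cell on the page
    y: float    # top edge of the cell on the page
    text: str


# Hypothetical toy ontology: concept -> header label variants.
ONTOLOGY = {
    "Name": {"name", "full name"},
    "Age": {"age", "age at death"},
    "Birthplace": {"birthplace", "place of birth"},
}


def group_rows(cells, y_tolerance=5.0):
    """Geometric layout: cells whose top edges are within y_tolerance form a row."""
    rows = []
    for cell in sorted(cells, key=lambda c: (c.y, c.x)):
        if rows and abs(rows[-1][0].y - cell.y) <= y_tolerance:
            rows[-1].append(cell)
        else:
            rows.append([cell])
    return [sorted(row, key=lambda c: c.x) for row in rows]


def match_label(text):
    """Label matching: map a header string to an ontology concept, or None."""
    key = text.strip().lower()
    for concept, labels in ONTOLOGY.items():
        if key in labels:
            return concept
    return None


def extract_records(cells, y_tolerance=5.0, x_tolerance=20.0):
    """Treat the first row as the header and every later row as one record."""
    rows = group_rows(cells, y_tolerance)
    if not rows:
        return []
    columns = [(c.x, match_label(c.text)) for c in rows[0]]
    records = []
    for row in rows[1:]:
        record = {}
        for cell in row:
            # Assign the cell to the header column nearest in x.
            col_x, concept = min(columns, key=lambda col: abs(col[0] - cell.x))
            if concept is not None and abs(col_x - cell.x) <= x_tolerance:
                record[concept] = cell.text
        records.append(record)
    return records


cells = [
    Cell(0, 0, "Name"), Cell(100, 0, "Age"), Cell(200, 1, "Place of Birth"),
    Cell(2, 30, "John Smith"), Cell(101, 31, "42"), Cell(198, 29, "Ohio"),
]
print(extract_records(cells))
# [{'Name': 'John Smith', 'Age': '42', 'Birthplace': 'Ohio'}]
```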
ISBN: (print) 0780374886
An improved maximum entropy language model (IMELM) is presented. It builds on three aspects of language modeling (LM) improvement: handling long-distance dependencies, integrating language knowledge into the LM, and providing a general framework that combines all kinds of language knowledge. In this paper, the proposed model combines a trigram model with base phrase structure knowledge. The trigram captures local relations between words, while base phrase structure knowledge represents long-distance relations between syntactic structures. Knowledge of syntax, semantics and words is integrated in the maximum entropy framework. Experimental results show that the proposed model achieves a 24% improvement in perplexity over the conventional trigram model.
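A maximum entropy LM defines p(w | h) = exp(Σᵢ λᵢ fᵢ(h, w)) / Z(h). This lets trigram indicators and phrase-structure features contribute to the same distribution. Perplexity is exp(-(1/N) Σ log p(wᵢ | hᵢ)) over N tokens, so a lower value is better.

The Python sketch below illustrates both formulas with toy data. The vocabulary, feature names and weights are hypothetical. In a real model, the weights would be estimated from a corpus. The phrase_head feature is only a crude stand-in for the paper's base phrase structure knowledge, which would require a chunker.

```python
import math

# Hypothetical toy vocabulary, features and weights. In practice the weights
# are estimated from a training corpus (e.g. with generalized iterative scaling).
VOCAB = ["the", "cat", "dog", "sleeps", "</s>"]


def trigram(u, v, w):
    """Indicator: last two history words are (u, v) and the next word is w."""
    return lambda h, x: float(h[-2:] == (u, v) and x == w)


def phrase_head(head, w):
    """Stand-in for a base-phrase feature: fires when a noun-phrase head
    appears anywhere in the history, however far back, and the next word is w."""
    return lambda h, x: float(head in h and x == w)


FEATURES = {
    "tri:<s> <s> the": trigram("<s>", "<s>", "the"),
    "tri:<s> the cat": trigram("<s>", "the", "cat"),
    "np:cat->sleeps": phrase_head("cat", "sleeps"),
}


def maxent_prob(word, history, weights):
    """p(word | history) = exp(sum_i weight_i * f_i(history, word)) / Z(history)."""
    def score(x):
        return sum(weights[name] * f(history, x) for name, f in FEATURES.items())
    z = sum(math.exp(score(x)) for x in VOCAB)
    return math.exp(score(word)) / z


def perplexity(sentence, weights):
    """exp(-(1/N) * sum log p(w_i | history)) over the N tokens of sentence."""
    padded = ("<s>", "<s>") + tuple(sentence)
    log_prob = sum(
        math.log(maxent_prob(padded[i], padded[:i], weights))
        for i in range(2, len(padded))
    )
    return math.exp(-log_prob / len(sentence))


sentence = ["the", "cat", "sleeps", "</s>"]
trained = {"tri:<s> <s> the": 1.5, "tri:<s> the cat": 1.0, "np:cat->sleeps": 2.0}
uniform = {name: 0.0 for name in FEATURES}
print(perplexity(sentence, uniform))   # about 5.0: every word gets 1/len(VOCAB)
print(perplexity(sentence, trained))   # lower, since active features favour the observed words
```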
In software engineering, a system requirements document written in a natural language (NL) needs to be translated into a formal specification language for system execution. Automating this translation requires resolving the ambiguity in the document and making implicit domain knowledge explicit. In our approach, contextual natural language processing is used to overcome the ambiguity, and the domain knowledge is expressed in the DARPA Agent Markup Language (DAML). The result is a formal representation of the informal NL requirements that can be used for prototyping and even for implementation.
ISBN: (print) 1581133804
We introduce a methodology for automating the maintenance and growth of domain-specific concept taxonomies and grammatical class hierarchies simultaneously, based on knowledge capture from natural language texts. The assimilation process centers on the linguistic and conceptual 'quality' of various forms of evidence. This evidence underlies the generation, assessment and ongoing refinement of lexical and concept hypotheses. Hypotheses are ranked by plausibility according to the strength of their evidence, and the most plausible ones are selected for assimilation into the given lexical class hierarchy and domain ontology.
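The abstract does not say how the strength of evidence is quantified. The Python sketch below illustrates the rank-then-select step by scoring each concept hypothesis as a weighted sum of its evidence. The evidence types, weights, the unknown word "X-900" and all function names are hypothetical. They are meant only to show the shape of quality-based ranking, not the authors' actual calculus.

```python
from dataclasses import dataclass, field

# Hypothetical evidence types and weights; the source does not say how
# the strength of evidence is quantified.
EVIDENCE_WEIGHTS = {
    "apposition": 3.0,           # linguistic evidence, e.g. "the X-900, a printer"
    "case_frame": 2.0,           # the word fills a role the concept allows
    "conceptual_support": 1.5,   # consistent with facts already in the ontology
    "conflict": -4.0,            # contradicts the ontology
}


@dataclass
class Hypothesis:
    word: str
    concept: str
    evidence: list = field(default_factory=list)

    def strength(self):
        """Sum the weights of this hypothesis's evidence (unknown kinds count 0)."""
        return sum(EVIDENCE_WEIGHTS.get(kind, 0.0) for kind in self.evidence)


def select_for_assimilation(hypotheses):
    """Rank hypotheses by evidence strength and keep all that tie for the top score."""
    ranked = sorted(hypotheses, key=lambda h: h.strength(), reverse=True)
    if not ranked:
        return []
    best = ranked[0].strength()
    return [h for h in ranked if h.strength() == best]


candidates = [
    Hypothesis("X-900", "Printer", ["apposition", "case_frame"]),
    Hypothesis("X-900", "Computer", ["case_frame", "conceptual_support"]),
    Hypothesis("X-900", "Software", ["case_frame", "conflict"]),
]
for h in select_for_assimilation(candidates):
    print(h.word, "->", h.concept)   # X-900 -> Printer
```

The sketch keeps every hypothesis that ties for the top score instead of forcing a single winner. This leaves room for later evidence to refine the choice, in the spirit of the ongoing refinement the abstract describes.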