This paper presents a novel crawling strategy to locate bilingual sites. It does so by focusing on the Web graph neighborhood of these sites and exploring the patterns of the links in this region to guide its visitati...
We compare the use of an unsupervised transliteration mining method and a rule-based method to automatically extract lists of transliteration word pairs from a parallel corpus of Hindi/Urdu. We build joint source chan...
ISBN: (print) 9780769545110
The computational demands of multimedia data processing are steadily increasing as consumers call for progressively more complex and intelligent multimedia services. New multi-core hardware architectures provide the required resources, but writing parallel, distributed applications remains a labor-intensive task compared to writing their sequential counterparts. For this reason, Google and Microsoft implemented their respective processing frameworks MapReduce [10] and Dryad [19], as they allow the developer to think sequentially, yet benefit from parallel and distributed execution. An inherent limitation in the design of these batch processing frameworks is their inability to express arbitrarily complex workloads. The dependency graphs of the frameworks are often limited to directed acyclic graphs, or even pre-determined stages. This is particularly problematic for video encoding and other algorithms that depend on iterative execution. With the Nornir runtime system for parallel programs [39], which is a Kahn Process Network implementation, we addressed and solved several of these limitations. However, it is more difficult to use than other frameworks due to its complex programming model. In this paper, we build on the knowledge gained from Nornir and present a new framework, called P2G, designed specifically for developing and processing distributed real-time multimedia data. P2G supports arbitrarily complex dependency graphs with cycles, branches and deadlines, and provides both data- and task-parallelism. The framework is implemented to scale transparently with available (heterogeneous) resources, a concept familiar from the cloud computing paradigm. We have implemented an (interchangeable) P2G kernel language to ease development. In this paper, we present a proof-of-concept implementation of a P2G execution node and some experimental examples using complex workloads like Motion JPEG and K-means clustering. The results show that the P2G system is a feasible approach to mu
Annotating Named Entity Recognition (NER) training corpora is a costly process but necessary for supervised NER systems. This paper presents an approach to generate large-scale Chinese NER training data from an Englis...
ISBN: (print) 9783642208614
The proceedings contain 18 papers. The topics discussed include: a systematic approach to web application penetration testing using TTCN-3; towards model-based support for managing organizational transformation; cloud computing providers: characteristics and recommendations; evolution of goal-driven pattern families for business process modeling; searching, translating and classifying information in cyberspace; e-tourism portal: a case study in ontology-driven development; toward a goal-oriented, business intelligence decision-making framework; flexible communication based on linguistic and ontological cues; a study of e-government architectures; model-based engineering of a managed process application framework; harnessing enterprise 2.0 technologies: the midnight projects; following the conversation: a more meaningful expression of engagement; and the design, development and application of a proxy credential auditing infrastructure for collaborative research.
This paper presents a hardware architecture for H.264 intra prediction frame processing. This design reuses some modules according to the common parts of luma 16x16 prediction and chroma 8x8 prediction in architecture ...
With the help of recent developments in semiconductor design and process technologies, modern processors can provide a great opportunity to increase the performance of processing multimedia data by exploiting task- and ...
In recent years, there has been increasing interest in methods that perform some kind of weighting of heterogeneous parallel training data when building a statistical machine translation system. It was for instance obs...
Open information extraction (IE) is a weakly supervised IE paradigm that aims to extract relation-independent information from large-scale natural language documents without significant annotation efforts. A key chall...
DNA sequences often appear as fragments, small pieces found at a crime scene or in a hair sample for a paternity exam. In order to compare those fragments with a subject or target sequence of a suspect, we need an...