ISBN:
(Print) 1595936548
In this paper we analyze the Web coverage of three search engines: Google, Yahoo and MSN. We conducted a 15-month study collecting 15,770 Web content or information pages linked from 260 Australian federal and local government Web pages. The key feature of this domain is that new information pages are constantly added, but the 260 Web pages tend to provide links only to the more recently added information pages. Search engines list only some of the information pages, and their coverage varies from month to month. Meta-search engines do little to improve coverage of information pages, because the problem is not the size of Web coverage but the frequency with which information is updated. We conclude that organizations such as governments, which post important information on the Web, cannot rely on all relevant pages being found with conventional search engines, and need to consider other strategies to ensure important information can be found.
In this paper we introduce a new method for detecting one type of English multiword expression (MWE), the phrasal verb, and integrating it into an English-Arabic phrase-based statistical machine translation (PBSMT) system. The detection starts by parsing the English side of the parallel corpus and identifying various linguistic patterns for phrasal verbs, which are then integrated into the En-Ar PBSMT system. In addition, the paper explores the effect of cliticizing specific English words that have no Arabic equivalent. The results, reported as BLEU scores, show that some patterns achieve significant improvements over others, although the baseline still achieves the highest score. This paper shows that, by detecting more linguistic patterns and integrating them into the En-Ar SMT system with other integration methods, translation quality could be improved. The results also indicate which path is worth following and clarify that linguistic features are not handled properly in statistically learned models.
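The detection step described above can be illustrated with a minimal sketch. The token/POS pairs, tag names, and particle list below are illustrative assumptions, not the paper's actual parser output or pattern inventory; real detection would run over full parses of the parallel corpus.

```python
# Toy phrasal-verb detection over POS-tagged tokens (illustrative only).
PARTICLES = {"up", "off", "out", "down", "on", "away"}

def detect_phrasal_verbs(tagged):
    """Return (verb, particle) pairs for contiguous VERB + particle tokens."""
    hits = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "VERB" and t2 == "PRT" and w2.lower() in PARTICLES:
            hits.append((w1, w2))
    return hits

sentence = [("She", "PRON"), ("turned", "VERB"), ("off", "PRT"),
            ("the", "DET"), ("radio", "NOUN")]
print(detect_phrasal_verbs(sentence))  # [('turned', 'off')]
```

Detected pairs like these would then be treated as single translation units when building the phrase table.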
ISBN:
(Digital) 9781461300755
ISBN:
(Print) 9780387953052; 9781461265436
Automatic Differentiation (AD) is a maturing computational technology and has become a mainstream tool used by practicing scientists and computer engineers. The rapid advance of hardware computing power and AD tools has enabled practitioners to quickly generate derivative-enhanced versions of their code for a broad range of applications in applied research and development. Automatic Differentiation of Algorithms provides a comprehensive and authoritative survey of recent developments, new techniques, and tools for AD use. The book covers all aspects of the subject: mathematics, scientific programming (e.g., the use of adjoints in optimization), and implementation (e.g., memory management problems). A strong theme of the book is the relationship between AD tools and other software tools, such as compilers and parallelizers. A rich variety of significant applications is presented as well, including optimum-shape design problems, for which AD offers more efficient tools and techniques.
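The core idea of forward-mode AD can be conveyed with the standard dual-number construction: propagate a value and its derivative together through each operation. This is a generic textbook sketch, not code from the book.

```python
# Minimal forward-mode AD with dual numbers (value, derivative).
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot  # function value and derivative

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)  # sum rule
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.val * o.dot + self.dot * o.val)  # product rule
    __rmul__ = __mul__

def derivative(f, x):
    """Differentiate f at x by seeding dx/dx = 1."""
    return f(Dual(x, 1.0)).dot

# d/dx (x*x + 3x) at x = 2 is 2x + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # 7.0
```

Reverse mode (adjoints), the variant highlighted for optimization, records the computation and propagates sensitivities backwards instead.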
ISBN:
(Print) 9781713871088
Image reconstruction enhanced by regularizers, e.g., to enforce sparsity, low-rank or smoothness priors on images, has many successful applications in vision tasks such as computational photography and biomedical and spectral imaging. It is well accepted that non-convex regularizers normally perform better than convex ones in terms of reconstruction quality, but their convergence analysis is established only to a critical point, rather than to the global optima. To mitigate the loss of guarantees for global optima, we propose to apply the concept of invexity and provide the first list of proven invex regularizers for improving image reconstruction. Moreover, we establish convergence guarantees to global optima for various advanced image reconstruction techniques after they are improved by such invex regularization. To the best of our knowledge, this is the first practical work applying invex regularization to improve imaging with guarantees of global optima. To demonstrate the effectiveness of invex regularization, numerical experiments are conducted for various imaging tasks using benchmark datasets.
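The regularized reconstruction setting the abstract refers to can be sketched with generic proximal gradient descent on min_x 0.5·||Ax − b||² + λ·R(x). For illustration R is the convex L1 norm with its soft-threshold prox; the paper's invex regularizers would substitute a different prox step in the same loop. The problem sizes and parameters below are arbitrary assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Prox of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad(A, b, lam=0.05, iters=500):
    """Proximal gradient for 0.5*||Ax-b||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = ||A||_2^2
    for _ in range(iters):
        grad = A.T @ (A @ x - b)             # gradient of the data term
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]                # sparse ground truth
b = A @ x_true                               # noiseless measurements
x_hat = prox_grad(A, b)
print(np.round(x_hat, 2))
```

With a convex R this recovers the sparse signal; the paper's point is that invexity lets similar global-optimality claims carry over to a family of non-convex R.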
ISBN:
(Print) 9798400700156
In past years, the world has switched to multi- and many-core shared memory architectures. As a result, there is a growing need to utilize these architectures by introducing shared memory parallelization schemes, such as OpenMP, to applications. Nevertheless, introducing the OpenMP work-sharing loop construct into code, especially legacy code, is challenging due to pervasive pitfalls in the management of parallel shared memory. To facilitate this task, many source-to-source (S2S) compilers have been created over the years, tasked with inserting OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. In this work, we propose leveraging recent advances in machine learning, specifically in natural language processing (NLP), to suggest the need for an OpenMP work-sharing loop directive and data-sharing attribute clauses, the building blocks of concurrent programming. We train several transformer models, named PragFormer, for these tasks and show that they outperform statistically trained baselines and automatic S2S parallelization compilers, both in classifying the overall need for a parallel for directive and in introducing private and reduction clauses. In the future, our corpus can be used for additional tasks, up to generating entire OpenMP directives. The source code and database for our project can be accessed on GitHub and HuggingFace.
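The clause-suggestion task above can be made concrete with a toy baseline: given a C-like loop body, flag accumulation statements that would call for an OpenMP reduction clause. This regex heuristic is purely our illustration of the task, not PragFormer or any of the paper's baselines.

```python
import re

# Matches accumulations like "sum +=", "prod *=", "diff -=".
REDUCTION = re.compile(r"\b(\w+)\s*(\+|\*|-)=")

def suggest_reduction(loop_body):
    """Suggest OpenMP reduction clauses from accumulation patterns."""
    return [f"reduction({op}:{var})"
            for var, op in REDUCTION.findall(loop_body)]

body = "sum += a[i]; prod *= b[i];"
print(suggest_reduction(body))  # ['reduction(+:sum)', 'reduction(*:prod)']
```

A transformer model replaces this brittle pattern matching with a learned classifier over the loop's source text, which is what lets it generalize past fixed syntactic templates.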
The problem of privacy-preserving data mining has become more and more important in recent years, and many successful and efficient techniques have been developed. However, in collaborative data analysis, parts of the datasets may come from different data owners and may be processed using different data distortion methods. Thus, combinations of datasets processed using different methods are of practical interest. In this paper, a class of novel data distortion strategies is proposed. Four schemes based on attribute partition, with different combinations of singular value decomposition (SVD), nonnegative matrix factorization (NMF), and discrete wavelet transformation (DWT), are designed to perturb submatrices of the original datasets for privacy protection. We use several metrics to measure the performance of the proposed strategies. Data utility is examined using binary classification based on the support vector machine. Our experimental results indicate that, in comparison with individual data distortion techniques, the proposed schemes are very efficient in achieving a good trade-off between data privacy and data utility, and provide a feasible solution for collaborative data analysis.
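One attribute-partition scheme of the kind described can be sketched as follows: split the columns into two blocks and replace one block with its truncated-SVD approximation. The split point, rank, and data here are illustrative choices, not the paper's exact configuration, and the other schemes would swap in NMF or DWT for the perturbed block.

```python
import numpy as np

def svd_distort(X, rank):
    """Rank-k SVD approximation of X, used as the distorted release."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(1)
data = rng.standard_normal((100, 8))     # 100 records, 8 attributes

left, right = data[:, :4], data[:, 4:]   # attribute partition
distorted = np.hstack([svd_distort(left, rank=2), right])

# Only the left attribute block is perturbed; the right is released as-is.
print(np.allclose(distorted[:, 4:], data[:, 4:]))  # True
print(np.allclose(distorted[:, :4], data[:, :4]))  # False
```

Privacy metrics would then compare `distorted` against `data`, while utility is measured by training the SVM classifier on the distorted release.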
ISBN:
(Digital) 9783642299230
ISBN:
(Print) 9783642299223
This book constitutes the refereed post-proceedings of the Joint International Semantic Technology Conference, JIST 2011, held in Hangzhou, China, in December 2011. The conference is a joint event for regional Semantic Web-related conferences; JIST 2011 brought together the Asian Semantic Web Conference 2011 and the Chinese Semantic Web Conference 2011. The 21 revised full papers presented together with 12 short papers were carefully reviewed and selected from 82 submissions. The papers cover a wide range of topics in disciplines related to semantic technology, including applications of the Semantic Web, management of Semantic Web data, ontology and reasoning, the social Semantic Web, and user interfaces to the Semantic Web.
In many applications, volumetric data sets are examined by displaying isosurfaces, surfaces where the data, or some function of the data, takes on a given value, interactive applications typically use local lighting m...