ISBN (print): 9781538635865
A word spotting system is in large part characterized by the query modalities it is able to process. The most common modalities are Query-by-Example and Query-by-String. Recently, however, a new query type has been proposed: in Query-by-Online-Trajectory (QbO), the query is presented as a set of online-handwritten trajectories. In this work we devise a cross-domain word spotting framework using CNNs that can perform the QbO task. In particular, we design two different QbO systems and evaluate them in a number of experiments. We not only outperform the current state of the art in QbO word spotting but also show that a system using a single CNN for both online and offline data achieves better results than a system that uses a separate CNN for each domain.
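The abstract does not describe the retrieval step in detail. One minimal way to picture it, assuming the CNN maps both the online-trajectory query and each offline word image to fixed-length vectors in a shared space, is to rank word images by cosine similarity to the query. The toy embeddings, the function names `cosine` and `rank_word_images`, and the choice of cosine similarity are all illustrative assumptions. None of them is taken from the authors' method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_word_images(query_vec, image_vecs):
    """Return word-image ids sorted from most to least similar to the query.

    query_vec:  hypothetical embedding of the online-trajectory query.
    image_vecs: dict mapping word-image id -> hypothetical embedding.
    """
    scores = {wid: cosine(query_vec, vec) for wid, vec in image_vecs.items()}
    return sorted(scores, key=lambda wid: scores[wid], reverse=True)

# Toy 3-d vectors standing in for CNN outputs (invented for illustration).
images = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.0, 1.0, 0.2],
    "img_c": [0.7, 0.3, 0.1],
}
print(rank_word_images([1.0, 0.0, 0.0], images))  # ['img_a', 'img_c', 'img_b']
```

If a single network embeds both domains, as in the better-performing system described above, the query and the word images are compared in the same space by construction.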
ISBN (print): 9780769549347; 9781467350266
One of the most topical current tasks in textual document processing is automatic categorization. Clustering, the most common form of unsupervised learning, enables unlabeled documents to be grouped automatically into subsets called clusters. In this paper, the authors examine the results of clustering very large real-world electronic data collections containing freely written customer reviews in English. The reviews are automatically clustered into two groups that should contain either positive or negative reviews. The paper focuses on analyzing why certain reviews are wrongly assigned to a group containing mostly reviews of the other class. The assignment of a review to a cluster is based on its properties, i.e., on the words that appear in the review; thus, the words appearing in incorrectly categorized reviews were analyzed. It was found that words that are important for correct classification (and thus carry some sentiment) are often about as important as the words in the set other than the expected one, and therefore do not act as misleading information, unlike words of little or no significance.
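The abstract does not name the clustering algorithm. Purely as an illustration, the sketch below clusters four hypothetical reviews into two groups with k-means over bag-of-words counts. The reviews, the vocabulary, the fixed initial centroids, and the function names are assumptions made for this example only.

```python
from collections import Counter

def bag_of_words(text, vocab):
    """Word-count vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def dist2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, init_idx, iters=10):
    """Plain k-means; the initial centroids are the vectors at `init_idx`."""
    centroids = [list(vectors[i]) for i in init_idx]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: dist2(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical reviews: intended classes are positive, positive, negative, negative.
reviews = [
    "great product works great",
    "excellent quality great value",
    "terrible product broke fast",
    "awful quality terrible support",
]
vocab = sorted({w for r in reviews for w in r.split()})
vectors = [bag_of_words(r, vocab) for r in reviews]
labels = kmeans(vectors, init_idx=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

Each review is represented only by its word counts, so it is pulled toward whichever centroid shares more of its words. That is why the words in misassigned reviews are the natural thing to inspect.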
This study examines the role of spatial configuration on consumer movement behavior, in terms of physical footfall patterns and place-specific intensity, in a large-scale, multi-level, planned shopping mall. The ...
The Large Hadron Collider (LHC) at CERN, the European Organization for Nuclear Research, will produce unprecedented volumes of data when it starts operation in 2007. To provide for its computational needs, the LHC Com...
This paper gives an overview of methods for utilizing large process data matrices. These data matrices are almost always of less than full statistical rank, and therefore latent variable methods are shown to be well suited to obtaining useful subspace models from them for treating a variety of important industrial problems. An overview of the important concepts behind latent variable models is presented, and the methods are illustrated with industrial examples in the following areas: (i) the analysis of historical databases and troubleshooting of process problems; (ii) process monitoring and FDI; (iii) extraction of information from novel multivariate sensors; (iv) process control in reduced-dimensional subspaces. In each of these problems, latent variable models provide the framework on which solutions are based. (c) 2005 Elsevier Ltd. All rights reserved.
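Principal component analysis (PCA) is one common latent variable method for rank-deficient matrices like these, although the abstract does not commit to a specific model. Below is a minimal pure-Python sketch built on a toy matrix of three sensors driven by a single latent variable. Power iteration recovers the first loading vector. The squared prediction error (SPE), a standard monitoring statistic assumed here, stays near zero for a new observation that follows the learned correlation structure and grows for one that breaks it. The data and all function names are hypothetical.

```python
import math

def mean_center(X):
    """Subtract each column mean; return the centred matrix and the means."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X], means

def first_loading(Xc, iters=100):
    """First principal loading of centred Xc via power iteration on Xc^T Xc."""
    n = len(Xc[0])
    C = [[sum(row[i] * row[j] for row in Xc) for j in range(n)] for i in range(n)]
    p = [1.0] * n  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        q = [sum(C[i][j] * p[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in q))
        p = [v / norm for v in q]
    return p

def spe(x, means, p):
    """Squared prediction error of x after projection onto the 1-D latent subspace."""
    xc = [a - m for a, m in zip(x, means)]
    t = sum(a * b for a, b in zip(xc, p))  # latent score
    residual = [a - t * b for a, b in zip(xc, p)]
    return sum(r * r for r in residual)

# Toy process data: three sensors driven by one latent variable (rank 1).
X = [[-2, -4, 2], [-1, -2, 1], [0, 0, 0], [1, 2, -1], [2, 4, -2]]
Xc, means = mean_center(X)
p = first_loading(Xc)
print(spe([3, 6, -3], means, p))  # close to 0: consistent with the model
print(spe([1, 2, 1], means, p))   # about 3.33: third sensor breaks the pattern
```

The three-variable matrix is described by one latent direction. This is the "less than full statistical rank" situation in miniature.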
ISBN (print): 9781939133199
Modern web services use in-memory caching extensively to increase throughput and reduce latency. Several workload analyses of production systems have fueled research into improving the effectiveness of in-memory caching systems. However, coverage is still sparse given the wide spectrum of industrial cache use cases. In this work, we significantly further the understanding of real-world cache workloads by collecting production traces from 153 in-memory cache clusters at Twitter, sifting through over 80 TB of data, and sometimes interpreting the workloads in the context of the business logic behind them. We perform a comprehensive analysis to characterize cache workloads based on traffic pattern, time-to-live (TTL), popularity distribution, and size distribution. A fine-grained view of different workloads uncovers the diversity of use cases: many are far more write-heavy or more skewed than previously shown, and some display unique temporal patterns. We also observe that TTL is an important and sometimes defining parameter of cache working sets. Our simulations show that the ideal replacement strategy in production caches can be surprising; for example, FIFO works best for a large number of workloads.
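Here is a toy simulator for comparing eviction policies. It assumes a fixed-capacity cache of equal-sized objects and ignores TTLs. The two traces are made up and are not drawn from the Twitter data. The sketch only shows that the better policy depends on the access pattern: FIFO wins on the first trace and LRU on the second.

```python
from collections import OrderedDict, deque

def fifo_hit_ratio(trace, capacity):
    """Hit ratio of a FIFO cache holding at most `capacity` keys."""
    cache, order, hits = set(), deque(), 0
    for key in trace:
        if key in cache:
            hits += 1  # a hit does not change FIFO order
            continue
        if len(cache) >= capacity:
            cache.discard(order.popleft())  # evict the oldest inserted key
        cache.add(key)
        order.append(key)
    return hits / len(trace)

def lru_hit_ratio(trace, capacity):
    """Hit ratio of an LRU cache holding at most `capacity` keys."""
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used key
        cache[key] = True
    return hits / len(trace)

for trace, cap in [(list("abacb"), 2), (list("abcadabc"), 3)]:
    print(fifo_hit_ratio(trace, cap), lru_hit_ratio(trace, cap))
# 0.4 0.2
# 0.125 0.25
```

FIFO never reorders entries on a hit, so it needs no per-hit bookkeeping. LRU, by contrast, pays for a `move_to_end` on every hit.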
The influence of sample size on the pyrolysis of wheat straw and three types of cellulose has been investigated by simultaneous thermogravimetric analysis and differential scanning calorimetry. Samples between 2 and 20 mg were pyrolyzed to a maximum temperature of 600 °C at a heating rate of 40 °C min⁻¹. It was found that sample size had a large effect on the pyrolysis of Avicel cellulose; the mass loss peak was shifted to higher temperatures at higher sample mass. However, the effect of sample mass on the pyrolysis of wheat straw was insignificant. In wheat straw samples washed in water to reduce the KCl content, the influence of sample size was between that of cellulose and straw, but closer to straw. A model for the TGA/DSC system has been developed, which includes heat transfer by both convection and radiation to the two crucibles and the sample. Simulations with the model showed that the sample mass had a large influence on the pyrolysis at high heat of reaction and, in agreement with the data, that the pyrolysis peak shifts to higher temperatures at higher sample mass. The recommendation found in the literature that samples should be no larger than 1 mg in TGA measurements must be limited to biomass samples with a high heat of reaction, such as cellulose. (C) 2001 Elsevier Science B.V. All rights reserved.
Recent developments in the data analytics field provide a boost in production for modern industries. Small-sized factories intend to take full advantage of the data collected by sensors used in their machinery. Th...
ISBN (print): 9782705688417
In this paper, we present a text mining methodology and an information visualization interface that allows users to browse a large collection of French-language songs based on their lyrics. We first harvested lyrics and metadata from various sources on the Web. After data preprocessing, we used clustering and Latent Semantic Analysis to identify a thematic structure and determine significant *** then transformed the resulting model into a set of nodes and edges to obtain an interactive visualization system for exploring our song collection.
ISBN (print): 0769510043
The relation between physiological events, environmental factors, and the occurrence of problem behavior in natural settings was analyzed using the data mining system LERS (Learning from Examples based on Rough Sets). Data on heart rate were linked to environmental and behavioral data coded from videotapes of one adult subject who was diagnosed with severe mental retardation and engaged in problem behavior. The results of the analysis suggest that using the data mining system LERS will be a valuable strategy for exploring large data sets that include heart rate, environmental, and behavioral measures.
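LERS is based on rough set theory. In that theory, a concept (here, occurrences of problem behavior) is described by a lower and an upper approximation built from groups of cases that cannot be told apart on the recorded attributes. The sketch below computes only those two approximations for a small hypothetical table. The attribute names, values, and outcomes are invented and do not come from the study, and the code does not implement LERS's own rule-induction algorithms.

```python
from collections import defaultdict

def elementary_sets(table, attributes):
    """Group cases that are indiscernible on the given attributes."""
    blocks = defaultdict(set)
    for case, values in table.items():
        blocks[tuple(values[a] for a in attributes)].add(case)
    return list(blocks.values())

def approximations(table, attributes, concept):
    """Lower and upper rough-set approximations of `concept` (a set of cases)."""
    lower, upper = set(), set()
    for block in elementary_sets(table, attributes):
        if block <= concept:
            lower |= block  # every case in the block belongs to the concept
        if block & concept:
            upper |= block  # at least one case in the block belongs to it
    return lower, upper

# Hypothetical observations (not from the study).
table = {
    1: {"heart_rate": "high", "setting": "demand"},
    2: {"heart_rate": "high", "setting": "demand"},
    3: {"heart_rate": "normal", "setting": "demand"},
    4: {"heart_rate": "normal", "setting": "leisure"},
    5: {"heart_rate": "high", "setting": "leisure"},
}
problem_behavior = {1, 3, 5}
lower, upper = approximations(table, ["heart_rate", "setting"], problem_behavior)
print(sorted(lower), sorted(upper))  # [3, 5] [1, 2, 3, 5]
```

Cases 1 and 2 share the same attribute values but differ in outcome, so they appear only in the upper approximation. In rough-set rule induction, rules drawn from the lower approximation are certain, while those drawn from the upper approximation are only possible.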