In algorithmic trading, computer algorithms automatically decide the timing, quantity, and direction of operations (buy, sell, or hold). To create a useful algorithm, its parameters should be optimized on historical data. However, parameter optimization is time-consuming because the search space is large. We propose searching the parameter combination space with the MapReduce framework, expecting the parallel processing capability of MapReduce to cut the optimization runtime. This paper presents the details of our method and experimental results that demonstrate its efficiency. We also show that a rule-based strategy, once optimized, is more stable than one whose parameters are arbitrarily preset, while making a comparable profit.
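The abstract does not specify the trading rule or the MapReduce job layout. The following is a minimal sketch that assumes a hypothetical moving-average crossover rule with two window parameters. Under that assumption, the parameter search splits into two parts: a map step that backtests each combination independently, and a reduce step that keeps the most profitable one. Here both steps run in one process with a generator and `functools.reduce` rather than on a Hadoop cluster.

```python
from functools import reduce
from itertools import product


def backtest(prices, short_win, long_win):
    """Hypothetical moving-average crossover rule: hold one unit while the
    short-window mean exceeds the long-window mean, otherwise stay flat.
    Only prices before day t decide the position held over day t."""
    profit = 0.0
    for t in range(long_win, len(prices)):
        short_ma = sum(prices[t - short_win:t]) / short_win
        long_ma = sum(prices[t - long_win:t]) / long_win
        if short_ma > long_ma:
            profit += prices[t] - prices[t - 1]
    return profit


def map_phase(prices, combo):
    """Map step: evaluate one parameter combination -> (combo, profit)."""
    short_win, long_win = combo
    return combo, backtest(prices, short_win, long_win)


def reduce_phase(best, candidate):
    """Reduce step: keep the more profitable (combo, profit) pair."""
    return candidate if candidate[1] > best[1] else best


def grid_search(prices, shorts, longs):
    # Enumerate the parameter space; on a cluster, each combination
    # (or block of combinations) would become a separate map task.
    combos = [(sw, lw) for sw, lw in product(shorts, longs) if sw < lw]
    mapped = (map_phase(prices, c) for c in combos)
    return reduce(reduce_phase, mapped, (None, float("-inf")))


prices = list(range(1, 31))  # toy, steadily rising series
best = grid_search(prices, shorts=[1, 2, 3], longs=[4, 5, 6])
print(best)  # ((1, 4), 26.0)
```

Each backtest is independent of the others, so the map step parallelizes trivially. The reduce step is associative, which lets partial maxima from different workers be combined in any order.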
Join processing in wireless sensor networks is a challenging problem. Existing solutions do not address join operations among tuples from the latest sampling periods. In this article, we propose SJQP, a method for continuous Single-attribute Join Queries within the latest sampling Periods in wireless sensor networks. The main idea of our filter-based framework is to discard non-matching tuples, and the scheme guarantees a correct result independent of the filters. Experiments on real-world sensor data show that our method performs close to a theoretical optimum and consistently outperforms the centralized join algorithm.
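The abstract does not describe SJQP's actual filters, so the sketch below illustrates only the general filter-then-verify idea, under these assumptions:

- Each tuple is a hypothetical `(node_id, period, value)` triple.
- The join predicate is equality on `value`.
- The filter is a coarse bucket summary of the other side's values, broadcast from the sink.

Equal values always fall into the same bucket, so the filter never drops a true match. The exact join at the sink then removes any false positives the filter lets through.

```python
def latest(tuples, current_period, w):
    """Keep (node_id, period, value) tuples from the latest w sampling periods."""
    return [t for t in tuples if current_period - w < t[1] <= current_period]


def bucket_summary(tuples, width):
    """Coarse, lossy summary of join values: the set of value buckets."""
    return {t[2] // width for t in tuples}


def apply_filter(tuples, summary, width):
    """Drop tuples whose bucket cannot match anything on the other side."""
    return [t for t in tuples if t[2] // width in summary]


def exact_join(r, s):
    """Exact equi-join on the value attribute, performed at the sink."""
    return sorted((a, b) for a in r for b in s if a[2] == b[2])


def filtered_join(r_tuples, s_tuples, current_period, w, width):
    r = latest(r_tuples, current_period, w)
    s = latest(s_tuples, current_period, w)
    # Each side is pruned against the other side's bucket summary.
    r_sent = apply_filter(r, bucket_summary(s, width), width)
    s_sent = apply_filter(s, bucket_summary(r, width), width)
    # Return the join result and the number of tuples that had to be sent.
    return exact_join(r_sent, s_sent), len(r_sent) + len(s_sent)


R = [("r1", 1, 20), ("r2", 2, 35), ("r3", 3, 50)]
S = [("s1", 2, 20), ("s2", 3, 41), ("s3", 3, 50)]
result, sent = filtered_join(R, S, current_period=3, w=2, width=10)
print(result, sent)  # [(('r3', 3, 50), ('s3', 3, 50))] 2
```

In this toy run, only 2 of the 5 in-window tuples are transmitted, and the result is identical to joining the unfiltered windows.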
With the growth of XML data on the Internet, managing and analyzing large collections of XML documents has become important for information management. Clustering, as an intelligent technique, has been widely used to group documents by their content or structure. However, the key problem is how to measure the similarity between XML documents. In this paper, we propose an extended vector space model and, on this basis, an effective semantic similarity measure that combines content and structure semantics. We analyze a variety of XML document features that affect similarity measurement, such as term element frequency, term inverse element frequency, the semantic weight of tags, and the level information of terms. In addition, because the collection has no classification information in advance, we introduce information gain to evaluate clustering quality. Experimental results show that the proposed similarity method (EVSM_SS) outperforms both the content-and-structure integration measure based on structure paths (VSM_SP) and the traditional document clustering measure (CO) in information gain, and produces better clustering quality.
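The abstract names the features but not how they are combined. The sketch below assumes one plausible weighting for each term in each element:

- term frequency in the element,
- times inverse element frequency over the whole collection,
- times a hypothetical per-tag semantic weight,
- divided by the element's depth.

Document vectors are then compared by cosine similarity. The `tag_weight` values and the 1/level factor are illustrative assumptions, not the published EVSM_SS formula.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def elements(xml_text):
    """Return (tag, level, terms) for every element with text; the root is level 1."""
    out = []

    def walk(node, level):
        if node.text and node.text.strip():
            out.append((node.tag, level, node.text.lower().split()))
        for child in node:
            walk(child, level + 1)

    walk(ET.fromstring(xml_text), 1)
    return out


def inverse_element_frequency(collection):
    """IEF(t) = log(N / n_t): N elements in the collection, n_t of them contain t."""
    all_elems = [e for doc in collection for e in doc]
    df = Counter(t for _, _, terms in all_elems for t in set(terms))
    return {t: math.log(len(all_elems) / n) for t, n in df.items()}


def doc_vector(doc, ief, tag_weight):
    """Assumed weight: tf in element * IEF * tag weight / element level."""
    vec = Counter()
    for tag, level, terms in doc:
        structural = tag_weight.get(tag, 1.0) / level
        for t, tf in Counter(terms).items():
            vec[t] += tf * ief.get(t, 0.0) * structural
    return vec


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


a = "<article><title>XML clustering</title><body>similarity of XML documents</body></article>"
b = "<paper><title>XML clustering</title><body>structure similarity</body></paper>"
c = "<recipe><title>tomato soup</title><body>boil tomato water</body></recipe>"
docs = [elements(x) for x in (a, b, c)]
ief = inverse_element_frequency(docs)
weights = {"title": 2.0, "body": 1.0}  # hypothetical tag semantic weights
va, vb, vc = (doc_vector(d, ief, weights) for d in docs)
print(cosine(va, vb) > cosine(va, vc))  # True
```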
The defining characteristic of XML retrieval is that it can return element nodes as results. This paper studies the clustering of XML element search results and proposes a similarity measure based on term semantics. In this measure, latent semantic indexing (LSI) captures the "core" concepts shared by terms, and the content and semantic structure properties of XML element nodes (CASS) are combined with it. In addition, two new evaluation measures, R_ClusterRatio and R_DocuRatio, are introduced to assess clustering quality. They are motivated by observations of how relevant documents are distributed and by the fact that the experimental collection, the IEEE CS corpus, provides no classification information. Experimental results show that the proposed method, which combines term semantics with integrated content and structure semantics (LSI-CASS), is feasible and produces better clustering quality than LSI-CAS and CASS.
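LSI's role can be illustrated with a small term-by-element matrix. The sketch below is pure Python and assumes no linear-algebra library. It works in three steps:

1. Compute the top-k left singular vectors U_k of the term-document matrix A, as eigenvectors of A A^T, using power iteration with deflation.
2. Project each element onto U_k.
3. Compare elements by cosine similarity in this concept space.

The toy matrix and its term names are hypothetical, and the CASS structural component is not modeled.

```python
import math
import random


def mat_vec(m, v):
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v


def top_eigenvectors(m, k, iters=300, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    size = len(m)
    m = [row[:] for row in m]
    vecs = []
    for _ in range(k):
        v = normalize([rng.random() + 0.1 for _ in range(size)])
        for _ in range(iters):
            v = normalize(mat_vec(m, v))
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))
        vecs.append(v)
        # Deflate: subtract lam * v v^T so the next pass finds the next eigenvector.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(size)] for i in range(size)]
    return vecs


def lsi_project(term_doc, k):
    """Project each column of term_doc onto the top-k left singular vectors of A."""
    n_terms, n_docs = len(term_doc), len(term_doc[0])
    aat = [[sum(term_doc[i][d] * term_doc[j][d] for d in range(n_docs))
            for j in range(n_terms)] for i in range(n_terms)]
    u_k = top_eigenvectors(aat, k)
    docs = [[term_doc[i][d] for i in range(n_terms)] for d in range(n_docs)]
    return [[sum(u[i] * doc[i] for i in range(n_terms)) for u in u_k] for doc in docs]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Rows: car, automobile, engine, flower, petal. Columns: three element nodes.
A = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]
raw = [[row[d] for row in A] for d in range(3)]
docs = lsi_project(A, k=2)
print(round(cosine(raw[0], raw[1]), 3), round(cosine(docs[0], docs[1]), 3))  # 0.5 1.0
```

In the raw term space, "car engine" and "automobile engine" share only one term. In the two-concept LSI space they map to the same concept, which is the kind of term-level semantics the abstract attributes to LSI.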
ISBN: 9781467300421 (print)
The increasing availability of GPS-embedded mobile devices has given rise to a new spectrum of location-based services, which have accumulated a huge collection of location trajectories. In practice, a large portion of these trajectories have a low sampling rate. For instance, the time interval between consecutive GPS points of some trajectories can be several minutes or even hours. At such a low sampling rate, most details of the movement are lost, which makes these trajectories difficult to process effectively. In this work, we investigate how to reduce the uncertainty in such trajectories. Specifically, given a low-sampling-rate trajectory, we aim to infer its possible routes. Our methodology takes full advantage of the rich information extracted from historical trajectories. We propose a systematic solution, the History based Route Inference System (HRIS), which comprises a series of novel algorithms that derive travel patterns from historical data and incorporate them into the route inference process. To validate the effectiveness of the system, we apply our solution to the map-matching problem, an important application scenario of this work, and conduct extensive experiments on a real taxi trajectory dataset. Experimental results demonstrate that HRIS achieves higher accuracy than existing map-matching algorithms for low-sampling-rate trajectories.
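The abstract does not detail HRIS's travel-pattern algorithms. The sketch below is a much simpler stand-in for the idea of letting historical trajectories guide route inference, under these assumptions:

- Sample points have already been snapped to road-graph nodes.
- Each gap between consecutive samples is filled with a Dijkstra shortest path.
- Edge cost is `length / (1 + alpha * historical_count)`, a hypothetical choice that favours frequently travelled segments.

```python
import heapq
import math
from collections import Counter


def edge_counts(historical_routes):
    """Count how often each directed road segment (u, v) appears in history."""
    counts = Counter()
    for route in historical_routes:
        counts.update(zip(route, route[1:]))
    return counts


def infer_segment(graph, counts, src, dst, alpha=1.0):
    """Dijkstra in which frequently travelled segments are cheaper (assumed cost)."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in graph.get(u, []):
            nd = d + length / (1.0 + alpha * counts[(u, v)])
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def infer_route(graph, counts, sampled_nodes, alpha=1.0):
    """Connect consecutive (already map-matched) sample points into one route."""
    route = [sampled_nodes[0]]
    for a, b in zip(sampled_nodes, sampled_nodes[1:]):
        seg = infer_segment(graph, counts, a, b, alpha)
        if seg is None:
            return None
        route.extend(seg[1:])
    return route


graph = {"A": [("B", 1.0), ("C", 1.1)], "B": [("D", 1.0)], "C": [("D", 1.1)]}
print(infer_route(graph, Counter(), ["A", "D"]))  # ['A', 'B', 'D']
history = edge_counts([["A", "C", "D"]] * 5)
print(infer_route(graph, history, ["A", "D"]))  # ['A', 'C', 'D']
```

Without history, the geometrically shorter detour via B wins. Once past trips show that drivers usually take C, the inferred route follows the habitual path instead.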
The data required for automatic optimization of user services usually exists in current systems, but that data is not modelled or linked in a way that facilitates automation. Knowledge engineering is a promising approach for managing the disparate data sets of communication service quality management information and the links across them. Once a knowledge base is in place, semantic techniques can be used to analyse service quality and suggest optimizations. This paper describes our work in building, populating and evaluating a knowledge base for an IPTV service in Home Area Networks. The knowledge base was populated using terminal reports. The characteristics of the approach were evaluated experimentally, and the evaluation results are presented in this paper.
This paper proposes a framework to identify the relevant law articles, including sentences and ranges of punishment, given the facts discovered in a criminal case of interest. The model is formulated as a two-stage machine learning classifier. The first stage determines a set of case diagnostic issues using a modular Artificial Neural Network (mANN). The second stage uses SVM-equipped C4.5 to determine the relevant legal elements, which lead to the identification of legal charges. The integrated multi-stage model aims to achieve high classification accuracy while preserving “arguability”. The working hypothesis is twofold. First, the mANN handles the complexity of case-level issue analysis well, with acceptable explanatory power. Second, C4.5 addresses the lesser degree of contingency and provides human-interpretable logic concerning the high-level context of legal codes.
ISBN: 9781627486040 (print)
The Cloud Computing Service (CCS) paradigm is changing the IT strategies of organizations in the digital world. Because CCS requires little upfront investment and uses lease-based pricing, it is especially relevant to Small and Medium Enterprises (SMEs). SMEs have limited resources and may not know their true valuation of IT prior to adoption. This research therefore investigates the factors that influence SMEs' strategic choice of CCS as an online service. Drawing on the Technology-Organization-Environment (TOE) paradigm, we identify both generic and context-specific factors from these three aspects and explain how the identified factors affect SMEs' CCS strategic choices. We hope this research contributes to innovation diffusion theory and the IT strategy literature. We also hope this ongoing research will generate insights for CCS vendors that target the SME sector, and will help government administrators make appropriate policies and provide support for SMEs.
ISBN: 9781467351645 (print)
The problem of scalable knowledge extraction from the Web has attracted much attention in the past decade. However, how to extract structured knowledge from semi-structured Websites in a fully automatic and scalable way remains underexplored. In this work, we define table-formatted structured data with a clear schema as knowledge tables. We propose Kable, a scalable learning system that extracts knowledge from semi-structured Websites automatically, continuously (never ending), and at scale. Kable consists of two major components: automatic wrapper induction and schema matching. Compared with state-of-the-art automatic wrappers for semi-structured Websites, our approach runs about 1,000 times faster, which makes Web-scale knowledge extraction possible. In addition, we propose a novel schema matching solution that works effectively on the automatically extracted structured data. Running continuously for 3 months on ten Web servers, we extracted 427,105,009 knowledge facts. Manual labeling of a sample of the extracted knowledge shows precision of up to 87% for supporting various Web applications.
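The abstract describes Kable's schema matching only as novel. The sketch below uses a common baseline instead, which is not Kable's method. It matches each column of an automatically extracted table to a target attribute by the Jaccard overlap between the column's values and example values already known for that attribute. The threshold, the greedy one-to-one assignment, and the subject-column convention are all assumptions.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_columns(table, schema_examples, threshold=0.3):
    """Greedily map table columns to schema attributes by value overlap.

    table: list of rows produced by a (hypothetical) auto-induced wrapper.
    schema_examples: attribute name -> example values already in the knowledge base.
    """
    columns = list(zip(*table))
    scored = sorted(
        ((jaccard(col, values), i, attr)
         for i, col in enumerate(columns)
         for attr, values in schema_examples.items()),
        reverse=True,
    )
    mapping, used = {}, set()
    for score, i, attr in scored:
        if score < threshold:
            break  # remaining pairs overlap even less
        if i not in mapping and attr not in used:
            mapping[i] = attr
            used.add(attr)
    return mapping


def to_facts(table, mapping, subject_col=0):
    """Turn each row into (subject, attribute, value) facts for matched columns."""
    return [(row[subject_col], attr, row[i])
            for row in table
            for i, attr in sorted(mapping.items()) if i != subject_col]


table = [["Beijing", "China", "21.5"], ["Paris", "France", "2.1"], ["Tokyo", "Japan", "13.9"]]
schema = {"city": ["Beijing", "Tokyo", "London"], "country": ["China", "Japan", "France", "UK"]}
m = match_columns(table, schema)
print(sorted(m.items()))  # [(0, 'city'), (1, 'country')]
print(to_facts(table, m)[0])  # ('Beijing', 'country', 'China')
```

The numeric third column overlaps no known attribute, so it stays unmatched and produces no facts.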
Data collected from mobile phones hold potential knowledge about important behavior patterns of individuals. In this paper, we present approaches to discovering personal mobility and characteristics based on...