ISBN (digital): 9781510627741
ISBN (print): 9781510627741
Computer-vision computations require a large number of multiplications, which creates a bottleneck. According to the work of Zhenhong Liu (1), the multiplications in these algorithms do not always require the high precision provided by processors. As a result, computational redundancy can be reduced by approximating multiplications. Following this approach, in this paper we investigate two major algorithms, namely the convolutional neural network (CNN) and the scale-invariant feature transform (SIFT), and quantify their tolerance to multiplication approximation. A multiplication is approximated by injecting a random value into each precise multiplication result. The INRIA and OXFORD datasets were used in the SIFT analysis, while the CIFAR-10 and MNIST datasets were used for the CNN experiments. The results show that SIFT can withstand only a small percentage of multiplication approximation, while CNN can tolerate over 30% of multiplication approximation.
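The error-injection idea in this abstract can be illustrated with a small sketch. The noise model used here (a uniform relative error bounded by `max_rel_error`) and the names `approx_multiply` and `approx_dot` are assumptions made for illustration; the abstract does not specify how the random value is drawn or how the approximation percentage is defined.

```python
import random

def approx_multiply(a, b, max_rel_error, rng):
    """Return a*b perturbed by a random relative error (assumed noise model)."""
    exact = a * b
    noise = rng.uniform(-max_rel_error, max_rel_error)
    return exact * (1.0 + noise)

def approx_dot(xs, ys, max_rel_error=0.05, seed=0):
    """Dot product in which every multiplication is replaced by its noisy version."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return sum(approx_multiply(x, y, max_rel_error, rng) for x, y in zip(xs, ys))
```

Running an algorithm with `approx_dot` in place of exact dot products, and raising `max_rel_error` until output quality degrades, is one plausible way to probe tolerance of the kind the abstract reports.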
ISBN (print): 9781424451821
Traditional methods for solving multi-class problems, known as multi-SVMs, combine the results of several decomposed binary SVMs to form the final decision function. The prevalent methods are 'one vs. one' and 'one vs. all', which use a voting scheme among the binary classifiers to determine the winning class. However, they do not scale well with the data size and the number of classes. The Core Vector Machine (CVM) is a promising technique for scaling up a binary SVM to large data sets with a greedy-expansion strategy, in which the kernels must be normalized to ensure the equivalence between the kernel-induced spaces of the SVM and the Minimum Enclosing Ball (MEB). The idea behind CVM can also be used to formulate a multi-SVM as an MEB problem, for which we propose an approximate MEB algorithm with smaller core sets. Experimental results on synthetic and benchmark data sets demonstrate the competitive performance of the proposed method in both training time and training accuracy.
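To make the Minimum Enclosing Ball idea concrete, here is a minimal sketch of a well-known farthest-point iteration (due to Bădoiu and Clarkson) for approximating an MEB in plain Euclidean space. It is not the algorithm proposed in this paper and it does not work in a kernel-induced space; the function name `approx_meb` and the parameter `eps` are illustrative assumptions.

```python
import math

def approx_meb(points, eps=0.1):
    """Approximate minimum enclosing ball via farthest-point updates.

    Starts at an arbitrary point and repeatedly moves the center a
    shrinking step toward the current farthest point. The farthest
    points visited play the role of a core set.
    """
    center = list(points[0])
    steps = math.ceil(1.0 / eps ** 2)
    for i in range(1, steps + 1):
        far = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    # Report the radius that actually encloses every input point.
    radius = max(math.dist(p, center) for p in points)
    return center, radius
```

The returned ball always encloses all points because the radius is measured after the final update; the number of iterations depends only on `eps`, not on the number of points.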
ISBN (print): 9781450369091
The Maximum Range-Sum (MaxRS) query is an important operator in spatial databases for retrieving regions of interest (ROIs). Given a rectangular query size a × b and a set of spatial objects associated with positive weights, MaxRS retrieves rectangular regions Q of size a × b such that the sum of the weights of the objects covered by Q (i.e., the range sum) is maximized. Because location acquisition is inaccurate, the collected locations of spatial objects are inherently uncertain and imprecise, and can be modeled as uncertain objects. In this paper, we propose a Probabilistic Maximum Range-Sum (PMaxRS) query over uncertain spatial objects, which obtains a set gamma* of rectangles such that, for each region Q in gamma*, the probability that Q has the maximum range sum exceeds a user-specified threshold P_t. We show that determining whether a given region Q belongs to gamma* is #P-complete. To tackle this hardness, we introduce the PMaxRS framework, which is based on pruning and refinement strategies. In the pruning step, we propose a candidate generation technique to reduce the search space. In the refinement step, we design an efficient sampling-based approximation algorithm to verify the remaining candidate regions. Extensive experiments demonstrate the effectiveness and efficiency of our algorithms.
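The deterministic MaxRS definition given in this abstract can be sketched with a brute-force search. This covers only exact object locations, not the probabilistic PMaxRS query or the pruning and sampling steps the paper proposes. The function name `max_range_sum` and the (x, y, weight) tuple layout are assumptions for illustration. The sketch relies on the observation that some optimal closed rectangle can be shifted until its left edge and bottom edge each touch an object coordinate.

```python
def max_range_sum(objects, a, b):
    """Brute-force MaxRS for exact point locations.

    objects: list of (x, y, weight) with positive weights.
    Returns (best_sum, lower_left_corner) of an a-by-b closed rectangle.
    """
    best_sum, best_corner = 0, None
    xs = sorted({x for x, _, _ in objects})
    ys = sorted({y for _, y, _ in objects})
    for x0 in xs:          # candidate left edges
        for y0 in ys:      # candidate bottom edges
            total = sum(
                w for x, y, w in objects
                if x0 <= x <= x0 + a and y0 <= y <= y0 + b
            )
            if total > best_sum:
                best_sum, best_corner = total, (x0, y0)
    return best_sum, best_corner
```

The search is cubic in the number of objects, so it is only useful as a reference for checking faster methods on small inputs.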
ISBN (print): 9728865414
In many applications, including sensor networks, telecommunications data management, network monitoring, and financial applications, data arrives as a stream. Interest in algorithms over data streams has grown recently. This paper introduces the problem of sampling from landmark windows of recent data items in data streams and presents a random sampling algorithm for this problem. The presented algorithm, called the SMS algorithm, is a stratified multistage sampling algorithm for landmark windows. It uses a different sampling fraction in each stratum of the landmark window, and it works even when the number of data items in the landmark window varies dramatically over time. Theoretical analysis and experiments show that the algorithm is effective and efficient for continuous data stream processing.
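For context on sampling from a window whose size is not known in advance, the following is a minimal sketch of classic reservoir sampling (Algorithm R), which keeps a uniform sample of everything seen since a landmark point. It is a simpler, unstratified baseline and not the SMS algorithm described in this abstract; the function name `reservoir_sample` is an assumption.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            sample.append(item)      # fill the reservoir first
        else:
            j = rng.randrange(n)     # uniform index in [0, n)
            if j < k:
                sample[j] = item     # replace with probability k/n
    return sample
```

Every item seen so far stays in the reservoir with equal probability, which is the property a stratified scheme deliberately relaxes by giving different strata different sampling fractions.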
ISBN (print): 9781424498642
In this work we present a framework for estimating a rather general class of multivariate jump-diffusion processes. We assume that a continuous, unobservable system of linear diffusion processes is additively mixed with a vector of discrete jump processes and a conventional multivariate white-noise process. This sum is observed over time as a multivariate jump-diffusion time series. Our objective is to identify realizations of all components of the mix in a robust and scalable way. First, we formulate this model as a Mixed-Integer Programming (MIP) optimization problem, extending the traditional least-squares estimation framework to include discrete jump processes. Then we propose an approximate Dynamic Programming (DP) algorithm that is reasonably fast and accurate and scales polynomially with the time horizon. Finally, we provide numerical test cases illustrating the performance and robustness of the algorithm.
ISBN (print): 9781665416818
Edge data caching has attracted tremendous attention in recent years. Service providers can cache data at nearby locations to serve their app users with relatively low latency. The key to enhancing the user experience is to cache data on suitable edge servers so as to achieve the service provider's objectives, e.g., minimizing data retrieval latency and minimizing data caching cost. However, Quality of Experience (QoE), which significantly affects service providers' caching benefit, has not been adequately considered in existing studies of edge data caching. This is not a trivial issue, because QoE and Quality of Service (QoS) are not linearly correlated. This significantly complicates the formulation of cost-effective edge data caching strategies under a caching budget, which limits the number of cache spaces that can be hired on edge servers. In this paper, we consider the problem of QoE-aware edge data caching, aiming to optimize users' overall QoE under the caching budget. We first build the optimization model and prove that the problem is NP-complete. To solve the problem efficiently in large-scale scenarios, we propose a heuristic approach and prove its approximation ratio. Extensive experiments demonstrate that the proposed MPSG algorithm outperforms state-of-the-art approaches by at least 68.77%.
This paper discusses the computation of matrix chain products of the form M1 × M2 × ··· × Mn, where the Mi are matrices. The order in which the matrices are computed affects the number of ...
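To illustrate why the multiplication order matters, here is a minimal sketch of the textbook dynamic program for the matrix chain ordering problem. The preview above is truncated, so the cost measure used here (the number of scalar multiplications) is an assumption, and this is not necessarily the algorithm the paper studies. The name `matrix_chain_cost` and the `dims` convention (matrix i has shape dims[i] × dims[i+1]) are illustrative.

```python
def matrix_chain_cost(dims):
    """Minimum scalar multiplications needed to evaluate a matrix chain.

    dims has length n + 1 for a chain of n matrices; matrix i
    (0-based) has shape dims[i] x dims[i + 1].
    """
    n = len(dims) - 1
    if n <= 0:
        return 0
    # cost[i][j] = cheapest way to multiply matrices i..j inclusive
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)  # split between matrix k and k + 1
            )
    return cost[0][n - 1]
```

For three matrices of shapes 10×30, 30×5, and 5×60, multiplying the first two first costs 10·30·5 + 10·5·60 = 4500 scalar multiplications, whereas multiplying the last two first costs 30·5·60 + 10·30·60 = 27000.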
ISBN (print): 9781509053360
An increasing number of high-performance networks are built over the existing IP network infrastructure to provision dedicated channels for big data transfer. The links in these overlay networks correspond to underlying paths and may share lower-level link segments. We consider a model of overlay networks that incorporates correlated link capacities and linear capacity constraints (LCCs) to formulate such shared bottleneck components. The overlay links are typically shared by multiple users through advance reservations, so bandwidth availability varies over future time. Efficient bandwidth scheduling algorithms are therefore needed to improve network resource utilization and to meet users' transport requirements. We investigate two advance scheduling problems in overlay networks with LCCs, Fixed-Bandwidth Path and Varying-Bandwidth Path, with the objective of minimizing the data transfer end time for a given data size. We prove that both problems are NP-complete and non-approximable, and propose heuristic algorithms that use a gradual relaxation procedure on the maximum number of links from each LCC allowed for path computation. The superior performance of these heuristics is verified by extensive simulation results in comparison with optimal and greedy strategies.
ISBN (digital): 9781728180755
ISBN (print): 9781728180755
This paper describes the application of the cycles merging algorithm (CMA) to data compression using an approximate solution to the Shortest Common Superstring (SCS) problem, which is reduced to the maximum asymmetric traveling salesman problem (MaxTSP). SCS is defined as follows: given a collection of strings S_1, ..., S_m, find the shortest string S such that each S_i appears as a substring (a consecutive block) of S. The algorithm is tested for data compression on randomly generated data with specified parameters, and the results are compared with those of data compression by the greedy and 4-approximation algorithms. The efficiency of the CMA for the maximum problem is investigated.
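The SCS definition above, and the greedy baseline the abstract compares against, can be sketched as follows. This is the standard greedy heuristic (repeatedly merge the pair with the largest overlap), not the cycles merging algorithm itself; the names `overlap` and `greedy_superstring` are illustrative assumptions.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(strings):
    """Greedy approximation of the shortest common superstring."""
    # Drop duplicates and strings already contained in another string.
    items = sorted(
        s for s in set(strings)
        if not any(s != t and s in t for t in strings)
    )
    while len(items) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = items[i] + items[j][k:]
        items = [s for idx, s in enumerate(items) if idx not in (i, j)] + [merged]
    return items[0] if items else ""
```

The result always contains every input string as a substring, but it is not guaranteed to be the shortest such string.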
ISBN (print): 9781450335317
SimRank is a similarity measure for graph nodes that has numerous applications in practice. Scalable SimRank computation has been the subject of extensive research for more than a decade, and yet none of the existing solutions can efficiently derive SimRank scores on large graphs with provable accuracy guarantees. In particular, the state-of-the-art solution requires up to a few seconds to compute a SimRank score in million-node graphs, and does not offer any worst-case assurance in terms of the query error. This paper presents SLING, an efficient index structure for SimRank computation. SLING guarantees that each SimRank score returned has at most epsilon additive error, and it answers any single-pair and single-source SimRank queries in O(1/epsilon) and O(n/epsilon) time, respectively. These time complexities are near-optimal, and are significantly better than the asymptotic bounds of the most recent approach. Furthermore, SLING requires only O(n/epsilon) space (which is also near-optimal in an asymptotic sense) and O(m/epsilon + n log(n/delta)/epsilon^2) pre-computation time, where delta is the failure probability of the preprocessing algorithm. We experimentally evaluate SLING on a variety of real-world graphs with up to several million nodes. Our results demonstrate that SLING is up to 10000 times (resp. 110 times) faster than competing methods for single-pair (resp. single-source) SimRank queries, at the cost of higher space overheads.
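As background for the quantity SLING approximates, here is a minimal sketch of the naive iterative SimRank computation based on the standard definition: a node is fully similar to itself, and two distinct nodes are similar in proportion to the average similarity of their in-neighbors, scaled by a decay factor. This is not SLING and has none of its guarantees; the function name `simrank`, the decay factor `c=0.6`, and the dictionary-based graph layout are assumptions for illustration.

```python
def simrank(in_neighbors, c=0.6, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration.

    in_neighbors: dict mapping each node to a list of its in-neighbors.
    Returns a dict mapping (u, v) pairs to similarity scores.
    """
    nodes = list(in_neighbors)
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iterations):
        new = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[(u, v)] = 1.0
                    continue
                iu, iv = in_neighbors[u], in_neighbors[v]
                if not iu or not iv:
                    new[(u, v)] = 0.0  # no in-neighbors: similarity is 0
                    continue
                total = sum(sim[(x, y)] for x in iu for y in iv)
                new[(u, v)] = c * total / (len(iu) * len(iv))
        sim = new
    return sim
```

This version stores a score for every pair of nodes, so its quadratic space cost makes it impractical for the million-node graphs discussed in the abstract; it is useful mainly as a reference on small graphs.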