Identifying partially duplicated text segments among documents is an important research problem with applications in plagiarism detection and near-duplicate web page detection. We investigate the problem of local simi...
ISBN: 9781538637906 (print)
Enterprise architects and information system designers need to understand and manage workflows, data flows, and social interactions to design tools and systems for well-coordinated organizational operations. However, the nature of organizations has transformed drastically in recent years due to the wide-scale use of new computing technologies. Disintegrated structures, large quantities of frequently generated data, and ambiguous system and interaction boundaries are some of the obvious identifiers of a modern enterprise, where poorly designed coordination can lead to serious privacy risks. Older coordination modeling frameworks do not fit these new organizational settings well, and a need for alternative models and frameworks has emerged. In this paper, we propose a privacy-aware conceptual framework for understanding coordination by identifying and mapping work, data, and interaction patterns in organizational environments. These propositions are intended to help practitioners develop an updated understanding of coordination that also serves privacy needs.
ISBN: 9789897582127 (print)
Clustering analysis is a widely used technique in bioinformatics and biochemistry for a variety of applications, such as detecting new cell types and evaluating drug response. Since different applications and cells may require different clustering algorithms, combining multiple clustering results into a consensus clustering using distributed clustering is a popular and efficient way to improve the quality of clustering analysis. Existing solutions are commonly based on unsupervised techniques, which do not require any a priori knowledge. In certain cases, however, prior information on particular labelings may be available, and performance is expected to improve when this information is used. To this end, we propose two semi-supervised distributed clustering algorithms and evaluate their performance for different base clusterings.
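The abstract does not describe the two algorithms. As a rough illustration of how prior labeling information can be folded into a consensus step, the sketch below builds a co-association matrix (a common consensus-clustering device, not necessarily the authors' choice) and lets hypothetical must-link and cannot-link constraints override the base clusterings' votes before linking points whose agreement exceeds a threshold. The function names, the constraint format, and the 0.5 threshold are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def co_association(base_labelings: Sequence[Sequence[int]], n: int) -> List[List[float]]:
    """Fraction of base clusterings that place points i and j in the same cluster."""
    m = [[0.0] * n for _ in range(n)]
    weight = 1.0 / len(base_labelings)
    for labels in base_labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += weight
    return m


def semi_supervised_consensus(
    base_labelings: Sequence[Sequence[int]],
    n: int,
    must_link: Sequence[Tuple[int, int]] = (),    # hypothetical prior: pairs known to co-cluster
    cannot_link: Sequence[Tuple[int, int]] = (),  # hypothetical prior: pairs known to differ
    threshold: float = 0.5,                       # assumed agreement threshold
) -> List[int]:
    m = co_association(base_labelings, n)
    # Prior knowledge overrides the votes of the base clusterings.
    for i, j in must_link:
        m[i][j] = m[j][i] = 1.0
    for i, j in cannot_link:
        m[i][j] = m[j][i] = 0.0

    # Union-find: link every pair whose agreement exceeds the threshold.
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if m[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel connected components as 0, 1, 2, ... in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

In this sketch a cannot-link constraint only removes the direct edge between its two points; if other high-agreement or must-link edges connect them through a third point, connected components will still merge them, so a real semi-supervised method would need a stronger constraint-handling rule.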
In this paper, we propose a novel approach that uses full processor utilization to compute a particular class of dynamic programming problems in parallel. This class includes algorithms such as Longest Common Subseque...
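The abstract is truncated, so the proposed scheduling scheme is unknown. For context, the standard way to expose parallelism in the Longest Common Subsequence table is wavefront (anti-diagonal) ordering, sketched below; this is a generic technique, not necessarily the paper's method, and the inner loop runs sequentially here where a parallel version would distribute it across processors.

```python
def lcs_length_wavefront(a: str, b: str) -> int:
    """LCS length computed anti-diagonal by anti-diagonal (wavefront order)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # Cell (i, j) depends only on (i-1, j-1), (i-1, j) and (i, j-1), which all lie on
    # earlier anti-diagonals (smaller i + j), so each diagonal can be processed once
    # the previous ones are complete.
    for d in range(2, n + m + 1):  # d = i + j
        i_lo, i_hi = max(1, d - m), min(n, d - 1)
        # The cells of one anti-diagonal are mutually independent; a parallel version
        # would split this inner loop across processors (run sequentially here).
        for i in range(i_lo, i_hi + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]
```

The number of cells per anti-diagonal grows up to min(n, m) and then shrinks again, so the first and last diagonals offer little parallel work; this imbalance is one reason keeping every processor busy is nontrivial.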
ISBN: 9781450355490 (print)
AGEL-SVM (Approximate Global Exhaustive Local sampling SVM) is an extension of a kernel Support Vector Machine (SVM) designed for distributed computing. The dual form of the SVM is typically solved using sequential minimal optimization (SMO), which iterates very fast if the full kernel matrix fits in a computer's memory. AGEL-SVM partitions the feature space into subproblems such that the kernel matrix of each subproblem fits in memory, approximating the data outside each partition. AGEL-SVM achieves Cohen's Kappa and accuracy similar to those of the underlying SMO implementation, while its training times decreased greatly when running on a 128-worker MATLAB pool on Amazon EC2. Predictor evaluation is also faster because each partition has fewer support vectors.
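The motivation for partitioning can be made concrete with a back-of-the-envelope calculation. Assuming a dense kernel matrix stored as 8-byte floats and equally sized partitions (both assumptions, not details from the abstract), the helpers below compute the memory of a full kernel, the largest partition whose kernel fits a given budget, and the minimum number of partitions needed. They ignore the extra memory spent approximating the data outside each partition.

```python
import math

BYTES_PER_ENTRY = 8  # assumption: dense kernel matrix of float64 values


def kernel_bytes(n_samples: int) -> int:
    """Memory needed for a dense n x n kernel matrix."""
    return n_samples * n_samples * BYTES_PER_ENTRY


def max_partition_size(memory_bytes: int) -> int:
    """Largest n whose n x n kernel matrix fits in memory_bytes."""
    return math.isqrt(memory_bytes // BYTES_PER_ENTRY)


def min_partitions(n_samples: int, memory_bytes: int) -> int:
    """Fewest partitions (each no larger than the cap) whose kernels fit in memory_bytes."""
    cap = max_partition_size(memory_bytes)
    return -(-n_samples // cap)  # integer ceiling division
```

For example, one million samples would need about 8 TB for the full kernel, whereas an 800 MB budget per worker allows partitions of 10,000 samples, i.e. 100 partitions. Because kernel memory grows quadratically while the number of partitions grows only linearly, splitting the problem shrinks the total kernel storage as well.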
ISBN: 9781538637906 (print)
Large-scale computing and applications with large data sets often suffer from high cache miss rates because they use arrays of structures or linked lists. When these data structures are traversed, the memory accesses may follow constant long strides across pages in the virtual address (VA) space but are mostly scattered in the physical address (PA) space. Therefore, a conventional stride prefetcher (SP) based on PAs cannot prefetch data efficiently in this case. In this paper, we propose a hardware data prefetching design named Virtual Address-based Stride Prefetcher (VASP) to exploit the prefetch potential of long-stride access patterns in the VA space. VASP detects access strides on VAs, including strides that cross page boundaries, predicts a new VA, and prefetches the data after address translation. We implement VASP in the gem5 simulator and use the SPEC CPU2006 integer benchmarks to evaluate its performance. Our simulation results show that, compared with SP, applying VASP to caches offers up to 43% performance improvement in the mcf benchmark and improves overall performance by 6%.
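A minimal software model of a PC-indexed stride table that operates on virtual addresses may help illustrate the idea. This is a generic stride-prefetcher sketch under assumed parameters (a saturating confidence counter capped at 3, a prefetch threshold of 2, and an injectable `translate` function standing in for the TLB and page walk), not the VASP hardware design from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Entry:
    last_va: int
    stride: int = 0
    confidence: int = 0


class VirtualStridePrefetcher:
    def __init__(self, threshold: int = 2,
                 translate: Optional[Callable[[int], int]] = None) -> None:
        self.table: Dict[int, Entry] = {}  # indexed by the PC of the load
        self.threshold = threshold         # assumed confidence needed to prefetch
        # Hypothetical VA -> PA mapping standing in for the TLB; identity by default.
        self.translate = translate or (lambda va: va)

    def access(self, pc: int, va: int) -> Optional[int]:
        """Observe a load at virtual address va; return a PA to prefetch, or None."""
        e = self.table.get(pc)
        if e is None:
            self.table[pc] = Entry(last_va=va)
            return None
        # The stride is measured on VAs, so it stays constant even when consecutive
        # accesses land on physically scattered pages.
        stride = va - e.last_va
        if stride != 0 and stride == e.stride:
            e.confidence = min(e.confidence + 1, 3)
        else:
            e.stride, e.confidence = stride, 0
        e.last_va = va
        if e.confidence >= self.threshold:
            # Predict the next VA, then translate it before issuing the prefetch.
            return self.translate(va + e.stride)
        return None
```

With this configuration, prefetching for a load begins once the same stride has been observed three times in a row. Plugging in a `translate` function that maps consecutive virtual pages to unrelated physical pages shows why the detection must happen on VAs: the PA deltas would look irregular to a PA-based stride detector.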
ISBN: 9781538637906 (print)
With the extensive use of Internet of Vehicles (IoV) technologies in vehicle traffic management, real-time analysis of vehicle behavior trajectories is of great significance for assessing traffic conditions and avoiding abnormal conditions. This paper presents a solution that efficiently processes real-time streaming trajectory data and extracts temporal and spatial anomaly information. To represent the local features of a trajectory and to address the large information loss of feature-point extraction algorithms, we propose a trajectory partitioning strategy based on multiple motion features and a similarity measure based on trajectory structure. Based on this strategy and measure, we design a distributed clustering algorithm for streaming trajectories that improves clustering efficiency. To avoid the massive computation of distances and neighborhood densities in trajectory anomaly detection, the data set is pruned using the trajectory clustering results, which improves the algorithm's efficiency and the real-time performance of abnormal trajectory detection.
ISBN: 9781538637906 (print)
With the development of information technology, real-time data stream processing (RTDSP) has become a popular research topic. The first step of RTDSP is data collection, which requires a data collector to receive data from the source and send it to the sink. Apache Flume, a distributed and reliable framework used for this purpose, has some limitations in load balancing and storage. In this paper, we aim to improve performance and availability when collecting unstable real-time big data streams. We therefore propose a new load balancing strategy based on free memory size, and a storage strategy that integrates the memory channel with a multi-file channel to reduce disk and network overhead. Experimental results show that availability and performance improve under poor network conditions, high availability requirements, intense competition for memory resources, and large data sizes. Specifically, availability exceeds 99.999%, and performance improves by 10% to 50% under different conditions.
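The abstract gives only the criterion (free memory size), not the selection rule. The sketch below shows one plausible reading under stated assumptions: each event goes to the downstream node with the most free memory that can still hold it, and the simulated free memory is decremented after each assignment. The names `pick_sink` and `dispatch` and the greedy max-free rule are hypothetical and are not part of Flume's API.

```python
from typing import Dict, Iterable, List


def pick_sink(free_memory: Dict[str, int], event_size: int) -> str:
    """Return the sink with the most free memory that can still hold the event."""
    candidates = {name: free for name, free in free_memory.items() if free >= event_size}
    if not candidates:
        raise RuntimeError("no sink has enough free memory for this event")
    return max(candidates, key=candidates.__getitem__)


def dispatch(event_sizes: Iterable[int], free_memory: Dict[str, int]) -> List[str]:
    """Assign events to sinks, updating the simulated free memory after each one."""
    free = dict(free_memory)  # work on a copy; a real collector would poll live metrics
    assignment: List[str] = []
    for size in event_sizes:
        sink = pick_sink(free, size)
        free[sink] -= size
        assignment.append(sink)
    return assignment
```

Compared with round-robin selection, a memory-aware rule steers traffic away from nodes whose buffers are nearly full, which is the situation the abstract targets (intense competition for memory resources). Ties go to whichever sink appears first in the dictionary.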
ISBN: 9781538630778 (print)
A small dominating set is useful in several settings, including wireless networks, document summarization, and secure system design. In this paper, we start by studying three distributed algorithms that produce small dominating sets in a few rounds. We interpret these algorithms in the natural shared-memory setting and experiment with them on a multi-core CPU. Based on the experimental observations, we propose variations of the three algorithms and show how these variations offer interesting trade-offs between the size of the dominating set produced and the time taken.
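The abstract does not name the three algorithms. To illustrate why round-based distributed algorithms map naturally onto shared memory, the sketch below implements a simple one-round heuristic chosen for illustration (an assumption, not one of the paper's algorithms): every vertex nominates the vertex with the highest (degree, id) in its closed neighborhood. Each vertex is dominated by its own nominee, so the union of nominations is always a dominating set, and because each nomination reads only neighbor degrees, the loop iterations are independent and could run on separate cores.

```python
from typing import Dict, List, Set


def one_round_dominating_set(adj: Dict[int, List[int]]) -> Set[int]:
    """Each vertex nominates the max-(degree, id) vertex in its closed neighborhood."""
    chosen: Set[int] = set()
    for v, nbrs in adj.items():  # iterations are independent, hence parallelizable
        closed = [v, *nbrs]
        chosen.add(max(closed, key=lambda u: (len(adj[u]), u)))
    return chosen


def is_dominating(adj: Dict[int, List[int]], s: Set[int]) -> bool:
    """True if every vertex is in s or adjacent to a vertex in s."""
    return all(v in s or any(u in s for u in adj[v]) for v in adj)
```

On a five-vertex path this heuristic selects three vertices, while two suffice, which illustrates the kind of trade-off between set size and the number of rounds (and hence running time) that the abstract refers to.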
ISBN: 9781538637906 (print)
To guarantee service reliability and reduce the resource waste of the traditional dual-path protection method, this paper proposes a service-reliability-based resource mapping method for smart grid Fiber-Wireless access networks that adopts virtualization technology. First, a primary link mapping model based on priority and fault probability is established. It classifies services by priority level and assigns high-reliability links to high-priority services to improve the service reliability of the whole network. Then, a backup link mapping model aimed at saving resources is established, which assigns as many services as possible to each backup link. Finally, a genetic algorithm is used to solve the mapping model. The evaluation results show that the proposed method increases service reliability and improves resource utilization.
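The paper solves its mapping model with a genetic algorithm. As a much simpler stand-in that only illustrates the priority- and fault-probability-based idea behind the primary link model, the greedy sketch below serves services in descending priority order and gives each one the most reliable link that still has enough spare capacity. The data layout, the capacity units, and the greedy rule are assumptions; the sketch omits the backup link model and is not the paper's method.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (service name, priority, bandwidth demand); higher priority = more critical.
Service = Tuple[str, int, int]


def map_primary_links(
    services: List[Service],
    links: Dict[str, Tuple[float, int]],  # link name -> (fault probability, capacity)
) -> Dict[str, Optional[str]]:
    remaining = {name: cap for name, (_, cap) in links.items()}
    # Most reliable (lowest fault probability) links first.
    by_reliability = sorted(links, key=lambda name: links[name][0])
    mapping: Dict[str, Optional[str]] = {}
    # Serve high-priority services first so they claim the most reliable capacity.
    for name, _priority, demand in sorted(services, key=lambda s: s[1], reverse=True):
        mapping[name] = None  # stays None if no link has enough spare capacity
        for link in by_reliability:
            if remaining[link] >= demand:
                remaining[link] -= demand
                mapping[name] = link
                break
    return mapping
```

A greedy rule like this commits early and can strand capacity that a search method such as a genetic algorithm would redistribute, which is one reason a global optimizer is attractive once backup-link sharing is added to the model.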