Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, and information privacy. Although Hadoop makes the processing and storage of big data efficient and easy, two parts must be considered: a storage part (the Hadoop Distributed File System) and a processing part (MapReduce). The performance of a MapReduce job depends on how the data are partitioned and then processed in parallel, and current methods are far from optimal. Skew-aware partitioning is used to detect the data-partitioning problems that degrade performance. In this paper, we enhance the split mechanism of Hadoop without making it more complex than the existing one. Building on the long line of work on data partitioning in the database community, we employ more intelligent methods, such as data sampling to detect record skew and program code analysis to estimate computation cost, to optimize data partitioning in MapReduce. Experimental results demonstrate that our solution improves MapReduce processing performance remarkably over the traditional Hadoop implementation.
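The abstract does not reproduce the partitioner itself; the following is a minimal sketch of the general skew-aware idea, assuming per-key frequencies from a sampling pass are passed to the job via configuration (the class name and the "skew.heavy.keys" property are hypothetical, not from the paper):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical skew-aware partitioner: keys that sampling flagged as
// "heavy" each get a dedicated reducer; all other keys are hashed over
// the remaining partitions.
public class SkewAwarePartitioner extends Partitioner<Text, IntWritable>
        implements Configurable {

    private Configuration conf;
    private final Map<String, Integer> heavyKeys = new HashMap<>();

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
        // Assumed property: comma-separated keys flagged by sampling.
        String[] keys = conf.getStrings("skew.heavy.keys", new String[0]);
        for (int i = 0; i < keys.length; i++) {
            heavyKeys.put(keys[i], i);
        }
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        Integer slot = heavyKeys.get(key.toString());
        if (slot != null && heavyKeys.size() < numPartitions) {
            // Reserve the first partitions for heavy keys, one each.
            return slot % numPartitions;
        }
        // Everything else is hashed over the remaining partitions.
        int remaining = Math.max(1, numPartitions - heavyKeys.size());
        int offset = Math.min(heavyKeys.size(), numPartitions - 1);
        return offset + (key.hashCode() & Integer.MAX_VALUE) % remaining;
    }
}
```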
ISBN (print): 9781509028269
Increasingly, there is a variety of important applications for image processing on mobile phones, and the performance of image processing applications has thus become a focus of research activity. OpenCV provides APIs that let programmers develop image processing programs with ease. The new OpenCV 3.0 enables an OpenCL flow that aims to make applications developed with OpenCV run fast on heterogeneous multi-core systems. Although OpenCL programs are portable, their performance still needs to be tuned for different architectures. In this paper, we demonstrate the optimization flow for gesture recognition applications with OpenCV 3.0 on Mali GPUs. In this case study, several optimization techniques are devised for the flow, including vectorization, increasing the vector width via layout transformation, and kernel fusion. Preliminary experimental results show that our scheme is effective in optimizing the OpenCV 3.0 flow for gesture recognition applications on embedded heterogeneous multi-core systems.
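The paper's kernels are not given in the abstract; the snippet below is an illustrative sketch of the vectorization technique it names, contrasting a scalar threshold kernel with a float4-vectorized variant, held as OpenCL C source in Java strings (the kernel itself is a hypothetical example, not the paper's code):

```java
// Illustrative only: on Mali GPUs, vector types such as float4 let one
// work-item process four pixels per iteration, one of the optimizations
// the paper applies.
public final class ThresholdKernels {

    // Scalar version: one pixel per work-item.
    public static final String SCALAR_KERNEL =
        "__kernel void threshold(__global const float* in,\n" +
        "                        __global float* out, float t) {\n" +
        "    int i = get_global_id(0);\n" +
        "    out[i] = in[i] > t ? 1.0f : 0.0f;\n" +
        "}\n";

    // Vectorized version: four pixels per work-item via float4; the
    // global work size shrinks by 4x accordingly.
    public static final String VECTOR_KERNEL =
        "__kernel void threshold4(__global const float4* in,\n" +
        "                         __global float4* out, float t) {\n" +
        "    int i = get_global_id(0);\n" +
        "    out[i] = select((float4)(0.0f), (float4)(1.0f),\n" +
        "                    in[i] > (float4)(t));\n" +
        "}\n";

    private ThresholdKernels() {}
}
```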
ISBN (print): 9783319480565
The proceedings contain 29 papers. The special focus in this conference is on Big Data Analytics, Cloud Data Management, Internet of Things, Security, Privacy Engineering, Data Protection, Data Hiding, Context-Based Data Analysis, and Emerging Data Management Systems and Applications. The topics include: incorporating trust, certainty and importance of information into knowledge processing systems - an approach; incremental parallel support vector machines for classifying large-scale multi-class image datasets; a large-scale two-level clustering similarity search with MapReduce; immune approach to the protection of IoT devices; heuristic-guided verification for fast congestion detection on wireless sensor networks; security risk management in the aviation turnaround sector; a novel encryption mechanism for door lock; information and identity theft without ARP spoofing in LAN environments; a watermarking framework for outsourced and distributed relational databases; face quality measure for face authentication; using graph database for evidence correlation on Android smartphones; a secure token-based communication for authentication and authorization servers; an enhancement of the Rew-XAC model for workflow data access control in healthcare; trust and risk-based access control for privacy preserving threat detection systems; fine grained attribute based access control model for privacy protection; automatic extraction of semantic relations from text documents; the present and future of large-scale systems modeling and engineering; non-disjoint multi-agent scheduling problem on identical parallel processors; and an evaluative model to assess the organizational efficiency in training corporations.
ISBN (print): 9781509053827
MapReduce is a widely adopted computing framework for data-intensive applications running on clusters. We propose an approach to exploit data parallelism in XML processing using MapReduce in Hadoop. Our solution seamlessly integrates data storage, labelling, indexing, and parallel queries to process a massive amount of XML data. Specifically, we introduce an SDN labelling algorithm and a distributed hierarchical index using DHTs, and we develop an efficient data retrieval approach called B-SLCA. More importantly, we design an advanced two-phase MapReduce solution that is able to efficiently address the issues of labelling, indexing, and query processing on big XML data. We implemented our solution on a real-world Hadoop cluster processing real-world datasets. Our experimental results show that SDN outperforms NCIM by up to a factor of 1.36 with an average of 1.17, and our B-SLCA outperforms BwdSLCA by up to a factor of 1.96 with an average of 1.2.
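The SDN labelling scheme itself is not reproduced in the abstract; as a small grounding example, the helper below computes the lowest common ancestor of two Dewey-style labels, the primitive that SLCA-style XML keyword search (such as the B-SLCA retrieval above) is built on. It is a generic sketch of the standard operation, not the paper's code:

```java
// Sketch: lowest common ancestor of two Dewey labels such as "1.3.2"
// and "1.3.5.1". SLCA-style XML keyword search repeatedly computes
// LCAs like this over labels returned by an index lookup.
public final class DeweyLca {

    public static String lca(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        StringBuilder common = new StringBuilder();
        int n = Math.min(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            if (!pa[i].equals(pb[i])) break;        // paths diverge here
            if (common.length() > 0) common.append('.');
            common.append(pa[i]);
        }
        return common.toString(); // empty if the labels share no root
    }

    public static void main(String[] args) {
        System.out.println(lca("1.3.2", "1.3.5.1")); // prints 1.3
    }
}
```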
ISBN (print): 9781509021413
The Automata Processor is a new accelerator technology that supports direct hardware implementation of a set of non-deterministic finite automata over a streaming input, and is designed for complex string pattern matching applications. In this paper, we broaden the scope of this architecture beyond its primary design goal, by developing algorithmic techniques to solve problems on unweighted graphs. We present a strategy to represent nodes and edges in a graph using strings, and use this transformation to develop algorithms for several classic graph problems including finding Hamiltonian paths and cycles, connected components, and breadth-first search. Our algorithms rely on a core set of automata building blocks which we designed for this purpose, and illustrate various design considerations that developers must bear in mind when harnessing this new technology. We expect that this work provides the foundations for solving graph problems using the Automata Processor.
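As a concrete illustration of the nodes-and-edges-as-strings idea (not the paper's automata), the sketch below checks whether a candidate walk, encoded as a string of node symbols, is a Hamiltonian path. This is exactly the acceptance condition the automaton construction has to express over streaming input:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: a walk is encoded as a string of single-character node symbols,
// e.g. "ABDC". It is a Hamiltonian path iff it visits every node exactly
// once and every consecutive pair is an edge. On the Automata Processor,
// this acceptance condition is compiled into an NFA over such strings.
public final class HamiltonianCheck {

    public static boolean isHamiltonianPath(
            String walk, Set<Character> nodes, Map<Character, Set<Character>> adj) {
        if (walk.length() != nodes.size()) return false;
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < walk.length(); i++) {
            char c = walk.charAt(i);
            if (!nodes.contains(c) || !seen.add(c)) return false; // revisit
            if (i > 0) {
                Set<Character> nbrs = adj.get(walk.charAt(i - 1));
                if (nbrs == null || !nbrs.contains(c)) return false; // non-edge
            }
        }
        return true;
    }
}
```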
ISBN (print): 9781467390064
The advances in location-acquisition technologies have generated massive spatio-temporal trajectory data, which represent the mobility of a diversity of moving objects over time, such as people, vehicles, and animals. Discovery of traveling companions on trajectory data has many real-world applications. Most existing discovery approaches are limited to centralized computing, and these techniques require considerable performance improvement when handling large-scale trajectory data. Parallel computing provides a natural alternative for handling this problem. In this work, we first present the design and implementation of both batch and streaming gathering-pattern discovery algorithms in a distributed parallel computing fashion. We then propose several optimization techniques for efficient computation. Finally, we conduct extensive experiments on a public dataset using Amazon EC2 clusters to evaluate the efficiency of our approaches and the effectiveness of our optimizations.
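The paper's algorithms are not reproduced in the abstract; the sketch below shows the intersection step common to gathering/companion discovery: given per-snapshot cluster memberships (e.g. from a per-snapshot density clustering), keep only object groups that stay together across k consecutive snapshots. All names and parameters here are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the snapshot-intersection step in companion discovery.
// snapshots.get(t) holds the clusters found at timestamp t; a companion
// group of duration k is a set of at least minSize objects contained in
// some cluster at k consecutive timestamps.
public final class CompanionDiscovery {

    public static List<Set<String>> companions(
            List<List<Set<String>>> snapshots, int k, int minSize) {
        // Candidate groups that have survived every snapshot so far.
        List<Set<String>> candidates = new ArrayList<>();
        for (Set<String> c : snapshots.get(0)) {
            if (c.size() >= minSize) candidates.add(new HashSet<>(c));
        }
        for (int t = 1; t < k; t++) {
            List<Set<String>> next = new ArrayList<>();
            for (Set<String> cand : candidates) {
                for (Set<String> cluster : snapshots.get(t)) {
                    Set<String> common = new HashSet<>(cand);
                    common.retainAll(cluster);       // objects still together
                    if (common.size() >= minSize) next.add(common);
                }
            }
            candidates = next;
        }
        return candidates;
    }
}
```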
Deterministic and reproducible program execution eases the development and debugging of distributed systems. However, deterministic execution comes at a high performance cost and is hard to achieve, especially when running on different hardware. In this paper we introduce the concept of application-level determinism and describe how the parallel programming model Spawn & Merge can be used for scalable and deterministic distributed computation. Application-level deterministic applications yield reproducible, deterministic results independent of the number of nodes participating in the computation, even though intermediate tasks may be executed in an unpredictable schedule. To achieve consistency independent of the order in which operations are applied, we present a new Operational Transformation algorithm that mitigates the performance loss of introducing determinism with Spawn & Merge. We show that such deterministic processing can scale across a cluster of compute nodes and discuss for which kinds of workload the programming model is feasible. Furthermore, for high and low workloads, we evaluate the cost of adding determinism at 28% and 40%, respectively, over perfect parallel computation.
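Spawn & Merge is the paper's model; the plain-Java sketch below only illustrates the property it relies on: tasks may finish in any order, but results are merged strictly in spawn order, so the outcome is schedule-independent. The combiner function here stands in for the paper's Operational Transformation machinery:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;

// Sketch of the spawn-and-merge idea: work runs concurrently, but results
// are merged in spawn order, so the result does not depend on the
// scheduler or the number of worker threads.
public final class SpawnMerge<T> {

    private final ExecutorService pool = Executors.newWorkStealingPool();
    private final List<Future<T>> spawned = new ArrayList<>();

    public void spawn(Supplier<T> task) {
        spawned.add(pool.submit(task::get)); // may finish out of order
    }

    public T merge(T identity, BinaryOperator<T> combine) throws Exception {
        T acc = identity;
        for (Future<T> f : spawned) {          // deterministic: spawn
            acc = combine.apply(acc, f.get()); // order, not completion order
        }
        pool.shutdown();
        return acc;
    }
}
```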
In several NoSQL database systems, among which is HBase, only one index is available per table: the row key, which is also the clustered index. Other indexes do not come out of the box. As a result, row key design is the most important decision when designing tables, because an inappropriate design can have detrimental consequences for performance and cost. Particular row key designs suit different problems, and in this paper we analyze the performance, characteristics, and applicability of each of them. In particular, we investigate the effect of various techniques for modeling row keys: sequences, salting, padding, hashing, and modulo operations. We propose four different designs based on these techniques and analyze their performance on different HBase clusters when loading HDFS files of various sizes. The experiments show that particular designs consistently outperform others on differently sized clusters in both execution time and even load distribution across nodes.
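The four concrete designs are not spelled out in the abstract; the helpers below sketch the named techniques (salting, padding, hashing, modulo) for a numeric id. Bucket counts and key formats are illustrative assumptions, not the paper's parameters:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative row key builders for the techniques named above.
public final class RowKeys {

    private static final int BUCKETS = 16; // assumed bucket/region count

    // Salting: prefix with a bucket so sequential ids spread across
    // regions instead of hot-spotting a single region server.
    public static String salted(long id) {
        return String.format("%02d-%d", id % BUCKETS, id);
    }

    // Padding: fixed-width keys make lexicographic order equal numeric
    // order, preserving efficient range scans.
    public static String padded(long id) {
        return String.format("%020d", id);
    }

    // Hashing: uniform distribution, at the cost of losing range scans.
    public static String hashed(long id) throws NoSuchAlgorithmException {
        byte[] h = MessageDigest.getInstance("MD5")
                .digest(Long.toString(id).getBytes(StandardCharsets.UTF_8));
        return String.format("%02x%02x-%d", h[0], h[1], id);
    }

    // Modulo: keep only the remainder as the prefix, so keys with the
    // same remainder stay adjacent within a bucket.
    public static String modulo(long id) {
        return (id % BUCKETS) + "|" + padded(id);
    }
}
```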
ISBN (print): 9781509021413
We present a novel trace-based analysis tool that rapidly classifies an MPI application as bandwidth-bound, latency-bound, load-imbalance-bound, or computation-bound for different interconnection networks. The tool uses an extension of Lamport's logical clock to track application progress in the trace replay. It has two unique features. First, it predicts application performance for many latency and bandwidth parameters from a single replay of the trace. Second, it infers the performance characteristics of an application and classifies the application using the predicted performance trend over a range of network configurations, rather than the predicted performance for a particular network configuration. We describe the techniques used in the tool and its design and implementation, and report our performance study of the tool and our experience classifying nine applications and mini-apps from the DOE Design Forward project as well as the NAS Parallel Benchmarks.
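The tool's internals are not given in the abstract; the sketch below illustrates the core mechanism it describes: replaying a trace with a Lamport-style logical clock whose message costs are parameterized by latency and bandwidth, so a single recorded trace can be evaluated under many (latency, bandwidth) pairs. All names here are hypothetical:

```java
// Sketch of trace replay with an extended Lamport clock: each rank keeps
// a logical time; a receive advances the receiver to max(local, send time
// + message cost), with cost modeled as latency + bytes / bandwidth.
// Sweeping (latency, bandwidth) over one trace predicts performance for
// many network configurations.
public final class ClockReplay {

    private final double[] clock;   // per-rank logical time (seconds)
    private final double latency;   // seconds per message
    private final double bandwidth; // bytes per second

    public ClockReplay(int ranks, double latency, double bandwidth) {
        this.clock = new double[ranks];
        this.latency = latency;
        this.bandwidth = bandwidth;
    }

    public void compute(int rank, double seconds) {
        clock[rank] += seconds;     // local computation from the trace
    }

    public void send(int from, int to, long bytes) {
        double arrival = clock[from] + latency + bytes / bandwidth;
        clock[to] = Math.max(clock[to], arrival); // receive completes here
    }

    public double makespan() {
        double max = 0;
        for (double t : clock) max = Math.max(max, t);
        return max;                 // predicted application run time
    }
}
```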
ISBN (print): 9781509035908
Conventional High Level Synthesis (HLS) tools mainly target compute-intensive kernels typical of digital signal processing applications. We are developing techniques and architectural templates to enable HLS of data analytics applications. These applications are memory intensive and present fine-grained, unpredictable data accesses and irregular, dynamic task parallelism. We discuss an architectural template built around a distributed controller to efficiently exploit thread-level parallelism. We present a memory interface that supports parallel memory subsystems and enables implementing atomic memory operations. We introduce a dynamic task scheduling approach to efficiently execute heavily unbalanced workloads. The templates are validated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well-known SPARQL benchmark.
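The templates themselves are hardware; as a software analogy of the dynamic task scheduling described, the minimal sketch below has worker threads pull tasks from a shared queue, so a heavily unbalanced workload still keeps all execution lanes busy. This is an illustrative analogy, not the synthesized design:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Software analogy of dynamic task scheduling: instead of statically
// assigning tasks to lanes, workers pull from a shared queue, so skewed
// task sizes (e.g. graph queries with uneven fan-out) self-balance.
public final class DynamicScheduler {

    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    public void submit(Runnable task) {
        queue.add(task);
    }

    // Call after all tasks are submitted; workers drain the queue.
    public void run(int lanes) throws InterruptedException {
        Thread[] workers = new Thread[lanes];
        for (int i = 0; i < lanes; i++) {
            workers[i] = new Thread(() -> {
                Runnable t;
                while ((t = queue.poll()) != null) { // pull until drained
                    t.run();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
    }
}
```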