ISBN:
(Print) 9782951740891
This paper presents two alternative NLP architectures to analyze massive amounts of documents using parallel processing. The two architectures focus on different processing scenarios, namely batch processing and streaming processing. The batch-processing scenario aims at optimizing the overall throughput of the system, i.e., minimizing the overall time spent on processing all documents. The streaming architecture aims to minimize the time to process real-time incoming documents and is therefore especially suitable for live feeds. The paper presents experiments with both architectures, and reports the overall gain when they are used for batch as well as for streaming processing. All the software described in the paper is publicly available under free licenses.
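To make the distinction concrete, the sketch below contrasts the two scenarios in miniature: a batch mode that partitions a fixed corpus across worker threads to maximize throughput, and a streaming mode that pulls documents off a queue as they arrive to minimize per-document latency. This is only an assumed illustration of the general idea; the names (annotate, process_batch, the simulated feed) are hypothetical and not taken from the paper's software.

```cpp
// Minimal C++ sketch of the two scenarios (illustrative only; not the paper's
// actual pipeline). annotate() stands in for an arbitrary NLP analysis step.
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

static std::string annotate(const std::string& doc) { return doc + " [analyzed]"; }

// Batch scenario: partition a fixed corpus across threads to maximize throughput.
std::vector<std::string> process_batch(const std::vector<std::string>& docs, unsigned workers) {
  std::vector<std::string> out(docs.size());
  std::vector<std::thread> pool;
  const std::size_t chunk = (docs.size() + workers - 1) / workers;
  for (unsigned w = 0; w < workers; ++w)
    pool.emplace_back([&, w] {
      const std::size_t begin = w * chunk, end = std::min(docs.size(), begin + chunk);
      for (std::size_t i = begin; i < end; ++i) out[i] = annotate(docs[i]);
    });
  for (auto& t : pool) t.join();
  return out;
}

// Streaming scenario: consume documents one by one to minimize per-document latency.
int main() {
  std::queue<std::string> feed;
  std::mutex m;
  std::condition_variable cv;
  bool done = false;

  std::thread consumer([&] {
    while (true) {
      std::unique_lock<std::mutex> lk(m);
      cv.wait(lk, [&] { return !feed.empty() || done; });
      if (feed.empty() && done) break;
      std::string doc = feed.front();
      feed.pop();
      lk.unlock();                                   // analyze outside the lock
      std::cout << annotate(doc) << '\n';
    }
  });

  for (const char* doc : {"doc1", "doc2", "doc3"}) { // simulated live feed
    { std::lock_guard<std::mutex> lk(m); feed.push(doc); }
    cv.notify_one();
  }
  { std::lock_guard<std::mutex> lk(m); done = true; }
  cv.notify_one();
  consumer.join();

  for (const auto& d : process_batch({"a", "b", "c", "d"}, 2)) std::cout << d << '\n';
  return 0;
}
```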
ISBN:
(Print) 9783319271361
The proceedings contain 59 papers. The special focus in this conference is on Applications of Parallel and Distributed Computing. The topics include: On exploring a virtual agent negotiation inspired approach for route guidance in urban traffic networks; optimization of binomial option pricing on Intel MIC heterogeneous system; stencil computations on HPC-oriented ARMv8 64-bit multi-core processor; a particle swarm optimization algorithm for controller placement problem in software defined network; a streaming execution method for multi-services in mobile cloud computing; economy-oriented deadline scheduling policy for render system using IaaS cloud; towards detailed tissue-scale 3D simulations of electrical activity and calcium handling in the human cardiac ventricle; task parallel implementation of matrix multiplication on multi-socket multi-core architectures; refactoring for separation of concurrent concerns; exploiting scalable parallelism for remote sensing analysis models by data transformation graph; resource-efficient vibration data collection in cyber-physical systems; a new approach for vehicle recognition and tracking in multi-camera traffic system; a scalable distributed fingerprint identification system; energy saving and load balancing for SDN based on multi-objective particle swarm optimization; pre-stack Kirchhoff time migration on Hadoop and Spark; a cyber physical system with GPU for CNC applications; a solution of the controller placement problem in software defined networks; parallel column subset selection of kernel matrix for scaling up support vector machines; real-time deconvolution with GPU and Spark for big imaging data analysis; and parallel Kirchhoff pre-stack depth migration on large high performance clusters.
ISBN:
(Print) 9783319271217
The proceedings contain 52 papers. The special focus in this conference is on Big Data and Its Applications. The topics include: Preference-aware HDFS for hybrid storage; urban traffic congestion prediction using floating car trajectory data; a metadata cooperative caching architecture based on SSD and DRAM for file systems; parallel training GBRT based on k-means histogram approximation for big data; an intelligent clustering algorithm based on mutual reinforcement; an effective method for gender classification with convolutional neural networks; a highly efficient indexing and retrieving method for astronomical big data of time series images; specialized FPGA-based accelerator architecture for data-intensive k-means algorithms; effectively identifying hot data in large-scale I/O streams with enhanced temporal locality; a search-efficient hybrid storage system for massive text data; enhancing parallel data loading for large scale scientific database; tradeoff between the price of distributing a database and its collusion resistance based on concatenated codes; a MapReduce reinforced distributed sequential pattern mining algorithm; a fast documents classification method based on SimHash; identification of natural images and computer generated graphics using multi-fractal differences of PRNU; enriching document representation with the deviations of word co-occurrence frequencies; big data analytics and visualization with spatio-temporal correlations for traffic accidents; a novel app recommendation method based on SVD and social influence; a segmentation hybrid index structure for temporal data; and a refactored content-aware host-side SSD cache.
This paper presents an optimization methodology for the implementation of a Learning Vector Quantization (LVQ) Artificial Neural Network (ANN). Starting from a high-level algorithmic specification, we suggest a design methodology for the LVQ-dedicated architecture. Our approach is based on the creation of partially parallel architectures to optimize the performance of our ANN for different topologies. In this manuscript, we parallelize the supervisor part of the application, which is responsible for calculating minimum distances, weights and labels, to address application latency and power consumption. We also integrate Partial Dynamic Reconfiguration (PDR) so that users can easily switch between the different architectures and deploy the one whose performance suits their needs. As a result, our approach reduces the latency of the parallel architectures with respect to the sequential architecture of an LVQ for variable topologies. To validate the approach, the optimized LVQ implementation was tested on a Zynq device.
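For readers unfamiliar with LVQ, the fragment below shows, in plain software form, the "supervisor" computation the abstract refers to: finding the prototype at minimum squared Euclidean distance and applying the LVQ1 attract/repel weight update. It is a minimal sketch under assumed data layouts; the struct, function names and learning rate are illustrative and do not come from the paper's hardware design.

```cpp
// Software sketch of the LVQ1 competitive step (assumed layout, not the
// paper's architecture): pick the nearest prototype, then pull it toward the
// sample if the labels match, push it away otherwise.
#include <cstddef>
#include <limits>
#include <vector>

struct Prototype {
  std::vector<float> w;  // weight vector
  int label;
};

std::size_t nearest(const std::vector<Prototype>& protos, const std::vector<float>& x) {
  std::size_t best = 0;
  float best_d = std::numeric_limits<float>::max();
  for (std::size_t p = 0; p < protos.size(); ++p) {
    float d = 0.f;
    for (std::size_t i = 0; i < x.size(); ++i) {
      const float diff = x[i] - protos[p].w[i];
      d += diff * diff;               // squared distance; sqrt not needed for argmin
    }
    if (d < best_d) { best_d = d; best = p; }
  }
  return best;
}

void lvq1_update(std::vector<Prototype>& protos, const std::vector<float>& x,
                 int label, float alpha) {
  Prototype& win = protos[nearest(protos, x)];
  const float sign = (win.label == label) ? +1.f : -1.f;  // attract or repel
  for (std::size_t i = 0; i < x.size(); ++i)
    win.w[i] += sign * alpha * (x[i] - win.w[i]);
}
```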
ISBN:
(Print) 9783319271187
The proceedings contain 58 papers. The special focus in this conference is on Parallel and Distributed Architectures. The topics include: parallelizing block cryptography algorithms on speculative multicores; performance characterization and optimization for Intel Xeon Phi coprocessor; an extended MDS code to improve single write performance of disk arrays for correcting triple disk failures; a distributed location-based service discovery protocol for vehicular ad-hoc networks; unified virtual memory support for deep CNN accelerator on SoC FPGA; dynamic time slice scheduler for virtual machine monitor; memory-aware NoC application mapping based on adaptive genetic algorithm; a study on non-volatile 3D stacked memory for big data applications; parallel implementation of dense optical flow computation on many-core processor; a power-conserving online scheduling scheme for video streaming services; prevent deadlock and remove blocking for self-timed systems; improving the memory efficiency of in-memory MapReduce based HPC systems; dual best-first search mapping algorithm for shared-cache multicore processors; energy efficient network-on-chip router with heterogeneous virtual channels; availability and network-aware MapReduce task scheduling over the Internet; an optimized algorithm based on CRS codes in big data storage systems; quantum computer simulation on multi-GPU incorporating data locality; query execution optimization based on incremental update in database distributed middleware; coding-based cooperative caching in data broadcast environments; usage history-directed power management for smartphones; a mobile application distribution method; and a clustering algorithm based on rough sets for the recommendation domain in trust-based access control.
Parallel computing architectures like GPUs have traditionally been used to accelerate applications with dense and highly structured workloads; however, many important applications in science and engineering are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Numerical simulation of charged particle beam dynamics is one such application, where the distribution of work and data in the accurate computation of collective effects at each time step is irregular and exhibits control-flow and memory access patterns that are not readily amenable to the GPU architecture. Algorithms with these properties tend to present both significant branch and memory divergence on GPUs, which leads to severe performance degradation. We present a novel cache-aware algorithm that uses machine learning to address this problem. The algorithm uses supervised learning to adaptively model and track irregular access patterns in the computation of collective effects at each time step of the simulation, anticipating future control-flow and data access patterns. Access pattern forecasts are then used to formulate runtime decisions that minimize branch and memory divergence on GPUs, thereby improving the performance of collective effects computation at a future time step based on observations from earlier time steps. Experimental results on an NVIDIA Tesla K40 GPU show that our approach is effective in maximizing data reuse, ensuring workload balance among parallel threads, and minimizing both branch and memory divergence. Further, the parallel implementation delivers up to 485 Gflops of double-precision performance, which translates to a speedup of up to 2.5X compared to the fastest known GPU implementation.
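As a rough illustration of how an access-pattern forecast can be turned into a runtime decision that reduces divergence, the sketch below groups work items by a predicted data bin before they are mapped to threads, so that neighbouring threads follow the same branch and touch nearby memory. The predictor here is a stub and the whole fragment is a hypothetical host-side illustration, not the authors' supervised-learning scheme.

```cpp
// Illustrative sketch only (not the authors' algorithm): reorder work items so
// that items expected to take the same path / touch the same bin at the next
// time step sit next to each other, cutting branch and memory divergence for
// the threads (a warp) mapped to consecutive indices.
#include <algorithm>
#include <vector>

struct Particle { float x, y, z; };

// Stand-in for the learned model: maps a particle to the bin it is expected to
// access next. A real predictor would be fit on traces from earlier time steps.
int predicted_bin(const Particle& p) { return static_cast<int>(p.x) & 0xF; }

// Group particles with the same predicted bin contiguously before dispatch.
void regroup_by_prediction(std::vector<Particle>& particles) {
  std::stable_sort(particles.begin(), particles.end(),
                   [](const Particle& a, const Particle& b) {
                     return predicted_bin(a) < predicted_bin(b);
                   });
}
```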
ISBN:
(Print) 9781509060597
In molecular biology, the interaction mechanisms between microRNAs (miRNAs) and their messenger RNA (mRNA) targets are poorly understood. This is why many miRNA-target prediction methods are available, but their results are often inconsistent. A lot of effort focuses on the quality of the sequence match between miRNA and target rather than on the role of the mRNA secondary structure in which the target is embedded. Nonetheless, it is known that secondary structure contributes to target recognition, because there is an energetic cost to freeing base-pairing interactions within the mRNA to make the target accessible for miRNA binding. This approach is implemented by PITA (Probability of Interaction by Target Accessibility), a very computationally intensive tool that is able to provide accurate results even when little is known about the conservation of the miRNA. In this paper we propose a new implementation of PITA, called lPITA, that exploits coarse-grained parallelism on low-power architectures to reduce both execution time and power consumption.
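The coarse-grained decomposition described above can be pictured as follows: every miRNA/target pair is an independent scoring job, so the pairs are simply partitioned across a small pool of threads with no inter-task communication, which suits low-power multi-core parts well. The sketch below is an assumed illustration only; score_pair is a placeholder for PITA's accessibility/energy computation, and none of the names are taken from lPITA.

```cpp
// Coarse-grained parallel scoring sketch (hypothetical; not lPITA's code):
// each miRNA/UTR pair is an independent task distributed round-robin over a
// fixed pool of worker threads.
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

struct Pair { std::string mirna, utr; };

double score_pair(const Pair& p) {              // placeholder for the PITA energy model
  return static_cast<double>(p.mirna.size() + p.utr.size());
}

std::vector<double> score_all(const std::vector<Pair>& pairs, unsigned workers) {
  std::vector<double> scores(pairs.size());
  std::vector<std::thread> pool;
  for (unsigned w = 0; w < workers; ++w)
    pool.emplace_back([&, w] {
      for (std::size_t i = w; i < pairs.size(); i += workers)  // round-robin split
        scores[i] = score_pair(pairs[i]);
    });
  for (auto& t : pool) t.join();
  return scores;
}
```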
ISBN:
(Print) 9783319271392
The proceedings contain 59 papers. The special focus in this conference is on Software Systems, Programming Models, Performance Modeling and Evaluation. The topics include: A scalable fault-tolerance programming model on MIC cluster; multi-chunk redundant array of independent SSDs with improved performance; an energy efficient storage system for astronomical observation data on Dome A; parallel aware hybrid solid-state storage; automatic optimization of software transactional memory through linear regression and decision tree; a data-centric tool to improve the performance of multithreaded program on NUMA; a light-weight hot data identification scheme via grouping-based LRU lists; towards interactive programming with parallel linear algebra in R; enhancing I/O scheduler performance by exploiting internal parallelism of SSDs; a performance and scalability analysis of the MPI based tools utilized in a large ice sheet model executing in a multicore environment; an efficient algorithm for a generalized LCS problem; on exploring a quantum particle swarm optimization method for urban traffic light scheduling; identifying repeated interleavings to improve the efficiency of concurrency bug detection; global reliability evaluation for cloud storage systems with proactive fault tolerance; performance evaluation and optimization of Wi-Fi display on Android; a novel scheduling algorithm for file fetch in transparent computing; joint power and reduced spectral leakage-based resource allocation for D2D communications in 5G; and an optimization strategy of energy consumption for data transmission based on optimal stopping theory in mobile networks.
ISBN:
(Print) 9781509015030
GPUs are an important hardware development platform for problems where massive parallel computations are needed. Many of these problems require a higher precision than the standard double floating-point (FP) format provides. One common way of extending the precision is the multiple-component approach, in which real numbers are represented as the unevaluated sum of several standard machine-precision FP numbers. This representation is called an FP expansion and it offers the simplicity of using directly available and highly optimized FP operations. In this article we present new data-parallel algorithms for adding and multiplying FP expansions, specially designed for extended-precision computations on GPUs. These are generalized algorithms that can manipulate FP expansions of different sizes (from double-double up to a few tens of doubles) and ensure a certain worst-case error bound on the results.
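For context, the multiple-component idea rests on error-free transformations such as Knuth's TwoSum and an FMA-based TwoProd, which split every operation into a rounded result plus an exact error term. The sketch below shows a plain double-double (two-component) version of addition and multiplication; it is only a minimal illustration of the representation, not the generalized, error-bounded GPU algorithms the paper describes.

```cpp
// Double-double arithmetic sketch built on error-free transforms. This is a
// minimal illustration of FP expansions, not the paper's GPU algorithms.
#include <cmath>
#include <cstdio>

struct dd { double hi, lo; };  // unevaluated sum hi + lo, with |lo| much smaller than |hi|

// TwoSum (Knuth): a + b = s + e exactly, where s = fl(a + b).
static dd two_sum(double a, double b) {
  double s = a + b;
  double bb = s - a;
  double e = (a - (s - bb)) + (b - bb);
  return {s, e};
}

// TwoProd: a * b = p + e exactly, using fused multiply-add for the error term.
static dd two_prod(double a, double b) {
  double p = a * b;
  double e = std::fma(a, b, -p);
  return {p, e};
}

// Double-double addition (simplified; ignores renormalization corner cases).
static dd add(dd x, dd y) {
  dd s = two_sum(x.hi, y.hi);
  double lo = s.lo + x.lo + y.lo;
  return two_sum(s.hi, lo);   // renormalize so the low part stays small
}

// Double-double multiplication.
static dd mul(dd x, dd y) {
  dd p = two_prod(x.hi, y.hi);
  double lo = p.lo + x.hi * y.lo + x.lo * y.hi;
  return two_sum(p.hi, lo);
}

int main() {
  dd a = {1.0, 1e-20}, b = {3.0, -2e-21};
  dd s = add(a, b), m = mul(a, b);
  std::printf("sum  = %.17g + %.17g\nprod = %.17g + %.17g\n", s.hi, s.lo, m.hi, m.lo);
  return 0;
}
```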
ISBN:
(Print) 9781450347860
Field-programmable gate arrays (FPGAs) are highly efficient computational execution platforms. However, their limited density makes them unsuitable for highly demanding algorithms. The Partial Dynamic Reconfiguration (PDR) concept overcomes this problem by allowing the FPGA fabric to be reused from one task to another. Nonetheless, scheduling on partially dynamically reconfigurable architectures involves several degrees of freedom and hardware constraints to be managed, which makes PDR more challenging. In this paper we propose a task clustering approach to optimize scheduling on PDR FPGAs. By clustering tasks, the approach moves the optimization overhead from the hardware side to the software side, where it is far simpler to handle.
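One way to picture the clustering idea is the toy sketch below: tasks that require the same partial bitstream are grouped so the reconfiguration cost is paid once per cluster rather than once per task, and the scheduler then works on clusters. The task fields, cost model and greedy grouping are assumptions made for illustration; they are not the formulation used in the paper.

```cpp
// Hypothetical software-side clustering sketch (not the paper's formulation):
// bucket tasks by the partial bitstream they need, schedule each bucket
// back-to-back, and count one reconfiguration per bucket.
#include <map>
#include <string>
#include <vector>

struct Task {
  std::string name;
  int bitstream_id;   // which partial bitstream the task needs (assumed field)
  double runtime_ms;
};

// Greedy clustering: N tasks become (number of distinct bitstreams) clusters.
std::vector<std::vector<Task>> cluster_by_bitstream(const std::vector<Task>& tasks) {
  std::map<int, std::vector<Task>> buckets;
  for (const Task& t : tasks) buckets[t.bitstream_id].push_back(t);
  std::vector<std::vector<Task>> clusters;
  for (auto& kv : buckets) clusters.push_back(std::move(kv.second));
  return clusters;
}

// Simple makespan estimate under an assumed fixed reconfiguration cost.
double makespan(const std::vector<std::vector<Task>>& clusters, double reconf_ms) {
  double total = 0.0;
  for (const auto& c : clusters) {
    total += reconf_ms;                       // one reconfiguration per cluster
    for (const Task& t : c) total += t.runtime_ms;
  }
  return total;
}
```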