This paper presents an SSD-based Block I/O Scheduler (SBIOS). SBIOS fully exploits the internal parallelism of the SSD to improve system performance. It dispatches read requests to different blocks to make full ...
ISBN: 9781467378918 (print)
The proceedings contain 50 papers. The topics discussed include: FINGER: a novel erasure coding scheme using fine granularity blocks to improve Hadoop write and update performance; a virtual shared metadata storage for HDFS; a time-efficient connected densest subgraph discovery algorithm for big data; caching on dual-mode flash memory; cost-effectively improving life endurance of MLC NAND Flash SSDs via hierarchical data redundancy and heterogeneous flash memory; adaptive video streaming uploading with moving prediction in VANETs scenarios; direct device-to-device transfer protocol: a new look at the benefits of a decentralized I/O model; a regional popularity-aware cache replacement algorithm to improve the performance and lifetime of SSD-based disk cache; and a novel optimization algorithm for Chien Search of BCH codes in NAND flash memory devices.
As is well known, an application firewall provides in-depth inspection to ensure application-layer security, but it seriously degrades the network performance of application services, with an even more serious impact on...
Software development for multicore or multiprocessor systems is complex and error prone. The development of sequential source code is a familiar and proven procedure. The model presented in this paper can help existing software make use of current hardware architectures. This model makes a change in execution possible and could lead to hardware/software co-design. This paper presents a way to decompose sequential software. The characteristics of the fragments are used to rearrange them and for software visualization. The visualization supports developers in understanding internal dependencies. Some improvements in visual presentation by reducing complexity are discussed.
ISBN: 9781467394741 (print)
An interesting challenge in E-health is to perform real-time diagnosis. In many distributed computing systems the data processing stage, generally assigned to standard CPU environments, is a critical aspect. In particular, the analysis of magnetic resonance imaging (MRI) to improve image quality and support diagnosis is computationally demanding. Using Graphics Processing Units (GPUs) in High Performance Computing (HPC), the image processing step can be accelerated, speeding up the whole diagnosis procedure. In this paper, we propose a parallel algorithm, in a GPU environment, for MRI denoising in order to make the diagnostic system more efficient. As a case study, we consider the Optimized Blockwise Non Local Means (OB-NLM) method. Its intrinsic nature makes it well suited to parallel and multithreaded implementation, especially on GPU architectures. The results show a significant improvement in the performance of the entire healthcare practice procedure.
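To illustrate why non-local means parallelizes well, here is a minimal sketch of the basic (non-blockwise) filter on a 1-D signal. It is not the OB-NLM method from the paper and not its GPU implementation. The function name `nlm_denoise_1d` and the parameters `patch_radius`, `search_radius`, and `h` are assumptions chosen for illustration. Each output sample depends only on the read-only input, so every iteration of the outer loop could in principle run as an independent thread.

```python
import math

def nlm_denoise_1d(signal, patch_radius=1, search_radius=3, h=0.5):
    """Basic non-local means on a 1-D signal (illustrative, sequential)."""
    n = len(signal)

    def patch(i):
        # Neighbourhood around sample i, with indices clamped at the borders.
        return [signal[min(max(i + d, 0), n - 1)]
                for d in range(-patch_radius, patch_radius + 1)]

    out = []
    for i in range(n):  # each i is independent: a natural unit of parallel work
        p_i = patch(i)
        weight_sum = 0.0
        value_sum = 0.0
        lo = max(0, i - search_radius)
        hi = min(n, i + search_radius + 1)
        for j in range(lo, hi):
            p_j = patch(j)
            # Mean squared difference between the two patches.
            dist2 = sum((a - b) ** 2 for a, b in zip(p_i, p_j)) / len(p_i)
            # Similar patches get weights close to 1, dissimilar ones close to 0.
            w = math.exp(-dist2 / (h * h))
            weight_sum += w
            value_sum += w * signal[j]
        out.append(value_sum / weight_sum)
    return out

noisy = [0.0, 0.1, -0.1, 1.0, 0.9, 1.1]
denoised = nlm_denoise_1d(noisy)
```

Because every output value is a weighted average of input samples, the result always stays within the range of the input.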
ISBN: 9781479980550 (print)
We propose an ensemble scheme with a parallel computational structure, which we call Distributed Ensemble Support Vector Machine (DESVM), to overcome the difficulties of large-scale nonlinear Support Vector Machines (SVMs) in practice. The dataset is split into many stratified partitions. Each partition might still be too large to be solved with conventional SVM solvers. We apply the reduced kernel trick to generate a nonlinear SVM classifier for each partition, which can be treated as an approximation model based on the partial dataset. Then, we use a linear SVM classifier to fuse the nonlinear SVM classifiers generated from all data partitions. In this linear SVM training model, we treat each nonlinear SVM classifier as an "attribute" or an "expert". In the ensemble phase, DESVM generates a fusion model that is a weighted combination of the nonlinear SVM classifiers. It can be interpreted as a weighted voting decision made by a group of experts. We test our proposed method on five benchmark datasets. The numerical results show that DESVM is competitive in accuracy and achieves a high speed-up. Thus, DESVM can be a powerful tool for binary classification problems with large-scale datasets that are not linearly separable.
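The two structural steps described above, stratified partitioning and weighted-vote fusion, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the helper names `stratified_partitions` and `fused_decision` are hypothetical, the "experts" are toy decision functions standing in for reduced-kernel SVMs, and the weights and bias are hand-picked rather than learned by a linear SVM.

```python
from collections import defaultdict

def stratified_partitions(labels, k):
    """Split sample indices into k partitions that preserve class proportions."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    parts = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class round-robin so every partition sees every class.
        for pos, idx in enumerate(indices):
            parts[pos % k].append(idx)
    return parts

def fused_decision(experts, weights, bias, x):
    """Weighted vote: sign of the weighted sum of the experts' decision values."""
    score = bias + sum(w * f(x) for w, f in zip(weights, experts))
    return 1 if score >= 0 else -1

# Toy stand-ins for the per-partition nonlinear classifiers (assumption).
experts = [lambda x: x[0] - 1.0, lambda x: x[1] - 1.0, lambda x: -1.0]
weights = [0.5, 0.3, 0.2]  # in DESVM these would come from the linear SVM

parts = stratified_partitions([1, 1, 1, 1, -1, -1], k=2)
label = fused_decision(experts, weights, 0.0, (2.0, 2.0))
```

In the scheme the abstract describes, each expert would be trained independently on its own partition, which is what gives the method its parallel structure.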
In-place data manipulation is very desirable in many-core architectures with limited on-board memory. This paper deals with the in-place implementation of a class of primitives that perform data movements in one direction. We call these primitives Data Sliding (DS) algorithms. Notable among them are relational algebra primitives (such as select and unique), padding to insert empty elements in a data structure, and stream compaction to reduce memory requirements. Their in-place implementation in a bulk synchronous parallel model, such as GPUs, is especially challenging due to the difficulties in synchronizing threads executing on different compute units. Using a novel adjacent work-group synchronization technique, we propose two algorithmic schemes for regular and irregular DS algorithms. With a set of 5 benchmarks, we validate our approaches and compare them to the state-of-the-art implementations of these benchmarks. Our regular DS algorithms achieve up to 9.11x and 73.25x the throughput of their competitors on NVIDIA and AMD GPUs, respectively. Our irregular DS algorithms outperform the NVIDIA Thrust library by up to 3.24x on the three most recent generations of NVIDIA GPUs.
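As a point of reference for what a Data Sliding primitive computes, here is a sequential sketch of in-place stream compaction. It shows only the one-directional data movement, not the paper's GPU scheme or its adjacent work-group synchronization. The function name `compact_in_place` and the `keep` predicate are assumptions for illustration.

```python
def compact_in_place(data, keep):
    """Slide kept elements toward the front of `data`; return the new logical length."""
    write = 0
    for read in range(len(data)):
        if keep(data[read]):
            # write <= read always holds, so no unread element is overwritten.
            data[write] = data[read]
            write += 1
    return write

values = [3, 0, 7, 0, 0, 5]
n = compact_in_place(values, lambda x: x != 0)
print(values[:n])  # prints [3, 7, 5]
```

The sequential version is safe because the write position never overtakes the read position. On a GPU, where many work-groups read and write concurrently, that ordering guarantee has to be enforced explicitly, which is the difficulty the abstract refers to.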
ISBN: 9783319162133 (print)
The proceedings contain 50 papers. The special focus in this conference is on Architecture, Modeling, Tools, Applications, Network-on-a-Chip, Cryptography Applications and Extended Abstracts. The topics include: Reducing storage costs of reconfiguration contexts by sharing instruction memory cache blocks; a vector caching scheme for streaming FPGA SpMV accelerators; hardware synthesis from functional embedded domain-specific languages; operand-value-based modeling of dynamic energy consumption of soft processors in FPGA; a fully parallel particle filter architecture for FPGAs; Teach advanced reconfigurable architectures and tools; dynamic memory management in Vivado-HLS for scalable many-accelerator architectures; place and route tools for the mitigation of single event transients on flash-based FPGAs; advanced SystemC tracing and analysis framework for extra-functional properties; run-time partial reconfiguration simulation framework based on dynamically loadable components; architecture virtualization for run-time hardware multithreading on field programmable gate arrays; survey on real-time network-on-chip architectures; hardware benchmarking of cryptographic algorithms using high-level synthesis tools; an efficient and flexible FPGA implementation of a face detection system; a dynamically reconfigurable mixed analog-digital filter bank; a timing driven cycle-accurate simulation for coarse-grained reconfigurable architectures; a novel concept for adaptive signal processing on reconfigurable hardware; modular acquisition and stimulation system for timestamp-driven neuroscience experiments; DRAM row activation energy optimization for stride memory access on FPGA-based systems; acceleration of data streaming classification using reconfigurable technology; partial reconfiguration for dynamic mapping of task graphs onto 2D mesh platform; and a challenge of portable and high-speed FPGA accelerator.
To provide timely results for ‘Big Data Analytics’, it is crucial to satisfy deadline requirements for MapReduce jobs in production environments. In this paper, we propose a deadline-oriented task scheduling approac...
Traffic congestion prediction is an important precondition to promote urban sustainable development. Nevertheless, there is a lack of a unified prediction method to address the performance metrics, such as accuracy, i...