Dataflow accelerators feature simplicity, programmability, and energy efficiency, and are envisioned as a promising architecture for accelerating the perfectly nested loops that dominate several important applications, including image and media processing and deep learning. Although numerous accelerator designs have been proposed, discovering the most efficient way to execute an application's perfectly nested loop on the computational and memory resources of a given dataflow accelerator (the execution method) remains an essential and yet unsolved challenge. In this paper, we propose dMazeRunner to efficiently and accurately explore the vast space of the different ways to spatiotemporally execute a perfectly nested loop on dataflow accelerators (execution methods). The novelty of the dMazeRunner framework lies in: i) a holistic representation of the loop nests that can succinctly capture the various execution methods; ii) accurate energy and performance models that explicitly capture the computation and communication patterns, data movement, and data buffering of the different execution methods; and iii) drastic pruning of the vast search space by discarding invalid solutions and solutions that lead to the same cost. Our experiments on various convolution layers (perfectly nested loops) of popular deep learning applications demonstrate that the solutions discovered by dMazeRunner are on average 9.16x better in Energy-Delay-Product (EDP) and 5.83x better in execution time than prior approaches. With additional pruning heuristics, dMazeRunner reduces the search time from days to seconds with a mere 2.56% increase in EDP compared to the optimal solution.
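To illustrate the kind of search-space pruning the abstract describes, the following minimal sketch (not dMazeRunner's actual implementation; the function names and the toy cost model are assumptions) enumerates spatial/temporal tiling factorizations of one loop dimension and discards invalid candidates as well as candidates that duplicate an already-seen cost:

from itertools import product

def factors(n):
    """All integer factors of n (candidate tile sizes)."""
    return [f for f in range(1, n + 1) if n % f == 0]

def enumerate_tilings(loop_trip_count, pe_count, buffer_size):
    """Enumerate (spatial, temporal) tile pairs for one loop dimension,
    pruning invalid candidates and candidates with a duplicate cost."""
    seen_costs = set()
    valid = []
    for spatial, temporal in product(factors(loop_trip_count), repeat=2):
        # Prune invalid execution methods: the tiles must exactly
        # cover the loop, fit the PE array, and fit the scratchpad.
        if spatial * temporal != loop_trip_count:
            continue
        if spatial > pe_count or temporal > buffer_size:
            continue
        # Toy cost model (assumption): cycles ~ temporal steps,
        # energy ~ data moved per temporal step.
        cost = (temporal, temporal * spatial)
        if cost in seen_costs:      # same cost => redundant design point
            continue
        seen_costs.add(cost)
        valid.append((spatial, temporal))
    return valid

print(enumerate_tilings(loop_trip_count=64, pe_count=16, buffer_size=32))

The same factorize-then-prune pattern extends to all seven loops of a convolution layer, which is where the combinatorial explosion (and the payoff from pruning) comes from.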
ISBN:
(Print) 9781728111421; 9781728111414
Red blood cell segmentation in microscopic images is the first step for various clinical studies carried out on blood samples, such as cell counting and cell shape identification. Conventional methods, while often achieving high accuracy, depend heavily on the acquisition modality. Deep learning approaches have been shown to be more robust to such modalities while achieving comparable accuracy. In this paper, we first investigate the steps necessary to apply a specific type of deep learning method, namely fully convolutional networks, to red blood cell segmentation. Based on the given data and the constraints imposed by our partners, mainly regarding high data throughput, we then describe an exemplary application. First results show that, even with a focus on high performance, an accuracy above 90% can be reached.
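As a rough illustration of the fully convolutional approach the abstract refers to (a minimal sketch, not the authors' network; the layer sizes are assumptions), such a network replaces fully connected layers with convolutions so it can map an image of any size to a per-pixel mask:

import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Minimal fully convolutional network for binary cell segmentation:
    downsample once, then upsample back to full resolution."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # halve spatial size
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2),  # upsample
            nn.Conv2d(16, 1, kernel_size=1),      # per-pixel logit
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Any input size works because there are no fully connected layers.
logits = TinyFCN()(torch.randn(1, 1, 96, 128))
print(logits.shape)  # torch.Size([1, 1, 96, 128])

The absence of fully connected layers is also what makes this family of networks attractive for high-throughput settings: one forward pass labels every pixel at once.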
ISBN:
(Print) 9781538637906
The increasing complexity of Cloud architectures and the introduction of new paradigms such as the Internet of Things have raised the problem of creating Value Added Services by composition, not only of resources but also of services. In this work we describe an architectural solution for orchestration at all Cloud layers, together with a language for orchestrating both resources and services in the Cloud. The language manages the composition of services and resources in order to create composite services based on Cloud Design Patterns. It is built on a workflow language for describing compositions, and it enables the verification of composite services by means of Model Driven Engineering techniques, providing a valuable and easy-to-use tool for Cloud engineering.
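As a purely hypothetical illustration of what a composition description in such a workflow language might contain (the schema below is an assumption, not the paper's actual language), a composite service can be declared as an ordered set of steps bound to Cloud resources, with a simple structural check standing in for model-based verification:

# Hypothetical composite-service description (assumed schema, not the
# paper's language): each step names a service and the resource it runs on.
composite_service = {
    "name": "image-archive",
    "pattern": "pipes-and-filters",   # a common Cloud Design Pattern
    "steps": [
        {"service": "ingest", "resource": "vm-small", "next": "resize"},
        {"service": "resize", "resource": "vm-gpu", "next": "store"},
        {"service": "store", "resource": "object-storage", "next": None},
    ],
}

def verify(workflow):
    """Toy structural check: every 'next' must reference a declared step."""
    names = {s["service"] for s in workflow["steps"]}
    return all(s["next"] is None or s["next"] in names
               for s in workflow["steps"])

print(verify(composite_service))  # True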
Remote sensing (RS) image segmentation is an essential step in geographic object-based image analysis (GEOBIA) to ultimately derive "meaningful objects". While many segmentation methods exist, most of them are not efficient for large data sets. Thus, the goal of this research is to develop an efficient parallel multi-scale segmentation method for RS imagery by combining graph theory and the fractal net evolution approach (FNEA). Specifically, a minimum spanning tree (MST) algorithm from graph theory is combined with the minimum heterogeneity rule (MHR) algorithm used in FNEA. The MST algorithm is used for the initial segmentation, while the MHR algorithm is used for object merging. An efficient implementation of the segmentation strategy is presented using data partitioning and a "reverse searching-forward processing" chain based on message passing interface (MPI) parallel technology. Segmentation results of the proposed method on images from multiple sensors (airborne, SPECIM AISA EAGLE II, WorldView-2, RADARSAT-2) and different landscapes (residential/industrial, residential/agricultural) covering four test sites demonstrate its efficiency in both accuracy and speed. We conclude that the proposed method is applicable and efficient for the segmentation of a variety of RS imagery (airborne optical, satellite optical, SAR, hyperspectral), with accuracy comparable to that of the FNEA method.
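The MST-based initial segmentation can be illustrated with a Kruskal-style sketch (a serial simplification under an assumed threshold, not the paper's parallel MPI implementation): pixels form a 4-connected graph, and edges cheaper than a heterogeneity threshold are merged with a union-find structure:

import numpy as np

def mst_initial_segmentation(img, threshold=10.0):
    """Kruskal-style initial segmentation on a 4-connected pixel graph:
    sort edges by intensity difference and union pixels whose edge
    weight is below a heterogeneity threshold (a toy stand-in for MHR)."""
    h, w = img.shape
    parent = list(range(h * w))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []                        # (weight, pixel_a, pixel_b)
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                edges.append((abs(float(img[y, x]) - float(img[y, x + 1])),
                              y * w + x, y * w + x + 1))
            if y + 1 < h:
                edges.append((abs(float(img[y, x]) - float(img[y + 1, x])),
                              y * w + x, (y + 1) * w + x))
    edges.sort()                      # Kruskal: cheapest edges first
    for wgt, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb and wgt <= threshold:
            parent[ra] = rb           # merge the two segments
    return np.array([find(i) for i in range(h * w)]).reshape(h, w)

img = np.array([[0, 0, 200], [0, 0, 200], [0, 0, 200]], dtype=np.uint8)
print(mst_initial_segmentation(img))  # two segments: left block, right column

In the paper's setting the image is partitioned across MPI ranks, so each rank runs such a merge on its tile and the ranks then reconcile segments along tile borders.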
Emerging user-centric graph applications such as route planning and personalized social network analysis have initiated a paradigm shift in modern graph processing systems towards multiquery analysis, i.e., process...
An unbiased estimator for the ellipticity of an object in a noisy image is given in terms of the image moments. Three assumptions are made: (i) the pixel noise is normally distributed, although with arbitrary covariance matrix; (ii) the image moments are taken about a fixed centre; and (iii) the point spread function is known. The relevant combinations of image moments are then jointly normal and their covariance matrix can be computed. A particular estimator for the ratio of the means of jointly normal variates is constructed and used to provide the unbiased estimator for the ellipticity. Furthermore, an unbiased estimate of the covariance of the new estimator is also given.
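For context, the moment-based ellipticity that such estimators target is conventionally defined from the second-order brightness moments about a fixed centre \bar{\mathbf{x}} (this is the standard weak-lensing convention; the paper's own notation may differ):

Q_{ij} = \frac{\int I(\mathbf{x})\,(x_i - \bar{x}_i)(x_j - \bar{x}_j)\,\mathrm{d}^2 x}{\int I(\mathbf{x})\,\mathrm{d}^2 x},
\qquad
\chi = \frac{Q_{11} - Q_{22} + 2\,\mathrm{i}\,Q_{12}}{Q_{11} + Q_{22}}.

The ellipticity is thus a ratio of linear combinations of moments; under assumption (i) these combinations are jointly normal, which is why an unbiased estimator for the ratio of the means of jointly normal variates directly yields an unbiased ellipticity estimator.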
Image denoising is a fundamental operation in image processing and holds considerable practical importance for various real-world applications. Arguably, several thousand papers are dedicated to image denoising. In the past decade, state-of-the-art denoising algorithms have been clearly dominated by nonlocal patch-based methods, which explicitly exploit patch self-similarity within the targeted image. However, in the past two years, discriminatively trained local approaches have started to outperform previous nonlocal models and have been attracting increasing attention due to the additional advantage of computational efficiency. Successful approaches include the cascade of shrinkage fields (CSF) and trainable nonlinear reaction diffusion (TNRD). These two methods are built on the responses of linear filters of small size using feed-forward architectures. Due to the locality inherent in local approaches, the CSF and TNRD models become less effective when the noise level is high and consequently introduce noise artifacts. To overcome this problem, in this paper we introduce a multiscale strategy. To be specific, we build on our newly developed TNRD model, adopting a multiscale pyramid image representation to devise a multiscale nonlinear diffusion process. As expected, all the parameters in the proposed multiscale diffusion model, including the filters and the influence functions across scales, are learned from training data through a loss-based approach. Numerical results on Gaussian and Poisson denoising substantiate that the exploited multiscale strategy can successfully boost the performance of the original single-scale TNRD model. As a consequence, the resulting multiscale diffusion models can significantly suppress the typical incorrect features in noisy images with heavy noise. It turns out that the multiscale TNRD variants achieve better performance than state-of-the-art denoising methods.
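For reference, the single-scale TNRD diffusion step that the multiscale model builds on takes the form (following the original TNRD formulation for Gaussian denoising; the multiscale variant applies such steps across pyramid levels):

u_t = u_{t-1} - \left( \sum_{i=1}^{N_k} \bar{k}_i^{\,t} * \phi_i^{t}\!\big(k_i^{t} * u_{t-1}\big) + \lambda^{t}\,(u_{t-1} - f) \right),

where f is the noisy input, k_i^t are the learned linear filters (\bar{k}_i^t denotes the kernel rotated by 180 degrees), \phi_i^t are the learned influence functions, and \lambda^t weights the data-fidelity term. The filters and influence functions at every scale are exactly the parameters the abstract says are learned through the loss-based approach.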
ISBN:
(Print) 9781538637906
In recent decades, remote sensing data have been rapidly growing in size and variety, and are considered "big geo data" because of their huge volume, significant heterogeneity, and the challenge of fast analysis. In traditional remote sensing analysis workflows, transferring raw image files to local workstations often costs a lot of time and slows down the analysis. Because the results of remote sensing analysis models are usually much smaller than the raw data to be processed, "on-demand processing", which uploads analysis models and executes them "near" where the data are stored, can significantly accelerate the execution of remote sensing analysis workflows. In this paper, a framework for on-demand remote sensing data analysis is proposed, based on a three-layered architecture, an XML/JSON-based runtime environment description, and on-demand model deployment methods. The evaluation on a prototype system shows that the on-demand processing framework accelerates the execution of analysis models by 2.8 to 12.7 times by reducing data transfers, especially for analysis workflows that transfer data over low-bandwidth Internet connections. Through on-demand processing, classical remote sensing data service systems can evolve into remote sensing data processing infrastructures, which provide IaaS (Infrastructure-as-a-Service) and PaaS (Platform-as-a-Service) services and make it possible to exchange knowledge among scientists by sharing models. Furthermore, a remote sensing data analysis platform for carbon satellites is designed based on the on-demand processing proposed in this paper and will soon be implemented with the support of Sunway TaihuLight, the world's most powerful supercomputer.
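As a hypothetical sketch of the runtime environment description the abstract mentions (the field names below are assumptions, not the paper's schema), a model upload might declare its dependencies and inputs so the data center can provision an execution environment next to the data:

import json

# Hypothetical runtime-environment description (assumed fields, not the
# paper's actual schema): the model is shipped to where the data live.
runtime_description = {
    "model": "ndvi_timeseries",
    "entrypoint": "run.py",
    "runtime": {"language": "python", "version": "3.8"},
    "dependencies": ["numpy", "gdal"],
    "inputs": [{"dataset": "carbon-sat-l2", "region": "50N,110E,45N,120E"}],
    "outputs": [{"name": "ndvi.tif", "format": "GeoTIFF"}],
}

# Serialize for upload; only this small description and the (small)
# results cross the network, not the raw imagery.
print(json.dumps(runtime_description, indent=2))

The asymmetry this exploits is exactly the one the abstract states: the description and results are tiny compared with the raw scenes, so moving the model beats moving the data.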
ISBN:
(Print) 9789811024719; 9789811024702
Synthetic aperture radar (SAR)-based platforms have to process an increasingly large number of complex floating-point operations and have to meet hard real-time deadlines. However, real-time use of SAR is severely restricted by the computation time taken for image formation. One of the classical methods of reducing this computation time to make it suitable for real-time application is multiprocessing. The authors have made a successful attempt to develop and test a parallel algorithm for SAR image formation, and the results are presented in this paper.
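One reason SAR image formation parallelizes well is that the range-compression stage (matched filtering of each received pulse) is independent across pulses. The following generic sketch (not the authors' algorithm; the chirp parameters and worker count are assumptions) distributes per-pulse FFT-based matched filtering over multiple processes:

import numpy as np
from multiprocessing import Pool

# Assumed toy chirp replica (linear FM, 256 samples).
CHIRP = np.exp(1j * np.pi * 0.5e11 * (np.arange(256) / 1e8) ** 2)

def compress_pulse(echo):
    """Matched-filter one received pulse against the chirp replica
    via FFT-based convolution."""
    n = len(echo) + len(CHIRP) - 1
    return np.fft.ifft(np.fft.fft(echo, n) * np.conj(np.fft.fft(CHIRP, n)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    echoes = [rng.standard_normal(1024) + 0j for _ in range(64)]  # 64 pulses
    with Pool(4) as pool:              # distribute pulses over 4 workers
        compressed = pool.map(compress_pulse, echoes)
    print(len(compressed), compressed[0].shape)  # 64 (1279,)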
ISBN:
(Print) 9781538637906
Ant colony optimization (ACO) can be used to solve complex optimization problems in engineering, economic management, and military strategy. Most of these are NP-hard problems, which are difficult to solve with traditional methods. An improved parallel ACO algorithm based on pattern learning is proposed in this paper. It extracts parameters automatically to reduce the solution space and enhance computational efficiency. The various parameters of the algorithm are analyzed, and a refining strategy is formed according to ACO's characteristics. The parallel ACO algorithm is implemented on the MIC/CPU architecture, where it significantly enhances performance.
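For readers unfamiliar with the baseline algorithm, here is a minimal ACO sketch on a tiny symmetric TSP (an illustration of plain ACO only; the paper's pattern learning and MIC/CPU parallelization are not reproduced, though the per-iteration tour construction marked below is the naturally parallel step):

import random

DIST = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
N, ANTS, ITERS = 4, 8, 50
ALPHA, BETA, RHO = 1.0, 2.0, 0.5   # pheromone/heuristic weights, evaporation

def tour_length(tour):
    return sum(DIST[tour[i]][tour[(i + 1) % N]] for i in range(N))

def build_tour(tau):
    """One ant builds a tour city by city, sampling the next city with
    probability proportional to pheromone^ALPHA * (1/distance)^BETA."""
    tour = [random.randrange(N)]
    while len(tour) < N:
        cur = tour[-1]
        cand = [c for c in range(N) if c not in tour]
        weights = [tau[cur][c] ** ALPHA * (1.0 / DIST[cur][c]) ** BETA
                   for c in cand]
        tour.append(random.choices(cand, weights=weights)[0])
    return tour

random.seed(1)
tau = [[1.0] * N for _ in range(N)]            # initial pheromone
best = None
for _ in range(ITERS):
    tours = [build_tour(tau) for _ in range(ANTS)]   # parallelizable step
    for i in range(N):                          # evaporate pheromone
        for j in range(N):
            tau[i][j] *= (1.0 - RHO)
    for t in tours:                             # deposit on used edges
        d = 1.0 / tour_length(t)
        for i in range(N):
            a, b = t[i], t[(i + 1) % N]
            tau[a][b] += d
            tau[b][a] += d
    cand = min(tours, key=tour_length)
    if best is None or tour_length(cand) < tour_length(best):
        best = cand
print(best, tour_length(best))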