We speed up the solution of the mobile sequential recommendation (MSR) problem, which requires searching for optimal routes for vacant taxi cabs by mining massive taxi GPS data. We develop new methods that combine parallel computing and simulated annealing with novel global and local searches. While existing approaches usually involve costly offline algorithms and methodical pruning of the search space, our methods search directly, in real time, for the optimal route without offline preprocessing. On both real-world and synthetic data, they reduce the computational time for high-dimensional MSR problems from days to seconds. We efficiently solve MSR problems with thousands of pick-up points without offline training, compared with the published record of 25 pick-up points.
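The core search loop of a simulated-annealing route optimizer can be sketched as follows. This is an illustrative toy, not the paper's method: the cost model (expected cruising distance over hypothetical pick-up probabilities), the 2-swap local move, and all parameter names (`pickup_prob`, `dist`, `t0`, `alpha`) are assumptions introduced here for demonstration.

```python
import math
import random

def route_cost(route, pickup_prob, dist):
    """Expected cruising distance over hypothetical pick-up points:
    distance accumulates while the cab is still empty."""
    cost, p_empty = 0.0, 1.0
    for i in range(1, len(route)):
        cost += p_empty * dist[route[i - 1]][route[i]]
        p_empty *= 1.0 - pickup_prob[route[i]]
    return cost

def anneal(points, pickup_prob, dist, iters=5000, t0=1.0, alpha=0.999):
    """Simulated annealing with 2-swap local moves (illustrative only)."""
    rng = random.Random(0)
    cur = list(points)
    cur_cost = route_cost(cur, pickup_prob, dist)
    best, best_cost = cur[:], cur_cost
    t = t0
    for _ in range(iters):
        i, j = rng.sample(range(1, len(cur)), 2)   # keep the start point fixed
        cand = cur[:]
        cand[i], cand[j] = cand[j], cand[i]
        c = route_cost(cand, pickup_prob, dist)
        # accept improvements always, worse moves with Boltzmann probability
        if c < cur_cost or rng.random() < math.exp((cur_cost - c) / t):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand[:], c
        t *= alpha   # geometric cooling schedule
    return best, best_cost
```

The parallel variants described in the abstract would run many such chains concurrently and exchange the best routes found; this sketch shows only a single sequential chain.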
To segment regions of interest (ROIs) from ultrasound images, a novel dynamic-texture-based algorithm is presented that combines the surfacelet transform, a hidden Markov tree (HMT) model, and parallel computing. In the surfacelet transform, the image sequence is decomposed with a pyramid model, and the high-frequency 3D signals are further decomposed by directional filter banks. In HMT modeling, the distribution of coefficients is described by a Gaussian mixture model (GMM), and the relationship across scales by a scale-continuity model. From the HMT parameters estimated via expectation maximization, the joint probability density is computed and taken as the feature value of the image sequence. ROIs and non-ROIs from collected sample videos are then used to train a support vector machine (SVM) classifier, which identifies the divided 3D blocks of the input video. To improve computational efficiency, parallel computing is implemented on a multi-processor CPU. Our algorithm is compared with existing texture-based approaches for ultrasound images, including the gray-level co-occurrence matrix (GLCM), local binary patterns (LBP), and wavelets; the experimental results demonstrate its advantages in processing noisy ultrasound images and in segmenting ROIs more accurately.
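The GMM component of an HMT model typically describes each subband coefficient as drawn from a small set of zero-mean Gaussian states (e.g. a low-variance "smooth" state and a high-variance "edge" state). A minimal sketch of the mixture density and of the per-coefficient state posterior, the quantity computed in the E-step of expectation maximization, follows; the two-state zero-mean form is an assumption for illustration, not the paper's exact parameterization.

```python
import math

def gmm_pdf(x, weights, variances):
    """Density of a zero-mean Gaussian mixture, the form HMT models
    commonly use for subband coefficients."""
    total = 0.0
    for w, var in zip(weights, variances):
        total += w * math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return total

def state_posterior(x, weights, variances):
    """P(hidden state | coefficient): the E-step responsibility in EM."""
    likes = [w * math.exp(-x * x / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
             for w, v in zip(weights, variances)]
    s = sum(likes)
    return [l / s for l in likes]
```

A large coefficient magnitude pushes the posterior toward the high-variance state, which is what lets the model separate textured (ROI) blocks from smooth background.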
This paper presents feasibility studies on utilizing graphics processing units (GPUs) as high-performance computing hardware alongside front-end electronics in large-scale magnetic-confinement thermal fusion experiments. The objective of the research is to provide scalable, high-throughput, low-latency measurements for the runtime X-ray diagnostic of tokamak metallic impurities in the Tungsten Environment in Steady-State Tokamak (WEST) reactor. A heterogeneous system pairing a front end based on field-programmable gate arrays with a back-end server was introduced to decompose workloads efficiently; it allows a comprehensive evaluation of CPUs and accelerators. In particular, a novel GPU implementation of the back-end algorithm is presented together with its performance analysis.
A finite-element-method-based parallel computing simulator for multiphysics effects in a resistive random access memory (RRAM) array, suitable for supercomputer platforms even with thousands of cores, is developed to simulate oxygen vacancy migration, current transport, and thermal conduction. An exponentially fitted flux Galerkin method is introduced to improve convergence when solving the 3-D oxygen vacancy drift-diffusion equation. The accuracy of our algorithm is validated by comparison with commercial software, and the scalability of our parallel algorithm is also investigated. The simulation results for a high-density RRAM array indicate that the heat generated during the writing process can produce high temperatures and lead to severe reliability problems: even RRAM cells without an applied bias voltage can be switched unintentionally from the low-resistance state to the high-resistance state and lose their stored information. Increasing the feature size, or equivalently decreasing the integration density, lowers the power density and hence improves reliability. A large electrode thickness with Dirichlet boundaries applied on the electrode side surfaces drains heat away faster and enhances the reliability of the RRAM array.
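To fix ideas, a drift-diffusion equation of the kind solved here can be illustrated with a single explicit upwind finite-difference step in 1-D. This is a deliberately simplified stand-in, not the paper's exponentially fitted flux Galerkin scheme: the grid, the zero-flux boundary treatment, and the parameter names are all assumptions for demonstration.

```python
def drift_diffusion_step(n, D, v, dx, dt):
    """One explicit step of dn/dt = D d2n/dx2 - v dn/dx on a 1-D grid,
    with upwinding for the drift term and crude zero-flux ends."""
    new = n[:]
    for i in range(1, len(n) - 1):
        diff = D * (n[i + 1] - 2.0 * n[i] + n[i - 1]) / dx ** 2
        # upwind difference: look against the drift direction for stability
        adv = -v * (n[i] - n[i - 1]) / dx if v > 0 else -v * (n[i + 1] - n[i]) / dx
        new[i] = n[i] + dt * (diff + adv)
    new[0], new[-1] = new[1], new[-2]   # zero-flux (reflecting) boundaries
    return new
```

An explicit scheme like this needs small time steps for stability; the exponentially fitted Galerkin formulation mentioned in the abstract exists precisely to keep convergence robust when drift dominates diffusion.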
The support vector machine (SVM) algorithm is widely used in many fields because of its good classification performance, simplicity, and practicality. However, the SVM computes the support vectors by quadratic programming, whose solution involves an n-order matrix. When the amount of data is large, computing and storing this matrix makes optimization very slow and can even cause memory overflow and interrupt the computation. Using the big-data computing platform Spark to improve the SVM algorithm can solve these problems, but such an approach alone cannot handle multi-class problems. This paper therefore constructs multiple classifiers, combining the Spark big-data programming framework with the classification characteristics of the SVM to realize a parallel one-vs-rest SVM optimization algorithm for large data sets, and compares the variants on UCI data sets. In the experiments, the Spark-based one-vs-rest SVM clearly outperforms the one-vs-rest SVM in a single-machine environment, and the simulation results show that the proposed algorithm has better performance.
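The one-vs-rest decomposition itself is easy to sketch: one binary problem per class, all of which can be trained independently and hence in parallel. The sketch below uses a thread pool in place of Spark and a trivial nearest-centroid scorer in place of a real SVM; both substitutions, and all names here, are illustrative assumptions, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def train_binary(X, y, positive):
    """Placeholder for one binary SVM: a centroid rule scoring a point by
    how much closer it is to the positive class than to the rest."""
    pos = [x for x, l in zip(X, y) if l == positive]
    neg = [x for x, l in zip(X, y) if l != positive]
    cp = [sum(c) / len(pos) for c in zip(*pos)]
    cn = [sum(c) / len(neg) for c in zip(*neg)]
    def score(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dn - dp          # larger = more confidently `positive`
    return score

def one_vs_rest_fit(X, y):
    """Train the K binary scorers concurrently; a Spark version would map
    each binary task across the cluster instead of across threads."""
    classes = sorted(set(y))
    with ThreadPoolExecutor() as ex:
        scorers = list(ex.map(lambda c: train_binary(X, y, c), classes))
    return classes, scorers

def one_vs_rest_predict(classes, scorers, x):
    """Pick the class whose binary scorer is most confident."""
    return max(zip(classes, scorers), key=lambda cs: cs[1](x))[0]
```

The key property the abstract exploits is visible here: the K binary fits share read-only data and nothing else, so they parallelize with no coordination.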
While modern parallel computing systems offer high performance, utilizing these powerful computing resources to the fullest extent demands advanced knowledge of various hardware architectures and parallel programming models. Furthermore, optimized software execution on parallel computing systems demands consideration of many parameters at compile time and run time. Determining the optimal set of parameters in a given execution context is a complex task, and to address it researchers have proposed different approaches based on heuristic search or machine learning. In this paper, we undertake a systematic literature review to aggregate, analyze, and classify the existing software optimization methods for parallel computing systems. We review approaches that use machine learning or meta-heuristics for software optimization at compile time and run time, and we discuss challenges and future research directions. The results of this study may help to better understand the state-of-the-art techniques that use machine learning and meta-heuristics to deal with the complexity of software optimization for parallel computing systems. Furthermore, it may aid in understanding the limitations of existing approaches and in identifying areas for improvement.
In parallel computing systems, the interconnection network forms the critical infrastructure that enables robust and scalable communication among hundreds of thousands of nodes. Traditional packet-switched networks tend to suffer long communication times when congestion occurs. In this context, we explore the use of circuit switching (CS), replacing packet switches with custom hardware that supports circuit-based switching efficiently and with low latency. In our target CS network, a certain amount of bandwidth is guaranteed for each communication pair, so the network latency is predictable when a limited number of node pairs exchange messages. Because the number of time slots allocated in every switch directly affects the end-to-end latency, we improve slot utilization and develop a network-topology generator that minimizes the number of time slots for target applications whose communication patterns are predictable. Using quantitative discrete-event simulation, we show that our design methodology reduces the minimum necessary number of slots in a generated topology to a small value while keeping the network cost 50% lower than that of standard torus topologies.
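The slot-allocation idea can be illustrated with a greedy scheme: two communication pairs conflict when they share an endpoint, and each pair receives the smallest time slot not already held by a conflicting pair. This is a simplified stand-in for switch-level allocation over a real topology; the conflict model and function names are assumptions for illustration, not the paper's algorithm.

```python
def assign_slots(pairs):
    """Greedily give each (src, dst) pair the smallest time slot that no
    other pair sharing an endpoint already holds."""
    slot_of = {}
    for src, dst in pairs:
        # slots already used by pairs touching either endpoint
        taken = {s for (a, b), s in slot_of.items()
                 if a in (src, dst) or b in (src, dst)}
        slot = 0
        while slot in taken:
            slot += 1
        slot_of[(src, dst)] = slot
    return slot_of
```

When the communication pattern is known in advance, as the abstract assumes, pairs that never conflict can share a slot, which is why a topology co-designed with the pattern needs only a small number of slots.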
The implementation of time-domain Green's functions (TDGFs) on the graphics processing unit (GPU) and the central processing unit (CPU) using a finite-difference scheme is presented. The TDGFs represent the transient electric scalar and magnetic vector potentials due to a horizontal electric dipole (HED) in open layered media. The layered media are bounded by a perfectly matched layer (PML), a symmetry axis, and a perfect electric conductor (PEC). We adopted four parallel approaches: 1) an open multiprocessing (OpenMP) CPU implementation; 2) a message passing interface (MPI) CPU implementation; 3) an open accelerators (OpenACC) GPU implementation; and 4) a compute unified device architecture (CUDA) GPU implementation. The accuracy and efficiency of these programming models are validated by comparing their results against a sequential CPU implementation. Relative to the single-threaded CPU implementation, the speed-ups obtained with the OpenMP, MPI, OpenACC, and CUDA programming models are 4.8, 6.12, 45.97, and 96.53, respectively. The results show that the GPU implementations yield considerable speed-ups while preserving the solution's accuracy.
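The structure of the finite-difference kernel being parallelized can be sketched with a 1-D leapfrog (Yee-style) update, where the two inner loops are exactly the loops an OpenMP pragma or CUDA kernel would distribute across threads. This toy update, the 0.5 Courant coefficients, and the PEC-only boundaries are illustrative assumptions, not the paper's layered-media scheme.

```python
def fdtd_step(e, h, c1=0.5, c2=0.5):
    """One leapfrog update of a 1-D staggered finite-difference grid:
    H nodes sit between E nodes (len(e) == len(h) + 1)."""
    for i in range(len(h)):                 # this loop is what OpenMP/CUDA
        h[i] += c2 * (e[i + 1] - e[i])      # would split across threads
    for i in range(1, len(e) - 1):
        e[i] += c1 * (h[i] - h[i - 1])
    # e[0] and e[-1] stay fixed at 0: perfect electric conductor ends
    return e, h
```

Because every grid point depends only on its fixed neighbors from the previous half-step, each loop iteration is independent, which is what makes this kernel map so well onto GPUs.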
Author: Asteasuain, Mariano (Univ Nacl Sur, Dept Ingn Quim, Av Alem 1253, RA-8000 Bahia Blanca, Buenos Aires, Argentina; PLAPIQUI, UNS-CONICET, Planta Piloto Ingn Quim, Camino La Carrindanga Km 7, RA-8000 Bahia Blanca, Buenos Aires, Argentina)
High-fidelity models of polymer processes should include the prediction of distributions of polymer properties, including multivariate distributions. Deterministic models with this capability usually involve a large system of equations, which compromises model performance in terms of CPU time. The probability generating function (pgf) technique is a powerful method for modeling distributions of polymer properties, including multivariate ones. It can be applied to systems described by complex kinetic mechanisms and requires no a priori assumptions about the distribution shape. The structure of this modeling method makes it particularly suitable for parallel computing. This work describes the application of the pgf technique to modeling uni- and bivariate distributions of polymer properties with parallelization of the model code. It is shown that accurate results can be achieved in very short running times, which makes the technique suitable for models employed in optimization and online control tasks. (C) 2019 Elsevier Ltd. All rights reserved.
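Two standard pgf identities convey why the technique is convenient: the mean of a count distribution is G'(1) = Σ k·p_k, and the pgf of a sum of independent counts (e.g. chain lengths of two independently grown blocks) is the product of their pgfs, i.e. a convolution of coefficient lists. The sketch below illustrates only these textbook facts on finite coefficient lists; it is not the paper's pgf-transform model of the kinetic equations.

```python
def pgf_mean(p):
    """Mean of a count distribution from its pgf coefficients:
    G'(1) = sum over k of k * p_k."""
    return sum(k * pk for k, pk in enumerate(p))

def pgf_product(p, q):
    """pgf of the sum of two independent counts = product of the pgfs,
    computed as a convolution of the coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out
```

The independence of terms in such transform-domain balances is one reason the method parallelizes well: each pgf evaluation point can be computed on a separate core.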
The Web Ontology Language (OWL) is a widely used knowledge representation language for describing knowledge in application domains by means of classes, properties, and individuals. Ontology classification is an important and widely used service that computes a taxonomy of all classes occurring in an ontology. It can require significant amounts of runtime, yet most OWL reasoners do not support any kind of parallel processing. We present a novel thread-level parallel architecture for ontology classification, which is ideally suited for shared-memory SMP servers but does not rely on locking techniques and thus avoids possible race conditions. We evaluated our prototype implementation on a set of real-world ontologies. Our experiments demonstrate very good scalability, with a speedup that is linear in the number of available cores. (C) 2018 Elsevier B.V. All rights reserved.
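The shape of the task, building a taxonomy from many independent subsumption checks, can be illustrated with a toy in which classes are extensional sets, subsumption is strict set inclusion, and the per-class checks run across a thread pool on read-only shared data (hence no locks). This is a didactic simplification, not an OWL reasoner: real subsumption is logical entailment, and all names here are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def direct_supers(name, ext, classes):
    """Toy subsumption: every strict superset of `ext` subsumes it;
    then keep only the most specific (direct) superclasses."""
    supers = {c for c, e in classes.items() if c != name and ext < e}
    direct = {c for c in supers
              if not any(classes[c] > classes[d] for d in supers if d != c)}
    return name, direct

def classify(classes):
    """Compute direct-superclass links for every class concurrently;
    the shared `classes` dict is read-only, so no locking is needed."""
    with ThreadPoolExecutor() as ex:
        results = ex.map(lambda kv: direct_supers(kv[0], kv[1], classes),
                         classes.items())
    return dict(results)
```

The lock-free property claimed in the abstract corresponds here to each task writing only its own result while sharing immutable input.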