Traffic simulation is a critical tool for congestion analysis, travel time estimation, and route optimization in urban planning, benefiting navigation apps, transportation network companies, and state agencies. Traditionally, traffic micro-simulation frameworks are based on road segments and can only support a limited number of main roads. Efficient traffic simulation on a regional scale remains a significant challenge due to the complexity of urban mobility and the large scale of spatiotemporal data. This paper introduces a Large Scale Multi-GPU Parallel Computing based Regional Scale Traffic Simulation Framework (LPSim), which leverages graphics processing unit (GPU) parallel computing to address these challenges. LPSim utilizes a multi-GPU architecture to simulate extensive and dynamic traffic networks with high fidelity and reduced computation time. Using the parallel processing capabilities of GPUs, LPSim can perform tens of millions of individual vehicle dynamics simulations simultaneously, significantly outperforming traditional CPU-based approaches. The framework is designed to be scalable and can easily accommodate the increasing complexity of traffic simulations. We present the theory behind GPU-based traffic simulation, the architecture of single- and multi-GPU based simulations, and the graph partition strategies that enhance computation resource load balance. Our experimental results demonstrate the effectiveness of LPSim in simulating large-scale traffic scenarios. LPSim is capable of completing simulations of 2.82 million trips in just 6.28 minutes on a single GPU machine equipped with 5120 CUDA cores (Tesla V100-SXM2). Furthermore, utilizing a Google Cloud instance with two NVIDIA V100 GPUs, which collectively offer 10240 CUDA cores, LPSim successfully simulates 9.01 million trips within 21.16 minutes. We further tested our simulator with the same demand on dual NVIDIA A100-PCIE-40GB GPUs, which finished the simulation in 0.0398 hours, approximately
ISBN (digital): 9780738110868
ISBN (print): 9781665422864
Python has been gaining traction for years in the world of scientific applications. However, the high-level abstraction it provides may not allow the developer to use the machines to their peak performance. To address this, multiple strategies, sometimes complementary, have been developed to enrich the software ecosystem either by relying on additional libraries dedicated to efficient computation (e.g., NumPy) or by providing a framework to better use HPC scale infrastructures (e.g., PyCOMPSs). In this paper, we present a Python extension based on SharedArray that enables the support of system-provided shared memory and its integration into the PyCOMPSs programming model as an example of integration into a complex Python environment. We also evaluate the impact such a tool may have on performance in two types of distributed execution flows, one for linear algebra with a blocked matrix multiplication application and the other in the context of data clustering with a k-means application. We show that with very little modification of the original decorator of the task-based application (3 lines of code to be modified), the gain in performance can rise above 40% for tasks relying heavily on data reuse in a distributed environment, especially when loading the data is prominent in the execution time.
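To make the idea of system-provided shared memory concrete, the following is a minimal sketch that uses the standard-library `multiprocessing.shared_memory` module as a stand-in. It is not the SharedArray-based extension described in the abstract, and it does not use PyCOMPSs; the helper names `write_block`, `read_block`, and `roundtrip` are hypothetical and chosen only for illustration.

```python
from multiprocessing import shared_memory

DOUBLE_SIZE = 8  # bytes per C double, the element type used in this sketch


def write_block(values):
    """Copy a list of floats into a new named shared-memory segment.

    The caller owns the returned segment and must close() and unlink() it.
    """
    nbytes = DOUBLE_SIZE * len(values)
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    # The segment may be rounded up to a page size, so slice to the exact length.
    with shm.buf[:nbytes].cast("d") as view:
        for k, value in enumerate(values):
            view[k] = value
    return shm


def read_block(name, count):
    """Attach to an existing segment by name and read `count` doubles from it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        with shm.buf[:DOUBLE_SIZE * count].cast("d") as view:
            return list(view)
    finally:
        shm.close()  # detach only; the creator is responsible for unlink()


def roundtrip(values):
    """Write a block, read it back through a second handle, then clean up."""
    shm = write_block(values)
    try:
        return read_block(shm.name, len(values))
    finally:
        shm.close()
        shm.unlink()
```

Here the second handle is opened in the same process for simplicity. A worker process would attach with the same `name`, which is what lets tasks that reuse the same data avoid reloading or re-serializing it.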
Social robotic assistants have been widely studied and deployed as telepresence tools or caregivers. Evaluating their design and impact on the people interacting with them is of prime importance. In this research, we evaluate the usability and impact of ARMAR-6, an industrial robotic assistant for maintenance tasks. For this evaluation, we have used a modified System Usability Scale (SUS) to assess the general usability of the robotic system and the Godspeed questionnaire series for the subjective perception of the coworker. We have also recorded the subjects' gaze fixation patterns and analyzed how they differ when working with the robot compared to a human partner.
Branch-and-Bound (B&B) algorithms for solving Global Optimization (GO) problems may use n-simplicial partition sets. The n-simplex represents an n-dimensional body with n+1 vertices in (n+1)-dimensional space. The aim of this article is to investigate the properties of the binary tree generated by iterative bisection of the longest edge (LE) of the regular n-simplex as search space. This way of splitting an n-simplex reduces the appearance of badly shaped simplices, which facilitates the convergence of the algorithm. It also helps to have a more uniform sampling of the search space, since the function is evaluated at the vertices of simplices. A motivation for this research is the estimation of the pending computational workload during the B&B GO algorithm. Such estimation may be helpful to steer parallel versions of the algorithms. In this paper we show that the way the longest edge is selected affects the number of sub-problems, the number of similar shapes of those sub-problems, and their roundness factor. The computational cost to obtain those metrics increases with the dimension n. Here we show the results for n ≤ 3, where for n=3 we have a 3-dimensional body in a 4-dimensional space. Due to the exponential growth of the binary tree, high performance computing is useful in order to reach a high precision or when n eq 3. We make use of parallel computing under MATLAB software.
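The binary tree produced by longest-edge bisection can be illustrated with a short sketch. The abstract's experiments use MATLAB; the Python below is only an illustration under stated assumptions: the regular n-simplex is taken to be the unit simplex (vertices are the unit vectors of (n+1)-dimensional space), a simplex is a leaf once its longest edge is at most a tolerance `eps`, and ties between equally long edges go to the first pair found. The abstract notes that the selection rule affects the tree, so a different tie-breaking rule may give different counts. All function names are hypothetical.

```python
from itertools import combinations


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def regular_simplex(n):
    """Unit n-simplex: the n+1 unit vectors of (n+1)-dimensional space."""
    return [tuple(1.0 if i == j else 0.0 for j in range(n + 1))
            for i in range(n + 1)]


def longest_edge(simplex):
    """Indices (i, j) of the longest edge; ties go to the first pair found."""
    return max(combinations(range(len(simplex)), 2),
               key=lambda e: squared_distance(simplex[e[0]], simplex[e[1]]))


def bisection_tree_size(n, eps):
    """Count (nodes, leaves) of the tree built by longest-edge bisection.

    A simplex is a leaf when its longest edge has length <= eps (assumed rule).
    """
    stack = [regular_simplex(n)]
    nodes = leaves = 0
    while stack:
        simplex = stack.pop()
        nodes += 1
        i, j = longest_edge(simplex)
        if squared_distance(simplex[i], simplex[j]) <= eps * eps:
            leaves += 1
            continue
        midpoint = tuple((a + b) / 2 for a, b in zip(simplex[i], simplex[j]))
        # Each child replaces one endpoint of the bisected edge by the midpoint.
        for replaced in (i, j):
            child = list(simplex)
            child[replaced] = midpoint
            stack.append(child)
    return nodes, leaves
```

For example, `bisection_tree_size(2, 1.0)` bisects the equilateral triangle with edge length √2 until every edge is at most 1. Halving `eps` repeatedly shows how quickly the tree grows, which is the workload-estimation concern raised in the abstract.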
Author: Dr. Roy L. Streit
Since 1992, Dr. Streit has devoted a significant amount of his energy to assignment problems in acoustic warfare data fusion systems. This work led him to the formulation of the PDD Principle, a broadly useful theoretical method that enables the derivation of new classes of discrete-continuous estimation algorithms for solving assignment problems without requiring enumeration and pruning. In recognition of his outstanding work in this area, Dr. Streit received the NAVSEA Scientist of the Year Award for 1997. Dr. Streit has also been pursuing avenues for analyzing the loss in broadband detection performance of an acoustic array in the presence of interferences. Because examining all possible separations between the signal and many interferers is infeasible, he has proposed a Poisson process model for the number and location(s) of the interferers. In this same time period, Dr. Streit has also been investigating issues in environmental modeling and localization by proposing a novel integral method for solving the bearings-only target motion analysis problem, which enables a natural heuristic for compensating for mismatch between the model predictions and the real world. Most recently, Dr. Streit has proposed the Numerical ACoustic Hull ARray (NACHAR) Project as a revolutionary and ambitious approach to hull array design. Its premise is that optimizing the detection capability of large hull arrays requires the full integration of hull sensor array and beamformer design processes. Because NACHAR crosses an unusual number of technical disciplines, it involves researchers from several departments. The diversity of Dr. Streit's technical background makes him uniquely suited to lead the project. Dr. Streit's truly impressive scientific achievements are complemented by a noteworthy list of professional activities that accentuate his value to Division Newport and the Navy.
He represents the United States in The Technical Cooperation Program Maritime Activities Panel 9 (Sonar Technology) and participates in the Division New
for his significant engineering research and development in sonar array research and acoustic transient signals as set forth in the following
ISBN (print): 9780769515823
It is envisaged that the grid infrastructure will be a large-scale distributed software system that will provide high-end computational and storage capabilities to differentiated users. A number of distributed computing technologies are being applied to grid development work, including CORBA and Jini. In this work, we introduce an A4 (Agile Architecture and Autonomous Agents) methodology, which can be used for resource management for grid computing. An initial system implementation utilises the performance prediction techniques of the PACE toolkit to provide quantitative data regarding the performance of complex applications running on local grid resources. At the meta-level, a hierarchy of identical agents is used to provide an abstraction of the system architecture. Each agent is able to cooperate with other agents to provide service advertisement and discovery to schedule applications that need to utilise grid resources. A performance monitor and advisor (PMA) is in development to optimize the performance of agent behaviours.
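The agent hierarchy with service advertisement and discovery can be pictured with a small sketch. This is not the A4 implementation or its protocol, and it does not involve the PACE toolkit; it is a hypothetical illustration in which each agent advertises a service to its ancestors and answers a discovery request from its own table before forwarding it to its parent. The `Agent` class and its method names are assumptions made for this example.

```python
class Agent:
    """One node in a hierarchy of identical agents (hypothetical sketch)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.known_services = {}  # service name -> name of the providing agent
        if parent is not None:
            parent.children.append(self)

    def advertise(self, service, provider=None):
        """Record a service locally and propagate the advertisement upward."""
        provider = provider if provider is not None else self.name
        self.known_services[service] = provider
        if self.parent is not None:
            self.parent.advertise(service, provider)

    def discover(self, service):
        """Return the provider of a service, asking ancestors if needed."""
        if service in self.known_services:
            return self.known_services[service]
        if self.parent is not None:
            return self.parent.discover(service)
        return None  # no agent on the path to the root knows this service


def build_example():
    """Small hierarchy: root -> (site_a -> node_c), root -> site_b."""
    root = Agent("root")
    site_a = Agent("site_a", parent=root)
    site_b = Agent("site_b", parent=root)
    node_c = Agent("node_c", parent=site_a)
    node_c.advertise("fft_solver")
    return root, site_a, site_b, node_c
```

In this sketch `site_b` has no local record of `fft_solver`, so its discovery request climbs to `root`, which learned about the service when `node_c` advertised it. A scheduler could then use the returned provider name to decide where to run an application.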