Search results - Inner Mongolia University Library

A Hardware Accelerator for Contour Tracing in Real-Time Imaging

IEEE SENSORS JOURNAL, 2024, vol. 24, no. 18, pp. 29156-29166

Authors: Gupta, Sonal; Goel, Shubh; Kumar, Ayush; Kar, Subrat. Affiliation: IIT Delhi, Dept Elect Engn, New Delhi 110016, India

Contour tracing is a critical technique in image analysis and computer vision, with applications in medical imaging, big data analytics, machine learning, and robotics. We introduce a novel hardware accelerator based on the adapted and segmented (AnS) vertex following (VF) and run-data-based-following (RDBF) families of fast contour tracing algorithms implemented on the Zynq-7000 field-programmable gate array (FPGA) platform. Our algorithmic implementation utilizing a mesh-interconnected multiprocessor architecture is at least 55x faster than the existing implementations. With input-output overheads, it is up to 12.5x faster. Our hardware accelerator for contour tracing is benchmarked on mesh-interconnected hardware, all three families of contour tracing algorithms, and a random image from the Imagenet database. Our implementation is, thus, faster for FPGA, application-specific integrated circuit (ASIC), graphics processing unit (GPU), and supercomputer hardware in comparison to the central processing unit (CPU)-GPU collaborative approach and offers a better solution for those systems where the input-output overheads can be minimized, such as parallel processing arrays and mesh-connected sensor networks.

Keywords: Accelerated contour tracing; field-programmable gate array (FPGA); graphics processing unit (GPU); image processing; multiprocessors; parallel algorithms; parallel processing array; torus

Studying the structural features of the lithospheric magnetic and gravity fields with the use of parallel algorithms

IZVESTIYA-PHYSICS OF THE SOLID EARTH, 2014, vol. 50, no. 4, pp. 508-513

Authors: Martyshko, P. S.; Fedorova, N. V.; Akimova, E. N.; Gemaidinov, D. V. Affiliations: Russian Acad Sci, Inst Geophys, Ural Branch, Ekaterinburg 620016, Russia; Russian Acad Sci, Inst Math & Mech, Ural Branch, Ekaterinburg 620990, Russia; Ural Fed Univ, Ekaterinburg 620002, Russia

We describe the parallel algorithms for studying the structural features of the anomalies in the gravity and magnetic fields of the lithosphere, which are based on the height transformations of the data. The algorithms are numerically implemented on the Uran supercomputer. The suggested computer technology is used for constructing the maps of the regional and local anomalies of the magnetic and gravity fields for the northeastern sector of Europe within an area confined between 48°-62° E and 60°-68° N.

Keywords: describe; parallel algorithms; parallel Lines; gravitational fields; Magnetic force; algorithms; lithosphere; Anomalies; Structural properties; Computer technology

Two-Level Parallel Augmented Schur Complement Interior-Point Algorithms for the Solution of Security Constrained Optimal Power Flow Problems

IEEE TRANSACTIONS ON POWER SYSTEMS, 2020, vol. 35, no. 2, pp. 1340-1350

Authors: Kardos, Juraj; Kourounis, Drosos; Schenk, Olaf. Affiliations: Univ Svizzera Italiana, Inst Computat Sci, Adv Comp Lab, CH-6904 Lugano, Switzerland; NEPLAN AG, CH-8700 Kusnacht, Switzerland

Modern power grids incorporate renewable energy at an increased pace, placing greater stress on the power grid equipment and shifting their operational conditions towards their limits. As a result, failures of any network component, such as a transmission line or power generator, can be critical to the overall grid operation. The security constrained optimal power flow (SCOPF) aims for the long term precontingency operating state, such that in the event of any contingency, the power grid will remain secure. For a realistic power network, however, with numerous contingencies considered, the overall problem size becomes intractable for single-core optimization tools in short time frames established by real-time industrial operations. We propose a parallel distributed memory structure exploiting framework, BELTISTOS-SC, which accelerates the solution of SCOPF problems over state of the art techniques. The acceleration on single-core execution is achieved by a structure-exploiting interior point method, employing successive Schur complement evaluations to further reduce the size of the systems solved at each iteration while maintaining sparsity, resulting in lower computational resources for the linear system solution. Additionally the parallel, distributed memory implementation of the proposed framework is also presented in detail and validated through several large-scale examples, demonstrating its efficiency for large-scale SCOPF problems.

Keywords: Security constraints optimal power flow; non-linear programming; interior point method; parallel algorithms
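
The abstract above mentions Schur complement evaluations that shrink the linear systems solved at each interior-point iteration. The following is only a minimal sketch of that general idea on a tiny 2x2 block system, not the BELTISTOS-SC implementation. The function names `solve_dense` and `schur_solve`, the diagonal leading block, and the sample numbers are all assumptions made for illustration.

```python
from fractions import Fraction


def solve_dense(M, rhs):
    """Gauss-Jordan elimination in exact arithmetic; M must be square and nonsingular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs[i])] for i, row in enumerate(M)]
    for col in range(n):
        # Pick any row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def schur_solve(a_diag, B, C, D, f, g):
    """Solve [[diag(a_diag), B], [C, D]] [x; y] = [f; g] by eliminating x first.

    Only the small system S y = g - C A^{-1} f, with S = D - C A^{-1} B,
    is solved densely; A is diagonal here (an assumption), so A^{-1} is cheap.
    """
    n, m = len(a_diag), len(D)
    AinvB = [[Fraction(B[i][j]) / a_diag[i] for j in range(m)] for i in range(n)]
    Ainvf = [Fraction(f[i]) / a_diag[i] for i in range(n)]
    S = [[D[i][j] - sum(C[i][k] * AinvB[k][j] for k in range(n)) for j in range(m)]
         for i in range(m)]
    reduced_rhs = [g[i] - sum(C[i][k] * Ainvf[k] for k in range(n)) for i in range(m)]
    y = solve_dense(S, reduced_rhs)
    # Back-substitute: x = A^{-1} (f - B y).
    x = [Ainvf[i] - sum(AinvB[i][j] * y[j] for j in range(m)) for i in range(n)]
    return x, y


if __name__ == "__main__":
    # Hypothetical 4x4 system: a 3x3 diagonal block coupled to one extra unknown.
    x, y = schur_solve([2, 4, 5], [[1], [2], [0]], [[1, 2, 0]], [[10]], [3, 10, 15], [15])
    print(x, y)
```

In this sketch the dense solve involves only a 1x1 system instead of the full 4x4 one, which is the size reduction the abstract refers to in general terms.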

Adaptive Multi-view Radiance Caching for Heterogeneous Participating Media

COMPUTER GRAPHICS FORUM, 2025

Authors: Stadlbauer, P.; Tatzgern, W.; Mueller, J. H.; Winter, M.; Stojanovic, R.; Weinrauch, A.; Steinberger, M. Affiliations: Graz Univ Technol, Inst Visual Comp, Graz, Austria; Huawei Technol, Vienna, Austria

Achieving lifelike atmospheric effects, such as fog, is essential in creating immersive environments and poses a formidable challenge in real-time rendering. Highly realistic rendering of complex lighting interacting with dynamic fog can be very resource-intensive, due to light bouncing through a complex participating media multiple times. We propose an approach that uses a multi-layered spherical harmonics probe grid to share computations temporarily. In addition, this world-space storage enables the sharing of radiance data between multiple viewers. In the context of cloud rendering this means faster rendering and a significant enhancement in overall rendering quality with efficient resource utilization.

Keywords: CCS Concepts; Distributed algorithms; parallel algorithms; Rendering; Computing methodologies → Ray tracing

A Theoretical Bound Which Improves the Performance of Compilation-Based Multi-Agent Path Finding

IEEE ACCESS, 2025, vol. 13, pp. 86133-86143

Authors: Lopez, Rodrigo; Asin-Acha, Roberto; Baier, Jorge A. Affiliations: Pontificia Univ Catolica Chile, Dept Comp Sci, Santiago 8320165, Chile; Univ Tecn Federico Santa Maria, Dept Informat, Santiago 8940897, Chile; Inst Milenio Fundamentos Datos, Santiago 8320000, Chile

A well-known approach to optimally solving Multi-Agent Path Finding (MAPF) is by compilation to Boolean Satisfiability or Answer Set Programming. Such compilation-based approaches to MAPF are superior to others on dense, relatively small instances. During solving, the underlying solver is invoked multiple times, each with an encoding of the same instance for a different makespan. The runtime of the last solver invocation, whose input is the instance encoded with a theoretical upper bound of the makespan of the optimal solution, is critical to performance. This paper proposes a new theoretical upper bound for such a last invocation, which we prove is correct. Unlike the previously known bound, given a MAPF instance, our bound requires computing a semi-relaxed solution, which is the union of cost-optimal solutions for partitions of such an instance. The computation of our new bound requires optimally solving partitions, which requires more computational resources than those needed for computing the bound currently used by state-of-the-art solvers. We propose a recursive parallel approach, which we call ReBo, that, despite additional overhead in upper bound computation, obtains substantially better overall results by exploiting the new bound. ReBo uses a heuristic to select an appropriate partition for bound computation, which does not guarantee that the solution returned is optimal. However, in our benchmarks, composed of 2,890 problems over warehouses and random instances, ReBo never finds a suboptimal solution. In addition, we found that the new bound is significantly tighter than the previously known one: on average, it is 21.2% smaller than the previous bound. This allows us to generate encodings that are around 33.45% smaller, allowing ReBo to solve 4.9% more instances than a state-of-the-art solver in our benchmark set.

Keywords: Upper bound; Encoding; Costs; Mathematical models; Runtime; Benchmark testing; Answer set programming; Scalability; Visualization; Video games; parallel algorithms; multi-agent systems; path planning

A Novel Key Point Based MLCS Algorithm for Big Sequences Mining

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, vol. 37, no. 1, pp. 15-28

Authors: Li, Yanni; Liu, Bing; Duan, Tihua; Wang, Zhi; Li, Hui; Cui, Jiangtao. Affiliations: Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China; Univ Illinois, Dept Comp Sci, Chicago, IL 60607, USA; Shanghai Yushang Informat Technol Co Ltd, Shanghai 201620, Peoples R China

Mining multiple longest common subsequences (MLCS) from a set of sequences of length three or more over a finite alphabet (a classical NP-hard problem) is an important task in many fields, e.g., bioinformatics, computational genomics, pattern recognition, information extraction, etc. Applications in these fields often involve generating very long sequences (length >= 10,000), referred to as big sequences. Despite efforts in improving the time and space complexities of MLCS mining algorithms, both existing exact and approximate algorithms face challenges in handling big sequences due to the overwhelming size of their problem-solving graph model MLCS-DAG (Directed Acyclic Graph), leading to the issue of memory explosion or extremely high time complexity. To bridge the gap, this paper first proposes a new identification and deletion strategy for different classes of non-critical points in the mining of MLCS, which are the points that do not contribute to their MLCSs mining in the MLCS-DAG. It then proposes a new MLCS problem-solving graph model, namely DAG(KP) (a new MLCS-DAG containing only Key Points). A novel parallel MLCS algorithm, called KP-MLCS (Key Point based MLCS), is also presented, which can mine and compress all MLCSs of big sequences effectively and efficiently. Extensive experiments on both synthetic and real-world biological sequences show that the proposed algorithm KP-MLCS drastically outperforms the existing state-of-the-art MLCS algorithms in terms of both efficiency and effectiveness.

Keywords: Approximation algorithms; parallel algorithms; Problem-solving; Finite element analysis; Directed acyclic graph; Data mining; NP-hard problem; Heuristic algorithms; Face recognition; MLCS (Multiple longest common subsequence); Explosions; MLCS-DAG (Directed acyclic graph); DAG(KP) (a new MLCS-DAG containing only key points); non-critical points; KP-MLCS (Key point based MLCS)
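
For readers unfamiliar with the longest common subsequence problem named in the abstract above, the sketch below shows the textbook dynamic program for the simplest case of two sequences. It is not KP-MLCS and does not build an MLCS-DAG; the function name `lcs_length` and the sample strings are illustrative assumptions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b.

    Classic O(len(a) * len(b)) dynamic program, keeping only the
    previous row of the table so memory stays O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip one character from either a or b, whichever is better.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
```

The table in this approach grows with the product of the sequence lengths, and with one dimension per sequence when more sequences are added, which gives a sense of why the abstract describes memory explosion on big sequences.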

Specular Path Generation and Near-Reflective Diffraction in Interactive Acoustical Simulations

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, vol. 30, no. 7, pp. 3609-3621

Authors: Pisha, Louis; Yadegari, Shahrokh. Affiliation: Univ Calif San Diego, Sonic Arts Res & Dev, Qualcomm Inst, San Diego, CA 92093, USA

Most systems for simulating sound propagation in a virtual environment for interactive applications use ray- or path-based models of sound. With these models, the "early" (low-order) specular reflection paths play a key role in defining the "sound" of the environment. However, the wave nature of sound, and the fact that smooth objects are approximated by triangle meshes, pose challenges for creating realistic approximations of the reflection results. Existing methods which produce accurate results are too slow to be used in most interactive applications with dynamic scenes. This paper presents a method for reflections modeling called spatially sampled near-reflective diffraction (SSNRD), based on an existing approximate diffraction model, Volumetric Diffraction and Transmission (VDaT). The SSNRD model addresses the challenges mentioned above, produces results accurate to within 1-2 dB on average compared to edge diffraction, and is fast enough to generate thousands of paths in a few milliseconds in large scenes. This method encompasses scene geometry processing, path trajectory generation, spatial sampling for diffraction modeling, and a small deep neural network (DNN) to produce the final response of each path. All steps of the method are GPU-accelerated, and NVIDIA RTX real-time ray tracing hardware is used for spatial computing tasks beyond just traditional ray tracing.

Keywords: Diffraction; Reflection; Computational modeling; Solid modeling; Acoustics; Real-time systems; Graphics processing units; graph and tree search strategies; neural nets; parallel algorithms; raytracing; virtual reality

Parallel finite-difference algorithms for three-dimensional space-fractional diffusion equation with ψ-Caputo derivatives

COMPUTATIONAL & APPLIED MATHEMATICS, 2020, vol. 39, no. 3, pp. 1-20

Authors: Bohaienko, V. O. Affiliation: NAS Ukraine, VM Glushkov Inst Cybernet, Glushkov Ave 40, Kiev, Ukraine

The paper deals with the issues of parallel computations' organization while solving three-dimensional space-fractional diffusion equation with the psi-Caputo derivatives using finite difference schemes. For an implicit scheme and locally one-dimensional splitting scheme, we present parallel algorithms for distributed memory systems that use one-dimensional block and red-black data partitioning. To reduce the order of algorithms' computational complexity, we use an approach based on the expansion of integral operator's kernel into series. We present the theoretical estimates of parallel algorithms' performance and the results of computational experiments conducted on a testing problem that has an analytical solution for the case of the Caputo-Katugampola derivative. The results of the experiments show close-to-linear parallelization efficiency of one-dimensional splitting scheme with block partitioning and inefficiency of red-black partitioning in this case. For the implicit scheme, the scalability of parallel algorithms is weak and the use of red-black partitioning is more efficient than the use of block partitioning when running on a small number of computational resources.

Keywords: Diffusion; Space-fractional differential equation; parallel algorithms; psi-Caputo derivative; Finite-difference approximation; Splitting schemes
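
The abstract above describes "close-to-linear parallelization efficiency". As a reminder of what that phrase usually measures, here is a small sketch of the standard speedup and efficiency definitions. The timings in the example are made up for illustration and are not results from the paper; the function names are likewise assumptions.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Parallel efficiency E = S / p; a value of 1.0 means perfectly linear scaling."""
    return speedup(t_serial, t_parallel) / workers


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by number of processes.
    t1 = 120.0
    timings = {2: 62.0, 4: 32.0, 8: 17.0}
    for p, tp in timings.items():
        print(f"p={p}: speedup={speedup(t1, tp):.2f}, efficiency={efficiency(t1, tp, p):.2f}")
```

Efficiency that stays near 1.0 as the number of processes grows is what "close to linear" refers to; efficiency that drops quickly corresponds to the weak scalability the abstract reports for the implicit scheme.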

Distributed Augmentation, Hypersweeps, and Branch Decomposition of Contour Trees for Scientific Exploration

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2025, vol. 31, no. 1, pp. 152-162

Authors: Li, Mingzhe; Carr, Hamish; Rubel, Oliver; Wang, Bei; Weber, Gunther H. Affiliations: Univ Utah, Salt Lake City, UT 84112, USA; Univ Leeds, Leeds, England; Lawrence Berkeley Natl Lab, Berkeley, CA, USA

Contour trees describe the topology of level sets in scalar fields and are widely used in topological data analysis and visualization. A main challenge of utilizing contour trees for large-scale scientific data is their computation at scale using high-performance computing. To address this challenge, recent work has introduced distributed hierarchical contour trees for distributed computation and storage of contour trees. However, effective use of these distributed structures in analysis and visualization requires subsequent computation of geometric properties and branch decomposition to support contour extraction and exploration. In this work, we introduce distributed algorithms for augmentation, hypersweeps, and branch decomposition that enable parallel computation of geometric properties, and support the use of distributed contour trees as query structures for scientific exploration. We evaluate the parallel performance of these algorithms and apply them to identify and extract important contours for scientific visualization.

Keywords: branch decomposition; Contour trees; parallel algorithms; computational topology; topological data analysis

SLoB: Suboptimal Load Balancing Scheduling in Local Heterogeneous GPU Clusters for Large Language Model Inference

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, vol. 11, no. 6, pp. 7941-7951

Authors: Jiang, Peiwen; Wang, Haoxin; Cai, Zinuo; Gao, Lintao; Zhang, Weishan; Ma, Ruhui; Zhou, Xiaokang. Affiliations: Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China; Shanghai Aerosp Syst Engn Inst, Shanghai 201111, Peoples R China; China Univ Petr East China, Qingdao Inst Software, Coll Comp Sci & Technol, Qingdao 266555, Peoples R China; Kansai Univ, Business Data Sci, Osaka 5650823, Japan

Large language models (LLMs) are becoming powerful engines for social productivity in the manufacturing lifecycle. Existing application-level LLMs inference services focus on large datacenter and small edge intelligence (EI) scenarios, adopting iteration-level batch schedulers to solve resource utilization and inference speed problems. However, these services are incompatible with the scene of medium-sized local heterogeneous graphics processing unit (GPU) clusters with specific patterns, whose scale is between the two aforementioned scenarios. This type of scene proposes tradeoff problems for inference resource and speed, as well as user satisfaction problems for the semisparse frequency of queries with streaming responses. We propose suboptimal load balancing (SLoB), a distributed LLMs inference service scheduler in medium-sized local heterogeneous GPU clusters. SLoB leverages a multilevel adapter to accommodate LLMs usage patterns of scenes and balance resource utilization with inference efficiency. For semisparse problems, it adopts a mixed-priority pipeline scheduler with the least-padding principle to improve users' satisfaction, a metric considering the weights of different tokens in streaming responses. Based on the system prototype, our experiments under simulated workloads demonstrate that SLoB gains a maximum improvement of 29.4x under the satisfaction metric compared with the traditional run-to-completion scheduling solution while improving by up to 3.0x compared with the state-of-the-art (SOTA) solution Orca.

Keywords: Task analysis; Manufacturing; Graphics processing units; Adaptation models; Job shop scheduling; Transformers; Pipelines; Heterogeneous GPU clusters; inference service; large language models; parallel algorithms
