ISBN (Print): 9798350338461
The proceedings contain 34 papers. The topics discussed include: EEG based thought-to-text translation via deep learning; penta-band circular patch antenna with partial ground for wireless applications; fingerprint generation and authentication through adaptive convolution generative adversarial network (ADCGAN); go together: bridging the gap between learners and teachers; design, implementation, and power analysis for network-on-chip architectures; SELthA: secure, efficient and lightweight authentication mechanism for unmanned aerial vehicle network; a hybrid statistical model for ultra short term wind speed prediction; a comprehensive study of the role of self-driving vehicles in agriculture: a review; praise or insult? identifying cyberbullying using natural language processing; inductance enhancement using nested inductor topology for RF and voltage regulators applications; comparison of memory-less and memory-based models for short-term solar irradiance forecasting; and exploring the impact of false location identification on the inference of social ties in location-based social networks.
Digital FIR filters can be efficiently implemented using distributed arithmetic (DA). Original DA provides low throughput. Parallel DA has proven to be a promising technique for efficient DA implementation. Block-based...
ISBN (Digital): 9789819708017
ISBN (Print): 9789819708000; 9789819708017
Sparse matrix-vector multiplication (SpMV) is extensively used in scientific computing and often accounts for a significant portion of the overall computational overhead. Therefore, improving the performance of SpMV is crucial. However, sparse matrices exhibit a sporadic and irregular distribution of non-zero elements, resulting in workload imbalance among threads and challenges in vectorization. To address these issues, numerous efforts have focused on optimizing SpMV based on the hardware characteristics of computing platforms. In this paper, we present an optimization of CSR-based SpMV, since the CSR format is the most widely used and is supported by various high-performance sparse computing libraries, on a novel MIMD computing platform, Pezy-SC3s. Based on the hardware characteristics of Pezy-SC3s, we tackle poor data locality, workload imbalance, and vectorization challenges in CSR-based SpMV by employing matrix chunking, applying Atomic Cache for workload scheduling, and utilizing SIMD instructions when performing SpMV. As the first study to investigate SpMV optimization on Pezy-SC3s, we evaluate the performance of our work by comparing it with the CSR-based SpMV and the SpMV provided by Nvidia's cuSPARSE. Through experiments conducted on 2092 matrices obtained from SuiteSparse, we demonstrate that our optimization achieves a maximum speedup of 17.63x and an average of 1.56x over CSR-based SpMV, and an average bandwidth utilization of 35.22% for large-scale matrices (nnz >= 10^6), compared with the 36.17% obtained using cuSPARSE. These results demonstrate that our optimization effectively harnesses the hardware resources of Pezy-SC3s, leading to improved performance of CSR-based SpMV.
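For readers unfamiliar with the baseline the abstract refers to, a minimal sketch of CSR-based SpMV follows (our illustration, not the authors' code; all names are hypothetical). The indirect access to the input vector through the column indices is the source of the poor data locality and vectorization difficulty the paper addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR-based SpMV sketch: row_ptr[i]..row_ptr[i+1] delimits the
// non-zeros of row i, stored in val[] with column indices in col_idx[].
std::vector<double> spmv_csr(const std::vector<std::size_t>& row_ptr,
                             const std::vector<std::size_t>& col_idx,
                             const std::vector<double>& val,
                             const std::vector<double>& x) {
    const std::size_t n_rows = row_ptr.size() - 1;
    std::vector<double> y(n_rows, 0.0);
    for (std::size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];  // irregular gather from x: the
                                            // locality problem the paper targets
        y[i] = sum;
    }
    return y;
}
```

Rows with very different non-zero counts give the inner loop very different trip counts, which is why workload scheduling (the paper's Atomic Cache) matters on a many-threaded platform.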
This project develops an innovative Braun Multiplier design to address power consumption and chip area challenges. Integrating a high-speed parallel prefix adder enhances computational speed by leveraging parallel pro...
ISBN (Print): 9783031396977; 9783031396984
Parallel-in-time algorithms provide an additional layer of concurrency for the numerical integration of models based on time-dependent differential equations. Methods like Parareal, which parallelize across multiple time steps, rely on a computationally cheap and coarse integrator to propagate information forward in time, while a parallelizable, expensive fine propagator provides accuracy. Typically, the coarse method is a numerical integrator using lower resolution, reduced order or a simplified model. Our paper proposes to use a physics-informed neural network (PINN) instead. We demonstrate for the Black-Scholes equation, a partial differential equation from computational finance, that Parareal with a PINN coarse propagator provides better speedup than a numerical coarse propagator. Training and evaluating a neural network are both tasks whose computing patterns are well suited for GPUs. By contrast, mesh-based algorithms with their low computational intensity struggle to perform well. We show that moving the coarse propagator PINN to a GPU while running the numerical fine propagator on the CPU further improves Parareal's single-node performance. This suggests that integrating machine learning techniques into parallel-in-time integration methods and exploiting their differences in computing patterns might offer a way to better utilize heterogeneous architectures.
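The Parareal structure the abstract builds on can be sketched for a scalar test ODE dy/dt = lam*y (our illustration under stated assumptions: a one-step Euler coarse propagator G, which the paper replaces with a PINN evaluation, and a many-substep Euler fine propagator F; all names and step counts are hypothetical). The F calls inside one iteration are independent across time slices, which is where the parallelism comes from.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Parareal sketch for dy/dt = lam*y over n_slices time slices of width dt.
// Update rule: u[n+1] <- G(u_new[n]) + F(u[n]) - G(u[n]).
std::vector<double> parareal(double y0, double lam, double dt,
                             int n_slices, int n_iters) {
    auto G = [&](double y) { return y + dt * lam * y; };  // cheap coarse step
    auto F = [&](double y) {                              // expensive fine step
        const int m = 100;
        const double h = dt / m;
        for (int i = 0; i < m; ++i) y += h * lam * y;
        return y;
    };
    std::vector<double> u(n_slices + 1, y0);
    for (int n = 0; n < n_slices; ++n) u[n + 1] = G(u[n]);  // serial init
    for (int k = 0; k < n_iters; ++k) {
        std::vector<double> u_new(u);
        for (int n = 0; n < n_slices; ++n)
            // The F(u[n]) evaluations are independent and would run in
            // parallel; only the coarse correction sweep is serial.
            u_new[n + 1] = G(u_new[n]) + F(u[n]) - G(u[n]);
        u = u_new;
    }
    return u;
}
```

After n_slices iterations Parareal reproduces the serial fine solution exactly; the speedup comes from converging in far fewer iterations, which is why a cheap but accurate coarse propagator (such as the paper's PINN) is decisive.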
ISBN (Print): 9783031388637
In this paper, we consider a fully implicit Stokes solver implementation targeting both GPU and multithreaded CPU architectures. The solver is aimed at the semistructured meshes that often emerge during permeability calculations in geology. The solver consists of four main parts: geometry and topology analysis, linear system construction, linear system solution, and postprocessing. A modified version of the AMGCL library, developed by the authors in earlier research, is used for the solution. Previous experiments showed that the GPU architecture can deliver extremely high performance for such types of problems, especially when the whole stack is implemented on the GPU. However, the GPU memory limitation significantly reduces the available mesh sizes. For some applications, the computation time is not as important as the mesh size. Therefore, it is convenient to have both GPU (for example, CUDA) and multithreaded CPU versions of the same code. A direct code port is time-consuming and error-prone. Several automatic approaches are available: the OpenACC standard, the DVM-system, SYCL, and others. Often, however, these approaches still demand careful programming if one wants to deliver maximum performance for a specific architecture. Some problems (such as the analysis of connected components, in our case) require totally different optimal algorithms for different architectures. Furthermore, native libraries sometimes deliver the best performance and are preferable for specific parts of the solution. For these reasons, we used another approach, based on C++ language facilities such as template programming. The two main components of our approach are array classes and ‘for each’ algorithms. Arrays can be used on both CPU and CUDA architectures and internally select the memory layout that best fits the current architecture (as an ‘array of structures’ or a ‘structure of arrays’). ‘For each’ algorithms generate kernels or parallel cycles that implement parallel processing for ind...
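The array-class plus ‘for each’ idea described above can be sketched with templates (our illustration, not the authors' code; the types and names are hypothetical). The element-access interface is identical for both layouts, so the same kernel body works regardless of which layout the architecture prefers; a CUDA specialization of for_each would launch a device kernel instead of the plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Layout { AoS, SoA };

template <Layout L>
struct Vec3Array;

// Array of structures: one contiguous record per element (often better on CPU).
template <>
struct Vec3Array<Layout::AoS> {
    struct V { double x, y, z; };
    std::vector<V> data;
    explicit Vec3Array(std::size_t n) : data(n) {}
    double& x(std::size_t i) { return data[i].x; }
    double& y(std::size_t i) { return data[i].y; }
    double& z(std::size_t i) { return data[i].z; }
    std::size_t size() const { return data.size(); }
};

// Structure of arrays: one contiguous array per field (coalesced on GPU).
template <>
struct Vec3Array<Layout::SoA> {
    std::vector<double> xs, ys, zs;
    explicit Vec3Array(std::size_t n) : xs(n), ys(n), zs(n) {}
    double& x(std::size_t i) { return xs[i]; }
    double& y(std::size_t i) { return ys[i]; }
    double& z(std::size_t i) { return zs[i]; }
    std::size_t size() const { return xs.size(); }
};

// CPU 'for each': a plain (possibly OpenMP-annotated) loop; a CUDA
// specialization would generate a kernel launch with the same kernel body.
template <class Array, class Kernel>
void for_each(Array& a, Kernel k) {
    for (std::size_t i = 0; i < a.size(); ++i) k(a, i);
}
```

A single generic kernel, e.g. `[](auto& arr, std::size_t i) { arr.z(i) = arr.x(i) + arr.y(i); }`, then runs unchanged against either layout.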
Currently, the landscape of computer hardware architecture presents the characteristics of heterogeneity and diversity, prompting widespread attention to cross-platform portable parallel programming techniques. Most e...
Defect detection in manufacturing remains challenging, with traditional methods relying on inflexible, hardcoded image processing techniques. While deep learning approaches show promise, they often lack scalability ac...
In recent years, depression, as a serious mental illness, has received widespread attention from various sectors of society. How to identify depressive emotions in a timely manner and detect depression has become an u...
ISBN (Print): 9783031396977; 9783031396984
Dataflow architectures can achieve much better performance and higher efficiency than general-purpose cores, approaching the performance of a specialized design while retaining programmability. However, dataflow architectures often suffer from low utilization of computational resources when application algorithms are irregular. In this paper, we propose a software-hardware co-design technique that makes both regular and irregular applications efficient on dataflow architectures. First, we dispatch instructions among dataflow graph (DFG) nodes to ensure load balance. Second, we decouple the threads within DFG nodes into consecutive pipeline stages and provide architectural support. By time-multiplexing these stages on each processing element (PE), dataflow hardware can achieve much higher utilization and performance. We show that our method improves performance by a geometric mean of 2.55x (and up to 3.71x) over a conventional dataflow architecture, and by a geometric mean of 1.80x over Plasticine, on a variety of challenging applications.