Solving tridiagonal linear systems is required in many fields and plays a crucial role in numerical simulations. However, there are few efficient solvers for tridiagonal linear systems on the new Sunway supercomputer. Based on a three-dimensional heat conduction problem, we propose an adaptive heterogeneous tridiagonal matrix algorithm (AH-TDMA). Our major innovations are: (1) A multi-level parallel approach combining MPI and Athread addresses the computational hotspots within AH-TDMA. (2) An adaptive data partitioning scheme achieves load balance. (3) Direct memory access and a shared space between registers and main memory replace discrete memory access to improve memory access efficiency. (4) The loop structure is optimized by reordering the two nested loops to reduce communication overhead. The experimental results show that, with one core group, AH-TDMA speeds up the hotspot by 99.6 times and the total run time by up to 59.4 times compared to the Parallel and Scalable Library for Tridiagonal Matrix Algorithm (PaScal TDMA). AH-TDMA scales up to 2048 core groups, with a parallel efficiency of 69.2%.
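For readers unfamiliar with the underlying problem, the sketch below shows the classic sequential Thomas algorithm, the textbook form of a tridiagonal matrix algorithm. It is not the AH-TDMA method from the abstract and includes none of its MPI, Athread, or memory optimizations. The function name `thomas_solve` and the argument layout are assumptions made for illustration only.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the sequential Thomas algorithm.

    Illustrative sketch only (hypothetical helper, not from the paper).
    a: sub-diagonal   (a[0] is unused)
    b: main diagonal
    c: super-diagonal (c[-1] is unused)
    d: right-hand side
    All four lists have length n. No pivoting is performed, so the
    matrix is assumed to be diagonally dominant or otherwise safe.
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution: recover the unknowns from the last row upward.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Usage: a 4x4 system with the (-1, 2, -1) stencil that arises from
# discretizing one-dimensional heat conduction.
a = [0.0, -1.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [-1.0, -1.0, -1.0, 0.0]
d = [0.0, 0.0, 0.0, 5.0]
solution = thomas_solve(a, b, c, d)
```

Both sweeps carry a dependency from one row to the next, so a single system cannot be parallelized row by row. Parallel solvers therefore typically distribute many independent systems, or partition one system and couple the parts through a reduced system.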
Matrix computing plays a vital role in many scientific and engineering applications, but previous work can only handle the data with specified precision based on FPGA. This study first presents algorithms, data flows,...
Personalized Federated Learning (PFL) aims to acquire customized models for each client without disclosing raw data by leveraging the collective knowledge of distributed clients. However, the data collected in real-wo...
Frontotemporal Dementia (FTD) diagnosis has progressed successfully using deep learning techniques. However, current FTD identification methods suffer from two limitations. Firstly, they do not exploit the potentia...
Recently, the Segment Anything Model has taken a significant step towards general artificial intelligence. Simultaneously, its reliability and fairness have garnered significant attention, particularly in the field of ...
How to learn concepts from few-shot samples remains an open challenge in the deep learning era. Previous meta-learning methods require a large number of annotated samples in the training phase, which still contrib...
With the development of information technology and the ubiquity of mobile devices, increasing amounts of data are generated, processed, and transmitted by mobile devices. To alleviate the tension between the energy po...
With the development of computer technology, statistics-based machine learning methods have made great breakthroughs and have also advanced the development of artificial intelligence. Nevertheless, as a very influential mo...
ISBN: (print) 9798331314385
Due to their powerful representational capabilities, Transformers have gradually become the mainstream model in machine vision. However, the vast and complex parameters of Transformers impede researchers from gaining a deep understanding of their internal mechanisms, especially their error mechanisms. Existing methods for interpreting Transformers mainly focus on the importance of input tokens or internal modules, or on the formation and meaning of features. In contrast, inspired by research on information integration mechanisms and conjunctive errors in the biological visual system, this paper conducts an in-depth exploration of the internal error mechanisms of Transformers. We first propose an information integration hypothesis for Transformers in the machine vision domain and provide substantial experimental evidence to support it. This evidence covers the dynamic integration of information among tokens, the static integration of information within tokens, and the presence of conjunctive errors in both. To address these errors, we further propose heuristic dynamic integration constraint methods and rule-based static integration constraint methods that rectify errors and ultimately improve model performance. The entire framework is termed Transformer Doctor and is designed for diagnosing and treating internal errors within Transformers. Extensive quantitative and qualitative experiments demonstrate that Transformer Doctor effectively addresses internal errors in Transformers, thereby improving model performance. For more information, please visit https://***/.
Neural networks (NNs) are increasingly applied in safety-critical systems such as autonomous vehicles. However, they are fragile and are often ill-behaved. Consequently, their behaviors should undergo rigorous guarant...