Search Results - Inner Mongolia University Library

21st IEEE International Symposium on Parallel and Distributed Processing with Applications, 13th IEEE International Conference on Big Data and Cloud Computing, 16th IEEE International Conference on Social Computing and Networking and 13th International Conference on Sustainable Computing and Communications, ISPA/BDCloud/SocialCom/SustainCom 2023

Authors: Luo, Yongtao; Liu, Jie; Xiao, Tiaojie; Gong, Chunye. Affiliations: National University of Defense Technology, Science and Technology on Parallel and Distributed Processing Laboratory, Laboratory of Digitizing Program for Frontier Equipment, Changsha, China; National Supercomputer Center in Tianjin, Tianjin, China

ISBN: (print) 9798350329223

SHA-256 plays an important role in widely used applications, such as data security, data integrity, digital signatures, and cryptocurrencies. However, most of the current optimized implementations of SHA-256 are based on CPUs or dedicated hardware, such as ASICs and FPGAs. Consequently, there is a need to explore whether new heterogeneous parallel frameworks can improve the computational performance of the hash function. To address this issue, we conducted a study on the MT-3000 platform, which is a special architecture processor for the next-generation exascale prototype supercomputer. We proposed MT-SHA256, a heterogeneous multistage parallel implementation for hashing multiple messages on the MT-3000. Combining the architectural features of this processor, we developed an effective solution that significantly improved the computational performance of SHA-256. As a result, MT-SHA256 achieved a maximum throughput of 1045.68 MB/s on a single acceleration core of MT-3000. This is 9.84x higher than the C code implementation on one CPU core of MT-3000. We also performed a scalability test and found that MT-SHA256 achieved a throughput of 98.04 GB/s on a computing node, and extended to 512 nodes (2048 acceleration clusters) on this system with good scalability. © 2023 IEEE.

Keywords: Hash functions

Parallel Implementation of SHA256 on Multizone Heterogeneous Systems

IEEE International Conference on Big Data and Cloud Computing (BdCloud)

Authors: Yongtao Luo; Jie Liu; Tiaojie Xiao; Chunye Gong. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, Laboratory of Digitizing Program for Frontier Equipment, National University of Defense Technology, Changsha, China; National Supercomputer Center in Tianjin, Tianjin, China

A High-Throughput Multi-Cluster NoC Architecture

11th IEEE International Conference on Computational Science and Engineering (CSE 2008)

Authors: Henrique C. Freitas; Philippe O. A. Navaux. Affiliation: Parallel and Distributed Processing Group, Graduate Program in Computer Science, Universidade Federal do Rio Grande do Sul, Brazil

During the last years a large number of research works has focused on problems related to multi-core processors. Due to the possibilities of many cores, the number of opportunities in High Performance Computing (HPC) has grown a lot. In fact, new fields related to HPC and processor architecture increase the future possibilities of a Grid-on-Chip (GoC). The goal of this paper is to show a high-throughput MCNoC (Multi-Cluster Network-on-Chip) as an alternative architecture to support clusters of cores and Grid features. In this new scenario data throughput, flexibility, and scalability are very important. The results verify that MCNoC has a similar area occupation and a better data throughput than a traditional Network-on-Chip.

Evaluating On-Chip Interconnection Architectures for Parallel Processing

IEEE International Conference on Computational Science and Engineering Workshops (CSEWORKSHOPS)

Authors: Henrique Cota de Freitas; Philippe Olivier Alexandre Navaux. Affiliations: Pontifícia Universidade Católica de Minas Gerais, Brazil; Parallel and Distributed Processing Group, Graduate Program Computer Science, Universidade Federal do Rio Grande do Sul, Belo Horizonte, Brazil; Parallel and Distributed Processing Group, Graduate Program Computer Science, Universidade Federal do Rio Grande do Sul, Brazil

For the next processor generation, many cores and parallel programming will provide high-throughput and high-performance processing. As a consequence, research works have studied on-chip interconnection architectures to identify alternatives capable of decreasing the communication latencies. The objective of this paper is to present the evaluation of three well-known architectures (bus, crossbar switch and a conventional network-on-chip) in order to propose a multi-cluster network-on-chip architecture for parallel processing. The results show that a NoC composed of programmable routers and crossbar switches to interconnect clusters of cores has a better performance than conventional NoCs.

Keywords: Magnetic cores; Topology; System-on-a-chip; Computer architecture; Parallel processing; Routing; Delay

Journal of Information Science and Engineering: Editorial notice

Journal of Information Science and Engineering, 2005, Vol. 21, No. 5, pp. i-ii

Authors: Lee, Der-Tsai; Amato, Nancy M.; Chang, Shih-Fu; Chen, Homer H.; Chen, Tsuhan; Hsu, Tsan-Sheng; Hwang, Jenq-Neng; Kuo, Sy-Yen; Kuo, Tei-Wei; Li, Chung-Sheng; Tokuyama, Takeshi; Wing, Jeannette; Wu, Tzong-Chen; Yen, John. Affiliations: Computer Science at Texas A and M University United States Parasol Laboratory United States IEEE Transactions on Parallel and Distributed Systems Computing Research Association's Committee on the Status of Women in Computing Research CRA-W's Distributed Mentor Program United States Department of Electrical Engineering Columbia University United States Lab. United States College of Electrical Engineering and Computer Science National Taiwan University Taiwan IEEE Transactions on Circuits and Systems for Video Technology IEEE Department of Electrical and Computer Engineering Carnegie Mellon University Pittsburgh PA United States ACM ACM SIGACT IEEE Computer Society IICM Austria Research and Development of the Department Multimedia Signal Processing Technical Committee IEEE Signal Processing Society United States IEEE Transactions on Circuits and Systems for Video Technology College of Electrical Engineering and Computer Science National Taiwan Ocean University Keelung Taiwan Department of Electrical Engineering National Taiwan University Taiwan Department of Computer Science and Information Engineering National Taiwan University Taipei Taiwan IEEE Technical Committee on Real-Time Systems Computer Science Division IBM T.J. Watson Research Center United States IBM Research Division Graduate School of Information Sciences Tohoku University Japan ACM IPSJ Mathematical Society of Japan Japan Department of Computer Science Computer Science Department Carnegie Mellon University United States National Academies of Science's Computer Science and Telecommunications Board United States Microsoft's Trustworthy Academic Advisory Board Intel Research Pittsburgh's Advisory Board United States Dartmouth's Institute for Security Technology Studies Advisory Committee Canada Sloan Research Fellowships Program Committee United States ACM Taiwan China Information Sciences and Technology Pennsylvania State University United States Laboratory for Intelligent Agents Penn State's School of Info

No abstract available

Optimum design technique for optoelectronic devices using simulated annealing

ELECTRONICS AND COMMUNICATIONS IN JAPAN PART II-ELECTRONICS, 1996, Vol. 79, No. 1, pp. 22-32

Authors: Hara, K; Iwamoto, T; Kyuma, K. Member, Semiconductor Research Laboratory, Mitsubishi Electric Corporation, Amagasaki, Japan 661. Received his B.S. and M.S. degrees in Electronics Engineering from Osaka University in 1986 and 1988, respectively. He joined Mitsubishi Electric Corporation in 1988. At present he is with the Neural & Parallel Processing Technology Development Center. He is involved in research on optical switches, optical device simulators, artificial retina chip, and digital neuro-chip. He is a member of the Society of Applied Physics. Nonmember. Received his B.S. and M.S. degrees in Physics and Methodology from Kyoto University in 1985 and 1987, respectively. He completed the required course work in the doctoral program of the same university in 1990. He joined Mitsubishi Electric Corporation in 1990. At present he is with the Neural & Parallel Processing Technology Development Center. He is involved in research on information processing using nonlinear motive power and optimization. He is a member of the Japan Society of Physics. Received his B.S. and Ph.D. degrees in Electronics Engineering from Tokyo Institute of Technology in 1972 and 1977, respectively. He joined Mitsubishi Electric Corporation in 1977. At present he is the head of the Neural & Parallel Processing Technology Development Center. From 1985 to 1986 he was an invited scholar at California Institute of Technology in the United States. He is involved in research on neural theory, optical neural networks, artificial retina chip, VLSI neuro-chips, and optical fiber sensors. He is a member of IEEE, OSA, International Neural Network Society, Society of Applied Physics, and Society of Instrumentation and Control.

A new procedure for the optimum design of optoelectronic devices is explained in this paper and an automatic search is made simultaneously for the structure satisfying various demands. The feature of this procedure is in the introduction of cost by which the quantitative evaluation of the structure becomes possible and the global search for the required structure by simulated annealing can be carried out. First, the definition of cost and details of the optimization procedure are clarified. In optimization, in addition to convergence to the minimum point of the cost function (the optimum configuration from theoretical viewpoint), the convergence also is possible in structures with great tolerance to fabrication errors (neighborhood cost (NC) and finite temperature annealing (FTA) methods). Next, these three proposed methods are used in the design of a pnpn differential optical switch and the effectiveness of the methods is verified. The method of cost expression, the relation between annealing parameters, and convergence are investigated. It is shown that cost expression with large degree of freedom improves the search for high-performance structures and the initial temperature of annealing or the fixed temperature of FTA method is the important parameter which sets up the probability of acceptance. Further, it is shown that the convergence cost is inversely proportional to the time spent in annealing. These results are useful guidelines in the optimum design of arbitrary optoelectronic devices.

Keywords: optoelectronic devices cost function device design optimization convergence

AN ANALYSIS OF THE HOT-SPOT CONTENTION AND MESSAGE COMBINING ON THE SIMPLE SERIAL SYNCHRONIZED-MULTISTAGE INTERCONNECTION NETWORK

SYSTEMS AND COMPUTERS IN JAPAN, 1995, Vol. 26, No. 9, pp. 1-12

Authors: GAYE, K; HANAWA, T; AMANO, H. Member, Faculty of Science and Technology, Keio University, Yokohama, Japan 223. Nonmembers. Toshihiro Hanawa received his B.E. degree in 1993 from Dept. Electrical Eng., Fac. Sci. Tech., Keio Univ., where he is presently in the Master's program. He is interested in performance analysis of interconnection networks for parallel computers. Hideharu Amano received his B.E. and Dr. of Eng. degrees in 1981 and 1986, respectively, from Keio Univ. He is engaged in research on parallel computer systems. Presently Assoc. Prof., Dept. Electrical Eng., Fac. Sci. Tech., Keio Univ. He is co-author of Digital Circuit for Everybody and Parallel Processing Mechanism.

The simple serial synchronized (SSS) multistage interconnection network (MIN) is a processor-memory connection network with a high performance/cost ratio, in which packets are input and switched synchronously in the MIN, which has a high pass-through ratio and is composed of simple elements. This paper evaluates the effect of hot spot contention and the effect of synchronous bit-serial (SBS) message combining in SSS-MIN, by theoretical analysis based on probability and by simulation. In contrast to the conventional MIN, complete tree saturation does not arise in SSS-MIN, but an area to which access is difficult is produced according to the relative position to the hot spot contention. From this viewpoint, an analysis method for the pass-through ratio is presented, which considers the position of the switching element relative to the hot spot. The evaluation verifies that the proposed method of analysis gives a result close to that of simulation, so long as the access to the hot spot and the connection network architecture stay within a practical range. It is also seen that the pass-through ratio deteriorates less in SSS-MIN under hot spot contention than in the conventional MIN, and the effect can be almost completely eliminated by SBS message combining. When a multiprocessor system is actually constructed, performance deterioration due to hot spot contention is greater than in the case where only the pass-through ratio is considered. This can also be eliminated almost completely by SBS message combining.

Keywords: MULTISTAGE NETWORK MESSAGE COMBINING, PERFORMANCE EVALUATION MULTIPROCESSOR

A flexible architectural study methodology

International Workshop on Graph Reduction, 1986

Authors: Tighe, Steven; Zink, Ken; Brice, Richard; Alexander, William. Affiliations: Parallel Processing Program, MCC, United States; Database Program, MCC, United States

ISBN: (print) 9783540184201

An efficient emulation/simulation system for evaluating architectures and scheduling strategies for reduction systems is described. Execution traces of example programs are generated by the emulator. The execution method of the emulator exercises all possible parallelism available in the execution model under study. The trace of each program execution is then reduced to an "architecturally neutral" precedence graph. The precedence graph can then be used repeatedly in simulations to study the effects of changes in architecture or scheduling strategy. © 1987, Springer-Verlag.

Keywords: Architecture

Faster Architectural Simulation through Parallelism

Design Automation Conference

Authors: J.W. Smith; K.S. Smith; R.J. Smith. Affiliations: Endot Inc., Cleveland, Ohio; Parallel Processing Program at the Microelectronics Computer Technology Corporation, Austin, TX, USA

Architectural simulation of complex systems is usually constrained by available computational resources. Recently, several commercial parallel processing systems have appeared with price-performance levels that make very intense simulations affordable. In this paper, we briefly review architectural simulation technology, then describe the approach used to develop a parallel architectural simulator. Performance of the parallel simulator is then experimentally characterized and analyzed. This study is one of the earliest to report measured performance of a widely-used commercial simulator, running non-trivial designs on a popular parallel computing system.

Keywords: Computational modeling; Parallel processing; Acceleration; Permission; Analytical models; Trademarks; Operating systems; Proposals; Kernel; Performance analysis

Software or silicon? The designer's option

Proceedings of the IEEE, 1986, Vol. 74, No. 6, pp. 861-874

Author: A.C. Hartmann. Affiliation: Parallel Processing Program, Microelectronics and Computer Technology Corporation, Austin, TX, USA

Traditionally, the bulk of computer system functionality is implemented in the software medium, as a sequence of instructions for a general-purpose processor. Historically, this has provided the best balance of flexibility, cost, and performance. The new economics of VLSI and continuing advances in VLSI CAD capability open the possibility of application-specific functionality embedded in silicon as a matter of routine. This paper presents several case studies of silicon solutions used in typical software areas, including regular language recognition, Ada program unit replacement, dictionary machines, and string pattern matching. Either software or hardware designers may benefit from a study of such architectures, and Organick's notion of heterosystems designers proficient in both domains is supported.

Keywords: Silicon; Very large scale integration; Computer aided instruction; Costs; Design automation; Pattern recognition; Dictionaries; Pattern matching; Hardware; Software design
