The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the supercomputer as a whole, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of a detailed and flexible cycle-accurate network simulator. In addition, it offers a large set of configurable network parameters for topology and routing, and supports on/off states for routers. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem, and the mean error is only 2.7%. We also simulate network behavior under different network structures and low-power modes. The results are consistent with the theoretical analyses.
Communication and coordination between OSS developers who do not work physically in the same location have always been the challenging *** pull-based development model, as the state-of-the-art collaborative development mechanism, provides high openness and transparency to improve the visibility of contributors' ***, duplicate contributions may still be submitted by more than one contributor to solve the same problem due to the parallel and uncoordinated nature of this *** not detected in time, duplicate pull-requests can cause contributors and reviewers to waste time and energy on redundant *** this paper, we propose an approach combining textual and change similarities to automatically detect duplicate contributions in the pull-based model at submission *** a new-arriving contribution, we first compute the textual similarity and change similarity between it and other existing *** then our method returns a list of candidate duplicate contributions that are most similar to the new contribution in terms of the combined textual and change *** evaluation shows that 83.4% of the duplicates can be found on average when we use the combined textual and change similarity, compared to 54.8% using only textual similarity and 78.2% using only change similarity.
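The ranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measures or weights the authors use, so everything below is an assumption made for illustration: Jaccard overlap of word tokens stands in for textual similarity, Jaccard overlap of changed file paths stands in for change similarity, and the two are mixed with a hypothetical `text_weight`. The names `PullRequest`, `combined_similarity`, and `candidate_duplicates` are likewise hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical record: title/description text plus touched file paths."""
    number: int
    text: str
    changed_files: frozenset


def tokens(text):
    # Lowercased alphanumeric word tokens (assumed tokenization).
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    # Jaccard overlap of two sets; defined as 0.0 when both are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(new, old, text_weight=0.5):
    # Weighted mix of textual and change similarity (assumed combination).
    textual = jaccard(tokens(new.text), tokens(old.text))
    change = jaccard(set(new.changed_files), set(old.changed_files))
    return text_weight * textual + (1.0 - text_weight) * change


def candidate_duplicates(new, existing, top_k=3):
    # Rank existing pull requests by combined similarity, highest first;
    # ties are broken by the lower pull-request number.
    scored = [(combined_similarity(new, pr), pr.number) for pr in existing]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [number for _, number in scored[:top_k]]
```

Calling `candidate_duplicates(new_pr, existing_prs, top_k=5)` when a pull request is submitted would give a reviewer a short list to check by hand. A real system would likely use richer text and change features than these set overlaps.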
Transformer-based methods have demonstrated remarkable performance on image super-resolution tasks. Due to high computational complexity, researchers have been working to achieve a balance between computation costs an...
In this paper, we propose an approach to assess the ability of developers based on their behavior data from OSS. Specifically, we classify developers' ability into code ability, project management ability, and soc...
Multidimensional parallel training has been widely applied to train large-scale deep learning models like GPT-3. The efficiency of parameter communication among training devices/processes is often the performance bott...
The self-attention mechanism is the core component of Transformer, which provides a powerful ability to understand the sequence context. However, the self-attention mechanism also suffers from a large amount of redund...
Symbolic execution is an effective path-oriented and constraint-based program analysis technique. Recently, there has been significant development in the research and application of symbolic execution. However, symbolic e...
In this paper, we present the Tianhe-2 interconnect network and message passing services. We describe the architecture of the router and network interface chips, and highlight a set of hardware and software features that effectively support high performance communications, including remote direct memory access, collective optimization, hardware-enabled reliable end-to-end communication, and user-level message passing services. Measured hardware performance results are also presented.
Dear editor, Docker1), as a de facto industry standard [1], enables the packaging of an application with all its dependencies and execution environment in a lightweight, self-contained unit, i.e., *** launching the container from a Docker image, developers can easily share the same operating system, libraries, and binaries [2]. As the configuration file, the dockerfile plays an important role,
Large models have achieved impressive performance in many downstream tasks. Using pipeline parallelism to fine-tune large models on commodity GPU servers is an important way to make the excellent performance of large ...