The self-attention mechanism is the core component of the Transformer, giving it a powerful ability to model sequence context. However, the self-attention mechanism also suffers from a large amount of redund...
Dear editor, Docker, as a de facto industry standard [1], enables the packaging of an application with all its dependencies and execution environment in a lightweight, self-contained unit, i.e., a container. By launching the container from a Docker image, developers can easily share the same operating system, libraries, and binaries [2]. As the configuration file, the Dockerfile plays an important role,
Large models have achieved impressive performance in many downstream tasks. Using pipeline parallelism to fine-tune large models on commodity GPU servers is an important way to make the excellent performance of large ...
As deep learning grows rapidly, model training heavily relies on parallel methods and there exist numerous cluster configurations. However, current preferences for parallel training focus on data centers, overlooking ...
Neural Radiance Field (NeRF) has received widespread attention for its photo-realistic novel view synthesis quality. Current methods mainly represent the scene based on point sampling of ray casting, ignoring the infl...
Time series data are pervasive in varied real-world applications, and accurately identifying anomalies in time series is of great importance. Many current methods are insufficient to model long-term dependence, wherea...
In this paper, we present the Tianhe-2 interconnect network and message passing services. We describe the architecture of the router and network interface chips, and highlight a set of hardware and software features that effectively support high-performance communication, including remote direct memory access, collective optimization, hardware-enabled reliable end-to-end communication, and user-level message passing services. Measured hardware performance results are also presented.
Large-scale models have demonstrated outstanding performance across various downstream tasks. Pipeline parallelism is essential for fine-tuning large models on commodity GPU servers, as it plays a crucial role in maki...
We consider the maximal vector problem on uncertain data, which has been recently posed by the study on processing skyline queries over a probabilistic data stream in the database context. Let $D_n$ be a set of $n$ points in a $d$-dimensional space and $q$ ($0 < q \le 1$) be a probability threshold; each point in $D_n$ has a probability to occur. Our problem is concerned with how to estimate the expected size of the probabilistic skyline, which consists of all the points that are not dominated by any other point in $D_n$ with a probability not less than $q$. We prove that the upper bound of the expected size is $O(\min\{n, (-\ln q)(\ln n)^{d-1}\})$ under the assumptions that the value distribution on each dimension is independent and the values of the points along each dimension are distinct. The main idea of our proof is to find a recurrence about the expected size and solve it. Our results reveal the relationship between the probability threshold $q$ and the expected size of the probabilistic skyline, and show that the upper bound is poly-logarithmic when $q$ is not extremely small.
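To make the definition of the probabilistic skyline concrete, the following is a minimal Python sketch, not the paper's method. It assumes that points occur independently, that smaller values are better in every dimension, and that a point's skyline probability is its own occurrence probability times the probability that none of its dominators occurs. The function names (`dominates`, `skyline_probability`, `probabilistic_skyline`) and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

# A point is (coordinates, occurrence probability). Assumed representation.
Point = Tuple[Sequence[float], float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b: no worse in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(i: int, points: List[Point]) -> float:
    """Probability that point i occurs and no point dominating it occurs,
    assuming all points occur independently."""
    coords_i, p_i = points[i]
    prob = p_i
    for j, (coords_j, p_j) in enumerate(points):
        if j != i and dominates(coords_j, coords_i):
            prob *= 1.0 - p_j
    return prob


def probabilistic_skyline(points: List[Point], q: float) -> List[int]:
    """Indices of points whose skyline probability is at least q."""
    return [i for i in range(len(points)) if skyline_probability(i, points) >= q]


# Hypothetical 2-dimensional data set with occurrence probabilities.
sample = [((1, 4), 0.5), ((2, 2), 0.8), ((3, 3), 0.9), ((4, 1), 1.0)]
print(probabilistic_skyline(sample, 0.5))  # [0, 1, 3]
```

In this sample, the point (3, 3) is dominated only by (2, 2), so its skyline probability is 0.9 × (1 − 0.8) = 0.18. It drops out of the skyline at q = 0.5 but is included once q is lowered to 0.1, which illustrates how the skyline grows as the threshold shrinks.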