As deep learning grows rapidly, model training increasingly relies on parallel methods, and there are numerous possible cluster configurations. However, current work on parallel training focuses on data centers, overlooking the financial constraints faced by most researchers. To attain the best performance within a cost limit, we introduce a throughput-cost metric that accurately characterizes a cluster's cost-effectiveness. Based on this metric, we design a cost-effective cluster featuring the 3090 with NVLink. Experimental results demonstrate that our cluster achieves remarkable cost-effectiveness across various distributed model training schemes.
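The abstract does not spell the metric out, but a minimal sketch of a throughput-cost style comparison, assuming cost-effectiveness is simply training throughput divided by hardware price, could look like the following Python; the node names, throughput figures, and prices are placeholders, not measurements from the paper.

# Hypothetical sketch of a throughput-cost style metric; the exact definition
# used in the paper is not given here, so this is only an illustration.

def throughput_cost_ratio(samples_per_second: float, cluster_price_usd: float) -> float:
    """Return training throughput per dollar of cluster hardware cost."""
    return samples_per_second / cluster_price_usd

# Compare two hypothetical nodes (all numbers are illustrative placeholders).
nodes = {
    "4x3090-nvlink": {"throughput": 520.0, "price": 8000.0},
    "4xA100":        {"throughput": 1100.0, "price": 60000.0},
}
for name, spec in nodes.items():
    print(name, throughput_cost_ratio(spec["throughput"], spec["price"]))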
Encryption technology has become an important mechanism for securing data stored in outsourced databases. However, querying encrypted data efficiently is difficult, and many researchers take it into conside...
Multidimensional parallel training has been widely applied to train large-scale deep learning models such as GPT-3. The efficiency of parameter communication among training devices/processes is often the performance bottleneck of large model training. Analyzing parameter communication patterns and traffic provides an important reference for interconnection network design and computing-task scheduling aimed at improving training performance. In this paper, we analyze the parameter communication patterns in typical 3D parallel training (data parallelism, pipeline parallelism, and tensor parallelism) and model the traffic of each communication pattern. Finally, taking GPT-3 as an example, we present the communication involved in its 3D parallel training.
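As a rough illustration of such traffic modeling, the Python sketch below uses standard ring-all-reduce and point-to-point volume estimates for the three parallel dimensions; these textbook formulas are an assumption and may differ from the models derived in the paper, and the GPT-3-scale numbers are placeholders.

# Illustrative per-step traffic estimates for the three parallel dimensions.
# The formulas are generic ring-all-reduce / point-to-point estimates, not the
# paper's own models, and the example numbers are placeholders only.

def data_parallel_traffic(grad_bytes: float, dp_size: int) -> float:
    # Ring all-reduce of gradients: each rank moves ~2*(N-1)/N of the gradient volume.
    return 2.0 * (dp_size - 1) / dp_size * grad_bytes

def pipeline_parallel_traffic(activation_bytes: float, micro_batches: int) -> float:
    # Activations (forward) and their gradients (backward) cross each stage boundary once per micro-batch.
    return 2.0 * activation_bytes * micro_batches

def tensor_parallel_traffic(activation_bytes: float, tp_size: int,
                            allreduces_per_layer: int, layers: int) -> float:
    # Megatron-style tensor parallelism all-reduces activations several times per layer.
    return 2.0 * (tp_size - 1) / tp_size * activation_bytes * allreduces_per_layer * layers

# Placeholder GPT-3-scale inputs (fp16 gradients of ~175B parameters, etc.).
print(data_parallel_traffic(grad_bytes=350e9, dp_size=8) / 1e9, "GB per step (DP)")
print(pipeline_parallel_traffic(activation_bytes=0.5e9, micro_batches=16) / 1e9, "GB per step (PP boundary)")
print(tensor_parallel_traffic(activation_bytes=0.5e9, tp_size=8, allreduces_per_layer=4, layers=96) / 1e9, "GB per step (TP)")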
In this paper, we introduce a generic model for the event matching problem of content-based publish/subscribe systems over structured P2P overlays. In this model, we claim that there are three methods (event...
The deep neural named entity recognition model automatically learns and extracts the features of entities and solves the problem of the traditional model relying heavily on complex feature engineering and obscure prof...
Neural Radiance Field (NeRF) has received widespread attention for its photo-realistic novel view synthesis quality. Current methods mainly represent the scene by point sampling along cast rays, ignoring how the observed area changes with distance. In addition, current sampling strategies focus on the distribution of sample points along a ray, without paying attention to how rays themselves are sampled. We find that the current ray sampling strategy severely reduces convergence speed for scenes in which the camera moves forward. In this work, we extend the point representation to an area representation using relative positional encoding, and propose a ray sampling strategy suited to forward-moving camera trajectories. We validate the effectiveness of our method on multiple public datasets.
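For context, the sketch below shows the standard NeRF frequency positional encoding of a sampled 3D point, i.e. the point-based baseline that the abstract extends to an area representation with relative positional encoding; it does not reproduce the paper's encoding, and the frequency count is an assumption.

import numpy as np

# Standard NeRF-style frequency positional encoding of a sampled 3D point.
# This is only the point-based baseline; the paper's area representation with
# relative positional encoding is not reproduced here.

def positional_encoding(x: np.ndarray, num_freqs: int = 10) -> np.ndarray:
    """Map each coordinate to [sin(2^k * pi * x), cos(2^k * pi * x)] for k < num_freqs."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi
    scaled = x[..., None] * freqs                 # (..., 3, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)         # (..., 3 * 2 * num_freqs)

point = np.array([0.1, -0.4, 0.7])
print(positional_encoding(point).shape)           # (60,)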
Constant-degree peer-to-peer (P2P) systems are becoming a promising hotspot in the P2P domain because constant-degree digraphs have good properties. However, it is often hard to convert a standard constant-degree digraph into a DHT scheme. Thus, most research focuses on DHT construction and maintenance, leaving optimization and support for complex queries behind. The underlying topology strongly affects the characteristics of the upper layers. For constant-degree P2P topologies, their inherent properties make a system built with classical techniques poor in data locality and unfit for efficient, low-cost complex queries. To address this shortcoming, a general-purpose construction technique oriented towards efficient complex queries is proposed, which adds an embedding transformation layer between the data layer and the DHT overlay. In this way, adjacent data are stored on adjacent peers in the overlay and data locality is improved, so the number of peers involved in complex queries can be minimized with limited time overhead. To validate this technique, FissionE, the first constant-degree P2P system based on the Kautz digraph, is reconstructed as a typical example, covering resource re-allocation, the query algorithm, and locality maintenance strategies. Experimental results show that this construction technique ensures data locality, reduces query cost, and improves system efficiency without changing the underlying DHT layer.
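A hedged sketch of the general idea behind such an embedding transformation layer follows: keys pass through an order-preserving mapping onto the overlay identifier space instead of a uniform hash, so adjacent keys land on adjacent peers. The ring-style mapping, ID-space size, and helper names are illustrative assumptions; the actual FissionE construction maps keys onto Kautz strings and is not reproduced here.

# Conceptual sketch of an "embedding transformation layer": keys are mapped to
# overlay identifiers through an order-preserving function instead of a uniform
# hash, so adjacent data land on adjacent peers. This simplified ring-style
# mapping only illustrates the locality idea, not the Kautz-string mapping.

from bisect import bisect_left

ID_SPACE = 2 ** 16

def locality_preserving_id(key: float, key_min: float, key_max: float) -> int:
    """Order-preserving map from a numeric key range onto the overlay ID space."""
    frac = (key - key_min) / (key_max - key_min)
    return min(int(frac * ID_SPACE), ID_SPACE - 1)

def responsible_peer(data_id: int, peer_ids: list) -> int:
    """Assign the ID to the first peer whose ID is >= data_id (wrapping around)."""
    idx = bisect_left(peer_ids, data_id) % len(peer_ids)
    return peer_ids[idx]

peers = sorted([1000, 20000, 40000, 60000])
for key in (10.0, 10.5, 11.0, 90.0):              # adjacent keys map to nearby peers
    did = locality_preserving_id(key, 0.0, 100.0)
    print(key, did, responsible_peer(did, peers))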
Large models have achieved impressive performance in many downstream tasks. Using pipeline parallelism to fine-tune large models on commodity GPU servers is an important way to make the excellent performance of large models available to the general public. Previous solutions fail to achieve efficient, memory-balanced pipeline parallelism. In this poster, we introduce a memory load-balanced pipeline parallel solution. It balances memory consumption across stages on commodity GPU servers via NVLink bridges, establishing a new pathway that offloads data from GPU to CPU through the PCIe link of the adjacent GPU connected by the NVLink bridge. Furthermore, our method orchestrates offload operations to minimize offload latency during large model fine-tuning. Experiments demonstrate that our solution balances the memory footprint among pipeline stages without sacrificing training performance.
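A minimal PyTorch-style sketch of the offload pathway described above follows, assuming two GPUs where cuda:1 is the NVLink-bridged neighbor of cuda:0; the function name, device indices, and stream handling are illustrative, and the real solution additionally overlaps these copies with the pipeline schedule.

import torch

# Sketch of the offload pathway: a tensor on one stage's GPU is first copied
# over NVLink to its bridged neighbor GPU, then staged out to pinned CPU memory
# over the neighbor's PCIe link. Device indices and stream use are illustrative.

def offload_via_neighbor(t: torch.Tensor, neighbor_device: str = "cuda:1") -> torch.Tensor:
    """Copy a GPU tensor to pinned CPU memory by routing through a neighbor GPU."""
    stream = torch.cuda.Stream(device=neighbor_device)
    with torch.cuda.stream(stream):
        staged = t.to(neighbor_device, non_blocking=True)      # NVLink hop
        host = torch.empty(staged.shape, dtype=staged.dtype,
                           device="cpu", pin_memory=True)
        host.copy_(staged, non_blocking=True)                   # neighbor's PCIe hop
    stream.synchronize()
    return host

if torch.cuda.device_count() >= 2:
    act = torch.randn(1024, 1024, device="cuda:0")
    cpu_copy = offload_via_neighbor(act)
    print(cpu_copy.is_pinned(), cpu_copy.shape)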
Data distribution is a key technology for resource convergence and sharing in distributed environments. To better meet the requirements of real-time data distribution in dynamic networks, a trace routing algorithm ...
Decision trees are well-known machine learning classifiers that have been widely used in many areas, such as healthcare, text classification, and remote diagnostics. Service providers usually host a decision tree...