ISBN (digital): 9781728143286
ISBN (print): 9781728143293
Class-incremental learning has received wide attention due to its adaptability to the changing characteristics of online learning. However, in data-isolated-island scenarios, data cannot be shared between organizations, and existing solutions cannot adapt to incremental classes without aggregating data. In this paper, we propose a distributed class-incremental learning framework, called DCIGAN. It uses GAN generators to store the information of past data and continuously updates the GAN parameters with new data. In particular, we propose CIGAN to ensure that the distribution of the pseudo data generated on a single node stays as close as possible to that of the real data, which safeguards the accuracy of class-incremental learning. Furthermore, we propose GF, a generator fusion method that integrates the local generators of multiple nodes into a new global generator. To evaluate the performance of DCIGAN, we conduct experiments on six datasets under various parameter settings in both two-node and multi-node distributed scenarios. Extensive experiments confirm that DCIGAN outperforms the general baselines and achieves classification accuracy close to that of data aggregation.
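The core idea of generative replay described above can be sketched in a few lines. The sketch below is illustrative only: `PseudoGenerator` is a toy stand-in for the paper's GAN generator (it memorizes per-class means and samples around them), and `incremental_step` is a hypothetical name showing how replayed pseudo data substitutes for raw old data during an incremental update.

```python
import random

class PseudoGenerator:
    """Toy stand-in for a GAN generator: memorizes a per-class mean
    feature vector and emits noisy samples around it."""
    def __init__(self):
        self.class_means = {}  # label -> mean feature vector

    def fit(self, samples):
        """samples: list of (feature_vector, label) pairs."""
        sums, counts = {}, {}
        for x, y in samples:
            counts[y] = counts.get(y, 0) + 1
            sums[y] = [a + b for a, b in zip(sums.get(y, [0.0] * len(x)), x)]
        for y in sums:
            self.class_means[y] = [v / counts[y] for v in sums[y]]

    def generate(self, n_per_class):
        """Emit pseudo samples: class mean plus small Gaussian noise."""
        out = []
        for y, mean in self.class_means.items():
            for _ in range(n_per_class):
                out.append(([m + random.gauss(0, 0.1) for m in mean], y))
        return out

def incremental_step(generator, new_samples, n_replay=20):
    """One class-incremental update: mix replayed pseudo data for old
    classes with the real data of the new classes, then refit, so no
    raw data from past classes ever needs to be stored or shared."""
    replay = generator.generate(n_replay)
    generator.fit(replay + new_samples)
```

A real GAN would replace the mean-plus-noise generator, but the data flow (generate old-class pseudo data, merge with new-class real data, retrain) is the same.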
ISBN (digital): 9781728143286
ISBN (print): 9781728143293
With the frequent occurrence of traffic accidents, the timely acquisition of traffic images using Vehicular Ad Hoc Network (VANET) technology is of great significance for traffic rescue and evidence preservation. However, the limited bandwidth of VANET restricts the amount of data that can be transmitted. To address this problem, this paper designs an image acquisition algorithm that selects several images from the driving recorders of the vehicles around the accident and transmits them to the traffic management department. Because the selected images have strong temporal and spatial correlation, they can be compressed with compressed sensing to further reduce the data volume; a method for processing images on the vehicle node using compressed sensing technology is therefore proposed. Finally, the effectiveness of the image acquisition algorithm is verified by simulation experiments.
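The frame-selection step can be illustrated with a minimal sketch, assuming each dash-cam frame carries a position and timestamp; the scoring rule and the function name `select_accident_frames` are my own illustration, not the paper's actual algorithm, and the VANET transmission and compressed-sensing stages are separate concerns.

```python
def select_accident_frames(recorders, accident_pos, accident_t,
                           radius=50.0, window=10.0, max_frames=8):
    """Pick dash-cam frames recorded close to an accident in both space
    and time, preferring the closest ones.

    recorders: list of (vehicle_id, frame_id, (x, y), timestamp).
    Returns at most max_frames (vehicle_id, frame_id) pairs."""
    ax, ay = accident_pos
    candidates = []
    for vid, fid, (x, y), t in recorders:
        dist = ((x - ax) ** 2 + (y - ay) ** 2) ** 0.5
        dt = abs(t - accident_t)
        if dist <= radius and dt <= window:
            # rank by a simple combined spatio-temporal score
            candidates.append((dist / radius + dt / window, vid, fid))
    candidates.sort()
    return [(vid, fid) for _, vid, fid in candidates[:max_frames]]
```

Capping the result at `max_frames` is what keeps the transmitted volume within the VANET bandwidth budget; the strong correlation among the surviving frames is what compressed sensing then exploits.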
ISBN (digital): 9781728143286
ISBN (print): 9781728143293
Some enterprises need a distributed stream processing framework to manage and process diverse business services. However, their business is usually user-oriented: the accessed data sources and outputs are heterogeneous and must be configurable, and the user-oriented service mode requires the platform to deploy new algorithms dynamically and provide on-demand services. To our knowledge, no existing system meets all these requirements, hence a Model-Driven distributed Real-time Stream Processing Framework is proposed. By building project and task models, the functional model is separated from the implementation platform. The framework supports multi-source heterogeneous data input and configurable output, on-demand services based on user requests, and dynamic algorithm loading. A service-oriented architecture is adopted to lower the threshold for algorithm development and deployment. Furthermore, the functionality and performance are evaluated with a practical case to verify whether the system meets the needs of practical applications.
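Dynamic algorithm loading of the kind described above is commonly built around a plugin registry, where tasks refer to algorithms by name and new ones can be registered without redeploying the platform. The registry and the names `register`/`run_task` below are a generic sketch of that pattern, not the framework's actual API.

```python
# Minimal plugin-registry sketch: algorithms register under a name,
# and configured tasks look them up by name at run time.
_ALGORITHMS = {}

def register(name):
    """Decorator that makes an algorithm available under `name`."""
    def wrap(fn):
        _ALGORITHMS[name] = fn
        return fn
    return wrap

def run_task(algorithm, records, **params):
    """Dispatch a task's records to whichever algorithm was requested."""
    try:
        fn = _ALGORITHMS[algorithm]
    except KeyError:
        raise ValueError(f"no such algorithm: {algorithm}")
    return [fn(r, **params) for r in records]

@register("scale")
def scale(record, factor=1.0):
    """Example algorithm: multiply each numeric record by a factor."""
    return record * factor
```

In a production framework the registry would typically load plugin modules from configuration (e.g. via `importlib`) rather than decorators in one file, but the lookup-by-name dispatch is the same.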
ISBN (print): 9783319994475; 9783319994468
Predictive modeling is an important part of the monitoring process in cloud computing systems that helps to improve service availability for customers. This paper describes two industrial examples of predictive monitoring models, for database disk space utilization and for Java memory leaks. Practical recommendations are given to improve forecast accuracy, which can also be applied in other similar cases. The results of this work are validated in an open source monitoring system and are deployed in three big international telecommunications companies.
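A disk-utilization forecast of the kind mentioned above often starts from a simple linear trend. The sketch below fits a least-squares line to daily utilization samples and extrapolates the days remaining until the disk fills; it is a minimal illustration of the trend-model idea, not the paper's actual model.

```python
def forecast_days_to_full(utilization, capacity=100.0):
    """Fit a least-squares line to daily disk-utilization samples (in the
    same units as `capacity`) and estimate the days remaining until the
    disk is full. Returns None if utilization is not trending upward."""
    n = len(utilization)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(utilization) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, utilization))
    slope = sxy / sxx                      # growth per day
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None                        # flat or shrinking usage
    days_until_full = (capacity - intercept) / slope - (n - 1)
    return max(days_until_full, 0.0)
```

Real deployments layer seasonality handling and outlier filtering on top of such a baseline, which is exactly where forecast-accuracy recommendations like the paper's come into play.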
ISBN (print): 9781538683842
The dense Tucker decomposition method is one of the most popular algorithms for analyzing and compressing data with multi-way relationships. Its execution time is typically dominated by dense matrix multiplication operations, which makes it well-suited for GPU acceleration. State-of-the-art distributed dense Tucker implementations for CPU clusters adopt multi-dimensional partitioning that optimizes for storage and communication. This, however, leads to smaller matrix dimensions that under-utilize the GPU resources. In this paper, we present our optimized implementation and performance analysis of dense Tucker decomposition on a multi-GPU cluster. We propose three key optimizations: a new partitioning strategy that improves performance for GPUs, a new tensor matricization layout that halves the number of communication and matricization steps, and a variation of the randomized SVD algorithm to overcome the eigenvalue calculation bottleneck that arises from the high speedup gained from GPU acceleration. When compared to the state-of-the-art TuckerMPI library, our best GPU implementation, which employs all three optimizations described above, achieves up to 11.8x speedup on 64 nodes. Our best CPU implementation, which also employs all three optimizations, achieves up to 3.6x speedup over TuckerMPI on 64 nodes. When we compare our best GPU implementation to our best CPU implementation, the speedup ranges from 2.1x to 3.6x on a single node, and from 1.8x to 3.3x on 64 nodes, depending on the input data set.
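The matricization (mode-n unfolding) step that the layout optimization above targets can be shown concretely. The sketch below is the generic textbook unfolding of a dense tensor stored as a flat row-major list, not the paper's optimized GPU layout; the column ordering convention is one of several valid choices.

```python
def matricize(tensor, dims, mode):
    """Mode-n unfolding of a dense tensor stored as a flat row-major
    list: rows index the chosen mode, columns enumerate the remaining
    modes. Returns a list of rows."""
    n_rows = dims[mode]
    n_cols = 1
    for d, size in enumerate(dims):
        if d != mode:
            n_cols *= size
    # strides of the row-major flat layout
    strides = [1] * len(dims)
    for d in range(len(dims) - 2, -1, -1):
        strides[d] = strides[d + 1] * dims[d + 1]
    other = [d for d in range(len(dims)) if d != mode]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # decode the column index into the other modes' coordinates
            idx, rem = r * strides[mode], c
            for d in reversed(other):
                idx += (rem % dims[d]) * strides[d]
                rem //= dims[d]
            out[r][c] = tensor[idx]
    return out
```

Each mode-n unfolding rearranges every element of the tensor, which is why reducing the number of matricization (and associated communication) steps pays off at scale.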
ISBN (digital): 9783319751788
ISBN (print): 9783319751788; 9783319751771
The paradigm of ultra-scale computing has recently been pushed forward by current trends in distributed computing. This novel architectural concept targets a federation of multiple geographically distributed heterogeneous systems under a single system image, thus allowing efficient deployment and management of applications on very complex architectures. To enable sustainable ultra-scale computing, multiple major challenges have to be tackled, such as improved data distribution, increased system scalability, enhanced fault tolerance, elastic resource management, and low-latency communication. Regrettably, current research initiatives in the area of ultra-scale computing are at a very early stage and are predominantly concentrated on the management of computational and storage resources, leaving the networking aspects unexplored. In this paper, we introduce a promising new paradigm for cluster-based multi-objective service-oriented network provisioning in ultra-scale computing environments by unifying the management of the local communication resources and the external inter-domain network services under a single point of view. We explore the potential of representing the local network resources within a single distributed or parallel system and combining them with the external communication services.
ISBN (print): 9781538638392
The proceedings contain 112 papers. The topics discussed include: superposition coding in alternate DF relaying systems with inter-relay interference cancellation;transmit antenna selection in OFDM relay system as a solution for energy efficiency improvement;reliable multipath multi-channel route migration over multi link-failure in wireless ad hoc networks;WebShawn, simulating wireless sensors networks from the web;a distributed data management scheme for industrial IoT environments;work in progress: compilation of environmental data through portable low-cost sensors with delay-tolerant mobile ad-hoc networks;and optimal aggregated ConvergeCast scheduling with an SINR interference model.
ISBN (print): 9781538650356
Deep learning techniques have revolutionized many areas including computer vision and speech recognition. While such networks require tremendous amounts of data, the requirement for and connection to Big Data storage systems is often undervalued and not well understood. In this paper, we explore the relationship between Big Data storage, networking, and Deep Learning workloads to understand key factors for designing Big Data/Deep Learning integrated solutions. We find that storage and networking bandwidths are the main parameters determining Deep Learning training performance. Local data caching can provide a performance boost and eliminate repeated network transfers, but it is mainly limited to smaller datasets that fit into memory. On the other hand, local disk caching is an intriguing option that is overlooked in current state-of-the-art systems. Finally, we distill our work into guidelines for designing Big Data/Deep Learning solutions.
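The local disk caching pattern highlighted above is straightforward to sketch: the first epoch pays the remote fetch, and subsequent epochs read shards from local disk. `DiskShardCache` and `fetch_fn` below are illustrative names, with `fetch_fn` standing in for a real storage client (e.g. an HDFS or object-store read).

```python
import hashlib
import os

class DiskShardCache:
    """Local disk cache for remote dataset shards: a cache miss fetches
    the shard over the network and persists it; later reads hit disk."""
    def __init__(self, cache_dir, fetch_fn):
        self.cache_dir = cache_dir
        self.fetch_fn = fetch_fn  # shard_name -> bytes (remote read)
        self.fetches = 0          # counts network fetches, for visibility
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, shard):
        # hash the shard name so arbitrary names map to safe filenames
        return os.path.join(self.cache_dir,
                            hashlib.sha256(shard.encode()).hexdigest())

    def read(self, shard):
        path = self._path(shard)
        if not os.path.exists(path):
            data = self.fetch_fn(shard)   # remote fetch on cache miss
            self.fetches += 1
            with open(path, "wb") as f:
                f.write(data)
        with open(path, "rb") as f:
            return f.read()
```

Unlike an in-memory cache, the working set here is bounded by local disk rather than RAM, which is what makes disk caching attractive for datasets too large to fit in memory.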
ISBN (print): 9781450351140
The proceedings contain 64 papers. The topics discussed include: redesigning CAM-SE for peta-scale climate modeling performance and ultra-high resolution on sunway TaihuLight;18.9-Pflops nonlinear earthquake simulation on sunway TaihuLight: enabling depiction of 18-hz and 8-meter scenarios;massively parallel 3D image reconstruction;LocoFS: a loosely-coupled metadata service for distributed file systems;tagit: an integrated indexing and search service for file systems;a configurable rule based classful token bucket filter network request scheduler for the Lustre file system;and understanding error propagation in deep learning neural network (DNN) accelerators and applications.
ISBN (print): 9781538693926
As data centre traffic dynamics are changing, optical networking is becoming increasingly important for low-latency, high-bandwidth intra-data centre communication. Nanosecond-reconfigurable, scalable photonic switch fabrics and advances in photonic integration are key enablers for optical packet switching. However, the control plane, and in particular the switch scheduler, is believed to be a critical factor for packet latency and scalability. To that end, we report a low-latency scheduler for Clos-network switch fabrics based on a fixed path assignment scheme and parallel, distributed path arbitration. Cycle-accurate network emulation results show nanosecond average latency at input port loads up to 60% of capacity for a 256x256 switch size. In comparison to previous work, the scheduler can now control a switch 8 times the size at double the input port saturation load for the same average latency. Scaling the switch from 16 to 256 ports shows only a small drop in saturation load, from 70% to 60%. Fairness on a per-flow basis is also demonstrated.
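The benefit of fixed path assignment can be illustrated with a toy three-stage Clos sketch: when each (input, output) pair maps deterministically to a middle-stage switch, no central path search is needed and arbiters can decide in parallel. The hash rule and the single output-port-conflict check below are a deliberate simplification of what a real scheduler would arbitrate, not the paper's exact scheme.

```python
def fixed_path(inp, out, n_middle):
    """Deterministically assign a middle-stage switch to an
    (input, output) pair, so no path search is required at run time."""
    return (inp + out) % n_middle

def arbitrate(requests, n_middle):
    """Grant at most one request per output port per time slot. Because
    the path is fixed, each output's arbiter can decide independently,
    which is what enables parallel, distributed arbitration.

    requests: list of (input_port, output_port).
    Returns granted (input_port, output_port, middle_switch) triples."""
    granted, out_busy = [], set()
    for inp, out in requests:
        if out not in out_busy:
            out_busy.add(out)
            granted.append((inp, out, fixed_path(inp, out, n_middle)))
    return granted
```

A production scheduler would also arbitrate contention on the first-stage-to-middle links and apply a fairness policy across flows, but the key property (constant-time path lookup feeding independent per-port arbiters) is visible even in this reduced form.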