ISBN: 9781728170022 (print)
Stream clustering is an important data mining technique for capturing the evolving patterns in real-time data streams. Today's data streams, e.g., IoT events and Web clicks, are usually high-speed and contain dynamically changing patterns. Existing stream clustering algorithms usually follow an online-offline paradigm with a one-record-at-a-time update model, which was designed to run on a single machine. With this sequential update model, these algorithms cannot be efficiently parallelized and fail to deliver the high throughput that stream clustering requires. In this paper, we present DistStream, a distributed framework that can effectively scale out online-offline stream clustering algorithms. To parallelize these algorithms for high throughput, we develop a mini-batch update model with efficient parallelization approaches. To maintain high clustering quality, DistStream's mini-batch update model preserves the update order in all computation steps during parallel execution, so that the model reflects the most recent changes in dynamically changing streaming data. We implement DistStream atop Spark Streaming, together with four representative stream clustering algorithms based on DistStream. Our evaluation on three real-world datasets shows that DistStream-based stream clustering algorithms achieve sublinear throughput gains and clustering quality comparable (99%) to that of their single-machine counterparts.
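The idea of an order-preserving mini-batch update can be illustrated with a minimal sketch. This is not DistStream's implementation: the one-dimensional records, the `MicroCluster` fields, the exponential `decay` rule, and the function name `process_mini_batch` are all assumptions made for illustration. The sketch assigns every record of a batch against the model as it stood at the start of the batch, then replays each micro-cluster's records in timestamp order, so the result does not depend on the order in which records arrive within the batch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MicroCluster:
    # Hypothetical 1-D micro-cluster summary (illustrative only).
    center: float
    weight: float
    last_update: float


def process_mini_batch(clusters, batch, decay=0.5):
    """Update `clusters` in place from `batch`, a list of (timestamp, value)."""
    # Step 1: assign each record to its nearest micro-cluster. Every record
    # sees the same model (no cluster is modified yet), so this step could
    # run in parallel across records.
    groups = defaultdict(list)
    for ts, x in batch:
        nearest = min(range(len(clusters)),
                      key=lambda i: abs(clusters[i].center - x))
        groups[nearest].append((ts, x))

    # Step 2: update each micro-cluster independently (parallel across
    # clusters), replaying its records in timestamp order so that newer
    # records outweigh older, faded ones.
    for idx, records in groups.items():
        mc = clusters[idx]
        for ts, x in sorted(records):
            fade = 2 ** (-decay * (ts - mc.last_update))
            faded_weight = mc.weight * fade
            mc.center = (mc.center * faded_weight + x) / (faded_weight + 1)
            mc.weight = faded_weight + 1
            mc.last_update = ts
    return clusters
```

Because step 2 sorts by timestamp, shuffling the records inside a batch leaves the updated model unchanged, which is the property the sketch is meant to show.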
Heterogeneous multiprocessor platforms are becoming widespread in the embedded system domain, mainly for the opportunity to improve timing performance and to minimize energy/power consumption and costs. Therefore, whe...
ISBN: 9783030401306 (print)
The proceedings contain 9 papers. The special focus in this conference is on Distributed Computing for Emerging Smart Networks. The topics include: A Comparative Study of Vehicle Detection Methods in a Video Sequence; Energy Efficient Handshake Algorithm for Wireless Sensor Networks; Inter-slice Mobility Management in the Context of SDN/NFV Networks; On a New Quantization Algorithm for Secondary User Scheduling in 5G Network; An Efficient Fault-Tolerant Scheduling Approach with Energy Minimization for Hard Real-Time Embedded Systems; Using Dynamic Bayesian Networks to Solve Road Traffic Congestion in the Sfax City; Energy Efficient Target Coverage in Wireless Sensor Networks Using Adaptive Learning.
Authors: Junhong Sun, Tianhao Li, Tongtong Guo, Yongfu Li, Changyun Fu, Yan Liu
Department of Micro-Nano Electronics, Shanghai Jiao Tong University, China
ISBN: 9781665484855 (digital)
ISBN: 9781665484862 (print)
Brain-machine interface systems will require recording thousands of neural channels in parallel to acquire large-scale neuronal activity. High-bandwidth action potential signals will overload the data communication bandwidth. On-site spike sorting can extract the essential information, but it requires extensive computational resources to achieve high classification accuracy, and this resource demand is especially severe in large-scale real-time sorting systems. In this work, a customized unsupervised training engine combined with distributed and optimized sorting channels is presented in order to reduce the hardware complexity without compromising the accuracy of spike sorting. A mixed-domain feature set is extracted in each channel, followed by feature-based sorting. Each channel constantly monitors its sorting accuracy and requests training engine intervention when needed. The proposed system is implemented in a 180 nm CMOS process, consuming only 0.33 W/channel with a clock of 25 kHz and a power supply of 1.8 V. In-channel sorting occupies 0.0023 mm², and the training engines, which can be shared by all the channels, occupy 1.956 mm².
ISBN: 9789811524493 (digital)
ISBN: 9789811524493; 9789811524486 (print)
Traffic data collection and information extraction have been widely studied for various objectives. One such objective is to predict the nature of traffic in a particular road region and then visualize it. The primary objective of this paper is to analyze traffic big data using two parallel algorithms, M5P rules and random forest regression, and to compare them in determining the average journey time from other traffic parameters such as flow and time of day. These algorithms have been implemented in a distributed computing environment on Spark clusters using Apache Mesos for resource management. The secondary objective of the paper is to visualize the correlation of average journey time with traffic flow and to plot comparative graphs of the real and predicted values of the average journey time. Based on root-mean-square error, mean absolute error, and other performance parameters such as the correlation coefficient, this paper concludes that the parallel algorithms fared better in terms of prediction accuracy and error rates than traditional regression methods.
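For reference, the two error measures named in the abstract can be computed as in the following minimal sketch. The function names and the sample journey times are illustrative assumptions, not data or code from the paper.

```python
import math


def mean_absolute_error(actual, predicted):
    # MAE: average magnitude of the prediction errors.
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    # RMSE: square root of the mean squared error; it penalizes
    # large errors more heavily than MAE does.
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)
```

With hypothetical journey times `[10, 12, 14]` and predictions `[11, 12, 12]`, the absolute errors are 1, 0 and 2, so the MAE is 1.0 and the RMSE is the square root of 5/3.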
While various software development methodologies have been proposed to increase the design productivity and maintainability of software, they usually focus on the development of application software running on a single processing element, without concern for the non-functional requirements of an embedded system such as latency and resource requirements. In this thesis, we present a model-based software development method for parallel and distributed embedded systems. An application is specified as a set of tasks that follow a set of given rules for communication and synchronization in a hierarchical fashion, independently of the hardware platform. Having such rules enables us to perform static analysis that catches some software errors at compile time, which reduces the verification difficulty. A platform-specific program is synthesized automatically after the mapping of tasks onto processing elements is determined. A program synthesizer is also proposed to generate code that satisfies the platform requirements of parallel and distributed embedded systems. Because multiple models that express dynamic behaviors can be depicted hierarchically, the synthesizer can manage multiple task graphs at different levels of the hierarchy in order to run tasks in parallel. The synthesizer also provides methods for managing code for heterogeneous platforms and for generating various communication methods. The viability of the proposed software development method is verified with a real-life surveillance application that runs on six processing elements with three remote communication methods, and a remote deep learning example is conducted to use heterogeneous multiprocessing components on distributed systems. Measuring and estimating development costs also shows that supporting a new platform or network requires little effort. Since tolerance to unexpected errors is a required feature of many embedded systems, we also support automatic fault-tolerant code generation. Fault tolerance can be ap
ISBN: 9783030548315; 9783030548322 (print)
Many applications rely on distributed databases. However, only a few discovery methods can extract patterns without centralizing the data. In fact, centralization is often less expensive than communicating the extracted patterns from the different nodes. To circumvent this difficulty, this paper revisits the problem of pattern mining in distributed databases by taking advantage of pattern sampling. Specifically, we propose the algorithm DDSAMPLING, which randomly draws a pattern from a distributed database with a probability proportional to its interest. We demonstrate the soundness of DDSAMPLING and analyze its time complexity. Finally, experiments on benchmark datasets highlight its low communication cost and its robustness. We also illustrate its usefulness on real-world data from the Semantic Web for detecting outlier entities in DBpedia and Wikidata.
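Drawing an item with probability proportional to a weight, without gathering all items in one place, can be illustrated with a generic two-stage sketch. This is not DDSAMPLING itself: it assumes, for simplicity, that each pattern is stored on exactly one node and that every node can report the sum of its local interest values. The names `sample_pattern` and `two_stage_probability` and the layout of `nodes` are hypothetical.

```python
import random


def sample_pattern(nodes, rng):
    """Draw one pattern; `nodes` maps node name -> {pattern: interest > 0}."""
    # Stage 1: pick a node with probability proportional to its total
    # interest. Only one number per node has to be communicated.
    names = list(nodes)
    totals = [sum(nodes[name].values()) for name in names]
    node = rng.choices(names, weights=totals, k=1)[0]

    # Stage 2: the chosen node draws a local pattern with probability
    # proportional to that pattern's interest.
    patterns = list(nodes[node])
    weights = [nodes[node][p] for p in patterns]
    return rng.choices(patterns, weights=weights, k=1)[0]


def two_stage_probability(nodes, node, pattern):
    # Probability that the two stages return `pattern` stored on `node`.
    # The node total cancels, leaving interest / grand_total.
    grand_total = sum(sum(local.values()) for local in nodes.values())
    node_total = sum(nodes[node].values())
    return (node_total / grand_total) * (nodes[node][pattern] / node_total)
```

For example, with `nodes = {"n1": {"ab": 3.0, "c": 1.0}, "n2": {"de": 4.0}}`, the pattern `"ab"` is drawn with probability (4/8) × (3/4) = 3/8, which equals its interest divided by the global total of 8.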
The increasing penetration of converter-based generation in many power systems around the world has sparked a discussion about how to operate these power systems with the usual levels of efficiency, reliability and cost-effectiveness. Current grid-following converter-based generators have proven to run stably in parallel with one another, even when thousands of them are connected in a power system, and even in very small isolated power systems with extremely low system inertia. Discussions around the necessity of additional converter performance, usually under the 'grid-forming' and 'Virtual Synchronous Machines' concepts, have recently moved from the academic sphere to national and international industry fora. Formal discussions have started in Great Britain, in Germany and at ENTSO-E level. However, there is still a lot of uncertainty about the real, rather than simulated, performance of grid-forming converters, whilst the needs case for requiring this radically different control method has not been adequately justified. In the present paper, we raise key questions that will serve towards an objective discussion about power system needs, grid infeed technologies and their interaction.