ISBN: (Print) 9798350366235; 9798350366242
The Square Kilometre Array (SKA) is a next-generation, big-data radio astronomy facility that will revolutionise our understanding of the Universe and the laws of fundamental physics, and it needs innovative solutions for efficient data processing. The SKA Regional Centres Network (SRCNet) is a collaborative ecosystem tasked with the demanding role of processing and analyzing SKA data products. With SKA, near-exascale computing will pose several challenges, chief among them data movement. As computational capabilities grow, the sheer volume of generated data becomes staggering. The traditional approach of moving large volumes of data to centralized computing resources becomes impractical due to the limitations of existing network and storage infrastructures. The data transfer bottleneck becomes a critical impediment to overall efficiency. To overcome this challenge, a paradigm shift is imperative. Strategies such as in-situ processing and distributed computing models, where computation is moved to the data, emerge as promising solutions. Within SKA, and specifically for SRCNet data processing needs, the combination of Function-as-a-Service (FaaS) with a decision-making entity driven by Evolutionary Algorithms (EAs) becomes pivotal. FaaS abstracts away infrastructure management concerns, enabling the deployment of modular functions in close proximity to data sources. This aligns with the principle of bringing computation to the data, mitigating the challenges associated with extensive data transfers. The decision-making entity, guided by EAs, facilitates a systematic exploration of near-optimal execution plans that provide detailed information on how and where a function should be executed within the overall computing and data infrastructure. With a focus on two objectives, execution time and energy consumption, and constraints such as data transfers or data locations, Multi-Objective Evolutionary algorithms (MOEA
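To make the two-objective trade-off in the abstract above concrete, the following minimal sketch filters a set of candidate execution plans down to the non-dominated (Pareto-optimal) ones, which is the selection criterion multi-objective evolutionary algorithms typically rely on. It is not the authors' implementation: the `Plan` class, the site names, and the time and energy figures are all hypothetical, and no evolutionary search or data-location constraints are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    site: str        # hypothetical site where the function would run
    time_s: float    # estimated execution time, seconds (made-up value)
    energy_j: float  # estimated energy consumption, joules (made-up value)


def dominates(a: Plan, b: Plan) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.time_s <= b.time_s and a.energy_j <= b.energy_j
    better = a.time_s < b.time_s or a.energy_j < b.energy_j
    return no_worse and better


def pareto_front(plans):
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


plans = [
    Plan("site-a", 10.0, 500.0),
    Plan("site-b", 12.0, 300.0),
    Plan("site-c", 15.0, 600.0),  # slower and costlier than site-a
    Plan("site-d", 20.0, 250.0),
]
front = pareto_front(plans)
print([p.site for p in front])
```

In this toy data set, `site-c` is dropped because `site-a` is better on both objectives, while the remaining plans each represent a different time/energy compromise.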
Authors:
Cação, José; Antunes, Mário; Santos, José; Monteiro, Miguel
TEMA, Centro de Tecnologia Mecânica e Automação, Departamento de Engenharia Mecânica, Universidade de Aveiro, Aveiro 3810-193, Portugal
LASI, Laboratório Associado de Sistemas Inteligentes, Guimarães, Portugal
DETI, Departamento de Eletrónica, Telecomunicações e Informática, Universidade de Aveiro, Campus de Santiago, Aveiro 3810-193, Portugal
IT, Instituto de Telecomunicações, Aveiro, Aveiro 3810-193, Portugal
Bosch Termotecnologia S.A., Cacia 3800-627, Portugal
The industrial landscape is undergoing a significant transformation marked by the integration of technology and manufacturing processes, giving rise to the concept of the Industrial Internet of Things (IIoT). IIoT is ...
This proceedings contains 14 papers. The Proceedings of the VLDB Endowment (PVLDB) provides a high-quality publication service to the data management research community. The conference topics cover the core of data management, including new approaches to cost and cardinality estimation (using a variety of learning methods), efficient query evaluation algorithms for single-node and distributed settings as well as advances in the optimization of such queries, new algorithms for graph processing, novel techniques for data cleaning, new approaches to improving replicated storage using programmable switches, and advances in query evaluation under differential privacy. It also extends to the geospatial community by developing new algorithms for route planning and data series, etc. The key terms of this proceedings include multi-vectorizing, big-data clusters, cardinality estimation, sparse vector, near-linear scalability, cost estimator, free gap information, noisy max mechanisms, network conflict detection, and denial constraints.
Time series data usually have complex dynamic characteristics, and it is difficult for a single prediction model to fully capture the various patterns contained. To alleviate the issue, different models obtaining diff...
ISBN: (Print) 9798350344868; 9798350344851
Graphs are widely used to represent complex information and signal domains with irregular support. Typically, the underlying graph topology is unknown and must be estimated from the available data. Common approaches assume pairwise node interactions and infer the graph topology based on this premise. In contrast, our novel method not only unveils the graph topology but also identifies three-node interactions, referred to in the literature as second-order simplicial complexes (SCs). We model signals using a graph autoregressive Volterra framework, enhancing it with structured graph Volterra kernels to learn SCs. We propose a mathematical formulation for graph and SC inference and solve it through convex optimization involving group norms and mask matrices. Experimental results on synthetic and real-world data show that our approach outperforms existing methods.
In data center (DC) environments, machine learning algorithms play an important role in resource management, increasing efficiency by predictively monitoring workload trends and adjusting jobs accordingly. In this paper, we propose a system to predict the CPU usage of virtual machines (VMs) in a DC. Our proposal clusters VMs based on their historical information (i.e., time series), evaluating several traditional ML algorithms on common statistical features of the VM time series, so that VMs with similar behavior are grouped together. Representative models are then trained, and the one with the lowest mean error per cluster is chosen. The simulation results show that, by clustering and training the model with representative time series, it is possible to obtain a low mean error while reducing the local training time per individual VM. (c) 2020 The Authors. Published by Elsevier B.V.
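The cluster-then-select pipeline described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: the features (mean and standard deviation), the plain k-means with deterministic initialization, the two toy forecasters (`naive`, `moving_average`), and the synthetic CPU traces are all stand-ins chosen to keep the example self-contained.

```python
from statistics import mean, pstdev


def features(series):
    """Two common statistical features of one VM's CPU time series."""
    return (mean(series), pstdev(series))


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means; seeding with the first k points is an assumption for reproducibility."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels, centroids


def naive(history):
    """Predict the next value as the last observed value."""
    return history[-1]


def moving_average(history, window=3):
    """Predict the next value as the mean of the last `window` values."""
    return mean(history[-window:])


CANDIDATES = {"naive": naive, "moving_average": moving_average}


def mean_abs_error(model, series, warmup=3):
    """One-step-ahead mean absolute error over the series after a warm-up."""
    return mean(abs(model(series[:t]) - series[t]) for t in range(warmup, len(series)))


def select_models(vm_series, k=2):
    names = list(vm_series)
    points = [features(vm_series[n]) for n in names]
    labels, centroids = kmeans(points, k)
    chosen = {}
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # Representative series: the cluster member closest to the centroid.
        rep = min(members, key=lambda i: sq_dist(points[i], centroids[c]))
        series = vm_series[names[rep]]
        errors = {name: mean_abs_error(m, series) for name, m in CANDIDATES.items()}
        chosen[c] = min(errors, key=errors.get)
    return dict(zip(names, labels)), chosen


# Synthetic CPU usage traces (percent); purely illustrative.
vms = {
    "vm-a": [10, 11, 12, 13, 14, 15, 16, 17],
    "vm-b": [80, 60, 80, 60, 80, 60, 80, 60],
    "vm-c": [11, 12, 13, 14, 15, 16, 17, 18],
    "vm-d": [82, 62, 82, 62, 82, 62, 82, 62],
}
assignment, chosen = select_models(vms)
print(assignment, chosen)
```

Only one series per cluster is used for model evaluation here, which mirrors the abstract's point about avoiding per-VM training; a real system would swap in proper ML models and a held-out evaluation window.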
This paper explores the analysis and visualization of stock data based on LSTM neural networks. Taking Ping An Bank's stock data from January 1, 2020, to April 30, 2024, as a case study. Through data acquisition, ...
ISBN: (Print) 9781450376457
The proceedings contain 115 papers. The topics discussed include: neuronal structure segmentation in Drosophila first instar larva ventral nerve cord using U-Net convolution network; the identification of transiting exoplanet candidates based on convolutional neural network; the application of machine learning in chess endgames prediction; a research on channel estimation algorithms for OFDM system; hot topics of big data research in China; research on marketing promotion of Wuhan East Lake scenic area from the perspective of micro-communication; sentiment analysis of e-commerce customer reviews based on natural language processing; research on the application of cloud accounting in small and medium-sized enterprises under the background of big data; an empirical analysis of total retail sales of consumer goods based on Holt-Winters; and strength and water stability characteristics of loess modified by consolidation agent.
ISBN: (Print) 9798350300529
Motivated by the real-world application of traffic classification at the network edge, we study the problem of robust decentralized online learning against malicious data generators that can manipulate their data features in order to obtain preferred classification outcomes. Multiple agents cooperatively learn classification models to make online decisions. They periodically exchange their models, e.g., traffic classification models, with neighbors in a decentralized network and update local model parameters on the fly, based on the models they have access to and on dynamically delayed feedback on the observed local data samples. In this work, we propose two decentralized online learning algorithms, RDOC-O and RDOC-C, against ordinary malicious and clairvoyant malicious data generators, respectively. Our theoretical performance analysis shows that the two algorithms have provable sub-linear individual regret bounds under mild conditions. To validate our analysis, extensive performance evaluations are conducted in the application of network traffic classification using two real-world data traces. Our results show that the two proposed algorithms compare favorably with an optimal offline classification model in the presence of malicious data generators, and that they can achieve a steady-state F1 score of around 0.85, which validates their effectiveness and makes them appealing in practice.
ISBN: (Print) 9781728182353
The proceedings contain 41 papers. The topics discussed include: prediction and mapping of air pollution in Bandung using generalized space time autoregressive and simple kriging; neural network modeling on wave dissipation due to mangrove forest; modeling traffic flow on Buah Batu exit toll gate using cellular automata; exploration-exploitation balanced krill herd algorithm for thesis examination timetabling; principal component analysis to determine main factors stock price of consumer goods industry; analysis of adversarial attacks on skin cancer recognition; Twitter sentiment analysis: case study on the revision of Indonesia's corruption eradication commission (KPK) law 2019; instantaneous height of sea surface: a comparison between local field observation and the simulated level from global models; and big data analytics for processing real-time unstructured data from CCTV in traffic management.