ISBN (print): 9783030916077; 9783030916084
Transformers are increasingly used to solve a variety of Natural Language Processing tasks. Recently, they have also been employed to process source code, enabling automatic analysis of very large code bases. This paper presents a custom-designed data analysis pipeline that can classify source code from competitive programming solutions. Our experiments show that the proposed models accurately determine the number of distinct solutions for a programming challenge task, even in an unsupervised setting. Together with our model, we also introduce a new dataset for this task, AlgoSol-10, which consists of ten programming problems together with all the source code submissions, manually clustered by experts according to the algorithmic solution used to solve each problem. Given the success of the approach on small pieces of source code, we discuss the potential of further using transformers for the analysis of large code bases.
ISBN (print): 9781538611227
This article focuses on creating a virtual production line model using Wonderware's ArchestrA product. A virtual model is an important prerequisite for controlling the physical production line. The capabilities Wonderware provides make it possible to create a virtual multi-level model consisting of individual zones, stations, and the equipment those stations contain. This makes it possible to apply control strategies and verify their impact on the virtual model, and to test new control approaches based on the results obtained.
ISBN (print): 9781538611227
This paper proposes a method for acquiring data from industrial processes for subsequent data mining analysis and for obtaining new knowledge from the process data. It covers the initial stage of a project devoted to process data analysis for failure prediction and process optimization. In later stages, the project will be evaluated in a real automotive production company. We describe the current state of data acquisition in automotive production companies and highlight the importance of collecting process data in order to meet Industry 4.0 requirements.
ISBN (print): 9783030916077; 9783030916084
In learning-to-rank for information retrieval, a ranking model is usually learned using features that are extracted from the different fields of the documents and naively combined. However, this conventional way of learning a ranking model does not accurately reflect the utility and contribution of the individual fields, and it risks joining highly correlated features from different fields. There is also a lack of empirical analysis of how field-grouped features determine or influence the performance of the learned ranking models. In this paper, we classify features by the fields they are extracted from and investigate the role of field-grouped features in learning-to-rank, in particular whether using field-grouped features leads to different and better performance than using the naively combined feature list. Our experiments on two large-scale, publicly available learning-to-rank benchmark datasets show that ranking models learned using field-grouped features have competitive advantages over models learned using a naively combined feature list, and that aggregating the results of different fields yields better performance. These results suggest that learning ranking models with field-grouped features can be useful for obtaining more effective performance in learning-to-rank methods.
ISBN (print): 9783030916077; 9783030916084
Electricity is acquiring an ever more significant presence in our lives, and the future is expected to be increasingly electric. Nowadays, we have access to enormous amounts of data, which add little value if they cannot support decision-making or help plan systems correctly and in advance. Forecasts are vital tools to support decision-making. We believe it is possible to use open data available on the Internet to make electricity price forecasts that decision-makers in the sector can use. In this work, we study the multi-attribute hourly forecast of the electricity price in MIBEL (the Iberian electricity market) for the 24 h of the following day, using open data. The multi-attribute predictions were made with the TIM ('Tangent Information Modeler') tool, which has AutoML ('Automated Machine Learning') capabilities. The TOPSIS ('Technique for Order of Preference by Similarity to Ideal Solution') decision support technique was used to analyze the results.
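As an illustration of the TOPSIS step named in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes the common textbook variant (vector normalization, Euclidean distances). The function name `topsis`, the two criteria, the weights, and all numbers in the example are hypothetical. It also assumes no criterion column is all zeros and that the alternatives are not all identical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return the TOPSIS closeness coefficient of each alternative.

    matrix:  one row per alternative, one column per criterion
    weights: one weight per criterion
    benefit: True where larger values are better, False for cost criteria
    A higher coefficient means the alternative is closer to the ideal point.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    # The ideal point takes the best value per criterion, the anti-ideal the worst.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical example: three forecasting models scored on two error
# measures (both are cost criteria, so lower is better).
candidates = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
]
scores = topsis(candidates, weights=[0.5, 0.5], benefit=[False, False])
best = max(range(len(scores)), key=scores.__getitem__)
```

Here `best` is the index of the alternative with the highest closeness coefficient. In practice the weights encode how much each criterion matters to the decision-maker, so they should be chosen before looking at the ranking.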
ISBN (print): 9783030916077; 9783030916084
High-dimensional data (HDD) is one of the biggest challenges in data science arising from big data. Applying dimensionality reduction techniques to HDD allows visualization and, thus, a better understanding of the problem. In addition, these techniques can enhance the performance of machine learning (ML) algorithms while increasing their explanatory power. This paper presents an automatic method capable of obtaining an adequate representation of the data, given a previously trained ML model. Likewise, an automatic method is introduced to obtain a Support Vector Machine (SVM) model based on an adequate representation of the data. Both methods provide an explainable machine learning procedure. The proposal is tested on several data sets, with promising results. It significantly eases the visualization and understanding task for the data scientist when an ML model has already been trained, as well as the selection of ML parameters when a reduced representation of the data has been achieved.
ISBN (print): 9781538611227
Data science is increasingly used in many fields, from industry to medicine. The main condition for applying data mining methods is to have clean and consistent data sets. It is often difficult to acquire this kind of data straight from the process (raw data), and such data can suffer from various errors. This paper evaluates a novel approach to dealing with one of these errors: missing values of data parameters in some data records. For the purpose of this article, we evaluated the reliability of neural networks for interpolating missing data in a medical dataset.
ISBN (print): 9781728186092
This paper presents the use of artificial intelligence for the control of a permanent magnet DC motor. Such a system is found especially in robotic systems that perform repetitive operations. The control law is generated by a Reinforcement Learning algorithm. Among the numerous variants of this type of algorithm, Policy Iteration was chosen. The algorithm was implemented experimentally using a data acquisition system and Matlab/Simulink software. The control system was tested for several load variants (obtained by changing the load's moment of inertia). The simulation and experimental results show that the intelligent control method based on reinforcement learning has better trajectory tracking and vibration suppression.
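For readers unfamiliar with Policy Iteration, the following is a minimal sketch of the generic tabular algorithm on a hypothetical five-state positioning task. It is not the motor controller from the paper, whose formulation the abstract does not give. The names `policy_iteration`, `step`, and `reward`, the state space, the target, and the discount factor are all assumptions made for illustration.

```python
def policy_iteration(n_states, actions, step, reward, gamma=0.9, tol=1e-9):
    """Generic tabular policy iteration for a deterministic MDP.

    step(s, a)       -> next state
    reward(s, a, s2) -> immediate reward for that transition
    Returns (policy, values) as lists indexed by state.
    """
    policy = [actions[0]] * n_states
    values = [0.0] * n_states
    while True:
        # Policy evaluation: repeat the Bellman backup for the current
        # policy until the values stop changing.
        while True:
            delta = 0.0
            for s in range(n_states):
                s2 = step(s, policy[s])
                new_v = reward(s, policy[s], s2) + gamma * values[s2]
                delta = max(delta, abs(new_v - values[s]))
                values[s] = new_v
            if delta < tol:
                break

        # Policy improvement: switch to a strictly better action if one exists.
        stable = True
        for s in range(n_states):
            def q(a, s=s):
                s2 = step(s, a)
                return reward(s, a, s2) + gamma * values[s2]

            best = max(actions, key=q)
            if q(best) > q(policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, values


# Hypothetical toy task: move along positions 0..4 to reach position 3.
N_STATES = 5
TARGET = 3
ACTIONS = [-1, 0, 1]  # step left, stay, step right


def step(s, a):
    # Deterministic move, clipped to the valid position range.
    return min(max(s + a, 0), N_STATES - 1)


def reward(s, a, s2):
    # Penalise the distance to the target after the move.
    return -abs(s2 - TARGET)


policy, values = policy_iteration(N_STATES, ACTIONS, step, reward)
```

The resulting policy moves every position toward the target and holds it there. A real motor controller works with continuous states and a measured or modelled plant, so this table-based version only conveys the evaluate-then-improve loop.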
ISBN (print): 9783030916077; 9783030916084
In floating offshore wind turbines (FOWT), not only the wind but also the waves have a strong impact on the dynamics of the floating device, and thus on its efficiency. It is therefore important to study the relationship between wind speed and wave height. In this paper, data obtained from the Casco Bay buoy (USA) in 2020 are analyzed. First, a deep learning model of the wind based on LSTM (Long Short-Term Memory) networks is developed, with the mean wind speed and wind direction as inputs. The last 876 h of the year were used to test the model, which proved able to forecast average hourly wind speed with good accuracy. In addition, the average wave height data for the same period and location are considered, and the correlation between the two variables (wind and waves) is obtained. This study provides a better understanding of the behavior of environmental loads on floating wind turbines.
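The two data-handling steps mentioned in this abstract, holding out the last 876 h for testing and correlating wind speed with wave height, can be sketched as follows. This is an illustration under assumptions: the abstract does not say which correlation measure was used, so the Pearson coefficient is assumed here, and the function names and sample values are hypothetical rather than taken from the buoy data.

```python
import math


def holdout_tail(series, n_test=876):
    """Split an hourly series chronologically: the last n_test points are the test set."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be positive and smaller than the series length")
    return series[:-n_test], series[-n_test:]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical hourly samples (not real buoy measurements).
wind_speed = [4.0, 5.5, 7.0, 6.0, 9.0, 8.5]       # m/s
wave_height = [0.5, 0.7, 1.0, 0.8, 1.4, 1.2]      # m
r = pearson(wind_speed, wave_height)
```

A chronological split avoids leaking future observations into training, which a random split of a time series would do. The coefficient `r` lies between -1 and 1, with values near 1 indicating that wave height rises with wind speed.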
ISBN (print): 9781479952083
The aim of this work is to localize a query mobile photograph by utilizing surveillance images, which naturally provide location information. We cast this cross-device visual localization problem as a classification task. Exploiting the surveillance network to collect reference images greatly simplifies data acquisition. However, the discrepancy between mobile images and surveillance images makes the training samples difficult to use directly, and the scarcity of training samples caused by the immobility of surveillance cameras further degrades performance. In contrast to most traditional domain adaptation and semi-supervised problems, the scarce labeled data and the plentiful unlabeled data exist in different domains. Our location recognition method first applies unsupervised subspace alignment to reduce the discrepancy between the two domains, and then adopts the semi-supervised Laplacian SVM to reinforce the discriminant information using the unlabeled mobile images. Experimental results show that our location recognition method significantly outperforms other related methods.