Although exploratory data analysis (EDA) is a powerful approach for uncovering insights from unfamiliar datasets, existing EDA tools struggle to help users assess the progress of their exploration and synthesize coherent insights from isolated findings. To address these challenges, we present FactExplorer, a novel fact-based EDA system that shifts the focus of analysis from raw data to data facts. FactExplorer employs a hybrid logical-visual representation that gives users a comprehensive overview of all potential facts at the outset of their exploration. Moreover, FactExplorer introduces fact-mining techniques, including topic-based drill-down and transition path search. These features facilitate in-depth analysis of facts and improve the understanding of interconnections between specific facts. Finally, we present a usage scenario and conduct a user study to assess the effectiveness of FactExplorer. The results indicate that FactExplorer facilitates the understanding of isolated findings and enables users to steer a thorough and effective EDA.
Online sales forecasting has become an essential aspect of effective business planning in the digital era. The widespread adoption of digital transformation has enabled companies to collect substantial datasets related to consumer behavior, market trends, and sales drivers. This study attempts to uncover patterns and predict sales growth using product images and their associated filenames as input. To achieve this, we combine exploratory data analysis (EDA) with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks, which excel at processing sequential data. However, the performance of these networks depends heavily on the quality of the data and the preprocessing methods applied. This study highlights the importance of EDA and ensemble methods in enhancing the efficacy of recurrent neural networks (RNNs) for online sales forecasting. EDA plays a crucial role in identifying significant patterns such as trends, seasonality, and autocorrelation while addressing data irregularities such as missing values and outliers. The findings show that integrating EDA substantially improves the performance metrics of the RNNs, as indicated by the reduction in loss and mean absolute error (MAE) across training epochs (e.g., loss: 0.0720, MAE: 0.1918 at epoch 15). These results indicate that EDA improves the accuracy, stability, and efficiency of the model, allowing RNNs to provide more reliable sales predictions while minimizing the risk of overfitting.
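The EDA checks named in the abstract above (autocorrelation, missing values) and the MAE metric can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the study's pipeline; the function names and the toy series are assumptions introduced here.

```python
from statistics import fmean


def autocorrelation(series, lag):
    """Sample autocorrelation of a numeric sequence at a positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = fmean(series)
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # constant series: nothing to correlate
    # Numerator: products of deviations that are `lag` steps apart.
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


def count_missing(series):
    """Number of entries recorded as None (assumed missing-value marker)."""
    return sum(1 for x in series if x is None)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical sales series with a repeating 4-step pattern.
sales = [10, 12, 14, 30, 10, 12, 14, 30, 10, 12, 14, 30]
seasonal_lag_4 = autocorrelation(sales, 4)   # strongly positive for this series
adjacent_lag_1 = autocorrelation(sales, 1)   # negative for this series
```

A high autocorrelation at the seasonal lag is the kind of pattern that would suggest feeding lagged values to a sequence model; the missing-value count would be checked before any such step.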
Automated biomedical data analysis tools are crucial in research and clinical practice; however, they are not always accessible to everyone. This paper introduces a web-based system that facilitates exploratory data analysis and machine learning, focusing on identifying patterns in audio and video data. The system applies to various biomedical contexts, such as the study of Parkinson's disease. Developed using Python and the Streamlit framework, it offers an intuitive interface for data analysis, visualization, and automated classification. Its flexibility makes it a valuable resource for researchers and healthcare professionals, enabling meaningful insights and fostering advancements in biomedical research.
The aim of this paper is to provide a prescriptive framework for exploratory data analysis (EDA) in quality-improvement projects. The framework is developed on the basis of a large number of real-life applications. The three steps of EDA are described: display the data, identify salient features, and interpret salient features. Graphical display of data, Shewhart's assignable causes, the maximum entropy principle, abduction, and explanatory coherence are all part of the resulting framework. Furthermore, the roles of probabilistic reasoning and automatic statistical procedures in EDA are discussed.
We tested the detection properties of four MOX sensors toward different ozone mixtures to identify the sets of sensing layers and interfering-compound concentrations most suitable for reliable detection of ozone. The measurement campaign lasted 1 year, divided into four sessions. We collected a substantial number of measurements (more than 500) with diverse interfering gases: ammonia, ethanol, ethylene, carbon monoxide and humidity. Owing to the size of the data set, it could not be analyzed using the conventional methods generally applied for characterizing gas sensors: evaluating sensor performance by visual inspection of the sensor responses is unfeasible. For this reason we systematically applied the exploratory data analysis methodology. We used some simple but effective statistical techniques to gain insight into the data. This approach allows us to draw sound conclusions about the causes of variation in the data, e.g. time (sensors' long-term stability) or interfering effects of different chemical compounds. All the analysis techniques employed in this work are implemented in a software package developed at our laboratory. We concluded that the two most stable and sensitive sensors are based on WO3 and SnO2 (Au catalyzed). We ranked the contributions of different gases to the sensor responses, deducing that our sensors are suitable for detecting steps of 50 ppb of ozone when ethylene is below 10 ppm. Carbon monoxide does not affect the measurements; the strongest interfering compound is humidity, which needs to be controlled or measured in parallel, even at a preliminary stage. (c) 2008 Elsevier B.V. All rights reserved.
A factor analysis was applied to soil geochemical data to define anomalies related to buried Pb-Zn mineralization. A favorable main factor with a strong association of the elements Zn, Cu and Pb, related to mineralization, was selected for *** median + 2 MAD (median absolute deviation) method of exploratory data analysis (EDA) and C-A (concentration-area) fractal modeling were then applied to the Mahalanobis distance, as defined by Zn, Cu and Pb from the factor analysis, to set the thresholds for defining multi-element *** a result, the median + 2 MAD method more successfully identified the Pb-Zn mineralization than the C-A fractal *** soil anomaly identified by the median + 2 MAD method on the Mahalanobis distances defined by three principal elements (Zn, Cu and Pb) rather than thirteen elements (Co, Zn, Cu, V, Mo, Ni, Cr, Mn, Pb, Ba, Sr, Zr and Ti) was the more favorable reflection of the ore *** identified soil geochemical anomalies were compared with the in situ economic Pb-Zn ore bodies for *** results showed that the median + 2 MAD approach is capable of mapping both strong and weak geochemical anomalies related to buried Pb-Zn mineralization, which is therefore useful at the reconnaissance drilling stage.
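The median + 2 MAD rule mentioned in the abstract above sets an anomaly threshold from two robust statistics. The sketch below shows the rule on a plain list of values, assuming the unscaled MAD (some implementations multiply the MAD by a consistency constant; the abstract does not say which variant was used). The function names and sample values are hypothetical, and the Mahalanobis distance step is not reproduced here.

```python
from statistics import median


def median_mad_threshold(values, k=2.0):
    """Return median + k * MAD, using the unscaled median absolute deviation."""
    med = median(values)
    # MAD: median of the absolute deviations from the median.
    mad = median([abs(v - med) for v in values])
    return med + k * mad


def flag_anomalies(values, k=2.0):
    """Return the values that exceed the median + k * MAD threshold."""
    threshold = median_mad_threshold(values, k)
    return [v for v in values if v > threshold]


# Hypothetical distance values with one clear outlier.
distances = [1, 2, 3, 4, 5, 6, 100]
threshold = median_mad_threshold(distances)
anomalous = flag_anomalies(distances)
```

Because both the median and the MAD are insensitive to extreme values, the outlier itself barely shifts the threshold, which is why the rule suits skewed geochemical data.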
The electrical energy demand (EED) in Greece for the time period 2002-2016 is investigated. The aim of the study is to introduce a framework for the exploratory data analysis (EDA) of the EED in the time domain. To this end, the EED at the hourly, daily, seasonal and annual time scales, along with the mean daily temperature and the Gross Domestic Product (GDP) of Greece, are visualized. The forecast of the EED provided by the Greek Independent Power Transmission Operator (IPTO) is also visualized and compared with the actual EED. Furthermore, the EED pricing system is visualized. The results of the study in general confirm and summarize the conclusions of previous relevant studies in Greece, each of which treated a single topic and covered shorter and earlier time periods. Furthermore, some unexpected patterns are observed which, if not considered carefully, could result in dubious models. Therefore, it is shown that the EDA of the EED in the time domain, coupled with weather-related, climate-related and socioeconomic variables, is essential for building a model for short-, medium- and long-term EED forecasting, something not highlighted in the literature. (C) 2017 Elsevier Ltd. All rights reserved.
In an intercontinental container liner service, container shipping operators reserve container slots for customers who book capacity for their cargoes a few weeks before the ship departs from a particular port. In practice, some of these bookings are eventually cancelled without any containers being loaded onto the ships, which leads to a low loading rate and revenue loss. However, in-depth analyses of container slot booking cancellation in container liner services rarely appear in the literature owing to the lack of real data. The characteristics and patterns of container slot booking cancellation are unclear from both academic and managerial perspectives. To fill this gap, this study first proposes a conceptual model for analyzing container slot booking cancellation in intercontinental shipping services. A case study on a container liner service between Asia and the US west coast is then conducted based on the proposed model. The static and dynamic cancellation rates of voyages, the attributes of bookings, and the factors that may influence cancellation behaviours are inspected and discussed. The primary findings of this study will benefit both academic research on container shipping slot bookings and the practice of slot cancellation control in container shipping companies.
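As a rough illustration of a per-voyage cancellation rate, the sketch below counts cancelled bookings over total bookings for each voyage. The abstract does not define its static or dynamic rates, so this definition, the record layout `(voyage_id, cancelled)` and the function name are all assumptions made for the example.

```python
from collections import defaultdict


def static_cancellation_rates(bookings):
    """Map each voyage to cancelled bookings / total bookings.

    `bookings` is an iterable of (voyage_id, cancelled) pairs, where
    `cancelled` is True if the booking was cancelled before loading.
    This layout is hypothetical, not taken from the study.
    """
    totals = defaultdict(int)
    cancelled_counts = defaultdict(int)
    for voyage_id, cancelled in bookings:
        totals[voyage_id] += 1
        if cancelled:
            cancelled_counts[voyage_id] += 1
    return {v: cancelled_counts[v] / totals[v] for v in totals}


# Hypothetical booking records for two voyages.
records = [
    ("V1", True), ("V1", False), ("V1", False), ("V1", True),
    ("V2", False),
]
rates = static_cancellation_rates(records)
```

A time-resolved variant would additionally need the cancellation date of each booking relative to departure, which this layout deliberately leaves out.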
The catalog of moment tensor solutions, which contains information on the spatial location, origin time and fault mechanism type of earthquakes, has been interpreted by researchers based on a wealth of past knowledge. However, long-term routine analyses have led to the accumulation of a huge amount of data in the moment tensor catalog, and it is worth considering moving away from the artisanal approach. In this study, using dimensionality reduction, a form of unsupervised machine learning, we performed exploratory data analysis of the moment tensor catalog in Japan to objectively obtain comprehensive images of seismic activity and to acquire knowledge on the spatial and temporal characteristics of the earthquake mechanism. Source parameters from the moment tensor catalog in Japan, namely spatial location (latitude, longitude, and focal depth) and source-mechanism diagram information, were embedded in a two-dimensional space via a non-linear graph-based dimensionality-reduction method, Uniform Manifold Approximation and Projection (UMAP). On the embedding map, earthquakes in eastern and western Japan are distributed separately and are further embedded to reflect their characteristic fault mechanism and focal depth in each region. The degree of similarity between earthquakes can be obtained as their distance on the embedding map. This study demonstrates that data visualization using dimensionality reduction is useful for intuitively and objectively understanding the regional characteristics of earthquake mechanisms. The embedding map can also be employed to visualize temporal changes in regional seismic activity and to perform a similarity search against a past event.
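The similarity search described above reduces to a nearest-neighbour lookup once each event has two-dimensional coordinates. The sketch below assumes the embedding already exists (it does not run UMAP) and uses plain Euclidean distance; the event identifiers, coordinates and function name are hypothetical.

```python
from math import dist


def most_similar(embedding, query_id, k=3):
    """Return the ids of the k events closest to `query_id` on the map.

    `embedding` maps an event id to its (x, y) position in the
    two-dimensional embedding; smaller distance means more similar.
    """
    query_xy = embedding[query_id]
    # Pair each other event with its Euclidean distance to the query.
    candidates = [
        (dist(query_xy, xy), event_id)
        for event_id, xy in embedding.items()
        if event_id != query_id
    ]
    # Sort by distance (ties broken by id) and keep the k nearest.
    return [event_id for _, event_id in sorted(candidates)[:k]]


# Hypothetical embedding coordinates for four events.
embedding_map = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (5.0, 5.0),
    "d": (0.0, 2.0),
}
neighbours = most_similar(embedding_map, "a", k=2)
```

For a catalog with many thousands of events, a spatial index such as a k-d tree would avoid the full scan, but the linear version keeps the idea visible.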
Hatu and Baogutu are two typical gold deposits in the study area. The Hatu gold deposit is associated with magmatism and controlled by regional-scale faults; mineralisation mainly occurs within hydrothermally altered felsic rocks and quartz veins. In the western region of the Hatu mining area, Cu, Ag, As and Sb are present in high concentrations in carbon tuffaceous shale. The Baogutu gold deposit is associated with the evolution of felsic magmas, and the porphyry copper-gold mineralisation and the sulphide-dominated copper-gold ore body were formed in the rock or near the contact zone in the faults, respectively. The ore-forming elements include Au, As and Sb. In this study, exploratory data analysis (EDA) and singularity mapping (SM) techniques were applied to identify geochemical anomalies caused by Au-related mineralisation, using a stream sediment geochemical data set from the Karamay mineral district, northwestern China. Ag, As, Au and Sb were chosen as indicator elements. The results show that EDA could not identify weak anomalies well within the strong variance of the background, while SM can effectively recognise weak anomalies and quantify the properties of enrichment caused by mineralisation. The results obtained by SM demonstrated that the anomalies are closely associated with the known Au deposits in the study area. The anomalous areas delineated by SM have potential for follow-up mineral exploration. In addition, the results document that Ag, As, Au and Sb may be reliable indicator elements for Au-related mineralisation in the study area. (C) 2014 Elsevier B.V. All rights reserved.