ISBN: 9781538680469 (print)
The telecoms industry is a highly competitive sector that is constantly challenged by customer churn, or attrition. To remain competitive in the consumer business, companies need sophisticated churn management strategies that harness valuable data for business intelligence. Data mining and machine learning are tools that telecoms companies can use to monitor the churn behaviour of customers. This study implemented exploratory data analysis and feature engineering on a public-domain telecoms dataset and applied seven classification techniques: Naive Bayes, Generalized Linear Model, Logistic Regression, Deep Learning, Decision Tree, Random Forest, and Gradient Boosted Trees. The results are analyzed using several metrics: accuracy, classification error, precision, recall, F1-score, and AUC. The study discusses how these results are essential in reducing customer churn and improving customer service. The experimental results demonstrate that the best classifier is Gradient Boosted Trees, which outperforms the other classifiers in almost all evaluation metrics. Further, all classifiers showed remarkably improved performance after oversampling was applied.
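The oversampling step and the threshold-based metrics named in this abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling method was used, so plain random duplication of minority-class rows is assumed here; the function names `random_oversample` and `binary_metrics` are hypothetical, and AUC is omitted because it requires predicted scores rather than hard labels.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are the same size.

    Assumption: simple random oversampling; the study may have used another method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows of this class to draw duplicates from.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, classification error, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": accuracy,
        "classification_error": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In practice, oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.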
Data breaches represent a permanent threat to all types of organizations. Although the types of breaches are different, the impacts are always the same. This paper analyzes over 9000 data breaches made public since 2005, which led to the loss of 11.5 billion individual records and had a significant financial and technical impact. Since the most devastating breaches are hacking breaches, we shed light on this type to reveal the most targeted organizations and examine how the interest of hackers changes over time. On the other hand, breaches caused by human factors are decreasing, which can be explained by employee awareness and the application of security standards. This study improves the state of knowledge about hacking breaches and helps in securing organizations' data by prioritizing the most attacked sectors. © 2019 The Authors. Published by Elsevier B.V.
This article revisits an old problem: "systematically explore the information contained in a set of operating data records and find from it how to improve operational performance by taking the appropriate decisions in the space of operating conditions," thus leading to continuous process improvement. A series of industrial case studies within the framework of the internships in the Leaders for Manufacturing (LFM) program at the Massachusetts Institute of Technology led us to reexamine the traditional formulations of this problem. The resulting methodology is characterized by the following features: (1) the problem statement and solutions are expressed in terms of hyperrectangles in the decision space, replacing conventional pointwise results; (2) data-driven, nonparametric learning methodologies were advanced to produce the requisite mapping between performance and decisions; (3) operating performance is in essence multifaceted, leading to a multiobjective problem, which is treated as such. The proposed methodology has been applied to a number of industrial examples; in this paper we provide a brief overview of only those that can be discussed in the open literature.
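To make the idea of hyperrectangular (rather than pointwise) results concrete, the following minimal sketch represents a candidate operating region as a box of per-variable bounds and summarizes several performance objectives over the historical records that fall inside it. The record layout, the variable names and the helpers `in_box` and `box_performance` are illustrative assumptions, not the article's actual learning procedure.

```python
from statistics import mean


def in_box(decisions, box):
    """Return True if every decision variable lies within its [low, high] bounds."""
    return all(low <= decisions[name] <= high for name, (low, high) in box.items())


def box_performance(records, box, objectives):
    """Average each performance objective over the records inside the hyperrectangle.

    Returns None when no historical record falls inside the box.
    """
    inside = [r for r in records if in_box(r["decisions"], box)]
    if not inside:
        return None
    summary = {"n": len(inside)}
    for objective in objectives:
        summary[objective] = mean(r["performance"][objective] for r in inside)
    return summary


# Hypothetical operating records: two decision variables, two objectives.
records = [
    {"decisions": {"temp": 100, "pressure": 2.0}, "performance": {"yield": 0.80, "energy": 5.0}},
    {"decisions": {"temp": 110, "pressure": 2.5}, "performance": {"yield": 0.90, "energy": 6.0}},
    {"decisions": {"temp": 130, "pressure": 2.2}, "performance": {"yield": 0.70, "energy": 7.0}},
]
box = {"temp": (95, 115), "pressure": (1.5, 3.0)}
summary = box_performance(records, box, ["yield", "energy"])
```

Keeping the objectives separate in the summary, instead of collapsing them into one score, mirrors the multiobjective treatment described in feature (3).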
Land use and transportation interaction is a complex, dynamic process. Many models have been used to study this interaction process during the past several decades. Empirical studies suggest that land use and transpor...
ISBN: 9781450367356 (print)
Exploratory data analysis (EDA) is an important initial step for any knowledge discovery process, in which data scientists interactively explore unfamiliar datasets by issuing a sequence of analysis operations (e.g. filter, aggregation, and visualization). Since EDA has long been known as a difficult task that requires profound analytical skills, experience, and domain knowledge, a plethora of systems have been devised over the last decade to facilitate it. In particular, advancements in machine learning research have created exciting opportunities, not only for better facilitating EDA but also for fully automating the process. In this tutorial, we review recent lines of work on automating EDA, starting from recommender systems that suggest a single exploratory action, going through kNN-based classifiers and active-learning methods for predicting users' interestingness preferences, and ending with fully automated EDA using state-of-the-art methods such as deep reinforcement learning and sequence-to-sequence models. We conclude the tutorial with a discussion of the main challenges and open questions that must be addressed to ultimately reduce the manual effort required for EDA.
ISBN: 9781450383431 (print)
Exploratory data analysis (EDA) is a crucial step in any data science project. However, existing Python libraries fall short in helping data scientists complete common EDA tasks for statistical modeling. Their API design is either too low level, optimized for plotting rather than EDA, or too high level, which makes it hard to specify more fine-grained EDA tasks. In response, we propose DataPrep.EDA, a novel task-centric EDA system in Python. DataPrep.EDA allows data scientists to declaratively specify a wide range of EDA tasks at different granularities with a single function call. We identify a number of challenges in implementing DataPrep.EDA and propose effective solutions to improve the scalability, usability, and customizability of the system. In particular, we discuss some lessons learned from using Dask to build the data processing pipelines for EDA tasks and describe our approaches to accelerating the pipelines. We conduct extensive experiments to compare DataPrep.EDA with Pandas-profiling, the state-of-the-art EDA system in Python. The experiments show that DataPrep.EDA significantly outperforms Pandas-profiling in terms of both speed and user experience. DataPrep.EDA is open-sourced as an EDA component of DataPrep: https://***/sfu-db/dataprep.
ISBN: 9781450376266 (print)
In exploratory data analysis, domain knowledge and experience play a central role in extracting information from the data and deriving proof and knowledge. However, experienced domain experts are rarely the same people who carry out the data analyses. Utilizing domain expertise for guidance in analytic processes is therefore a complex challenge. In recent years, machine learning has seen great advances: increasing processing power, growth in data, and affordable storage have led to more advanced algorithms. With the emergence of applicable machine learning algorithms, there is now a method for preserving and making use of even complex knowledge. In this paper, we present a concept for extracting and utilizing domain knowledge for exploratory data analysis. We introduce the concepts of an interaction store and an analysis context store to record user interaction and context during an exploratory analysis. We use the recorded data to construct semantic interaction sequences and predict their potential insight. The prediction can then be used to guide other data scientists in their sensemaking while performing exploratory data analysis in similar domains and use cases. Furthermore, we discuss possible research opportunities and implications resulting from the presented concept.
Exploratory data analysis is one of the key activities for understanding data and discovering new insights from it. As exploratory data analysis can involve both open-ended exploration and focused question answering, analysis tools should facilitate both exploration breadth and analysis depth. However, existing data exploration tools typically require manual chart specification, which can be tedious and prevent analysts from rapidly exploring different aspects of the data. Moreover, analysts may be blindsided by their own cognitive biases and prematurely fixate on specific questions or hypotheses. Without discipline and time, analysts may overlook important insights in the data, such as potentially confounding factors and data quality issues, and produce inaccurate results in their analyses. To help analysts perform rapid and systematic data exploration, this dissertation presents the design of mixed-initiative systems that complement manual chart specification with chart recommendation. To better understand the practice and challenges of exploratory data analysis, we first conduct an interview study with 18 data analysts. From the interview data, we characterize the goals, process, and challenges of exploratory data analysis. We then identify design opportunities for exploratory analysis tools. One major opportunity is facilitating rapid and systematic exploration with automation and guidance. The rest of the dissertation addresses this opportunity by contributing a stack of systems to augment exploratory analysis tools with chart recommendation. At the foundation of this stack, we introduce new formal languages for chart specification and recommendation. The Vega-Lite visualization grammar provides a formal representation for specifying and reasoning about charts. Building on Vega-Lite, the CompassQL query language combines partial chart specification with recommendation directives to provide a generalizable framework for chart recommendation via queries over the spac
The work leading to this paper was designed around the fusion of two concepts: exploratory data analysis (EDA) and resel processing. Socio-economic data are collected in irregular spatial units, described here as resels. An EDA system, RESEL, has been developed to demonstrate and apply methods adapted from filtering processes on regular gridded data to an irregularly tessellated surface. Problems were found both in adapting filter algorithms and in applying resel geometry, and the spatial science developed is discussed at length. An EDA system was used so that patterns in enumeration district census data could be visually enhanced and, at the same time, the process of resel filtering itself could be explored. The results from resel filtering are described, and ways in which they may provide a new methodology for visualisation and spatial data analysis are discussed.
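As a rough illustration of carrying a grid-based filter over to irregular units, the sketch below applies an unweighted mean filter in which each unit's window is the unit itself plus its adjacent units, replacing the fixed kernel used on a regular grid. The abstract does not specify the filters implemented in RESEL, so the choice of a mean filter, the adjacency-list representation and the name `resel_mean_filter` are all assumptions.

```python
def resel_mean_filter(values, neighbours):
    """Smooth each unit's value with the unweighted mean of itself and its adjacent units.

    values: mapping from unit id to its attribute value.
    neighbours: mapping from unit id to a list of adjacent unit ids.
    """
    smoothed = {}
    for unit, value in values.items():
        # The window plays the role of the kernel footprint on a regular grid.
        window = [value] + [values[n] for n in neighbours.get(unit, []) if n in values]
        smoothed[unit] = sum(window) / len(window)
    return smoothed


# Hypothetical enumeration districts arranged in a chain A - B - C.
values = {"A": 10.0, "B": 20.0, "C": 60.0}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
smoothed = resel_mean_filter(values, neighbours)
```

Because the number of neighbours varies from unit to unit, the effective window size is not constant, which is one of the differences from regular gridded data that such a method has to handle.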
ISBN: 9781450370011 (print)
In the past few years, augmented reality (AR) and virtual reality (VR) technologies have seen substantial improvements in both accessibility and hardware capabilities, encouraging the application of these devices across various domains. While researchers have demonstrated the possible advantages of AR and VR for certain data science tasks, it is still unclear how these technologies would perform in the context of exploratory data analysis (EDA) at large. In particular, we believe it is important to better understand which level of immersion EDA would concretely benefit from, and to quantify the contribution of AR and VR with respect to standard analysis workflows. In this work, we leverage a dataspace reconfigurable hybrid reality environment to study how data scientists might perform EDA in a co-located, collaborative context. Specifically, we propose the design and implementation of Immersive Insights, a hybrid analytics system combining high-resolution displays, table projections, and AR visualizations of the data. We conducted a two-part user study with twelve data scientists, in which we evaluated how different levels of data immersion affect the EDA process and compared the performance of Immersive Insights with a state-of-the-art, non-immersive data analysis system.