ISBN: 9781467362962 (print)
This pattern was originally designed to classify sequences of events in log files by error-proneness. Sequences of events trace application use in real contexts. As such, identifying error-prone sequences helps to understand and predict application use. The classification problem we describe is typical of supervised machine learning, but the composite pattern we propose approaches it with several techniques to control for data brittleness. Data pre-processing, feature selection, parametric classification, and cross-validation are the major instruments that enable a good degree of control over this classification problem. In particular, the pattern includes a solution for typical problems that occur when data comes from several samples of different populations and with different degrees of sparsity.
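As a rough illustration of the cross-validation step named in this abstract, the sketch below splits item indices into k folds. It is not the authors' pattern: the function name `k_fold_indices` is hypothetical, and the sketch assumes the caller has already shuffled the data and will plug in their own classifier.

```python
def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k roughly equal, contiguous folds.

    Assumption: the items were shuffled beforehand; this sketch does not
    shuffle or stratify by population.
    """
    # The first (n_items % k) folds get one extra item each.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 3):
        print("train:", train, "test:", test)
```

Each item appears in exactly one test fold, so every event sequence is scored once by a model that never saw it during training.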
ISBN: 9781467362962 (print)
We present the Source Code Statistical Language Model data analysis pattern. Statistical language models have been an enabling tool for a wide array of important language technologies. Speech recognition, machine translation, and document summarization (to name a few) all rely on statistical language models to assign probability estimates to natural language utterances or sentences. In this data analysis pattern, we describe the process of building n-gram language models over software source files. We hope that by introducing the empirical software engineering community to best practices that have been established over the years in natural language research, statistical language models can become a tool that SE researchers can use to explore new research directions.
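To make the idea of an n-gram model over source files concrete, here is a minimal bigram sketch with add-one smoothing. The abstract does not specify a tokenizer, n-gram order, or smoothing method, so the crude regex lexer, the choice of bigrams, and the names `tokenize`, `train_bigram`, and `log_prob` are all assumptions made for illustration.

```python
from collections import Counter
import math
import re


def tokenize(source):
    # Crude lexer (an assumption): identifiers, integers, or single symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def train_bigram(token_lists):
    """Count bigrams and their left contexts over tokenized source lines."""
    contexts, bigrams = Counter(), Counter()
    for tokens in token_lists:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    vocab = {t for tokens in token_lists for t in tokens} | {"<s>", "</s>"}
    return contexts, bigrams, len(vocab)


def log_prob(tokens, model):
    """Log-probability of a token sequence under add-one smoothing."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


if __name__ == "__main__":
    corpus = [tokenize("x = x + 1"), tokenize("y = y + 1")]
    model = train_bigram(corpus)
    print(log_prob(tokenize("x = x + 1"), model))
    print(log_prob(tokenize("1 + = x x"), model))
```

A token order seen in training receives a higher log-probability than a scrambled one, which is the property that lets such a model rank code as more or less "natural".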
ISBN: 9781467362719 (print)
Software quality attributes can be identified based on software features such as security, reliability, and user-friendliness. This process can be done either manually or automatically. Sentiment analysis refers to the task of extracting sentiment from resources such as natural language texts. We study the application of sentiment analysis to extracting the quality attributes of a software product from the opinions that end-users state in microblogs such as Twitter. Our findings point to useful techniques such as computing the document frequency of words over a large number of tweets. The extracted results can help software developers learn the advantages and disadvantages of their products.
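The document-frequency technique mentioned in this abstract can be sketched as follows: each tweet counts as one document, and a word's document frequency is the number of tweets that contain it at least once. The tokenization rule and the function name `document_frequency` are assumptions, not details from the paper.

```python
from collections import Counter
import re


def document_frequency(tweets):
    """Map each word to the number of tweets in which it occurs."""
    df = Counter()
    for tweet in tweets:
        # A set ensures repeated words within one tweet are counted once.
        words = set(re.findall(r"[a-z']+", tweet.lower()))
        df.update(words)
    return df


if __name__ == "__main__":
    sample = ["The app is slow", "slow slow login", "Great app"]
    print(document_frequency(sample).most_common(3))
```

Words with a high document frequency across tweets about a product are candidates for the quality attributes that users talk about most.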
ISBN: 9781467330763 (print)
Data scientists in software engineering seek insight into data collected from software projects to improve software development. The demand for data scientists with domain knowledge in software development is growing rapidly, and there is already a shortage of such data scientists. Data science is a skilled art with a steep learning curve. To shorten that learning curve, this workshop will collect best practices in the form of data analysis patterns, that is, analyses of data that lead to meaningful conclusions and can be reused for comparable data. In the workshop we compiled a catalog of such patterns that will help experienced data scientists to communicate better about data analysis. The workshop was targeted at experienced data scientists, researchers, and anyone interested in how to analyze data correctly and efficiently in a community-accepted way.
The proceedings contain 9 papers. The topics discussed include: why do sports officials dropout?; strategic patterns discovery in RTS-games for e-sport with sequential pattern mining; maps for reasoning in ultimate; predicting the NFL using Twitter; use of performance metrics to forecast success in the national hockey league; finding similar movements in positional data streams; comparison of machine learning methods for predicting the recovery time of professional football players after an undiagnosed injury; predicting NCAAB match outcomes using ML techniques – some results and lessons learned; and key point selection and clustering of swimmer coordination through sparse Fisher-EM.
ISBN: 9781450323697 (print)
The proceedings contain 8 papers. The topics discussed include: a hybrid grid-based method for mining arbitrary regions-of-interest from trajectories; ensemble feature ranking for shellfish farm closure cause identification; clustering household electricity use profiles; predicting petroleum reservoir properties from downhole sensor data using an ensemble model of neural networks; light-weight online predictive data aggregation for wireless sensor networks; and performance analysis of duty-cycling wireless sensor networks for train localization.
ISBN: 9781450323970 (print)
The proceedings contain 9 papers. The topics discussed include: the power of the data: opportunities and challenges in big and personal data mining; situation fencing: making geo-fencing personal and dynamic; crowds, Bluetooth, and rock'n'roll: understanding music festival participant behavior; building health persona from personal data streams; a mobile personal informatics system with interactive visualizations of mobility and social interactions; an evaluation of wearable activity monitoring devices; combining crowd-generated media and personal data: semi-supervised learning for context recognition; the influence of social norms on synchronous versus asynchronous communication technologies; and 'what's in it for me?' how can big multimedia aid quantified-self applications.
ISBN: 9781479909353; 9781467361293 (print)
In the absence of supporting diagnostic evidence, it is difficult for an expert to state the grade of a disease with confidence. Generally, many tests are performed that involve clustering or classification of large-scale data. However, a large number of tests can complicate the main diagnosis process and make it difficult to obtain the end results. This difficulty can be resolved with the aid of machine learning techniques. This paper surveys the diagnosis of three diseases: heart disease, breast cancer, and diabetes, each analyzed in light of existing work. The survey reviews the existing approaches that have been used to diagnose these diseases with data mining techniques.
Electronic sport, or e-sport, denotes the extreme practice of video games where so-called cyber-athletes compete in world-wide tournaments. As in any sport, such professionals are surrounded by sponsors and practice within professional teams. These professional games are even broadcast by commentators over specialized TV channels. StarCraft II (Blizzard Entertainment) is one of the most competitive video games and now has its own world-wide player ranking system (based on the well-known Elo system) and an annual world cup competition series (WCS) with a US$1.6 million prize pool for the year 2013. Each match between two opponents can be recorded as an exhaustive and noisy sequence of actions. Analyzing these sequences yields an important outcome for game strategy prediction and in-depth game understanding. In this work we report a preliminary study on discovering the strategies of professional StarCraft II players based on sequential pattern mining.
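For readers unfamiliar with sequential pattern mining, the brute-force sketch below counts how many action sequences contain each ordered (possibly gapped) subsequence and keeps those meeting a minimum support. It only illustrates the concept: the abstract does not name the mining algorithm used, and the action labels and the function name `frequent_subsequences` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_subsequences(sequences, min_support, max_len=3):
    """Return {pattern: support} for ordered subsequences up to max_len.

    Support is the number of input sequences that contain the pattern,
    with gaps allowed between its items. Enumeration is exhaustive, so
    this is only practical for short sequences.
    """
    support = Counter()
    for seq in sequences:
        seen = set()  # each pattern counts once per sequence
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                seen.add(tuple(seq[i] for i in idx))
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


if __name__ == "__main__":
    games = [
        ["scout", "expand", "attack"],
        ["scout", "build", "expand"],
        ["build", "attack"],
    ]
    print(frequent_subsequences(games, min_support=2, max_len=2))
```

Real match logs are far longer and noisier than this toy input, so a practical study would need a dedicated mining algorithm rather than exhaustive enumeration.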