The H1N1 flu that emerged in 2009 had a great impact on the lives of people around the world. It was a life-threatening season for hundreds of people, mainly those under 65 years old, which eventually made the Worl...
Building production ML applications is difficult because of their resource cost and complex failure modes. I will discuss these challenges from two perspectives: the Stanford DAWN Lab and experience with large-scale c...
ISBN: 9781450366007 (print)
With the large-scale growth of data, traditional single-machine processing methods struggle to handle massive datasets, especially for iterative clustering algorithms that require frequent read and write operations. Building on the Spark framework, this paper proposes a distributed possibilistic c-means algorithm based on in-memory computing, called Spark-PCM. The proposed method improves the processing of distributed matrix operations and is implemented on the Spark platform. Experimental results show that Spark-PCM scales linearly with the number of nodes, which indicates good scalability and adaptability to large-scale data.
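The abstract does not give the update equations, so the following is only a minimal single-machine sketch of one iteration of the standard possibilistic c-means update (typicalities, then typicality-weighted centers). It is not the distributed Spark-PCM implementation; the function name `pcm_step`, the fuzzifier default `m=2.0`, and the per-cluster scale values `etas` are assumptions made for illustration.

```python
def pcm_step(points, centers, etas, m=2.0):
    """One possibilistic c-means iteration on plain Python lists.

    points  : list of feature vectors
    centers : list of current cluster centers
    etas    : one positive scale parameter per cluster (assumed given)
    m       : fuzzifier, must be greater than 1
    """
    exponent = 1.0 / (m - 1.0)

    # Typicality of each point to each cluster:
    # t = 1 / (1 + (squared distance / eta) ** (1 / (m - 1)))
    typicality = []
    for center, eta in zip(centers, etas):
        row = []
        for p in points:
            d2 = sum((a - b) ** 2 for a, b in zip(p, center))
            row.append(1.0 / (1.0 + (d2 / eta) ** exponent))
        typicality.append(row)

    # New centers: mean of the points weighted by typicality ** m.
    dim = len(points[0])
    new_centers = []
    for row in typicality:
        weights = [t ** m for t in row]
        total = sum(weights)
        new_centers.append(
            [sum(w * p[k] for w, p in zip(weights, points)) / total
             for k in range(dim)]
        )
    return typicality, new_centers
```

Unlike fuzzy c-means, the typicalities of a point are not forced to sum to one across clusters, so each cluster row can be computed independently, which is one reason the update lends itself to data-parallel execution.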
ISBN: 9781450366007 (print)
A new algorithm for automatic tomato detection in regular color images is proposed, which reduces the influence of illumination and color similarity and suppresses the effect of occlusion. The method uses a Support Vector Machine (SVM) with Histograms of Oriented Gradients (HOG) to detect the tomatoes, followed by a color analysis method for false positive removal. Non-Maximum Suppression (NMS) is then employed to merge the detection results. A total of 144 images were used for the experiment. The results showed that the recall and precision of the classifier were 96.67% and 98.64%, respectively, on the test set. Compared with other methods developed in recent years, the proposed algorithm shows substantial improvement for tomato detection.
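The abstract does not specify which NMS variant or overlap threshold is used, so the following is only a sketch of the common greedy, IoU-based form of Non-Maximum Suppression. The box format `(x1, y1, x2, y2)`, the function names, and the default threshold of 0.5 are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy NMS, best score first."""
    # Visit detections from highest to lowest classifier score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping detections of the same fruit collapse to the higher-scoring one, while a distant detection is kept unchanged.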
ISBN: 9781728128160 (print)
The combination of data mining and machine learning technology with web-based education systems is becoming an important research area for enhancing the quality of education beyond the traditional concept. With the worldwide fast growth of Information and Communication Technology (ICT), data arrive in large volume, at high velocity, and in extensive variety. In this paper, four popular data mining methods are applied on Apache Spark using large datasets from Online Cognitive Learning Systems to explore the scalability and efficiency of Spark. Datasets of various sizes are tested on Spark MLlib with different running configurations and parameter tunings. The paper presents useful strategies for computing resource allocation and tuning that take full advantage of the in-memory system of Apache Spark for data mining and machine learning tasks on educational datasets.
ISBN: 9781450366007 (print)
Personality analysis on social media is a research hotspot due to the importance of personality research in psychology as well as the rapid development of social media. Many studies have used social media statuses to analyze users' personalities, but most of them rely on inadequate labeled data and linguistic features. In this paper, to explore the use of unlabeled data in personality analysis, a personality analysis framework based on semi-supervised learning is introduced. In addition, to make full use of the language information in social media statuses, the well-known n-gram model is adopted to extract linguistic features. The experimental results demonstrate that semi-supervised learning can take advantage of unlabeled data and improve the accuracy of the prediction model.
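The abstract does not state whether word or character n-grams are used or which value of n is chosen, so the following is only a sketch of word-level n-gram counting on a single status. The function name `ngram_counts`, the lowercasing, the whitespace tokenization, and the default `n=2` are assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(text, n=2):
    """Count word-level n-grams in one status update."""
    # Naive tokenization: lowercase and split on whitespace.
    tokens = text.lower().split()
    # Zip n shifted copies of the token list to form consecutive n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)
```

The resulting counts can serve as a sparse feature vector per status; a text shorter than n tokens simply yields an empty counter.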
ISBN: 9781450366007 (print)
Heart Failure (HF) has been shown to be one of the leading causes of death, which is why accurate and timely prediction of HF risk is essential. Clinical methods such as angiography are the best and most effective way of diagnosing HF; however, studies show that angiography is not only costly but also has side effects. Lately, machine learning techniques have been used for this purpose. This survey paper presents a systematic literature review of 35 journal articles published since 2012 in which state-of-the-art machine learning classification techniques have been applied to heart disease datasets. The study critically analyzes the selected papers, identifies gaps in the existing literature, and is intended to assist researchers who plan to apply machine learning in medical domains, particularly to heart disease datasets. The survey finds that the most popular classification techniques are Support Vector Machines, Neural Networks, and ensemble classifiers.
ISBN: 9781728128160 (print)
This paper discusses our work on discovering a set of emotional logic rules derived from physiological data of individuals, from a wearable technology perspective. We concentrated the analysis on physiological data such as plethysmography, respiration, galvanic skin response, and temperature that can be detected by wearable sensors. We sourced our data from the DEAP dataset, a popular labelled affective computing dataset. Our approach combined preprocessing and data mining techniques to discover logic rules relating to the valence and arousal emotional dimensions. Our findings indicate that while there are similar changes in heart rate or galvanic skin response across individuals during emotional stimuli, every individual has a unique and quantifiable physiological reaction.
Objective: When unprecedented social, economic, and health patterns change, or when a new normal arises, people prefer to seek safety and wealth preservation. Crypto is the buzzword nowadays where people worldwide are investi...
ISBN: 9781450366007 (print)
Machine learning has achieved outstanding performance in many fields, but its success relies heavily on large numbers of annotated training samples. However, in many professional fields, data annotation is not only tedious and time consuming but also demands specialized knowledge and skills that are not easily accessible. To significantly reduce the cost of annotation, we propose a novel active learning framework called ALBS. ALBS uses a syncretic strategy that combines "most discriminative" and "most representative" criteria to select "worthy" samples from the unlabeled dataset, and it updates the model incrementally to improve performance continuously. We have evaluated our method on two different audio datasets, demonstrating that the syncretic strategy makes the improvement in model performance more robust and faster than other strategies, and that subsampling the historical labeled dataset can reduce unnecessary computing costs and storage space.
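The abstract does not define how ALBS measures or combines the two criteria, so the following is only a hypothetical illustration of the general idea of mixing an uncertainty score with a representativeness score when choosing samples to annotate. It is not the ALBS method: the use of prediction entropy, the similarity measure `1 / (1 + distance)`, the product combination, and all function names are assumptions.

```python
import math


def entropy(probs):
    """Prediction entropy; higher means the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_similarity(i, features):
    """Average similarity of sample i to every other unlabeled sample."""
    others = [f for j, f in enumerate(features) if j != i]
    if not others:
        return 0.0
    # Similarity is taken as 1 / (1 + Euclidean distance) (an assumption).
    return sum(1.0 / (1.0 + math.dist(features[i], f)) for f in others) / len(others)


def select_queries(probs, features, k=1):
    """Pick the k unlabeled samples with the highest combined score."""
    scores = [
        entropy(p) * mean_similarity(i, features)
        for i, p in enumerate(probs)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

With a product of the two scores, a sample is favored only when the model is uncertain about it and it also lies close to other unlabeled samples, so isolated outliers are ranked lower than equally uncertain samples in dense regions.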