Search results - Inner Mongolia University Library

Flow graphs as data structures for inducing classifiers

International Journal of Hybrid Intelligent Systems, 2019, vol. 15, no. 2, pp. 77-90

Authors: Nicoletti, Maria Do Carmo; Rodrigues, Emilio Carlos; Centro Universitário C. Limpo Paulista, C.L. Paulista, SP, Brazil; Universidade Federal de S. Carlos, S. Carlos, SP, Brazil; Instituto Federal de São Paulo, Bragança Paulista, SP, Brazil

This paper describes an empirical research work based on the use of a suitable data structure, named Flow Graph (FG), that can be induced from a supervised training data set. An FG can be approached as a weighted and labeled digraph that summarizes a given supervised training set, aiming at its analysis. FGs can also be used as a repository of the information embedded in training sets that supports the extraction of classification rules, aiming at the definition of classifiers. The work described in this paper reviews FGs and related concepts as originally proposed, i.e., as a suitable structure for modeling discrete data, and proposes their customization for dealing with continuous data. The customization consists of a pre-processing step where a discretization process is carried out in a two-step hybrid approach named HFG (Hybrid Flow Graph). Several experiments focusing on the classifiers extracted from HFGs were conducted, and their results were analyzed with respect to both the values of some metrics associated with the induced digraph-based structure and the performance of the classifier extracted from the structure. For the experiments, 19 diversified datasets were used, and the classification results were compared with those obtained by classifiers induced using four other algorithms, namely J48, Naïve Bayes, k-Nearest-Neighbor and Support Vector Machine. © 2019 - IOS Press and the authors. All rights reserved.

Keywords: data discretization; data structures; extended flow graphs; flow graphs; hybrid systems; supervised machine learning algorithms

Breast Cancer Prediction and Detection Using Data Mining Classification algorithms: A Comparative Study

TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2019, vol. 26, no. 1, pp. 149-155

Author: Kaya Keles, Mumine; Adana Sci & Technol Univ, Dept Comp Engn, Balcali Mahallesi, Catalan Caddesi 201-1, TR-01250 Saricam, Adana, Turkey

Today, cancer has become a common disease that can afflict the life of one of every three people. Breast cancer is also one of the cancer types for which early diagnosis and detection is especially important. The earlier breast cancer is detected, the higher the chances of the patient being treated. Therefore, many early detection or prediction methods are being investigated and used in the fight against breast cancer. In this paper, the aim was to predict and detect breast cancer early with non-invasive and painless methods that use data mining algorithms. All the data mining classification algorithms in Weka were run and compared against a data set obtained from the measurements of an antenna consisting of frequency bandwidth, dielectric constant of the antenna's substrate, electric field and tumor information for breast cancer detection and prediction. Results indicate that Bagging, IBk, Random Committee, Random Forest, and SimpleCART algorithms were the most successful algorithms, with over 90% accuracy in detection. This comparative study of several classification algorithms for breast cancer diagnosis using a data set from the measurements of an antenna with a 10-fold cross-validation method provided a perspective into the data mining methods' ability of relative prediction. From data obtained in this study it can be said that if a patient has a breast cancer tumor, detection of the tumor is possible.

Keywords: breast cancer; classification; data mining; detection and prediction of tumor; supervised machine learning algorithms

Identification of Fake Reviews Using Semantic and Behavioral Features

4th International Conference on Information Management (ICIM)

Authors: Wang, Xinyue; Zhang, Xianguo; Jiang, Chengzhi; Liu, Haihang; Inner Mongolia Univ, Coll Comp Sci, Hohhot, Peoples R China

ISBN: (print) 9781538661475

In recent years, online reviews have been playing an important role in making purchase decisions. This is because these reviews can provide customers with large amounts of useful information about goods or services. However, to artificially promote or lower the perceived quality of products or services, spammers may forge and produce fake reviews. Due to such behavior, customers may be misled and make wrong decisions. Thus, detecting fake (spam) reviews is a significant problem. In this paper, we propose two types of features and apply supervised machine learning algorithms for performing classification on Yelp's real-life data. In terms of features used, there are two new semantic feature sets: readability features and topic features. Our results show that our proposed new features are more effective than n-gram features in detecting spam reviews. To improve classification on the real Yelp review data, we use a set of behavioral features about reviewers and their reviews for learning, which dramatically improves the classification result on real-life opinion spam data. For further improvement, we also ensure that the number of reviewers, rather than the number of reviews, is balanced.

Keywords: fake reviews; semantic and behavioral features; supervised machine learning algorithms

Quality-Efficiency Trade-offs in Machine Learning for Text Processing

IEEE International Conference on Big Data (IEEE Big Data)

Authors: Baeza-Yates, Ricardo; Liaghat, Zeinab; Univ Pompeu Fabra, DTIC, Web Sci & Social Comp Grp, Barcelona, Catalonia, Spain

ISBN: (print) 9781538627150

As the amount of available digital documents keeps growing rapidly, extracting useful information from them has become a major challenge. Data mining, natural language processing, and machine learning are powerful techniques that can be used together to deal with this problem. Depending on the task at hand, there are many different approaches that can be used. The methods available are continuously improved, but not all of them have been tested and compared in a set of coherent problems using supervised machine learning algorithms. For example, what happens to the quality of the methods if we increase the training data size from, say, 100 MB to over 1 GB? Moreover, are quality gains worth it when the rate of data processing diminishes? Can we trade quality for time efficiency and recover the quality loss by just being able to process more data? We attempt to answer these questions in a general way for text processing tasks, considering the trade-offs involving training data size, learning time, and quality obtained. For this, we propose a performance trade-off framework and apply it to three important tasks: Named Entity Recognition, Sentiment Analysis and Document Classification. These problems were also chosen because they have different levels of object granularity: words, paragraphs, and documents. For each problem, we selected several supervised machine learning algorithms and we evaluated the trade-offs of them on large publicly available data sets (news, reviews, patents). To explore these trade-offs, we use different data subsets of increasing size ranging from 50 MB to several GB. For the last two tasks, we also consider similar algorithms with two different data sets and two evaluation techniques, to study their impact on the resulting trade-offs. We find that the results do not change significantly and that most of the time the best algorithms are the ones with fastest processing time. However, we also show that the results for small data (say less than 1

Keywords: supervised machine learning algorithms; text processing; algorithmic trade-offs; learning trade-offs

Quality-Efficiency Trade-offs in Machine Learning for Text Processing

IEEE International Conference on Big Data

Authors: Ricardo Baeza-Yates; Zeinab Liaghat; Web Science and Social Computing Group, DTIC, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain

Keywords: supervised machine learning algorithms; text processing; algorithmic trade-offs; learning trade-offs

An associative memory approach to medical decision support systems

COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2012, vol. 106, no. 3, pp. 287-307

Authors: Aldape-Perez, Mario; Yanez-Marquez, Cornelio; Camacho-Nieto, Oscar; Argueelles-Cruz, Amadeo J.; Ctr Res Comp, GA Madero, Mexico City 07738, DF, Mexico; Super Sch Comp, GA Madero, Mexico City 07738, DF, Mexico

Classification is one of the key issues in medical diagnosis. In this paper, a novel approach to perform pattern classification tasks is presented. This model is called Associative Memory based Classifier (AMBC). Throughout the experimental phase, the proposed algorithm is applied to help diagnose diseases; in particular, it is applied to the diagnosis of seven different problems in the medical field. The performance of the proposed model is validated by comparing the classification accuracy of AMBC against the performance achieved by twenty other well-known algorithms. Experimental results have shown that AMBC achieved the best performance in three of the seven pattern classification problems in the medical field. It should also be noted that our proposal achieved the best classification accuracy averaged over all datasets. (C) 2011 Elsevier Ireland Ltd. All rights reserved.

Keywords: Associative memories; Decision support systems; supervised machine learning algorithms; Pattern classification
