ISBN: 9783037858882 (print)
The number of blogs on the Internet has risen sharply in recent years, so mining valuable information from blogs has practical significance for improving user experience, network services, etc. This paper proposes an algorithm for mining blog authors' interests based on classification techniques, and introduces a non-empty-intersection evaluation standard. By expanding the predicted interest set, the algorithm can also improve the hit ratio of recommendation services based on blog authors' interests, and can therefore reach a higher degree of satisfaction. Experiments on data sets from Sina Blog and NetEase Blog illustrate the higher accuracy of the algorithm.
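The abstract does not define the non-empty-intersection standard, so the following is only a minimal sketch of one plausible reading: a prediction counts as a hit when the predicted interest set shares at least one label with the author's actual interest set. The function names `is_hit` and `hit_ratio` and the sample labels are hypothetical and do not come from the paper.

```python
def is_hit(predicted, actual):
    """Return True when the two interest sets share at least one label.

    Assumption: this is our reading of "non-empty intersection";
    the paper's exact definition is not given in the abstract.
    """
    return len(set(predicted) & set(actual)) > 0


def hit_ratio(predictions, ground_truth):
    """Fraction of authors whose predicted set intersects their actual set."""
    if not predictions:
        return 0.0
    hits = sum(is_hit(p, a) for p, a in zip(predictions, ground_truth))
    return hits / len(predictions)


# Hypothetical data: actual interests of two blog authors.
actual = [["sports", "travel"], ["food"]]

# Predicting a single interest per author versus an expanded set of two.
top1 = [["sports"], ["music"]]
top2 = [["sports", "news"], ["music", "food"]]

ratio_top1 = hit_ratio(top1, actual)
ratio_top2 = hit_ratio(top2, actual)
```

Under this reading, enlarging the predicted set can only keep or raise the hit ratio, which is consistent with the abstract's claim about expanding the prediction.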
ISBN: 9781665404785 (print)
This paper proposes a serial architecture for implementing the Two Means Decision Tree (DT) algorithm, which exhibits lower complexity. The architecture is implemented on a Field Programmable Gate Array (FPGA) running at 62 MHz. Simulation results show that the proposed hardware architecture achieves a 10x speed-up over its software implementation and runs 28x faster than the C4.5 algorithm. It also consumes less power than complex algorithms implemented on Graphics Processing Units (GPUs). The architecture is therefore suitable for simple, low-power, high-speed applications.
ISBN: 9781467399043 (print)
Hyponymy is one of the most critical semantic relations and contributes greatly to semantic dictionaries, information retrieval, etc. This paper proposes a method for extracting hyponymy based on the fusion of multiple data sources, which converts the extraction of hyponymy into the extraction of hypernyms for target words. First, candidate hypernyms for the target words are mined from a search engine, encyclopedia resources and core suffix words. Second, the candidates from these data sources are fused. Finally, a classification algorithm, which is a mature machine learning technique, is used to filter out noise and extract the hypernyms. A hyponymy relation holds between each target word and its correctly extracted hypernyms. The experimental results show that the highest accuracy of hyponymy extraction using the proposed method reaches 0.832.
ISBN: 9781728114101 (print)
The Internet provides great convenience in daily life and has become an important channel for obtaining information. However, a large amount of false information, known as rumors, comes with it. For automatic rumor detection, this paper makes two main contributions: (1) To reduce the impact of unbalanced data on classification, we propose an improved SMOTE algorithm to resample the data. (2) We propose six new features based on Sina microblogs, namely Words with Guidance (WG), Words with Menace (WM), Suspected Topic (ST), Recognition of Information (RI), Degree of Attention to Users (DAU) and Credit Rating (CR), which relate to user-based, content-based, propagation-based and microblog-based features. We built feature subsets containing the new features and tested rumor detection on a real data set using machine learning algorithms including XGBoost. Experiments show that our rumor detection method improves significantly on the most advanced method of the same type, with precision, recall and F1 of 0.827, 0.837 and 0.825 respectively, and an AUC of 0.895.
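For readers unfamiliar with the resampling step, the following is a minimal sketch of the basic SMOTE idea: synthesize a new minority-class sample by interpolating between a minority sample and one of its nearest minority neighbours. It is not the improved variant the paper proposes, whose details the abstract does not give. The function name `smote`, its parameters and the sample points are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Plain SMOTE sketch (not the paper's improved algorithm). Assumes
    `minority` holds at least two points given as equal-length tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-dimensional minority samples (e.g. rumor posts).
rumors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(rumors, n_new=6)
```

Each synthetic point lies on the line segment between two existing minority points, so the minority class is enlarged without simply duplicating samples.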
ISBN: 9781467393799 (print)
Researchers in higher education are beginning to explore the potential of data mining for analyzing data in order to provide quality service and meet the needs of their graduates. Educational data mining has thus emerged as a tool for studying academic data to identify patterns and support decision making that affects education. This paper predicts the employability of IT graduates using nine variables. First, different data mining classification algorithms were tested, and logistic regression, with an accuracy of 78.4, was chosen for implementation. Based on the logistic regression analysis, three academic variables, IT_Core, IT_Professional and Gender, were identified as significant predictors that directly affect employability. The data were collected from the five-year profiles of 515 students randomly selected from the placement office tracer study.
This paper explores the pros and cons of different algorithm models on the same selection problem, and then applies combined prediction theory to build a new combined prediction model and examine its prediction accuracy. The practical problem to be solved is helping financial institutions scientifically classify customers who choose financial products. We select the bank data set in the UCI database, which is derived from a customer survey conducted by a financial institution in Portugal for a wealth management product. The decision tree C5.0 algorithm, the naive Bayes classification algorithm and the binary logit model are each used as a single model in an empirical study of financial product customer classification. Empirical analysis of five combination models shows that when the least squares weighting method is used to determine the weights, negative weights appear, which does not conform to the actual situation. The model based on the least squares weighting method and the model based on the simple weighting method are therefore excluded. Among the rest, the arithmetic mean weighted model performs better than the reciprocal variance weighted model and the reciprocal mean square model. Its accuracy reaches 89.91%, which is 0.43% higher than the accuracy of a single model. We conclude that the model based on arithmetic average weighting is the better combination forecasting model.
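The abstract names the weighting schemes but does not give their formulas. The sketch below uses the definitions commonly found in the combination forecasting literature, which is an assumption about what the paper does: equal weights for the arithmetic mean, weights proportional to the reciprocal of each model's sum of squared errors for the reciprocal variance method, and weights proportional to the reciprocal of its square root for the reciprocal mean square method. All function names and numbers are hypothetical.

```python
import math


def arithmetic_mean_weights(n_models):
    """Equal weight for every single model."""
    return [1.0 / n_models] * n_models


def reciprocal_variance_weights(sse):
    """Weights proportional to 1 / (sum of squared errors) of each model."""
    inverse = [1.0 / e for e in sse]
    total = sum(inverse)
    return [v / total for v in inverse]


def reciprocal_mean_square_weights(sse):
    """Weights proportional to 1 / sqrt(sum of squared errors)."""
    inverse = [1.0 / math.sqrt(e) for e in sse]
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(model_outputs, weights):
    """Weighted sum of the single-model outputs for one customer."""
    return sum(w * p for w, p in zip(weights, model_outputs))


# Hypothetical squared-error totals for two single models.
sse = [1.0, 4.0]
w_var = reciprocal_variance_weights(sse)
w_rms = reciprocal_mean_square_weights(sse)

# Hypothetical predicted probabilities from three single models.
combined = combine([0.9, 0.6, 0.3], arithmetic_mean_weights(3))
```

All three schemes produce non-negative weights that sum to one, which avoids the negative weights the abstract reports for the least squares method.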
ISBN: 9798350354621 (digital); 9798350354638, 9798350354621 (print)
Physiological state abnormality caused by genetic diseases, excessive exercise, etc. is becoming a fatal threat to people's lives and safety because of its hidden nature. The K-Nearest Neighbor (KNN) algorithm is widely used in various fields because it is simple to implement, but its classification efficiency decreases significantly when the sample size is too large or there are too many feature attributes. This paper proposes an improved KNN (IKNN) algorithm based on clustering: hierarchically clustering the data during pre-processing reduces the search space of the algorithm and effectively improves search efficiency. Applied to physiological state abnormality discrimination, the improved KNN algorithm improved both the efficiency and the accuracy of discrimination. The results show that this can provide an effective guarantee for the early discovery of abnormal physiological parameters, the timely adoption of countermeasures, and the protection of people's lives.
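The idea of shrinking the KNN search space through clustering can be sketched as follows. This is an illustration under stated assumptions, not the paper's IKNN: the clusters are taken as already computed (the hierarchical clustering step is omitted), and the search is limited to the single cluster whose centroid is closest to the query. The names `centroid` and `cluster_pruned_knn` and the sample data are hypothetical.

```python
import math
from collections import Counter


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def cluster_pruned_knn(query, clusters, k=3):
    """Classify `query` by majority vote among the k nearest samples in the
    cluster whose centroid is closest to it.

    `clusters` is a list of clusters, each a list of (point, label) pairs.
    Assumption: the clusters come from an earlier pre-processing step.
    """
    centroids = [centroid([pt for pt, _ in c]) for c in clusters]
    nearest = min(
        range(len(clusters)),
        key=lambda i: math.dist(query, centroids[i]),
    )
    # Only the samples in the selected cluster are searched.
    candidates = sorted(
        clusters[nearest], key=lambda item: math.dist(query, item[0])
    )[:k]
    votes = Counter(label for _, label in candidates)
    return votes.most_common(1)[0][0]


# Hypothetical physiological readings grouped into two clusters.
clusters = [
    [((0.0, 0.0), "normal"), ((1.0, 0.0), "normal"), ((0.0, 1.0), "abnormal")],
    [((10.0, 10.0), "abnormal"), ((9.0, 10.0), "abnormal"), ((10.0, 9.0), "normal")],
]
label_a = cluster_pruned_knn((0.2, 0.1), clusters)
label_b = cluster_pruned_knn((9.5, 10.0), clusters)
```

Restricting the search to one cluster trades exactness for speed: a true nearest neighbour that sits just across a cluster boundary is missed, so a practical version might search the few closest clusters instead.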
ISBN: 9783319943619, 9783319943602 (print)
With the development of the times, traditional personal credit evaluation is facing a severe test. This paper presents an exploratory study of the practical application and development of personal credit evaluation using MicroBlog data. Credit-related indicators are drawn from previous literature on personal credit evaluation and summarized into three major attribute groups: "Attributes of Demographic", "Tweets Content", and "User Relationship Structure". We model the problem with support vector machine (SVM), naive Bayes (NB), logistic regression (LR) and AdaBoost classification algorithms to analyze social network data for personal credit. Compared with the other algorithms, AdaBoost achieves the best AUC value, 0.564, under the equalization setting.
ISBN: 9781538653142 (print)
A review of the literature shows that bank branching can be influenced by economic, geographic, legal, cultural and demographic characteristics, apart from institutional decision making. Identifying the branching pattern can help answer questions such as how a particular banking group differs from others. It can also reveal insights into strategic decision making about the diversification and growth of banking networks. The present study applies machine learning algorithms to recognize the branching patterns of banking networks in India and compares the differences across groups and types. An evaluation of the algorithms' performance shows that employing a narrative approach is very useful for getting important insights from the data, compared to traditional approaches. The study also provides a visual map evaluation to enrich the outcomes of the pattern recognition exercise.
ISBN: 9783030298913, 9783030298906 (print)
Alzheimer's disease (AD) is the most common neurodegenerative dementia of old age and the leading chronic disease contributor to disability and dependence among older people worldwide. Handwriting, which results from a complex network of cognitive, kinaesthetic and perceptive-motor skills, is among the motor activities compromised by AD. Indeed, researchers have shown that patients affected by this disease exhibit alterations in spatial organization and poor control of movement. In this paper, we present the preliminary results of a study in which an experimental protocol (including word, letter and sentence copying tasks) was used to assess the kinematic properties of the movements involved in handwriting. The results are very encouraging and seem to confirm the hypothesis that machine learning-based analysis of handwriting can be profitably used to support AD diagnosis.