ISBN: 9781509034857 (print)
Credit risk analysis seeks to determine whether a customer is likely to default on a financial obligation, which is a very important problem in finance. In this paper, we present a machine learning framework that deals with this problem by formulating it as a binary classification problem. The framework consists of two parts: dictionary learning and classifier training. Firstly, we introduce a sparse K-SVD method to discover a sparse dictionary to represent the training data, which contributes to good performance and efficient computation in the subsequent classification. Secondly, SVM classifiers are trained on the new representations of the training data. The trained classifiers then serve as a credit risk analysis expert that automatically classifies the testing data. We evaluate the framework on a real-world credit risk prediction task, and the empirical results demonstrate the advantage of the method over other strategies.
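To illustrate what "representing the data over a sparse dictionary" means, here is a minimal sketch of the sparse coding step only. It uses plain matching pursuit over a fixed dictionary of unit-norm atoms, which is a simpler stand-in and not the sparse K-SVD method of the paper; the dictionary, the signal, and all function names are assumptions made for the example. The resulting coefficient vector is the kind of new representation a classifier such as an SVM could be trained on.

```python
import math

def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse coding: approximate `signal` using at most `n_nonzero`
    selection steps over unit-norm `atoms` (plain matching pursuit)."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlation of every atom with what is still unexplained.
        scores = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        if abs(scores[best]) < 1e-12:
            break  # nothing left to explain
        coeffs[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual

# Hypothetical 3-atom dictionary (each atom has unit norm) and one sample.
s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [s, s, 0.0]]
signal = [3.0, 3.0, 0.0]

coeffs, residual = matching_pursuit(signal, atoms, n_nonzero=2)
```

In this toy case the diagonal atom alone explains the sample, so the code is sparse: one nonzero coefficient instead of two.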
ISBN: 9781509013265 (print)
In this paper, we present simulation results on how fractional frequency offset (FFO) influences PLC synchronization based on PLC preamble correlation in the DOCSIS 3.1 downstream. FFO causes inter-carrier interference and breaks orthogonality in OFDM-based transmission systems. It degrades OFDM performance by reducing the signal-to-noise ratio (SNR). The performance of PLC synchronization is in turn affected by the reduced SNR caused by FFO. We analyze the influence of FFO on PLC synchronization by applying various FFO values. We report the PLC synchronization performance for these FFO values and discuss the results in light of the requirements on FFO compensation performance in OFDM signal reception.
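The mechanism by which FFO breaks orthogonality can be sketched numerically. The example below is not the paper's simulation: it ignores noise, the channel, the cyclic prefix, and the PLC preamble, and simply transmits one subcarrier with an offset given in units of the subcarrier spacing. The function names and the values 64, 5, 0.05 and 0.2 are assumptions chosen for illustration.

```python
import cmath
import math

def dft_bin_powers(n_subcarriers, active, ffo):
    """Power seen in each DFT bin when only subcarrier `active` is sent and
    the receiver sees a fractional frequency offset `ffo` (in subcarrier
    spacings)."""
    n = n_subcarriers
    powers = []
    for m in range(n):
        acc = 0j
        for t in range(n):
            # Offset tone at sample t, demodulated at bin m.
            acc += cmath.exp(2j * math.pi * (active + ffo - m) * t / n)
        powers.append(abs(acc / n) ** 2)
    return powers

def sir_db(powers, active):
    """Ratio of wanted-bin power to the power leaked into all other bins.
    Only meaningful for a nonzero offset (otherwise the leakage is ~0)."""
    leak = sum(p for m, p in enumerate(powers) if m != active)
    return 10.0 * math.log10(powers[active] / leak)

powers_ideal = dft_bin_powers(64, 5, 0.0)   # perfectly aligned
powers_small = dft_bin_powers(64, 5, 0.05)  # small offset
powers_large = dft_bin_powers(64, 5, 0.2)   # larger offset
```

With zero offset all power stays in the wanted bin. As the offset grows, the total power is unchanged but more of it leaks into neighbouring bins, which acts like extra noise on every subcarrier.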
ISBN: 9781509036837 (print)
This paper introduces algorithms for extracting log patterns from system logs in the LARGE system. LARGE is a log analysis framework deployed in the Scientific Computing Grid Environment of the Chinese Academy of Sciences. When monitoring system logs, patterns need to be obtained from log records so that corresponding rules can be set for handling logs that match certain patterns. Two algorithms that fulfill this requirement, named identical words rate and tree-matching, are described, and their performance is compared to show their capability and efficiency in extracting log patterns. This work can be extended to various distributed and cloud environments for log monitoring.
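The abstract does not define the identical words rate algorithm, so the following is only one plausible reading, offered as a sketch: two log lines with the same number of words are compared position by position, and lines whose share of identical words reaches a threshold are merged into a pattern in which the differing words become wildcards. The threshold, the wildcard symbol, the sample log lines, and all function names are assumptions.

```python
def identical_words_rate(a, b):
    """Share of aligned word positions at which two log lines agree.
    Lines of different length are treated as unrelated (rate 0.0)."""
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb) or not ta:
        return 0.0
    return sum(x == y for x, y in zip(ta, tb)) / len(ta)

def merge_pattern(a, b, wildcard="*"):
    """Keep the words both lines share; replace the rest with a wildcard."""
    return " ".join(x if x == y else wildcard
                    for x, y in zip(a.split(), b.split()))

def extract_patterns(lines, threshold=0.6):
    """Fold each line into the first pattern it resembles, else start a new one."""
    patterns = []
    for line in lines:
        for i, pat in enumerate(patterns):
            if identical_words_rate(pat, line) >= threshold:
                patterns[i] = merge_pattern(pat, line)
                break
        else:
            patterns.append(line)
    return patterns

# Hypothetical log records.
logs = [
    "Failed login for user alice from 10.0.0.1",
    "Failed login for user bob from 10.0.0.2",
    "Disk usage at 91 percent on node07",
    "Disk usage at 95 percent on node12",
]
patterns = extract_patterns(logs)
```

A monitoring rule could then be attached to each extracted pattern rather than to individual log lines.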
In this paper, we propose the Online Contextual Influence Maximization Problem (OCIMP). In OCIMP, the learner faces a series of epochs, in each of which a different influence campaign is run to promote a certain product in a given social network. In each epoch, the learner first distributes a limited number of free samples of the product among a set of seed nodes in the social network. Then the influence spread process takes place over the network: other users are influenced and purchase the product. The goal of the learner is to maximize the expected total number of influenced users over all epochs. We depart from prior work in two aspects: (i) the learner does not know how the influence spreads over the network, i.e., it is unaware of the influence probabilities; (ii) the influence probabilities depend on the context. We develop a learning algorithm for OCIMP, called Contextual Online INfluence maximization (COIN). COIN can use any approximation algorithm that solves the offline influence maximization problem as a subroutine to obtain the set of seed nodes in each epoch. When the influence probabilities are Hölder continuous functions of the context, we prove that COIN achieves sublinear regret with respect to an approximation oracle that knows the influence probabilities for all contexts. Moreover, our regret bound holds for any sequence of contexts. We also test the performance of COIN on several social networks and show that it performs better than other methods.
ISBN: 9781509006274 (print)
Though various image segmentation techniques have been developed, it is still very challenging to design a robust and efficient algorithm for segmenting images (noisy, blurred, or even with discontinuous edges) that have high intensity inhomogeneity or non-homogeneity. In this article, a robust fuzzy energy based active contour, using both global and local information, is proposed to detect objects in a given image based on curve evolution. The local energy is generated by considering both local spatial and gray level/color information. By incorporating the local energy term in the proposed active contour energy function, the model can better deal with images having high intensity inhomogeneity or non-homogeneity, noise, and blurred boundaries or discontinuous edges. The global energy term is used to avoid unsatisfactory results due to bad initialization. We show a realization of the proposed method and demonstrate its performance (both qualitatively and quantitatively) with respect to state-of-the-art techniques on several images having such artifacts. Analysis of the results shows that the proposed method detects objects in the given images better than the existing ones.
ISBN: 9781509017911 (print)
In order to remove resource barriers and smooth the learning curve for education on big data analytics in STEM disciplines, we develop a portable open source labware called STEM-BD for promoting education on big data analytics. STEM-BD integrates four critical components (a big data platform, big data sets, data analytics algorithms, and hands-on lab exercises) in a multi-dimensional and customizable way. In this paper, we provide a detailed description of the design goal of STEM-BD, its prototype, preliminary evaluation results, and future development.
The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many issues of how to extract and make use of the knowledge obtained from the data within short times, on a limited budget, and under high rates of data generation. The foremost challenge identified here is data processing, and especially mining and analysis for knowledge extraction. As the 'old' data mining frameworks were designed without Big Data requirements, a new generation of such frameworks is being developed, fully implemented on Cloud platforms. One such framework is Apache Mahout, which aims at fast processing and analysis of Big Data. The performance of these new data mining frameworks is yet to be evaluated, and their potential limitations are yet to be revealed. In this paper we analyse the performance of Apache Mahout using large real data sets from the Twitter stream. We exemplify the analysis for the case of two clustering algorithms, namely k-Means and Fuzzy k-Means, using a Hadoop cluster infrastructure for the experimental study.
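The difference between the two clustering algorithms lies in the assignment step, which can be sketched in a few lines. This is plain Python for illustration, not Mahout's implementation; the function names, the sample centers, and the fuzziness value m = 2 are assumptions. k-Means assigns a point to exactly one cluster, while Fuzzy k-Means gives it a degree of membership in every cluster using the standard fuzzy c-means membership formula.

```python
import math

def hard_assign(point, centers):
    """k-Means assignment: index of the single nearest center."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy k-Means assignment: one membership degree per center, summing
    to 1. `m` > 1 is the fuzziness exponent (larger means softer)."""
    dists = [math.dist(point, c) for c in centers]
    hits = dists.count(0.0)
    if hits:
        # The point sits exactly on one or more centers: share full membership.
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]

# Hypothetical two-cluster example.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

label = hard_assign(point, centers)
memberships = fuzzy_memberships(point, centers)
```

Here the point is three times closer to the first center, so k-Means assigns it there outright, whereas Fuzzy k-Means still leaves a small share of membership with the second cluster. Computing a membership for every point and every cluster is part of why the fuzzy variant tends to cost more per iteration.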
Sentiment analysis has been one of the most popular research topics in recent years. There is a large amount of data on the web that requires analysis before it becomes useful, and many researchers have focused on making sense of these data. The concept of sentiment analysis was proposed for this purpose. Sentiment analysis methods try to surface the opinions, feelings, and subjectivity behind a text. Machine learning algorithms and vocabulary-based methods are used to perform sentiment analysis. In this research, (i) recent studies on machine learning based sentiment analysis are reviewed to give background; (ii) they are classified according to their information extraction tasks; (iii) the encountered and potential challenges in this research topic are revisited and discussed.
Recent research has demonstrated that social media can provide valuable spatio-temporal data about users' activities. However, information extraction and computation on large amounts of data pose various challenges. To process massive datasets effectively, several platforms have been developed. Our previous study [20] explored Hadoop-based cloud computing for processing large amounts of social media data [9] to study the geographic distributions of social media users. In this paper, we investigate an emerging system named Spark and present a timely pilot experience in geospatial big data research. In our study, Spark is used to perform some classic geospatial analyses, such as K-Nearest Neighbors (KNN), geographic mean and median points, and the distribution of the median points. Our design is tested on an Amazon EC2 cluster. An exemplary study using 60GB, 120GB and 180GB of Twitter data demonstrates the performance gains achieved by migrating computing tasks from Hadoop to Spark. In our experiments, the Spark-based solution is up to 2.3x faster than the Hadoop-based solution due to its in-memory processing and coarse-grained resource allocation strategy. We also discuss optimization strategies for using Spark in different geospatial computing tasks.
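Two of the analyses named above can be sketched on a single machine to show what is being computed. This is plain Python, not Spark code and not the authors' implementation; the function names and the spherical Earth radius of 6371 km are assumptions. The geographic mean averages points as 3-D unit vectors so that it behaves sensibly near the date line, and the nearest-neighbour search ranks points by great-circle (haversine) distance.

```python
import math

def geographic_mean(points):
    """Mean of (lat, lon) pairs in degrees, averaged as 3-D unit vectors."""
    x = y = z = 0.0
    for lat, lon in points:
        la, lo = math.radians(lat), math.radians(lon)
        x += math.cos(la) * math.cos(lo)
        y += math.cos(la) * math.sin(lo)
        z += math.sin(la)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))

def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points on a sphere."""
    la1, lo1, la2, lo2 = map(math.radians, (*a, *b))
    h = (math.sin((la2 - la1) / 2) ** 2
         + math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

def k_nearest(query, points, k):
    """The k points closest to `query` by great-circle distance."""
    return sorted(points, key=lambda p: haversine_km(query, p))[:k]

# Hypothetical coordinates for illustration.
mean_point = geographic_mean([(0.0, 10.0), (0.0, 20.0)])
neighbours = k_nearest((0.0, 0.0), [(0.0, 5.0), (0.0, 1.0), (0.0, 3.0)], k=2)
```

The per-point vector sums in the mean are associative, so in a distributed setting they lend themselves to a map step followed by a reduce step.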
ISBN: 9781467386838 (print)
A system is said to be fault-tolerant if it remains functional even after a fault occurs. By describing faults as unpredicted events, we study the active fault-tolerance of discrete-event systems (DES) while ensuring safety requirements. Starting from a finite automaton model of the uncontrolled plant, our proposed control framework consists of nominal supervision, fault diagnosis, and active post-fault control reconfiguration. First, a nominal supervisor is designed with respect to the nominal mode to ensure the control specification prior to the occurrence of faults. Second, a learning-based algorithm is proposed to compute a diagnoser that can detect the occurrence of a fault. Necessary and sufficient conditions under which a post-fault safety-enforcing control reconfiguration is feasible are explored, and a second learning-based design algorithm for the post-fault supervisor is presented, using limited lookahead policies. The effectiveness of the proposed framework is examined through an example.