This article investigates the application of the eXtreme Gradient Boosting (XGB) method to the credit evaluation problem based on big data. We first study the theoretical modeling of the credit classification problem using the XGB algorithm, and then apply the XGB model to a personal loan scenario based on the open data set from the Lending Club platform in the USA. The empirical study shows that the XGB model has clear advantages in both feature selection and classification performance compared with logistic regression and three other tree-based models.
This study was motivated by the fact that many cases of anemia are still discovered too late, especially in communities with limited medical resources and access to laboratory tests. Invasive diagnostic technologies and high costs are additional impediments to early diagnosis. An effective, accurate, and non-invasive method is required to detect anemia. In this study, the conjunctival image of the eye is analyzed as a non-invasive method of detecting anemia. Various modeling approaches were tested in an effort to classify anemic and healthy patients as accurately as possible. MobileNetV2 combined with a Support Vector Machine (SVM) classifier was determined to be the most effective approach. This combination achieved an accuracy of 93%, a sensitivity of 91%, and a specificity of 94%. These findings show that the model can successfully identify healthy patients while accurately identifying anemic patients. This method offers a non-invasive means of detecting anemia early on, making it promising for use in clinical settings. The SVM+MobileNetV2 technique relies on images of the eye's conjunctiva and can potentially improve healthcare by identifying possible anemia at an earlier stage. Because it balances accuracy, sensitivity, and specificity, this technique stands out as a solid option for the efficient and precise diagnosis of anemia.
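As a reading aid for the three reported metrics, the sketch below shows how accuracy, sensitivity, and specificity are derived from confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total       # share of all patients classified correctly
    sensitivity = tp / (tp + fn)       # share of anemic patients that were detected
    specificity = tn / (tn + fp)       # share of healthy patients that were cleared
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only, NOT the study's data.
acc, sens, spec = classification_metrics(tp=45, fp=5, tn=45, fn=5)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity and specificity are reported separately because a classifier can reach high accuracy on an imbalanced data set while still missing most positive cases.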
Spoofing attacks are one of the most critical threats against secure global navigation satellite system (GNSS) positioning. Since correct positioning is a must in many vehicle-to-everything (V2X) communication systems, the detection of these attacks is vital. In the literature on spoofing detection strategies, the majority of the solutions are based on the assumption that all available signals are spoofed. In this paper, we focus on detecting spoofing attacks in which authentic and spoofing positioning signals coexist in a V2X system. The observable pseudorange values are used, together with derived hyperbola equations, in the design of the spoofing detection algorithms. We propose the sub-optimal search-based spoofing detection algorithm (Algorithm 1), which considers all possible numbers of spoofing signals, but not all spoofing scenarios with the same number of spoofing signals. To address the complexity that arises from the increased number of search scenarios in this approach, we propose a second algorithm, the subset selection-based spoofing detection algorithm (Algorithm 2), which selects the search subsets intelligently. Both algorithms are first compared with fixed detection thresholds, which are determined with the Pareto front approach. Then, the performance of the algorithms is investigated when vehicle mobility and spoofing imperfection are considered. Finally, a supervised decision tree machine learning (ML) algorithm is run without specifying any detection threshold. The results indicate that Algorithm 1 provides higher detection rates than the subset selection-based algorithm; however, the false alarm ratios of Algorithm 2 are much lower than its original performance.
Solid-State Drives (SSDs) have significant performance advantages over traditional Hard Disk Drives (HDDs), such as lower latency and higher throughput. A significantly higher price per capacity and limited lifetime, however, prevent designers from completely replacing HDDs with SSDs in enterprise storage systems. SSD-based caching has recently been suggested for storage systems to benefit from the higher performance of SSDs while minimizing the overall cost. While conventional caching algorithms such as Least Recently Used (LRU) provide a high hit ratio in processors, they hardly provide the required performance level for storage systems because of the highly random behavior of Input/Output (I/O) workloads. In addition to poor performance, inefficient algorithms also shorten SSD lifetime through unnecessary cache replacements. Such shortcomings motivate us to use more complex non-linear algorithms to achieve higher cache performance and extend SSD lifetime. In this article, we propose RC-RNN, the first reconfigurable SSD-based cache architecture for storage systems that uses machine learning to identify performance-critical data pages for I/O caching. The proposed architecture uses Recurrent Neural Networks (RNN) to characterize ongoing workloads and optimize itself towards higher cache performance while improving SSD lifetime. RC-RNN attempts to learn the characteristics of the running workload to predict its behavior and then uses the collected information to identify performance-critical data pages to fetch into the cache. We implement the proposed architecture on a physical server equipped with a Core-i7 CPU, a 256GB SSD, and a 2TB HDD running Linux kernel 4.4.0. Experimental results show that RC-RNN characterizes workloads with an accuracy of up to 94.6 percent for SNIA I/O workloads. RC-RNN can perform similarly to the optimal cache algorithm, with an accuracy of 95 percent on average, and outperforms previous SSD caching architectures by providing up to 7x higher hit ratio.
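To illustrate the baseline the abstract criticizes, here is a minimal sketch of an LRU cache simulator that measures hit ratio on an access trace. It is not RC-RNN; the function name `lru_hit_ratio` and both traces are hypothetical, and `capacity` is assumed to be at least 1.

```python
from collections import OrderedDict


def lru_hit_ratio(trace, capacity):
    """Simulate an LRU cache over a page-access trace and return the hit ratio."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = True
    return hits / len(trace) if trace else 0.0


# Hypothetical traces: a cyclic scan slightly larger than the cache,
# and a small working set that fits in the cache.
looping_trace = [1, 2, 3, 4] * 3
reuse_trace = [1, 2, 1, 2, 1, 2]
print(lru_hit_ratio(looping_trace, capacity=3))
print(lru_hit_ratio(reuse_trace, capacity=2))
```

The cyclic trace shows a known weakness of LRU: when the access pattern loops over slightly more pages than the cache holds, every access evicts the page that will be needed next, and each eviction also costs an SSD write.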
A growing number of studies have found that many complex human diseases are accompanied by aberrant expression of microRNAs (miRNAs). Small molecule (SM) drugs have been used to treat complex human diseases by affecting the expression of miRNAs. Several computational methods have been proposed to infer underlying associations between SMs and miRNAs. In our study, we proposed a new computational model, random forest based small molecule-miRNA association prediction (RFSMMA), built on the known SM-miRNA associations in the SM2miR database. RFSMMA used the similarity of SMs and miRNAs as features to represent SM-miRNA pairs and applied the random forest machine learning algorithm to the training samples to obtain a prediction model. In RFSMMA, integrating multiple kinds of similarity avoids the bias of a single similarity measure, and selecting the more reliable features from the original features represents SM-miRNA pairs more accurately. We carried out cross validations to assess the predictive accuracy of RFSMMA. On data set 1, RFSMMA achieved AUCs of 0.9854, 0.9839, 0.7052, and 0.9917 +/- 0.0008 under global leave-one-out cross validation (LOOCV), miRNA-fixed local LOOCV, SM-fixed local LOOCV, and 5-fold cross validation, respectively. On data set 2, RFSMMA obtained AUCs of 0.8456, 0.8463, 0.6653, and 0.8389 +/- 0.0033 under the four cross validations in the same order. In addition, we carried out a case study on three common SMs, namely, 5-fluorouracil, 17 beta-estradiol, and 5-aza-2'-deoxycytidine. Among the top 50 associated miRNAs predicted by RFSMMA for these three SMs, 31, 32, and 28 miRNAs were verified, respectively. Therefore, RFSMMA is shown to be an effective and reliable tool for identifying underlying SM-miRNA associations.
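The AUC values quoted above can be read as the probability that a randomly chosen known association is scored higher than a randomly chosen unlabeled pair. The sketch below computes AUC this way from prediction scores; the function name `auc` and the sample scores are hypothetical and unrelated to the RFSMMA data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four candidate pairs (1 = known association).
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise form is quadratic in the number of samples; it is used here only because it makes the definition explicit.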
Wireless sensor networks (WSNs) have evolved to become an integral part of the contemporary Internet of Things (IoT) paradigm. The sensor node activities of sensing phenomena in their immediate environments and reporting their findings to a centralized base station (BS) have remained a core platform to sustain heterogeneous service-centric applications. However, the adversarial threat to the sensors of the IoT paradigm remains significant. Denial of service (DoS) attacks, which direct a large volume of network packets at one or more sensor nodes of the network, may cripple routine operations and cause catastrophic losses to emergency services. This paper presents an intelligent DoS detection framework comprising modules for data generation, feature ranking and generation, and training and testing. The proposed framework is experimentally tested under actual IoT attack scenarios, and the accuracy of the results is greater than that of traditional classification techniques. (C) 2019 Elsevier B.V. All rights reserved.
The rapid expansion of Internet of Things (IoT) adoption has brought about significant cybersecurity challenges, with botnet attacks being a critical concern. To address this issue, machine learning algorithms, particularly boosting-based approaches, have shown promise in detecting and mitigating botnet intrusions. However, the selection of an appropriate algorithm plays a crucial role in achieving accurate detection and reducing the probability of infection. This article focuses on the use of boosting-based algorithms for botnet detection in IoT environments. It evaluates the performance of five boosting-based machine learning algorithms in binary botnet detection. The empirical findings underscore the significant potential of boosting-based algorithms for detecting botnet attacks within IoT environments. The histogram gradient boosting algorithm achieved the best performance for binary detection, with an accuracy of 0.999977. In addition, a timing evaluation is presented to assess the computational requirements of each algorithm, given the resource-constrained nature of IoT.
The prediction of intrinsically disordered proteins is a hot research area in *** to the high cost of experimental methods to evaluate disordered regions of protein sequences, it is becoming increasingly important to predict those regions through computational *** this paper, we developed a novel scheme by employing sequence complexity to calculate six features for each residue of a protein sequence, which include the Shannon entropy, the topological entropy, the sample entropy, and three amino acid preferences: Remark 465, Deleage/Roux, and Bfactor(2STD). In particular, we introduced the sample entropy for calculating time series complexity by mapping the amino acid sequence to a time series of *** our knowledge, the sample entropy has not been previously used for predicting IDPs and hence is being used for the first time in our *** addition, the scheme used a properly sized sliding window in every protein sequence, which greatly improved the prediction ***, we used seven machine learning algorithms and tested with 10-fold cross-validation to get the results on the dataset R80 collected by Yang et *** of the dataset DIS1556 from the Database of Protein Disorder (DisProt) (https://***) containing experimentally determined intrinsically disordered proteins (IDPs). The results showed that k-Nearest Neighbor was the most appropriate and achieved an overall prediction accuracy of 92%. Furthermore, our method used only six features and hence required lower computational complexity.
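One of the six features, the Shannon entropy over a sliding window, can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: the function names, the window size of 5, the truncation of windows at the sequence ends, and the example sequence are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the symbol distribution in a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_entropy(sequence, window=5):
    """Per-residue entropy of a window centered on each position.

    Windows are truncated at the sequence ends (an assumption made here).
    """
    half = window // 2
    return [
        shannon_entropy(sequence[max(0, i - half): i + half + 1])
        for i in range(len(sequence))
    ]


# Hypothetical amino acid sequence, for illustration only.
features = windowed_entropy("MKVLAAGGSSSSSS", window=5)
print([round(h, 3) for h in features])
```

Low-complexity stretches, such as the run of identical residues at the end of the example, produce entropy values near zero, which is what makes the feature informative about sequence complexity.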
The "cocktail party problem" refers to the ability of human listeners to separate the acoustic signal reaching their ears into its individual components, corresponding to individual sound sources in the environment. Although this task appears trivial for humans, solving the cocktail party problem computationally remains an ambitious challenge. The approach used in this paper takes inspiration from human strategies for separating an acoustic environment into distinct perceptual auditory streams. A series of time-frequency-based features, analogous to those thought to emerge at various stages in the human auditory processing pathway, are derived from binaural auditory inputs. These feature vectors are used as inputs to an unsupervised cluster analysis that groups feature values assumed to correspond to the same object. Reconstructed auditory streams are then correlated with the original components used to create the auditory scene. Our model is capable of reconstructing streams that correlate with the original components (r = 0.3-0.7) used to create the complex auditory scene. The success of the reconstructions is largely dependent on the signal-to-noise ratio of the components of the auditory scene.
The Branch and Bound (BB) algorithm, while ensuring optimality, often encounters performance bottlenecks, characterized by slow execution and high computational overhead, especially when dealing with intricate or extensive (NP-hard) problem instances. This study introduces an approach that divides the problem into partial (local) problems in a manner that does not compromise optimality and solves the sub-problems of each local problem individually, to shrink the solution space. In the initial phase, this research establishes and validates the mathematical foundation of the proposed algorithm, which involves a pruning approach. Subsequently, enhancements are incorporated into the existing BB to partition the solution space into more manageable sub-spaces and consolidate solutions from these sub-spaces. In the final phase, the Enhanced Branch and Bound (EBB) algorithm is applied to a real-world power dispatching optimization case study. The outcomes of this investigation reveal the following: 1) For smaller problem instances, both the conventional BB and the proposed EBB algorithm yield identical optimal solutions. 2) In contrast, the EBB algorithm demonstrates significantly improved performance in solving NP-hard problems that pose challenges for the BB and BB with pruning. The primary contribution of this research is the introduction of EBB, an enhanced version of BB, specifically designed to tackle NP-hard problems. This approach can be integrated with all pruning, branching, and bounding strategies used in BB, thereby boosting its performance and making it applicable to all problems solved by BB variations.
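For readers unfamiliar with the baseline, the sketch below shows a conventional Branch and Bound with bound-based pruning on the 0/1 knapsack problem. It is not the EBB algorithm from the abstract; the knapsack setting, the fractional-relaxation bound, the function name `knapsack_bb`, and the sample data are assumptions chosen for illustration. Weights are assumed to be positive.

```python
def knapsack_bb(values, weights, capacity):
    """Solve 0/1 knapsack exactly with depth-first branch and bound."""
    # Sort by value density so the fractional bound below is valid and tight.
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)
    best = 0

    def bound(i, value, room):
        """Upper bound: fill remaining room greedily, allowing one fractional item."""
        for v, w in items[i:]:
            if w <= room:
                room -= w
                value += v
            else:
                return value + v * room / w
        return value

    def branch(i, value, room):
        nonlocal best
        if value > best:
            best = value                      # new incumbent solution
        if i == n or bound(i, value, room) <= best:
            return                            # prune: subtree cannot beat incumbent
        v, w = items[i]
        if w <= room:
            branch(i + 1, value + v, room - w)  # branch 1: take item i
        branch(i + 1, value, room)              # branch 2: skip item i

    branch(0, 0, capacity)
    return best


# Hypothetical instance for illustration.
print(knapsack_bb([60, 100, 120], [10, 20, 30], capacity=50))
```

Pruning keeps the result optimal because a subtree is discarded only when its upper bound cannot exceed the best solution already found; the worst-case running time remains exponential, which is the bottleneck the abstract refers to.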