Long-tailed multi-label text classification aims to identify a subset of relevant labels from a large candidate label set, where the training datasets usually follow long-tailed label distributions. Many previous studies have treated head and tail labels equally, resulting in unsatisfactory performance in identifying tail labels. To address this issue, this paper proposes a novel learning method that combines arbitrary models with two steps. The first step is the “diverse ensemble”, which encourages diverse predictions among multiple shallow classifiers, particularly on tail labels, and can improve the generalization of tail labels. The second is the “error correction”, which takes advantage of the base model's accurate predictions on head labels and approximates its residual errors for tail labels. Thus, it enables the “diverse ensemble” to focus on optimizing the tail label performance. This overall procedure is called residual diverse ensemble (RDE). RDE is implemented via a single-hidden-layer perceptron and can scale up to hundreds of thousands of labels. We empirically show that RDE consistently improves many existing models with considerable performance gains on benchmark datasets, especially with respect to the propensity-scored evaluation metrics. Moreover, RDE converges in fewer than 30 training epochs without increasing the computational overhead.
Multi-dimensional classification (MDC) aims to build classification models for multiple heterogeneous class spaces simultaneously, where each class space characterizes the semantics of an object w.r.t. one specific dimension. Modeling dependencies among class spaces plays a key role in solving MDC tasks, and most approaches work by assuming a directed acyclic graph (DAG) structure or a random chaining structure over class spaces. Unlike existing probabilistic strategies, this paper proposes a deterministic strategy named Seem, which models dependencies via stacked dependency exploitation. At the first level, pairwise dependencies are considered, which can be modeled more reliably than full dependencies among all class spaces under a DAG or chaining structure. At the second level, the class label of an unseen instance for each class space is determined by adaptively stacking predictive outputs from the first-level pairwise models. Experimental results show that stacked dependency exploitation leads to superior performance against state-of-the-art MDC approaches.
We present the construction of quantum error-locating (QEL) codes based on classical error-locating (EL) codes. Similar to classical EL codes, QEL codes lie midway between quantum error-correcting codes and quantum error-detecting codes: QEL codes can locate qubit errors within one sub-block of the received qubit symbols but do not need to determine the exact locations of the erroneous qubits. We show that an e-error-locating code derived from an arbitrary binary cyclic code with generator polynomial g(x) can lead to a QEL code with e-error-locating ability only if g(x) does not contain the (1 + x) factor.
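The necessary condition on g(x) stated in this abstract is easy to check mechanically. The sketch below is illustrative only and is not taken from the paper: it relies on the standard fact that, over GF(2), (1 + x) divides g(x) exactly when g(1) = 0, that is, when g(x) has an even number of nonzero terms. The function name and the coefficient-list representation (lowest degree first) are assumptions made for this example.

```python
def has_one_plus_x_factor(coeffs):
    """Return True if (1 + x) divides g(x) over GF(2).

    coeffs: hypothetical representation of g(x) as a list of 0/1
    coefficients, lowest degree first.
    """
    # Over GF(2), (1 + x) | g(x)  <=>  g(1) = 0  <=>  even number of nonzero terms.
    return sum(c % 2 for c in coeffs) % 2 == 0


# g(x) = 1 + x + x^3, generator polynomial of the cyclic (7, 4) Hamming code.
g_hamming = [1, 1, 0, 1]
# (1 + x)(1 + x + x^3) = 1 + x^2 + x^3 + x^4 over GF(2).
g_with_factor = [1, 0, 1, 1, 1]

print(has_one_plus_x_factor(g_hamming))      # no (1 + x) factor
print(has_one_plus_x_factor(g_with_factor))  # contains the (1 + x) factor
```

Only the first polynomial passes the condition quoted above; the check says nothing about the remaining steps of the construction, which the abstract does not describe.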
Network embedding is an important task that represents a high-dimensional network in a low-dimensional vector space, aiming to capture and preserve the network structure. Most existing network embedding methods are based on shallow models. However, actual network structures are complicated, which means shallow models cannot obtain the high-dimensional nonlinear features of the network well. Recently proposed unsupervised deep learning models ignore the label information. To address these challenges, in this paper, we propose an effective network embedding method, Structural Labeled Locally Deep Nonlinear Embedding (SLLDNE). SLLDNE is designed to obtain highly nonlinear features by utilizing a deep neural network while preserving the label information of the nodes through a semi-supervised classifier component to improve the ability of [...]. In addition, we exploit linear reconstruction of neighborhood nodes to enable the model to capture more structural information. The experimental results of vertex classification on two real-world network datasets demonstrate that SLLDNE outperforms the other state-of-the-art methods.
Biological network alignment is an important research topic in the field of bioinformatics. Almost every existing alignment method is designed to solve the deterministic biological network alignment problem. However, it is worth noting that interactions in biological networks, like many other processes in the biological realm, are probabilistic events. Therefore, more accurate results can be obtained if biological networks are characterized by probabilistic graphs. This probabilistic information, however, makes networks harder to analyze, and only a few methods can handle it. Therefore, in this paper, an improved Probabilistic Biological Network Alignment (PBNA) is proposed. Based on IsoRank, PBNA is able to use the probabilistic information. Furthermore, PBNA takes advantage of Contributor and the Probability Generating Function (PGF) to improve the accuracy of the node similarity values and reduce the computational complexity of the random variables in the similarity matrix. Experimental results on a dataset of Protein-Protein Interaction (PPI) networks provided by Todor demonstrate that PBNA can produce some alignment results that are ignored by the deterministic methods, and that it produces more biologically meaningful alignment results than IsoRank in most cases, based on the Gene Ontology Consistency (GOC) measure. Compared with the Prob method, which is designed specifically to solve the probabilistic alignment problem, PBNA can obtain more biologically meaningful mappings in less time.
Multi-class classification can be solved by decomposing it into a set of binary classification problems according to some encoding rules, e.g., one-vs-one, one-vs-rest, or error-correcting output codes. Existing works solve these binary classification problems in the original feature space, which might be suboptimal because different binary classification problems correspond to different positive and negative examples. In this paper, we propose to learn label-specific features for each decomposed binary classification problem to account for the specific characteristics contained in its positive and negative examples. Specifically, to generate the label-specific features, clustering analysis is conducted separately on the positive and negative examples in each decomposed binary data set to discover their inherent information, and then the label-specific features for one example are obtained by measuring the similarity between it and all cluster centers. Experiments clearly validate the effectiveness of learning label-specific features for decomposition-based multi-class classification.
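The feature-generation step described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm or the similarity measure, so plain k-means with a deterministic initialization and Euclidean distance are swapped in here, and all function names, cluster counts, and toy data are hypothetical.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points serve as initial centers (assumption)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: euclid(p, centers[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def label_specific_features(x, pos_centers, neg_centers):
    """Map x to its distances from every positive and negative cluster center."""
    return [euclid(x, c) for c in pos_centers + neg_centers]


# Toy binary problem produced by some decomposition (e.g. one-vs-rest).
positives = [(0, 0), (0, 1), (5, 5), (5, 6)]
negatives = [(10, 0), (10, 1)]

pos_centers = kmeans(positives, k=2)
neg_centers = kmeans(negatives, k=1)
features = label_specific_features((0, 0.5), pos_centers, neg_centers)
print(features)
```

Each decomposed binary problem would get its own centers and therefore its own feature mapping; the binary classifier for that problem is then trained on the mapped examples instead of the original feature space.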
The information leakage problem often exists in bidirectional quantum secure direct communication or quantum dialogue. In this work, we find that this problem also exists in the one-way quantum secure communication protocol [Chin. Phys. Lett. 32 (2015) 050301]. Specifically, the first bit of every four-bit message block is leaked without the communicating parties being aware of it. A way to remedy the information leakage problem is given.
In high-speed Internet links, effective and efficient packet sampling technology in traffic measurement becomes more and more indispensable for its benefits of reducing data quantity and saving various resources. We a...
Since the Internet is dominated by TCP-based applications, active queue management (AQM) is considered an effective approach to congestion control. However, most AQM schemes suffer obvious performance degradation under dynamic traffic. Extensive measurements have found that Internet traffic is extremely bursty and possibly self-similar. In this paper we propose a new AQM scheme called multiscale controller (MSC), based on an understanding of traffic burstiness at multiple time scales. Unlike most other AQM schemes, MSC combines rate-based and queue-based control at two time scales. The rate-based dropping at the burst level (large time scales) determines the packet drop aggressiveness and is responsible for low and stable queuing delay, good robustness, and responsiveness, while the queue-based modulation of the packet drop probability at the packet level (small time scales) brings low loss and high throughput. Stability analysis is performed based on a fluid-flow model of the TCP/MSC congestion control system, and simulation results show that MSC outperforms many current AQM schemes.
Recently, testing techniques based on dynamic exploration, which try to automatically exercise every possible user interface element, have been extensively used to facilitate full testing of web applications. Most such testing tools are, however, not effective in reaching dynamic pages induced by form interactions, due to their emphasis on handling client-side scripting. In this paper, we present a combinatorial strategy to achieve a full form test and build an automated test model. We propose an algorithm called pairwise testing with constraints (PTC) to implement the strategy. Our PTC algorithm uses pairwise coverage and handles the issues of semantic constraints and illegal values. We have implemented a prototype tool, ComjaxTest, and conducted an empirical study on five web applications. Experimental results indicate that our PTC algorithm generates fewer form test cases while achieving a higher coverage of dynamic pages than the general pairwise testing algorithm. Additionally, ComjaxTest generates a relatively complete test model and detects more faults in a reasonable amount of time, compared with other existing tools based on dynamic exploration.
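To make the idea of pairwise coverage under constraints concrete, here is a small greedy sketch. It is not the PTC algorithm from the paper, whose details the abstract does not give; it simply enumerates the valid value combinations of a form and repeatedly picks the one that covers the most still-uncovered value pairs. The form fields, their values, and the constraint are all hypothetical.

```python
from itertools import combinations, product


def pairs_of(case):
    """All ((field index, value), (field index, value)) pairs covered by one test case."""
    return {((i, case[i]), (j, case[j])) for i, j in combinations(range(len(case)), 2)}


def pairwise_with_constraints(domains, is_valid):
    """Greedy pairwise suite built only from combinations that satisfy is_valid."""
    candidates = [c for c in product(*domains) if is_valid(c)]
    # Only pairs that occur in at least one valid combination need covering.
    uncovered = set().union(*(pairs_of(c) for c in candidates))
    suite = []
    while uncovered:
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


# Hypothetical checkout form: country, payment method, shipping option.
domains = [
    ["US", "CN"],
    ["card", "paypal", "cod"],
    ["standard", "express"],
]


def is_valid(case):
    """Hypothetical semantic constraint: cash on delivery excludes express shipping."""
    _, payment, shipping = case
    return not (payment == "cod" and shipping == "express")


suite = pairwise_with_constraints(domains, is_valid)
for case in suite:
    print(case)
```

Exhaustively enumerating valid combinations is only practical for small forms; it is used here to keep the sketch short and obviously correct, at the cost of the scalability a real tool would need.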