Why does training neural networks with a large learning rate for a longer time often lead to better generalization? In this paper, we delve into this question by examining the relation between training and testing loss in neural networks. Through visualization of these losses, we note that the training trajectory with a large learning rate navigates through the minima manifold of the training loss, finally nearing the neighborhood of the testing loss minimum. Motivated by these findings, we introduce a nonlinear model whose loss landscapes mirror those observed for real neural networks. Upon investigating the training process using SGD on our model, we demonstrate that an extended phase with a large learning rate steers our model towards the minimum-norm solution of the training loss, which may achieve near-optimal generalization, thereby affirming the empirically observed benefits of late learning rate decay.
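As an illustration of the claimed mechanism, here is a minimal numerical sketch in Python, not the authors' model: a toy loss 0.5*(a*b - 1)^2 whose global minima form a manifold {a*b = 1}. The starting point, step counts, and learning-rate schedules are arbitrary choices for the demonstration; with them, the extended large-learning-rate phase followed by a late decay is expected to end at a lower-norm point on the minima manifold than decaying almost immediately.

import numpy as np

def train(schedule, a=3.0, b=0.4, steps=500):
    # Toy model: predict the target 1.0 with the product a*b; loss = 0.5*(a*b - 1)^2.
    # Every point with a*b = 1 is a global minimum, so the minima form a manifold.
    for t in range(steps):
        lr = schedule(t)
        r = a * b - 1.0
        a, b = a - lr * r * b, b - lr * r * a
    return a, b

late = train(lambda t: 0.3 if t < 400 else 0.01)   # extended large-LR phase, late decay
early = train(lambda t: 0.3 if t < 2 else 0.01)    # same schedule decayed almost immediately
for name, (a, b) in [("late decay", late), ("early decay", early)]:
    # The minimum-norm point on the manifold is a = b = 1, i.e. norm^2 = 2.
    print(f"{name}: loss = {0.5 * (a * b - 1.0) ** 2:.2e}, norm^2 = {a * a + b * b:.2f}")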
ISBN (Print): 1577358872
This study proposes and evaluates a new Bayesian network classifier (BNC) having an I-map structure with the fewest class-variable parameters among all structures in which the class variable has no parent. A new algorithm for learning the proposed model is also presented. The proposed method is guaranteed to obtain the true classification probability asymptotically, and it has lower computational costs than exact learning of BNCs using marginal likelihood. Comparison experiments demonstrate the superior performance of the proposed method.
ISBN (Print): 9798350304367; 9798350304374
As IoT networks become more complex and generate massive amounts of dynamic data, it is difficult to monitor and detect anomalies using traditional statistical and machine learning methods. Deep learning algorithms can process and learn from large amounts of data and can also be trained with unsupervised learning techniques, meaning they do not require labelled data to detect anomalies. This makes it possible to detect new and previously unseen anomalies. Deep learning algorithms are also automatable and highly scalable; they can therefore run continuously in the background, making it feasible to monitor large IoT networks in real time. In this work, we review the most recent works using deep learning techniques and implement a model using ensemble techniques on the KDD Cup 99 dataset. The experimental results showcase the strong performance of our deep anomaly detection model, which achieves an accuracy of over 98%.
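The abstract does not spell out the ensemble architecture, so the Python sketch below illustrates one plausible reading under stated assumptions: an unsupervised ensemble of autoencoders whose averaged reconstruction error is thresholded as an anomaly score. The synthetic features, ensemble size, and percentile threshold are placeholders standing in for preprocessed KDD Cup 99 records, not the paper's exact model.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(2000, 20))    # assumed "normal" traffic features
attacks = rng.normal(4.0, 1.5, size=(100, 20))    # assumed anomalous records
X_train = normal[:1500]
X_test = np.vstack([normal[1500:], attacks])
y_test = np.array([0] * 500 + [1] * 100)          # 1 = anomaly

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Ensemble of autoencoders: each member is an MLP trained to reconstruct its input.
members = [MLPRegressor(hidden_layer_sizes=(16, 8, 16), max_iter=500, random_state=s)
           .fit(X_train_s, X_train_s) for s in range(5)]

def score(X):
    # Anomaly score = reconstruction error averaged over the ensemble.
    return np.mean([np.mean((m.predict(X) - X) ** 2, axis=1) for m in members], axis=0)

threshold = np.percentile(score(X_train_s), 99)   # flag the top 1% of training scores
pred = (score(X_test_s) > threshold).astype(int)
print("detection accuracy:", np.mean(pred == y_test))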
ISBN (Print): 1577358872
Recent Newton-type federated learning algorithms have demonstrated linear convergence with respect to the communication rounds. However, communicating Hessian matrices is often infeasible due to their quadratic communication complexity. In this paper, we introduce a novel approach to tackle this issue while still achieving fast convergence rates. Our proposed method, named Federated Newton Sketch (FedNS), approximates the centralized Newton's method by communicating the sketched square-root Hessian instead of the exact Hessian. To enhance communication efficiency, we reduce the sketch size to match the effective dimension of the Hessian matrix. We provide a convergence analysis based on statistical learning for the federated Newton sketch approaches. In particular, our approaches reach super-linear convergence rates with respect to the communication rounds for the first time. We validate the effectiveness of our algorithms through various experiments, which agree with our theoretical findings.
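The following is a schematic Python simulation (not the authors' implementation) of the core communication pattern: each client sends a Gaussian sketch of its square-root Hessian together with its gradient, and the server assembles an approximate Hessian and takes a Newton step. The ridge-regression objective, client count, and sketch size are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d, clients, n_k, sketch_size, lam = 50, 4, 200, 25, 0.1
w_true = rng.standard_normal(d)
data = []
for _ in range(clients):
    X = rng.standard_normal((n_k, d))
    y = X @ w_true + 0.1 * rng.standard_normal(n_k)
    data.append((X, y))

w = np.zeros(d)
n_total = clients * n_k
for round_ in range(5):
    grads, sketches = [], []
    for X, y in data:
        grads.append(X.T @ (X @ w - y))              # local gradient of 0.5*||Xw - y||^2
        S = rng.standard_normal((sketch_size, n_k)) / np.sqrt(sketch_size)
        sketches.append(S @ X)                       # sketched square-root Hessian (m x d)
    g = sum(grads) / n_total + lam * w
    H_approx = sum(s.T @ s for s in sketches) / n_total + lam * np.eye(d)
    w = w - np.linalg.solve(H_approx, g)             # approximate Newton step at the server
    loss = sum(0.5 * np.sum((X @ w - y) ** 2) for X, y in data) / n_total + 0.5 * lam * w @ w
    print(f"round {round_}: loss = {loss:.4f}")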
In this paper we initiate the study of financial asset design with fairness as an explicit goal. We consider a variation on the classical problem of optimal portfolio design. In our setting, an individual consumer is ...
Item Response Theory (IRT) models aim to assess the latent abilities of n examinees along with the latent difficulty characteristics of m test items from categorical data that indicates the quality of their corresponding answers. Classical psychometric assessments are based on a relatively small number of examinees and items, say a class of 200 students solving an exam comprising 10 problems. More recent global large-scale assessments such as PISA, or internet studies, may lead to significantly increased numbers of participants. Additionally, in the context of machine learning, where algorithms take the role of examinees and data analysis problems take the role of items, both n and m may become very large, challenging the efficiency and scalability of computations. To learn the latent variables in IRT models from large data, we leverage the similarity of these models to logistic regression, which can be approximated accurately using small weighted subsets called coresets. We develop coresets for use in alternating IRT training algorithms, facilitating scalable learning from large data.
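As a hedged illustration of the logistic-regression connection, the Python sketch below builds a small weighted subset by importance sampling with a crude sensitivity proxy and fits on it. This is a simplification, not the authors' coreset construction; the sensitivity proxy, sizes, and synthetic data are assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, m = 20000, 10, 500                         # n examples, d features, coreset size m
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(int)

# Crude sensitivity proxy: uniform mass plus a term proportional to the row norm.
s = 1.0 / n + np.linalg.norm(X, axis=1) / np.linalg.norm(X, axis=1).sum()
p = s / s.sum()
idx = rng.choice(n, size=m, replace=True, p=p)
weights = 1.0 / (m * p[idx])                     # importance weights for the sampled rows

full = LogisticRegression(max_iter=1000).fit(X, y)
core = LogisticRegression(max_iter=1000).fit(X[idx], y[idx], sample_weight=weights)
print("parameter distance (full vs. coreset fit):",
      np.linalg.norm(full.coef_ - core.coef_))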
ISBN (Print): 9798350304626
Pretrained, self-supervised vision transformers are revolutionizing the field of computer vision with their ability to learn useful features for downstream classification tasks without requiring labeled training data. This paper asks whether these self-supervised techniques can also transform the field of continuous learning. A fundamental challenge for continuous learning algorithms is to sequentially learn new tasks using only the new task data, without degrading performance on the previously learned tasks. Sequentially fine-tuning a neural network's backbone while learning a new classification task often overfits the network's weights to the new classes and degrades its performance on previously learned classes. This paper introduces a new approach that joins a pretrained, self-supervised vision transformer with an incremental learning technique called eXtending Rapid Class Augmentation (XRCA). The XRCA method is distinguished by its recursive memory and classifier-based approach to incremental learning, which learns a new classification task extremely rapidly and in a manner that jointly optimizes over both old and new classes using just the new class data. This paper examines coupling this classifier-focused incremental learning approach with a pretrained, self-supervised feature-extraction backbone, and compares it with approaches that use pretrained supervised features, fine-tuned features, and domain-adapted features. The results indicate a promising new direction for continuous learning algorithms that combine self-supervision's ability to generalize to new classes with a recursive, classifier-centric approach to incremental learning.
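The XRCA internals are not given in the abstract, so the Python sketch below only illustrates the general recipe under stated assumptions: a recursive ridge classifier maintained on top of frozen "backbone" features, updated for each new class using only that class's data. The feature dimension, regularization, and Gaussian-blob features are placeholders, not the paper's method.

import numpy as np

rng = np.random.default_rng(0)
feat_dim, lam = 64, 1.0

class RecursiveClassifier:
    def __init__(self, dim, lam):
        self.A = lam * np.eye(dim)      # running Phi^T Phi + lam*I
        self.B = np.zeros((dim, 0))     # running Phi^T Y, one column per class

    def add_class(self, feats):
        """Add a new class given only its own features (frozen-backbone outputs)."""
        self.B = np.hstack([self.B, np.zeros((self.B.shape[0], 1))])
        y = np.zeros((feats.shape[0], self.B.shape[1]))
        y[:, -1] = 1.0                  # one-hot targets for the new class only
        self.A += feats.T @ feats
        self.B += feats.T @ y
        self.W = np.linalg.solve(self.A, self.B)   # ridge solution over all classes so far

    def predict(self, feats):
        return np.argmax(feats @ self.W, axis=1)

# Simulated "backbone features": each class is a Gaussian blob in feature space.
means = rng.standard_normal((3, feat_dim)) * 3
clf = RecursiveClassifier(feat_dim, lam)
for c in range(3):                      # classes arrive one at a time
    clf.add_class(means[c] + rng.standard_normal((100, feat_dim)))

test = np.vstack([means[c] + rng.standard_normal((50, feat_dim)) for c in range(3)])
labels = np.repeat(np.arange(3), 50)
print("accuracy after sequential class additions:", np.mean(clf.predict(test) == labels))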
ISBN (Print): 1577358872
The problem of learning a computational model from examples has been receiving growing attention. For the particularly challenging problem of learning models of distributed systems, existing results are restricted to models with a fixed number of interacting processes. In this work we look, for the first time to the best of our knowledge, at the problem of learning a distributed system with an arbitrary number of processes, assuming only that there exists a cutoff, i.e., a number of processes that is sufficient to produce all observable behaviors. Specifically, we consider fine broadcast protocols: broadcast protocols (BPs) with a finite cutoff and no hidden states. We provide a learning algorithm that can infer a correct BP from a sample that is consistent with a fine BP, and a minimal equivalent BP if the sample is sufficiently complete. On the negative side, we show that (a) characteristic sets of exponential size are unavoidable, (b) the consistency problem for fine BPs is NP-hard, and (c) fine BPs are not polynomially predictable.
We propose a simple network of Hawkes processes as a cognitive model capable of learning to classify objects. Our learning algorithm, named HAN for Hawkes Aggregation of Neurons, relies on a local synaptic learning rule driven by the spiking probabilities at each output node. Using local regret bounds, we prove mathematically that the network learns on average, and even asymptotically under more restrictive assumptions.
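The HAN rule itself is not reproduced in the abstract; the toy, discrete-time Python stand-in below only illustrates what a local rule driven by output spiking probabilities can look like: input neurons spike with feature-dependent probabilities, and each synapse is updated using only its own presynaptic spike and the spiking probability of the output neuron it feeds. All rates, sizes, and the sigmoid output model are assumptions, not the HAN algorithm.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_classes, lr, T = 20, 3, 0.05, 200

# Each object class is characterized by a vector of input spiking probabilities.
class_rates = rng.uniform(0.05, 0.6, size=(n_classes, n_in))
W = np.zeros((n_classes, n_in))                # synaptic weights, one output neuron per class

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(5000):
    c = rng.integers(n_classes)                # present an object of a random class
    spikes = (rng.random(n_in) < class_rates[c]).astype(float)   # presynaptic spikes
    p_spike = sigmoid(W @ spikes - 1.0)        # spiking probability of each output neuron
    target = np.eye(n_classes)[c]              # the neuron of the true class should spike
    # Local rule: each weight changes using only its presynaptic spike and the
    # (probability, target) pair of the single output neuron it connects to.
    W += lr * np.outer(target - p_spike, spikes)

# Classify held-out objects by the output neuron with the highest drive.
correct = 0
for _ in range(T):
    c = rng.integers(n_classes)
    spikes = (rng.random(n_in) < class_rates[c]).astype(float)
    correct += int(np.argmax(W @ spikes) == c)
print("test accuracy:", correct / T)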
ISBN (Print): 1577358872
A common approach to solving planning problems is to model them in a formal language such as the Planning Domain Definition Language (PDDL) and then use an appropriate PDDL planner. Several algorithms for learning PDDL models from observations have been proposed, but plans created with these learned models may not be sound. We propose two algorithms for learning PDDL models that are guaranteed to be safe to use even when given observations that include partially observable states. We analyze these algorithms theoretically, characterizing the sample complexity each algorithm requires to guarantee probabilistic completeness. We also show experimentally that our algorithms are often better than FAMA, a state-of-the-art PDDL learning algorithm.