ISBN (Print): 9781467330756
The International Workshop on Emerging Trends in Software Metrics aims to gather researchers and practitioners to discuss the progress of software metrics. The motivation for this workshop is the low impact that software metrics have on current software development. The goals of this workshop include critically examining the evidence for the effectiveness of existing metrics and identifying new directions for metrics. Evidence for existing metrics includes how the metrics have been used in practice and studies showing their effectiveness. Identifying new directions includes the use of new theories, such as complex network theory, on which to base metrics.
ISBN (Print): 9781713871088
We advance both the theory and practice of robust ℓp-quasinorm regression for p ∈ (0, 1] by using novel variants of iteratively reweighted least-squares (IRLS) to solve the underlying non-smooth problem. In the convex case, p = 1, we prove that this IRLS variant converges globally at a linear rate under a mild, deterministic condition on the feature matrix called the stable range space property. In the non-convex case, p ∈ (0, 1), we prove that under a similar condition, IRLS converges locally to the global minimizer at a superlinear rate of order 2 - p; the rate becomes quadratic as p → 0. We showcase the proposed methods in three applications: real phase retrieval, regression without correspondences, and robust face restoration. The results show that (1) IRLS can handle a larger number of outliers than other methods, (2) it is faster than competing methods at the same level of accuracy, and (3) it restores a sparsely corrupted face image with satisfactory visual quality.
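As a rough illustration of the IRLS idea described in this abstract (a minimal sketch, not the authors' exact variant: the smoothing parameter, weight update, and stopping rule below are assumptions), each iteration re-solves a weighted least-squares problem whose weights come from the current residuals, so that large residuals, i.e. likely outliers, are down-weighted:

import numpy as np

def irls_lp_regression(A, b, p=1.0, delta=1e-6, iters=100):
    """Approximately minimize ||A x - b||_p^p for p in (0, 1] via IRLS.

    Illustrative sketch only: the paper's variant and its guarantees
    (linear/superlinear rates under the stable range space property)
    rely on specific update rules not reproduced here.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]          # ordinary least-squares start
    for _ in range(iters):
        r = A @ x - b
        # w_i = max(|r_i|, delta)^(p - 2): large residuals get small weights,
        # recovering the lp objective as delta -> 0.
        w = np.maximum(np.abs(r), delta) ** (p - 2)
        Aw = A * w[:, None]                            # rows of A scaled by the weights
        x_new = np.linalg.solve(A.T @ Aw, Aw.T @ b)    # weighted least-squares step
        if np.linalg.norm(x_new - x) <= 1e-10 * (1 + np.linalg.norm(x)):
            return x_new
        x = x_new
    return x

For p = 1 this is the classical IRLS scheme for least-absolute-deviations regression; for p < 1 the same loop targets the non-convex ℓp objective, which is where the local superlinear rate of order 2 - p mentioned above becomes relevant.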
ISBN (Print): 9781713871088
Structured prediction of tree-shaped objects is heavily studied under the name of syntactic dependency parsing. Current practice based on maximum likelihood or margin is either agnostic to or inconsistent with the evaluation loss. Risk minimization alleviates the discrepancy between training and test objectives, but typically induces a non-convex problem. These approaches adopt explicit regularization to combat overfitting without probabilistic interpretation. We propose a moment-based distributionally robust optimization approach for tree-structured prediction, where the worst-case expected loss over a set of distributions within bounded moment divergence from the empirical distribution is minimized. We develop efficient algorithms for arborescences and other variants of trees. We derive Fisher consistency, convergence rates, and generalization bounds for our proposed method. We evaluate its empirical effectiveness on dependency parsing benchmarks.
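Schematically, the moment-based DRO objective described in this abstract minimizes the worst-case expected loss over an ambiguity set around the empirical distribution. In the (assumed) notation below, \hat{P} is the empirical distribution, \ell(\theta; x, y) the tree-structured loss, and \rho the moment-divergence radius; the paper's exact divergence and loss are not specified here:

% Schematic moment-based DRO objective; all symbols are assumed notation.
\min_{\theta} \; \max_{Q \in \mathcal{U}_\rho(\hat{P})} \; \mathbb{E}_{(x,y) \sim Q}\big[\ell(\theta; x, y)\big],
\qquad
\mathcal{U}_\rho(\hat{P}) = \{\, Q : D_{\mathrm{mom}}(Q, \hat{P}) \le \rho \,\}.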
ISBN (Digital): 9783319087290
ISBN (Print): 9783319087283; 9783319087290
This book constitutes the refereed proceedings of the Second International Conference on Rough Sets and Intelligent Systems Paradigms, RSEISP 2014, held in Granada and Madrid, Spain, in July 2014. RSEISP 2014 was held along with the 9th International Conference on Rough Sets and Current Trends in Computing, RSCTC 2014, as a major part of the 2014 Joint Rough Set Symposium, JRS 2014. JRS 2014 received 120 submissions, from which 40 revised full papers and 37 revised short papers were carefully reviewed and selected for presentation in two volumes. This volume contains the papers accepted for RSEISP 2014, as well as the three invited papers presented at the conference. The papers are organized in topical sections on plenary lecture and tutorial papers; foundations of rough set theory; granular computing and covering-based rough sets; applications of rough sets; induction of decision rules - theory and practice; knowledge discovery; spatial data analysis and spatial databases; information extraction from images.
ISBN (Print): 9781713871088
Equilibrium systems are a powerful way to express neural computations. As special cases, they include models of great current interest in both neuroscience and machine learning, such as deep neural networks, equilibrium recurrent neural networks, deep equilibrium models, or meta-learning. Here, we present a new principle for learning such systems with a temporally- and spatially-local rule. Our principle casts learning as a least-control problem, where we first introduce an optimal controller to lead the system towards a solution state, and then define learning as reducing the amount of control needed to reach such a state. We show that incorporating learning signals within the dynamics as an optimal control enables transmitting activity-dependent credit assignment information, avoids storing intermediate states in memory, and does not rely on infinitesimal learning signals. In practice, our principle leads to strong performance, matching that of leading gradient-based learning methods when applied to an array of problems involving recurrent neural networks and meta-learning. Our results shed light on how the brain might learn and offer new ways of approaching a broad class of machine learning problems.
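In schematic form (all notation below is assumed rather than taken from the paper), the least-control principle can be read as a two-step recipe: find the smallest control u that steers the equilibrium system f(\phi, \theta) + u = 0 to a state consistent with the target, then update the parameters \theta to reduce the control effort that was needed:

% Schematic least-control formulation; phi is the network state, u the control,
% theta the learnable parameters, and y the target. All notation is assumed.
u^{*}(\theta) \in \arg\min_{u} \tfrac{1}{2}\|u\|^{2}
\quad \text{s.t.} \quad f(\phi, \theta) + u = 0, \;\; \mathrm{output}(\phi) = y,
\qquad
\theta \leftarrow \theta - \eta\, \nabla_{\theta}\, \tfrac{1}{2}\big\|u^{*}(\theta)\big\|^{2}.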
ISBN (Digital): 9798331527235
ISBN (Print): 9798331527242
Math is a major contributor to many areas of study and gives learners skills that they can use across other subjects and different job roles. Unfortunately, a recent study from the National Assessment of Educational Progress shows that no more than 26% of 12th graders in the USA have been rated proficient in math since 2005, and COVID-19 only made the situation worse. In principle, appropriate online searching could promote learning of individual math concepts and help surmount this learning gap. In practice, however, current online searching works poorly for math. While traditional information retrieval systems identify semantically related documents outside of math, such systems were not designed for handling math formulas. Although some work has been done on Mathematical Information Retrieval (MIR) recently, little has focused specifically on developing indexing schemas to quickly search for and retrieve math formulas contained within math questions and answers. The objective of indexing the symbols and notations used in math equations is to organize and categorize math information in a way that makes it easier to retrieve and access relevant answers to math questions. To achieve this objective, we propose a robust random search approach for retrieving math information, offering an optimal solution to speed up the process of searching a huge volume of math archives. Our design goals for indexing math equations include fast matching of math answers to questions, reduced disk input/output, and a faster overall process for pairing math questions with suitable answers.
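The paper's random-search indexing scheme is not reproduced here; as a generic illustration of how formula indexing supports fast question-to-answer matching, the sketch below tokenizes formulas into symbols and builds an inverted index from symbols to the answers containing them (the tokenization rule and all names are assumptions):

import re
from collections import defaultdict

def tokenize_formula(formula):
    """Split a formula string into coarse symbol tokens, e.g. 'x^2 + 1' -> ['x', '^', '2', '+', '1'].

    Purely illustrative tokenization; a real MIR system would parse
    LaTeX/MathML into operator trees rather than flat symbols.
    """
    return re.findall(r"[A-Za-z]+|\d+|\S", formula)

def build_inverted_index(answers):
    """Map each symbol token to the ids of answers whose formulas contain it."""
    index = defaultdict(set)
    for answer_id, formula in answers.items():
        for token in tokenize_formula(formula):
            index[token].add(answer_id)
    return index

def search(index, query_formula):
    """Rank answers by how many query tokens they share (simple overlap score)."""
    scores = defaultdict(int)
    for token in set(tokenize_formula(query_formula)):
        for answer_id in index.get(token, ()):
            scores[answer_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

# Example: index two answers and query with a related formula.
answers = {"a1": "x^2 + 2*x + 1", "a2": "sin(y) / cos(y)"}
idx = build_inverted_index(answers)
print(search(idx, "x^2 + 1"))   # -> ['a1']

A production MIR system would typically index structural representations of formulas (e.g., operator trees) rather than flat symbols, but the inverted-index layout illustrates how precomputed postings keep lookups fast and disk reads low.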
ISBN (Print): 9781713871088
In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly when training some deep neural networks (e.g., RNNs, LSTMs) because of the exploding gradient issue. Gradient clipping is usually employed to address this issue in the single-machine setting, but exploring this technique in the distributed setting is still in its infancy: it remains mysterious whether the gradient clipping scheme can take advantage of multiple machines to enjoy parallel speedup. The main technical difficulty lies in dealing with a nonconvex loss function, a non-Lipschitz-continuous gradient, and skipped communication rounds simultaneously. In this paper, we explore a relaxed-smoothness assumption on the loss landscape, which LSTMs were shown to satisfy in previous works, and design a communication-efficient gradient clipping algorithm. This algorithm can be run on multiple machines, where each machine employs a gradient clipping scheme and communicates with the other machines after multiple steps of gradient-based updates. Our algorithm is proved to have O(1/(Nε⁴)) iteration complexity and O(1/ε³) communication complexity for finding an ε-stationary point in the homogeneous data setting, where N is the number of machines. This indicates that our algorithm enjoys linear speedup and reduced communication rounds. Our proof relies on novel techniques for analyzing truncated random variables, which we believe are of independent interest. Our experiments on several benchmark datasets and various scenarios demonstrate that our algorithm indeed exhibits fast convergence in practice, validating our theory.
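A minimal sketch of a communication-efficient clipped scheme of the kind described above (not the paper's exact algorithm; the clipping rule, step size, averaging schedule, and the stochastic_grads gradient oracle below are assumptions): each worker clips its local stochastic gradients, and the workers average their models only every few local steps:

import numpy as np

def clip(g, threshold):
    """Rescale g so its norm is at most threshold (standard gradient clipping)."""
    norm = np.linalg.norm(g)
    return g if norm <= threshold else g * (threshold / norm)

def local_clipped_sgd(x0, stochastic_grads, n_workers, lr=0.1,
                      threshold=1.0, local_steps=8, rounds=100):
    """Each worker runs clipped SGD locally; models are averaged every
    `local_steps` updates. Illustrative only: the analyzed algorithm's
    clipping rule and parameter choices differ in detail."""
    xs = [x0.copy() for _ in range(n_workers)]
    for _ in range(rounds):                      # communication rounds
        for _ in range(local_steps):             # local updates, no communication
            for i in range(n_workers):
                g = stochastic_grads(i, xs[i])   # worker i's stochastic gradient (caller-supplied oracle)
                xs[i] -= lr * clip(g, threshold)
        avg = sum(xs) / n_workers                # one round of communication: model averaging
        xs = [avg.copy() for _ in range(n_workers)]
    return xs[0]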
ISBN (Print): 9781713871088
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck, and gradient compression is widely used to reduce the number of bits sent within each communication round of iterative methods. There are two classes of compression operators and separate algorithms making use of them. In the case of unbiased random compressors with bounded variance (e.g., rand-k), the DIANA algorithm of Mishchenko et al. (2019), which implements a variance reduction technique for handling the variance introduced by compression, is the current state of the art. In the case of biased and contractive compressors (e.g., top-k), the EF21 algorithm of Richtárik et al. (2021), which instead implements an error-feedback mechanism, is the current state of the art. These two classes of compression schemes and algorithms are distinct, with different analyses and proof techniques. In this paper, we unify them into a single framework and propose a new algorithm, recovering DIANA and EF21 as particular cases. Our general approach works with a new, larger class of compressors, which has two parameters, the bias and the variance, and includes unbiased and biased compressors as particular cases. This allows us to inherit the best of both worlds: like EF21 and unlike DIANA, biased compressors such as top-k, whose good performance in practice is well recognized, can be used. And like DIANA and unlike EF21, independent randomness at the compressors makes it possible to mitigate the effects of compression, with the convergence rate improving when the number of parallel workers is large. This is the first time that an algorithm with all these features has been proposed. We prove its linear convergence under certain conditions. Our approach takes a step towards a better understanding of two so far distinct worlds of communication-efficient distributed learning.
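For concreteness, the two compressor families contrasted above can be sketched as follows (simple reference implementations, not tied to the paper's unified framework): rand-k is an unbiased sparsifier that rescales the surviving coordinates, while top-k is a biased, contractive sparsifier that keeps the largest-magnitude coordinates.

import numpy as np

def rand_k(x, k, rng=None):
    """Unbiased rand-k sparsifier: keep k uniformly random coordinates of x,
    scaled by d/k so that E[rand_k(x)] = x; its variance scales as (d/k - 1)||x||^2."""
    if rng is None:
        rng = np.random.default_rng()
    d = x.size
    out = np.zeros_like(x, dtype=float)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

def top_k(x, k):
    """Biased, contractive top-k sparsifier: keep the k largest-magnitude
    coordinates; it satisfies ||top_k(x) - x||^2 <= (1 - k/d) ||x||^2."""
    out = np.zeros_like(x, dtype=float)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

DIANA-style methods rely on the unbiasedness of operators like rand_k and progressively reduce the variance they introduce, whereas EF21-style methods compress the change in each worker's gradient estimate so that biased operators like top_k can be used safely; the unified framework described in this abstract parameterizes a compressor by both its bias and its variance so that either kind fits.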