Many combinatorial optimisation problems are NP-hard. Yet in practice, high-quality solutions are often obtained by (meta)heuristics. These work well in some cases, but not in others, indicating a potential for algorit...
Recent meta-learning approaches are oriented towards algorithm selection, optimization or recommendation of existing algorithms. In this article we show how data-tailored algorithms can be constructed from building blocks on small data sub-samples. Building blocks, typically weak learners, are optimized and evolved into data-tailored hierarchical ensembles. Well-performing algorithms discovered by an evolutionary algorithm can be reused on data sets of comparable complexity. Furthermore, these algorithms can be scaled up to model large data sets. We demonstrate how one particular template (a simple ensemble of fast sigmoidal regression models) outperforms state-of-the-art approaches on the Airline data set. Evolved hierarchical ensembles can therefore be beneficial as algorithmic building blocks in meta-learning, including meta-learning at scale.
ISBN (print): 9781509060177
The field of automatic algorithm design has received increasing attention in recent years. From a multitude of available algorithms, a researcher can effectively design a new one customized to his/her own problem. For this, hyper-heuristic techniques have proven to be useful. Their main objective is to search in the space of heuristics rather than in the problem solution space. The present paper proposes a hyper-heuristic for the automatic design of evolutionary algorithms supported by the use of an entropy metric. This metric is used as a trigger mechanism for switching between the algorithms' components, aiding the formation of the new hybrid algorithm.
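The idea of an entropy metric acting as a switching trigger can be illustrated with a minimal sketch. The abstract does not say which entropy is measured or how the trigger is set, so everything here is an assumption: Shannon entropy over the distinct individuals in the population, a fixed threshold, and the hypothetical component names "mutation" and "crossover".

```python
import math
from collections import Counter


def population_entropy(population):
    """Shannon entropy (in bits) of the distribution of distinct individuals.

    Individuals must be hashable (e.g. tuples or strings).
    """
    counts = Counter(population)
    total = len(population)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def choose_component(population, threshold=1.0):
    """Return the name of the component to apply next.

    Assumed rule (not taken from the paper): when diversity drops below
    the threshold, switch to an exploratory component; otherwise keep
    the recombination component.
    """
    if population_entropy(population) < threshold:
        return "mutation"
    return "crossover"
```

A population that has collapsed to a single individual has zero entropy and would trigger the switch, whereas four distinct individuals give an entropy of 2 bits and would not.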
Author: Luo, Gang (Univ Utah, Dept Biomed Informat, Suite 140, 421 Wakara Way, Salt Lake City, UT 84108, USA)
Machine learning studies automatic algorithms that improve themselves through experience. It is widely used for analyzing and extracting value from large biomedical data sets, or "big biomedical data," advancing biomedical research, and improving healthcare. Before a machine learning model is trained, the user of a machine learning software tool typically must manually select a machine learning algorithm and set one or more model parameters termed hyper-parameters. The algorithm and hyper-parameter values used can greatly impact the resulting model's performance, but their selection requires special expertise as well as many labor-intensive manual iterations. To make machine learning accessible to lay users with limited computing expertise, computer science researchers have proposed various automatic selection methods for algorithms and/or hyper-parameter values for a given supervised machine learning problem. This paper reviews these methods, identifies several of their limitations in the big biomedical data environment, and provides preliminary thoughts on how to address these limitations. These findings establish a foundation for future research on automatically selecting algorithms and hyper-parameter values for analyzing big biomedical data.
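To make the selection problem concrete, the following is a minimal sketch of the simplest possible strategy: exhaustively trying every (algorithm, hyper-parameter) pair and keeping the one with the best validation accuracy. It is not one of the methods the review surveys; the two toy learners (`majority_fit`, `knn_fit`), the one-dimensional data layout, and the `select` helper are all hypothetical names introduced for illustration.

```python
from collections import Counter


def majority_fit(train, _params):
    """Baseline learner: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def knn_fit(train, params):
    """k-nearest-neighbour classifier on one-dimensional inputs."""
    k = params["k"]

    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]

    return predict


def accuracy(model, data):
    """Fraction of (x, y) pairs in data that the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def select(train, valid, search_space):
    """Try every (algorithm, hyper-parameter) pair; keep the best on valid.

    Returns (score, algorithm_name, params). Ties keep the first candidate.
    """
    best = None
    for name, (fit, grid) in search_space.items():
        for params in grid:
            score = accuracy(fit(train, params), valid)
            if best is None or score > best[0]:
                best = (score, name, params)
    return best
```

A search space for this sketch would be written as `{"majority": (majority_fit, [{}]), "knn": (knn_fit, [{"k": 1}, {"k": 3}])}`. The cost of this exhaustive loop grows with the product of all grid sizes, which is one reason more efficient search strategies matter on large data sets.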
Author: Luo, Gang (Univ Utah, Dept Biomed Informat, Suite 140, 421 Wakara Way, Salt Lake City, UT 84108, USA)
Background: Predictive modeling is fundamental to transforming large clinical data sets, or "big clinical data," into actionable knowledge for various healthcare applications. Machine learning is a major predictive modeling approach, but two barriers make its use in healthcare challenging. First, a machine learning tool user must choose an algorithm and assign one or more model parameters called hyper-parameters before model training. The algorithm and hyper-parameter values used typically impact model accuracy by over 40%, but their selection requires many labor-intensive manual iterations that can be difficult even for computer scientists. Second, many clinical attributes are repeatedly recorded over time, requiring temporal aggregation before predictive modeling can be performed. Many labor-intensive manual iterations are required to identify a good pair of aggregation period and operator for each clinical attribute. Both barriers result in time and human resource bottlenecks, and preclude healthcare administrators and researchers from asking a series of what-if questions when probing opportunities to use predictive models to improve outcomes and reduce costs. Methods: This paper describes our design of and vision for PredicT-ML (prediction tool using machine learning), a software system that aims to overcome these barriers and automate machine learning model building with big clinical data. Results: The paper presents the detailed design of PredicT-ML. Conclusions: PredicT-ML will open the use of big clinical data to thousands of healthcare administrators and researchers and increase the ability to advance clinical research and improve healthcare.
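The "aggregation period and operator" pair mentioned above can be illustrated with a small sketch. It does not reflect PredicT-ML's actual design, which the abstract does not describe; the record layout (day, value), the half-open look-back window, the operator set, and the `aggregate` function name are assumptions made for this example.

```python
from statistics import mean

# Assumed operator set; a real system may offer different choices.
OPERATORS = {
    "mean": mean,
    "max": max,
    "min": min,
    "last": lambda values: values[-1],
}


def aggregate(records, index_day, period_days, operator):
    """Aggregate values recorded in the period_days before index_day.

    records: list of (day, value) pairs sorted by day.
    The window is [index_day - period_days, index_day), an assumed convention.
    Returns None when no value falls inside the window.
    """
    window = [
        value
        for day, value in records
        if index_day - period_days <= day < index_day
    ]
    return OPERATORS[operator](window) if window else None
```

With measurements on days 1, 10, 20 and 29, a 15-day mean before day 30 uses only the last two values, while a 30-day maximum uses all four. Each such choice yields a different feature, which is why searching over (period, operator) pairs by hand is laborious.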
Background: Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model parameters called hyper-parameters must typically be specified. Due to their inexperience with machine learning, it is hard for healthcare researchers to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format. These data must be iteratively transformed into the relational table format before conducting predictive modeling. This transformation is time-consuming and requires computing expertise. Methods: This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system aiming to address these challenges and facilitate building machine learning predictive models using big clinical data. Results: The paper describes MLBCD's design in detail. Conclusions: By making machine learning accessible to healthcare researchers, MLBCD will open the use of big clinical data and increase the ability to foster biomedical discovery and improve care.
ISBN (print): 9780769549993
Many algorithms are now available for doing the same task (e.g. binarization, page segmentation, character recognition, etc.) in document image analysis (DIA), and choosing a particular algorithm(s) for a particular task is often a non-trivial problem. This paper proposes a model for automatically selecting the correct algorithm(s) for a given problem. Binarization has been taken as a reference to illustrate the proposed approach. Several previously unexplored issues are addressed in this work. For example, a single method may not be good for the binarization of an entire document, whereas a particular method may produce the desired result for a particular region. Therefore, for a given document image, our model selects a set of one or more binarization techniques suitable for different regions of the document. This selection is completely automatic and guided by machine learning approaches. Formulation of a completely automatic way of generating the annotated data for training the learning algorithms is also a novel contribution of this work. Evaluation of the approach is done using the ICDAR 2003 Robust Reading data set, and the results highlight the potential of the proposed approach for automatic selection of correct DIA algorithm(s) from a set of several alternatives.
The results of empirical comparisons of existing learning algorithms illustrate that each algorithm has a selective superiority; each is best for some but not all tasks. Given a data set, it is often not clear beforehand which algorithm will yield the best performance. In this article we present an approach that uses characteristics of the given data set, in the form of feedback from the learning process, to guide a search for a tree-structured hybrid classifier. Heuristic knowledge about the characteristics that indicate one bias is better than another is encoded in the rule base of the Model Class Selection (MCS) system. The approach does not assume that the entire instance space is best learned using a single representation language; for some data sets, choosing to form a hybrid classifier is a better bias, and MCS has the ability to determine these cases. The results of an empirical evaluation illustrate that MCS achieves classification accuracies equal to or higher than the best of its primitive learning components for each data set, demonstrating that the heuristic rules effectively select an appropriate learning bias.
Recent developments in very high-level language design indicate that these languages hold great promise for improving the level of man-machine communication, and hence improving computer and programmer utilization. (E...