Algae can produce odor substances and toxins that make the smell and taste of water unpleasant and impair the quality of human and aquatic life. Appropriate countermeasures can be implemented in advance in water purification processes to prevent algal disorders if the occurrence of algal blooms is accurately predicted. Several models have been developed to predict algal blooms. However, a comprehensive model that can be universally applied under various conditions is lacking. In this study, automatic relevance determination, a sparse modeling algorithm, and support vector machine were combined to construct prediction models for algal blooms in four Japanese dam reservoirs to predict their occurrence over 7 days. Automatic relevance determination was applied to a dataset consisting of monthly water quality data and daily hydraulic and meteorological data to identify variables relevant to the concentrations of Microcystis spp. and Dolichospermum spp., which are bloom-forming cyanobacteria and are dominant in freshwater ecosystems. A dataset of selected variables was used to train and validate the support vector machine models. The results of variable selection by automatic relevance determination revealed that the average concentration of total nitrogen in the past year and the average maximum temperature in the past 7 days may have an association with the algal concentration. Support vector machine models resulted in 92.3% accuracy and 86.4% precision for Microcystis spp. and 71.4% accuracy and 77.5% precision for Dolichospermum spp. on average in binary classification. The competitive relationship between Microcystis spp. and Dolichospermum spp., which differs according to the nutrient level and temperature, probably affects the prediction performance of the models. Our study suggests that the combination of sparse modeling and machine learning is applicable to the construction of a prediction model for site-specific algal bloom events in dam reservoirs.
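The abstract reports both accuracy and precision for a binary bloom/no-bloom classifier. As a reminder of how the two metrics differ, the following minimal sketch computes them from paired labels. The function name and the ten example labels are hypothetical and are not data from the study; the label convention (1 = bloom, 0 = no bloom) is an assumption.

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels, where 1 marks a bloom."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # blooms correctly predicted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-blooms correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    accuracy = (tp + tn) / len(pairs)
    # Precision is undefined when nothing is predicted positive; return 0.0 then.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Hypothetical labels for ten forecast windows (illustrative only).
observed = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
acc, prec = accuracy_and_precision(observed, predicted)
print(acc, prec)
```

Accuracy counts every correct decision, while precision looks only at the windows flagged as blooms, which is why the two figures can rank two models differently.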
The planning, design and construction of buildings are all influenced by climate. Building climate zoning is significant for the formulation of building energy efficiency strategies in various regions. However, the variables used for building climate zoning overlap in the existing building climate zoning system in China (Temp(1) and Temp(7) in zones I and VI, and Temp(1) in zones II and VII). In addition, the climate conditions at some stations were not included in any zone because the thresholds were low for some variables. In this study, based on data acquired by 701 national surface meteorological stations in China between 1984 and 2013, a supervised classification algorithm was developed for building climate zoning, and the Mahalanobis distance was used to assess the climate similarity between two regions. Three main variables from 172 stations were selected as the training samples for the supervised classification algorithm, and the accuracy of the training results was 93.6%. The results obtained with the interval judgment method could be considered controversial for 76 stations (10.8% of the total), compared with only five stations (0.7% of the total) for the supervised classification algorithm. Compared with the interval judgment method, the supervised classification algorithm is advantageous, and its classification results are suitable for indicating the climatic characteristics of different zones.
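To illustrate how the Mahalanobis distance can drive a supervised zoning decision, here is a minimal sketch that assigns a station to the zone whose training samples are closest in that metric. The nearest-zone rule, the function names, the zone names and the two-variable toy data are all assumptions made for illustration; the abstract does not specify the exact classifier or list its three variables.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1)."""
    mu = mean_vector(rows)
    dim, n = len(mu), len(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)), without forming the inverse."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return math.sqrt(sum(di * si for di, si in zip(d, solve(cov, d))))


def classify(x, zones):
    """Return the name of the zone whose training samples are nearest to x."""
    return min(zones, key=lambda name: mahalanobis(
        x, mean_vector(zones[name]), covariance_matrix(zones[name])))


# Hypothetical training stations: (January mean temp, July mean temp) in deg C.
zones = {
    "cold": [[-10, 20], [-12, 22], [-8, 23], [-11, 19]],
    "warm": [[10, 28], [12, 29], [9, 30], [11, 27]],
}
print(classify([-9, 21], zones))
```

Unlike fixed interval thresholds, the distance accounts for how the variables co-vary within each zone, so a station near a threshold is still assigned to the zone it most resembles overall.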
ISBN: (print) 9783030375997; 9783030375980
This paper introduces a probability measure for recognising regularities in an information stream. The measure allows machine-learning models to be created without a supervisor. The experiment described in the paper shows empirically that the measure recognises regularities and helps to find regular relations between the values of variables. The machine-learning model finds regular relations in a data set and uses them to reconstruct unknown values of the classification variable. A classification algorithm based on the probability measure of regularities recognition is described. The connection between the measure and entropy is demonstrated, and mutual information is used to optimise the algorithm's performance. The accuracy of the algorithm matches or exceeds that of well-known supervised machine-learning algorithms.
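Since the abstract relies on mutual information without defining it, the following sketch estimates it from paired discrete samples using the standard formula I(X;Y) = sum over (x, y) of p(x,y) log2(p(x,y) / (p(x) p(y))). It is a generic plug-in estimator, not the paper's probability measure; the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits from paired samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and equal in length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum((c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in count_xy.items())


# A variable shares one full bit with itself and nothing with an
# independent, balanced variable.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

A feature with high mutual information against the classification variable is a natural candidate to keep, which is one plausible way such a score could be used to tune an algorithm.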
With the advances of GIS (Geographical Information System), GPS (Global Positioning System) and remote sensing, spatial data has become increasingly available. A significant amount of such data is related to point localities, such as locations of landslides, species occurrences, disease cases, and transportation accidents. There is a great need to predict the potential distribution of these geographical events given their localities and influencing features. In this study, we present a framework that can integrate a range of classification algorithms to predict the geographical distribution of a specific event. The proposed framework is unique in its implementation of a number of procedures that support a variety of geographical data types such as presence-only data, two-class data, and multi-class data. The framework is developed in C++ and based on object-oriented polymorphism, which enables us to add new classifiers to the framework by implementing a number of predefined interfaces. (C) 2012 Elsevier Ltd. All rights reserved.
ISBN: (print) 9781424481262
In this paper we present an original framework to extract representative groups from a dataset, and we validate it over a novel case study. The framework specifies the application of different clustering algorithms, then several statistical and visualisation techniques are used to characterise the results, and core classes are defined by consensus clustering. Classes may be verified using supervised classification algorithms to obtain a set of rules which may be useful for new data points in the future. This framework is validated over a novel set of histone markers for breast cancer patients. From a technical perspective, the resultant classes are well separated and characterised by low, medium and high levels of biological markers. Clinically, the groups appear to distinguish patients with poor overall survival from those with low grading score and better survival. Overall, this framework offers a promising methodology for elucidating core consensus groups from data.
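One simple reading of "core classes defined by consensus clustering" is to keep only the items that every clustering run places together. The sketch below implements that strict-agreement rule; it is an assumption made for illustration and may differ from the consensus procedure the authors used. The function name and the three example runs are hypothetical.

```python
from collections import defaultdict


def core_groups(labelings):
    """Group item indices that share a cluster in every labeling.

    Cluster ids need not match across runs: two items belong to the same
    core group exactly when their label tuples across all runs are equal.
    Items with no stable partner are left out.
    """
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all labelings must cover the same items")
    groups = defaultdict(list)
    for i in range(n):
        groups[tuple(labels[i] for labels in labelings)].append(i)
    return sorted(g for g in groups.values() if len(g) > 1)


# Three hypothetical clustering runs over six items.
run_a = [0, 0, 0, 1, 1, 1]
run_b = [1, 1, 0, 0, 0, 0]
run_c = [0, 0, 1, 1, 1, 1]
print(core_groups([run_a, run_b, run_c]))
```

Item 2 switches partners between runs, so it is excluded from every core group; such items are the ones a follow-up supervised classifier would have to assign.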
ISBN: (print) 1932415831
Heart disease remains the primary cause of premature mortality in the western world. To date, there are no cures, just heuristics for reducing the risk(s) of contracting the disease through diet and exercise. In this study, we examine two typical heart disease datasets that provide quantitative information regarding possible risk factors associated with heart disease. We wish to extract rules that relate these risk factors to the occurrence of heart disease in a form that is easily interpreted by medical professionals and laymen alike. Our approach, based on rough sets and the notion of approximate reducts, provides a rigorous and accurate classification scheme that produces results that rival or exceed those of more conventional methods. During the process of classification, rough sets remove redundant information and generate rules that can be used to describe the relationship(s) between the attributes and the final decision.
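The way rough sets discard redundant attributes can be sketched with the positive region: the rows whose attribute values determine the decision unambiguously. An attribute subset that leaves the positive region unchanged is a candidate reduct. The sketch below checks this exact condition only; the approximate reducts mentioned in the abstract relax it, and that relaxation is not shown. The function names and the five-row table are hypothetical toy data, not taken from the heart disease datasets.

```python
from collections import defaultdict


def partition(table, attrs):
    """Indiscernibility classes: rows that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for idx, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(idx)
    return list(blocks.values())


def positive_region(table, attrs, decision):
    """Rows whose indiscernibility class maps to a single decision value."""
    positive = set()
    for block in partition(table, attrs):
        if len({table[i][decision] for i in block}) == 1:
            positive |= block
    return positive


def preserves_classification(table, attrs, all_attrs, decision):
    """True if attrs classify exactly as many rows as the full attribute set."""
    return (positive_region(table, attrs, decision)
            == positive_region(table, all_attrs, decision))


# Hypothetical toy decision table (illustrative only).
table = [
    {"age": "old", "chol": "high", "smoker": "yes", "disease": "yes"},
    {"age": "old", "chol": "high", "smoker": "no", "disease": "yes"},
    {"age": "young", "chol": "high", "smoker": "yes", "disease": "yes"},
    {"age": "young", "chol": "low", "smoker": "yes", "disease": "no"},
    {"age": "old", "chol": "low", "smoker": "no", "disease": "no"},
]
all_attrs = ["age", "chol", "smoker"]
print(preserves_classification(table, ["chol"], all_attrs, "disease"))
print(preserves_classification(table, ["smoker"], all_attrs, "disease"))
```

In this toy table the single attribute `chol` decides every row, so a rule such as "chol = high implies disease = yes" can be read directly from its indiscernibility classes, while `smoker` alone decides none.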
ISBN: (print) 0780387422
Forest cover classification is extremely important for land use planning and management. In this context, pixel-based classification of medium-resolution images is well assessed, while the usefulness of segmentation processes and object classification is still under investigation. In this paper, a method based on the tree-structured Markov random field (TS-MRF) is applied to Landsat TM images in order to assess the capability of the TS-MRF segmentation algorithm to discriminate forest from non-forest cover in a test area located in the Eastern Italian Alps of Trentino. In particular, the regions of interest are selected from the image using a two-step process based on a segmentation algorithm and an analysis process. The segmentation is achieved by applying an MRF a priori model, which takes into account the spatial dependencies in the image, and the TS-MRF optimisation algorithm, which recursively segments the image into smaller regions using a binary tree structure. The analysis process links to each object identified by the segmentation a set of features related to its geometry (such as shape and smoothness), its spectral signature and its neighbouring regions (contextual features). These features were used in this study to classify each object as forest or non-forest through a simple supervised classification algorithm based on thresholds built on the feature values obtained from a set of training objects. This method already detected the forest area within the study area with an accuracy of 90%, while better performance could be achieved using more sophisticated classification algorithms, such as neural networks and support vector machines.