Background: To identify differentially expressed genes (DEGs) from microarray data, users of the Affymetrix GeneChip system need to select both a preprocessing algorithm to obtain expression-level measurements and a way of ranking genes to obtain the most plausible candidates. We recently recommended suitable combinations of a preprocessing algorithm and gene ranking method that can be used to identify DEGs with a higher level of sensitivity and specificity. However, in addition to these recommendations, researchers also want to know which combinations enhance reproducibility. Results: We compared eight conventional methods for ranking genes: weighted average difference (WAD), average difference (AD), fold change (FC), rank products (RP), moderated t statistic (modT), significance analysis of microarrays (samT), shrinkage t statistic (shrinkT), and intensity-based moderated t statistic (ibmT) with six preprocessing algorithms (PLIER, VSN, FARMS, multi-mgMOS (mmgMOS), MBEI, and GCRMA). A total of 36 real experimental datasets were evaluated on the basis of the area under the receiver operating characteristic curve (AUC) as a measure of both sensitivity and specificity. We found that the RP method performed well for VSN-, FARMS-, MBEI-, and GCRMA-preprocessed data, and the WAD method performed well for mmgMOS-preprocessed data. Our analysis of the MicroArray Quality Control (MAQC) project's datasets showed that the FC-based gene ranking methods (WAD, AD, FC, and RP) had a higher level of reproducibility: the percentages of overlapping genes (POGs) across different sites for the FC-based methods were higher overall than those for the t-statistic-based methods (modT, samT, shrinkT, and ibmT). In particular, POG values for WAD were the highest overall among the FC-based methods irrespective of the choice of preprocessing algorithm. Conclusion: Our results demonstrate that to increase sensitivity, specificity, and reproducibility in microarray analyses, we need to select
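The reproducibility measure mentioned in this abstract, the percentage of overlapping genes (POG), can be illustrated with a minimal sketch. It assumes POG is computed as the share of genes common to the top-N entries of two ranked lists; the function name `pog`, the gene identifiers, and the two site lists are hypothetical and are not taken from the paper.

```python
def pog(ranked_a, ranked_b, top_n):
    """Percentage of genes shared by the top_n entries of two ranked gene lists."""
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    top_a = set(ranked_a[:top_n])
    top_b = set(ranked_b[:top_n])
    return 100.0 * len(top_a & top_b) / top_n


# Hypothetical gene rankings produced at two test sites (illustrative only).
site1 = ["g1", "g2", "g3", "g4", "g5"]
site2 = ["g2", "g1", "g5", "g6", "g3"]

pog_top4 = pog(site1, site2, 4)
print(pog_top4)
```

A higher value means the two sites agree on more of their top-ranked candidates, which is the sense in which the abstract compares the FC-based and t-statistic-based methods.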
ISBN (print): 9781424443451
In underwater imaging, the Planar Synthetic Aperture Sonar (P-SAS) technique has been validated on both simulated and tank data. It showed a significant improvement in the 3D representation of the bottom and sub-bottom. In this paper we present its application to real data acquired during sea experiments on a dump site in the Baltic Sea. Data were acquired during the European project "SITAR" (Seafloor Imaging and Toxicity: Assessment of Risks caused by buried waste). The transmitter was a Topographic Parametric Sonar (TOPAS) fixed on an ROV whose position was monitored. Two central frequencies were explored, 10 kHz and 20 kHz. As the P-SAS algorithm was designed for data obtained on a regular planar grid, important modifications were required to handle real sea data and "realistic" navigation conditions (irregular grid): a "rearrangement" algorithm was designed to preprocess the actual data and correct trajectory disturbances (in 2D). This algorithm re-projects the data positions onto a new (virtual) regular grid. The algorithm was validated on tank experimental data prior to application to sea data. P-SAS processed data will be presented. A strata representation technique was used for analyzing the seafloor and the sub-bottom and for locating buried objects on the scanned dump site.
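The idea of re-projecting irregularly positioned samples onto a virtual regular grid can be sketched as follows. This is not the authors' rearrangement algorithm, whose details the abstract does not give; it is a simple stand-in that snaps each sample to its nearest grid node and averages samples sharing a node. The function name `regrid`, the grid spacing, and the sample values are assumptions for illustration.

```python
from collections import defaultdict


def regrid(samples, spacing):
    """Snap (x, y, value) samples to the nearest node of a regular 2D grid.

    Returns a dict mapping integer grid indices (ix, iy) to the mean of the
    values that fell on that node. Nearest-node snapping is an assumption
    made for this sketch, not the method described in the paper.
    """
    buckets = defaultdict(list)
    for x, y, value in samples:
        node = (round(x / spacing), round(y / spacing))
        buckets[node].append(value)
    return {node: sum(vals) / len(vals) for node, vals in buckets.items()}


# Hypothetical positions (metres) perturbed around grid nodes (0, 0) and (1, 0).
samples = [(0.1, 0.0, 1.0), (-0.1, 0.1, 3.0), (1.05, 0.02, 5.0)]
grid = regrid(samples, spacing=1.0)
print(grid)
```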
ISBN (digital): 9783031093166
ISBN (print): 9783031093166; 9783031093159
Nowadays, ensuring that search and recommendation systems are fair and do not discriminate against any population has become of paramount importance. Those systems typically rely on machine learning algorithms that solve a classification task. Although the problem of fairness has been widely addressed in binary classification, the fairness of multi-class classification needs further investigation, as it lacks well-established solutions. For these reasons, in this paper we present the Debiaser for Multiple Variables, a novel approach able to enhance fairness in both binary and multi-class classification problems. The proposed method is compared, under several conditions, with the well-established baseline. We evaluate our method on a heterogeneous data set and show how it outperforms the established algorithms in the multi-class classification setting, while maintaining good performance in binary classification. Finally, we present some limitations and future improvements.
ISBN (digital): 9783031636165
ISBN (print): 9783031636158; 9783031636165
Currently, one of the biggest challenges of Machine Learning (ML) is to develop fairer models that do not propagate prejudices, stereotypes, social inequalities, and other types of discrimination in their decisions. Before ML faced the problem of unfair decision-making, the field of educational testing developed several mathematical tools to decrease bias in selections made by tests. Item Response Theory is one of the main such tools, and its evaluative power helps make fairer selections. Therefore, in this paper, we use the concepts of Item Response Theory to propose a novel sample reweighting method named IRT-SR. The IRT-SR method aims to assign weights to the most important instances to minimize discriminatory effects in binary classification tasks. According to our results, IRT-SR guides classification algorithms to fit fairer models, improving the main group fairness notions such as demographic parity, equal opportunity, and equalized odds without significant performance loss.
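The three group fairness notions named in this abstract can be made concrete with a small sketch. It uses the common difference-based formulations for two groups: the gap in positive prediction rates (demographic parity), the gap in true positive rates (equal opportunity), and the larger of the true positive and false positive rate gaps (equalized odds). The function names, the binary group coding 0/1, and the toy labels are assumptions for illustration; the abstract does not state which exact formulas the paper uses, and IRT-SR itself is not implemented here.

```python
def rate(preds, mask):
    """Mean prediction over the positions where mask is True (0.0 if none)."""
    selected = [p for p, m in zip(preds, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def group_fairness(y_true, y_pred, group):
    """Difference-based fairness gaps between group 0 and group 1."""
    def stats(g):
        in_group = [s == g for s in group]
        positives = [s == g and t == 1 for s, t in zip(group, y_true)]
        negatives = [s == g and t == 0 for s, t in zip(group, y_true)]
        # positive prediction rate, true positive rate, false positive rate
        return rate(y_pred, in_group), rate(y_pred, positives), rate(y_pred, negatives)

    pos0, tpr0, fpr0 = stats(0)
    pos1, tpr1, fpr1 = stats(1)
    return {
        "demographic_parity_diff": abs(pos0 - pos1),
        "equal_opportunity_diff": abs(tpr0 - tpr1),
        "equalized_odds_diff": max(abs(tpr0 - tpr1), abs(fpr0 - fpr1)),
    }


# Toy labels, predictions, and sensitive attribute (illustrative only).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

gaps = group_fairness(y_true, y_pred, group)
print(gaps)
```

All three gaps are zero for a classifier that treats the two groups identically under the corresponding notion, so a reweighting method of the kind described would aim to drive them down.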
Nowadays, ensuring that search and recommendation systems are fair and do not discriminate against any population has become of paramount importance. This is also highlighted by some of the sustainable development goals proposed by the United Nations. Those systems typically rely on machine learning algorithms that solve a classification task. Although the problem of fairness has been widely addressed in binary classification, the fairness of multi-class classification needs further investigation, as it lacks well-established solutions. For these reasons, in this paper we present the Debiaser for Multiple Variables (DEMV), an approach able to mitigate unbalanced groups bias (i.e., bias caused by an unequal distribution of instances in the population) in both binary and multi-class classification problems with multiple sensitive variables. The proposed method is compared, under several conditions, with a set of well-established baselines using different categories of classifiers. First, we conduct a specific study to understand which generation strategies are best and how they affect DEMV's ability to improve fairness. Then, we evaluate our method on a heterogeneous set of datasets and show how it outperforms the established algorithms in the literature in the multi-class classification setting, and in the binary classification setting when more than two sensitive variables are involved. Finally, based on the conducted experiments, we discuss the strengths and weaknesses of our method and of the other baselines.
Background: Affymetrix GeneChip microarrays are popular platforms for expression profiling in two types of studies: detection of differential expression computed by p-values of the t-test and estimation of fold change between analyzed groups. There are many different preprocessing algorithms for summarizing Affymetrix data. The main goal of these methods is to remove the effects of non-specific hybridization and to optimally combine information from multiple probes annotated to the same transcript. The methods are benchmarked by comparison with reference methods, such as quantitative reverse-transcription PCR (qRT-PCR). Results: We present a comprehensive analysis of agreement between Affymetrix GeneChip and qRT-PCR results. We analyzed the influence of filtering by fraction Present calls introduced by J. N. McClintick and H. J. Edenberg (2006) and 2 mapping procedures: updated probe set definitions proposed by Dai et al. (2005) and our "naive mapping" method. Because genome sequence annotations have evolved since the time when the microarrays were designed, we also studied the effect of the annotation release date. These comparisons were prepared for 6 popular preprocessing algorithms (MAS5, PLIER, RMA, GC-RMA, MBEI, and MBEImm) in the 2 above-mentioned types of studies. We used data sets from 6 independent biological experiments. As a measure of reproducibility of microarray and qRT-PCR values, we used linear and rank correlation coefficients. Conclusions: We show that filtering by fraction Present calls increased correlations for all 6 preprocessing algorithms. We observed a difference in performance between PM-MM and PM-only methods: using MM probes increased correlations in fold change studies, but PM-only methods proved to perform better in detection of differential expression. We recommend using GC-RMA for detection of differential expression and PLIER for estimation of fold change. The use of the more recent annotation improves the results in both types of studies, en
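The agreement measures named in this abstract, linear and rank correlation coefficients, can be sketched in plain Python as Pearson and Spearman correlations, the usual reading of those terms. The helper names and the paired values below are made up for illustration and do not come from the study.

```python
from math import sqrt


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up paired measurements: microarray values vs. qRT-PCR values.
microarray = [1.0, 2.0, 3.0, 4.0]
qrt_pcr = [1.0, 4.0, 9.0, 16.0]

linear_r = pearson(microarray, qrt_pcr)
rank_r = spearman(microarray, qrt_pcr)
print(linear_r, rank_r)
```

In this toy case the relationship is monotonic but not linear, so the rank coefficient reaches 1 while the linear coefficient stays below it, which shows why reporting both can be informative.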
As the Internet advances and E-Commerce develops rapidly, companies involved in online shopping and services are eager for ways to find out what their customers *** now, with the help of data mining, they can take a much more active role in promoting their online shops, merchandise, and services, compared to the conventional method of waiting for customers to look at their *** paper discusses the new development of E-Commerce, which differs greatly from the old conventional *** also goes into depth on a new architecture for a data mining system based on E-Commerce web *** conventional "single data-mining system" is replaced by the new "distributed data mining system". The new distributed data mining system is a system that provides mining *** elaboration of the system, the E-Commerce Servers request services in the network, and the Data Mining Server provides concurrent mining services for a series of E-Commerce Servers (a series of E-Commerce companies). The Data Mining Server will summarize all the data in the different E-Commerce Servers for mining *** will produce more comprehensive and informative knowledge than conventional methods. A new effective preprocessing algorithm of distributed data mining for E-Commerce is provided in this *** to the architecture given in this paper, a distributed data-mining algorithm for finding association rules is provided.
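The abstract does not describe its distributed association-rule algorithm, so the following is only a generic sketch of the underlying idea: transactions from several E-Commerce servers are pooled, and the standard support and confidence of a rule are computed on the pooled data. The function names, the server variables, and the transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    wanted = set(itemset)
    return sum(wanted <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (0.0 if antecedent never occurs)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical transaction logs held by two separate E-Commerce servers.
server_a = [["milk", "bread"], ["milk"]]
server_b = [["milk", "bread", "eggs"], ["bread"]]

# The mining server pools the data before evaluating a rule (an assumption of this sketch).
pooled = server_a + server_b
rule_support = support(pooled, ["milk", "bread"])
rule_confidence = confidence(pooled, ["milk"], ["bread"])
print(rule_support, rule_confidence)
```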