In this paper, we consider the k-center problem with outliers (the (k, z)-center problem) in the context of Massively Parallel Computation (MPC). Existing MPC algorithms for the (k, z)-center problem typically require...
The k-means with outliers problem is one of the most extensively studied clustering problems in machine learning, where the goal is to discard up to z outliers and find a minimum-cost k-means clustering of the remaining data points. To achieve fast approximations, most previous results for this problem have running times that depend on the aspect ratio Δ (the ratio between the maximum and the minimum pairwise distances). To remove this dependence, we propose sampling-based algorithms with running time almost linear in the data size, where a crucial component of our approach is an algorithm called Fast-Sampling. The Fast-Sampling algorithm finds inliers that approximate the optimal clustering centers well without relying on a guess of the optimal clustering cost: a 4-approximate solution can be obtained in time $O(\frac{ndk\log\log n}{\epsilon^2})$ with O(k/ε) centers opened and (1 + ε)z outliers discarded. To reduce the number of centers opened, we propose a center reduction algorithm, with which an O(1/ε)-approximate solution can be obtained in time $O(\frac{ndk\log \log n}{\epsilon^2} + d \cdot \mathrm{poly}(k, \frac{1}{\epsilon})\log(n\Delta))$ with (1 + ε)z outliers discarded and exactly k centers opened. Empirical experiments suggest that our proposed sampling-based algorithms outperform state-of-the-art algorithms for the k-means with outliers problem.
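To make the objective in this abstract concrete, the following is a minimal sketch of how the k-means with outliers cost of a given set of centers could be evaluated. It is not the Fast-Sampling algorithm or the center reduction algorithm; the function names `sq_dist` and `kmeans_outlier_cost` are hypothetical and chosen only for illustration. For fixed centers, discarding the z points that are farthest from their nearest center minimizes the remaining cost.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_outlier_cost(points: Sequence[Point],
                        centers: Sequence[Point],
                        z: int) -> float:
    """k-means cost of `centers` after discarding the z costliest points.

    Hypothetical helper: each point pays its squared distance to the
    nearest center, and the z largest payments are dropped as outliers.
    """
    costs = sorted(min(sq_dist(p, c) for c in centers) for p in points)
    keep = max(len(costs) - z, 0)  # number of points treated as inliers
    return sum(costs[:keep])


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 0.0)]
    ctrs = [(0.5, 0.0), (10.5, 0.0)]
    print(kmeans_outlier_cost(pts, ctrs, z=0))  # the far point dominates
    print(kmeans_outlier_cost(pts, ctrs, z=1))  # the far point is discarded
```

In this toy example a single distant point accounts for almost all of the cost when z = 0, which is why allowing a small number of discarded points changes the problem so much.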
ISBN: (digital) 9781665468190
ISBN: (print) 9781665468206
The specific etiology of Alzheimer’s disease (AD) remains unknown, which poses a challenge for its prevention, diagnosis and treatment. Although genome-wide association studies (GWAS) are making rapid progress in identifying genetic variants associated with AD, the pathogenic mechanisms of the identified genetic loci are largely unknown. Transcriptome-wide association studies (TWAS) are an important class of methods for predicting disease genes. TWAS can explore the association of genes with the disease in relevant tissues by integrating genome-wide genetic regulatory data from specific tissues with disease-associated GWAS summary statistics. We found that TWAS analysis using different GWAS summary statistics may produce inconsistent results. To address this issue, we used ensemble summary statistics for AD-associated gene prediction, taking into account both the complementary nature of the different datasets and the comparison between the results generated from them. The prediction results were compared and analyzed to identify AD-associated genes, and the predicted genes were validated. In a case study of an individual gene, we identified a potential association between AZGP1 and AD using this method.
ISBN: (print) 9798400704864
Fair clustering problems have received considerable attention recently. In this paper, we study the k-Center problem under the group fairness and data summarization fairness constraints, denoted as Group Fair k-Center (GFkC) and Data Summarization Fair k-Center (DSFkC), respectively, in the distributed Massively Parallel Computation (MPC) model. The previous best results for these two problems in the MPC model are a 9-approximation with violation 7 (WWW 2022) and a (17+ε)-approximation without fairness violation (ICML 2020), respectively. We obtain a (3+ε)-approximation with violation 1 for the GFkC problem in the MPC model, which is almost as accurate as the best known approximation ratio of 3 with violation 1 for the sequential algorithm for the GFkC problem. Moreover, for the DSFkC problem in the MPC model, we obtain a (4+ε)-approximation without fairness violation, which is very close to the best known approximation ratio of 3 for the sequential algorithm for the DSFkC problem. Empirical experiments show that our distributed algorithms perform better than existing state-of-the-art distributed methods for both problems.
Carbonized polymer dots (CPDs) have drawn a lot of attention in the past ten years because of their excellent selectivity and sensitivity. In this study, tungsten was doped into CPDs (W-CPDs) using a simple hydrotherm...
ISBN: (digital) 9781665468190
ISBN: (print) 9781665468206
Recent methods for predicting patient clinical outcomes focus on embedding time-series data with sequential encoders, without considering the dependency between the different variables and the static demographic data. To solve this problem and achieve better patient outcome prediction, we propose an attention-based memory fusion (AMF) network with a Gated Recurrent Unit (GRU), called GRU-AMFN, to model the dependency between the different time-series and static demographic data and to extract an effective personalized representation of the patient’s clinical health status. We evaluate our proposed GRU-AMFN method on eICU, a publicly available dataset, to validate its effectiveness for the in-hospital mortality prediction task. Experimental results demonstrate that our proposed method outperforms several state-of-the-art models on this task. Ablation studies show the effectiveness of the proposed attention-based memory fusion module and the adaptive fusion module. In addition, our proposed method identifies several static demographic and time-series features that are important for mortality prediction.
ISBN: (digital) 9781665468190
ISBN: (print) 9781665468206
Resting-state functional magnetic resonance imaging (rs-fMRI) images have been widely used for the diagnosis of schizophrenia. With rs-fMRI, most existing schizophrenia diagnostic methods have revealed schizophrenia’s functional abnormalities at three scales: regional neural activity alterations, functional connectivity abnormalities and brain network dysfunctions. However, many schizophrenia diagnosis methods do not consider the fusion of features from the three scales. In this study, we propose a schizophrenia diagnostic method based on multi-scale feature representation and ensemble learning. First, features at the three scales (region, connectivity and network) are extracted from rs-fMRI images using the Brainnetome atlas. For each scale, feature selection with the least absolute shrinkage and selection operator (LASSO), tuned by a grid search, is applied to identify effective sub-features related to schizophrenia classification. The selected sub-features of each scale are then input to a separate support vector machine with a linear kernel to classify schizophrenia patients and healthy controls. To further improve the diagnostic performance, an ensemble learning framework named E-RCN is proposed to average, at the decision level, the probabilities obtained by the classifiers of each scale. Under leave-one-out cross-validation on the Center for Biomedical Research Excellence (COBRE) dataset, our proposed method achieves encouraging diagnostic performance, outperforming several state-of-the-art methods. In addition, ranking brain regions by their occurrence frequency across the leave-one-out cross-validation experiments reveals brain regions related to schizophrenia, i.e., the thalamus and middle temporal gyrus, and important fine-grained subregions, i.e., Tha_L_8_8, MTG_L_4_4 and MTG_R_4_4.
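The decision-level fusion described above, averaging the probabilities produced by the per-scale classifiers, can be illustrated with a short sketch. This is not the authors' E-RCN implementation; the function names, the example probabilities and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import List, Sequence


def average_probabilities(scale_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-subject probabilities across classifiers.

    `scale_probs[s][i]` is the (hypothetical) probability that subject i is a
    patient, as output by the classifier trained on scale s.
    """
    n_scales = len(scale_probs)
    n_subjects = len(scale_probs[0])
    return [sum(probs[i] for probs in scale_probs) / n_scales
            for i in range(n_subjects)]


def ensemble_predict(scale_probs: Sequence[Sequence[float]],
                     threshold: float = 0.5) -> List[int]:
    """Label a subject 1 (patient) when the averaged probability reaches
    the threshold; the value 0.5 is an assumption, not from the source."""
    return [int(p >= threshold) for p in average_probabilities(scale_probs)]


if __name__ == "__main__":
    # Made-up probabilities for three subjects from three per-scale classifiers.
    region = [0.9, 0.2, 0.6]
    connectivity = [0.7, 0.4, 0.3]
    network = [0.8, 0.3, 0.45]
    print(average_probabilities([region, connectivity, network]))
    print(ensemble_predict([region, connectivity, network]))
```

Averaging probabilities rather than hard labels lets a confident classifier at one scale outweigh two uncertain ones, which is one common motivation for this kind of fusion.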
Accurately detecting Alzheimer’s disease (AD) and predicting mini-mental state examination (MMSE) score are important tasks in elderly health by magnetic resonance imaging (MRI). Most of the previous methods on these...
ISBN: (print) 9781665429825
Medical image segmentation is a fundamental step for diagnosis and prognosis. This study proposes a new body and edge aware network for automated 2D medical image segmentation, called BEA-SegNet. The proposed BEA-SegNet consists of a shared encoder, a body and edge decouple (BEdecouple) module, and two parallel decoders for body and edge segmentation. In the encoder and decoders, short-term multi-scale concatenation (STMSC) modules are used to implement multi-scale representation. We design the BEdecouple module to decouple the convolutional features into body and edge features, making the proposed method body and edge aware. The body and edge decoders use BEdecouple modules at each level to learn more effective features for body and edge segmentation, respectively, and their outputs are fused to generate the final segmentation. In addition, body and edge supervision are applied to improve the final segmentation. The proposed BEA-SegNet is trained and evaluated on the International Skin Imaging Collaboration challenge 2018 dataset (ISIC2018). Experimental results show that the proposed BEA-SegNet achieves an average Dice similarity coefficient of 90.3% and an average Hausdorff distance of 15.9 for the skin lesion segmentation task and outperforms five benchmarks for skin lesion segmentation.
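For readers unfamiliar with the Dice similarity coefficient reported above, the following minimal sketch computes it for two binary masks given as flat lists of 0/1 values. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the source does not describe the evaluation code used for BEA-SegNet.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for binary masks A (pred) and B (target).

    Masks are flattened sequences of 0/1 values with equal length.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is an
        # assumed convention, not something stated in the source.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(dice_coefficient(predicted, reference))
```

A value of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they do not overlap at all.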