Optimal Margin Distribution Machine (ODM) is a recently proposed statistical learning framework rooted in the latest margin theory, which demonstrates better generalization performance than the traditional large margin ...
ISBN (digital): 9798350317152
ISBN (print): 9798350317169
Transactional stream processing engines (TSPEs) have gained increasing attention due to their capability of processing real-time stream applications with transactional semantics. However, TSPEs remain susceptible to system failures and power outages. Existing TSPEs mainly focus on performance improvement, but still face a significant challenge to guarantee fault tolerance while offering high-performance services. We revisit commonly used fault tolerance approaches in stream processing and database systems, and find that these approaches do not work well on TSPEs due to complex data dependencies. In this paper, we propose a novel TSPE called MorphStreamR to achieve fast failure recovery while guaranteeing low performance overhead at runtime. The key idea of MorphStreamR is to record intermediate results of resolved dependencies at runtime, and thus eliminate data dependencies to improve task parallelism during failure recovery. MorphStreamR further mitigates the runtime overhead by selectively tracking data dependencies and incorporating workload-aware log commitment. Experimental results show that MorphStreamR can significantly reduce the recovery time by up to 3.1×, while experiencing much less performance slowdown at runtime, compared with other applicable fault tolerance approaches.
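As a concrete illustration of the key recovery idea, here is a minimal Python sketch of logging the intermediate results of resolved dependencies so that recovery can replay transactions in parallel. The class names, log format, and file-based persistence are hypothetical simplifications for exposition, not MorphStreamR's actual interface.

```python
# Hypothetical sketch: persist the post-resolution value of each dependency so
# that, after a crash, transactions can consume logged values directly instead
# of waiting on the transactions that produced them.
import json

class DependencyLog:
    """Append-only log of intermediate results for resolved dependencies."""

    def __init__(self, path):
        self.f = open(path, "a")

    def record(self, txn_id, key, value):
        # One record per resolved dependency; flushing eagerly here stands in
        # for the paper's workload-aware log commitment, which batches commits.
        self.f.write(json.dumps({"txn": txn_id, "key": key, "val": value}) + "\n")
        self.f.flush()

def recover(log_path):
    """Rebuild resolved state; independent transactions can then replay in parallel."""
    state = {}
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            state[rec["key"]] = rec["val"]  # last write per key wins
    return state
```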
ISBN (digital): 9798350383508
ISBN (print): 9798350383515
Graph convolutional network (GCN) has achieved enormous success in learning structural information from unstructured data. As graphs become increasingly large, distributed training for GCNs is severely prolonged by frequent cross-worker communications. Existing efforts to improve the training efficiency often come at the expense of GCN performance, while the communication overhead persists. In this paper, we propose PSC-GCN, a holistic pipelined framework for distributed GCN training with communication-efficient sampling and inclusion-aware caching, to address the communication bottleneck while ensuring satisfactory model performance. Specifically, we devise an asynchronous pre-fetching scheme to retrieve stale statistics (features, embeddings, gradients) of boundary nodes in advance, such that embedding aggregation and model update are pipelined with statistics transmission. To reduce communication volume and mitigate the staleness effect, we introduce a variance-reduction based sampling policy, which prioritizes inner nodes over boundary ones to reduce the access frequency to remote neighbors, thus mitigating cross-worker statistics exchange. Complementing graph sampling, a feature caching module is co-designed to buffer hot nodes with high inclusion probability, ensuring that frequently sampled nodes are available in local memory. Extensive evaluations on real-world datasets show the superiority of PSC-GCN over state-of-the-art methods, reducing training time by 72%-80% without sacrificing model accuracy.
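To make the sampling-plus-caching design concrete, here is a hedged Python sketch: neighbor sampling is biased toward local (inner) nodes, and boundary-node features with high inclusion probability are buffered locally. The weighting scheme, cache policy, and all names are illustrative assumptions, not PSC-GCN's actual implementation.

```python
# Illustrative sketch, not the paper's code: cut cross-worker traffic by
# (1) weighting inner nodes higher during sampling and (2) caching the
# boundary nodes most likely to be sampled again.
import random

def sample_neighbors(neighbors, is_local, fanout, local_weight=4.0):
    # Inner (local) nodes get a higher sampling weight than boundary nodes,
    # reducing how often remote neighbors must be fetched.
    weights = [local_weight if is_local(n) else 1.0 for n in neighbors]
    return random.choices(neighbors, weights=weights, k=min(fanout, len(neighbors)))

class InclusionCache:
    """Buffer features of boundary nodes whose inclusion probability is high."""

    def __init__(self, capacity, inclusion_prob):
        # Keep only the nodes most likely to be sampled again ("hot" nodes).
        self.hot = dict(sorted(inclusion_prob.items(),
                               key=lambda kv: kv[1], reverse=True)[:capacity])
        self.store = {}

    def get(self, node, fetch_remote):
        if node in self.store:
            return self.store[node]   # cache hit: no communication
        feat = fetch_remote(node)     # cache miss: one remote fetch
        if node in self.hot:
            self.store[node] = feat   # only hot nodes occupy cache space
        return feat
```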
Deep learning has gained tremendous success in various fields, while training deep neural networks (DNNs) is very compute-intensive, which has resulted in numerous deep learning frameworks that aim to offer better usability and higher performance to deep learning practitioners. TensorFlow and PyTorch are the two most popular frameworks: TensorFlow is more prominent in industry, while PyTorch is more appealing in academia. However, these two frameworks differ greatly owing to their opposite design philosophies: static vs. dynamic computation graphs. TensorFlow is regarded as more performance-friendly, since it has more opportunities to perform optimizations with a full view of the computation graph. However, there are also claims that PyTorch is sometimes faster than TensorFlow, which confuses end-users choosing between them. In this paper, we carry out analytical and experimental analysis to unravel the mystery of the single-GPU training speed comparison between TensorFlow and PyTorch. To make our investigation as comprehensive as possible, we carefully select seven popular neural networks covering computer vision, speech recognition, and natural language processing (NLP). The contributions of this work are two-fold. First, we conduct detailed benchmarking experiments on TensorFlow and PyTorch and analyze the reasons for their performance difference; this work provides guidance for end-users choosing between the two frameworks. Second, we identify some key factors that affect performance, which can direct end-users to write their models more efficiently.
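For reference, a minimal single-GPU timing harness in the spirit of such benchmarking, written with PyTorch; the toy model, batch shape, and iteration counts are placeholders, not the paper's benchmark suite. The warm-up loop and explicit CUDA synchronization matter: GPU kernels launch asynchronously, so timing without synchronization undercounts the real cost.

```python
# Minimal sketch of a fair per-iteration timing measurement on one GPU.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(64, 1024, device="cuda")
y = torch.randint(0, 10, (64,), device="cuda")

def step():
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

for _ in range(10):          # warm-up: exclude one-time kernel/graph setup costs
    step()
torch.cuda.synchronize()     # kernels are async; sync before reading the clock
t0 = time.perf_counter()
for _ in range(100):
    step()
torch.cuda.synchronize()     # wait for all queued work before stopping the clock
print(f"{(time.perf_counter() - t0) / 100 * 1e3:.2f} ms/iteration")
```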
Large Language Models (LLMs) have achieved significant performance in various natural language processing tasks but also pose safety and ethical threats, thus requiring red teaming and alignment processes to bolster t...
Third-party libraries (TPLs) play a crucial role in software development. Utilizing TPL recommender systems can aid software developers in promptly finding useful TPLs. A number of TPL recommendation approaches have been proposed, and among them, graph neural network (GNN)-based recommendation is attracting the most attention. However, GNN-based approaches generate node representations through multiple convolutional aggregations, which is prone to introducing noise, resulting in the over-smoothing issue. In addition, due to the high sparsity of labelled data, node representations may be biased in real-world scenarios. To address these issues, this paper presents a TPL recommendation method named Implicit Supervision-assisted Graph Collaborative Filtering (ISGCF). Specifically, it takes the App-TPL interaction relationships as input and employs a popularity-debiased method to generate denoised App and TPL graphs. This reduces the noise introduced during graph convolution and alleviates the over-smoothing issue. It also employs a novel implicitly-supervised loss function to exploit the labelled data to learn enhanced node representations. Extensive experiments on a large-scale real-world dataset demonstrate that ISGCF achieves a significant performance advantage over other state-of-the-art TPL recommendation methods in terms of Recall, NDCG, and MAP. The experiments also validate the superiority of ISGCF in mitigating the over-smoothing problem.
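As an illustration of the popularity-debiasing step, the sketch below down-weights edges incident to highly popular TPLs before graph convolution, so hub nodes dominate aggregation less. The inverse-popularity normalization and the exponent are assumptions for exposition, not ISGCF's exact scheme.

```python
# Hedged sketch: compute debiased edge weights for an App-TPL interaction graph.
import numpy as np

def debiased_edge_weights(app_tpl_edges, num_tpls, alpha=0.75):
    """app_tpl_edges: list of (app_id, tpl_id) interactions."""
    tpl_degree = np.zeros(num_tpls)
    for _, t in app_tpl_edges:
        tpl_degree[t] += 1
    # Inverse-popularity weighting: an edge to a TPL used by many apps
    # contributes less to aggregation, which softens over-smoothing toward
    # popular TPLs during graph convolution.
    return {(a, t): 1.0 / (tpl_degree[t] ** alpha) for a, t in app_tpl_edges}
```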
ISBN (digital): 9798350331301
ISBN (print): 9798350331318
With the evolution of self-supervised learning, the pre-training paradigm has emerged as a predominant solution within the deep learning landscape. Model providers furnish pre-trained encoders designed to function as versatile feature extractors, enabling downstream users to harness the benefits of expansive models with minimal effort through fine-tuning. Nevertheless, recent works have exposed a vulnerability in pre-trained encoders, highlighting their susceptibility to downstream-agnostic adversarial examples (DAEs) meticulously crafted by attackers. The lingering question pertains to the feasibility of fortifying the robustness of downstream models against DAEs, particularly in scenarios where the pre-trained encoders are publicly accessible to attackers. In this paper, we initially delve into existing defensive mechanisms against adversarial examples within the pre-training paradigm. Our findings reveal that the failure of current defenses stems from the domain shift between pre-training data and downstream tasks, as well as the sensitivity of encoder parameters. In response to these challenges, we propose Genetic Evolution-Nurtured Adversarial Fine-tuning (Gen-AF), a two-stage adversarial fine-tuning approach aimed at enhancing the robustness of downstream models. Gen-AF employs a genetic-directed dual-track adversarial fine-tuning strategy in its first stage to effectively inherit the pre-trained encoder. This involves optimizing the pre-trained encoder and classifier separately while incorporating genetic regularization to preserve the model’s topology. In the second stage, Gen-AF assesses the robust sensitivity of each layer and creates a dictionary, based on which the top-k robust redundant layers are selected with the remaining layers held fixed. Upon this foundation, we conduct evolutionary adaptability fine-tuning to further enhance the model’s generalizability. Our extensive experiments, conducted across ten self-supervised training methods and six d...
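A hedged sketch of the second-stage layer selection follows: rank layers by a robustness-sensitivity score, fine-tune only the top-k, and freeze the rest. The scoring input and names here are hypothetical; Gen-AF defines its own sensitivity measure and dictionary construction.

```python
# Illustrative sketch: freeze all layers except the top-k by sensitivity score.
import torch.nn as nn

def select_and_freeze(model: nn.Module, sensitivity: dict, k: int):
    """sensitivity: layer-name -> score (higher = more robust-redundant)."""
    top_k = {name for name, _ in
             sorted(sensitivity.items(), key=lambda kv: kv[1], reverse=True)[:k]}
    for name, module in model.named_modules():
        trainable = name in top_k
        for p in module.parameters(recurse=False):
            p.requires_grad = trainable  # freeze everything outside the top-k
    return top_k
```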
Non-Volatile Main Memories (NVMMs) have recently emerged as a promising technology for future memory systems. Generally, NVMMs offer many desirable properties such as high density, byte-addressability, non-volatility, low cost, and energy efficiency, at the expense of high write latency, high write power consumption, and limited write endurance. NVMMs have become a competitive alternative to Dynamic Random Access Memory (DRAM), and will fundamentally change the landscape of memory systems. They bring many research opportunities as well as challenges in system architecture design, memory management in operating systems (OSes), and programming models for hybrid memory systems. In this article, we revisit the landscape of emerging NVMM technologies, and then survey the state-of-the-art studies of them. First, we classify those studies with a taxonomy according to different dimensions such as memory architectures, data persistence, performance improvement, energy saving, and wear leveling. Second, to demonstrate the best practices in building NVMM systems, we introduce our recent work on hybrid memory system designs from the dimensions of architectures, systems, and applications. Finally, we present our vision of future research directions for NVMMs and shed some light on design challenges and opportunities.
ISBN (digital): 9798350331301
ISBN (print): 9798350331318
Adversarial examples for deep neural networks (DNNs) are transferable: examples that successfully fool one white-box surrogate model can also deceive other black-box models with different architectures. Although a bunch of empirical studies have provided guidance on generating highly transferable adversarial examples, many of these findings fail to be well explained and even lead to confusing or inconsistent advice for practical applications. In this paper, we take a further step towards understanding adversarial transferability, with a particular focus on surrogate aspects. Starting from the intriguing "little robustness" phenomenon, where models adversarially trained with mildly perturbed adversarial samples can serve as better surrogates for transfer attacks, we attribute it to a trade-off between two dominant factors: model smoothness and gradient similarity. Our research focuses on their joint effects on transferability, rather than demonstrating the separate relationships alone. Through a combination of theoretical and empirical analyses, we hypothesize that the data distribution shift induced by off-manifold samples in adversarial training is what impairs gradient similarity. Based on these insights, we further explore the impacts of prevalent data augmentation and gradient regularization on transferability and analyze how the trade-off manifests in various training methods, thus building a comprehensive blueprint for the regulation mechanisms behind transferability. Finally, we provide a general route for constructing superior surrogates to boost transferability, which optimizes both model smoothness and gradient similarity simultaneously, e.g., the combination of input gradient regularization and sharpness-aware minimization (SAM), validated by extensive experiments. In summary, we call for attention to the united impacts of these two factors for launching effective transfer attacks, rather than optimizing one while ignoring the other, and emphasize the ...
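The suggested recipe (input gradient regularization plus SAM) can be sketched as a single PyTorch training step, as below. The loss weighting lam and the SAM radius rho are illustrative hyperparameters, and this is a simplified rendering of the two named techniques rather than the authors' exact procedure.

```python
# Hedged sketch: one surrogate-training step combining an input-gradient
# penalty (targets gradient similarity) with a SAM update (targets smoothness).
import torch

def surrogate_training_step(model, loss_fn, opt, x, y, lam=0.1, rho=0.05):
    x = x.clone().requires_grad_(True)
    loss = loss_fn(model(x), y)
    (g_x,) = torch.autograd.grad(loss, x, create_graph=True)
    total = loss + lam * g_x.pow(2).sum()        # input gradient regularization
    total.backward()

    # SAM: climb to the worst-case point within radius rho, take the gradient
    # there, then restore the weights and apply that gradient.
    grads = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(rho * g / (norm + 1e-12))     # ascend to the sharp point
    opt.zero_grad()
    loss_fn(model(x.detach()), y).backward()     # gradient at perturbed weights
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(rho * g / (norm + 1e-12))     # restore original weights
    opt.step()
```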
ISBN (digital): 9798350386226
ISBN (print): 9798350386233
The challenge of interpretability remains a significant barrier to adopting deep neural networks in healthcare domains. Tree regularization aims to align a deep neural network’s decisions with a single axis-aligned decision tree; however, relying on one tree for all inputs often leads to sub-optimal performance and interpretability. To address this limitation, we propose an enhanced tree regularization method that integrates a post-hoc visual explainable model such as Grad-CAM. This approach guides the deep model to be well-approximated by decision trees tailored to the salient regions identified by Grad-CAM in the input space. We rigorously validate the effectiveness of this framework on two cancer datasets: CNMC, which focuses on acute lymphoblastic leukemia cells, and ISBI2016, which comprises benign and malignant skin lesions. The results demonstrate that the proposed method delivers simpler and more interpretable explanations without compromising accuracy, thereby advancing the interpretability of deep learning models in critical healthcare applications.
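The following Python sketch illustrates the general idea of restricting the surrogate decision tree to Grad-CAM-salient regions; the feature-selection heuristic and all names are assumptions for exposition, not the authors' implementation (in tree regularization proper, the tree's complexity is converted into a differentiable penalty on the network).

```python
# Illustrative sketch: fit a shallow surrogate tree only on the input features
# that Grad-CAM marks as salient, approximating the deep model's predictions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_on_salient_regions(inputs, preds, cam_maps, top_frac=0.1, max_depth=5):
    """inputs: (N, D) flattened images; preds: deep model's predicted labels;
    cam_maps: (N, D) Grad-CAM saliency aligned with the flattened inputs."""
    # Keep only the most salient feature columns on average across the batch.
    mean_saliency = cam_maps.mean(axis=0)
    keep = np.argsort(mean_saliency)[-int(top_frac * inputs.shape[1]):]
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(inputs[:, keep], preds)  # mimic the deep model on salient regions
    # A shallow tree that still matches `preds` is the interpretable summary;
    # its complexity is what the regularizer penalizes during training.
    return tree, keep
```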