Smart contracts are programs that are permanently stored and automatically executed on blockchain systems such as Ethereum. Because the underlying blockchain is tamper-proof, smart contracts are difficult to update once deployed: upgrading one requires redeploying the contract and migrating its data. This makes observing how smart contracts actually evolve in the real world all the more valuable. Hence, in this paper, we conduct the first large-scale empirical study to characterize the evolution of smart contracts in Ethereum. To identify evolution, we present a contract similarity-based search algorithm, digEvolution, and evaluate its effectiveness with five different search strategies. We then apply this algorithm to 80,152 on-chain contracts collected from Ethereum to uncover evolution among them. We explore three research questions: whether the evolution of smart contracts is common (RQ1), and how the Gas consumption (RQ2) and vulnerabilities (RQ3) of smart contracts vary during evolution. Our results show that the evolution of smart contracts is not very common, and that some contract components contain vulnerabilities yet are still called by users. The Gas consumption of most smart contracts does not change during evolution: contracts are Gas-efficient both before and after. Likewise, the vulnerability status of most smart contracts does not change during evolution: they are secure both before and after.
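The abstract does not spell out the search algorithm; as a minimal sketch of what a contract similarity-based evolution search could look like, the snippet below links contract pairs whose opcode token sets are highly similar and orders each pair by deployment block. The function names and the 0.8 threshold are hypothetical, not digEvolution's actual design.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def find_evolution_pairs(contracts, threshold=0.8):
    """Link contracts whose opcode token sets are highly similar.

    contracts: list of (address, deploy_block, opcode_tokens) tuples.
    Returns (older_address, newer_address) candidate evolution pairs.
    The threshold is an illustrative knob, not a value from the paper.
    """
    pairs = []
    for (addr_a, blk_a, toks_a), (addr_b, blk_b, toks_b) in combinations(contracts, 2):
        if jaccard(toks_a, toks_b) >= threshold:
            old, new = (addr_a, addr_b) if blk_a < blk_b else (addr_b, addr_a)
            pairs.append((old, new))
    return pairs
```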
Aligning multi-modal content and ID embeddings is crucial in multi-modal recommendation systems. Existing solutions typically adopt a bidirectional alignment paradigm. Our prior work, FETTLE, challenges this paradigm by proposing one-way directional alignment at the item level, thus reducing the negative impact of low-quality modalities. However, FETTLE leaves two open questions: (1) when is one-way directional alignment optimal, and (2) how can collaborative signals be incorporated to enhance alignment? We present CROSS (feedbaCk-oRiented multi-mOdal alignment in recommendation SyStem), a plug-and-play framework that extends FETTLE with three major advancements. First, we introduce Dynamic Item-Level Alignment, which dynamically calibrates the 'strength' of each modality via a variance-based compensation mechanism, mitigating the risk of overshadowing weaker modalities in the early stages of training. Second, we develop Multi-grained Collaborative Alignment, which introduces a medium-granularity alignment strategy based on neighboring items that share similar user feedback profiles; this neighbor-level alignment effectively balances noisy user interactions against excessive smoothing across items. Third, we conduct extensive experiments on additional real-world datasets and show that CROSS significantly boosts the performance of both collaborative filtering (CF) models and multi-modal recommendation (MRS) approaches, achieving 21.52%–70.78% average improvement on CF backbones and 8.70%–20.73% on MRS backbones. Compared with FETTLE, CROSS achieves additional improvements of 3.82%–5.24%.
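The exact compensation formula is not given in the abstract; the toy sketch below shows one hedged reading of variance-based compensation, weighting each modality's alignment strength by inverse embedding variance so that no modality is overshadowed. Whether "weaker" corresponds to lower or higher variance, and the inverse-variance rule itself, are our assumptions rather than CROSS's definition.

```python
import torch

def modality_alignment_weights(mod_embs: dict, eps: float = 1e-8) -> dict:
    """Toy variance-based compensation: give lower-variance modalities a
    larger alignment weight. mod_embs maps modality name -> (items, dim)
    embedding tensor. The weighting rule is illustrative, not CROSS's."""
    inv = {m: 1.0 / (e.var().item() + eps) for m, e in mod_embs.items()}
    total = sum(inv.values())
    return {m: w / total for m, w in inv.items()}

# Usage with hypothetical image/text item embeddings:
embs = {"image": torch.randn(128, 64) * 0.5, "text": torch.randn(128, 64)}
weights = modality_alignment_weights(embs)  # per-modality weights summing to 1
```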
Session-based recommendation (SBR) methods often rely on user behavior data and can struggle with the sparsity of session data, which limits performance. Researchers have identified that, beyond behavioral signals, the rich semantic information in item descriptions is crucial for capturing hidden user intent. While large language models (LLMs) offer new ways to leverage this semantic data, the challenges of session anonymity, short session sequences, and high LLM training costs have hindered the development of a lightweight, efficient LLM framework for SBR. To address the above challenges, we propose an LLM-enhanced SBR framework that integrates semantic and behavioral signals from multiple views. This two-stage framework leverages the strengths of both LLMs and traditional SBR models while minimizing training costs. In the first stage, we use multi-view prompts to infer latent user intentions at the session semantic level, supported by an intent localization module that alleviates LLM hallucinations. In the second stage, we align and unify these semantic inferences with behavioral representations, effectively merging insights from both large and small models. Extensive experiments on two real-world datasets demonstrate that the LLM4SBR framework effectively improves model performance. We release our code along with the baselines at https://***/tsinghua-fib-lab/LLM4SBR.
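The abstract leaves the second-stage alignment unspecified; a minimal sketch of one common choice, an in-batch InfoNCE-style contrastive loss between LLM-derived session semantics and behavioral session representations, is shown below. This particular loss is our assumption about how the fusion stage could work, not LLM4SBR's stated objective.

```python
import torch
import torch.nn.functional as F

def align_semantic_behavioral(sem: torch.Tensor, beh: torch.Tensor, tau: float = 0.1):
    """In-batch contrastive alignment between semantic and behavioral
    session embeddings, both shaped (batch, dim). Matched pairs sit on
    the diagonal of the similarity matrix."""
    sem = F.normalize(sem, dim=-1)
    beh = F.normalize(beh, dim=-1)
    logits = sem @ beh.t() / tau                       # (batch, batch) similarities
    targets = torch.arange(sem.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```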
Deep forest is a non-differentiable deep model that has achieved impressive empirical success across a wide variety of applications, especially on categorical/symbolic or mixed modeling tasks. Many of these application fields prefer explainable models, such as random forests with feature contributions that provide a local explanation for each prediction, or Mean Decrease Impurity (MDI), which provides global feature importance. However, deep forest, as a cascade of random forests, is interpretable only at its first layer: from the second layer on, many tree splits occur on new features generated by the previous layer, which makes existing explanation tools for random forests inapplicable. To disclose the impact of the original features in the deep layers, we design a calculation method with an estimation step followed by a calibration step for each layer, and propose feature contribution and MDI feature importance calculation tools for deep forest. Experimental results on both simulated and real-world data verify the effectiveness of our methods.
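The estimation and calibration steps are not detailed in the abstract; the sketch below gives one hedged reading of the estimation idea, in which contributions that a deep layer assigns to generated features are folded back onto the original features in proportion to the previous layer's per-feature shares. The proportional rule and all names are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def redistribute(contrib_aug: np.ndarray, prev_orig_share: np.ndarray, n_orig: int):
    """Fold contributions of generated features back onto original features.

    contrib_aug: (n_orig + n_generated,) contributions at the current layer,
                 ordered as [original features, generated features].
    prev_orig_share: (n_orig,) nonnegative per-feature shares from the
                     previous layer, summing to 1.
    """
    orig_part = contrib_aug[:n_orig].copy()
    generated_total = contrib_aug[n_orig:].sum()
    # Spread the generated-feature mass across original features.
    return orig_part + generated_total * prev_orig_share
```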
Within the realm of software engineering, specialized tasks on code, such as program repair, present unique challenges and necessitate fine-tuning large language models (LLMs) to unlock state-of-the-art performance. Fine-tuning approaches proposed in the literature for LLMs on program repair tasks generally overlook the need to reason about the logic behind code changes, beyond syntactic patterns in the data. High-performing fine-tuning experiments also usually come at very high computational cost. With MORepair, we propose a novel perspective on the learning focus of LLM fine-tuning for program repair: we not only adapt the LLM parameters to the syntactic nuances of the code transformation task (objective ➊), but also specifically fine-tune the LLM with respect to the logical reasoning behind the code change in the training data (objective ➋). Such multi-objective fine-tuning instructs LLMs to generate high-quality patches. We apply MORepair to fine-tune four open-source LLMs of different sizes and architectures. Experimental results on function-level and repository-level repair benchmarks show that the fine-tuning effectively boosts LLM repair performance by 11.4% to 56.0%. We further show that our fine-tuning strategy yields superior performance compared to state-of-the-art approaches, including standard fine-tuning, Fine-tune-CoT, and RepairLLaMA.
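A minimal sketch of what a two-objective fine-tuning step could look like is shown below, assuming a HuggingFace-style causal LM whose forward pass returns a loss when labels are supplied. The batch keys, the rationale-as-second-sequence framing, and the 0.5 weighting are assumptions for illustration, not MORepair's exact recipe.

```python
def multi_objective_step(model, batch, lambda_rationale: float = 0.5):
    """One hypothetical multi-objective step: jointly fit the code
    transformation (objective 1) and a natural-language rationale for
    the change (objective 2), then combine the two losses.

    batch["patch_ids"]: tokens of buggy code followed by fixed code.
    batch["rationale_ids"]: tokens of buggy code followed by an
                            explanation of the fix.
    """
    patch_loss = model(input_ids=batch["patch_ids"],
                       labels=batch["patch_ids"]).loss
    rationale_loss = model(input_ids=batch["rationale_ids"],
                           labels=batch["rationale_ids"]).loss
    return patch_loss + lambda_rationale * rationale_loss
```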
Solid State Drives (SSDs) based on the NVMe Zoned Namespaces (ZNS) interface can notably reduce the costs of address mapping, garbage collection, and over-provisioning by dividing the storage space into multiple zones for sequential writes and random reads. The Log-Structured Merge (LSM) tree, which is extensively used in key-value storage systems, converts random writes into sequential writes and is hence a natural fit for ZNS SSDs. However, the lifetimes of LSM-tree data vary significantly because of the tree's level structure and merging mechanisms. Without an accurate method to estimate data lifetime, data with disparate lifetimes may be placed in the same zone, causing low space utilization and high write amplification within the SSD. To address these issues, this paper proposes two data-overlapping-aware optimizations that realize intelligent data placement: a zone allocation scheme and a garbage collection scheme. The key technique behind both is an accurate data-lifetime estimation that considers both the tree level the data belongs to and the overlapping ratio between the data and the data in the neighboring level. Using this estimation, the zone allocation optimization places data with similar lifetimes in the same zone, and the garbage collection optimization reclaims zones adaptively based on overlapping ratios to reduce the amount of data migration. Experimental results demonstrate that the optimizations reduce garbage-collection-incurred data copying by average factors of 2.11× and 1.50× compared to a conventional scheme and a state-of-the-art scheme, respectively. Consequently, the proposed work alleviates the write amplification effect by 18% and 6%, respectively.
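The estimator itself is not given in the abstract; below is a toy version of the idea that data at deeper levels lives longer while heavy key-range overlap with the next level implies earlier compaction, plus a matching zone-placement policy. The linear score, the tolerance, and all names are illustrative assumptions, not the paper's formulas.

```python
def estimate_lifetime(level: int, overlap_ratio: float, alpha: float = 1.0) -> float:
    """Toy lifetime score for an SSTable: higher means longer-lived.
    Deeper levels add lifetime; overlap with the next level subtracts it."""
    return level - alpha * overlap_ratio

def pick_zone(zones: list, table_score: float, tolerance: float = 0.5) -> dict:
    """Place a table in a zone whose resident data has a similar lifetime
    score; open a new zone when none is close enough (hypothetical policy)."""
    for zone in zones:
        if abs(zone["score"] - table_score) <= tolerance:
            return zone
    zone = {"score": table_score, "tables": []}
    zones.append(zone)
    return zone

# Usage: an L3 table with 20% overlap lands in a zone near score 2.8.
zones = []
score = estimate_lifetime(level=3, overlap_ratio=0.2)
pick_zone(zones, score)["tables"].append("sstable_000123")
```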
Distributed Collaborative Machine Learning (DCML) has emerged in artificial intelligence-empowered edge computing environments, such as the Industrial Internet of Things (IIoT), to process the tremendous volumes of data generated by smart devices. However, parallel DCML frameworks require resource-constrained devices to update entire Deep Neural Network (DNN) models and are vulnerable to reconstruction attacks, while serial DCML frameworks suffer from training-efficiency problems due to their serial training nature. In this paper, we propose a Model Pruning-enabled Federated Split Learning framework (MP-FSL) to reduce resource consumption with a secure and efficient training scheme. Specifically, MP-FSL compresses DNN models via adaptive channel pruning and splits each compressed model into two parts that are assigned to the client and the server. Meanwhile, MP-FSL adopts a novel aggregation algorithm to aggregate the pruned heterogeneous models. We implement MP-FSL on a real Federated Learning (FL) platform to evaluate its performance. The experimental results show that MP-FSL outperforms state-of-the-art frameworks in model accuracy by up to 1.35%, while reducing storage and computational resource consumption by up to 32.2% and 26.73%, respectively. These results demonstrate that MP-FSL is a comprehensive solution to the challenges faced by DCML, offering both reduced resource consumption and enhanced model performance.
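The aggregation algorithm is not described in the abstract; one plausible shape for aggregating channel-pruned heterogeneous models is masked averaging, where each weight entry is averaged only over the clients that kept it. The sketch below is that hedged reading, not MP-FSL's actual algorithm.

```python
import numpy as np

def aggregate_pruned(weights_list, masks_list):
    """Hypothetical masked-averaging aggregation for pruned client models.

    weights_list[i] and masks_list[i] share the full (unpruned) shape;
    pruned entries are zeroed in the weights and marked 0 in the 0/1 mask.
    Entries kept by no client stay zero.
    """
    num = sum(w * m for w, m in zip(weights_list, masks_list))
    den = sum(masks_list)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

# Usage: two clients that pruned different channels of the same layer.
w1, m1 = np.array([0.2, 0.0, 0.4]), np.array([1, 0, 1])
w2, m2 = np.array([0.4, 0.6, 0.0]), np.array([1, 1, 0])
print(aggregate_pruned([w1, w2], [m1, m2]))  # [0.3, 0.6, 0.4]
```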
Graph pattern mining is essential for deciphering complex networks. In the real world, graphs are dynamic and evolve over time, so mined patterns must be updated to reflect these changes. Traditional methods use fine-grained incremental computation to avoid full re-mining after each update, which improves speed but often overlooks the potential gains from examining inter-update interactions holistically, thus missing overall efficiency improvements. In this paper, we introduce Cheetah, a dynamic graph mining system that processes updates in a coarse-grained manner by leveraging exploration domains. These domains exploit the community structure of real-world graphs to uncover data reuse opportunities typically missed by existing approaches. Exploration domains, which encapsulate extensive portions of the graph relevant to updates, allow multiple updates to explore the same regions efficiently. Cheetah dynamically constructs these domains using a management module that identifies and maintains areas of redundancy as the graph changes. By grouping updates within these domains and employing a neighbor-centric expansion strategy, Cheetah minimizes redundant data accesses. Our evaluation of Cheetah on five real-world datasets shows that it outperforms current leading systems by an average factor of 2.63×.
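A toy sketch of the grouping step is given below: edge updates are bucketed by the community ("exploration domain") of their endpoints so that updates in the same region of the graph can share one traversal. The community-id mapping, the min-endpoint tiebreak, and all names are illustrative assumptions, not Cheetah's API.

```python
from collections import defaultdict

def group_updates_by_domain(updates, community_of):
    """Bucket edge updates by exploration domain for coarse-grained batching.

    updates: iterable of (u, v) edge insertions/deletions.
    community_of: mapping vertex -> community/domain id (assumed precomputed
    by some community detection pass, as hinted by the paper's use of
    community structure).
    """
    domains = defaultdict(list)
    for (u, v) in updates:
        # Hypothetical tiebreak: an update joins the smaller of its two
        # endpoint communities, so each update lands in exactly one domain.
        domains[min(community_of[u], community_of[v])].append((u, v))
    return domains

# Usage: updates touching the same community are batched together.
comm = {1: 0, 2: 0, 3: 1, 4: 1}
print(group_updates_by_domain([(1, 2), (3, 4), (2, 3)], comm))
```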