Transforming and integrating heterogeneous datasets into structured and semantically enriched data models remains a critical challenge in data and knowledge engineering. Addressing this challenge requires a systematic...
ISBN (Print): 9798350383515; 9798350383508
In-network computing (INC) is a new paradigm that allows applications to be executed within the network, rather than on dedicated servers. Conventionally, INC applications have been deployed exclusively on the data plane (e.g., programmable ASICs), offering impressive performance. However, the data plane's efficiency is hindered by limited resources, which can prevent a comprehensive deployment of applications. On the other hand, offloading compute tasks to the control plane, which is underpinned by general-purpose servers with ample resources, provides greater flexibility. However, this approach comes with the tradeoff of significantly reduced efficiency, especially when the system operates under heavy load. To exploit both the efficiency of the data plane and the flexibility of the control plane, we propose Carlo, a cross-plane collaborative optimization framework that supports the network-wide deployment of multiple INC applications across both the control and data planes. Carlo first analyzes the resource requirements of various INC applications on the different planes. It then formulates mathematical models for cross-plane resource allocation and automatically generates solutions using the proposed algorithms. We have implemented a prototype of Carlo on Intel Tofino ASIC switches and DPDK. Experimental results demonstrate that Carlo can compute solutions in a short time while avoiding the performance degradation caused by the deployment scheme.
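The abstract does not spell out Carlo's allocation algorithm, but the knapsack-style tradeoff it describes can be illustrated. Below is a minimal Python sketch; the names (App, place, dp_resource, cp_cost, dp_budget) and the greedy heuristic are our assumptions for illustration, not Carlo's published method: applications with the highest control-plane cost per unit of data-plane footprint are pinned to the switch until its resource budget is exhausted, and the rest fall back to the control plane.

```python
"""Hypothetical sketch of cross-plane INC placement (not Carlo's actual algorithm)."""
from dataclasses import dataclass

@dataclass
class App:
    name: str
    dp_resource: int   # data-plane resource units required (assumed abstraction)
    cp_cost: float     # relative cost of serving this app on the control plane

def place(apps: list[App], dp_budget: int) -> dict[str, str]:
    """Return a plane assignment ('data' or 'control') for every app."""
    placement = {a.name: "control" for a in apps}
    # Prefer apps whose control-plane cost is high relative to their footprint.
    for app in sorted(apps, key=lambda a: a.cp_cost / a.dp_resource, reverse=True):
        if app.dp_resource <= dp_budget:
            placement[app.name] = "data"
            dp_budget -= app.dp_resource
    return placement

if __name__ == "__main__":
    apps = [App("cache", 40, 9.0), App("lb", 25, 4.0), App("telemetry", 50, 2.5)]
    print(place(apps, dp_budget=70))
    # {'cache': 'data', 'lb': 'data', 'telemetry': 'control'}
```

A real formulation would of course be an integer program over multiple switch resources (SRAM, stages, ALUs) and network-wide placement; the sketch only conveys the efficiency-versus-flexibility tradeoff the abstract describes.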
This paper details the process of developing the first native large generative language model for the North Germanic languages, GPT-SW3. We cover all parts of the development process, from data collection and processi...
ISBN (Digital): 9781665468800
ISBN (Print): 9781665468800
This study explores the potential of utilizing GPS data for Activity-Based models, under the condition that no additional information, such as travel diaries, is required. To extract activity details, we first developed a Time-Spatial Centroid-based Clustering algorithm to identify activity locations and times. A Home Detection algorithm was then used in combination with two Google Maps APIs, namely Nearby Search and Place Details, to label each identified activity as either `home' or a specific activity type. Next, a Markov Chain Multinomial Logit choice model was developed for the extracted activities that captures the sequential relationship between consecutive activities. The approach was applied to a GPS dataset collected in Japan in 2020. The estimated parameters revealed how background factors, such as activity time and the person's age, as well as the previous activity, are associated with the current activity. Thus, GPS data alone can provide certain knowledge about activity-travel behavior, which can potentially benefit travel demand forecasting practice.
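The paper's Time-Spatial Centroid-based Clustering algorithm is not reproduced in the abstract; the Python sketch below shows a classic stay-point detector in the same spirit, grouping consecutive GPS fixes that stay within a spatial radius for a minimum dwell time. The function names and the thresholds (radius_m, min_dwell_s) are assumptions for illustration, not the authors' implementation.

```python
"""Illustrative stay-point detection over a time-ordered GPS track."""
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in meters."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def stay_points(track, radius_m=200.0, min_dwell_s=300.0):
    """track: list of (timestamp_s, lat, lon), time-ordered.
    Returns (centroid_lat, centroid_lon, t_start, t_end) per detected activity."""
    points, i, n = [], 0, len(track)
    while i < n:
        j = i + 1
        # Extend the cluster while fixes stay within radius_m of the anchor fix.
        while j < n and haversine_m(track[i][1], track[i][2],
                                    track[j][1], track[j][2]) <= radius_m:
            j += 1
        cluster = track[i:j]
        if cluster[-1][0] - cluster[0][0] >= min_dwell_s:
            lat = sum(p[1] for p in cluster) / len(cluster)
            lon = sum(p[2] for p in cluster) / len(cluster)
            points.append((lat, lon, cluster[0][0], cluster[-1][0]))
        i = j
    return points

if __name__ == "__main__":
    # Ten fixes dwelling ~9 minutes near one spot, then a jump ~7 km away.
    track = [(t * 60, 35.0 + 1e-4 * (t % 2), 139.0) for t in range(10)]
    track += [(700, 35.05, 139.05)]
    print(stay_points(track))  # one detected activity location
```

The detected centroids and times would then feed the home-detection and activity-labeling steps the abstract describes.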
Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data. These approaches have demonstrated an enhancement in performance when the predictio...
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. In many domains, adversarial training has proven to be one of the most promising methods to reliably improve ...
In recent years, as the adverse impact of rising sea levels has become increasingly significant, the issue of environmentally displaced persons (EDPs) and their unique cultures has come into view. Resulting from ...
Rough set theory is a granular computing model mainly used for knowledge discovery. How to quantify the uncertainty, information content, and relationships between knowledge is an important aspect of rough set research. Alt...
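As a concrete illustration of the quantities involved (not taken from the paper), the Python sketch below computes the lower and upper approximations of a target set under an equivalence relation induced by an attribute, along with the classical rough accuracy |lower| / |upper| as a simple uncertainty measure.

```python
"""Minimal rough-set sketch: approximations of a target set."""

def partition(universe, key):
    """Group objects into equivalence classes by an attribute function."""
    classes = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)
    return list(classes.values())

def approximations(universe, key, target):
    """Return the (lower, upper) approximations of `target`."""
    lower, upper = set(), set()
    for block in partition(universe, key):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper

# Toy example: objects 0-5, equivalent when they share the same parity.
U = set(range(6))
low, up = approximations(U, key=lambda x: x % 2, target={0, 2, 3, 4})
print(low, up)                    # {0, 2, 4}  {0, 1, 2, 3, 4, 5}
print(len(low) / len(up))         # rough accuracy = 0.5
```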
The rise of Industry 4.0 and, therefore, the integration of the Internet of Things (IoT), cloud computing, data processing, and analytics into factory infrastructures has enabled the possibility to automatically coll...
The recently proposed self-supervised learning (SSL) approaches successfully demonstrate the great potential of supplementing learning algorithms with additional unlabeled data. However, it remains unclear whether the existing SSL algorithms can fully utilize the information of both labeled and unlabeled data. This paper gives an affirmative answer for the reconstruction-based SSL algorithm (Lee et al., 2020) under several statistical models. While the existing literature focuses only on establishing the upper bound of the convergence rate, we provide a rigorous minimax analysis and justify the rate-optimality of the reconstruction-based SSL algorithm under different data-generation models. Furthermore, we incorporate reconstruction-based SSL into existing adversarial training algorithms and show that learning from unlabeled data helps improve robustness.
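For readers unfamiliar with the terminology, the standard minimax criterion behind such rate-optimality claims can be written as follows; the loss \(\ell\), model class \(\mathcal{P}\), and rate \(r_n\) are generic placeholders here, not the paper's specific choices.

```latex
% Minimax lower bound over the model class \mathcal{P}:
\[
  \inf_{\hat{f}} \sup_{P \in \mathcal{P}}
    \mathbb{E}_{P}\!\left[ \ell\bigl(\hat{f}, f_{P}\bigr) \right]
  \;\asymp\; r_n .
\]
% An estimator \hat{f}_{\mathrm{SSL}} is rate-optimal if its worst-case risk
% matches this bound up to constants:
\[
  \sup_{P \in \mathcal{P}}
    \mathbb{E}_{P}\!\left[ \ell\bigl(\hat{f}_{\mathrm{SSL}}, f_{P}\bigr) \right]
  \;\lesssim\; r_n .
\]
```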