This survey aims to identify hazards associated with the Internet of Things (IoT), focusing on malware that can infiltrate various devices, applications, and systems within the Industrial Internet of Things (IIoT). Su...
The turnstile data stream model offers the most flexible framework where data can be manipulated dynamically, i.e., rows, columns, and even single entries of an input matrix can be added, deleted, or updated multiple times in a data stream. We develop a novel algorithm for sampling rows a_i of a matrix A ∈ ℝ^{n×d}, proportional to their ℓ_p norm, when A is presented in a turnstile data stream. Our algorithm not only returns the set of sampled row indexes, it also returns slightly perturbed rows ã_i ≈ a_i, and approximates their sampling probabilities up to ε relative error. When combined with preconditioning techniques, our algorithm extends to ℓ_p leverage score sampling over turnstile data streams. With these properties in place, it allows us to simulate subsampling constructions of coresets for important regression problems to operate over turnstile data streams with very little overhead compared to their respective off-line subsampling algorithms. For logistic regression, our framework yields the first algorithm that achieves a (1 + ε) approximation and works in a turnstile data stream using polynomial sketch/subsample size, improving over O(1) approximations, or exp(1/ε) sketch size of previous work. We compare experimentally to plain oblivious sketching and plain leverage score sampling algorithms for ℓ_p and logistic regression. Copyright 2024 by the author(s)
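The ℓ_p row sampling that this abstract describes can be illustrated by its offline analogue: sample each row with probability proportional to ‖a_i‖_p^p and rescale the sampled rows for unbiasedness, as in standard coreset subsampling. The sketch below is a hypothetical illustration only; the paper's turnstile data structure itself is not reproduced, and the function name and parameters are assumptions.

```python
import numpy as np

def lp_row_sample(A, p=2, k=2, seed=0):
    """Offline sketch: sample k row indices of A proportional to ||a_i||_p^p.

    Hypothetical analogue of the abstract's sampler; the streaming
    (turnstile) machinery is not shown here.
    """
    rng = np.random.default_rng(seed)
    weights = np.sum(np.abs(A) ** p, axis=1)   # ||a_i||_p^p for each row
    probs = weights / weights.sum()            # exact sampling probabilities
    idx = rng.choice(A.shape[0], size=k, replace=True, p=probs)
    # Rescale sampled rows so the subsample is unbiased for ||Ax||_p^p,
    # mirroring coreset-style subsampling constructions.
    scale = (k * probs[idx]) ** (-1.0 / p)
    return idx, A[idx] * scale[:, None], probs[idx]

A = np.array([[3.0, 4.0],
              [0.0, 1.0],
              [1.0, 1.0]])
idx, rows, pr = lp_row_sample(A, p=2, k=2)
```

For p = 2 the row weights here are 25, 1, and 2, so the first row dominates the sampling distribution; the streaming algorithm in the abstract additionally tolerates perturbed rows ã_i ≈ a_i and approximate probabilities.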
Data subsampling is one of the most natural methods to approximate a massively large data set by a small representative proxy. In particular, sensitivity sampling received a lot of attention, which samples points proportional to an individual importance measure called sensitivity. This framework reduces, in very general settings, the size of data to roughly the VC dimension d times the total sensitivity S, while providing strong (1 ± ε) guarantees on the quality of approximation. The recent work of Woodruff & Yasuda (2023c) improved substantially over the general Õ(ε^{-2}Sd) bound for the important problem of ℓ_p subspace embeddings to Õ(ε^{-2}S^{2/p}) for p ∈ [1, 2]. Their result was subsumed by an earlier Õ(ε^{-2}Sd^{1-p/2}) bound which was implicitly given in the work of Chen & Derezinski (2021). We show that their result is tight when sampling according to plain ℓ_p sensitivities. We observe that by augmenting the ℓ_p sensitivities with ℓ_2 sensitivities, we obtain better bounds, improving over the aforementioned results to optimal linear Õ(ε^{-2}(S + d)) = Õ(ε^{-2}d) sampling complexity for all p ∈ [1, 2]. In particular, this resolves an open question of Woodruff & Yasuda (2023c) in the affirmative for p ∈ [1, 2] and brings sensitivity subsampling into the regime that was previously only known to be possible using Lewis weights (Cohen & Peng, 2015). As an application of our main result, we also obtain an Õ(ε^{-2}μd) sensitivity sampling bound for logistic regression, where μ is a natural complexity measure for this problem. This improves over the previous Õ(ε^{-2}μ^2 d) bound of Mai et al. (2021), which was based on Lewis weights subsampling. Copyright 2024 by the author(s)
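The ℓ_2 sensitivities that this abstract augments with are the leverage scores of the data matrix. The minimal sketch below computes them via a QR factorization and samples rows accordingly; it is a hypothetical illustration of plain ℓ_2 sensitivity sampling only, not the paper's combined ℓ_p + ℓ_2 scheme, and all names and parameters are assumptions.

```python
import numpy as np

def l2_sensitivity_sample(A, k, seed=0):
    """Sample k rows of A proportional to their l2 leverage scores
    (the l2 sensitivities). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(A)             # orthonormal basis of col(A)
    scores = np.sum(Q * Q, axis=1)     # leverage scores; they sum to rank(A)
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=k, replace=True, p=probs)
    weights = 1.0 / (k * probs[idx])   # importance weights for unbiasedness
    return idx, weights, scores

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
idx, w, scores = l2_sensitivity_sample(A, k=3)
```

Each leverage score lies in [0, 1] and their total equals the rank of A, so the total ℓ_2 sensitivity is at most d; this is the quantity the abstract's Õ(ε^{-2}(S + d)) bound trades against the total ℓ_p sensitivity S.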
This research paper examines the integration of Artificial Intelligence-driven Augmented Reality glasses within wired and wireless systems to facilitate real-time multilingual communication. The synergy between AI and...
In the current data-science era, any bulk of data concerned with a specific field needs to be maintained in data centers with the help of rack servers. To reduce storage in the data center, there is a need to implement data deduplicati...
In the distributed computing paradigm, cloud computing provides a wide range of software, platforms, and computing infrastructure. It offers dynamically sharable, elastic resource pools. In this way, cloud computing...
Vehicle use and the concept of a "smart city" are developing quickly. As a result of this progression, the Vehicular Ad-Hoc Network (VANET) is a popular network for inter-vehicular communication. The data gat...
Electroencephalography (EEG) is a time-series signal containing semantic information that can be used to determine human brain activities. Artifacts within EEG data can interfere with the intrinsic distribution of thi...
Nowadays, the ability to process, generate and collect various data in real-time has become more relevant for faster reactions to important events in technical enterprises. The numerous distributed stream processing e...
3D medical image segmentation is an essential task in the medical image field, which aims to segment organs or tumours into different labels. A number of issues exist with the current 3D medical image segmentation tas...