ISBN: 9781510810587 (print)
In this paper, we study randomized reduction methods, which reduce high-dimensional features into low-dimensional space by randomized methods (e.g., random projection, random hashing), for large-scale high-dimensional classification. Previous theoretical results on randomized reduction methods hinge on strong assumptions about the data, e.g., low rank of the data matrix or a large separable margin of classification, which hinder their applications in broad domains. To address these limitations, we propose dual-sparse regularized randomized reduction methods that introduce a sparse regularizer into the reduced dual problem. Under a mild condition that the original dual solution is a (nearly) sparse vector, we show that the resulting dual solution is close to the original dual solution and concentrates on its support set. In numerical experiments, we present an empirical study to support the analysis and we also present a novel application of the dual-sparse regularized randomized reduction methods to reducing the communication cost of distributed learning from large-scale high-dimensional data.
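As a rough illustration of the random projection step mentioned in the abstract above, the following minimal sketch maps a high-dimensional vector into a lower-dimensional space with a Gaussian random matrix and checks that its squared norm is approximately preserved. It does not implement the dual-sparse regularizer. The function names (`random_projection_matrix`, `project`, `sq_norm`), the dimensions, and the seeds are hypothetical choices made for this example and do not come from the paper.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (hypothetical helper)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)  # standard deviation, so the variance is 1/k
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Compute the reduced feature vector z = R x."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R]


def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(a * a for a in v)


# Assumed sizes for illustration: original dimension d, reduced dimension k.
d, k = 500, 400
R = random_projection_matrix(d, k)

# A synthetic feature vector standing in for one high-dimensional example.
data_rng = random.Random(1)
x = [data_rng.uniform(-1.0, 1.0) for _ in range(d)]

z = project(R, x)

# With N(0, 1/k) entries, the expected value of ||R x||^2 equals ||x||^2,
# so this ratio should be close to 1.
ratio = sq_norm(z) / sq_norm(x)
print(len(z), ratio)
```

A classifier would then be trained on the reduced vectors `z` instead of the original features; the scaling by `1/sqrt(k)` is what keeps norms, and hence inner products, approximately unchanged in expectation.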
GPT is a large language model (LLM) derived from natural language processing that can generate human-like text using machine learning. However, these models raise questions about the authenticity and reliability of mate...
The Ministry of Health of Indonesia has referred to pre-eclampsia as one of the most severe diseases affecting women. As a matter of urgency, it is crucial to manage pre-eclampsia cases for disease prevention as a long-t...
Ecology is a branch of biology that studies the interactions and relationships between organisms and their environment. The abundance and distribution of organisms and patterns of biodiversity are of great interest to many ecol...
Vehicle use and the concept of a "smart city" are developing quickly. As a result of this progression, the Vehicular Ad-Hoc Network (VANET) is a popular network for inter-vehicular communication. The data gat...
This paper describes the NYU submission to the CoNLL–SIGMORPHON 2018 shared task on universal morphological reinflection. Our system participates in the low-resource setting of Task 2, track 2, i.e., it predicts morp...
Childhood stunting is a condition anticipated to affect the growth potential of children under the age of five. With numerous studies on stunting having been conducted, stunting datasets are now widely available to...
Because of the growth of the business sector dealing in the distribution of movies, software, music, and other content, a very large amount of content has accumulated. Accordingly, recommendation systems that stimulate user requests for content have become more important. In distribution businesses, accurate content recommendations are required to secure and retain users. To establish a highly accurate recommendation system, the recommended content must be accurately classified. The classification methods mainly used are techniques such as naive Bayes, SGD (stochastic gradient descent), and SVM (support vector machine). If all of the information on recommended subjects is applied in the classification process, high accuracy can be expected, but at the cost of heavy computation, long service times, and low scalability. Given this inefficiency, effective classification that uses the metadata of the content is required. Metadata are expressed in the form of domain concepts, relations, types, and attributes to allow the complicated relations between multimodal data (text, images, and video) to be processed efficiently. Most classification systems use single-modal data to express one piece of knowledge for an item in a domain. Single-modal data are limited in terms of improving classification accuracy because they do not include the useful information provided by different knowledge types. Therefore, in this paper, we propose MMCNet, a deep learning–based multimodal classification model that uses dynamic knowledge. The proposed method consists of a classification model that applies a CNN (convolutional neural network) based on the human learning principle to multimodal data that combines text and image knowledge. By using a Web robot agent, multimodal data are collected from the TMDb (The Movie Database) data set, which includes a variety of single-modal data. In the preprocessing procedures, knowledge integration, knowledge conversion, and knowledge reduction
Spreadsheets contain a lot of valuable data and have many practical applications. The key technology of these practical applications is how to make machines understand the semantic structure of spreadsheets, e.g., identifying cell function types and discovering relationships between cell *** existing methods for understanding the semantic structure of spreadsheets do not make use of the semantic information of cells. A few studies do, but they ignore the layout structure information of spreadsheets, which affects the performance of cell function classification and the discovery of different relationship types of cell *** this paper, we propose a Heuristic algorithm for Understanding the Semantic Structure of spreadsheets (HUSS). Specifically, for improving the cell function classification, we propose an error correction mechanism (ECM) based on an existing cell function classification model [11] and the layout features of *** improving the table structure analysis, we propose five types of heuristic rules to extract four different types of cell pairs, based on the cell style and spatial location *** experimental results on five real-world datasets demonstrate that HUSS can effectively understand the semantic structure of spreadsheets and outperforms corresponding baselines.