ISBN (Print): 9781479975921
Mining the large volume of textual data produced by microblogging services has attracted much attention in recent years. An important preprocessing step of microblog text mining is to convert natural language texts into proper numerical representations. Due to their short length, finding proper representations of microblog texts is nontrivial. In this paper, we propose to build deep network-based models to learn low-dimensional representations of microblog texts. The proposed models take advantage of the semantic relatedness derived from two types of microblog-specific information, namely the retweet relationship and hashtags. Experimental results show that the deep models outperform traditional dimensionality reduction methods such as latent semantic analysis and the latent Dirichlet allocation topic model, and that the use of microblog-specific information helps to learn better representations.
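As a rough illustration of the idea (not the authors' implementation), the sketch below pairs a deep autoencoder with a relatedness term that pulls together the embeddings of microblog texts linked by a retweet or a shared hashtag. The vocabulary size, latent dimensionality, and the toy batch are all assumptions.

```python
# A minimal sketch: a deep autoencoder over bag-of-words vectors, with an
# extra loss term encouraging related microblogs (via retweet or hashtag)
# to have similar latent representations. All sizes are illustrative.
import torch
import torch.nn as nn

INPUT_DIM, LATENT_DIM = 5000, 64  # assumed vocabulary and embedding sizes

class TextAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(INPUT_DIM, 512), nn.ReLU(),
            nn.Linear(512, LATENT_DIM))
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 512), nn.ReLU(),
            nn.Linear(512, INPUT_DIM))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = TextAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch of bag-of-words vectors; rows 0 and 1 are assumed to be
# related through a retweet link or a shared hashtag.
x = torch.rand(4, INPUT_DIM)
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)            # reconstruction term
loss = loss + nn.functional.mse_loss(z[0], z[1])   # relatedness term
loss.backward()
opt.step()
```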
Batik holds profound cultural significance within Indonesia, serving as a tangible expression of the nation's rich heritage and intricate philosophical narratives. This paper introduces the Batik Nitik Sarimbit 120 dataset, originating from Yogyakarta, Indonesia, as a pivotal resource for researchers and enthusiasts alike. Comprising images of 60 Nitik patterns meticulously sourced from fabric samples, this dataset represents a curated selection of batik motifs emblematic of the region's artistic tradition. The Batik Nitik Sarimbit 120 dataset offers a comprehensive collection of 120 motif pairs distributed across 60 distinct categories. By providing such a repository of batik motifs, the dataset facilitates the training and validation of machine learning algorithms, particularly through the use of generative methods. This enables researchers to explore and innovate in the realm of batik pattern generation, fostering new avenues for creativity and expression within this venerable art form. In essence, the Batik Nitik Sarimbit 120 dataset stands as a testament to the collaborative efforts of cultural institutions and academia in preserving and promoting Indonesia's rich batik heritage. Its accessibility and richness make it a valuable resource for scholars, artists, and enthusiasts seeking to delve deeper into the intricate world of Indonesian batik. (c) 2024 The Author(s). Published by Elsevier Inc.
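For readers who want to experiment with the dataset, a minimal loading sketch follows. It assumes, hypothetically, that the released images are organized as one folder per category; the path, resolution, and layout are placeholders, not part of the dataset specification.

```python
# A minimal sketch of loading the motif images for generative-model training,
# assuming (hypothetically) a one-folder-per-class layout on disk.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((64, 64)),   # assumed working resolution
    transforms.ToTensor(),
])

# 120 motif pairs across 60 categories -> 2 images per class folder (assumed)
dataset = datasets.ImageFolder("batik_nitik_sarimbit_120/", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

for images, labels in loader:
    ...  # feed into a generative model (e.g., a GAN or VAE)
    break
```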
Insufficient and imbalanced data samples often prevent the development of accurate deep learning models for manufacturing defect detection. By applying data augmentation methods, including VAE latent-space oversampling, random data generation, and GAN-based multi-modal complementary data generation, we overcome these dataset limitations and achieve Pass/No-Pass accuracies of over 90%.
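A minimal sketch of the latent-space oversampling step, under stated assumptions: a VAE is assumed to be already trained on the scarce defect class, its encoder outputs a mean and log-variance per sample, and synthetic samples are decoded from Gaussian-jittered latent codes. The network sizes and toy data are illustrative, not the paper's setup.

```python
# VAE latent-space oversampling (sketch): encode minority-class samples,
# perturb their latent codes with Gaussian noise, decode to get new samples.
import torch
import torch.nn as nn

LATENT = 32

class VAE(nn.Module):
    def __init__(self, d=784):
        super().__init__()
        self.enc = nn.Linear(d, 2 * LATENT)   # outputs mean and log-variance
        self.dec = nn.Linear(LATENT, d)

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        return mu, logvar

    def decode(self, z):
        return torch.sigmoid(self.dec(z))

vae = VAE()  # assumed already trained on the defect ("No-Pass") class

minority = torch.rand(10, 784)          # toy stand-in for scarce defect samples
with torch.no_grad():
    mu, logvar = vae.encode(minority)
    std = torch.exp(0.5 * logvar)
    # Oversample: five jittered latent codes per real sample, then decode.
    z = mu.repeat(5, 1) + std.repeat(5, 1) * torch.randn(50, LATENT)
    synthetic = vae.decode(z)           # 50 new minority-class samples
```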
Explainable neural models have gained a lot of attention in recent years. However, conventional encoder–decoder models do not capture information regarding the importance of the involved latent variables and rely on a heuristic a priori specification of the dimensionality of the latent space or on its selection based on multiple trainings. In this paper, we focus on the efficient structuring of the latent space of encoder–decoder approaches for explainable data reconstruction and compression. For this purpose, we leverage the concept of Shapley values to determine the contribution of the latent variables to the model's output and rank them in order of decreasing importance. As a result, truncating the latent dimensions to those that contribute most to the overall reconstruction allows a trade-off between model compactness (i.e. dimensionality of the latent space) and representational power (i.e. reconstruction quality). In contrast to other recent autoencoder variants that incorporate a PCA-based ordering of the latent variables, our approach does not require time-consuming training processes and does not introduce additional weights. This makes our approach particularly valuable for compact representation and compression. We validate our approach on the tasks of representing and compressing images as well as high-dimensional reflectance data.
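To make the ranking step concrete, here is a small sketch (not the paper's code) that estimates per-dimension Shapley contributions to reconstruction error by Monte Carlo permutation sampling and then truncates to the top-ranked dimensions. The linear toy decoder and all sizes are assumptions for illustration.

```python
# Shapley-style ranking of latent dimensions (sketch): estimate each
# dimension's average marginal reduction of reconstruction error over
# random permutations, then keep only the most important dimensions.
import numpy as np

rng = np.random.default_rng(0)
D = 8                                  # latent dimensionality (toy)
W = rng.normal(size=(D, 20))           # toy linear decoder weights

def decode(z_masked):
    return z_masked @ W

x = rng.normal(size=(20,))             # target signal to reconstruct
z = rng.normal(size=(D,))              # its (toy) latent code

def error(active):
    """Reconstruction error using only the active latent dimensions."""
    mask = np.zeros(D)
    mask[list(active)] = 1.0
    return np.mean((decode(z * mask) - x) ** 2)

shapley = np.zeros(D)
n_perms = 200
for _ in range(n_perms):
    perm = rng.permutation(D)
    active, prev = set(), error(set())
    for j in perm:
        active.add(j)
        cur = error(active)
        shapley[j] += prev - cur       # marginal error reduction of dim j
        prev = cur
shapley /= n_perms

order = np.argsort(-shapley)           # most important dimensions first
keep = order[:4]                       # truncate to the top-k dimensions
print("importance ranking:", order, "-> keeping", keep)
```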