When the global pandemic struck in 2020, most countries established task forces to meet a challenge that impacted governmental resources. It became apparent that data, intelligence gathering, and both modelling and pr...
ISBN (print): 9781728191843
As the number of graph applications increases rapidly in many domains, new graph algorithms (or queries) have become more important than ever before. The current two-step approach to developing and testing a graph algorithm is very expensive for the trillion-scale graphs required in many industrial applications. In this paper, we propose the concept of graph processing simulation, a single-step approach that generates a graph and processes a graph algorithm simultaneously. It consists of a top-down graph upscaling method called V-Upscaler and a graph processing simulation method following the vertex-centric GAS model called T-GPS. Users can develop a graph algorithm and check its correctness and performance conveniently and cost-efficiently, even for trillion-scale graphs. Through extensive experiments, we have demonstrated that our single-step approach of V-Upscaler and T-GPS significantly outperforms the conventional two-step approach, even though ours runs on a single machine while the conventional approach uses a cluster of eleven machines.
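The vertex-centric GAS (Gather-Apply-Scatter) model the abstract refers to can be illustrated with a minimal sketch. T-GPS itself is not reproduced here; the dictionary-based graph representation and the PageRank-style computation are illustrative assumptions, chosen only to show the gather/apply/scatter split.

```python
# Minimal sketch of the vertex-centric GAS (Gather-Apply-Scatter) model,
# run on a toy three-vertex cycle. The graph encoding and the PageRank
# example are assumptions for illustration, not T-GPS internals.

def gas_pagerank(out_edges, iterations=20, d=0.85):
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    # Precompute reverse adjacency so each vertex can gather from in-neighbours.
    in_edges = {v: [] for v in out_edges}
    for u, nbrs in out_edges.items():
        for v in nbrs:
            in_edges[v].append(u)
    for _ in range(iterations):
        new_rank = {}
        for v in out_edges:
            # Gather: sum contributions from in-neighbours.
            acc = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
            # Apply: combine the gathered value with the damping factor.
            new_rank[v] = (1 - d) / n + d * acc
        # Scatter is implicit here: updated ranks become visible to
        # neighbours in the next superstep.
        rank = new_rank
    return rank

ranks = gas_pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
```

On the symmetric cycle every vertex ends up with rank 1/3, which makes the example easy to check by hand.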
Non-Fungible Tokens (NFTs) are digital assets based on a blockchain, characterized as unique, non-interchangeable cryptographic tokens. To date, research into the NFT marketplace has been relatively li...
Evaluating the readability of text has been a critical step in several applications, ranging from text simplification, learning new languages, and providing school children with appropriate reading material to conveying important medical information in an easily understandable way. A lot of research has been dedicated to evaluating readability on larger bodies of text, like articles and paragraphs, but the application to single sentences has received less attention. In this paper, we explore several machine learning techniques - logistic regression, random forest, Naive Bayes, KNN, MLP, XGBoost - on a corpus of sentences from the English and Simple English Wikipedia. We build and compare a series of binary readability classifiers using extracted features as well as generated all-MiniLM-L6-v2-based embeddings, and evaluate them against standard classification evaluation metrics. To the authors' knowledge, this is the first time this sentence transformer is used in the task of readability assessment. Overall, we found that the MLP models, with and without embeddings, as well as the Random Forest, outperformed the other machine learning algorithms.
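The extracted-features pipeline described above can be sketched end to end: hand-crafted surface features feed a binary classifier. The two features (sentence length, average word length), the tiny corpus, and the from-scratch logistic regression are illustrative assumptions, not the paper's actual feature set or models.

```python
# Toy sketch of a feature-based binary readability classifier:
# surface features + logistic regression trained by SGD. The features,
# corpus, and hyperparameters are assumptions for illustration only.
import math

def features(sentence):
    words = sentence.split()
    avg_word_len = sum(len(w) for w in words) / len(words)
    # Bias term plus two scaled surface features.
    return [1.0, len(words) / 10.0, avg_word_len / 10.0]

def train(samples, labels, lr=0.5, epochs=2000):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

corpus = [
    ("The cat sat on the mat.", 0),          # 0 = simple / readable
    ("He went home after school.", 0),
    ("Epistemological considerations notwithstanding, the committee deliberated interminably.", 1),
    ("The juxtaposition of heterogeneous methodologies complicates reproducibility assessments.", 1),
]
w = train([features(s) for s, _ in corpus], [label for _, label in corpus])

def predict(sentence):
    z = sum(wi * xi for wi, xi in zip(w, features(sentence)))
    return 1 if z > 0 else 0
```

In the paper's actual setup the feature vector would instead be a 384-dimensional all-MiniLM-L6-v2 sentence embedding or a richer set of extracted features.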
ISBN (digital): 9781665490627
ISBN (print): 9781665490627
In the fields of machine learning and data mining, unsupervised feature selection plays an important role in processing large amounts of high-dimensional unlabeled data. This paper proposes an original and novel unsupervised feature selection method based on feature grouping and orthogonal constraints. We consider the domain relationships in the original data and reconstruct the similarity matrix based on the correlations between features. We use a generalized incoherent regression model based on orthogonal constraints. Furthermore, a graph regularization term with local-structure-preservation constraints is added to ensure that the feature subset does not lose local structural features of the original data space. In addition, an iterative algorithm is proposed to solve the optimization problem by iteratively updating the global similarity matrix and constructing the weight matrix, pseudo-label matrix, and transformation matrix. Through experiments on 6 benchmark datasets, the clustering performance of the proposed method outperforms state-of-the-art unsupervised feature selection methods. The source code is available at: https://***/misteru/FGOC.
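The first step described above, building a feature-feature similarity matrix from correlations between features, can be sketched in plain Python. Using absolute Pearson correlation as the similarity measure and the small data matrix below are illustrative assumptions; the paper's full model (incoherent regression, orthogonal constraints, graph regularization) is not reproduced here.

```python
# Sketch of constructing a feature similarity matrix from pairwise
# correlations, the grouping signal described in the abstract. The
# Pearson-based similarity and the toy data are assumptions.
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

# Rows are samples, columns are features.
X = [
    [1.0, 2.0, 10.0],
    [2.0, 4.1, 8.0],
    [3.0, 6.0, 6.5],
    [4.0, 7.9, 4.0],
]
cols = list(zip(*X))  # transpose: one vector per feature
d = len(cols)
# Absolute correlation, so strongly anti-correlated features also
# count as redundant and can be grouped together.
S = [[abs(pearson(cols[i], cols[j])) for j in range(d)] for i in range(d)]
```

High off-diagonal entries of S mark redundant feature pairs that grouping-based selection would collapse into one representative.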
ISBN (print): 9783031232350; 9783031232367
Classifying and recognizing voice pathologies non-invasively using acoustic analysis saves patient and specialist time and can improve the accuracy of assessments. In this work, we intend to determine which models provide better accuracy rates in distinguishing between healthy and pathological voices, to later be implemented in a system for the detection of vocal pathologies. 194 control subjects and 350 pathological subjects distributed across 17 pathologies were used. Each subject has 3 vowels in 3 tones, which is equivalent to 9 sound files per subject. For each sound file, 13 parameters were extracted (jitta, jitter, Rap, PPQ5, ShdB, Shim, APQ3, APQ5, F0, HNR, autocorrelation, Shannon entropy, and logarithmic entropy). For the classification between healthy and pathological, several classifiers were used (Decision Trees, Discriminant Analysis, Logistic Regression Classifiers, Naive Bayes Classifiers, Support Vector Machines, Nearest Neighbor Classifiers, Ensemble Classifiers, Neural Network Classifiers) with various models. For each patient, 118 parameters were used (13 acoustic parameters * 9 sound files per subject, plus the subject's gender). As pre-processing of the input matrix, outliers were treated using the quartile method, then the data were normalized, and finally Principal Component Analysis (PCA) was applied to reduce the dimension. The best model was the Wide Neural Network, with an accuracy of 98% and an AUC of 0.99.
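The pre-processing chain described above can be sketched in plain Python. The quartile method is interpreted here as clipping to [Q1 - 1.5*IQR, Q3 + 1.5*IQR], a common reading of IQR-based outlier treatment, followed by min-max normalization; both that interpretation and the sample jitter values are assumptions, and the PCA step is omitted to keep the sketch self-contained.

```python
# Sketch of the described pre-processing: IQR-based outlier clipping
# followed by min-max normalisation. The exact quartile rule used in
# the paper is an assumption here; sample values are invented.
import statistics

def clip_outliers(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Clip rather than drop, so every subject keeps all 118 parameters.
    return [min(max(v, lo), hi) for v in values]

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

jitter = [0.4, 0.5, 0.45, 0.55, 0.5, 9.0]  # one extreme measurement
clipped = clip_outliers(jitter)
normalised = min_max(clipped)
```

In the full pipeline this would run per column of the 544-subject input matrix before PCA.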
Python software is combined with contrastive analysis, structural analysis, and trend analysis to analyze the current bamboo furniture market, and the Baidu Index is used to analyze the future development trend of ba...
ISBN (digital): 9798331527662
ISBN (print): 9798331527679
Against the background of accelerating globalization, accurate and efficient language translation is very important for cross-cultural communication. In this paper, ATTEBSC (Algorithm for Translation Template Extraction Based on Sentence Comparison) is adopted, aiming to automatically extract and compare translation templates from large-scale text data by combining natural language processing technology and machine learning methods. The specific methods include using a deep learning framework to analyze sentence structure, using syntactic and semantic analysis tools to identify key translation units, and then extracting high-frequency, efficient translation templates through a comparative analysis algorithm. In addition, ATTEBSC introduces a dynamic updating mechanism, which continuously optimizes the translation template library as new data arrive. BLEU (Bilingual Evaluation Understudy) scores in all fields are higher than 0.8, and TER is lower than 30%, which indicates that the translation quality of the machine translation system is high in all fields. The research results show that ATTEBSC has clear advantages in improving translation quality and efficiency, especially when dealing with professional or technical text translation, where it can significantly improve translation accuracy and fluency.
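The BLEU-style evaluation mentioned above can be illustrated with a deliberately simplified version: modified unigram precision with a brevity penalty. Real BLEU combines n-gram precisions up to 4-grams at the corpus level; this single-sentence, unigram-only sketch is an assumption made for brevity.

```python
# Toy sketch of a BLEU-style score: modified unigram precision with a
# brevity penalty. Real BLEU uses n-grams up to 4 and corpus-level
# statistics; this simplification is for illustration only.
import math
from collections import Counter

def bleu1(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Modified precision: each candidate word is credited at most as
    # many times as it appears in the reference.
    overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = overlap / len(cand)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = bleu1("the cat is on the mat", "the cat is on the mat")
```

The clipping in the modified precision is what stops a degenerate candidate like "the the the" from scoring highly against a reference that contains "the" only once.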
Big data processing at the production scale presents a highly complex environment for resource optimization (RO), a problem crucial for meeting the performance goals and budgetary constraints of analytical users. The RO problem is challenging because it involves a set of decisions (the partition count, placement of parallel instances on machines, and resource allocation to each instance), requires multi-objective optimization (MOO), and is compounded by the scale and complexity of big data systems while having to meet stringent time constraints for scheduling. This paper presents a MaxCompute-based integrated system to support multi-objective resource optimization via fine-grained instance-level modeling and optimization. We propose a new architecture that breaks RO into a series of simpler problems, new fine-grained predictive models, and novel optimization methods that exploit these models to make effective instance-level RO decisions well under a second. Evaluation using production workloads shows that our new RO system reduces latency by 37-72% and cost by 43-78% at the same time, compared to the current optimizer and scheduler, while running in 0.02-0.23s.
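The multi-objective optimization (MOO) idea underlying the system can be sketched as filtering candidate resource configurations down to their Pareto front over (latency, cost). The candidate list and the two-objective formulation are illustrative assumptions; the paper's predictive models and scheduling constraints are not reproduced.

```python
# Sketch of the MOO core: keep only Pareto-optimal (latency, cost)
# configurations, i.e. those not dominated by any other candidate.
# The candidate values are invented for illustration.

def pareto_front(points):
    # a dominates b if a is no worse in both objectives and differs,
    # i.e. strictly better in at least one.
    def dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1] and a != b
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (latency_seconds, dollar_cost) per candidate configuration
candidates = [(10.0, 1.0), (6.0, 2.0), (6.0, 5.0), (3.0, 4.0), (9.0, 3.0)]
front = pareto_front(candidates)
```

A scheduler can then pick one point from the front according to the user's latency/cost preference, instead of solving the full trade-off per query.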
ISBN (print): 9783031416781; 9783031416798
Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML or LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how the table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs. Popular table-structure datasets will be published in OTSL format to the community.