ISBN: (print) 9781665482639
This paper describes our approach to automatically identifying paired Discourse Connectives (DCs) in Chinese texts. DCs are terms that connect two text spans and signal the discourse relation between them. Most DCs consist of consecutive words (e.g., as a result); however, paired DCs are composed of non-consecutive words that together signal the discourse relation (e.g., on one hand ... on the other hand). Although paired DCs are not common in English, they are very frequent in Chinese. The contribution of this paper is twofold: First, we propose a methodology for the automatic identification of Chinese paired DCs. Second, we present a new corpus based on the Chinese Discourse Treebank (CDTB) [1] annotated with paired DCs. To identify paired DCs, we experimented with two main approaches: hypothesis testing and supervised machine learning. Although the hypothesis testing approaches led to lower than expected results, the simple machine learning models achieved F-scores between 72.5% and 75.6% with no fine-tuning.
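The abstract does not say which hypothesis tests were used, so the following is only a minimal sketch of one plausible option: a Pearson chi-square test on how often two candidate connective parts co-occur in the same sentence. The function names, the toy sentences, and the choice of 虽然 ... 但是 ("although ... but") as the candidate pair are illustrative assumptions, not taken from the paper or from the CDTB.

```python
def pair_counts(sentences, first, second):
    """Build a 2x2 contingency table of sentence-level occurrences.

    Returns (n11, n10, n01, n00): both parts present, only `first`,
    only `second`, neither.
    """
    n11 = n10 = n01 = n00 = 0
    for s in sentences:
        has_first = first in s
        has_second = second in s
        if has_first and has_second:
            n11 += 1
        elif has_first:
            n10 += 1
        elif has_second:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def chi_square_2x2(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


# Hypothetical toy corpus, for illustration only.
sentences = [
    "虽然下雨了，但是他还是去了。",
    "虽然很累，但是她很开心。",
    "他今天没有来。",
    "天气很好。",
    "但是我不同意。",
    "虽然贵，但是质量好。",
]

table = pair_counts(sentences, "虽然", "但是")
statistic = chi_square_2x2(*table)
```

A larger statistic suggests that the two parts co-occur more often than chance would predict. Whether that is enough to call them a paired DC depends on the significance threshold and corpus size, neither of which is specified here.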
Image caption generation is an emerging field of study that mainly focuses on developing systems that can generate captions for an image. In today's world, image captioning is a very useful tool. Mo...
Newer perspectives toward technology aim to leverage current methods to achieve even greater heights. Machine learning, a game-changer in the real world, helps machines progressively improve their performance. ...
The cloud operating system plays a crucial role in supporting machine learning tasks by offering powerful computing capabilities and abundant resources. This enables the implementation of various complex applications,...
Against the background of developing a compressed-air launcher for a single weapon cartridge, a design of the compressed-air launch system and an internal ballistic calculation simulation model with different drain diameters is esta...
Software development comes with a lot of challenges. Developers face various issues with performance and bugs. These issues grow with the scale of the project and when fewer individuals work on the development. It h...
In the OS kernel, how to allocate disk space is a difficult question. The buddy system is one of the important methods among the various ways of allocating disk space. It can reduce fragmentation. However, the buddy system c...
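The snippet is truncated, so the paper's own allocator is not shown. As a rough illustration of the general buddy-system idea only, the sketch below assumes the textbook scheme: requests are rounded up to a power-of-two block size, and a block's buddy is found by flipping one bit of its offset. The function names are hypothetical.

```python
def round_up_pow2(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def buddy_of(offset, size):
    """Offset of the buddy of the block starting at `offset`.

    Assumes `size` is a power of two and `offset` is a multiple of `size`,
    as in the textbook buddy system.
    """
    return offset ^ size


request = 5                          # units requested (e.g. disk blocks)
block = round_up_pow2(request)       # block size actually handed out
wasted = block - request             # space lost inside the block
```

Because two free buddies can always be merged back into a block of twice the size, free space tends to coalesce. The cost is the unused space inside each rounded-up block, which is `wasted` above.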
In the financial markets, gold is a sought-after precious metal due to its use as jewelry and as a trade commodity. Gold is a significant financial asset that can be utilized for evaluating savings in the financial ma...
ISBN: (print) 9798400708305
Lightning prediction is the task of predicting future lightning areas from historical meteorological observation data. At present, the mainstream methods of lightning prediction include thunderstorm recognition and extrapolation, numerical model prediction, and machine learning. In traditional lightning forecasting, it is difficult to capture the internal relationship between meteorological data and the generation, development, and extinction of lightning through hand-designed equations. There are relatively few machine learning lightning forecasting methods, and their predictions suffer from long time intervals and low accuracy. To address these problems, this paper proposes a gated depthwise separable convolution structure (GDSC) that realizes multi-scale feature fusion. By establishing global connections among local features, it achieves the same effect as an attention mechanism with less computation. A gated memory unit adaptively determines the importance of the features at each scale. GDSC is then used to improve the SimVP network, yielding the SimVP-GDSC model for lightning nowcasting. Experiments on real lightning datasets demonstrate the effectiveness of the SimVP-GDSC model.
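The abstract does not define the GDSC layer itself, so the sketch below does not reproduce it. It only illustrates, under standard textbook definitions, why depthwise separable convolutions are cheap in general, by comparing weight counts against an ordinary convolution (not against attention, which is the comparison the paper makes). Bias terms are ignored, and the channel and kernel sizes are arbitrary example values.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Example sizes chosen purely for illustration.
standard = standard_conv_params(64, 64, 3)
separable = depthwise_separable_params(64, 64, 3)
ratio = standard / separable
```

With these example sizes the separable form needs roughly an eighth of the weights, which is the kind of saving that makes it attractive as a building block for multi-scale feature fusion.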
ISBN: (print) 9798350326970
This study aims to investigate the use of text classification techniques to support the audit of municipal accounts by the Court of Accounts in the context of calculating Total Personnel Expenditure. The study contributes to the discovery of expenses incorrectly classified by municipal managers due to error or fraud, which would be left out of the calculation. It used data obtained from the Court of Accounts of the Municipalities of Goiás, Brazil (TCM). These data were labeled by human experts and then prepared to build expenditure classification models from their descriptions. The TF-IDF algorithm was used for feature engineering, and Support Vector Machines (SVM), Logistic Regression, and Multinomial Naive Bayes were used for classification. The results showed that the proposed method is consistent, having reached F-scores of 0.91 and 0.97 with the SVM algorithm on the binary and multiclass corpora, respectively. Even with a highly unbalanced dataset in the multiclass approach, the performance can be considered very good. As an innovative study, this work seeks to introduce text mining techniques into the analysis of public expenditure on personnel, to solve problems that would otherwise require the effort of many human specialists. The authors understand that there is a practical contribution to the process of analysis of public expenditure on personnel carried out by the Court of Accounts.
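To make the feature-engineering step concrete, here is a minimal pure-Python sketch of TF-IDF weighting. The abstract does not state which TF-IDF variant or tokenizer was used, so the plain `tf * log(N / df)` formula, the whitespace tokenizer, and the three expense descriptions below are all assumptions made for illustration. The classification stage (SVM and the other models) is not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses relative term frequency times log(N / document frequency).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


# Hypothetical expense descriptions, not taken from the TCM data.
docs = [
    "salary payment teachers",
    "salary payment nurses",
    "purchase office supplies",
]
vectors = tf_idf(docs)
```

Terms shared by several descriptions (such as "salary") receive lower weights than terms that single out one description (such as "teachers"), which is what lets a downstream classifier separate expense categories by their wording.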