Vowels in Arabic are optional orthographic symbols written as diacritics above or below letters. In Arabic texts, typically more than 97 percent of written words do not explicitly show any of the vowels they contain; that is to say, depending on the author, genre and field, less than 3 percent of words include any explicit vowel. Although numerous studies have been published on the issue of restoring the omitted vowels in speech technologies, little attention has been given to this problem in papers dedicated to written Arabic technologies. In this research, we present Arabic-Unitex, an Arabic Language Resource, with emphasis on vowel representation and encoding. Specifically, we present two dozen rules formalizing a detailed description of vowel omission in written text. They are typographical rules integrated into large-coverage resources for morphological annotation. For restoring vowels, our resources are capable of identifying words in which the vowels are not shown, as well as words in which the vowels are partially or fully included. By taking these rules into account, our resources are able to compute and restore, for each word form, a list of compatible fully vowelized candidates through omission-tolerant dictionary lookup. In our previous studies, we have proposed a straightforward encoding of taxonomy for verbs (Neme in Proceedings of the International Workshop on Lexical Resources (WoLeR) at ESSLLI, 2011) and broken plurals (Neme and Laporte in Lang Sci, 2013). While traditional morphology is based on derivational rules, our description is based on inflectional ones. The breakthrough lies in the reversal of the traditional root-and-pattern Semitic model into pattern-and-root, giving precedence to patterns over roots. The lexicon is built and updated manually and contains 76,000 fully vowelized lemmas. It is then inflected by means of finite-state transducers (FSTs), generating 6 million forms. The coverage of these inflected forms is extended by forma...
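The omission-tolerant lookup described above can be illustrated with a small sketch. The code below is a hypothetical toy on Latin transliterations, not the Arabic-Unitex FST resources themselves: it strips a made-up vowel set to index a lexicon by consonantal skeleton, then filters candidates against whatever diacritics the input does show.

```python
# Minimal sketch of omission-tolerant dictionary lookup (not the actual
# Arabic-Unitex FST implementation). Vowel diacritics are stripped from each
# fully vowelized lemma to build an index from consonantal skeletons to
# candidate vowelizations; a partially vowelized input matches a candidate
# if its visible vowels agree with that candidate.

VOWELS = set("aui")  # hypothetical stand-in for the Arabic short-vowel diacritics

def skeleton(word):
    """Remove all vowel symbols, keeping only the consonantal skeleton."""
    return "".join(c for c in word if c not in VOWELS)

def compatible(partial, full):
    """True if `partial` can be obtained from `full` by omitting vowels."""
    it = iter(full)
    for c in partial:
        for f in it:
            if f == c:
                break
            if f not in VOWELS:  # skipped a consonant -> mismatch
                return False
        else:
            return False         # `full` exhausted before matching `c`
    return all(f in VOWELS for f in it)  # leftover chars must all be vowels

def build_index(lexicon):
    index = {}
    for lemma in lexicon:
        index.setdefault(skeleton(lemma), []).append(lemma)
    return index

def lookup(index, word):
    """Return fully vowelized candidates compatible with `word`."""
    return [c for c in index.get(skeleton(word), []) if compatible(word, c)]

lexicon = ["kataba", "kutiba", "kutub"]  # 'he wrote', 'was written', 'books'
idx = build_index(lexicon)
print(lookup(idx, "ktb"))    # unvowelized input: all three candidates match
print(lookup(idx, "kutb"))   # partial vowels narrow the candidate list
```

A lookup never fails just because the writer omitted vowels, yet every vowel the writer did include constrains the result, mirroring the behavior the resources are designed to support.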
Neural interfaces of the future will be used to help restore lost sensory, motor, and other capabilities. However, realizing this futuristic promise requires a major leap forward in how electronic devices interface with the nervous system. Next generation neural interfaces must support parallel recording from tens of thousands of electrodes within the form factor and power budget of a fully implanted device, posing a number of significant engineering challenges. In this paper, we exploit sparsity and diversity of neural signals to achieve simultaneous data compression and channel multiplexing for neural recordings. The architecture uses wired-OR interactions within an array of single-slope A/D converters to obtain massively parallel digitization of neural action potentials. The achieved compression is lossy but effective at retaining the critical samples belonging to action potentials, enabling efficient spike sorting and cell type identification. Simulation results of the architecture using data obtained from primate retina ex vivo with a 512-channel electrode array show average compression rates up to ~40 while missing less than 5% of cells. In principle, the techniques presented here could be used to design interfaces to other parts of the nervous system.
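The compression idea, discarding everything but the samples belonging to action potentials, can be caricatured in software. The sketch below is only an illustration of spike-sample retention on synthetic data; the actual architecture performs this in the mixed-signal domain via wired-OR single-slope converters, which this toy does not model.

```python
# Toy illustration of lossy, spike-preserving compression: keep only samples
# whose amplitude exceeds a threshold (putative action-potential samples) and
# store them as (index, value) pairs. A software caricature of the idea, not
# the wired-OR single-slope ADC architecture from the paper.
import random

random.seed(0)
n = 1000
# Synthetic trace: low-amplitude noise with a few large "spikes".
trace = [random.gauss(0, 1) for _ in range(n)]
for t in (100, 400, 700):
    for k in range(5):
        trace[t + k] += 20.0  # inject an action-potential-like deflection

THRESH = 10.0
kept = [(i, v) for i, v in enumerate(trace) if abs(v) > THRESH]

# Each kept sample costs an (index, value) pair, i.e. ~2 words vs 1 word raw.
compression_rate = n / (2 * len(kept))
print(f"kept {len(kept)} of {n} samples, rate ~{compression_rate:.0f}x")

# Every injected spike sample survives compression.
spike_indices = {t + k for t in (100, 400, 700) for k in range(5)}
assert spike_indices <= {i for i, _ in kept}
```

Because spikes are sparse in time, almost all raw samples carry no spike information, which is why rates of this magnitude are achievable while the spike waveforms stay intact.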
A decimal notation satisfies many simple mathematical properties, and it is a useful tool in the analysis of trees. A practical method is presented that compresses the decimal codes while maintaining the fast determination of relations (e.g., ancestor, descendant, brother, etc.). A special node, called a kernel node, including many common subcodes of the other codes, is defined, and a compact data structure using the kernel nodes is presented. Let n (m) be the number of the total (kernel) nodes. It is theoretically proved that encoding a decimal code takes constant time, that the worst-case time complexity of compressing the decimal codes is O(n + m^2), and that the size of the data structure is proportional to m. From experimental results on some hierarchical semantic primitives for natural language processing, it is shown that the ratio m/n becomes extremely small, ranging from 0.047 to 0.13.
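The relation tests that decimal codes make fast can be sketched directly. Below is a generic illustration of Dewey-style decimal codes, not the paper's compressed kernel-node structure: ancestor/descendant reduces to a delimited prefix test, and siblinghood to comparing parent codes.

```python
# Sketch of decimal (Dewey) codes for tree nodes and cheap relation tests.
# A node's code is the path of child indices from the root, so
# ancestor/descendant checks reduce to a delimited prefix test.
def code(path):
    """Build a decimal code from a list of child indices, e.g. [1,2,3] -> '1.2.3'."""
    return ".".join(str(i) for i in path)

def is_ancestor(a, b):
    """True if the node coded `a` is a proper ancestor of the node coded `b`."""
    return b.startswith(a + ".")   # the trailing '.' avoids false prefixes like '1.2' vs '1.25'

def is_sibling(a, b):
    """True if `a` and `b` are distinct children of the same (non-root) parent."""
    return a != b and a.rsplit(".", 1)[0] == b.rsplit(".", 1)[0]

print(is_ancestor(code([1, 2]), code([1, 2, 3])))       # True
print(is_ancestor(code([1, 2]), code([1, 25, 3])))      # False
print(is_sibling(code([1, 2, 3]), code([1, 2, 4])))     # True
```

The paper's contribution is orthogonal to this sketch: it compresses the codes themselves (via kernel nodes holding common subcodes) while preserving exactly these constant-style relation tests.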
The article is devoted to developing a compression method using context modeling of a sequence of bits and wavelet transform, which makes it possible to take into account the specifics and properties of the initial hyperspectral remote sensing data. Two algorithms for compressing hyperspectral data (lossy and lossless) based on wavelet transform are proposed; their distinguishing features are a reduction in the required memory size, acceleration of the search for significant wavelet coefficients using a pyramid of approximating coefficients, and an increase in the compression coefficient. Recommendations for applying these algorithms are formulated. A distinctive feature of the hyperspectral data compression method is the ability to control the compression coefficient through parametric adjustment of the algorithms, application of context modeling, and adaptation to the type of initial data (classical cube or Fourier interferogram). The efficiency of the technique has been confirmed experimentally on examples of compressing classical data and real Fourier interferograms, with compression ratios of 4.1 and 2.4, on par with the best published results, and confirmed analytically by assessing data distortion in the compressed stream.
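The role of significant wavelet coefficients in lossy compression can be shown with a one-level 1-D Haar transform. This toy stands in for the paper's hyperspectral algorithms: zeroing sub-threshold detail coefficients is where the loss (and the compressibility) comes from, and the reconstruction error stays bounded by the threshold.

```python
# Minimal one-level 1-D Haar wavelet illustration of lossy wavelet compression:
# transform, zero out insignificant detail coefficients, reconstruct.
# A toy stand-in for the paper's hyperspectral algorithms.
def haar_forward(x):
    approx = [(a + b) / 2 for a, b in zip(x[::2], x[1::2])]
    detail = [(a - b) / 2 for a, b in zip(x[::2], x[1::2])]
    return approx, detail

def haar_inverse(approx, detail):
    out = []
    for a, d in zip(approx, detail):
        out.extend((a + d, a - d))
    return out

signal = [10.0, 10.2, 10.1, 9.9, 50.0, 50.3, 10.0, 10.1]
approx, detail = haar_forward(signal)

# Keep only "significant" detail coefficients (|d| above a threshold);
# zeroing the rest is where the lossy compression happens.
THRESH = 0.5
detail_c = [d if abs(d) > THRESH else 0.0 for d in detail]

recon = haar_inverse(approx, detail_c)
err = max(abs(a - b) for a, b in zip(signal, recon))
print(f"max reconstruction error: {err:.2f}")  # bounded by the threshold
```

The pyramid of approximating coefficients mentioned above serves to locate the significant coefficients quickly across decomposition levels; this sketch only shows why discarding the insignificant ones is cheap in quality terms.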
A new modular and programmable wireless capsule endoscope is presented in this paper. The capsule system consumes low power and has small physical size. A new image compression algorithm is presented in this paper to ...
Background: With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed, none of them are suitable for compressing RNA sequences together with their secondary structures. This kind of compression not only facilitates the maintenance of RNA data, but also supplies a novel way to measure the informational complexity of RNA structural data, raising the possibility of studying the relationship between the functional activities of RNA structures and their complexities, as well as various structural properties of RNA based on compression. Results: RNACompress employs an efficient grammar-based model to compress RNA sequences and their secondary structures. The main goals of this algorithm are twofold: (1) present a robust and effective way for RNA structural data compression; (2) design a suitable model to represent RNA secondary structure as well as derive the informational complexity of the structural data based on compression. Our extensive tests have shown that RNACompress achieves a universally better compression ratio compared with other sequence-specific or common text-specific compression algorithms, such as GenCompress, WinRAR and gzip. Moreover, a test of the activities of distinct GTP-binding RNAs (aptamers) compared with their structural complexity shows that our defined informational complexity can be used to describe how complexity varies with activity. These results lead to an objective means of comparing the functional properties of heteropolymers from the information perspective. Conclusion: A universal algorithm for the compression of RNA secondary structure, as well as the evaluation of its informational complexity, is discussed in this paper. We have developed RNACompress as a useful tool for academic users. Exten...
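The idea of compressing sequence and structure simultaneously can be sketched by fusing each base with its aligned dot-bracket symbol into one stream. RNACompress itself uses a grammar-based model; zlib below is only a stand-in general-purpose compressor, and the compressed size doubles as a crude informational-complexity measure.

```python
# Toy illustration of compressing an RNA sequence together with its
# dot-bracket secondary structure by fusing them into one symbol stream.
# RNACompress uses a grammar-based model; zlib here is only a stand-in
# general-purpose compressor for the ratio computation.
import zlib

seq    = "GGGAAACCC" * 4   # toy sequence (four repeated hairpin stems)
struct = "(((...)))" * 4   # aligned dot-bracket secondary structure
assert len(seq) == len(struct)

# Fuse the base and structure symbol at each position into one combined symbol,
# so one compressor models both information sources jointly.
fused = "".join(b + s for b, s in zip(seq, struct)).encode()

raw = (seq + struct).encode()
compressed_size = len(zlib.compress(fused, 9))
ratio = len(raw) / compressed_size
print(f"compression ratio on the fused stream: {ratio:.2f}")

# The compressed size itself serves as a crude informational-complexity
# estimate of the structured molecule, in the spirit described above.
print(f"informational-complexity proxy: {compressed_size} bytes")
```

A more regular (lower-complexity) structure compresses further, which is exactly the property the paper exploits to relate structural complexity to functional activity.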
Because local slant stacking increases the data dimension in beam migration, the volume of local slant stacks can be enormous and can obstruct efficient data processing. In addition, a proper beam compression algorithm can reduce the computation of ray tracing and beam mapping. Thus, compressing the local slant stacks with high fidelity can improve the efficiency of beam migration. A new approach is proposed to efficiently compress the local slant stacks. This approach combines the estimation of multiple local slopes based on the structure tensor, to reduce the number of slopes, with a sparse representation of the slant-stacked data via matching pursuit decomposition, to reduce the number of temporal samples. Furthermore, a new algorithm to estimate multiple local slopes based on the second-order structure tensor is proposed to handle intersecting events efficiently. Several data examples indicate that the new compression algorithm requires much less storage, while restoring the significant events and tolerating some random noise. The migration results show that this compression algorithm does not noticeably degrade the quality of the beam migration result; it even makes the migration result clearer by suppressing random-noise smearing.
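Matching pursuit, used above to sparsify the slant-stacked traces, can be shown in a minimal generic form. The dictionary here is deliberately trivial (the standard basis), chosen only so the greedy atom-selection loop stays short; it is a textbook sketch, not the paper's beam-compression code.

```python
# Toy matching pursuit: greedily represent a signal as a sparse combination of
# dictionary atoms, illustrating how a trace with few significant samples can
# be stored as a handful of (atom, coefficient) pairs.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

n = 32
# Trivial dictionary of unit-norm "spike" atoms (the standard basis), kept
# deliberately simple; practical uses employ wavelet-like atoms instead.
atoms = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]

signal = [0.0] * n
signal[5], signal[20] = 3.0, -2.0   # sparse signal: two significant samples

residual = list(signal)
rep = []                            # (atom index, coefficient) pairs
for _ in range(2):                  # two greedy pursuit iterations
    k = max(range(n), key=lambda j: abs(dot(residual, atoms[j])))
    c = dot(residual, atoms[k])
    rep.append((k, c))
    residual = [r - c * a for r, a in zip(residual, atoms[k])]

print(sorted(rep))      # [(5, 3.0), (20, -2.0)]
print(norm(residual))   # 0.0: two atoms represent the signal exactly
```

Storing two pairs instead of 32 samples is the temporal-sample reduction the approach relies on; the structure-tensor slope estimation plays the analogous role along the slope dimension.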
Similarity of sequences is a key mathematical notion for classification and phylogenetic studies in biology. It is currently handled primarily via alignments. However, alignment methods seem inadequate for post-genomic studies, since they do not scale well with data set size and they seem confined to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov complexity, and universality is its most striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness are tested on various data sets, yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM, and mostly at a qualitative level; no comparison among UCD, NCD and CD is available; and no comparison of USM with existing methods, whether based on alignments or not, seems to be available. Results: We experimentally test the USM methodology by using 25 compressors, all three of its known approximations, and six data sets of relevance to molecular biology. This offers the first systematic and quantitative experimental assessment of this methodology, which naturally complements the many theoretical and preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and with alignment-free ones. We may group our experiments into two sets. The first one, performed via ROC (Receiver Operating Characteristic) analysis, ...
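The three approximations share one recipe: replace the uncomputable Kolmogorov complexity with the output size of a real compressor. A standard sketch of NCD using zlib (a generic illustration, not the authors' exact experimental setup):

```python
# Standard NCD sketch: approximate Kolmogorov complexity C(s) by the size of
# a real compressor's output. For strings x, y:
#   NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
# zlib is only one of many possible compressors (the paper tests 25).
import zlib

def C(s):
    """Compressed size of s, a computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(s.encode(), 9))

def ncd(x, y):
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = "ACGT" * 50
b = "ACGT" * 50       # identical to a: concatenation adds little new information
c = "TTTTGGGG" * 25   # different composition: concatenation compresses worse

print(ncd(a, b) < ncd(a, c))  # similar strings score lower
```

The intuition: if y contains little information beyond x, a compressor that has just seen x encodes y almost for free, so C(xy) barely exceeds C(x) and the dissimilarity is small.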
In recent years, different types of Residual Neural Networks (ResNets, for short) have been introduced to improve the performance of deep Convolutional Neural Networks. To cope with the possible redundancy of the layer structure of ResNets, and to use them on devices with limited computational capabilities, several tools for exploring and compressing such networks have been proposed. In this paper, we provide a contribution in this setting. In particular, we propose an approach for the representation and compression of a ResNet based on the use of a multilayer network, a structure sufficiently powerful to represent and manipulate a ResNet as well as other families of deep neural networks. Our compression approach uses a multilayer network to represent a ResNet and to identify its possibly redundant convolutional layers. Once such layers are identified, it prunes them, along with some related ones, to obtain a new, compressed ResNet. Experimental results demonstrate the suitability and effectiveness of the proposed approach.
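One crude way to flag a redundant residual block, absent the paper's multilayer-network machinery, is to look for a residual branch whose weights contribute almost nothing, so the block reduces to an identity map and can be pruned. The layer names, weight vectors and threshold below are all hypothetical illustrations of that heuristic.

```python
# Crude illustration of redundancy-driven pruning in a residual stack: a
# block whose residual branch has near-zero weight norm acts as an identity
# map and is a pruning candidate. This simple heuristic is only a stand-in
# for the multilayer-network analysis described in the paper.
import math

def l2(ws):
    """Euclidean norm of a flat weight vector."""
    return math.sqrt(sum(w * w for w in ws))

# Hypothetical per-block weight vectors of a tiny ResNet-like stack.
layers = {
    "block1": [0.9, -1.2, 0.4],
    "block2": [1e-4, -2e-4, 5e-5],  # near-zero branch: candidate for pruning
    "block3": [0.7, 0.3, -0.8],
}

THRESH = 1e-2
pruned = {name: ws for name, ws in layers.items() if l2(ws) > THRESH}
print(sorted(pruned))  # ['block1', 'block3']
```

In the compressed network, the skip connection carries the signal past the removed block unchanged, which is why residual architectures tolerate this kind of pruning particularly well.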
In recent years, numerous smart meters have been widely installed to aggregate time-series engineering parameters in the field; this has led to problems of handling big data. The huge volumes of data need to be transmitted, stored, processed and retrieved. Storing and accessing these big data have become expensive in time, space and bandwidth. The aim of the study is to find a solution to these problems. The solution developed in the study is to compress/decompress the engineering parameters. The data format of the variables has three portions: a 128-bit Globally Unique Identifier (GUID), a 64-bit time stamp, and a 64-bit floating-point value. Three encoding/decoding algorithms have been applied and implemented. The approaches have reduced the original historical data size, and thus the storage cost, by 40%. The algorithms' performance has been measured in terms of compression ratio, saving percentage, and compression/decompression time and speed. The decompression process proved faster than the compression process on the historical data.
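One plausible shape for such an encoder, given the 64-bit time stamp and 64-bit value fields described, is delta-encoding the timestamps before handing the stream to a general-purpose compressor. The record layout, sample data and ratios below are illustrative assumptions, not the paper's three algorithms.

```python
# Sketch of one plausible encoding for (timestamp, value) meter records:
# delta-encode the regular 64-bit timestamps, then apply a general-purpose
# compressor. Layout and numbers are illustrative assumptions only.
import struct
import zlib

# Hypothetical meter readings: one sample per second, slowly varying values.
records = [(1_600_000_000 + i, 230.0 + 0.01 * (i % 5)) for i in range(1000)]

# Raw layout: one little-endian int64 timestamp + float64 value per record.
raw = b"".join(struct.pack("<qd", t, v) for t, v in records)

# Delta-encode timestamps: a constant sampling interval collapses to a run
# of identical small deltas, which compresses extremely well.
deltas = [records[0][0]] + [b[0] - a[0] for a, b in zip(records, records[1:])]
encoded = b"".join(struct.pack("<q", d) for d in deltas)
encoded += b"".join(struct.pack("<d", v) for _, v in records)

compressed = zlib.compress(encoded, 9)
ratio = len(raw) / len(compressed)
saving = 1 - len(compressed) / len(raw)
print(f"compression ratio {ratio:.1f}, saving {saving:.0%}")
```

Grouping the deltas and the values into separate runs before compression keeps each run homogeneous, which is the main reason a generic compressor can reach savings in the range the study reports.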