ISBN (print): 9783319281216; 9783319281209
Text classification is one of the main problems in big data analysis and research. At present, however, there is no universal algorithmic model that meets the requirements of both accuracy and efficiency in text classification. This paper proposes a text classification method that combines Naive Bayes with a similarity computing algorithm. First, the text is segmented into word vectors by the Paoding Analyzer; then the Bayesian algorithm performs first-level directory classification of the text; after that, an improved similarity computing algorithm performs second-level directory classification. Finally, the model is tested on real data, and its results are compared with those of the Bayesian algorithm and the similarity computing algorithm used individually. The results show that the proposed method achieves a higher precision rate than either algorithm alone.
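The two-level scheme described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: whitespace tokenization stands in for the Paoding Analyzer, a standard multinomial Naive Bayes with add-one smoothing handles the first level, and plain cosine similarity against a summed term-frequency profile of each second-level category stands in for the paper's unspecified "improved" similarity algorithm. All names (`tokenize`, `NaiveBayes`, `TwoLevelClassifier`) and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical stand-in for the Paoding Analyzer: lowercase whitespace split.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)          # documents per class (priors)
        self.word_counts = defaultdict(Counter)    # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(docs)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log P(class) + sum of smoothed log P(token | class)
            score = math.log(n / self.n_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / (total + v))
            if score > best_score:
                best, best_score = label, score
        return best


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors (Counters)."""
    dot = sum(count * b[t] for t, count in a.items() if t in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TwoLevelClassifier:
    def fit(self, docs, first_labels, second_labels):
        self.nb = NaiveBayes().fit(docs, first_labels)
        # Summed term-frequency profile for each second-level category,
        # grouped under its first-level category.
        self.profiles = defaultdict(lambda: defaultdict(Counter))
        for doc, l1, l2 in zip(docs, first_labels, second_labels):
            self.profiles[l1][l2].update(tokenize(doc))
        return self

    def predict(self, doc):
        l1 = self.nb.predict(doc)                  # level 1: Naive Bayes
        vec = Counter(tokenize(doc))
        candidates = self.profiles[l1]             # only subcategories of l1
        l2 = max(candidates, key=lambda c: cosine(vec, candidates[c]))
        return l1, l2


docs = [
    "goal match striker score",
    "tennis serve racket match",
    "stock market shares price",
    "bank loan interest rate",
]
first = ["sports", "sports", "finance", "finance"]
second = ["football", "tennis", "stocks", "banking"]
clf = TwoLevelClassifier().fit(docs, first, second)
print(clf.predict("striker scores a goal"))  # ('sports', 'football')
```

In this sketch the second stage only compares the document against the profiles of subcategories under the predicted first-level category, rather than against every subcategory, which is one plausible reading of how the combination trades off accuracy and cost.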
ISBN (print): 0769524737
Most existing algorithms for protein similarity comparison focus on sequence comparison and structure comparison. However, the 3D structure of a protein is essentially determined by the force field generated by all of its atoms, so similar force fields imply similar 3D structures. In this paper, we propose a novel approach to comparing the similarity of proteins' force fields. First, we use a blurred map to sample the force field into volumetric data sets. Second, each volume data set is resampled to a unified resolution. Third, the data set is band-pass filtered and quantized to reveal its physical attributes. The resulting voxels are then normalized into a canonical coordinate system with respect to the center of mass and scale. Next, a series of uniformly spaced concentric shells is constructed around the center of mass, and spherical harmonic analysis (SHA) is applied on these shells. The SHA coefficients are used to form rotation-invariant spectrum descriptors, which measure the similarity between two data sets. The algorithm has been applied to a set of proteins taken from the PDB, and the preliminary results are encouraging.
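The shell-plus-SHA step can be sketched with NumPy as below. This is a minimal sketch, not the authors' method: it assumes the input is an already preprocessed voxel grid (the blurred-map sampling, resampling, band-pass filtering, and quantization are omitted), and it forms rotation invariants with the common per-degree energy $\sqrt{\sum_m |a_{lm}|^2}$, since the abstract does not say exactly how the descriptors are built. The RMS-radius scale normalization, the shell radii, trilinear sampling, Gauss-Legendre quadrature, and Euclidean distance are all assumptions, and every function name is hypothetical.

```python
import numpy as np
from math import factorial, pi, sqrt


def assoc_legendre(l_max, x):
    """P[(l, m)](x) for 0 <= m <= l <= l_max via the standard recurrences.
    The Condon-Shortley sign is omitted; it does not change the energies."""
    P = {}
    s = np.sqrt(np.clip(1.0 - x * x, 0.0, None))
    pmm = np.ones_like(x)
    for m in range(l_max + 1):
        if m > 0:
            pmm = pmm * (2 * m - 1) * s            # P_m^m = (2m-1)!! (1-x^2)^(m/2)
        P[(m, m)] = pmm
        if m < l_max:
            P[(m + 1, m)] = x * (2 * m + 1) * pmm
        for l in range(m + 2, l_max + 1):
            P[(l, m)] = ((2 * l - 1) * x * P[(l - 1, m)]
                         - (l + m - 1) * P[(l - 2, m)]) / (l - m)
    return P


def sphere_grid(n_theta=16, n_phi=32):
    """Gauss-Legendre nodes in cos(theta), uniform phi, and quadrature weights."""
    x, w = np.polynomial.legendre.leggauss(n_theta)
    phi = np.arange(n_phi) * 2 * pi / n_phi
    T, F = np.meshgrid(np.arccos(x), phi, indexing="ij")
    W = np.repeat(w[:, None], n_phi, axis=1) * (2 * pi / n_phi)
    return T, F, W


def sh_energies(values, T, F, W, l_max):
    """Rotation-invariant energy per degree l: sqrt(sum_m |a_lm|^2)."""
    P = assoc_legendre(l_max, np.cos(T))
    energies = np.zeros(l_max + 1)
    for l in range(l_max + 1):
        total = 0.0
        for m in range(l + 1):
            norm = sqrt((2 * l + 1) / (4 * pi) * factorial(l - m) / factorial(l + m))
            a_lm = np.sum(values * norm * P[(l, m)] * np.exp(-1j * m * F) * W)
            # For a real function |a_{l,-m}| = |a_{l,m}|, so count m > 0 twice.
            total += abs(a_lm) ** 2 * (1 if m == 0 else 2)
        energies[l] = sqrt(total)
    return energies


def trilinear(vol, pts):
    """Sample vol at fractional voxel coordinates pts[..., 3]; zero outside."""
    shape = np.array(vol.shape)
    base = np.floor(pts).astype(int)
    frac = pts - base
    out = np.zeros(pts.shape[:-1])
    for corner in np.ndindex(2, 2, 2):
        c = np.array(corner)
        idx = base + c
        w = np.prod(np.where(c == 1, frac, 1 - frac), axis=-1)
        inside = np.all((idx >= 0) & (idx < shape), axis=-1)
        idx = np.clip(idx, 0, shape - 1)
        out += np.where(inside, w * vol[idx[..., 0], idx[..., 1], idx[..., 2]], 0.0)
    return out


def shell_descriptor(vol, n_shells=4, l_max=6):
    """Concatenated per-shell SH energies of a (preprocessed) density grid."""
    coords = np.indices(vol.shape).reshape(3, -1).T.astype(float)
    mass = vol.ravel().astype(float)
    center = coords.T @ mass / mass.sum()                  # center of mass
    rms = sqrt(((coords - center) ** 2).sum(axis=1) @ mass / mass.sum())
    T, F, W = sphere_grid()
    dirs = np.stack([np.sin(T) * np.cos(F), np.sin(T) * np.sin(F), np.cos(T)], axis=-1)
    rows = []
    for k in range(1, n_shells + 1):
        r = 1.5 * rms * k / n_shells                       # assumed shell spacing
        rows.append(sh_energies(trilinear(vol, center + r * dirs), T, F, W, l_max))
    return np.concatenate(rows)


def descriptor_distance(d1, d2):
    """Euclidean distance between descriptors (smaller = more similar)."""
    return float(np.linalg.norm(d1 - d2))
```

Rotating a grid by 90 degrees (for example with `np.rot90(vol, 1, axes=(0, 1))`) leaves this descriptor unchanged up to rounding, because the rotation maps the sampling directions onto themselves; for arbitrary rotations the energies are invariant only up to sampling and interpolation error.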