Escherichia coli (E. coli) K12 was sequenced in 1997. The 4,639,221-base pair DNA sequence contains 4288 annotated protein-coding genes, 38 percent of which have no attributed function. One of the major problems in predicting prokaryotic promoters is locating the spacers between the -35 box and the -10 box and between the -10 box and the transcription start site. In this paper, we adopt the expectation maximization (EM) algorithm to locate the promoter regions accurately. A new purine-pyrimidine encoding method is proposed to reduce the dimensions of the training data. Through the choice of coding factor, the heavy demand on systems for both computation and memory space can then be avoided. The most representative features are used for training learning vector quantization networks. The simulation results reveal that the precision of promoter prediction using the proposed coding approach is approximately the same as the precision using the traditional encoding method.
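The abstract does not spell out the encoding itself, so the following is only a minimal sketch of how a purine-pyrimidine encoding can shrink the input dimension. It assumes one binary value per base (purines A and G map to 1, pyrimidines C and T map to 0) and assumes the "traditional" encoding is a four-values-per-base one-hot code. The function names are hypothetical, and the paper's coding factor is not modelled.

```python
# Illustrative sketch only: the paper's actual encoding and coding factor
# are not given in the abstract. Function names are hypothetical.

PURINES = {"A", "G"}      # two-ring bases
PYRIMIDINES = {"C", "T"}  # one-ring bases


def encode_purine_pyrimidine(seq: str) -> list[int]:
    """Map each base to 1 (purine) or 0 (pyrimidine): one value per base."""
    encoded = []
    for base in seq.upper():
        if base in PURINES:
            encoded.append(1)
        elif base in PYRIMIDINES:
            encoded.append(0)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return encoded


def encode_one_hot(seq: str) -> list[int]:
    """Assumed baseline: a four-values-per-base one-hot code (A, C, G, T)."""
    alphabet = "ACGT"
    encoded = []
    for base in seq.upper():
        if base not in alphabet:
            raise ValueError(f"unexpected base: {base!r}")
        encoded.extend(1 if base == letter else 0 for letter in alphabet)
    return encoded


if __name__ == "__main__":
    minus_10_box = "TATAAT"  # consensus -10 box, used here as sample input
    print(len(encode_purine_pyrimidine(minus_10_box)))  # 6 values
    print(len(encode_one_hot(minus_10_box)))            # 24 values
```

Under these assumptions the binary code needs a quarter of the input values of the one-hot baseline, at the cost of no longer distinguishing A from G or C from T.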
Authors:
Yousra Abudaqqa, Ahmed Patel
a School of Computer Science, Centre of Software Technology and Management (SOFTAM), Faculty of Information Science and Technology (FTSM), Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
b School of Computer Science, Centre of Software Technology and Management (SOFTAM), Faculty of Information Science and Technology (FTSM), Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
c Visiting Professor, School of Computing and Information Systems, Faculty of Science, Engineering and Computing, Kingston University, Kingston upon Thames KT1 2EE, United Kingdom
Search engines (SEs) abound, and the rapid growth in the number of users performing online searches on the Web has become a pressing issue. Tens of billions of searches are performed every day, and they typically return many irrelevant results that cost users time and effort. As a result, it has become very difficult for existing Web SEs to provide complete, relevant and up-to-date responses to users' search queries. To overcome this problem, we developed the Distributed Search Engine Architecture (DSEA), a new means of smart information query and retrieval for the World Wide Web (WWW). In DSEA, multiple autonomous search engines, owned by different organizations or individuals, cooperate and act as a single search engine. This paper reports the development of DSEA based on topic-specific specialised search engines. In DSEA, the results for a specific query can be provided by any of the participating search engines, without the user being aware of which one. The main design goal of using topic-specific search engines is to build systems that can be used effectively by a larger number of users simultaneously. Efficient and effective usage with good response is important because it involves leveraging the vast amount of searched data from the World Wide Web by categorising it into condensed, focused, topic-specific results that meet the user's queries. The design model and the development of DSEA adopt a Service Directory (SD) to route queries towards topic-specific document-hosting SEs. The model delivers performance that is acceptable and consistent with the requirements of the users. The evaluation results of the model return a very high priority score, which is associated with the frequency of each keyword.
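The abstract does not describe how the Service Directory is implemented, so the following is only a rough sketch of the routing idea: engines register the topics they cover, and a query is forwarded to every engine whose topic appears among the query's words. The class name, method names and engine names are hypothetical, and the exact-word matching is a simplifying assumption rather than the paper's method.

```python
# Illustrative sketch only: not the DSEA implementation. All names are
# hypothetical, and matching a query word to a topic is a simplification.
from collections import defaultdict


class ServiceDirectory:
    """Maps topics to the topic-specific search engines that cover them."""

    def __init__(self) -> None:
        self._engines_by_topic: dict[str, list[str]] = defaultdict(list)

    def register(self, engine_name: str, topics: list[str]) -> None:
        """Record that an engine hosts documents on the given topics."""
        for topic in topics:
            self._engines_by_topic[topic.lower()].append(engine_name)

    def route(self, query: str) -> list[str]:
        """Return the engines whose topics match any word of the query,
        in first-match order and without duplicates."""
        matched: list[str] = []
        for word in query.lower().split():
            for engine in self._engines_by_topic.get(word, []):
                if engine not in matched:
                    matched.append(engine)
        return matched


if __name__ == "__main__":
    directory = ServiceDirectory()
    directory.register("bio-se", ["genome", "protein"])
    directory.register("law-se", ["patent"])
    print(directory.route("protein patent"))  # ['bio-se', 'law-se']
```

The caller sees a single entry point and never needs to know which participating engine answered, which mirrors the single-search-engine behaviour described above.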