D. Mladenic. Text learning and related intelligent agents. IEEE Expert, July/August 1999.
.... finding the nearest neighbors of a document [1], for improving the precision or recall in information retrieval systems [15, 11], for aid in browsing a collection of documents [3, 8], for the organization of search engine results [19], or for the personalization of search engine results [13]. Most current document clustering approaches are based on the vector space model (also called the bag-of-words model or word space); the dimensions of the vector space are constituted by the important words of the document collection. The respective term or word frequencies (TF) in a given document ....
D. Mladenic. Text learning and related intelligent agents. IEEE Expert, July/August 1999.
.... finding the nearest neighbors of a document [1], for improving the precision or recall in information retrieval systems [2] [3], for aid in browsing a collection of documents [4], for the organization of search engine results [5], and lately for the personalization of search engine results [6]. Most current document clustering approaches work with what is known as the vector space model, where each document is represented by a vector in the term space. The latter generally consists of the keywords important to the document collection. For instance, the respective term or word ....
.... For instance, the respective term or word frequencies (TF) [7] in a given document can be used to form a vector model for this document. In order to discount frequent words with little discriminating power, each term or word can be weighted based on its Inverse Document Frequency (IDF) [7] [6] in the document collection. However, the distribution of words in most real document collections can vary drastically from one group of documents to another. Hence relying solely on the IDF for keyword selection can be inappropriate and can severely degrade the results of clustering and/or any ....
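The TF and IDF weighting described in the excerpts above can be illustrated with a minimal sketch. This is not the method of any of the citing papers: the function name `tf_idf_vectors`, the toy documents, and the specific IDF variant `log(N / df)` are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenized document.

    Hypothetical helper for illustration; `docs` is a list of token lists.
    """
    # One dimension of the vector space per distinct term in the collection.
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(N / df). A term found in every document gets 0.
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequencies within this document
        vectors.append([tf[term] * idf[term] for term in vocab])
    return vocab, vectors


# Toy collection (made up for this example).
docs = [
    "search engine results clustering".split(),
    "document clustering with the vector space model".split(),
    "personalization of search engine results".split(),
]
vocab, vectors = tf_idf_vectors(docs)
```

Each document becomes a vector with one component per vocabulary term, so nearest-neighbor search or clustering can operate on these vectors. As the excerpt cautions, IDF computed over the whole collection may be a poor guide when word distributions differ sharply between groups of documents.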
D. Mladenic, "Text learning and related intelligent agents," IEEE Expert, Jul. 1999.
.... While many learning algorithms have been developed and tested for document analysis and information retrieval applications, there seems to be strong indication that good document representation including feature selection is more important than choosing a particular learning algorithm [69]. Thus in this work our emphasis is on identifying features that best capture the characteristics of a genuine table compared to a non-genuine one. In particular, we introduce a set of novel features which reflect the layout as well as content characteristics of tables. These features are then ....
D. Mladenic. Text-learning and related intelligent agents. IEEE Expert special issue on Applications of Intelligent Information Retrieval, July-August 1999.
No context found.
D. Mladenic. Text learning and related intelligent agents. IEEE Expert, July/August 1999.
No context found.
Dunja Mladenic. Text-learning and related intelligent agents. IEEE Expert Special Issue on Applications of Intelligent Information Retrieval, pages 44--54, July-August 1999.
No context found.
D. Mladenic. Text learning and related intelligent agents. IEEE Expert, July/August 1999.