GLIMPSE

GLIMPSE is a text indexing and retrieval software program originally developed at the University of Arizona by Udi Manber, Sun Wu, and Burra Gopal. It was released under the ISC license in September 2014.

Glimpse
Developer(s): Internet WorkShop
Initial release: September 2014
Stable release: 4.18.7 (source) / 4.18.5 (binary), November 27, 2015
Repository:
Written in: C
Operating system: Cross-platform
Type: Search and index
Website: webglimpse.net

GLIMPSE stands for GLobal IMPlicit SEarch. Many text indexing schemes produce large indexes, usually around 50% of the size of the original text. By contrast, an index created by GLIMPSE is only 2-4% of the size of the original text.

GLIMPSE builds on Agrep, which was also developed at the University of Arizona, and takes much of its inspiration from it. The key difference is that GLIMPSE consults a high-level index first, whereas Agrep scans all of the text on every search.

The basic algorithm is similar to that of other text indexing and retrieval engines, except that each record in the index is huge, spanning multiple files. As in most other engines, the index is searched with a boolean matching algorithm. Once one or more of these large records has matched, Agrep scans them for the exact text requested. This is slower than a traditional, fully indexed approach, but for an individual user the much smaller index is considered worth the trade-off. The approach would not work particularly well for searching across many websites, but it works reasonably well for a single site or a single workstation. A smaller index can also be built more quickly than a full one.
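The two-stage search described above can be sketched in Python. This is a hypothetical illustration, not GLIMPSE's code. The in-memory `files` dictionary, the grouping of files into `blocks`, and the functions `build_block_index`, `candidate_blocks` and `search` are all assumptions. Agrep's scan is replaced by a plain case-insensitive substring test, so none of Agrep's approximate matching is modeled.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric words (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def build_block_index(files: dict[str, str], blocks: list[list[str]]) -> dict[str, set[int]]:
    # Coarse index: map each word to the ids of the blocks whose files contain it.
    index: dict[str, set[int]] = defaultdict(set)
    for block_id, names in enumerate(blocks):
        for name in names:
            for word in tokenize(files[name]):
                index[word].add(block_id)
    return index

def candidate_blocks(index: dict[str, set[int]], num_blocks: int, query: str) -> set[int]:
    # Stage 1: boolean AND over the coarse index; keep blocks containing every query word.
    candidates = set(range(num_blocks))
    for word in tokenize(query):
        candidates &= index.get(word, set())
    return candidates

def search(files: dict[str, str], blocks: list[list[str]],
           index: dict[str, set[int]], query: str) -> list[tuple[str, int, str]]:
    # Stage 2: scan only the files in candidate blocks for the exact text
    # (a plain substring test standing in for Agrep).
    needle = query.lower()
    hits = []
    for block_id in sorted(candidate_blocks(index, len(blocks), query)):
        for name in blocks[block_id]:
            for lineno, line in enumerate(files[name].splitlines(), start=1):
                if needle in line.lower():
                    hits.append((name, lineno, line))
    return hits

# Hypothetical files grouped into two blocks.
files = {
    "notes/a.txt": "meeting at noon\nbring the report",
    "notes/b.txt": "the report is late",
    "src/c.txt": "noon deadline\nfinal report due",
    "src/d.txt": "nothing here",
}
blocks = [["notes/a.txt", "notes/b.txt"], ["src/c.txt", "src/d.txt"]]
index = build_block_index(files, blocks)

print(search(files, blocks, index, "final report"))  # [('src/c.txt', 2, 'final report due')]
print(search(files, blocks, index, "noon report"))   # []
```

The second query shows why the scan stage is needed. Both blocks contain "noon" and "report", so the index cannot rule either one out, yet no line contains the phrase. Coarser blocks mean a smaller index but more text to scan per query, which is the trade-off described in the paragraph above.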

References

    As of January 2020, the original reference links appear to be dead, and GLIMPSE itself can no longer be found on the University of Arizona website. Only Webglimpse still appears to be available, on GitHub, where it is buried deep among other projects and has not been updated since 2015.

    This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.