Adding Ispell support to UdmSearch
==================================

UdmSearch version 3.0 can either store ispell files in the SQL database,
as 2.x versions did, or load ispell files from disk. Currently only the
search frontends (both CGI and PHP) can use ispell data stored in the SQL
database.

When UdmSearch is used with ispell support, all words are normalized by
both the indexer and the search frontend. This makes it possible to find
the same word with different endings. For example, if the words "testing"
or "tests" are found in a document, the indexer stores the word "test"
instead. The search frontend likewise looks for the word "test" if
"testing" or "tests" is given in the search query. Note that this scheme
loses the ability to search for exact word forms, but it usually reduces
the size of the database and makes searching faster.

Only suffixes are supported for now. Prefixes usually change the meaning
of a word: for example, somebody searching for the word "tested" hardly
wants "untested" to be found.

To make UdmSearch support ispell you must specify the Affix and Spell
commands in both indexer.conf and search.htm. Note that you can also store
ispell data in the SQL database, using

    # indexer -L lang -A affix.file

to load affixes and

    # indexer -L lang -D dict.file

to load a dictionary. search.cgi and the PHP frontend can be switched to
use SQL to normalize words by specifying

    IspellMode db

in search.htm. In this case the Affix and Spell commands are not necessary.

Note that search-time ispell support is not implemented in every frontend
yet; it works in search.cgi and the PHP frontend only.

Note that in UdmSearch versions before 3.0.15 the ispell commands MUST be
given after the LocalCharset definition in both search.htm and
indexer.conf.

The format of the commands:

    Affix lang affix.file
    Spell lang dict.file

The first parameter of both commands is a two-letter language abbreviation.
The second one is a file name. File names are relative to the UdmSearch
/etc directory; absolute paths can also be specified. Note that several
languages can be loaded at the same time.
For example,

    Affix en en.aff
    Spell en en.dict
    Affix de de.aff
    Spell de de.dict

will load ispell support for both English and German.

The ispell affix file contains the rules for words and has the following
format:

    flag V:
        E           >       -E,IVE          # As in create > creative
        [^E]        >       IVE             # As in prevent > preventive
    flag *N:
        E           >       -E,ION          # As in create > creation
        Y           >       -Y,ICATION      # As in multiply > multiplication
        [^EY]       >       EN              # As in fall > fallen

The ispell dictionary file contains the words themselves and has a format
like this:

    wop/S
    word/DGJMS
    wordage/S
    wordbook
    wordily
    wordless/P

Note that if you add ispell support to an already existing database,
reindexing is required. Otherwise non-normalized words will not be found
at all.

Checking site against correct spelling
======================================

You may change the word weight factors depending on whether a word is
found in the ispell dictionaries or not. Two indexer.conf commands are
available (both with default value 1):

    IspellCorrectFactor 1
    IspellIncorrectFactor 1

Setting "IspellCorrectFactor" to 0 prevents the indexer from storing
correctly spelled words in the database. Only incorrect words are stored
in the database in this case. You can then easily find the incorrect words
and the corresponding URLs where those words occur. If no ispell files are
used, all words are considered "incorrect".

It is possible that your site contains several rare words which are not in
the ispell dictionaries. You may list such words in a plain text file of
this format (one word per line):

    rare.dict:
    ----------
    webmaster
    intranet
    .......
    www
    http
    ---------

You may also use ispell flags in this file if you know how to :-) This
saves you from writing the same word with different endings, for example
"webmaster" and "webmasters", into the rare words file. You may pick a word
with the same inflection rules from an existing ispell dictionary and just
copy its flags.
For example, the English dictionary has this line:

    postmaster/MS

So webmaster with the MS flags will probably be OK:

    webmaster/MS

Then copy this file to the /etc directory of UdmSearch and add it with a
Spell command, for example:

    Spell en rare.dict

During the next reindexing the new words will be considered correctly
spelled. Only the really incorrect words will remain.
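
To make the role of dictionary flags more concrete, here is a minimal Python
sketch of how suffix rules can expand a flagged dictionary entry into word
forms, and how those forms can be mapped back to the base word. It is only an
illustration and not UdmSearch's actual code. The rules are the V and N rules
from the affix example in this document; the dictionary entries and the
function names (expand, build_index, normalize) are hypothetical.

```python
import re

# Suffix rules taken from the affix example in this document.
# Each rule is (condition on the word ending, text to strip, text to append).
AFFIX_RULES = {
    "V": [("E", "E", "IVE"), ("[^E]", "", "IVE")],
    "N": [("E", "E", "ION"), ("Y", "Y", "ICATION"), ("[^EY]", "", "EN")],
}


def expand(entry):
    """Return all word forms generated by a dictionary line such as 'create/VN'."""
    word, _, flags = entry.upper().partition("/")
    forms = {word}
    for flag in flags:
        for condition, strip, append in AFFIX_RULES.get(flag, []):
            # The condition must match the end of the base word.
            if re.search(condition + "$", word):
                stem = word[: len(word) - len(strip)] if strip else word
                forms.add(stem + append)
    return forms


def build_index(entries):
    """Map every generated form back to its base (normalized) word."""
    index = {}
    for entry in entries:
        base = entry.upper().partition("/")[0]
        for form in expand(entry):
            index.setdefault(form, base)
    return index


# Hypothetical dictionary lines, chosen to exercise the rules above.
DICTIONARY = ["create/VN", "prevent/V", "multiply/N", "fall/N"]
INDEX = build_index(DICTIONARY)


def normalize(word):
    """Return the base word, or None if no dictionary entry covers the word."""
    return INDEX.get(word.upper())


if __name__ == "__main__":
    for w in ["creation", "preventive", "fallen", "webmaster"]:
        print(w, "->", normalize(w))
```

In this sketch a word for which normalize returns None corresponds to what the
text calls an "incorrect" word. Adding a line for it to the dictionary list,
with flags whose rules are defined, makes it and its generated forms count as
correct, which is the idea behind rare.dict.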