Obs.: search for "->" to locate names.
May 2 2002 - Fixed local binarizer bug (see the new variable "eaten" at pbm2bm()). Fixed a tutorial error. Fixed -geometry (thanks -> "groggy"). All names cited by the CHANGELOG were acknowledged by the main documentation (end of CREDITS section).
April 30 2002 - Fixed the hyperlink edit/n. Switched off partial matching during binarization for faster operation (see step_5()). Copied the development files to the net (20020430). Fixed presentation problem. -> Brian G. is trying a win32 port.
April 29 2002 - Added a more informative message for some non-supported file formats (thanks -> "groggy"). As the PAGE behaviour for graymaps is confusing, now it enters three-state automatically after segmentation. Dropped -A (use -allow_pres instead). Removed heuristic 5 from skeleton auto-tune (see the CSP array). Developer's guide fixed.
April 27 2002 - Added clipping to draw_rb(). Added symbol display on HTML windows: use IMG SRC=symbol/n, see an example at mk_pattern_action(). Added dynamic registration of GUI buttons. To see the test button, use the command-line switch -pp_test. Finished 'PATTERN (action)'.
April 27 2002 - Added "avoid-links" feature in segmentation step. Changed reference to hqbin() in clara.c. Reactivated Test button (only for my purposes :-). Wrote pm_fl() in html.c (not used!). Added "threshold factor" and "try avoid links" parameters in Tune tab. Improved balance code (now works fine); other internal changes in preproc.c. Added 3 obs. to book.c [-> Giulio Lunati]
April 26 2002 - Dropped -C (use dmmode= instead). Dropped -S (use RW= instead). Initial window now tries to approach height/width ratio 3/4. Added SL status. Dropped -G (use TC=, SL=, etc). Implemented 'PATTERN (action)' window.
April 25 2002 - Added status tskel_ready. Now all menus (not only those on the menu bar) dismiss when unfocused. Added check to skel() to detect code that changes skeleton params without consisting them.
Added auto-tune checkbox to TUNE (skel). Disabled auto classification of patterns by type. Removed st_auto_tune status (using the old status st_auto instead). Removed skeleton auto-tune entry from TUNE tab (the new checkbox and submit button at TUNE (skel) tab do a much better job).
April 24 2002 - Applied -> Giulio's patch (interpolation and others). Fixed skeleton bug: not calling consist_skel() after some changes of skeleton parameters. Changed skeleton quality coverage criterion, and carefully documented skel_quality(). Fixed skeleton auto-tune bug: not calling pskel().
April 23 2002 - Fixed pp_thresh() (thanks -> Giulio Lunati). Copied the development files to the net (20020423).
April 22 2002 - Finished updating the user's manual. Updated preproc.c (thanks -> Giulio Lunati). Added (naive) double resolution, interpolation to come.
April 20 2002 - Double resolution tests.
April 16 2002 - Copied the development files to the net (20020416). Added translation of long options and a command-line interface to define internal variables (see checkvar() and process_cl()).
April 15 2002 - Removed buttons "bold", "italic" and "test". Finished fixing the tutorial. Added balance checkbox to the tune tab. Fixed a thresholding bug (the code 20020409 was unable to handle PBMs). The link "nullify and remove" is back to the PATTERN tab (so now it's possible to remove one pattern). TUNE tab entries reorganized into the same sequence as the OCR steps. The four binarization methods were collected into one radio at the TUNE tab (manual global, histogram, local and local strong). "How many types" removed from TUNE tab (now the array of types grows automatically when needed). Fixed bugs related to the Debug menu (the PAGE menu became broken and the Debug menu wasn't working). Reorganized various menus and changed some labels. Constant MAX_MT now is 45. Fixed segfault related to "show words" option.
April 9 2002 - Added compositions / /, / a, / o, / A, / O, " s (thanks -> Jeroen Ruigrok).
Changed some hardcoded constants to try avoiding partial matching mistakes (thanks -> Thomas Klausner). Copied the development files to the net (20020402).
April 6 2002 - Applied -> Giulio Lunati's patch (renamed deskew.c to preproc.c, added balance relative black, test button, and code cleanup). Removed deskewer name, as requested. Added "Debug" menu. Added "Show unaligned symbols" feature. Moved "Show lines (geometrical)" to "Debug" menu.
April 2 2002 - Copied the development files to the net (20020402).
April 1 2002 - Added field "bs" to symbol structure to store the base symbol of accents. Support for double quotes enhanced, partially working. Added macro trans().
March 30 2002 - Added lfa reset at rmvotes(). Partial matching restricted to symbol classification (mode 1 at classify()). Fixed a bug on the local binarizer (training from grayscale was not working). Tested -> Giulio's code on a 24-bit padded display (works). Fixed seed search code at pbm2bm(). Some letters were being dropped by the local binarizer due to this bug. Implemented auto-tune of skeleton parameters (global), it's the function tune_skel_global(). Changed the semantics of -a: the st_auto_global status instead of st_auto. Same for "auto tune skeleton parameters" on the TUNE tab. Integrated tune_skel_global() to the engine (see prepare_patterns()) and dropped the old calls to tune_skel(). Linked -> Giulio Lunati's histogram-based thresholder with Clara OCR. It's used by default when converting grayscale to black-and-white.
March 28 2002 - -> Giulio Lunati added support to sparse 24bpp displays. Such displays have a depth of 24bpp, but pixel colors are padded with an additional (unused) byte to provide 32-bit alignment. Added automatic conversion from PBM to PGM on loading, basically to simplify the program at the expense of system memory (now black-and-white images are internally handled as 8bpp graymaps before segmentation).
Fixed a problem concerning partial matches (dashes being recognized as a sequence of dots). Diagnosed and fixed a re-training bug: every time a class was re-trained, class data was lost. Thanks -> De Clarke for reporting it. One rule for validation of dots based on alignment was commented out (see the function recog_validation()). The usage of the term "lineart" was carefully reviewed in the manuals. Removed some test code that was causing problems to spyhole().
March 26 2002 - Added code to handle double quotes (unfinished). Added -o command-line switch and the ability to generate text output (thanks -> De Clarke). Now using rintf() instead of roundf() (thanks -> Thomas Klausner). Copied the development files to the net (20020326).
March 21 2002 - Scanned Anchieta's Grammar, p. 48-117.
March 18 2002 - Fixed a bug at classify(). Copied the development files to the net (20020318). Scanned Anchieta's Grammar, p. 1-48.
March 16 2002 - Finished linking -> Giulio's deskewer. Changed the copyright notice displayed by the GUI. Added the modified version of selthresh.pl (by -> Tyler Akins) to the tarball. Now partial matching can be called by the OCR engine. Robustified cmpln().
March 11 2002 - Fixed some partial matching bugs. Copied the development files to the net (20020311).
March 9 2002 - Partial matching now works.
March 8 2002 - Started writing code to perform partial matching (to solve horizontal links).
March 6 2002 - Fixed a segmentation bug, at new_cl() (thanks -> Giulio Lunati).
March 4 2002 - Trying to link -> Giulio's deskewer with Clara OCR.
February 28 2002 - -> Giulio Lunati contributed some filters.
February 27 2002 - -> Tyler Akins reported various problems.
February 13 2002 - Added auto-classify, as suggested some time ago by -> Adriano Nagelschmidt Rodrigues. Added "Auto classify" entry to the Edit menu. Copied the development files to the net (20020213).
February 11 2002 - Changed the directory scanning code.
Now the PAGE (list) tab sorts the filenames as numbers (e.g. "2.pgm" before "10.pgm").
February 9 2002 - Applied -> Giulio Lunati's patch to handle apostrophe, colon, semicolon and multichar alignment.
February 7 2002 - Added checkboxes "use local binarizer" and "strong mode" to the TUNE tab. Copied the development files to the net (20020207).
February 6 2002 - Local weak threshold tests.
February 5 2002 - Added relaxed mode to spyhole().
February 4 2002 - The local binarizer is now integrated to the main OCR engine in "weak" and "strong" fashions. Added operation mode 4 to spyhole(), it's a faster version of mode 0. Local binarizer measurements and bugfixes.
February 2 2002 - Wrote find_thing() to integrate the local binarizer to the OCR engine. Prepared new_cl() to accept fat bitmaps from find_thing().
January 30 2002 - Replaced the screenshot 7. Copied the development files to the net (20020130).
January 29 2002 - Added more entries to the glossary.
January 28 2002 - Added "expected threshold" parameter to spyhole(). Now the "best" threshold is first searched around the expected threshold. Added to spyhole() a heuristic to compute an alternative, larger threshold (called "next"), based on detection of merging of fragments. It's an attempt to solve segmentation problems (broken symbols). The pixels added by this larger threshold are displayed gray by the spyhole. Added relaxed mode to border_path(). Carefully documented the detection of merging of fragments (referred to as "spare pixels" by the source code), see the large comment before the main loop of spyhole().
January 26 2002 - Destroyed the binding between pattern and symbol ID (field "e" of structure pdesc, it's still there but out of use). Added the concept of pattern transliteration submission (REV_PATT and review_patt()), because this cannot be done anymore through symbol transliteration submission. Changed update_pattern() to accept a bitmap submission, besides symbol submissions.
Added the ability to include a pattern from the bitmap stored by the spyhole buffer (just press the key corresponding to the symbol transliteration while the spyhole is active). Started writing a small glossary specific to Clara OCR. Added initial support for mkdoc.pl to generate the glossary.
January 24 2002 - Received "Que sais-je? - La Terminologie. Noms et Notions" (Alain Rey), sent by -> Daniel Merigoux (Clara OCR is strongly related to dictionaries and vocabularies). -> Ron Young reported more mirror problems.
January 23 2002 - Copied the development files to the net (20020123).
January 22 2002 - Fixed minor bugs at spyhole() and finished a first strategy for choosing per-symbol thresholds. New screenshot spyhole.jpeg added.
January 21 2002 - Added thresholding loop to spyhole().
January 19 2002 - More documentation for border_path(). Adapted border_path() to be used to span connected components. Now spyhole() uses border_path() to select the connected component close to the pointer.
January 18 2002 - Added "What is PBM/PGM/PPM/PNM?" to the FAQ. Split the service avoid() into avoid_geo() and avoid_context(). Moved the context avoidance tests from bmpcmp_skel() to classify_symbol() (in fact, this is a bugfix). More documentation for bmpcmp_skel() and classify(). bmpcmp_skel() now supports the comparison of any given bitmap with a given pattern (feature added to implement the new binarizer). Changed the service classify() to support patterns and any bitmap. Dropped compare_patterns().
January 15 2002 - First implementation of spyhole().
January 14 2002 - Re-scanned CF pages 49-110 to work on a new version of the case study http://www.claraocr.org/cf-test/. Copied the development files to the net (20020115).
December 26 2001 - Finished attaching to each menu item its availability conditions.
December 24 2001 - Implemented availability tests, short help and diagnostics for menu items. -> Charles Davant fixed the mirrors.
December 21 2001 - Implemented the unavailable state for menu items. -> Imre Simon donated 200 CD-R media. -> Erich Mueller reported mirror problems.
December 15 2001 - Detection of extremities tests. Good results for sans-serif. Copied the development files to the net (2001217).
December 14 2001 - Finished detection of extremities. Detached is_extr() from dx().
December 13 2001 - Returned the new motherboard (failure on keyboard detection).
December 10 2001 - Copied the development files to the net (2001210).
December 7 2001 - Purchased a new motherboard and Duron 1GHz CPU to replace an old, damaged board. -> Sergei Andrievskii provided some explanations about Russian and Ukrainian Cyrillic.
December 6 2001 - Implemented the service dx() and the circ* stuff. Separated add-closure() from pixel_mlist(). Added "detect extremities" menu option (PAGE_FATBITS).
December 3 2001 - Copied the development files to the net (2001203).
November 30 2001 - Implemented manager.c test mode (no operator, have_oper==0).
November 29 2001 - Wrote and tested burn_cd(). Added a very crude sound interface to manager.c.
November 28 2001 - Bandwidth tests. It's a hard problem to make the scanner station able to scan and write CDs at the same time.
November 27 2001 - Scanning tests using manager.c. Finished preparing the hardware of the scanner station.
November 26 2001 - Fixed the regeneration of PAGE (symbol) to follow the current symbol when arrow keys are used. Fixed a small webclip-related regeneration problem. Carefully tested the full PGM cycle and zones support. Small documentation updates to reflect the new features. Copied the development files to the net (20011126). Began preparing the scanner station manager (manager.c).
November 24 2001 - Finished the multiple zones stuff.
November 23 2001 - Added "Deskewing" OCR step (currently empty). Changed position of "detect blocks" OCR step. For now using it to activate the CF PGM blockfind.
Added "Segmentation" OCR step (for now, it starts pbm2bm reading from the PGM buffer). Added PBM support to pgmload(). Now the PBM loader pbm2bm() will be used only to perform segmentation, and the same handling applies to both PGM and PBM files.
November 22 2001 - Added zfgetc. Added to the z* internal I/O API the ability of thresholding and "reading" the PGM buffer as if it were a PBM file.
November 21 2001 - Changed draw_zone to support non-rectangular zones. Added zfread and zfwrite in order to prepare the z* internal I/O API to be a more featured I/O selector to support file, compressed file, internal buffer and TCP I/O.
November 20 2001 - Replaced 'button 2' with 'button 3' along the source code. Added handler for mouse button 2. For now, it toggles the max/min view on some windows. Disabled show_hint on waiting_key state. Implemented "Instant thresholding" feature. Added the geometric service inside().
November 19 2001 - Finished reimplementing the CF blockfinder (for now, it can be requested using "C-x p" after loading a PGM file). Changed search_barcode to use the service clusterize(). Added initial support for multiple zones. Copied the development files to the net (20011119). Conformed loadpgm() to the partial execution model.
November 17 2001 - Partially reimplemented the old CF blockfinder. Wrote the service clusterize(), to be used by the new blockfinder and also by the barcode searcher.
November 16 2001 - Now pgmblock.c is linked with Clara OCR. Implemented PGM visualization. Now any mouse buttonpress clears the message line.
November 15 2001 - Scanned pages 1-32 from the Candido de Figueiredo Dictionary using the HR5 scanner and SANE 1.0.4 (600 dpi). Fixed floating-point comparison problems due to inexact binary representation at selthresh.pl (see warning #6 in the script).
November 14 2001 - Added "Show pixel" and "Show pattern type" to the "View" menu. Tests using the HR5 scanner.
The bundled controller (Domex) didn't work for us (defining SANE_DEBUG_UMAX=128, scanimage stops on the message "waiting scanner"). Using a 2940 instead.
November 13 2001 - Fixed background redraw on the junction PAGE/PAGE_OUTPUT. Added mode "PAGE only" (see menu "Options"). When active, the windows "PAGE (output)" and "PAGE (symbol)" become hidden (useful when you need to visualize a larger portion of the scanned image). Fixed a bug on the computation of the bar medium skew.
November 12 2001 - Wrote search_barcode. Added "Search barcode" to the "Edit" menu. Enhanced closure_at (added the parameter u). Implemented the laserbeam. Changed the behaviour of button ZOOM at tab PAGE (now it'll zoom the PAGE window almost always). Copied the development files to the net (20011112).
November 10 2001 - Fixed a symbol selection problem at PAGE window: those black margins on scanned documents (if any) act as a gigantic symbol that contains almost any pixel. As the symbol selection was based only on checking if the pixel is inside the bounding box, the frames were selected instead of the desired symbol (reported by -> Harold van Oostrom). Added isbar() service. Fixed menu placement problem (the status line was being drawn over the PAGE_FATBITS context menu).
November 9 2001 - Enhanced closure_at (no more simple bounding box inclusion, bit state is also tested). Various tests of straight borderlines detection using skewed barcodes.
November 8 2001 - Finished segment extension at closure_border_slines(). Finished pixel extension at closure_border_slines(). Now the window PAGE_FATBITS displays on the message line the closure-relative pixel coordinates, its parameter (correlation or distance to the interpolated line) and slope (after requesting "search straight lines").
November 7 2001 - Finished correlation code at closure_border_slines(). Split "Search straight lines on border" into options "linear" (based on linear distances) and "quadratic" (based on correlation).
November 6 2001 - Fixed display bug (PAGE_FATBITS scrolling). Added "Centralize" and "Search straight lines on border" to the PAGE_FATBITS context menu. Began writing closure_border_slines(). The pbm2cl.c code was reviewed to avoid allocating too much memory on dark pages, or pages with large images. On some tests, the new code allocates 6 times less memory and runs 50% faster.
November 5 2001 - Added all three contributed specfiles to the distribution tarball (not a very good idea, but it's the only thing I can do for now). See the file README.RPM for details. Fixed a small regeneration bug (TUNE_PATTERN window). Copied the development files to the net (20011105).
November 3 2001 - Finished border_path(). 3-bit optimization (see border_path()).
November 1 2001 - A service to compute a border path is available (see border_path()). New context menu: PAGE_FATBITS options (pops up when the mouse button 2 is pressed on the PAGE_FATBITS window). Added "See in fatbits" item to PAGE options menu. Added regeneration control to PAGE_FATBITS to make it easier to implement visualization of pixel-level heuristics. Added "the flea", a visualization feature. If a "flea path" is defined and the fun code is 3, an 'x' will be drawn by the interface walking along the flea path.
October 30 2001 - Began implementation of barcode detection.
October 29 2001 - -> R P Herrold contributed a spec file for RH 7.2.
October 27 2001 - More compression tests. Tried to adapt tic98. Copied the development files to the net (20011026).
October 26 2001 - Tested various remappings to try to achieve better PGM compression rates (code available at pgmblock.c). Beats gzip but not bzip2.
October 24 2001 - Dumper bug fixed (reported by -> Stuart Yeates). Crash when allocating large buffers using alloca (reported by -> Stuart Yeates). Fixed, now using malloc instead. Added checks to review_tr() to refuse entering symbols too large as patterns.
Added the answer to -> Ho Chak Hung to the Developer's Guide.
October 23 2001 - Crash on doubleclicking an anchor at PAGE_LIST (fixed). Fixed an HTML parse problem ('>' as part of the value, reported by -> Stuart Yeates).
October 22 2001 - Applied -> Harold van Oostrom's patch to initialize the grid separation and avoid crashing on some unexpected user actions. -> Harold van Oostrom also contributed a spec file for RH and SuSE.
October 20 2001 - Added "test" target to the Makefile. Added sselect(), fselect() and bselect(). These are required to implement an auto-test feature. A prototype is already available (try "make test"). Fixed a display bug when filling text input fields. Release 0.9.8.
October 18 2001 - Copied the development files to the net (20011018). Fixed an i64 bug.
October 17 2001 - Fixed a bug at recog_validation() (reported by -> Stuart Yeates). Fixed a bug at event.c (zoom- now correctly repositions the page).
October 16 2001 - Skeleton code became another 20% faster due to optimizations at skel() and cb_border() (the W and H parameters, and the i64 optimization flag, are used there to handle 8 pixels at a time when converting to 8bpp). Added BIG_ENDIAN compilation flag (see clara.c), however this is a work in progress.
October 15 2001 - Changed BC to MBB along the manual. Skeleton code became 20% faster due to optimizations at bmpcmp_skel().
October 12 2001 - Gave up using the Brazilian Constitution (BC) files as example. Due to the small clearance, it's hard to obtain good results (a larger resolution should solve the problems). Now trying Manuel Bernardes Branco Dictionary (MBB). Added a new strategy to selthresh.pl (see the variables 'clean' and 'small').
October 8 2001 - Finished "A first OCR Project". Copied the development files to the net (20011008).
October 6 2001 - -> Romeu's RPMs added to the download page. Fixed a pattern comparison bug. Optimized memory copies at bmpcmp_skel.
October 5 2001 - Added ab_mem and test_ab_mem.
Added switches -l and -y to selthresh.pl. Added -T command-line switch. Robustified selthresh.pl.
October 4 2001 - Purchased a Genius HR5 scanner. This scanner is supported by the SANE UMAX backend.
October 3 2001 - -> Romeu Mantovani Jr contributed a clara.spec file (to produce RPMs).
October 2 2001 - Fixed a problem at mk_page_output, reported by -> Laurent-jan.
October 1 2001 - Computed the thresholds for "Exclamations" (Therese of Avila) using selthresh.pl. Began reviewing the documentation to release 0.9.8.
September 28 2001 - Copied the development files to the net (20010928).
September 27 2001 - Tests using -> Emile's binarizer.
September 26 2001 - Uploaded the new results for the Candido de Figueiredo Dictionary. Various optimizations at bmpcmp_pd.
September 25 2001 - Fixed bug at cml.c (the release 20010924 is unable to read dumps). Adopted per-classifier minimum scores. New feature: "re-scan all patterns" ("Edit" menu). Tests using the pixel distance classifier to handle symbols not classified using skeletons (good results, but slow).
September 24 2001 - Copied the development files to the net (20010924). New feature: "Show line (geometrical)" ("View" menu). New feature: "Display boxes instead of symbols" ("Options" menu). As the "View" menu became too large, we've moved some entries to the "Options" menu.
September 21 2001 - Word alignment tests robustified.
September 20 2001 - Tested -> Emile Snider deskewer. Added the pattern bitmaps to the Pattern (types) tab (first step to create a manual baseline adjusting tool). Added (horizontal) support to CELLSPACING attribute of TABLE elements.
September 19 2001 - New feature: "set pattern type" (Edit menu). Fixed some rendering problems when dismissing menus. Changed the structure ptdesc.
September 18 2001 - Four buttons became read-only (alphabet, pattern type, bold and italic). The service enter_wait now supports mode 4 to read a string.
September 17 2001 - Fixed the web interface.
The section "how to use the web interface" became more detailed (thanks -> Erich Mueller). Skeleton parameters are global again, but it's still possible to use the per-pattern behaviour (see the PATT_SKEL compilation macro).
September 15 2001 - Case study based on the recent tests available at http://www.claraocr.org/cf-test/. Copied the development files to the net (20010915).
September 10-15 2001 - Tests using Candido de Figueiredo Dictionary, 4th edition. Various small fixes or adjustments: pgmblock improved, display the type 0 absent symbols, avoidance of common false positives (the classification now must be performed at least two times), alignment problems diagnosed, etc.
September 8 2001 - Fixed: segfaults caused by changing properties of untransliterated symbols (reported by -> Erich Mueller). Copied the development files to the net (20010903).
September 7 2001 - Added "search unexpected mismatches" feature to prepare_patterns and to the Options menu. This is a tool to detect cases where the behaviour of the classifier is bad. Fixed a classifier bug. This bug was producing occasional false positives.
September 6 2001 - -> Bruno Barbieri Gnecco will adapt pgmblock to be used through libgocr. This is a first attempt to make Clara OCR compatible with libgocr.
September 5 2001 - pgmblock, a simple text block locator for PGM files, is working.
September 4 2001 - Added -a command line option. selthresh.pl, a simple script for selecting the best threshold when converting PGM to PBM, is working.
September 3 2001 - -> Terran Melconian reported a bug on the Makefile and contributed a bugfix. Copied the development files to the net (20010903).
August 31 2001 - Added new skeleton heuristic (#6), based on removing the border until only isolated pixels remain.
August 30 2001 - Now it's possible to inhibit the usage of a given classifier for some letters through the pattern types form. The reset service became partially operational.
Added block separation heuristic based on the detection of vertical separation lines. Fixed bad behaviour of from_asc when the string field to read is absent.
August 28 2001 - -> Erich Mueller reported geometry problems on dumps.
August 27 2001 - Added new classifier, based on pixel distances.
August 24 2001 - Changed 'setmode' to 'setview' (thanks -> Bruce Momjian). Tests using FreeBSD.
August 22 2001 - Finished pattern types form.
August 15 2001 - Changed geometry of "tune (skel)" window.
August 14 2001 - Added distance-based skeleton heuristic (#5). Better version of skel_quality, based on distance from the pixel to the border. Skeleton parameters now are per-pattern.
August 13 2001 - Tested -> Adriano's bitmap.
August 10 2001 - First version of skel_quality, based on distance from the pixel to the skeleton.
August 8 2001 - Service compare_patterns finished. Partially fixed the behaviour of "display comparisons".
August 7 2001 - Began implementation of skeleton auto-tune. Fixed a bug in bmpcmp_skel.
August 4 2001 - Manual adjustment of pattern types now works.
August 3 2001 - Started reworking pattern types.
August 2 2001 - -> Nathalie Vielmas told us about blind people's needs.
July 26 2001 - -> Tim McNerney told us about the NIST OCR.
July 18 2001 - Added service dump_cb. Added detection and handling of C-x prefix.
July 16 2001 - Release 0.9.7 (first release announced at large).
July 4 2001 - Release 0.9.6.
June 22 2001 - Release 0.9.5.
** historic - recovered from my agenda **
April 2000 - Version 0.9.c available on the web.
November 11-12 1999 - Showed it to various friends.
September 19 1999 - First version of the Xlib interface.
September 18 1999 - Named it "clara".
February 15 1999 - Tests trying to write a web interface, running as a CGI.
December 18 1998 - Tests trying to write an interface based on GTK.
November 9 1998 - First brute force tests, using manually-built fonts.