2014-06-02: Hunspell 1.3.3 release:
  - OpenDocument (ODF and Flat ODF) support (ODF needs unzip program)
  - various bug fixes

2011-02-02: Hunspell 1.3.2 release:
  - fix library versioning
  - improved manual

2011-02-02: Hunspell 1.3.1 release:
  - bug fixes

2011-01-26: Hunspell 1.2.15/1.3 release:
  - new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
  - bug fixes

2011-01-21:
  - new features: FORCEUCASE and WARN, see manual
  - new options: -r to filter potential mistakes (rare words
    marked with the WARN flag in the dictionary)
  - limited and optimized suggestions

2011-01-06: Hunspell 1.2.14 release:
  - bug fix

2011-01-03: Hunspell 1.2.13 release:
  - bug fixes
  - improved compound handling and
    other improvements supported by OpenTaal Foundation, Netherlands

2010-07-15: Hunspell 1.2.12 release

2010-05-06: Hunspell 1.2.11 release:
  - Maintenance release bug fixes

2010-04-30: Hunspell 1.2.10 release:
  - Maintenance release bug fixes

2010-03-03: Hunspell 1.2.9 release:
  - Maintenance release bug fixes and warnings
  - MAP support for composed characters or character sequences

2008-11-01: Hunspell 1.2.8 release:
  - Default BREAK feature and better hyphenated word suggestion to accept
    and fix (compound) words with hyphen characters in the spell checker
    instead of in the word breaking code of OpenOffice.org. With this
    feature it's possible to accept hyphenated compound words, such as
    "scot-free", where "scot" is not a correct English word.

  - ICONV & OCONV: input and output conversion tables for optional
    character handling or using special inner format. Example:

    # Accepting de facto replacements of the Romanian comma-below letters
    SET UTF-8
    ICONV 4
    ICONV ş ș
    ICONV ţ ț
    ICONV Ş Ș
    ICONV Ţ Ț

    Typical usage of ICONV/OCONV is to manage an inner format for a
    segmental writing system, like the Ethiopic script of the Amharic
    language.

  - Extended CHECKCOMPOUNDPATTERN to handle compound word alternations,
    like the sandhi feature of Telugu and other writing systems.
  - SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and
    Norwegian compound word forms, like tillåta (till|låta) and
    bussjåfør (buss|sjåfør)

  - wordforms: word generator script for dictionary developers (Hunspell
    version of unmunch).

  - bug fixes

2008-08-15: Hunspell 1.2.7 release:
  - FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules
    can strip full words, not only all but one of their characters.

  - COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern
    matching. For example, the en_US dictionary of OpenOffice.org uses
    COMPOUNDRULE for ordinal number recognition: 1st, 2nd, 11th, 12th,
    22nd, 112th, 1000122nd etc.).

  - optimized suggestions:
    - modified 1-character distance suggestion algorithms: try one TRY
      character in all positions before moving on to the next TRY
      character, instead of trying all TRY characters at one character
      position (this can give a more readable suggestion order and better
      suggestions in the first positions when TRY characters are sorted
      by frequency).

      For example, suggestions for "moze":

      ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
      maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).

    - extended compound word checking for better COMPOUNDRULE related
      suggestions, for example English ordinal numbers: 121323th ->
      121323rd (it also needs a th->rd REP definition).
  - bug fixes

2008-07-15: Hunspell 1.2.6 release:
  - bug fix release (fix affix rule condition checking of sk_SK dictionary,
    iconv support in stemming and morphological analysis of the Hunspell
    utility, see also Changelog)

2008-07-09: Hunspell 1.2.5 release:
  - bug fix release (fix affix rule condition checking of en_GB dictionary,
    also morphological analysis by dictionaries with two-level suffixes)

2008-06-18: Hunspell 1.2.4-2 release:
  - fix GCC compiler warnings

2008-06-17: Hunspell 1.2.4 release:
  - add free_list() for C, C++ interfaces to deallocate suggestion lists
  - bug fixes

2008-06-17: Hunspell 1.2.3 release:
  - extended XML interface to use morphological functions through the
    standard spell checking interface, spell() and suggest().
    See hunspell.3 manual page.
  - default dash suggestions for compound words: newword-> new word and new-word
  - new manual pages: hunspell.3, hzip.1, hunzip.1.
  - bug fixes

2008-04-12: Hunspell 1.2.2 release:
  - extended dictionary (dic file) support to use multiple base and
    special dictionaries.

  - new and improved options of command line hunspell:
    -m: morphological analysis or flag debug mode (without affix
        rule data it shows the flags of the affix rules)
    -s: stemming mode
    -D: list available dictionaries and search path
    -d: support extra dictionaries by comma separated list. Example:

        hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt

  - forbidding in personal dictionary (with asterisk; / marks affixation)

  - optional compressed dictionary format "hzip" for aff and dic files
    usage:

    hzip example.aff example.dic
    mv example.aff example.dic /tmp
    hunspell -d example
    hunzip example.aff.hz >example.aff
    hunzip example.dic.hz >example.dic

  - new affix compression tool "affixcompress": compression tool for
    large (millions of words) dictionaries.
  - support encrypted dictionaries for closed OpenOffice.org extensions or
    other commercial programs

  - improved manual

  - bug fixes

2007-11-01: Hunspell 1.2.1 release:
  - new memory efficient condition checking algorithm for affix rules

  - new morphological functions:
    - stem() for stemming
    - analyze() for morphological analysis
    - generate() for morphological generation

  - new demos:
    - analyze: stemming, morphological analysis and generation
    - chmorph: morphological conversion of texts

2007-09-05: Hunspell 1.1.12 release:
  - dictionary based phonetic suggestion for words with
    special or foreign pronunciation or alternative (bad) transliteration
    (see Changelog, tests/phone.* and manual).

  - improved data structure and memory optimization for dictionaries
    with variable count fields

  - bug fixes for Unicode encoding dictionaries and ngram suggestions

  - improved REP suggestions with space: it works without dictionary
    modification

  - updated and new project files for Windows API

2007-08-27: Hunspell 1.1.11 release:
  - portability fixes

2007-08-23: Hunspell 1.1.10 release:
  - pronunciation based suggestion using Björn Jacke's original Aspell
    phonetic transcription algorithm (http://aspell.net), relicensed
    under GPL/LGPL/MPL tri-license with the permission of the author

  - keyboard based suggestion by KEY (see manual)

  - better time limits for suggestion search

  - test environment for suggestion based on Wikipedia data

  - bug fixes for non standard Mozilla platforms etc.
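
The keyboard based suggestion mentioned in the 1.1.10 notes can be pictured with a small sketch. It is not Hunspell's implementation. It assumes that a KEY definition is a string of keyboard rows separated by "|" (the example layout string below is an assumption, see the manual for the real syntax), and it tries each horizontally adjacent key as a replacement for one character. The function names and the tiny word set are hypothetical.

```python
def keyboard_neighbors(key_spec):
    """Map each character to its left/right neighbours inside a KEY row.

    key_spec is assumed to be rows of adjacent keys separated by '|'.
    """
    neighbors = {}
    for row in key_spec.split("|"):
        for i, ch in enumerate(row):
            adjacent = neighbors.setdefault(ch, [])
            if i > 0:
                adjacent.append(row[i - 1])
            if i + 1 < len(row):
                adjacent.append(row[i + 1])
    return neighbors


def key_suggestions(word, dictionary, key_spec="qwertyuiop|asdfghjkl|zxcvbnm"):
    """Suggest dictionary words that differ by one neighbouring-key typo."""
    neighbors = keyboard_neighbors(key_spec)
    found = []
    for pos, ch in enumerate(word):
        for replacement in neighbors.get(ch, []):
            candidate = word[:pos] + replacement + word[pos + 1:]
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Hypothetical mini dictionary, for illustration only.
    print(key_suggestions("wprd", {"word", "ward", "cord"}))
```

Only "word" is reachable from "wprd" here, because "o" is the sole neighbour of "p" in the assumed row, while "ward" would need a non-adjacent key.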

2007-07-25: Hunspell 1.1.9 release:
  - better tokenization:
    - for URLs, mail addresses and directory paths (default: skip these tokens)
    - for colons in words (for Finnish and Swedish)

  - new examples:
    - affixation of personal dictionary words
    - digits in words

  - bug fixes (see ChangeLog)

2007-07-16: Hunspell 1.1.8 release:
  - better Mac OS X/Cygwin and Windows compatibility

  - fix Hunspell's Valgrind environment and memory handling errors detected
    by Valgrind

  - other bug fixes (see ChangeLog)

2007-07-06: Hunspell 1.1.7 release:
  - fix warning messages of OpenOffice.org build

2007-06-29: Hunspell 1.1.6 release:
  - check capitalization of the following word forms
    - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
    - allcap words and suffixes: UNICEF's - UNICEF'S
    - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA

  - suggestion for missing sentence spacing: something.The -> something. The

  - Hunspell executable: improved locale support
    - -i option: custom input encoding
    - use locale data for default dictionary names.
    - tools/hunspell.cxx: fix 8-bit tokenization (letters without
      casing, like ß or Hebrew characters, are now handled well)
    - dictionary search path (automatic detection of OpenOffice.org directories)
    - DICPATH environment variable
    - -D option: show directory path of loaded dictionary

  - patches and bug fixes for Mozilla, OpenOffice.org.

2007-03-19: Hunspell 1.1.5 release:
  - optimizations: 10-100% speed up, smaller code size and memory footprint
    (conditional experimental code and warning messages)

  - extended Unicode support:
    - non BMP Unicode characters in dictionary words and affixes (except
      affix rules and conditions)
    - support BOM sequence in aff and dic files

  - IGNORE feature for Arabic diacritics and other optional characters

  - New edit distance suggestion methods:
    - capitalisation: nasa -> NASA
    - long swap: permenant -> permanent
    - long move: Ghandi -> Gandhi, greatful -> grateful
    - double two characters: vacacation -> vacation
    - spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word)

  - patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua,
    German and Arabic language, etc.

2006-02-01: Hunspell 1.1.4 release:
  - Improved suggestion for typical OCR bugs (missing spaces between
    capitalized words). For example: "aNew" -> "a New".
    http://qa.openoffice.org/issues/show_bug.cgi?id=58202

  - tokenization fixes (fix incomplete tokenization of input texts on
    big-endian platforms, and locale-dependent tokenization of dictionary
    entries)

2006-01-06: Hunspell 1.1.3.2 release:
  - fix Visual C++ compiling errors

2006-01-05: Hunspell 1.1.3 release:
  - GPL/LGPL/MPL tri-license for Mozilla integration

  - Alias compression of flag sets and morphological descriptions.
    (For example, 16 MB Arabic dic file can be compressed to 1 MB.)

  - Improved suggestion.

  - Improved, language independent German sharp s casing with CHECKSHARPS
    declaration.

  - Unicode tokenization in Hunspell program.

  - Bug fixes (at new and old compound word handling methods), etc.

2005-11-11: Hunspell 1.1.2 release:
  - Bug fixes (MAP Unicode, COMPOUND pattern matching, ONLYINCOMPOUND
    suggestions)

  - Checked with 51 regression tests in Valgrind debugging environment,
    and tested with 52 OOo dictionaries on i686-pc-linux platform.
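
The edit distance suggestion methods named in the 1.1.5 notes above (capitalisation, long swap, long move, double two characters) can be illustrated with a short sketch. This is a simplified reading of those four names, not the code used in Hunspell: every function name here is hypothetical, the dictionary is a plain Python set, and no time limits or frequency ordering are modelled.

```python
def capitalisation(word):
    """nasa -> NASA: try the all-capital form."""
    yield word.upper()


def long_swap(word):
    """permenant -> permanent: swap any two (not only adjacent) characters."""
    chars = list(word)
    for i in range(len(chars)):
        for j in range(i + 1, len(chars)):
            if chars[i] != chars[j]:
                swapped = chars[:]
                swapped[i], swapped[j] = swapped[j], swapped[i]
                yield "".join(swapped)


def long_move(word):
    """Ghandi -> Gandhi: take one character out and reinsert it elsewhere."""
    for i in range(len(word)):
        rest = word[:i] + word[i + 1:]
        for j in range(len(rest) + 1):
            if j != i:  # j == i would rebuild the original word
                yield rest[:j] + word[i] + rest[j:]


def doubled_pair(word):
    """vacacation -> vacation: drop one copy of a repeated character pair."""
    for i in range(len(word) - 3):
        if word[i:i + 2] == word[i + 2:i + 4]:
            yield word[:i] + word[i + 2:]


def suggest(word, dictionary):
    """Collect dictionary hits from all four methods, without duplicates."""
    found = []
    for method in (capitalisation, long_swap, long_move, doubled_pair):
        for candidate in method(word):
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    words = {"NASA", "permanent", "Gandhi", "grateful", "vacation"}
    for typo in ("nasa", "permenant", "Ghandi", "greatful", "vacacation"):
        print(typo, "->", suggest(typo, words))
```

Each method only generates candidates. The dictionary lookup is what turns a candidate into a suggestion, which keeps the generators independent of the dictionary format.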

2005-11-09: Hunspell 1.1.1 release:
  - Compound word patterns for complex compound word handling and
    simple word-level lexical scanning. Ideal for checking
    Arabic and Roman numbers, ordinal numbers in English, affixed
    numbers in agglutinative languages, etc.
    http://qa.openoffice.org/issues/show_bug.cgi?id=53643

  - Support ISO-8859-15 encoding for French (French oe ligatures are
    missing from the latin-1 encoding).
    http://qa.openoffice.org/issues/show_bug.cgi?id=54980

  - Implemented a flag to forbid obscene word suggestion:
    http://qa.openoffice.org/issues/show_bug.cgi?id=55498

  - Checked with 50 regression tests in Valgrind debugging environment,
    and tested with 52 OOo dictionaries.

  - other improvements and bug fixes (see ChangeLog)

2005-09-19: Hunspell 1.1.0 release

* complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)

* improved ngram suggestion with swap character detection and
  case insensitivity

------ examples for ngram improvement (input word and suggestions) -----

1. pernament (instead of permanent)

MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
   ornament, ornamentals, ornamental, ornamentally

Hunspell 1.0.9: ornamental, ornament, tournament

Hunspell 1.1.0: permanent

Note: swap character detection

2. PERNAMENT (instead of PERMANENT)

MySpell 3.2: -

Hunspell 1.0.9: -

Hunspell 1.1.0: PERMANENT

3. Unesco (instead of UNESCO)

MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's,
   Frescoed, Fresco, Escorts, Escorting

Hunspell 1.0.9: Genesco, Ionesco, Fresco

Hunspell 1.1.0: UNESCO

4. siggraph's (instead of SIGGRAPH's)

MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's,
   physiography, digraphs, serigraph, stratigraphy's, stratigraphy,
   epigraphs

Hunspell 1.0.9: serigraph's, epigraph's, digraph's

Hunspell 1.1.0: SIGGRAPH's

--------------- end of examples --------------------

* improved testing environment with suggestion checking and memory debugging

  memory debugging of all tests with a simple command:

  VALGRIND=memcheck make check

* lots of other improvements and bug fixes (see ChangeLog)

2005-08-26: Hunspell 1.0.9 release

* improved related character map suggestion

* improved ngram suggestion

------ examples for ngram improvement (O = old, N = new ngram suggestions) --

1. Permenant (instead of Permanent)

O: Endangerment, Ferment, Fermented, Deferment's, Empowerment,
   Ferment's, Ferments, Fermenting, Countermen, Weathermen

N: Permanent, Supermen, Preferment

Note: Ngram suggestions were case sensitive.

2. permenant (instead of permanent)

O: supermen, newspapermen, empowerment, endangerment, preferments,
   preferment, permanent, preferment's, permanently, impermanent

N: permanent, supermen, preferment

Note: new suggestions are also weighted with longest common subsequence,
first letter and common character positions

3. pernemant (instead of permanent)

O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent,
   supernatant, impermanent, semipermanent, impermanently

N: permanent, supernatant, pimpernel

Note: new method also prefers root words over irrelevant affixes
('s, s and ly)

4. pernament (instead of permanent)

O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
   ornament, ornamentals, ornamental, ornamentally

N: ornamental, ornament, tournament

Note: Both ngram methods miss here.

5. obvus (instead of obvious):

O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse,
   obviates, obviate, Travus

N: obvious, obtuse, obverse

Note: new method also prefers common first letters.

6. unambigus (instead of unambiguous)

O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous,
   unambitious, ambiguities, ambiguousness

N: unambiguous, unambiguity, unambitious

7. consecvence (instead of consequence)

O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence,
   consecutiveness's, convenience's, consistences, consistence

N: consequence, consecutive, consecrates

An example in a language with rich morphology:

8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]):

O: Misikédéiben, Pisisedéiben, Misikéiéiben, Pisisekéiben, Misikéiben,
   Misikéidéiben, Misikékéiben, Misikéikéiben, Misikéiméiben, Mississippiiben

N: Mississippiben, Mississippiiben, Misiiben

Note: Suggesting irrelevant affixes was the biggest fault in ngram
   suggestion for languages with a lot of affixes.

--------------- end of examples --------------------

* support twofold prefix cutting

* lots of other improvements and bug fixes (see ChangeLog)

* test Hunspell with 54 OpenOffice.org dictionaries:

source: ftp://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries

testing shell script:
-------------------------------------------------------
for i in `ls *zip | grep '^[a-z]*_[A-Z]*[.]'`
do
    dic=`basename $i .zip`
    mkdir $dic
    echo unzip $dic
    unzip -d $dic $i 2>/dev/null
    cd $dic
    echo unmunch and test $dic
    unmunch $dic.dic $dic.aff 2>/dev/null | awk '{print$0"\t"}' |
    hunspell -d $dic -l -1 >$dic.result 2>$dic.err || rm -f $dic.result
    cd ..
done
--------------------------------------------------------

test result (0 size is o.k.):

$ for i in *_*/*.result; do wc -c $i; done
0 af_ZA/af_ZA.result
0 bg_BG/bg_BG.result
0 ca_ES/ca_ES.result
0 cy_GB/cy_GB.result
0 cs_CZ/cs_CZ.result
0 da_DK/da_DK.result
0 de_AT/de_AT.result
0 de_CH/de_CH.result
0 de_DE/de_DE.result
0 el_GR/el_GR.result
6 en_AU/en_AU.result
0 en_CA/en_CA.result
0 en_GB/en_GB.result
0 en_NZ/en_NZ.result
0 en_US/en_US.result
0 eo_EO/eo_EO.result
0 es_ES/es_ES.result
0 es_MX/es_MX.result
0 es_NEW/es_NEW.result
0 fo_FO/fo_FO.result
0 fr_FR/fr_FR.result
0 ga_IE/ga_IE.result
0 gd_GB/gd_GB.result
0 gl_ES/gl_ES.result
0 he_IL/he_IL.result
0 hr_HR/hr_HR.result
200694989 hu_HU/hu_HU.result
0 id_ID/id_ID.result
0 it_IT/it_IT.result
0 ku_TR/ku_TR.result
0 lt_LT/lt_LT.result
0 lv_LV/lv_LV.result
0 mg_MG/mg_MG.result
0 mi_NZ/mi_NZ.result
0 ms_MY/ms_MY.result
0 nb_NO/nb_NO.result
0 nl_NL/nl_NL.result
0 nn_NO/nn_NO.result
0 ny_MW/ny_MW.result
0 pl_PL/pl_PL.result
0 pt_BR/pt_BR.result
0 pt_PT/pt_PT.result
0 ro_RO/ro_RO.result
0 ru_RU/ru_RU.result
0 rw_RW/rw_RW.result
0 sk_SK/sk_SK.result
0 sl_SI/sl_SI.result
0 sv_SE/sv_SE.result
0 sw_KE/sw_KE.result
0 tet_ID/tet_ID.result
0 tl_PH/tl_PH.result
0 tn_ZA/tn_ZA.result
0 uk_UA/uk_UA.result
0 zu_ZA/zu_ZA.result

In the en_AU dictionary, there is an abbreviation with two dots (`eqn..'),
but `eqn.' is missing. Presumably it is a dictionary bug. MySpell does not
accept it either.

The Hungarian dictionary contains pseudoroots and forbidden words.
Unmunch does not support these features yet, so it also generates bad words.

* check affix rules and OOo dictionaries. Detected bugs in cs_CZ,
es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO, sk_SK and sv_SE dictionaries.

Details:
--------------------------------------------------------
cs_CZ
warning - incompatible stripping characters and condition:
SFX D us ech [^ighk]os
SFX D us y [^i]os
SFX Q os ech [^ghk]es
SFX M o ech [^ghkei]a
SFX J ém ej ám
SFX J ém ejme ám
SFX J ém ejte ám
SFX A oužit up oupit
SFX A oužit upme oupit
SFX A oužit upte oupit
SFX A nout l [aeiouyáéíóúýůěr][^aeiouyáéíóúýůěrl][^aeiouy
SFX A nout l [aeiouyáéíóúýůěr][^aeiouyáéíóúýůěrl][^aeiouy

es_ES
warning - incompatible stripping characters and condition:
SFX W umar úse [ae]husar
SFX W emir iñáis eñir

es_NEW
warning - incompatible stripping characters and condition:
SFX I unan únen unar

es_MX
warning - incompatible stripping characters and condition:
SFX A a ote e
SFX W umar úse [ae]husar
SFX W emir iñáis eñir

lt_LT
warning - incompatible stripping characters and condition:
SFX U ti siuosi tis
SFX U ti siuosi tis
SFX U ti siesi tis
SFX U ti siesi tis
SFX U ti sis tis
SFX U ti sis tis
SFX U ti simės tis
SFX U ti simės tis
SFX U ti sitės tis
SFX U ti sitės tis

nn_NO
warning - incompatible stripping characters and condition:
SFX D ar rar [^fmk]er
SFX U Øre orde ere
SFX U Øre ort ere

pt_PT
warning - incompatible stripping characters and condition:
SFX g ãos oas ão
SFX g ãos oas ão

ro_RO
warning - bad field number:
SFX L 0 le [^cg] i
SFX L 0 i [cg] i
SFX U 0 i [^i] ii
warning - incompatible stripping characters and condition:
SFX P l i l [<- there is an unnecessary tabulator here)
SFX I a ii [gc] a
warning - bad field number:
SFX I a ii [gc] a
SFX I a ei [^cg] a

sk_SK
warning - incompatible stripping characters and condition:
SFX T ľať olú klať
SFX T ľať olúc klať
SFX T sľať šlú slať
SFX T sľať šlúc slať
SFX R ľcť lčiem ĺcť
SFX R iásť ätie miasť
SFX R iezť iem [^i]ezť
SFX R iezť ieš [^i]ezť
SFX R iezť ie [^i]ezť
SFX R iezť eme [^i]ezť
SFX R iezť ete [^i]ezť
SFX R iezť ú [^i]ezť
SFX R iezť úc [^i]ezť
SFX R iezť z [^i]ezť
SFX R iezť me [^i]ezť
SFX R iezť te [^i]ezť

sv_SE
warning - bad field number:
SFX C 0 net nets [^e]n
--------------------------------------------------------

2005-08-01: Hunspell 1.0.8 release

- improved compound word support
- fix German S handling
- port MySpell files and MAP feature

2005-07-22: Hunspell 1.0.7 release

2005-07-21: new home page: http://hunspell.sourceforge.net
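
The "incompatible stripping characters and condition" warnings listed in the 1.0.9 notes above can be reproduced in spirit with the following sketch. It is an illustration under stated assumptions, not Hunspell's own checker: it assumes a suffix rule line has exactly the five fields "SFX flag strip affix condition", that "0" means nothing is stripped, and that a condition is a sequence of literal characters, "." and bracket classes matched against the end of the word. All function names are hypothetical.

```python
import re

# One condition token: a bracket class such as [^ghk], or any single character.
TOKEN = re.compile(r"\[\^?[^\]]*\]|.")


def token_accepts(token, ch):
    """Return True if a single condition token can match character ch."""
    if token == ".":
        return True
    if token.startswith("[") and token.endswith("]"):
        body = token[1:-1]
        if body.startswith("^"):
            return ch not in body[1:]
        return ch in body
    return token == ch


def strip_matches_condition(strip, condition):
    """Check that the stripped ending can satisfy the end of the condition."""
    if strip == "0":  # assumed convention: nothing is stripped
        return True
    tokens = TOKEN.findall(condition)
    overlap = min(len(strip), len(tokens))
    if overlap == 0:
        return True
    # Align both at the end of the word and compare position by position.
    return all(
        token_accepts(token, ch)
        for token, ch in zip(tokens[-overlap:], strip[-overlap:])
    )


def check_suffix_rule(line):
    """Return True if the rule is consistent, False if it deserves a warning."""
    fields = line.split()
    if len(fields) != 5 or fields[0] != "SFX":
        raise ValueError("expected: SFX flag strip affix condition")
    _, _, strip, _, condition = fields
    return strip_matches_condition(strip, condition)


if __name__ == "__main__":
    for rule in ("SFX D us ech [^ighk]os", "SFX D y ied [^aeiou]y"):
        print(rule, "->", "ok" if check_suffix_rule(rule) else "warning")
```

The first rule is taken from the cs_CZ list above: it strips "us" although its condition requires the word to end in "os", so the rule can never apply. The second rule is a made-up consistent example for contrast.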