2007-11-01 Németh László:
	* hunspell/*: new feature: morphological generation; also fix
	experimental morphological analysis and stemming.
	  new API functions and improved API:
	  - analyze(word): morphological analysis (instead of morph())
	  - stem(word): stemming
	  - stem(list): stemming based on the result of an analysis
	  - generate(word, word2): morphological generation
	  - generate(word, list): morphological generation
	  - add(word): add word to the run-time dictionary (formerly put_word())
	  - add_with_affix(word, word2): (formerly put_word_pattern()) add word
	    to the run-time dictionary with the affix flags of the second
	    parameter: all affixed forms of the user words will be recognised
	    by the spell checker. Especially useful for agglutinative languages.
	  - remove(word): remove word from the run-time dictionary (not
	    implemented)
	  - see the manual, the hunspell/hunspell.hxx header and tests/morph.*
	* tests/morph.*: test data, example for morphological analysis,
	stemming and generation
	* tools/analyze, tools/chmorph: extended and new demo applications:
	  - analyze (originally hunmorph): analyses and stems input words,
	    generates word forms from input word pairs.
	  - chmorph: morphological transformation filter
	* configure.ac, hunspell/makefile.am: set library version number.
	Bug reported by Rene Engelhard.
	* affentry.cxx, affixmgr.cxx: new pattern matching algorithm in
	condition checking of affix rules instead of the Dömölki algorithm:
	  - Unlimited condition length (instead of max. 8 characters).
	  - Less memory consumption, especially useful for affix-rich
	    languages: 5.4 MB memory savings with the hu_HU dictionary.
	  - Speed change depends on the dictionaries and CPU caches: English
	    spell checking is 4% faster on Linux words with the en_US
	    dictionary, Hungarian spell checking is 25% slower on the most
	    frequent words of the Hungarian Webcorpus.
	* tests/sug.*, sugutf.*: updated test data (use "a" and "lot"
	dictionary items instead of "a lot".)
	* src/hunspell/hunspell.cxx: free(csconv) instead of delete csconv.
	Report and patch by Sylvain Paschein in Mozilla Issue 398268.
	* suggestmgr.cxx, tools/hunspell.cxx: fix bad spelling of "misspelled".
	Ubuntu Bug #134792, patch by Malcolm Parsons.
	* tests/base_utf.*: use Unicode apostrophe instead of 8-bit one.
	* hunspell.cxx, hashmgr.cxx: add(): use HashMgr::add()

2007-10-25 Pavel Janík:
	* hunspell/csutil.cxx: fix type cast warnings on 64-bit Linux in
	printing of character positions in u8_u16(). OOo Issue 82984.

2007-09-05 Németh László:
	* win_api/Hunspell.vproj, parsers/testparser.cxx, textparser.hxx:
	warning fixes and removal of an unnecessary Windows project file.
	Reported by Ingo H. De Boer.
	* hashmgr.*, {affixmgr,suggestmgr}.cxx: optimized data structure for
	variable-count fields (only the "ph" transliteration field in this
	version, see next item). Also less memory consumption: -13% (0.75 MB)
	with the en_US dictionary, -6% (1 MB) with hu_HU.
	* suggestmgr.cxx: dictionary-based phonetic suggestion for special or
	foreign pronunciation (see also rule-based PHONE in the manual).
	Usage: a tab-separated field in dictionary lines, starting with "ph:".
	The field contains a phonetic transliteration of the word:
	  Marseille	ph:maarsayl
	* tests/phone.*: test data for dictionary- and rule-based phonetic
	suggestion.
	* hunspell.cxx: fix potential bad memory access in allcap word
	capitalization in suggest() (bug of previous version).
	* hunspell.cxx, atypes.hxx: set correct limit for UTF-8 encoded input
	words (256 bytes).
	* suggestmgr.cxx: improved REP suggestions with spaces: it works
	without dictionary modification.
	OOo Issue 80147, reported by Davide Prina.
	* tests/rep.*: new test data: higher priority for "alot" -> "a lot",
	and Italian suggestion "un'alunno" -> "un alunno".
	* affixmgr.cxx: fix Unicode ngram suggestions in expand_rootword().
	(Suggestions with bad affixes.)
	Bug reported by Vitaly Piryatinksy.
	* tests/ngram_utf_fix.*: test based on Vitaly Piryatinksy's data.
	* suggestmgr.cxx: fix twowords() for last UTF-8 multibyte character.
	(conditional jump or move depends on uninitialised value).

2007-08-29 Ingo H. De Boer:
	* win_api/{hunspell,libhunspell,testparser}.vcproj: new project files
	for the library and the executables.
	* Hunspell.rc, Hunspell.sln, config.h: updated versions.
	Version number problem also reported by András Tímár.

2007-08-27 Németh László:
	* suggestmgr.hxx: put fixed version. Bug report by Ingo H. De Boer.
	* suggestmgr.cxx: remove variable-length local character array,
	reported by Ingo H. De Boer.

2007-08-27 Németh László:
	* suggestmgr.hxx: change bad time_t to clock_t in header, too.
	Bug reports or patches by Ingo H. De Boer under SF.net Bug ID 1781951,
	János Mohácsi and Gábor Zahemszky, András Tímár, OMax3 at SF.net
	under SF.net Bug ID 1781592.
	* phonet.*: change variable-length local character array to a portable
	fixed-size character array. Problem reported by Ingo H. De Boer under
	SF.net Bug ID 1781951 and Ryan VanderMeulen.
	* suggestmgr.cxx: remove debug message (also by Ingo H. De Boer).

2007-08-26 Ingo H. De Boer:
	* win_api/Hunspell.vcproj: updated version (with phonet.*)

2007-08-23 Németh László:
	* phonet.{c,h}xx, suggestmgr.cxx: PHONE parameter: pronunciation-based
	suggestion using Björn Jacke's original Aspell phonetic transcription
	algorithm (http://aspell.net), relicensed under the GPL/LGPL/MPL
	tri-license with the permission of the author. Usage: see manual.
	* {affixmgr,suggestmgr}.cxx: add KEY parameter for keyboard and input
	method error related suggestions.
	Example: KEY qwertyuiop|asdfghjkl|zxcvbnm
	* man/hunspell.4: description of the PHONE and KEY suggestion
	parameters.
	* suggestmgr.cxx: enhancements for better suggestions:
	  - Set ngram suggestions also for badchar-type errors and for cases
	    with only two-word and compound word suggestions.
	  - Separate non-compound and compound word suggestions for MAP
	    suggestion, too.
	  - Double swap suggestions for short words.
	    For example: ahev -> have, hwihc -> which.
	  - Better time limits using clock() instead of time() (tenth-of-a-second
	    resolution instead of one second).
	  - leftcommonsubstring() weight function.
	* htype.hxx, hashmgr.cxx: blen (byte length) and clen (character
	length) fields instead of wlen
	* affixmgr.cxx: fix get_syllable() for bad Unicode inputs.
	* tests/suggestiontest/*: test environment for suggestions

2007-08-07 Martijn Wargers:
	* csutil.cxx: fix MinGW build error associated with ToUpper() call.
	Report and patch in Mozilla Issue 391447.

2007-08-07 Robert Longson:
	* atypes.cxx: use empty inline function HUNSPELL_WARNING instead of
	variadic macros to switch off Hunspell warnings.
	Reported by Gavin Sharp in Mozilla Issue 391147.

2007-08-05 Ginn Chen:
	* hashmgr.cxx: Hunspell failed to compile on OpenSolaris (use stdio
	instead of cstdio). Report and patch in Mozilla Issue 391040.

2007-07-25 Németh László:
	* parsers/*.cxx: Hunspell executable recognises and accepts URLs,
	e-mail addresses and directory paths, reported by Jeppe Bundsgaard.
	* src/tools/hunspell.cxx: --check-url: new option of the Hunspell
	program. Use --check-url if you want to check URLs, e-mail addresses
	and paths.
	* parsers/textparser.cxx: strip colon at end of words for Finnish and
	Swedish (colon may be in words in Finnish and Swedish).
	Problem reported by Lars Aronsson.
	* tests/colons_in_words.*: test data
	* tests/digits_in_words.*: example for using digits in words (e.g.
	1-jährig, 112-jährig etc. in German), reported by Lars Aronsson.
	* hashmgr.cxx: Hunspell accepts allcaps forms of mixed case words of
	personal dictionaries (+allcaps custom dictionary words with allcaps
	affixes). Sf.net Bug ID 1755272, reported by Ellis Miller.
	* hashmgr.cxx: fix small memory leaks with alias compressed
	dictionaries (free flag vectors of affixed personal dictionary words
	and flag vectors of hidden capitalized forms of mixed case and allcaps
	words).
	* affixmgr.cxx: fix COMPOUNDRULE checking with affixed compounds.
	Sf.net Bug ID 1706659, reported by Björn Jacke.
	Also fixes OOo Issue 76067 (crash-like slowdown for hexadecimal
	numbers with long FFFFFF sequences using the en_US dictionary).
	* tools/hunspell.cxx: add missing return to save_privdic().
	* man/hunspell.4: add information about affixation of personal words:
	"Personal dictionaries are simple word lists, but with optional word
	patterns for affixation, separated by a slash:

	  foo
	  Foo/Simpson

	In this example, "foo" and "Foo" are personal words, plus Foo will be
	recognised with affixes of Simpson (Foo's etc.)."

2007-07-18 Németh László:
	* src/win_api/: add missing resource files, reported by Ingo H. De Boer.

2007-07-16 Németh László:
	* hunspell.cxx: fix dot removal from UTF-8 encoded words in
	cleanword2(). (Capitalised words with dots, such as "Something.",
	were not recognised using Unicode encoded dictionaries.)
	* tests/{base.*,base_utf.*}: extended and new test files for dot
	removal and Unicode support.
	* tools/hunspell.cxx: fix Cygwin and OS X compatibility using the
	platform-specific iconv() header via the ICONV_CONST macro of Autoconf.
	Sf.net Bug ID 1746030, reported by Mike Tian-Jian Jiang.
	Sf.net Bug ID 1753939, reported by Jean-Christophe Helary.
	* tools/hunspell.cxx: fix missing global path setting with -d option.
	* tests/test.sh: fix broken Valgrind checking (missing warnings with
	VALGRIND=memcheck make check).
	* csutil.cxx: fix condition in u8_u16() to avoid invalid read of
	non-null-terminated character arrays (detected by Valgrind in the
	Hunspell executable: associated with 8-bit character table conversion
	in tools/hunspell.cxx).
	* csutil.cxx: free_utf_tbl(): use utf_tbl_count-- instead of utf_tbl--.
	Memory leak in Hunspell executable detected by Valgrind.
	* hashmgr.cxx: add missing free_utf_tbl(), memory leak in Hunspell
	executable detected by Valgrind.
	* hashmgr.cxx: load_tables(): fix memory error in special
	capitalization. Use sizeof(unsigned short) instead of the bad
	sizeof(unsigned short*). Invalid memory read detected by Valgrind.
	* hashmgr.cxx: add_word(): fix memory error in special capitalization.
	Also update the affix array length of capitalized homonyms. Invalid
	memory read detected by Valgrind.
	* hunspell.cxx: suggest(): fix invalid memory write and leak.
	Bad realloc() and missing free() detected by Valgrind, associated with
	suggestions for "something.The" type spelling errors.
	* {dictmgr,csutil,hashmgr,suggestmgr}.cxx: check memory allocation.
	Sf.net Bug ID 1747507, based on the patch by Jose da Silva.

2007-07-13 Ingo H. De Boer:
	* atypes.cxx: fix Visual C compatibility: use the
	"HUNSPELL_WARNING(a,b,...) {}" macro instead of the empty "X(a,b...)".
	* hunspell.cxx: changes for Windows API.
	* win_api/Hunspell.*: new resource files
	* win_api/hunspelldll.*: set optional Hunspell and Borland specific
	codes. Sf.net Bug ID 1753802, patch by Ingo H. de Boer.
	See also Sf.net Bug ID 1751406, patch by Mike Tian-Jian Jiang.

2007-07-09 Caolan McNamara:
	* {hunspell,hashmgr,affentry}.cxx: fix warnings of the Coverity
	program analyzer. Sf.net Bug ID 1750219.

2007-07-06 Németh László:
	* atypes.cxx: warning-free swallowing of conditional warning messages
	and their parameters using an empty HUNSPELL_WARNING(a,b...) macro.
	* {affixmgr,atypes,csutil}.cxx: fix unused variable warnings using the
	WARNVAR macro for conditionally named variables.
	* hashmgr.cxx: fix unused variable warning in add_word() by
	conditional naming
	* hunspell.cxx: fix shadowed declaration of captype variable in
	suggest()

2006-06-29 Caolan McNamara:
	* hunspell.cxx: patch to fix possible memory leak in analyze() of the
	experimental morphological analyzer code. Sf.net Bug ID 1745263.

2007-06-29 Németh László:

	improvements:

	* src/hunspell/hunspell.cxx: check bad capitalisation of the Dutch
	letter IJ.
	  - Sf.net Feature Request ID 1640985, reported by Frank Fesevur.
	  - Solution: FORBIDDENWORD for capitalised word forms (needs an
	    improved Dutch dictionary with forbidden words: Ijs/*, etc.).
	* tests/IJ.*: test data and example.
	* hashmgr.cxx, hunspell.cxx: check capitalization of special word
	forms
	  - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
	    Sf.net Bug ID 1398550, reported by Dmitri Gabinski.
	  - allcap words and suffixes: UNICEF's - UNICEF'S
	  - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA
	    For Catalan, French and Italian languages.
	    Reported by Davide Prina in OOo Issue 68568.
	* tests/allcaps*: tests for OPENOFFICE.ORG, UNICEF'S capitalization.
	* tests/i68568*: tests for SANT'ELIA capitalization.
	* hunspell/hunspell.cxx: suggestion for missing sentence spacing:
	something.The -> something. The
	* tools/hunspell.cxx: multiple character encoding support
	  - -i option: custom input encoding
	    Sf.net Bug ID 1610866, reported by Thobias Schlemmer.
	    Sf.net Bug ID 1633413, reported by Dan Kenigsberg.
	    See also hunspell-1.1.5-encoding.patch of Fedora from Caolan
	    McNamara.
	* tests/*.test: add input encodings
	* tools/hunspell.cxx: use locale data for default dictionary names.
	Sf.net Bug ID 1731630, report and patch from Bernhard Rosenkraenzer.
	See also hunspell-1.1.4-defaultdictfromlang.patch of Fedora Linux
	from Caolan McNamara.
	* tools/hunspell.cxx: fix 8-bit tokenization (letters without casing,
	like ß or Hebrew characters, are now handled well)
	* tools/hunspell.cxx: dictionary search path
	  - DICPATH environment variable
	  - -D option: show directory path of loaded dictionary
	  - automatic detection of OpenOffice.org directories

	fixes:

	* affixmgr.cxx: fault-tolerant patch for REP and other affix table
	data problems. Problem with Hunspell and the en_GB dictionary reported
	by Thomas Lange in OOo Issue 76098 and Stephan Bergmann in OOo Issue
	76100. Sf.net Bug ID 1698240, reported by Ingo H. de Boer.
	* csutil.cxx: fix mkallcap_utf() for allcaps suggestion in UTF-8.
	* suggestmgr.cxx: fix bad movechar_utf() (missing strlen()).
	* hunspell.cxx: fix bad degree sign detection in Unicode hu_HU
	environment.
	* hunspell/hunspell.cxx: free allocated memory of csconv in ported
	Mozilla code.
	  - Mozilla Bugzilla Bug 383564, report and Mozilla MySpell patch by
	    Andrew Geul. Reported by Ryan VanderMeulen for Hunspell.
	* suggestmgr.cxx: fix minor difference in Unicode suggestion (ngram
	suggestion of allcaps words in Unicode).
	* hashmgr.cxx: close file handle after errors.
	Sf.net Bug ID 1736286, reported by John Nisly.
	* configure.ac: syntax error (shell variable with spaces).
	Sf.net Bug ID 1731625, reported by Bernhard Rosenkraenzer.
	* hunspell.cxx: check_word(): fix bad usage of info pointer.
	* hashmgr.cxx: fix de_DE related bug (accept words with leading dash).
	Sf.net Bug ID 1696134, reported by Björn Jacke.
	* suggestmgr.cxx, tests/1695964.*: fix NEEDAFFIX homonym suggestion.
	Sf.net Bug ID 1695964, reported by Björn Jacke.
	* tests/1463589*: capitalized ngram suggestion test data for
	Sf.net Bug ID 1463589, reported by Frederik Fouvry.
	* csutil.cxx, affixmgr.cxx: fix possible heap error with multiple
	instances of utf_tbl.
	Sf.net Bug ID 1693875, reported by Ingo H. de Boer.
	* affixmgr.cxx, suggestmgr.cxx, license.hunspell: convert to ASCII.
	Locale dependent compiling problems.
	Sf.net Bug ID 1694379, reported by Mike Tian-Jian Jiang.
	OOo Issue 78018 reported by Thomas Lange.
	* tests/test.sh: compatibility issues
	  - fix Valgrind support (check shared library instead of shell wrapper)
	  - remove deprecated "tail +2" syntax
	  - set 8-bit locale for testing (LC_ALL=C)
	* hunspell.hxx: remove license.* and config.h dependencies.
	  - hunspell-1.1.5-badheader.patch from Caolan McNamara

2007-03-21 Németh László:
	* tools/Makefile.am, munch.h, unmunch.h: add missing munch.h and
	unmunch.h. Reported by Björn Jacke and Khaled Hosny
	(sf.net Bug ID 1684144)
	* hunspell/hunspell.cxx, hunspell.hxx: fix --with-ui compiling error
	(add get_csconv()). Reported by Khaled Hosny (sf.net Bug ID 1685010)

2007-03-19 Németh László:
	* csutil.cxx, hunspell/hunspell.cxx: Unicode non-BMP area (>65K
	character range) support (except conditional patterns and strip
	characters of affix rules)
	* tests/utf8_nonbmp*: test data
	* src/hunspell/*: add Mozilla patches from David Einstein
	  - run-time generated 8-bit character tables
	  - other Mozilla related changes (see Mozilla Bugzilla Bug 319778)
	* csutil.cxx, affixmgr.cxx, hashmgr.cxx: optimized version of the
	IGNORE feature
	  - IGNORE works with affixes (except strip characters and affix
	    conditions)
	* tests/ignore*: test data with Latin characters
	* tests/ignoreutf*: Unicode test data with Arabic diacritics (Harakat)
	* src/hunspell/suggestmgr.cxx: new edit distance suggestion methods
	  - capitalization: nasa -> NASA
	  - long swap: permenant -> permanent
	  - long move: Ghandi -> Gandhi
	  - double two characters: vacacation -> vacation
	* tests/sug.*: test data
	* src/hunspell/affixmgr.cxx: space in REP strings (alot -> a lot)
	Note: an underscore character marks the space in REP strings:
	REP alot a_lot, and put the expression with space ("a lot") into the
	dic file (see tests/sug).
	* hashmgr.cxx, affixmgr.cxx: ignore Unicode byte order mark (BOM
	sequence)
	* tests/utf8_bom*: test data
	* hunspell/*.cxx: OOo Issue 68903 - Make lingucomponent warning-free
	on wntmsci10
	  - fix Hunspell related warning messages on the Windows platform
	    (except some assignments within conditional expressions).
	    Reported and started by Stephan Bergmann.
	* hunspell/affixmgr.cxx: fix OOo Issue 66683 - hunspell dmake debug=x
	fails
	  - Reported by Stephan Bergmann.
	* src/hunspell/hunspell.[ch]xx: thread-safe API for the Hunspell
	executable (removing prev*() functions, new spell(word, info, root)
	function)
	* configure.ac, src/hunspell/*: HUNSPELL_EXPERIMENTAL code
	--with-experimental configure option (conditional compiling of
	morphological analyser and stemmer tools)
	* configure.ac, src/hunspell/*: conditional Hunspell warning messages
	--with-warnings configure option
	* affixmgr.cxx: new, optimized parsing functions
	* affixmgr.cxx: fix homonym handling for German dictionary project,
	reported by Björn Jacke (sf.net Bug ID 1592880).
	* tests/1592880.*: test data by Björn Jacke
	* src/hunspell/affixmgr.cxx: fix CIRCUMFIX suggestion
	Bug reported by Erdal Ronahi.
	* hunspell.cxx: reverse root word output (complex prefixes)
	Bug reported by Munzir Taha.
	* tools/hunspell.cxx: fix Emacs compatibility, patch by marot at sf.net
	  - no % command in PIPE mode (SourceForge BugTracker 1595607)
	  - fix HUNSPELL_VERSION string
	* suggestmgr.[hc]xx: rename check() functions to checkword()
	(OOo Issue 68296), adopting the MySpell patch by Bryan Petty
	(tierra at ooo) for the Hunspell source
	* csutil.cxx, munch.c, unmunch.c: adopt relevant parts of the MinGW
	patch (OOo Issue 42504) by tonal at ooo
	* affixmgr.cxx: remove double candidate_check() call, reported by
	Bram Moolenaar
	* tests/test.sh: add LC_ALL="C" environment. Locale dependency of
	make check reported by the Gentoo project.
	* src/tools/hunspell.cxx: UTF-8 highlighting fix for console UI
	(not solved: breaking long UTF-8 lines)
	* src/tools/unmunch.c: fix bad generation if strip is shorter than
	condition, reported by Davide Prina
	* src/tools/unmunch.h: increase 5000 -> 500000
	* src/tools/hunspell.cxx: fix memory error in suggestion
	(uninitialized parameter). Bug also reported by Björn Jacke in
	SourceForge Bug 1469957
	* csutil.cxx, affixmgr.cxx: fix Caolan McNamara's patch for non-OOo
	environment

2006-11-11 Caolan McNamara:
	* csutil.cxx, affixmgr.cxx: UTF-8 table patch (OOo Issue 71449)
	Description: memory optimization (OOo doesn't use the large UTF-8
	table).
	* Makefile.am: shared library patch (Sourceforge ID 1610756)
	* hunspell.h, hunspell.cxx: C API patch (Sourceforge ID 1616353)
	* hunspell.pc: pkgconfig patch (Sourceforge ID 1639128)

2006-10-17 Ryan Jones:
	* affixmgr.cxx: missing fclose(affixlst) calls
	Reported in OOo Issue 70408

2007-07-11 Taha Zerrouki:
	* affixmgr.cxx, hunspell.cxx, hashmgr.cxx, csutil.cxx: IGNORE feature
	to remove optional Arabic and other characters from input and
	dictionary words.
	* src/hunspell/langnum.hxx: add Arabic language number, lang_ar=96
	* tests/ignore.*: test data

2006-05-28 Miha Vrhovnik:
	* src/win_api/*: C API for Windows DLLs
	  - also Delphi text editor example (see the Hunspell SourceForge page)

2006-05-18 Kevin F. Quinn:
	* utf_info.cxx: struct -> static struct
	Shared library patch also developed by Gentoo developers (Hanno
	Meyer-Thurow, Diego Pettenò, Kevin F. Quinn)

2006-02-02 Németh László:
	* src/hunspell/hunspell.cxx: suggest(): replace "fooBar" -> "foo bar"
	suggestions with "fooBar" -> "foo Bar" (missing spaces are typical OCR
	bugs). Bug reported by stowrob at OOo in Issue 58202.
	* src/hunspell/suggestmgr.cxx: twowords(): permit 1-character words
	(restore MySpell's original behavior). Here: "aNew" -> "a New".
	* tests/i58202.*: test data
	* src/parsers/textparser.cxx: fix Unicode tokenization in
	is_wordchar() (extra word characters (WORDCHARS) didn't work on
	big-endian platforms).
	* src/hunspell/{csutil,affixmgr}.cxx: inline isSubset(),
	isRevSubset(): small speed optimization for languages with rich
	morphology.
	* src/tools/hunspell.cxx: fix bad --with-ui and --with-readline
	compiling when (N)curses is missing. Reported by Daniel Naber.

2006-01-19 Tor Lillqvist:
	* src/hunspell/csutil.cxx: mystrsep(): fix locale-dependent isspace()
	tokenization

2006-01-06 András Tímár:
	* src/hunspell/{hashmgr.hxx,hunspell.cxx}: fix Visual C++ compiling
	errors

2006-01-05 Németh László:
	* COPYING: set GPL/LGPL/MPL tri-license for Mozilla integration.
	Rationale: Mozilla source code contains an old MySpell version with
	the GPL/LGPL/MPL tri-license. (The MPL is a copyleft license, similar
	to the LGPL, but it acts at the file level.)
	* COPYING.LGPL: GNU Lesser General Public License 2.1 (LGPL)
	* COPYING.MPL: Mozilla Public License 1.1 (MPL)
	* license.hunspell, src/hunspell/license.hunspell: GPL/LGPL/MPL
	tri-license
	* src/hunspell/{affixmgr,hashmgr}.*: AF, AM alias definitions in affix
	file: compression of flag sets and morphological descriptions (see
	manual, and tests/alias* test files).
	Rationale: alias compression also improves loading time and memory
	efficiency, not only resource size.
	* src/tools/makealias: alias compression utility
	(usage: ./makealias file.dic file.aff)
	* tests/alias{,2,3}: AF, AM tests
	* man/hunspell.4: add AF, AM documentation
	* src/hunspell/affentry.cxx, atypes.hxx: add new opts bits (aeALIASM,
	aeALIASF)
	* tools/hunspell, src/parser/*, src/hunspell/*: Hunspell program
	tokenizes Unicode texts (only with UTF-8 encoded dictionaries).
	Missing Unicode tokenization reported by Björn Jacke, Egmont
	Koblinger, Jess Body and others.
	Note: the Curses interactive interface doesn't work perfectly yet.
	* tests/*.tests: remove -1 parameters of Hunspell
	* tests/*.{good,wrong}: remove tabulators
	* src/hunspell/{hunspell,affixmgr}.cxx: BREAK option: break words at
	specified break points and check word parts separately (see manual).
	Note: COMPOUNDRULE is better (or will be better) for handling dashes
	and other compound joining characters or character strings. Use BREAK
	if you want to check words with dashes or other joining characters and
	there is no time or possibility to describe precise compound rules
	with COMPOUNDRULE.
	* tests/break.*: BREAK example.
	* src/hunspell/{affixmgr,hunspell}.cxx: add CHECKSHARPS declaration
	instead of LANG de_DE definitions to handle German sharp s in both
	spelling and suggestion.
	* src/hunspell/hunspell.cxx: With CHECKSHARPS, uppercase words are
	valid with both lower sharp s (it is optional for names in German
	legal texts) and SS (MÜßIG, MÜSSIG). Missing lower sharp s form
	reported by Björn Jacke.
	* src/hunspell/hunspell.cxx: KEEPCASE flag on a sharp s word has a
	special meaning with the CHECKSHARPS declaration: KEEPCASE permits
	capitalisation and SS upper casing of a sharp s word (Müßig and
	MÜSSIG), but forbids the upper cased form with lower sharp s
	character(s): *MÜßIG.
	* tests/germancompounding*: add CHECKSHARPS, remove LANG
	* tests/checksharps*: add CHECKSHARPS and KEEPCASE, remove LANG
	* src/hunspell/hunspell.cxx: improved suggestions:
	  - suggestions for pressed Caps Lock problems: macARONI -> macaroni
	  - suggestions for long shift problems: MAcaroni -> Macaroni, macaroni
	  - suggestions for KEEPCASE words: KG -> kg
	* src/hunspell/csutil.cxx: fix mystrrep() function:
	  - suggestions for lower sharp s in uppercased words: MÜßIG -> MÜSSIG
	* tests/checksharps{,utf}.sug: add tests for mystrrep() fix
	* src/hunspell/hashmgr.cxx: Now dictionary words can contain slashes
	with the "\/" syntax. Problem reported by Frederik Fouvry.
	* src/hunspell/hunspell.cxx: fix bad duplicate filter in suggest().
	(Suggesting some capitalised compound words caused a program crash
	with the Hungarian dictionary, OOo Issue 59055).
	* src/hunspell/affixmgr.cxx: fix bad defcpd_check() call in
	compound_check(). (Overlapping new COMPOUNDRULE and old compounding
	methods caused a program crash at suggestion.)
	* src/hunspell/affixmgr.{cxx,hxx}: check affix flag duplication at
	affix classes. Suggested by Daniel Naber.
	* src/hunspell/affentry.cxx: remove unused variable declarations
	(OOo i58338). Compiler warnings reported by András Tímár and Martin
	Hollmichel.
	* src/hunspell/hunspell.cxx: morph(): do not analyse bad mixed
	uppercased forms (fix Arabic morphological analysis with Buckwalter's
	Arabic transliteration)
	* src/hunspell/affentry.{cxx,hxx}, atypes.hxx: small memory
	optimization in affentry:
	  - using unsigned char fields instead of short (stripl, appndl,
	    numconds)
	  - rename xpflg field to opts
	  - remove utf8 field, use aeUTF8 bit of opts field
	* configure.ac: set tests/maputf.test to XFAILED on ARM platform.
	Failure reported by Rene Engelhard.
	* configure.ac: link Ncursesw library, if it exists.
	* BUGS: add BUGS file
	* tests/complexprefixes2.*: test for morphological analysis with
	COMPLEXPREFIXES
	* src/hunspell/affixmgr.cxx: use "COMPOUNDRULE" instead of
	"COMPOUND". The new name was suggested by Bram Moolenaar.
	* tests/compoundrule*: modified and renamed compound.* test files
	* man/hunspell.4: AF, AM, BREAK, CHECKSHARPS, COMPOUNDRULE, KEEPCASE.
	  - also new addition to the documentation:
	    The header of the dictionary file defines the approximate
	    dictionary size:
	    ``A dictionary file (*.dic) contains a list of words, one per
	    line. The first line of the dictionaries (except personal
	    dictionaries) contains the _approximate_ word count (for optimal
	    hash memory size).'' Asked by Frederik Foudry.
	    One-character replacements in REP definitions:
	    ``It's very useful to define replacements for the most typical
	    one-character mistakes, too: with REP you can add higher priority
	    to a subset of the TRY suggestions (suggestion list begins with the
	    REP suggestions).''

2005-11-11 Németh László:
	* src/hunspell/affixmgr.*: fix Unicode MAP errors (sorted only n-1
	characters instead of n ones in UTF-16 MAP character lists).
	Bug reported by Rene Engelhard.
	* src/hunspell/affixmgr.*: fix infinite COMPOUND matching (the default
	char type is unsigned on PowerPC, s390 and ARM platforms, so it will
	never be negative). Bug reported by Rene Engelhard.
	* src/hunspell/{affixmgr,suggestmgr}.cxx: fix bad ONLYINCOMPOUND word
	suggestions.
	* tests/onlyincompound.sug: empty test file to check this fix.
	Bug reported by Björn Jacke.
	* src/hunspell/affixmgr.cxx: fix backtracking in COMPOUND pattern
	matching.
	* tests/compound6.*: test files to check this fix.
	* csutil.cxx: set bigger range types in flag_qsort() and
	flag_bsearch().
	* affixmgr.hxx: set better type for cont_classes[] Boolean data
	(short -> char)
	* configure.ac, tests/automake.am: set platform specific XFAIL test
	(flagutf8.test on ARM platform)

2005-11-09 Németh László:

	improvements:

	* src/hunspell/affixmgr.*: new and improved affix file parameters:
	  - COMPOUND definitions: compound patterns with regexp-like matching.
	    See manual and test files: tests/compound*.*
	    Suggested by Bram Moolenaar.
	    Also useful for simple word-level lexical scanning, for example
	    analysing numbers or words with numbers (OOo Issue #53643):
	    http://qa.openoffice.org/issues/show_bug.cgi?id=53643
	    Examples: tests/compound{4,5}.*.
	  - NOSUGGEST flag: words marked with the NOSUGGEST flag are not
	    suggested. Proposed flag for vulgar and obscene words (OOo Issue
	    #55498). Example: tests/nosuggest.*.
	    Problem reported by bobharvey at OOo:
	    http://qa.openoffice.org/issues/show_bug.cgi?id=55498
	  - KEEPCASE flag: forbid capitalized and uppercased forms of words
	    marked with the KEEPCASE flag. Useful for special orthographies
	    (measurements and currency often keep their case in uppercased
	    texts) and other writing systems (e.g. keeping lower case of IPA
	    characters).
	  - CHECKCOMPOUNDCASE: forbid upper case characters at word boundaries
	    in compounds. Examples: tests/checkcompoundcase* and
	    tests/germancompounding.*
	  - FLAG UTF-8: New flag type: Unicode character encoded with UTF-8.
	    Example: tests/flagutf8.*.
	    Rationale: the Unicode character type can be more readable
	    (in a Unicode text editor) than the `long' or `num' flag type.

	bug fixes:

	* src/hunspell/hunspell.cxx: accept numbers and numbers with
	separators (i53643). Bug reported by skelet at OOo:
	http://qa.openoffice.org/issues/show_bug.cgi?id=53643
	* src/hunspell/csutil.cxx: fix casing data in ISO 8859-13 character
	table.
	* src/hunspell/csutil.cxx: add ISO-8859-15 character encoding (i54980)
	Rationale: ISO-8859-15 is the default encoding of the French
	OpenOffice.org dictionary. ISO-8859-15 is a modified version of the
	ISO-8859-1 (Latin-1) character encoding with French œ ligatures and
	the euro symbol. Problem reported by cbrunet at OOo in OOo Issue 54980:
	http://qa.openoffice.org/issues/show_bug.cgi?id=54980
	* src/hunspell/affixmgr.cxx: fix zero-byte malloc after a bad affix
	header. Patch by Harri Pitkänen.
	* src/hunspell/suggestmgr.cxx: fix bad NEEDAFFIX word suggestion in
	ngram suggestions. Reported by Daniel Naber and Friedel Wolff.
	* src/hunspell/hashmgr.cxx: fix bad white space checking in affix
	files.
	* src/hunspell/{csutil,affixmgr}.cxx: add other white space
	separators. Problems with tabulators reported by Frederik Fouvry.
	* src/hunspell/*: replace system-dependent #include parameters with
	quoted ones. Problem reported by Dafydd Jones.
	* src/hunspell/hunspell.cxx: fix missing morphological analysis of
	dot(s). Reported by Trón Viktor.
	changes:

	* src/hunspell/affixmgr.cxx: rename PSEUDOROOT to NEEDAFFIX.
	Suggested by Bram Moolenaar.
	* src/hunspell/suggestmgr.hxx: Increase default maximum of ngram
	suggestions (3->5). Suggested by Kevin Hendricks.
	* src/hunspell/htypes.hxx: Increase MAXDELEN for long affix flags.
	* src/hunspell/suggestmgr.cxx: modify (perhaps fix) Unicode map
	suggestion. tests/maputf test failure on ARM platform reported by
	Rene Engelhard.
	* src/hunspell/{affentry.cxx,atypes.hxx}: remove [PREFIX] and
	MISSING_DESCRIPTION messages from morphological analysis.
	Problems reported by Trón Viktor.
	* tests/germancompounding.{aff,good}: Add "Computer-Arbeit" test word.
	Suggested by Daniel Naber.
	* doc/man/hunspell.4: Proof-reading patch by Goldman Eleonóra.
	* doc/man/hunspell.4: Fix bad affix example (replace `move' with
	`work'). Bug reported by Frederik Fouvry.
	* tests/*: new test files:
	  affixes.*: simple affix compression example from the Hunspell 4
	    manual page
	  checkcompoundcase.*, checkcompoundcase2.*, checkcompoundcaseutf.*
	  compound.*, compound2.*, compound3.*, compound4.*, compound5.*
	  compoundflag.* (former compound.*)
	  flagutf8.*: test for FLAG UTF-8
	  germancompounding.*: simplification with CHECKCOMPOUNDCASE.
	  germancompoundingold.* (former germancompounding.*)
	  i53643.*: check numbers with separators
	  i54980.*: ISO8859-15 test
	  keepcase.*: test for KEEPCASE
	  needaffix*.* (former pseudoroot*.* tests)
	  nosuggest.*: test for NOSUGGEST

2005-09-19 Németh László:
	* src/hunspell/suggestmgr.cxx: improved ngram suggestion:
	  - detect non-adjacent swapped characters (pernament -> permanent)
	    Rationale: the ngram method has a significant error with
	    non-adjacent swapped characters, especially when the swap is in
	    the middle of the word.
	  - suggest uppercase forms (unesco -> UNESCO, siggraph's -> SIGGRAPH's)
	  - suggest only the ngram swap character and uppercase form, if they
	    exist.
	    Rationale: swap character and casing equivalence give much better
	    suggestions than any other (weighted) ngram suggestions.
	  - add uppercase suggestion (PERMENANT -> PERMANENT)
	* src/hunspell/*: complete comparison with MySpell 3.2 (in OOo beta 2):
	  - affixmgr.cxx: add missing numrep initialization
	  - hashmgr.cxx: add_word(): don't allocate temporary records
	  - hunspell.cxx: in suggest():
	    - check capitalized words first (better suggestion order for proper
	      names),
	    - check pSMgr->suggest() return value
	    - make the pSMgr->suggest() call non-optional in HUHCAP
	  - csutil.cxx: fix bad KOI8-U -> koi8r_tbl reference in enc_entry encds
	  - csutil.cxx: fix casing data in ISO 8859-2, Windows 1251 and KOI8-U
	    encoding tables. Bug reported by Dmitri Gabinski.
	* src/hunspell/affixmgr.*: improved compound word and other features
	  - generalize hu_HU specific compound word features with new affix
	    file parameters, suggested by Bram Moolenaar:
	    - CHECKCOMPOUNDDUP: forbid word duplication in compounds (e.g.
	      foo|foo)
	    - CHECKCOMPOUNDTRIPLE: forbid triple letters in compounds (e.g.
	      foo|obar)
	    - CHECKCOMPOUNDPATTERN: forbid patterns at word boundaries in
	      compounds
	    - CHECKCOMPOUNDREP: using the REP replacement table, forbid
	      presumably bad compounds (useful for languages with an unlimited
	      number of compounds)
	    - ONLYINCOMPOUND flag works also with words (see
	      tests/onlyincompound.*)
	    Suggested by Daniel Naber, Björn Jacke, Trón Viktor & Bram
	    Moolenaar.
	  - PSEUDOROOT works also with prefixes and prefix + suffix
	    combinations (see tests/pseudoroot5.*). Suggested by Trón Viktor.
	  - man/hunspell.4: updated man page
	* src/hunspell/affixmgr.*: fix incomplete prefix handling with twofold
	suffixes (delete unnecessary contclasses[] conditions in
	prefix_check_twosfx() and prefix_check_twosfx_morph()).
	Bug reported by Trón Viktor.
	* src/hunspell/affixmgr.*: also complete the *_morph() functions with
	conditions of new Hunspell features (circumfix, pseudoroot etc.).
	* src/hunspell/suggestmgr.cxx:
	  - fix missing suggestions for words with crossed prefix and suffix
	  - fix redundant non-compound word checking
	  - fix losing suggestions problem.
	  Bug reported by Dmitri Gabinski.
	* src/hunspell/dictmgr.*:
	  - add new dictionary manager for the Hunspell UNO module.
	    Problems with eo_ANY Esperanto locale reported by Dmitri Gabinski.
	* src/hunspell/*: use precise constant sizes for 8-bit and 16-bit
	  character arrays with MAXWORDUTF8LEN and MAXSWUTF8L macros.
	* src/hunspell/affixmgr.cxx: fix bad MAXNGRAMSUGS parameter handling
	* src/hunspell/affixmgr.cxx, src/tools/{un}munch.*: fix GCC 4.0
	  warnings on fgets(), reported by Dvornik László
	* po/hu.po: improved translation by Dvornik László
	* tests/test.sh: improved test environment
	  - add suggestion testing (see tests/*.sug)
	  - add memory debugging environment, based on the excellent Valgrind
	    debugger. Usage on Linux and on Valgrind's experimental platforms:
	    VALGRIND=memcheck make check
	  - rename test_hunmorph to test.sh
	* tests/*: new tests:
	  - base.*: basic example based on MySpell's checkme.lst.
	  - map{,utf}.*, rep{,utf}: MAP and REP suggestion examples
	  - tests for the new CHECKCOMPOUND, ONLYINCOMPOUND and PSEUDOROOT
	    features
	  - i54633.*: capitalized suggestion test for Issue 54633 from OOo's
	    Issuezilla
	  - i35725.*: improved ngram suggestion test for Issue 35725

2005-08-26 Németh László:

	improvements:
	* src/hunspell/suggestmgr.cxx: Unicode support in related character
	  map suggestion
	* src/hunspell/suggestmgr.cxx: Unicode support in ngram suggestion
	* src/hunspell/{suggestmgr,affixmgr,hunspell}.cxx: improve ngram
	  suggestion. Fix http://qa.openoffice.org/issues/show_bug.cgi?id=35725.
	  See release notes for examples. This problem was reported by
	  beccablain at OOo.
	  - ngram suggestions are now case insensitive (see `Permenant' bug
	    in Issuezilla)
	  - weight ngram suggestions (with the longest common subsequence
	    algorithm, also considering the lengths of the bad word and the
	    suggestion, identical first letters and almost completely identical
	    character positions)
	  - set strict affix congruency in expand_rootword(). Now ngram
	    suggestions are good for languages with rich morphology and also
	    better for English.
	    Rationale: affixed forms of the first ngram suggestion very often
	    suppress the second and subsequent root word suggestions. But
	    errors in affixes are less common, and can be fixed without
	    suggestions. We must prefer the more informative second and
	    subsequent root word suggestions to the suggestions for bad
	    affixes.
	  - drop a lower-ranked suggestion if a better suggestion is its
	    substring
	    Rationale: suggesting affixed forms of a root word is unnecessary
	    when the root word has a better weighted ngram value. (Checking
	    substrings is a good approximation for this refinement.)
	  - fewer ngram suggestions (default maximum 3 instead of 10)
	    Rationale: checking a lot of bad ngram suggestions takes a big
	    extra effort from users, nine times out of ten unnecessarily. It is
	    very distracting, because ngram suggestions can be very different.
	    With the old suggestion algorithms, MySpell and Hunspell usually
	    give one or two suggestions (the maximum is 15), while the ngram
	    algorithm often gives the maximum number of suggestions. With
	    strict affix congruency and other refinements, the correct
	    suggestion is usually among the first three.
	  - new affix parameter: MAXNGRAMSUG
	* src/hunspell/*: support agglutinative languages with rich prefix
	  morphology or with a right-to-left writing system (for example,
	  Turkic and Austronesian languages with (modified) Arabic scripts).
	  - new affix parameter: COMPLEXPREFIXES
	    Set twofold prefix stripping (but single suffix stripping)
	* src/hunspell/affixmgr.cxx:
	  - speed up prefix loading with a tree sorting algorithm.
	* tests/complexprefixes.*, tests/complexprefixesutf.*: Coptic example
	  posted by Moheb Mekhaiel
	* src/hunspell/hashmgr.cxx: check size attribute in dic file,
	  suggested by Daniel Naber
	  Rationale: without the size attribute, Hunspell allocates a hash
	  table that is too small and slower, and Hunspell can lose the first
	  dictionary word.
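The ngram items in the 2005-09-19 and 2005-08-26 entries name three techniques: detecting non-adjacent swapped characters, weighting candidates with the longest common subsequence, and dropping a lower-ranked suggestion that contains a better one as a substring. The following C++ sketch illustrates each idea in isolation. It is not Hunspell's suggestmgr.cxx code: the function names lcs_length(), is_nonadjacent_swap() and drop_superstrings() are hypothetical, and the sketch compares bytes, so it does not handle UTF-8 multibyte characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: length of the longest common subsequence of a and b,
// computed with the classic dynamic programming table (keeping two rows only).
std::size_t lcs_length(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1])
                cur[j] = prev[j - 1] + 1;                // extend a common subsequence
            else
                cur[j] = std::max(prev[j], cur[j - 1]);  // skip a byte of a or of b
        }
        std::swap(prev, cur);  // the row just computed becomes "prev"
    }
    return prev[b.size()];
}

// Hypothetical helper: true if candidate is obtained from word by exchanging
// two characters that are NOT next to each other (pernament -> permanent).
bool is_nonadjacent_swap(const std::string& word, const std::string& candidate) {
    if (word.size() != candidate.size()) return false;
    std::vector<std::size_t> diff;  // positions where the strings differ
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (word[i] != candidate[i]) {
            diff.push_back(i);
            if (diff.size() > 2) return false;  // more than one swap
        }
    }
    return diff.size() == 2 && diff[1] - diff[0] > 1 &&
           word[diff[0]] == candidate[diff[1]] &&
           word[diff[1]] == candidate[diff[0]];
}

// Hypothetical helper: ranked holds suggestions best first; drop every
// suggestion that contains an already kept (better) suggestion as a substring.
std::vector<std::string> drop_superstrings(const std::vector<std::string>& ranked) {
    std::vector<std::string> kept;
    for (const auto& s : ranked) {
        bool redundant = std::any_of(kept.begin(), kept.end(),
            [&](const std::string& better) { return s.find(better) != std::string::npos; });
        if (!redundant) kept.push_back(s);
    }
    return kept;
}
```

For example, is_nonadjacent_swap("pernament", "permanent") returns true, while the adjacent swap "teh" -> "the" returns false. Comparing only against kept suggestions is enough, because a dropped suggestion already contains a kept one. The weighting described in the ChangeLog also considers word lengths, first letters and character positions, which this sketch leaves out.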
	* src/hunspell/affixmgr.cxx: check the compatibility of stripping
	  characters and conditions in affix rules (bugs detected in cs_CZ,
	  es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO and sk_SK
	  dictionaries). See release notes of Hunspell 1.0.9 in NEWS.
	* src/hunspell/affixmgr.cxx: check unnecessary fields in affix rules
	  (bugs detected in ro_RO and sv_SE dictionaries). See release notes.
	* src/hunspell/affixmgr.cxx: remove redundant condition checking in
	  affix rules with stripping characters (redundancy in OpenOffice.org
	  dictionaries reported by Eleonóra Goldman)
	  Rationale: this is a small optimization, but it was excellent for
	  detecting bad ngram affixation caused by bad or weak affix conditions.
	* tests/germancompounding.aff: improve compound definition
	  - use dash prefix instead of a language-specific tokenizer
	    Rationale: a uniform approach is the right way to check and
	    analyze compound words. Language-specific word breaking is
	    deprecated; word-like word pairs need sophisticated grammar
	    checking (for example, Hungarian has a substandard but accepted
	    syntax with a dash for word pairs: cats, dogs -> kutyák-macskák,
	    like cats/dogs in English).
	* test Hunspell with 54 OpenOffice.org dictionaries: see release notes

	bug fixes:
	* src/hunspell/suggestmgr.*: add time limit to the exponential
	  algorithm of the related character map suggestion
	  Rationale: a long word in agglutinative languages or a special
	  pattern (for example a horizontal rule) made of map characters
	  could `crash' the spell checker.
	* src/hunspell/affentry.cxx: add() functions: fix bad word generation
	  when checking stripping characters (see similar bug in unmunch)
	* src/hunspell/affixmgr.cxx: parse_file(): fix unconditional getNext()
	  call for ~AffixMgr() when the affix file is corrupt.
	* src/hunspell/affixmgr.*: AffixMgr(), parse_cpdsyllable(): fix missing
	  string duplications for ~AffixMgr() when the affix file is corrupt.
	* src/hunspell/affixmgr.*: parse_affix(): fix fprintf() call when the
	  affix file is corrupt.
	  Bug reported by Daniel Naber.
	* suggestmgr.cxx: replace the single use of 'strdup' with 'mystrdup'.
	  Patch by Chris Halls (debian.org)
	* src/hunspell/makefile.mk: add makefile.mk for compiling in
	  OpenOffice.org. See README in the Hunspell UNO module.
	  Problems with separate compiling reported by Rene Engelhard
	* src/hunspell/hunspell.cxx: fix pseudoroot support
	  - search for a non-pseudoroot homonym in check()
	* tests/pseudoroot4.*: test this fix
	* src/tools/unmunch.c: fix bad word generation when conditions are
	  shorter than or incompatible with stripping characters in affix rules
	* src/tools/unmunch.c: fix mychomp() for de_AT.dic and other dic files
	  without a final newline character.

	other changes:
	* src/hunspell/suggestmgr.*: remove ACCENT suggestion
	  Rationale: ACCENT suggestion was the same as Kevin Hendricks's map
	  suggestion algorithm, but with a worse interface in the affix file.
	* src/hunspell/suggestmgr.*: combine the cycle count limit in badchar()
	  and forgotchar() with a time limit.
	* src/hunspell/affixmgr.*: remove NOMAPSUGS affix parameter
	* src/hunspell/{suggestmgr,hunspell}.*: strip periods from suggestions
	  (restore MySpell's original behaviour)
	  Rationale: OpenOffice.org has an automatic period handling mechanism
	  and suggestions look better without periods.
	  - new affix file parameter: SUGSWITHDOTS
	    Add period(s) to suggestions if the input word ends in period(s).
	    (Not needed for OpenOffice.org dictionaries.)
	* tests/germancompounding.aff: fix bad German affix in affix example
	  (computeren->computern). Suggested by Daniel Naber.
	* src/tools/example.cxx: add MySpell's example
	* src/tools/munch.cxx: add MySpell's munch
	* man{,/hu}/hunspell.4: refresh manual pages

2005-08-01 Németh László:

	* add missing MySpell files and features:
	  - add MySpell license.readme, README and CONTRIBUTORS
	    ({license,README,AUTHORS}.myspell)
	  - add MySpell unmunch program (src/tools/unmunch.c)
	  - add licenses to source (src/hunspell/license.{myspell,hunspell})
	  - port MAP suggestion (with imperfect UTF-8 support)
	  - add NOSPLITSUGS affix parameter
	  - add NOMAPSUGS affix parameter
	* src/man/man.4: MAP, COMPOUNDPERMITFLAG, NOSPLITSUGS, NOMAPSUGS
	* src/hunspell/aff{entry,ixmgr}.cxx:
	  - improve compound word support
	  - new affix parameter: COMPOUNDPERMITFLAG (see manual)
	* src/tests/compoundaffix{,2}.*: examples for COMPOUNDPERMITFLAG
	* src/tests/germancompounding.*: new solution for German compounding
	  Problems with German compounding reported by Daniel Naber
	* src/hunspell/hunspell.cxx: fix German uppercase word spelling with
	  the recursive spellsharps() algorithm. Default recursion depth is 5
	  (MAXSHARPS).
	* src/tests/germansharps*: extended German sharp s tests
	* src/tools/hunspell.cxx: fix fatal memory bug in non-interactive
	  subshells without the HOME environment variable.
	  Bug detected with PHP by András Izsók.

2005-07-22 Németh László:

	* src/hunspell/csutil.hxx: utf16_u8() - fix 3-byte UTF-8 character
	  conversion

2005-07-21 Németh László:

	* src/hunspell/csutil.hxx: hunspell_version() for OOo UNO module

2005-07-19 Németh László:

	* renaming:
	  - src/morphbase -> src/hunspell
	  - src/hunspell, src/hunmorph -> src/tools
	  - src/huntokens -> src/parsers
	* src/tools/hunstem.cxx: add stemmer example

2005-07-18 Németh László:

	* configure.ac: --with-ui, --with-readline configure options
	* src/hunspell/hunspell.cxx: fix conditional compilation
	* src/hunspell/hunspell.cxx: create the HunSPELL.bak temporary file
	  in the same directory as the checked file.
	* src/morphbase/morphbase.cxx:
	  - handle German sharp s (ß)
	  - temporarily fix analyze()
	* tests: a lot of new tests
	* po/, intl/, m4/: add gettext from GNU hello
	* po/hu.po: add Hungarian translation
	* doc/, man/: rename doc to man

2005-07-04 Németh László:

	* src/morphbase/hashmgr.cxx: use the FLAG attribute instead of
	  FLAG_NUM and FLAG_LONG
	* doc/hunspell.4: manual in English

2005-06-30 Németh László:

	* src/morphbase/csutil.cxx: add character tables from csutil.cxx
	  of OOo 1.1.4
	* src/morphbase/affentry.cxx: fix Unicode condition checking
	* tests/{,utf}compound.*: compounding tests

2005-06-27 Németh László:

	* src/morphbase/*: fix Unicode compound handling

2005-06-23 Halácsy Péter:

	* src/hunmorph/hunmorph.cxx: delete spelling error message and
	  suggest_auto() call

2005-06-21 Németh László:

	* src/morphbase: Unicode support
	* tests/utf8.*: SET UTF-8 test
	* src/morphbase: check and fix with Valgrind.
	  Memory handling error reported by Ferenc Szidarovszky

2005-05-26 Németh László:

	* suggestmgr.cxx: fix stemming
	* AUTHORS, COPYING, ChangeLog: set CC-LGPL free software license

2004-05-25 Varga Dániel

	* src/stemtool: new subproject

2005-05-25 Halácsy Péter

	* AUTHORS, COPYING: set CC Attribution license

2004-05-23 Varga Dániel

	* src: modifications for compiling with Visual C++
	* src/hunmorph/csutil.cxx: correct header of flag_qsort()
	* src/hunmorph/*: correct csutil include

2005-05-19 Németh László

	* csutil.cxx: fix loop condition in lineuniq()
	  Bug reported by Viktor Nagy (nagyv nyelvtud hu).
	* morphbase.cxx: handle PSEUDOROOT with zero affixes
	  Bug reported by Viktor Nagy (nagyv nyelvtud hu).
	* tests/zeroaffix.*: add zeroaffix tests

2005-04-09 Németh László

	* config.h.in: reset with autoheader
	* src/hunspell/hunspell.cxx: set version

2005-04-06 Németh László

	* tests: tests
	* src/morphbase: new optional parameters in affix file:
	  - PSEUDOROOT: for forbidding a root whose suffixed forms are not
	    forbidden.
	  - COMPOUNDWORDMAX: maximum number of words in compounds (default
	    is no limit)
	  - COMPOUNDROOT: marks compounds in the dictionary for handling
	    special compound rules
	  - remove COMPOUNDWORD, ONLYROOT

2005-03-21 Németh László

	* src/morphbase/*:
	  - 2-byte flags, FLAG_NUM, FLAG_LONG
	  - CIRCUMFIX: flagged suffixes and prefixes can only occur together
	  - ONLYINCOMPOUND for fogemorpheme (Swedish, Danish) or Fugenelemente
	    (German)
	  - COMPOUNDBEGIN: allow flagged roots, and roots with a flagged
	    suffix, at the beginning of compounds
	  - COMPOUNDMIDDLE: like before, but in the middle of compounds
	  - COMPOUNDEND: like before, but at the end of compounds
	  - remove COMPOUNDFIRST, COMPOUNDLAST