- 2022-08-22: Hunspell 1.7.1 release:
- - Merge chromium fix for #714 OOB string write in hunspell
- - Merge firefox fix for #756 various issues parsing incomplete aff files
- - Fix #492 crash with hunspell -l -r
- - Merge in weblate translations
- 2018-11-12: Hunspell 1.7.0 release:
- New features and bug fixes by László Németh, supported by FSF.hu Foundation:
- - No annoying suggestion times any more, especially in languages with
- compound word handling and complex morphology. By adding balanced
- multi-level time limits, now the guaranteed suggestion time is there
- within half a second, not seconds (nor dozens of seconds or more
- in extreme cases) for longer misspellings, too.
- - add SPELLML support for run-time dictionary extension with optional
- affixation of user words. See new "Grammar By" feature of
- language-specific user dictionaries of LibreOffice 6.0:
- News: https://wiki.documentfoundation.org/ReleaseNotes/6.0#.E2.80.9CGrammar_By.E2.80.9D_spell_checking
- Screencast with English example: https://www.youtube.com/watch?v=EsS3gaBTfOo
- Screencast with German example: https://www.youtube.com/watch?v=aYVFDqCUb6I
- - Improved, highly customizable suggestions on level of dictionary words:
- Pronunciations and typical misspellings defined by optional "ph:" fields of
- the dictionary words are used not only in n-gram suggestions, but as
- elements of the REP replacement list getting the highest priority in normal
- suggestions, also giving the best suggestions for short words, too.
- More information: see "ph:" in man 5 hunspell.
- - Handling multiple word suggestions is much easier. Like in a
- traditional spelling dictionary, for example, to get the correct suggestion
- "a lot" for the typical misspelling "alot" at the first place, now it's
- enough to put the following line to the dic(tionary) file:
- a lot
- - Limit compound overgeneration by dictionary based word pairs:
- Now it's possible to filter bad compound words by listing
- the correct word pairs with space in the dictionary, as in a traditional
- spelling dictionary.
- - clean-up suggestion:
- - no n-gram and compound word suggestions, if "good" suggestion
- exists, i.e. uppercase, REP, ph: or dictionary word pair suggestions
- - word pairs are always suggested, if they exist in the dic file
- - word pairs have top priority in suggestions, and
- these are the only suggestions if there is no other good suggestion.
- - also dictionary word pairs separated by dash instead of space
- are handled specially in two-word suggestion (depending on the
- language)
- - limit bad suggestions by improved n-gram suggestion rules:
- don't suggest capitalized dictionary words for lower
- case misspellings in n-gram suggestions, except
- - PHONE usage, or
- - in the case of German, where not only proper
- nouns are capitalized, or
- - the capitalized word has special pronunciation
- and don't suggest if the difference of lengths of misspellings and
- suggestions is 5 or more characters.
- - Extend dotless i and dotted I rules to Crimean Tatar language
- Allow dotted I in dictionary, and disable bad capitalization of i.
- - BREAK: extended recursive word breaking algorithm to handle words or
- words with suffixes when they already contain word break characters,
- for example, "e-mail" is a dictionary word with a word break character, and
- it wasn't accepted before in compounds in some languages.
- - FORBIDDENWORD precedes BREAK: Now it's possible to forbid compound
- forms recognized by BREAK word breaking by adding the bad compounds to
- the dictionary with FORBIDDENWORD flags.
- - lower limit for "doubletwochars" suggestion algorithm:
- one of the typical misspellings recognized by Hunspell suggestion
- mechanism is syllable duplication. Alongside the old pattern
- ABABA -> ABA, for example nutrITITIon -> nutrITIon, now also the
- simpler ABAB -> AB pattern is recognized in non-starting position,
- for example, regretTETEd -> regretTEd.
- - lower limit for longswapchar and movechar: recognized only max.
- 4-character distances to avoid slow and bad suggestions.
- - fix compound handling for new Hungarian orthography reform
- - Allow suggestion search for prefix + *two suffixes*:
- Remove artificial performance limit to get correct
- suggestions for relatively simple misspellings in
- Hungarian, etc., when the word form contains prefix
- and both derivative and inflectional suffixes, too:
- lefikszálása -> lefixálása
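The ABAB -> AB duplication pattern described above can be illustrated with a minimal Python sketch. This is not Hunspell's implementation: the function name `collapse_doubled_pairs` is hypothetical, and skipping index 0 is only an assumption that mirrors the "non-starting position" remark.

```python
def collapse_doubled_pairs(word):
    """Return candidates where one copy of a doubled two-character
    sequence (ABAB -> AB) is removed.

    Simplified illustration only; index 0 is skipped as an assumption
    based on the "non-starting position" remark in the release note.
    """
    candidates = []
    for i in range(1, len(word) - 3):
        # Compare the pair at i with the pair that directly follows it.
        if word[i:i + 2] == word[i + 2:i + 4]:
            candidate = word[:i] + word[i + 2:]
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates
```

In a real checker each candidate would still have to be looked up in the dictionary before being offered as a suggestion.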
- Improvements for command-line Hunspell:
- - Remove false alarms during checking OpenDocument (ODF)
- documents by ignoring <text:span> elements. (LibreOffice
- creates a lot of <text:span> elements also within words
- during text re-editing, which often resulted in a huge number of
- broken words before this fix.)
- - List filenames during filtering multiple files in command-line:
- Examples:
- $ hunspell -l *.odt
- a.odt: mispelling
- b.odt: egzample
- $ hunspell -l -G *.odt
- a.odt: good
- b.odt: words
- - Dictionary search by option -D doesn't wait for the standard input
- (fixed by Siva Mahadevan)
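For scripts that consume the per-file listing shown above, a small Python sketch can group the reported words by filename. It assumes only the `filename: word` line shape from the examples; the helper name `group_by_file` is hypothetical and the sketch does not invoke hunspell itself.

```python
from collections import defaultdict


def group_by_file(output):
    """Group 'filename: word' lines (as in the examples above) by filename."""
    words = defaultdict(list)
    for line in output.splitlines():
        # Split on the last ': ' so that the word is always the final field.
        name, sep, word = line.rpartition(": ")
        if sep:
            words[name].append(word)
    return dict(words)
```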
- Other improvements:
- - makealias dictionary compression: add option --minimize-diff
- to reuse free positions of alias lists to create minimal and
- readable diffs for alias compressed dictionaries stored in
- revision control systems, as dictionaries of LibreOffice.
- - Brazilian-Portuguese translation by Rafael Fontenelle
- - Catalan translation by robert dot buj at gmail
- - Minor bug fixes by several contributors, see git log
- 2017-09-03: Hunspell 1.6.2 release:
- - Library changes: no. Same as 1.6.1.
- - Command line tool:
- - Added German translation
- - Fixed bug with wrong output encoding, not respecting system locale.
- 2017-03-25: Hunspell 1.6.1 release:
- - Library changes:
- - Performance improvements in suggest()
- - Fixes regressions for Hungarian related to compounding.
- - Fixes regressions for Korean related to ICONV.
- - Command line tool:
- - Added Tajik translation
- - Fix regarding searching of OOo dicts installed in user folder
- - Manpages:
- - Fix microsoft-cp1251 to cp1251. Dicts should not use the former.
- - Typos.
-
- 2016-12-22: Hunspell 1.6.0 release:
- - Library changes:
- - Performance improvement in ngsuggest(), suggestions should be faster.
- - Revert MAXWORDLEN to 100 as in 1.3.3 for performance reasons.
- - MAXWORDLEN can be set during build time with -D defines.
- - Fix crash when word with 102 consecutive X is spelled.
- - Command line tool:
- - -D shows all loaded dictionaries instead of only the first.
- - -D properly lists all available dictionaries on Windows.
- 2016-11-30: Hunspell 1.5.4 release:
- - Fixes the command COMPOUNDSYLLABLE used in Hungarian dictionary.
- 2016-11-28: Hunspell 1.5.3 release:
- - Removed a #include from hunspell.hxx that was creating trouble
- 2016-11-27: Hunspell 1.5.2 release:
- - Reverted full backward compatibility with 1.4 public API, again
- 2016-11-27: Hunspell 1.5.1 release:
- - Reverted full backward compatibility with 1.4 public API
- 2016-11-18: Hunspell 1.5.0 release:
- - Lots of stability fixes
- - Fixed compilation errors on various systems (Windows, FreeBSD)
- - Small performance improvement compared to 1.4.0
- - The C++ API is updated to use modern C++ types (string, vector).
- Backward compatibility is kept for most of the functions except for
- the following:
- - get_wordchars();
- - get_version();
- - input_conv(string, string);
- - removed get_csconv();
- 2016-04-15: Hunspell 1.4.0 release:
- - various ABI changes due to moving away from char* to std::string
- 2014-06-02: Hunspell 1.3.3 release:
- - OpenDocument (ODF and Flat ODF) support (ODF needs unzip program)
- - various bug fixes
- 2011-02-02: Hunspell 1.3.2 release:
- - fix library versioning
- - improved manual
- 2011-02-02: Hunspell 1.3.1 release:
- - bug fixes
- 2011-01-26: Hunspell 1.2.15/1.3 release:
- - new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
- - bug fixes
- 2011-01-21:
- - new features: FORCEUCASE and WARN, see manual
- - new options: -r to filter potential mistakes (rare words
- marked by the WARN flag in the dictionary)
- - limited and optimized suggestions
- 2011-01-06: Hunspell 1.2.14 release:
- - bug fix
- 2011-01-03: Hunspell 1.2.13 release:
- - bug fixes
- - improved compound handling and
- other improvements supported by OpenTaal Foundation, Netherlands
- 2010-07-15: Hunspell 1.2.12 release
- 2010-05-06: Hunspell 1.2.11 release:
- - Maintenance release bug fixes
- 2010-04-30: Hunspell 1.2.10 release:
- - Maintenance release bug fixes
- 2010-03-03: Hunspell 1.2.9 release:
- - Maintenance release bug fixes and warnings
- - MAP support for composed characters or character sequences
- 2008-11-01: Hunspell 1.2.8 release:
- - Default BREAK feature and better hyphenated word suggestion to accept
- and fix (compound) words with hyphen characters by spell checker
- instead of by word breaking code of OpenOffice.org. With this feature
- it's possible to accept hyphenated compound words, such as "scot-free",
- where "scot" is not a correct English word.
- - ICONV & OCONV: input and output conversion tables for optional character
- handling or using special inner format. Example:
- # Accepting de facto replacements of the Romanian comma acuted letters
- SET UTF-8
- ICONV 4
- ICONV ş ș
- ICONV ţ ț
- ICONV Ş Ș
- ICONV Ţ Ț
- Typical usage of ICONV/OCONV is to manage an inner format for a segmental
- writing system, like the Ethiopic script of the Amharic language.
- - Extended CHECKCOMPOUNDPATTERN to handle compound word alternations, like
- sandhi feature of Telugu and other writing systems.
- - SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and
- Norwegian compound word forms, like tillåta (till|låta) and
- bussjåfør (buss|sjåfør)
- - wordforms: word generator script for dictionary developers (Hunspell
- version of unmunch).
- - bug fixes
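The effect of the ICONV table in the Romanian example above can be sketched in Python as a longest-match-first replacement over the input. This is an illustration under assumptions, not Hunspell's conversion code: the names `ICONV` and `apply_conversion` are hypothetical, and the table holds only the four pairs listed in the example.

```python
# The four pairs from the Romanian ICONV example (cedilla -> comma below).
ICONV = {
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
}


def apply_conversion(text, table):
    """Replace every table key found in text, trying longer keys first."""
    keys = sorted((k for k in table if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No key matches at this position: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Trying longer keys first matters once a table mixes single characters with character sequences, as in an inner format for a segmental writing system.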
- 2008-08-15: Hunspell 1.2.7 release:
- - FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules can
- strip full words; without it, at least one character must remain.
- - COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern
- matching. For example, en_US dictionary of OpenOffice.org uses COMPOUNDRULE
- for ordinal number recognition: 1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd
- etc.).
- - optimized suggestions:
- - modified 1-character distance suggestion algorithms: search a TRY character
- in all positions instead of all TRY characters in a character position
- (it can give more readable suggestion order, also better suggestions
- in the first positions, when TRY characters are sorted by frequency.)
- For example, suggestions for "moze":
- ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
- maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).
- - extended compound word checking for better COMPOUNDRULE related
- suggestions, for example English ordinal numbers: 121323th -> 121323rd
- (it needs also a th->rd REP definition).
- - bug fixes
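The change in loop order for 1-character replacement suggestions described above can be sketched in Python as follows. The TRY string and the word list in the usage note are toy assumptions, and both function names are hypothetical; the sketch only shows how swapping the two loops changes the suggestion order.

```python
def replace_position_major(word, try_chars, dictionary):
    """Older order: at each position, try every TRY character."""
    found = []
    for i in range(len(word)):
        for c in try_chars:
            candidate = word[:i] + c + word[i + 1:]
            if candidate != word and candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found


def replace_char_major(word, try_chars, dictionary):
    """Newer order: for each TRY character, try every position."""
    found = []
    for c in try_chars:
        for i in range(len(word)):
            candidate = word[:i] + c + word[i + 1:]
            if candidate != word and candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found
```

With the toy TRY string "aeotrld" and the word set {maze, more, mote, ooze, doze, mole}, the position-major order returns ooze first for "moze", while the character-major order returns maze first, because "a" is the most frequent TRY character in this toy setup.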
- 2008-07-15: Hunspell 1.2.6 release:
- - bug fix release (fix affix rule condition checking of sk_SK dictionary,
- iconv support in stemming and morphological analysis of the Hunspell
- utility, see also Changelog)
- 2008-07-09: Hunspell 1.2.5 release:
- - bug fix release (fix affix rule condition checking of en_GB dictionary,
- also morphological analysis by dictionaries with two-level suffixes)
- 2008-06-18: Hunspell 1.2.4-2 release:
- - fix GCC compiler warnings
- 2008-06-17: Hunspell 1.2.4 release:
- - add free_list() for C, C++ interfaces to deallocate suggestion lists
-
- - bug fixes
- 2008-06-17: Hunspell 1.2.3 release:
- - extended XML interface to use morphological functions by standard
- spell checking interface, spell() and suggest(). See hunspell.3 manual page.
- - default dash suggestions for compound words: newword-> new word and new-word
- - new manual pages: hunspell.3, hzip.1, hunzip.1.
-
- - bug fixes
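The default dash suggestion for compound words (newword -> new word and new-word) can be sketched in Python as below. This is a simplified illustration with a toy word set, assuming both halves must be dictionary words; `split_suggestions` is a hypothetical name, not part of the Hunspell API.

```python
def split_suggestions(word, dictionary):
    """Suggest 'left right' and 'left-right' for every split point
    where both halves are found in the dictionary."""
    suggestions = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in dictionary and right in dictionary:
            suggestions.append(left + " " + right)
            suggestions.append(left + "-" + right)
    return suggestions
```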
- 2008-04-12: Hunspell 1.2.2 release:
- - extended dictionary (dic file) support to use multiple base and
- special dictionaries.
-
- - new and improved options of command line hunspell:
- -m: morphological analysis or flag debug mode (without affix
- rule data it signs the flag of the affix rules)
- -s: stemming mode
- -D: list available dictionaries and search path
- -d: support extra dictionaries by comma separated list. Example:
-
- hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt
- - forbidding in personal dictionary (with asterisk, / signs affixation)
- - optional compressed dictionary format "hzip" for aff and dic files
- usage:
- hzip example.aff example.dic
- mv example.aff example.dic /tmp
- hunspell -d example
- hunzip example.aff.hz >example.aff
- hunzip example.dic.hz >example.dic
- - new affix compression tool "affixcompress": compression tool for
- large (millions of words) dictionaries.
- - support encrypted dictionaries for closed OpenOffice.org extensions or
- other commercial programs
- - improved manual
- - bug fixes
- 2007-11-01: Hunspell 1.2.1 release:
- - new memory efficient condition checking algorithm for affix rules
-
- - new morphological functions:
- - stem() for stemming
- - analyze() for morphological analysis
- - generate() for morphological generation
- - new demos:
- - analyze: stemming, morphological analysis and generation
- - chmorph: morphological conversion of texts
- 2007-09-05: Hunspell 1.1.12 release:
- - dictionary based phonetic suggestion for words with
- special or foreign pronunciation or alternative (bad) transliteration
- (see Changelog, tests/phone.* and manual).
- - improved data structure and memory optimization for dictionaries
- with variable count fields
- - bug fixes for Unicode encoding dictionaries and ngram suggestions
-
- - improved REP suggestions with space: it works without dictionary
- modification
- - updated and new project files for Windows API
- 2007-08-27: Hunspell 1.1.11 release:
- - portability fixes
- 2007-08-23: Hunspell 1.1.10 release:
- - pronunciation based suggestion using Björn Jacke's original Aspell
- phonetic transcription algorithm (http://aspell.net), relicensed under
- GPL/LGPL/MPL tri-license with the permission of the author
- - keyboard based suggestion by KEY (see manual)
- - better time limits for suggestion search
- - test environment for suggestion based on Wikipedia data
- - bug fixes for non standard Mozilla platforms etc.
- 2007-07-25: Hunspell 1.1.9 release:
- - better tokenization:
- - for URLs, mail addresses and directory paths (default: skip these tokens)
- - for colons in words (for Finnish and Swedish)
-
- - new examples:
- - affixation of personal dictionary words
- - digits in words
- - bug fixes (see ChangeLog)
- 2007-07-16: Hunspell 1.1.8 release:
- - better Mac OS X/Cygwin and Windows compatibility
- - fix Hunspell's Valgrind environment and memory handling errors
- detected by Valgrind
- - other bug fixes (see ChangeLog)
- 2007-07-06: Hunspell 1.1.7 release:
- - fix warning messages of OpenOffice.org build
- 2007-06-29: Hunspell 1.1.6 release:
- - check capitalization of the following word forms
- - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
- - allcap words and suffixes: UNICEF's - UNICEF'S
- - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA
- - suggestion for missing sentence spacing: something.The -> something. The
- - Hunspell executable: improved locale support
- - -i option: custom input encoding
- - use locale data for default dictionary names.
- - tools/hunspell.cxx: fix 8-bit tokenization (letters without
- casing, like ß or Hebrew characters now are handled well)
- - dictionary search path (automatic detection of OpenOffice.org directories)
- - DICPATH environment variable
- - -D option: show directory path of loaded dictionary
- - patches and bug fixes for Mozilla, OpenOffice.org.
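The suggestion for missing sentence spacing (something.The -> something. The) can be sketched in Python with a regular expression. This is an ASCII-only illustration under assumptions, not Hunspell's logic; the name `sentence_spacing_suggestion` is hypothetical.

```python
import re

# Lowercase letter, a dot, then an uppercase letter (ASCII only in this sketch).
_MISSING_SPACE = re.compile(r"([a-z])\.([A-Z])")


def sentence_spacing_suggestion(token):
    """Return the token with a space inserted after the dot, or None
    if the token does not look like two sentences glued together."""
    fixed = _MISSING_SPACE.sub(r"\1. \2", token)
    return fixed if fixed != token else None
```

A token such as OpenOffice.org is left alone here because the letter after the dot is lowercase.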
- 2007-03-19: Hunspell 1.1.5 release:
- - optimizations: 10-100% speed up, smaller code size and memory footprint
- (conditional experimental code and warning messages)
- - extended Unicode support:
- - non BMP Unicode characters in dictionary words and affixes (except
- affix rules and conditions)
- - support BOM sequence in aff and dic files
- - IGNORE feature for Arabic diacritics and other optional characters
- - New edit distance suggestion methods:
- - capitalisation: nasa -> NASA
- - long swap: permenant -> permanent
- - long move: Ghandi -> Gandhi, greatful -> grateful
- - double two characters: vacacation -> vacation
- - spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word)
- - patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua,
- German and Arabic language, etc.
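The "long swap" and "long move" methods listed above can be sketched in Python as candidate generators. This is a simplified illustration: the function names are hypothetical, the default bound of 4 is borrowed from the limit mentioned in the 1.7.0 notes, and how Hunspell itself measures that distance is an assumption here.

```python
def long_swaps(word, max_distance=4):
    """Swap two different, non-adjacent characters that are at most
    max_distance positions apart."""
    out = []
    chars = list(word)
    for i in range(len(chars)):
        for j in range(i + 2, min(len(chars), i + max_distance + 1)):
            if chars[i] != chars[j]:
                swapped = chars[:]
                swapped[i], swapped[j] = swapped[j], swapped[i]
                out.append("".join(swapped))
    return out


def long_moves(word, max_distance=4):
    """Move one character at least 2 and at most max_distance positions."""
    out = []
    for i in range(len(word)):
        rest = word[:i] + word[i + 1:]
        low = max(0, i - max_distance)
        high = min(len(rest), i + max_distance)
        for j in range(low, high + 1):
            if abs(j - i) >= 2:
                out.append(rest[:j] + word[i] + rest[j:])
    return out
```

Each generated candidate would then be checked against the dictionary, so permenant yields permanent and Ghandi yields Gandhi only if those words are present.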
- 2006-02-01: Hunspell 1.1.4 release:
- - Improved suggestion for typical OCR bugs (missing spaces between
- capitalized words). For example: "aNew" -> "a New".
- http://qa.openoffice.org/issues/show_bug.cgi?id=58202
- - tokenization fixes (fix incomplete tokenization of input texts on big-endian
- platforms, and locale-dependent tokenization of dictionary entries)
- 2006-01-06: Hunspell 1.1.3.2 release:
- - fix Visual C++ compiling errors
- 2006-01-05: Hunspell 1.1.3 release:
- - GPL/LGPL/MPL tri-license for Mozilla integration
-
- - Alias compression of flag sets and morphological descriptions.
- (For example, 16 MB Arabic dic file can be compressed to 1 MB.)
-
- - Improved suggestion.
-
- - Improved, language independent German sharp s casing with CHECKSHARPS
- declaration.
- - Unicode tokenization in Hunspell program.
-
- - Bug fixes (at new and old compound word handling methods), etc.
- 2005-11-11: Hunspell 1.1.2 release:
- - Bug fixes (MAP Unicode, COMPOUND pattern matching, ONLYINCOMPOUND
- suggestions)
- - Checked with 51 regression tests in Valgrind debugging environment,
- and tested with 52 OOo dictionaries on i686-pc-linux platform.
- 2005-11-09: Hunspell 1.1.1 release:
- - Compound word patterns for complex compound word handling and
- simple word-level lexical scanning. Ideal for checking
- Arabic and Roman numbers, ordinal numbers in English, affixed
- numbers in agglutinative languages, etc.
- http://qa.openoffice.org/issues/show_bug.cgi?id=53643
- - Support ISO-8859-15 encoding for French (French oe ligatures are
- missing from the latin-1 encoding).
- http://qa.openoffice.org/issues/show_bug.cgi?id=54980
-
- - Implemented a flag to forbid obscene word suggestion:
- http://qa.openoffice.org/issues/show_bug.cgi?id=55498
- - Checked with 50 regression tests in Valgrind debugging environment,
- and tested with 52 OOo dictionaries.
- - other improvements and bug fixes (see ChangeLog)
- 2005-09-19: Hunspell 1.1.0 release
- * complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)
- * improved ngram suggestion with swap character detection and
- case insensitivity
- ------ examples for ngram improvement (input word and suggestions) -----
- 1. pernament (instead of permanent)
- MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
- ornament, ornamentals, ornamental, ornamentally
- Hunspell 1.0.9: ornamental, ornament, tournament
- Hunspell 1.1.0: permanent
- Note: swap character detection
- 2. PERNAMENT (instead of PERMANENT)
- MySpell 3.2: -
- Hunspell 1.0.9: -
- Hunspell 1.1.0: PERMANENT
- 3. Unesco (instead of UNESCO)
- MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's,
- Frescoed, Fresco, Escorts, Escorting
- Hunspell 1.0.9: Genesco, Ionesco, Fresco
- Hunspell 1.1.0: UNESCO
- 4. siggraph's (instead of SIGGRAPH's)
- MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's,
- physiography, digraphs, serigraph, stratigraphy's, stratigraphy
- epigraphs
- Hunspell 1.0.9: serigraph's, epigraph's, digraph's
- Hunspell 1.1.0: SIGGRAPH's
- --------------- end of examples --------------------
- * improved testing environment with suggestion checking and memory debugging
- memory debugging of all tests with a simple command:
-
- VALGRIND=memcheck make check
- * lots of other improvements and bug fixes (see ChangeLog)
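The effect of case insensitivity on n-gram scoring, as in the Unesco example above, can be sketched in Python. This is a toy scoring function under assumptions (shared 1- to 3-grams, no weighting, no swap detection), not Hunspell's n-gram code; `ngrams` and `ngram_score` are hypothetical names.

```python
def ngrams(text, n):
    """All contiguous substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_score(word, candidate, max_n=3, ignore_case=True):
    """Count the 1..max_n-grams of word that also occur in candidate."""
    if ignore_case:
        word, candidate = word.lower(), candidate.lower()
    score = 0
    for n in range(1, max_n + 1):
        candidate_grams = set(ngrams(candidate, n))
        score += sum(1 for gram in ngrams(word, n) if gram in candidate_grams)
    return score
```

With the candidates Genesco, Ionesco, Fresco and UNESCO, the case-sensitive score for "Unesco" favours Genesco and Ionesco, while the case-insensitive score ranks UNESCO highest.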
- 2005-08-26: Hunspell 1.0.9 release
- * improved related character map suggestion
- * improved ngram suggestion
- ------ examples for ngram improvement (O=old, N = new ngram suggestions) --
- 1. Permenant (instead of Permanent)
- O: Endangerment, Ferment, Fermented, Deferment's, Empowerment,
- Ferment's, Ferments, Fermenting, Countermen, Weathermen
- N: Permanent, Supermen, Preferment
- Note: Ngram suggestions were case sensitive.
- 2. permenant (instead of permanent)
- O: supermen, newspapermen, empowerment, endangerment, preferments,
- preferment, permanent, preferment's, permanently, impermanent
- N: permanent, supermen, preferment
- Note: new suggestions are also weighted with longest common subsequence,
- first letter and common character positions
- 3. pernemant (instead of permanent)
- O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent,
- supernatant, impermanent, semipermanent, impermanently
- N: permanent, supernatant, pimpernel
- Note: new method also prefers root word instead of not
- relevant affixes ('s, s and ly)
- 4. pernament (instead of permanent)
- O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
- ornament, ornamentals, ornamental, ornamentally
- N: ornamental, ornament, tournament
- Note: Both ngram methods miss here.
- 5. obvus (instead of obvious):
- O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse,
- obviates, obviate, Travus
- N: obvious, obtuse, obverse
- Note: new method also prefers common first letters.
- 6. unambigus (instead of unambiguous)
- O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous,
- unambitious, ambiguities, ambiguousness
- N: unambiguous, unambiguity, unambitious
- 7. consecvence (instead of consequence)
- O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence,
- consecutiveness's, convenience's, consistences, consistence
- N: consequence, consecutive, consecrates
- An example in a language with rich morphology:
- 8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]):
- O: Misikédéiben, Pisisedéiben, Misikéiéiben, Pisisekéiben, Misikéiben,
- Misikéidéiben, Misikékéiben, Misikéikéiben, Misikéiméiben, Mississippiiben
- N: Mississippiben, Mississippiiben, Misiiben
- Note: Suggesting not relevant affixes was the biggest fault in ngram
- suggestion for languages with a lot of affixes.
- --------------- end of examples --------------------
- * support twofold prefix cutting
- * lots of other improvements and bug fixes (see ChangeLog)
- * test Hunspell with 54 OpenOffice.org dictionaries:
- source: ftp://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries
- testing shell script:
- -------------------------------------------------------
- for i in `ls *zip | grep '^[a-z]*_[A-Z]*[.]'`
- do
- dic=`basename $i .zip`
- mkdir $dic
- echo unzip $dic
- unzip -d $dic $i 2>/dev/null
- cd $dic
- echo unmunch and test $dic
- unmunch $dic.dic $dic.aff 2>/dev/null | awk '{print$0"\t"}' |
- hunspell -d $dic -l -1 >$dic.result 2>$dic.err || rm -f $dic.result
- cd ..
- done
- --------------------------------------------------------
- test result (0 size is o.k.):
- $ for i in *_*/*.result; do wc -c $i; done
- 0 af_ZA/af_ZA.result
- 0 bg_BG/bg_BG.result
- 0 ca_ES/ca_ES.result
- 0 cy_GB/cy_GB.result
- 0 cs_CZ/cs_CZ.result
- 0 da_DK/da_DK.result
- 0 de_AT/de_AT.result
- 0 de_CH/de_CH.result
- 0 de_DE/de_DE.result
- 0 el_GR/el_GR.result
- 6 en_AU/en_AU.result
- 0 en_CA/en_CA.result
- 0 en_GB/en_GB.result
- 0 en_NZ/en_NZ.result
- 0 en_US/en_US.result
- 0 eo_EO/eo_EO.result
- 0 es_ES/es_ES.result
- 0 es_MX/es_MX.result
- 0 es_NEW/es_NEW.result
- 0 fo_FO/fo_FO.result
- 0 fr_FR/fr_FR.result
- 0 ga_IE/ga_IE.result
- 0 gd_GB/gd_GB.result
- 0 gl_ES/gl_ES.result
- 0 he_IL/he_IL.result
- 0 hr_HR/hr_HR.result
- 200694989 hu_HU/hu_HU.result
- 0 id_ID/id_ID.result
- 0 it_IT/it_IT.result
- 0 ku_TR/ku_TR.result
- 0 lt_LT/lt_LT.result
- 0 lv_LV/lv_LV.result
- 0 mg_MG/mg_MG.result
- 0 mi_NZ/mi_NZ.result
- 0 ms_MY/ms_MY.result
- 0 nb_NO/nb_NO.result
- 0 nl_NL/nl_NL.result
- 0 nn_NO/nn_NO.result
- 0 ny_MW/ny_MW.result
- 0 pl_PL/pl_PL.result
- 0 pt_BR/pt_BR.result
- 0 pt_PT/pt_PT.result
- 0 ro_RO/ro_RO.result
- 0 ru_RU/ru_RU.result
- 0 rw_RW/rw_RW.result
- 0 sk_SK/sk_SK.result
- 0 sl_SI/sl_SI.result
- 0 sv_SE/sv_SE.result
- 0 sw_KE/sw_KE.result
- 0 tet_ID/tet_ID.result
- 0 tl_PH/tl_PH.result
- 0 tn_ZA/tn_ZA.result
- 0 uk_UA/uk_UA.result
- 0 zu_ZA/zu_ZA.result
- In en_AU dictionary, there is an abbreviation with two dots (`eqn..'), but
- `eqn.' is missing. Presumably it is a dictionary bug. MySpell doesn't
- accept it either.
- Hungarian dictionary contains pseudoroots and forbidden words.
- Unmunch doesn't support these features yet, and generates bad words, too.
- * check affix rules and OOo dictionaries. Detected bugs in cs_CZ,
- es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO, sk_SK and sv_SE dictionaries.
- Details:
- --------------------------------------------------------
- cs_CZ
- warning - incompatible stripping characters and condition:
- SFX D us ech [^ighk]os
- SFX D us y [^i]os
- SFX Q os ech [^ghk]es
- SFX M o ech [^ghkei]a
- SFX J ém ej ám
- SFX J ém ejme ám
- SFX J ém ejte ám
- SFX A oužit up oupit
- SFX A oužit upme oupit
- SFX A oužit upte oupit
- SFX A nout l [aeiouyáéíóúýůěr][^aeiouyáéíóúýůěrl][^aeiouy
- SFX A nout l [aeiouyáéíóúýůěr][^aeiouyáéíóúýůěrl][^aeiouy
- es_ES
- warning - incompatible stripping characters and condition:
- SFX W umar úse [ae]husar
- SFX W emir iñáis eñir
- es_NEW
- warning - incompatible stripping characters and condition:
- SFX I unan únen unar
- es_MX
- warning - incompatible stripping characters and condition:
- SFX A a ote e
- SFX W umar úse [ae]husar
- SFX W emir iñáis eñir
- lt_LT
- warning - incompatible stripping characters and condition:
- SFX U ti siuosi tis
- SFX U ti siuosi tis
- SFX U ti siesi tis
- SFX U ti siesi tis
- SFX U ti sis tis
- SFX U ti sis tis
- SFX U ti simės tis
- SFX U ti simės tis
- SFX U ti sitės tis
- SFX U ti sitės tis
- nn_NO
- warning - incompatible stripping characters and condition:
- SFX D ar rar [^fmk]er
- SFX U Øre orde ere
- SFX U Øre ort ere
- pt_PT
- warning - incompatible stripping characters and condition:
- SFX g ãos oas ão
- SFX g ãos oas ão
- ro_RO
- warning - bad field number:
- SFX L 0 le [^cg] i
- SFX L 0 i [cg] i
- SFX U 0 i [^i] ii
- warning - incompatible stripping characters and condition:
- SFX P l i l (<- there is an unnecessary tabulator here)
- SFX I a ii [gc] a
- warning - bad field number:
- SFX I a ii [gc] a
- SFX I a ei [^cg] a
- sk_SK
- warning - incompatible stripping characters and condition:
- SFX T ľať olú klať
- SFX T ľať olúc klať
- SFX T sľať šlú slať
- SFX T sľať šlúc slať
- SFX R ľcť lčiem ĺcť
- SFX R iásť ätie miasť
- SFX R iezť iem [^i]ezť
- SFX R iezť ieš [^i]ezť
- SFX R iezť ie [^i]ezť
- SFX R iezť eme [^i]ezť
- SFX R iezť ete [^i]ezť
- SFX R iezť ú [^i]ezť
- SFX R iezť úc [^i]ezť
- SFX R iezť z [^i]ezť
- SFX R iezť me [^i]ezť
- SFX R iezť te [^i]ezť
- sv_SE
- warning - bad field number:
- SFX C 0 net nets [^e]n
- --------------------------------------------------------
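The "incompatible stripping characters and condition" warnings listed above can be illustrated with a small Python sketch: for a suffix rule, the characters to strip should agree with the tail of the condition. This is a simplified check written for illustration, not the check Hunspell performs; the function names are hypothetical and only plain characters, `.` and `[...]`/`[^...]` classes are handled.

```python
import re


def condition_elements(condition):
    """Split a condition into single characters and [...] classes."""
    return re.findall(r"\[[^\]]*\]|.", condition)


def element_matches(element, char):
    """Check one condition element against one character."""
    if element == ".":
        return True
    if element.startswith("["):
        body = element[1:-1]
        if body.startswith("^"):
            return char not in body[1:]
        return char in body
    return element == char


def strip_is_compatible(strip, condition):
    """True if the stripped characters can match the end of the condition.

    A strip field of "0" means nothing is stripped.
    """
    if strip == "0":
        return True
    elements = condition_elements(condition)
    # Walk both from the end, since suffix conditions describe the word tail.
    pairs = zip(reversed(strip), reversed(elements))
    return all(element_matches(element, char) for char, element in pairs)
```

For the cs_CZ rule `SFX D us ech [^ighk]os`, the strip string ends in "us" while the condition requires "os", so this sketch reports the pair as incompatible.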
- 2005-08-01: Hunspell 1.0.8 release
- - improved compound word support
- - fix German S handling
- - port MySpell files and MAP feature
- 2005-07-22: Hunspell 1.0.7 release
- 2005-07-21: new home page: http://hunspell.sourceforge.net