  1. 2022-08-22: Hunspell 1.7.1 release:
  2. - Merge chromium fix for #714 OOB string write in hunspell
  3. - Merge firefox fix for #756 various issues parsing incomplete aff files
  4. - Fix #492 crash with hunspell -l -r
  5. - Merge in weblate translations
  6. 2018-11-12: Hunspell 1.7.0 release:
  7. New features and bug fixes by László Németh, supported by FSF.hu Foundation:
  8. - No more annoyingly long suggestion times, especially in languages with
  9. compound word handling and complex morphology. With balanced
  10. multi-level time limits, suggestions are now guaranteed to arrive
  11. within half a second, not seconds (or dozens of seconds or more
  12. in extreme cases), even for longer misspellings.
  13. - add SPELLML support for run-time dictionary extension with optional
  14. affixation of user words. See new "Grammar By" feature of
  15. language-specific user dictionaries of LibreOffice 6.0:
  16. News: https://wiki.documentfoundation.org/ReleaseNotes/6.0#.E2.80.9CGrammar_By.E2.80.9D_spell_checking
  17. Screencast with English example: https://www.youtube.com/watch?v=EsS3gaBTfOo
  18. Screencast with German example: https://www.youtube.com/watch?v=aYVFDqCUb6I
  19. - Improved, highly customizable suggestions on level of dictionary words:
  20. Pronunciations and typical misspellings defined by optional "ph:" fields of
  21. the dictionary words are used not only in n-gram suggestions, but as
  22. elements of the REP replacement list getting the highest priority in normal
  23. suggestions, which also gives the best suggestions for short words.
  24. More information: see "ph:" in man 5 hunspell.
  25. - Handling multiple word suggestions is much easier. Like in a
  26. traditional spelling dictionary, for example, to get the correct suggestion
  27. "a lot" for the typical misspelling "alot" in first place, now it's
  28. enough to add the following line to the dic(tionary) file:
  29. a lot
  30. - Limit compound overgeneration by dictionary based word pairs:
  31. Now it's possible to filter bad compound words by listing
  32. the correct word pairs with space in the dictionary, as in a traditional
  33. spelling dictionary.
  34. - clean-up suggestion:
  35. - no n-gram and compound word suggestions, if "good" suggestion
  36. exists, i.e. uppercase, REP, ph: or dictionary word pair suggestions
  37. - word pairs are always suggested, if they exist in the dic file
  38. - word pairs have top priority in suggestions, and
  39. these are the only suggestions if there is no other good suggestion.
  40. - also dictionary word pairs separated by dash instead of space
  41. are handled specially in two-word suggestion (depending on the
  42. language)
  43. - limit bad suggestions by improved n-gram suggestion rules:
  44. don't suggest capitalized dictionary words for lower
  45. case misspellings in n-gram suggestions, except
  46. - PHONE usage, or
  47. - in the case of German, where not only proper
  48. nouns are capitalized, or
  49. - the capitalized word has special pronunciation
  50. and don't suggest if the difference of lengths of misspellings and
  51. suggestions is 5 or more characters.
  52. - Extend dotless i and dotted I rules to Crimean Tatar language
  53. Allow dotted I in dictionary, and disable bad capitalization of i.
  54. - BREAK: extended recursive word breaking algorithm to handle words or
  55. words with suffixes when they already contain word break characters,
  56. for example, "e-mail" is a dictionary word with a word break character, and
  57. it wasn't accepted before in compounds in some languages.
  58. - FORBIDDENWORD precedes BREAK: Now it's possible to forbid compound
  59. forms recognized by BREAK word breaking by adding the bad compounds to
  60. the dictionary with FORBIDDENWORD flags.
  61. - lower limit for "doubletwochars" suggestion algorithm:
  62. one of the typical misspellings recognized by the Hunspell suggestion
  63. mechanism is syllable duplication. Alongside the old pattern
  64. ABABA -> ABA, for example nutrITITIon -> nutrITIon, now also the
  65. simpler ABAB -> AB pattern is recognized in non-starting position,
  66. for example, regretTETEd -> regretTEd.
  67. - lower limit for longswapchar and movechar: only distances of at most
  68. 4 characters are recognized, to avoid slow and bad suggestions.
  69. - fix compound handling for new Hungarian orthography reform
  70. - Allow suggestion search for prefix + *two suffixes*:
  71. Remove artificial performance limit to get correct
  72. suggestions for relatively simple misspellings in
  73. Hungarian, etc., when the word form contains prefix
  74. and both derivative and inflectional suffixes, too:
  75. lefikszálása -> lefixálása
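The "doubletwochars" patterns described above (ABABA -> ABA and ABAB -> AB in non-starting position) can be illustrated with a small Python sketch. This is not Hunspell's implementation; the function name double_two_chars is hypothetical, and a real spell checker would still have to check each candidate against the dictionary.

```python
def double_two_chars(word):
    """Return candidates made by dropping one copy of a doubled
    two-character sequence (ABAB -> AB)."""
    candidates = []
    # Start at index 1: the pattern is only considered in
    # non-starting position, as in the release note above.
    for i in range(1, len(word) - 3):
        pair = word[i:i + 2]
        if pair == word[i + 2:i + 4]:
            candidate = word[:i] + word[i + 2:]
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates


if __name__ == "__main__":
    # Lowercased versions of the examples from the release note.
    for misspelling in ("regretteted", "nutritition"):
        print(misspelling, "->", double_two_chars(misspelling))
```

The ABABA case needs no separate handling in this sketch, because every ABABA sequence contains an ABAB sequence.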
  76. Improvements for command-line Hunspell:
  77. - Remove false alarms during checking OpenDocument (ODF)
  78. documents by ignoring <text:span> elements. (LibreOffice
  79. creates a lot of <text:span> elements, even within words,
  80. during text re-editing, which often resulted in a huge number of broken
  81. words before this fix.)
  82. - List filenames during filtering multiple files in command-line:
  83. Examples:
  84. $ hunspell -l *.odt
  85. a.odt: mispelling
  86. b.odt: egzample
  87. $ hunspell -l -G *.odt
  88. a.odt: good
  89. b.odt: words
  90. - Dictionary search by option -D doesn't wait for the standard input
  91. (fixed by Siva Mahadevan)
  92. Other improvements:
  93. - makealias dictionary compression: add option --minimize-diff
  94. to reuse free positions of alias lists to create minimal and
  95. readable diffs for alias compressed dictionaries stored in
  96. revision control systems, such as the dictionaries of LibreOffice.
  97. - Brazilian-Portuguese translation by Rafael Fontenelle
  98. - Catalan translation by robert dot buj at gmail
  99. - Minor bug fixes by several contributors, see git log
  100. 2017-09-03: Hunspell 1.6.2 release:
  101. - Library changes: no. Same as 1.6.1.
  102. - Command line tool:
  103. - Added German translation
  104. - Fixed bug with wrong output encoding, not respecting system locale.
  105. 2017-03-25: Hunspell 1.6.1 release:
  106. - Library changes:
  107. - Performance improvements in suggest()
  108. - Fixes regressions for Hungarian related to compounding.
  109. - Fixes regressions for Korean related to ICONV.
  110. - Command line tool:
  111. - Added Tajik translation
  112. - Fix regarding searching of OOo dicts installed in user folder
  113. - Manpages:
  114. - Fix microsoft-cp1251 to cp1251. Dictionaries should not use the former.
  115. - Typos.
  116. 2016-12-22: Hunspell 1.6.0 release:
  117. - Library changes:
  118. - Performance improvement in ngsuggest(), suggestions should be faster.
  119. - Revert MAXWORDLEN to 100 as in 1.3.3 for performance reasons.
  120. - MAXWORDLEN can be set during build time with -D defines.
  121. - Fix crash when word with 102 consecutive X is spelled.
  122. - Command line tool:
  123. - -D shows all loaded dictionaries instead of only the first.
  124. - -D properly lists all available dictionaries on Windows.
  125. 2016-11-30: Hunspell 1.5.4 release:
  126. - Fixes the command COMPOUNDSYLLABLE used in Hungarian dictionary.
  127. 2016-11-28: Hunspell 1.5.3 release:
  128. - Removed a #include from hunspell.hxx that was creating trouble
  129. 2016-11-27: Hunspell 1.5.2 release:
  130. - Reverted full backward compatibility with 1.4 public API, again
  131. 2016-11-27: Hunspell 1.5.1 release:
  132. - Reverted full backward compatibility with 1.4 public API
  133. 2016-11-18: Hunspell 1.5.0 release:
  134. - Lots of stability fixes
  135. - Fixed compilation errors on various systems (Windows, FreeBSD)
  136. - Small performance improvement compared to 1.4.0
  137. - The C++ API is updated to use modern C++ types (string, vector).
  138. Backward compatibility is kept for most of the functions except for
  139. the following:
  140. - get_wordchars();
  141. - get_version();
  142. - input_conv(string, string);
  143. - removed get_csconv();
  144. 2016-04-15: Hunspell 1.4.0 release:
  145. - various ABI changes due to moving away from char* to std::string
  146. 2014-06-02: Hunspell 1.3.3 release:
  147. - OpenDocument (ODF and Flat ODF) support (ODF needs unzip program)
  148. - various bug fixes
  149. 2011-02-02: Hunspell 1.3.2 release:
  150. - fix library versioning
  151. - improved manual
  152. 2011-02-02: Hunspell 1.3.1 release:
  153. - bug fixes
  154. 2011-01-26: Hunspell 1.2.15/1.3 release:
  155. - new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
  156. - bug fixes
  157. 2011-01-21:
  158. - new features: FORCEUCASE and WARN, see manual
  159. - new options: -r to filter potential mistakes (rare words
  160. marked with the WARN flag in the dictionary)
  161. - limited and optimized suggestions
  162. 2011-01-06: Hunspell 1.2.14 release:
  163. - bug fix
  164. 2011-01-03: Hunspell 1.2.13 release:
  165. - bug fixes
  166. - improved compound handling and
  167. other improvements supported by OpenTaal Foundation, Netherlands
  168. 2010-07-15: Hunspell 1.2.12 release
  169. 2010-05-06: Hunspell 1.2.11 release:
  170. - Maintenance release bug fixes
  171. 2010-04-30: Hunspell 1.2.10 release:
  172. - Maintenance release bug fixes
  173. 2010-03-03: Hunspell 1.2.9 release:
  174. - Maintenance release bug fixes and warnings
  175. - MAP support for composed characters or character sequences
  176. 2008-11-01: Hunspell 1.2.8 release:
  177. - Default BREAK feature and better hyphenated word suggestion to accept
  178. and fix (compound) words with hyphen characters by spell checker
  179. instead of by word breaking code of OpenOffice.org. With this feature
  180. it's possible to accept hyphenated compound words, such as "scot-free",
  181. where "scot" is not a correct English word.
  182. - ICONV & OCONV: input and output conversion tables for optional character
  183. handling or using special inner format. Example:
  184. # Accepting de facto replacements of the Romanian comma acuted letters
  185. SET UTF-8
  186. ICONV 4
  187. ICONV ş ș
  188. ICONV ţ ț
  189. ICONV Ş Ș
  190. ICONV Ţ Ț
  191. Typical usage of ICONV/OCONV is to manage an inner format for a segmental
  192. writing system, like the Ethiopic script of the Amharic language.
  193. - Extended CHECKCOMPOUNDPATTERN to handle compound word alternations, like
  194. sandhi feature of Telugu and other writing systems.
  195. - SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and
  196. Norwegian compound word forms, like tillåta (till|låta) and
  197. bussjåfør (buss|sjåfør)
  198. - wordforms: word generator script for dictionary developers (Hunspell
  199. version of unmunch).
  200. - bug fixes
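The ICONV example in the 1.2.8 notes above maps the de facto cedilla forms of the Romanian letters to the comma-below forms before spell checking. A minimal Python sketch of such an input conversion table follows. It is an illustration only, not Hunspell's code: the names ICONV_TABLE and apply_conv are hypothetical, and the longest-match-first strategy is an assumption made so that multi-character sequences would also work.

```python
# Conversion table mirroring the four ICONV lines of the example:
# cedilla letters on the left, comma-below letters on the right.
ICONV_TABLE = {
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
}


def apply_conv(text, table):
    """Replace every occurrence of a table key, trying longer keys first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # A Romanian word typed with the cedilla forms.
    print(apply_conv("\u015ftiin\u0163\u0103", ICONV_TABLE))
```

An OCONV table would be applied the same way in the opposite direction, to the output of the spell checker.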
  201. 2008-08-15: Hunspell 1.2.7 release:
  202. - FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules can
  203. strip entire words, not just all but one of their characters.
  204. - COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern
  205. matching. For example, en_US dictionary of OpenOffice.org uses COMPOUNDRULE
  206. for ordinal number recognition: 1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd
  207. etc.).
  208. - optimized suggestions:
  209. - modified 1-character distance suggestion algorithms: search a TRY character
  210. in all positions instead of all TRY characters in a character position
  211. (it can give more readable suggestion order, also better suggestions
  212. in the first positions, when TRY characters are sorted by frequency.)
  213. For example, suggestions for "moze":
  214. ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
  215. maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).
  216. - extended compound word checking for better COMPOUNDRULE related
  217. suggestions, for example English ordinal numbers: 121323th -> 121323rd
  218. (it needs also a th->rd REP definition).
  219. - bug fixes
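The English ordinal rule that the COMPOUNDRULE examples in the 1.2.7 notes above rely on (1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd, and the correction 121323th -> 121323rd) can be written out in a few lines of Python. This sketch only shows the rule itself; it does not reproduce COMPOUNDRULE flag pattern matching or REP handling, and the function names are hypothetical.

```python
import re

ORDINAL_RE = re.compile(r"([0-9]+)(st|nd|rd|th)")


def ordinal_suffix(number):
    """Return the English ordinal suffix for a non-negative integer."""
    # 11, 12 and 13 (also 111, 112, ...) always take "th".
    if number % 100 in (11, 12, 13):
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")


def is_ordinal(token):
    """True if the token is digits followed by the matching suffix."""
    match = ORDINAL_RE.fullmatch(token)
    return bool(match) and ordinal_suffix(int(match.group(1))) == match.group(2)


def fix_ordinal(token):
    """Return the token with the correct suffix, or None if it is not
    digits followed by an ordinal suffix."""
    match = ORDINAL_RE.fullmatch(token)
    if match is None:
        return None
    return match.group(1) + ordinal_suffix(int(match.group(1)))


if __name__ == "__main__":
    for token in ("1st", "112th", "1000122nd", "121323th"):
        print(token, is_ordinal(token), fix_ordinal(token))
```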
  220. 2008-07-15: Hunspell 1.2.6 release:
  221. - bug fix release (fix affix rule condition checking of sk_SK dictionary,
  222. iconv support in stemming and morphological analysis of the Hunspell
  223. utility, see also Changelog)
  224. 2008-07-09: Hunspell 1.2.5 release:
  225. - bug fix release (fix affix rule condition checking of en_GB dictionary,
  226. also morphological analysis by dictionaries with two-level suffixes)
  227. 2008-06-18: Hunspell 1.2.4-2 release:
  228. - fix GCC compiler warnings
  229. 2008-06-17: Hunspell 1.2.4 release:
  230. - add free_list() for C, C++ interfaces to deallocate suggestion lists
  231. - bug fixes
  232. 2008-06-17: Hunspell 1.2.3 release:
  233. - extended XML interface to use morphological functions by standard
  234. spell checking interface, spell() and suggest(). See hunspell.3 manual page.
  235. - default dash suggestions for compound words: newword-> new word and new-word
  236. - new manual pages: hunspell.3, hzip.1, hunzip.1.
  237. - bug fixes
  238. 2008-04-12: Hunspell 1.2.2 release:
  239. - extended dictionary (dic file) support to use multiple base and
  240. special dictionaries.
  241. - new and improved options of command line hunspell:
  242. -m: morphological analysis or flag debug mode (without affix
  243. rule data it shows the flags of the affix rules)
  244. -s: stemming mode
  245. -D: list available dictionaries and search path
  246. -d: support extra dictionaries by comma separated list. Example:
  247. hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt
  248. - forbidding words in the personal dictionary (with an asterisk; / marks affixation)
  249. - optional compressed dictionary format "hzip" for aff and dic files
  250. usage:
  251. hzip example.aff example.dic
  252. mv example.aff example.dic /tmp
  253. hunspell -d example
  254. hunzip example.aff.hz >example.aff
  255. hunzip example.dic.hz >example.dic
  256. - new affix compression tool "affixcompress": compression tool for
  257. large (millions of words) dictionaries.
  258. - support encrypted dictionaries for closed OpenOffice.org extensions or
  259. other commercial programs
  260. - improved manual
  261. - bug fixes
  262. 2007-11-01: Hunspell 1.2.1 release:
  263. - new memory efficient condition checking algorithm for affix rules
  264. - new morphological functions:
  265. - stem() for stemming
  266. - analyze() for morphological analysis
  267. - generate() for morphological generation
  268. - new demos:
  269. - analyze: stemming, morphological analysis and generation
  270. - chmorph: morphological conversion of texts
  271. 2007-09-05: Hunspell 1.1.12 release:
  272. - dictionary based phonetic suggestion for words with
  273. special or foreign pronunciation or alternative (bad) transliteration
  274. (see Changelog, tests/phone.* and manual).
  275. - improved data structure and memory optimization for dictionaries
  276. with variable count fields
  277. - bug fixes for Unicode encoding dictionaries and ngram suggestions
  278. - improved REP suggestions with space: it works without dictionary
  279. modification
  280. - updated and new project files for Windows API
  281. 2007-08-27: Hunspell 1.1.11 release:
  282. - portability fixes
  283. 2007-08-23: Hunspell 1.1.10 release:
  284. - pronunciation-based suggestion using Björn Jacke's original Aspell
  285. phonetic transcription algorithm (http://aspell.net), relicensed under
  286. GPL/LGPL/MPL tri-license with the permission of the author
  287. - keyboard-based suggestion by KEY (see manual)
  288. - better time limits for suggestion search
  289. - test environment for suggestion based on Wikipedia data
  290. - bug fixes for non standard Mozilla platforms etc.
  291. 2007-07-25: Hunspell 1.1.9 release:
  292. - better tokenization:
  293. - for URLs, mail addresses and directory paths (default: skip these tokens)
  294. - for colons in words (for Finnish and Swedish)
  295. - new examples:
  296. - affixation of personal dictionary words
  297. - digits in words
  298. - bug fixes (see ChangeLog)
  299. 2007-07-16: Hunspell 1.1.8 release:
  300. - better Mac OS X/Cygwin and Windows compatibility
  301. - fix Hunspell's Valgrind environment and memory handling errors
  302. detected by Valgrind
  303. - other bug fixes (see ChangeLog)
  304. 2007-07-06: Hunspell 1.1.7 release:
  305. - fix warning messages of OpenOffice.org build
  306. 2007-06-29: Hunspell 1.1.6 release:
  307. - check capitalization of the following word forms
  308. - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
  309. - allcap words and suffixes: UNICEF's - UNICEF'S
  310. - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA
  311. - suggestion for missing sentence spacing: something.The -> something. The
  312. - Hunspell executable: improved locale support
  313. - -i option: custom input encoding
  314. - use locale data for default dictionary names.
  315. - tools/hunspell.cxx: fix 8-bit tokenization (letters without
  316. casing, like ß or Hebrew characters now are handled well)
  317. - dictionary search path (automatic detection of OpenOffice.org directories)
  318. - DICPATH environmental variable
  319. - -D option: show directory path of loaded dictionary
  320. - patches and bug fixes for Mozilla, OpenOffice.org.
  321. 2007-03-19: Hunspell 1.1.5 release:
  322. - optimizations: 10-100% speed up, smaller code size and memory footprint
  323. (conditional experimental code and warning messages)
  324. - extended Unicode support:
  325. - non BMP Unicode characters in dictionary words and affixes (except
  326. affix rules and conditions)
  327. - support BOM sequence in aff and dic files
  328. - IGNORE feature for Arabic diacritics and other optional characters
  329. - New edit distance suggestion methods:
  330. - capitalisation: nasa -> NASA
  331. - long swap: permenant -> permanent
  332. - long move: Ghandi -> Gandhi, greatful -> grateful
  333. - double two characters: vacacation -> vacation
  334. - spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word)
  335. - patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua,
  336. German and Arabic language, etc.
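The "long swap" and "long move" edit distance methods listed in the 1.1.5 notes above can be sketched in Python as candidate generators whose output is filtered through a word list. This is an illustration under stated assumptions, not Hunspell's code: the function names and the toy WORDS set are hypothetical, and the default max_dist of 4 only mirrors the 4-character limit mentioned in the 1.7.0 notes.

```python
def long_swaps(word, max_dist=4):
    """Swap two non-adjacent characters at most max_dist positions apart."""
    candidates = set()
    chars = list(word)
    for i in range(len(chars)):
        for j in range(i + 2, min(len(chars), i + max_dist + 1)):
            if chars[i] != chars[j]:
                swapped = chars[:]
                swapped[i], swapped[j] = swapped[j], swapped[i]
                candidates.add("".join(swapped))
    return candidates


def long_moves(word, max_dist=4):
    """Move one character by 2 to max_dist positions in either direction."""
    candidates = set()
    for i, ch in enumerate(word):
        rest = word[:i] + word[i + 1:]
        for j in range(len(rest) + 1):
            # A move by one position is an adjacent swap, so skip it here.
            if 2 <= abs(j - i) <= max_dist:
                candidates.add(rest[:j] + ch + rest[j:])
    return candidates


# Toy dictionary (hypothetical); a real checker would query its own lexicon.
WORDS = {"permanent", "Gandhi", "grateful"}


def suggest(word):
    """Keep only the generated candidates that are dictionary words."""
    return sorted((long_swaps(word) | long_moves(word)) & WORDS)


if __name__ == "__main__":
    for misspelling in ("permenant", "Ghandi", "greatful"):
        print(misspelling, "->", suggest(misspelling))
```

Bounding the distance keeps the number of candidates roughly linear in the word length, which is the trade-off between speed and suggestion quality that the limit is meant to address.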
  337. 2006-02-01: Hunspell 1.1.4 release:
  338. - Improved suggestion for typical OCR bugs (missing spaces between
  339. capitalized words). For example: "aNew" -> "a New".
  340. http://qa.openoffice.org/issues/show_bug.cgi?id=58202
  341. - tokenization fixes (fix incomplete tokenization of input texts on big-endian
  342. platforms, and locale-dependent tokenization of dictionary entries)
  343. 2006-01-06: Hunspell 1.1.3.2 release:
  344. - fix Visual C++ compiling errors
  345. 2006-01-05: Hunspell 1.1.3 release:
  346. - GPL/LGPL/MPL tri-license for Mozilla integration
  347. - Alias compression of flag sets and morphological descriptions.
  348. (For example, 16 MB Arabic dic file can be compressed to 1 MB.)
  349. - Improved suggestion.
  350. - Improved, language independent German sharp s casing with CHECKSHARPS
  351. declaration.
  352. - Unicode tokenization in Hunspell program.
  353. - Bug fixes (at new and old compound word handling methods), etc.
  354. 2005-11-11: Hunspell 1.1.2 release:
  355. - Bug fixes (MAP Unicode, COMPOUND pattern matching, ONLYINCOMPOUND
  356. suggestions)
  357. - Checked with 51 regression tests in Valgrind debugging environment,
  358. and tested with 52 OOo dictionaries on i686-pc-linux platform.
  359. 2005-11-09: Hunspell 1.1.1 release:
  360. - Compound word patterns for complex compound word handling and
  361. simple word-level lexical scanning. Ideal for checking
  362. Arabic and Roman numbers, ordinal numbers in English, affixed
  363. numbers in agglutinative languages, etc.
  364. http://qa.openoffice.org/issues/show_bug.cgi?id=53643
  365. - Support ISO-8859-15 encoding for French (French oe ligatures are
  366. missing from the latin-1 encoding).
  367. http://qa.openoffice.org/issues/show_bug.cgi?id=54980
  368. - Implemented a flag to forbid obscene word suggestion:
  369. http://qa.openoffice.org/issues/show_bug.cgi?id=55498
  370. - Checked with 50 regression tests in Valgrind debugging environment,
  371. and tested with 52 OOo dictionaries.
  372. - other improvements and bug fixes (see ChangeLog)
  373. 2005-09-19: Hunspell 1.1.0 release
  374. * complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)
  375. * improved ngram suggestion with swap character detection and
  376. case insensitivity
  377. ------ examples for ngram improvement (input word and suggestions) -----
  378. 1. pernament (instead of permanent)
  379. MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
  380. ornament, ornamentals, ornamental, ornamentally
  381. Hunspell 1.0.9: ornamental, ornament, tournament
  382. Hunspell 1.1.0: permanent
  383. Note: swap character detection
  384. 2. PERNAMENT (instead of PERMANENT)
  385. MySpell 3.2: -
  386. Hunspell 1.0.9: -
  387. Hunspell 1.1.0: PERMANENT
  388. 3. Unesco (instead of UNESCO)
  389. MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's,
  390. Frescoed, Fresco, Escorts, Escorting
  391. Hunspell 1.0.9: Genesco, Ionesco, Fresco
  392. Hunspell 1.1.0: UNESCO
  393. 4. siggraph's (instead of SIGGRAPH's)
  394. MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's,
  395. physiography, digraphs, serigraph, stratigraphy's, stratigraphy
  396. epigraphs
  397. Hunspell 1.0.9: serigraph's, epigraph's, digraph's
  398. Hunspell 1.1.0: SIGGRAPH's
  399. --------------- end of examples --------------------
  400. * improved testing environment with suggestion checking and memory debugging
  401. memory debugging of all tests with a simple command:
  402. VALGRIND=memcheck make check
  403. * lots of other improvements and bug fixes (see ChangeLog)
  404. 2005-08-26: Hunspell 1.0.9 release
  405. * improved related character map suggestion
  406. * improved ngram suggestion
  407. ------ examples for ngram improvement (O=old, N = new ngram suggestions) --
  408. 1. Permenant (instead of Permanent)
  409. O: Endangerment, Ferment, Fermented, Deferment's, Empowerment,
  410. Ferment's, Ferments, Fermenting, Countermen, Weathermen
  411. N: Permanent, Supermen, Preferment
  412. Note: Ngram suggestions were case sensitive.
  413. 2. permenant (instead of permanent)
  414. O: supermen, newspapermen, empowerment, endangerment, preferments,
  415. preferment, permanent, preferment's, permanently, impermanent
  416. N: permanent, supermen, preferment
  417. Note: new suggestions are also weighted with longest common subsequence,
  418. first letter and common character positions
  419. 3. pernemant (instead of permanent)
  420. O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent,
  421. supernatant, impermanent, semipermanent, impermanently
  422. N: permanent, supernatant, pimpernel
  423. Note: new method also prefers root words over
  424. irrelevant affixes ('s, s and ly)
  425. 4. pernament (instead of permanent)
  426. O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
  427. ornament, ornamentals, ornamental, ornamentally
  428. N: ornamental, ornament, tournament
  429. Note: Both ngram methods miss here.
  430. 5. obvus (instead of obvious):
  431. O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse,
  432. obviates, obviate, Travus
  433. N: obvious, obtuse, obverse
  434. Note: new method also prefers common first letters.
  435. 6. unambigus (instead of unambiguous)
  436. O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous,
  437. unambitious, ambiguities, ambiguousness
  438. N: unambiguous, unambiguity, unambitious
  439. 7. consecvence (instead of consequence)
  440. O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence,
  441. consecutiveness's, convenience's, consistences, consistence
  442. N: consequence, consecutive, consecrates
  443. An example in a language with rich morphology:
  444. 8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]):
  445. O: Misikédéiben, Pisisedéiben, Misikéiéiben, Pisisekéiben, Misikéiben,
  446. Misikéidéiben, Misikékéiben, Misikéikéiben, Misikéiméiben, Mississippiiben
  447. N: Mississippiben, Mississippiiben, Misiiben
  448. Note: Suggesting irrelevant affixes was the biggest fault in ngram
  449. suggestion for languages with a lot of affixes.
  450. --------------- end of examples --------------------
  451. * support twofold prefix cutting
  452. * lots of other improvements and bug fixes (see ChangeLog)
  453. * test Hunspell with 54 OpenOffice.org dictionaries:
  454. source: ftp://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries
  455. testing shell script:
  456. -------------------------------------------------------
  457. for i in `ls *zip | grep '^[a-z]*_[A-Z]*[.]'`
  458. do
  459. dic=`basename $i .zip`
  460. mkdir $dic
  461. echo unzip $dic
  462. unzip -d $dic $i 2>/dev/null
  463. cd $dic
  464. echo unmunch and test $dic
  465. unmunch $dic.dic $dic.aff 2>/dev/null | awk '{print$0"\t"}' |
  466. hunspell -d $dic -l -1 >$dic.result 2>$dic.err || rm -f $dic.result
  467. cd ..
  468. done
  469. --------------------------------------------------------
  470. test result (0 size is o.k.):
  471. $ for i in *_*/*.result; do wc -c $i; done
  472. 0 af_ZA/af_ZA.result
  473. 0 bg_BG/bg_BG.result
  474. 0 ca_ES/ca_ES.result
  475. 0 cy_GB/cy_GB.result
  476. 0 cs_CZ/cs_CZ.result
  477. 0 da_DK/da_DK.result
  478. 0 de_AT/de_AT.result
  479. 0 de_CH/de_CH.result
  480. 0 de_DE/de_DE.result
  481. 0 el_GR/el_GR.result
  482. 6 en_AU/en_AU.result
  483. 0 en_CA/en_CA.result
  484. 0 en_GB/en_GB.result
  485. 0 en_NZ/en_NZ.result
  486. 0 en_US/en_US.result
  487. 0 eo_EO/eo_EO.result
  488. 0 es_ES/es_ES.result
  489. 0 es_MX/es_MX.result
  490. 0 es_NEW/es_NEW.result
  491. 0 fo_FO/fo_FO.result
  492. 0 fr_FR/fr_FR.result
  493. 0 ga_IE/ga_IE.result
  494. 0 gd_GB/gd_GB.result
  495. 0 gl_ES/gl_ES.result
  496. 0 he_IL/he_IL.result
  497. 0 hr_HR/hr_HR.result
  498. 200694989 hu_HU/hu_HU.result
  499. 0 id_ID/id_ID.result
  500. 0 it_IT/it_IT.result
  501. 0 ku_TR/ku_TR.result
  502. 0 lt_LT/lt_LT.result
  503. 0 lv_LV/lv_LV.result
  504. 0 mg_MG/mg_MG.result
  505. 0 mi_NZ/mi_NZ.result
  506. 0 ms_MY/ms_MY.result
  507. 0 nb_NO/nb_NO.result
  508. 0 nl_NL/nl_NL.result
  509. 0 nn_NO/nn_NO.result
  510. 0 ny_MW/ny_MW.result
  511. 0 pl_PL/pl_PL.result
  512. 0 pt_BR/pt_BR.result
  513. 0 pt_PT/pt_PT.result
  514. 0 ro_RO/ro_RO.result
  515. 0 ru_RU/ru_RU.result
  516. 0 rw_RW/rw_RW.result
  517. 0 sk_SK/sk_SK.result
  518. 0 sl_SI/sl_SI.result
  519. 0 sv_SE/sv_SE.result
  520. 0 sw_KE/sw_KE.result
  521. 0 tet_ID/tet_ID.result
  522. 0 tl_PH/tl_PH.result
  523. 0 tn_ZA/tn_ZA.result
  524. 0 uk_UA/uk_UA.result
  525. 0 zu_ZA/zu_ZA.result
  526. In the en_AU dictionary, there is an abbreviation with two dots (`eqn..'), but
  527. `eqn.' is missing. Presumably it is a dictionary bug. MySpell did not
  528. accept it either.
  529. Hungarian dictionary contains pseudoroots and forbidden words.
  530. Unmunch does not support these features yet, and generates bad words, too.
  531. * check affix rules and OOo dictionaries (detected bugs in the cs_CZ,
  532. es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO, sk_SK and sv_SE dictionaries).
  533. Details:
  534. --------------------------------------------------------
  535. cs_CZ
  536. warning - incompatible stripping characters and condition:
  537. SFX D us ech [^ighk]os
  538. SFX D us y [^i]os
  539. SFX Q os ech [^ghk]es
  540. SFX M o ech [^ghkei]a
  541. SFX J ém ej ám
  542. SFX J ém ejme ám
  543. SFX J ém ejte ám
  544. SFX A ou¾it up oupit
  545. SFX A ou¾it upme oupit
  546. SFX A ou¾it upte oupit
  547. SFX A nout l [aeiouyáéíóúýùìr][^aeiouyáéíóúýùìrl][^aeiouy
  548. SFX A nout l [aeiouyáéíóúýùìr][^aeiouyáéíóúýùìrl][^aeiouy
  549. es_ES
  550. warning - incompatible stripping characters and condition:
  551. SFX W umar úse [ae]husar
  552. SFX W emir iñáis eñir
  553. es_NEW
  554. warning - incompatible stripping characters and condition:
  555. SFX I unan únen unar
  556. es_MX
  557. warning - incompatible stripping characters and condition:
  558. SFX A a ote e
  559. SFX W umar úse [ae]husar
  560. SFX W emir iñáis eñir
  561. lt_LT
  562. warning - incompatible stripping characters and condition:
  563. SFX U ti siuosi tis
  564. SFX U ti siuosi tis
  565. SFX U ti siesi tis
  566. SFX U ti siesi tis
  567. SFX U ti sis tis
  568. SFX U ti sis tis
  569. SFX U ti simës tis
  570. SFX U ti simës tis
  571. SFX U ti sitës tis
  572. SFX U ti sitës tis
  573. nn_NO
  574. warning - incompatible stripping characters and condition:
  575. SFX D ar rar [^fmk]er
  576. SFX U Øre orde ere
  577. SFX U Øre ort ere
  578. pt_PT
  579. warning - incompatible stripping characters and condition:
  580. SFX g ãos oas ão
  581. SFX g ãos oas ão
  582. ro_RO
  583. warning - bad field number:
  584. SFX L 0 le [^cg] i
  585. SFX L 0 i [cg] i
  586. SFX U 0 i [^i] ii
  587. warning - incompatible stripping characters and condition:
  588. SFX P l i l (<- there is an unnecessary tabulator here)
  589. SFX I a ii [gc] a
  590. warning - bad field number:
  591. SFX I a ii [gc] a
  592. SFX I a ei [^cg] a
  593. sk_SK
  594. warning - incompatible stripping characters and condition:
  595. SFX T µa» olú kla»
  596. SFX T µa» olúc kla»
  597. SFX T sµa» ¹lú sla»
  598. SFX T sµa» ¹lúc sla»
  599. SFX R µc» lèiem åc»
  600. SFX R iás» ätie mias»
  601. SFX R iez» iem [^i]ez»
  602. SFX R iez» ie¹ [^i]ez»
  603. SFX R iez» ie [^i]ez»
  604. SFX R iez» eme [^i]ez»
  605. SFX R iez» ete [^i]ez»
  606. SFX R iez» ú [^i]ez»
  607. SFX R iez» úc [^i]ez»
  608. SFX R iez» z [^i]ez»
  609. SFX R iez» me [^i]ez»
  610. SFX R iez» te [^i]ez»
  611. sv_SE
  612. warning - bad field number:
  613. SFX C 0 net nets [^e]n
  614. --------------------------------------------------------
  615. 2005-08-01: Hunspell 1.0.8 release
  616. - improved compound word support
  617. - fix German S handling
  618. - port MySpell files and MAP feature
  619. 2005-07-22: Hunspell 1.0.7 release
  620. 2005-07-21: new home page: http://hunspell.sourceforge.net