History of file gc_core/py/tokenizer.py at check-in 9aaa6194916dd234
| 2020-12-02 | ||
| 07:57 | [graphspell] tokenizer update file: [f44ddea977] check-in: [9678e9208c] user: olr, branch: trunk, size: 4031 [annotate] [blame] [check-ins using] [diff] | |
| 2020-11-30 | ||
| 15:15 | [graphspell][fx] update tokenizer and lexicographer: add symbols and emojis file: [1e4c6dee79] check-in: [b3448ac17f] user: olr, branch: trunk, size: 3769 [annotate] [blame] [check-ins using] [diff] | |
| 2020-11-25 | ||
| 20:50 | [graphspell][fr][fx] rename tokens file: [84d5574a19] check-in: [6aae160f81] user: olr, branch: trunk, size: 3513 [annotate] [blame] [check-ins using] [diff] | |
| 2020-10-02 | ||
| 09:32 | [graphspell] tokenizer: token UNDERSCORE file: [88086e8ef7] check-in: [e2313363fe] user: olr, branch: trunk, size: 3522 [annotate] [blame] [check-ins using] [diff] | |
| 2020-10-01 | ||
| 14:50 | [graphspell] tokenizer: exclude underscore from WORD token [fr] adjustments, inclusive writing file: [3a068b8368] check-in: [cfbaf0ad4e] user: olr, branch: trunk, size: 3452 [annotate] [blame] [check-ins using] [diff] | |
| 2020-09-02 | ||
| 09:07 | [graphspell] tokenizer: token OTHER as fallback file: [5243432861] check-in: [e201630bf5] user: olr, branch: trunk, size: 3416 [annotate] [blame] [check-ins using] [diff] | |
| 2020-05-07 | ||
| 10:35 | [graphspell] tokenizer and suggestion engine: other apostrophes file: [b7228e1a86] check-in: [b68161b398] user: olr, branch: trunk, size: 3256 [annotate] [blame] [check-ins using] [diff] | |
| 2020-04-20 | ||
| 18:02 | [graphspell] tokenizer: combining diacritics recognition and NFC normalization file: [81da836011] check-in: [3ef2bdb736] user: olr, branch: trunk, size: 3242 [annotate] [blame] [check-ins using] [diff] | |
| 2019-09-01 | ||
| 08:22 | [graphspell] tokenizer: handles all kinds of apostrophes file: [a2c42f5f3e] check-in: [1bdedd3133] user: olr, branch: trunk, size: 3159 [annotate] [blame] [check-ins using] [diff] | |
| 2019-08-30 | ||
| 09:45 | [graphspell] tokenizer: consider presqu’ and quelqu’ as separate words file: [18d4ef9f97] check-in: [0f0bc77645] user: olr, branch: trunk, size: 3149 [annotate] [blame] [check-ins using] [diff] | |
| 2019-07-30 | ||
| 20:06 | [graphspell][fr] update tokenizer: ordinals file: [af7051a739] check-in: [dcdb32b057] user: olr, branch: trunk, size: 3135 [annotate] [blame] [check-ins using] [diff] | |
| 2019-06-09 | ||
| 06:32 | [graphspell] tokenizer: update HOUR file: [62c6ff6f25] check-in: [1bc78ce87f] user: olr, branch: trunk, size: 3105 [annotate] [blame] [check-ins using] [diff] | |
| 2019-05-15 | ||
| 11:55 | [graphspell][core][fr] code cleaning (pylint) file: [7d6a173497] check-in: [c65b7e2b8b] user: olr, branch: trunk, size: 3075 [annotate] [blame] [check-ins using] [diff] | |
| 2019-05-14 | ||
| 15:19 | [graphspell] tokenizer: update for HOUR tokens file: [08b2581ffe] check-in: [63672ef096] user: olr, branch: trunk, size: 2897 [annotate] [blame] [check-ins using] [diff] | |
| 2019-05-02 | ||
| 08:16 | [graphspell] tokenizer: update file: [3330e91775] check-in: [7d30bbec37] user: olr, branch: trunk, size: 2891 [annotate] [blame] [check-ins using] [diff] | |
| 07:50 | [graphspell] tokenizer: update file: [07708a4bf1] check-in: [ed3b7acf68] user: olr, branch: trunk, size: 2863 [annotate] [blame] [check-ins using] [diff] | |
| 2019-02-22 | ||
| 11:53 | [graphspell][fr] tokenization: + signs €$# (false positive) file: [13303390f7] check-in: [365d3554c7] user: olr, branch: trunk, size: 2845 [annotate] [blame] [check-ins using] [diff] | |
| 2018-07-17 | ||
| 06:42 | [graphspell] tokenizer: remove hyphen in number detection (always considered as a separate sign) file: [daca54adb9] check-in: [6950f5898f] user: olr, branch: rg, size: 2835 [annotate] [blame] [check-ins using] [diff] | |
| 2018-06-30 | ||
| 06:30 | [graphspell][bug] tokenizer: syntax error file: [b1bcfc3595] check-in: [ec92f6e873] user: olr, branch: rg, size: 2839 [annotate] [blame] [check-ins using] [diff] | |
| 2018-06-29 | ||
| 22:46 | [graphspell] tokenizer: add lMorph to <start> and <end> tokens file: [2adea5dc85] check-in: [2dbf497b04] user: olr, branch: rg, size: 2841 [annotate] [blame] [check-ins using] [diff] | |
| 2018-06-28 | ||
| 08:26 | [graphspell][core] tokenizer: rename ACRONYM tokens to WORD_ACRONYM file: [a1211301ce] check-in: [ccbbecbd1b] user: olr, branch: rg, size: 2795 [annotate] [blame] [check-ins using] [diff] | |
| 08:00 | [graphspell] tokenizer: rename ORDINAL tokens to WORD_ORDINAL file: [026a9c1064] check-in: [20dbc28ded] user: olr, branch: rg, size: 2785 [annotate] [blame] [check-ins using] [diff] | |
| 07:53 | [graphspell][core] tokenizer: rename ELPFX tokens to WORD_ELIDED file: [8cf6a6bb2e] check-in: [a1b165e276] user: olr, branch: rg, size: 2780 [annotate] [blame] [check-ins using] [diff] | |
| 2018-06-24 | ||
| 11:39 | [graphspell] code cleaning (pylint) file: [7c766445e1] check-in: [814d73b60e] user: olr, branch: rg, size: 2774 [annotate] [blame] [check-ins using] [diff] | |
| 2018-06-18 | ||
| 20:12 | [graphspell] tokenizer: new signs file: [044a0c747a] check-in: [da0d308818] user: olr, branch: rg, size: 2649 [annotate] [blame] [check-ins using] [diff] | |
| 2018-06-17 | ||
| 13:11 | [graphspell] tokenizer: update ordinals file: [84dbf58ecd] check-in: [4be13a74c3] user: olr, branch: rg, size: 2559 [annotate] [blame] [check-ins using] [diff] | |
| 2018-06-12 | ||
| 11:24 | [core] text processor: communication between regex rules and graph rules + [graphspell][bug] tokenizer: set i variable to 0, if sentence is empty file: [30951f1c9c] check-in: [cca3887aad] user: olr, branch: rg, size: 2509 [annotate] [blame] [check-ins using] [diff] | |
| 2018-06-02 | ||
| 13:47 | [graphspell] tokenizer: add option for <start> and <end> tokens file: [b723a02695] check-in: [3339da6424] user: olr, branch: rg, size: 2495 [annotate] [blame] [check-ins using] [diff] | |
| 2018-05-18 | ||
| 13:11 | [graphspell] tokenizer: add token index and avoid punctuations aggregation file: [b3cbfe75ea] check-in: [be6d99bbdc] user: olr, branch: rg, size: 2201 [annotate] [blame] [check-ins using] [diff] | |
| 2017-12-24 | ||
| 18:39 | Renamed gc_core/py/tokenizer.py → graphspell/tokenizer.py. [build][py] move files from gc_core to graphspell file: [17f452887e] check-in: [bb8356bd7d] user: olr, branch: graphspell, size: 2146 [annotate] [blame] [check-ins using] [diff] | |
| 2017-11-12 | ||
| 13:22 | [core][fx] tokenizer: +acronyms file: [17f452887e] check-in: [fa1205c098] user: olr, branch: Lexicographe, size: 2146 [annotate] [blame] [check-ins using] [diff] | |
| 2017-10-26 | ||
| 05:49 | [core] tokenizer: better regex for URLs and folders file: [5a9c0c9105] check-in: [843c0244bc] user: olr, branch: trunk, size: 2018 [annotate] [blame] [check-ins using] [diff] | |
| 2017-10-25 | ||
| 18:34 | [core][bug] fix tokenizer for URL file: [829b056f2c] check-in: [ee7d44a3ee] user: olr, branch: trunk, size: 2010 [annotate] [blame] [check-ins using] [diff] | |
| 2017-10-24 | ||
| 11:59 | [core] fix tokenizer: two similar group names in regex file: [353949869b] check-in: [78199c4006] user: olr, branch: trunk, size: 2006 [annotate] [blame] [check-ins using] [diff] | |
| 11:00 | [core] tokenization: folders file: [d05a70dbc3] check-in: [35c48d42a8] user: olr, branch: trunk, size: 2002 [annotate] [blame] [check-ins using] [diff] | |
| 2017-04-25 | ||
| 11:51 | Added: commit 1 file: [27b6fefad2] check-in: [2fd7dc4dd5] user: olr, branch: trunk, size: 1474 [annotate] [blame] [check-ins using] | |