Overview
| Comment: | [graphspell] tokenizer: add lMorph to <start> and <end> tokens |
|---|---|
| SHA3-256: | 2dbf497b0475e7d7bfb33fc484de7cef |
| User & Date: | olr on 2018-06-29 22:46:33 |
Context
| Date | Time | Comment | Check-in | User | Tags |
|---|---|---|---|---|---|
| 2018-06-30 | 00:19 | [core] ge engine: function for testing token value | 8289f6c423 | olr | core, rg |
| 2018-06-29 | 22:46 | [graphspell] tokenizer: add lMorph to <start> and <end> tokens | 2dbf497b04 | olr | graphspell, rg |
| 2018-06-29 | 22:43 | [fr] conversion: regex rules -> graph rules | e7335f789f | olr | fr, rg |
Changes
Modified graphspell/tokenizer.py from [a1211301ce] to [2adea5dc85].
︙
        self.sLang = "default"
        self.zToken = re.compile( "(?i)" + '|'.join(sRegex for sRegex in _PATTERNS[sLang]) )

    def genTokens (self, sText, bStartEndToken=False):
        "generator: tokenize <sText>"
        i = 0
        if bStartEndToken:
︙
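The hunk shown here stops right after `if bStartEndToken:`, so the changed lines themselves are not visible. As a rough illustration of what the check-in comment describes (adding `lMorph` to the `<start>` and `<end>` tokens), here is a minimal, self-contained sketch. It is not the project's actual code: the contents of `_PATTERNS`, the token dictionary keys (`i`, `sType`, `sValue`, `nStart`, `nEnd`) and the `lMorph` values are all assumptions made for the example. Only the names `Tokenizer`-style attributes `sLang`, `zToken`, `genTokens`, `bStartEndToken` and `lMorph` come from the page.

```python
import re

# Hypothetical pattern table: the real _PATTERNS is not shown in the diff.
_PATTERNS = {
    "default": (
        r"(?P<NUM>\d+)",
        r"(?P<WORD>\w+)",
        r"(?P<PUNC>[^\w\s])",
    )
}


class Tokenizer:

    def __init__(self, sLang):
        # Fall back to the default patterns for unknown languages.
        self.sLang = sLang if sLang in _PATTERNS else "default"
        self.zToken = re.compile("(?i)" + "|".join(_PATTERNS[self.sLang]))

    def genTokens(self, sText, bStartEndToken=False):
        "generator: tokenize <sText>"
        i = 0
        if bStartEndToken:
            # Assumed shape: the boundary token carries its own lMorph list.
            yield {"i": i, "sType": "INFO", "sValue": "<start>",
                   "nStart": 0, "nEnd": 0, "lMorph": ["<start>"]}
        for m in self.zToken.finditer(sText):
            i += 1
            # m.lastgroup is the name of the alternative that matched.
            yield {"i": i, "sType": m.lastgroup, "sValue": m.group(0),
                   "nStart": m.start(), "nEnd": m.end()}
        if bStartEndToken:
            nLen = len(sText)
            yield {"i": i + 1, "sType": "INFO", "sValue": "<end>",
                   "nStart": nLen, "nEnd": nLen, "lMorph": ["<end>"]}


if __name__ == "__main__":
    for dToken in Tokenizer("default").genTokens("Hello, 42!", bStartEndToken=True):
        print(dToken)
```

Giving the boundary tokens an `lMorph` entry means that code matching on morphologies can treat `<start>` and `<end>` like any other token instead of special-casing a missing key. That motivation is an inference from the check-in comment, not something the page states.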