tokenize_uk package
Submodules
tokenize_uk.tokenize_uk module
Ukrainian tokenization script based on a standard tokenization algorithm.
2016 (c) Vsevolod Dyomkin <vseloved@gmail.com>, Dmitry Chaplinsky <chaplinsky.dmitry@gmail.com>
tokenize_uk.tokenize_uk.tokenize_words(string)
Tokenize input text into words.
Parameters: string (str or unicode) – Text to tokenize
Returns: words
Return type: list of strings
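The return contract (one flat list of string tokens) can be illustrated with a small stand-in. This is a minimal sketch, not the library's algorithm: `sketch_tokenize_words` and its regular expression are hypothetical, and the real `tokenize_words` may split tokens differently.

```python
import re

# Hypothetical stand-in, NOT the tokenize_uk implementation.
# A token is either a run of word characters (optionally joined by an
# apostrophe, as in "м'ята") or a single punctuation character.
_WORD_RE = re.compile(r"\w+(?:['’]\w+)*|[^\w\s]")


def sketch_tokenize_words(string):
    """Return a flat list of string tokens found in `string`."""
    return _WORD_RE.findall(string)


print(sketch_tokenize_words("Привіт, світе!"))
# ['Привіт', ',', 'світе', '!']
```

The point of the sketch is the shape of the result: a plain list of strings, with punctuation kept as separate tokens under the assumptions above.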
tokenize_uk.tokenize_uk.tokenize_text(string)
Tokenize input text into paragraphs, sentences and words.
Paragraphs are split using a simple newline algorithm. For sentences and words, the tokenizers above are used.
Parameters: string (str or unicode) – Text to tokenize
Returns: text, tokenized into paragraphs, sentences and words
Return type: list of lists of lists of words
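The three-level nesting of the return value (paragraphs, then sentences, then words) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `sketch_tokenize_text` is a hypothetical name, and its sentence and word splitting rules are naive regular expressions chosen only to show the shape of the result.

```python
import re


def sketch_tokenize_text(string):
    """Hypothetical stand-in: build a paragraphs -> sentences -> words list."""
    text = []
    # Paragraphs: split on newlines and skip blank lines.
    for paragraph in string.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Sentences (assumption): break after '.', '!' or '?' followed by spaces.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        # Words (assumption): runs of word characters or single punctuation marks.
        text.append([re.findall(r"\w+|[^\w\s]", s) for s in sentences])
    return text


sample = "Перше речення. Друге речення!\nНовий абзац."
result = sketch_tokenize_text(sample)

# Walk the nested structure: result[paragraph][sentence] is a list of words.
for p_idx, paragraph in enumerate(result):
    for s_idx, sentence in enumerate(paragraph):
        print(p_idx, s_idx, sentence)
```

Indexing follows the nesting order, so `result[0][1]` is the second sentence of the first paragraph, given as a list of words.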