Package opennlp.tools.tokenize
Contains classes related to finding tokens or words in a string. All
tokenizers implement the Tokenizer interface. Currently there are three
implementations: the learnable TokenizerME, the WhitespaceTokenizer, and
the SimpleTokenizer, which is a character class tokenizer.
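To illustrate the character-class idea behind SimpleTokenizer, here is a minimal self-contained sketch (not the library's implementation): adjacent characters of the same class are grouped into one token, and whitespace separates tokens.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of character-class tokenization. Adjacent characters of the
 * same class (letter, digit, other) form one token; whitespace ends a token.
 * This illustrates the idea only and is not the OpenNLP implementation.
 */
public class CharClassTokenizerSketch {

    // Assign each non-whitespace character to a coarse class.
    private static int charClass(char c) {
        if (Character.isLetter(c)) return 0; // letters group together
        if (Character.isDigit(c))  return 1; // digits group together
        return 2;                            // punctuation and symbols
    }

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int prevClass = -1;
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c)) {
                // Whitespace always terminates the current token.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                prevClass = -1;
                continue;
            }
            int cls = charClass(c);
            // A change of character class also starts a new token.
            if (cls != prevClass && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
            prevClass = cls;
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // Punctuation is split off into its own tokens.
        System.out.println(tokenize("He said, wait!"));
    }
}
```

Grouping by character class is why punctuation attached to a word ("said,") comes out as two tokens, which a plain whitespace split would not produce.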
Class | Description

- DefaultTokenContextGenerator: A default TokenContextGenerator which produces events for maxent decisions for tokenization.
- Detokenizer: A Detokenizer merges tokens back to their detokenized representation.
- Detokenizer.DetokenizationOperation: This enum contains an operation for every token to merge the tokens together to their detokenized form.
- DetokenizerEvaluator: The DetokenizerEvaluator measures the performance of the given Detokenizer with the provided reference samples.
- DictionaryDetokenizer: A rule based detokenizer.
- SimpleTokenizer: A basic Tokenizer implementation which performs tokenization using character classes.
- ThreadSafeTokenizerME: A thread-safe version of TokenizerME.
- TokenContextGenerator: Interface for context generators required for TokenizerME.
- Tokenizer: The interface for tokenizers, which segment a string into its tokens.
- TokenizerCrossValidator: A cross validator for tokenizers.
- TokenizerEvaluationMonitor: A marker interface for evaluating tokenizers.
- TokenizerEvaluator: The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.
- TokenizerFactory: The factory that provides Tokenizer default implementation and resources.
- TokenizerME: A Tokenizer for converting raw text into separated tokens.
- TokenizerModel: The TokenizerModel is the model used by a learnable Tokenizer.
- TokenSample: A TokenSample is text with token spans.
- TokenSampleStream: This class is a stream filter which reads in string encoded samples and creates samples out of them.
- WhitespaceTokenizer: A basic Tokenizer implementation which performs tokenization using white spaces.
- WhitespaceTokenStream: This stream formats an ObjectStream of samples into whitespace separated token strings.
- WordpieceTokenizer: A Tokenizer implementation which performs tokenization using word pieces.