Package org.carrot2.language
Lexical component interfaces and implementations.
Class descriptions
Default implementation of StopwordFilterDictionary and LabelFilterDictionary interfaces.
Ephemeral per-request overrides for the default LanguageComponents passed to the algorithm.
A tokenizer separating input characters on whitespace, but capable of extracting more complex tokens, such as URLs, e-mail addresses and sentence delimiters.
This dictionary implementation is a middle ground between the complexity of regular expressions and sheer speed of plain text matching.
A cluster label candidate filter.
A set of language-specific components.
An adapter converting Snowball programs into Stemmer interface.
Simple lemmatization engine transforming an inflected form of a word to its base form or some other unique token.
A stop word filter.
A parameter supplying a StopwordFilter.
Splits input characters into tokens representing e.g.
Utility methods for working with Tokenizer attributes.
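
The components described above (a tokenizer, a stop word filter and a stemmer) typically cooperate as a small preprocessing pipeline. The following is a minimal, self-contained sketch of that idea only. It does not use the org.carrot2.language API: the class LexicalPipelineSketch, its methods, the stop word list and the toy suffix-stripping rule are all hypothetical and exist purely for illustration. In particular, the convention that the filter returns true for words to keep is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Illustrative only: hypothetical stand-ins, not the library's classes. */
class LexicalPipelineSketch {
  /** Hypothetical stop word list used by the filter below. */
  static final Set<String> STOPWORDS = Set.of("the", "a", "of", "and");

  /** Assumed convention for this sketch: true means "keep the word". */
  static final Predicate<String> STOPWORD_FILTER = w -> !STOPWORDS.contains(w);

  /** Toy stemmer: strips a trailing "s" from words longer than 3 characters. */
  static final UnaryOperator<String> STEMMER =
      w -> (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;

  /** Crude check for "complex" tokens (URLs, e-mail addresses). */
  static boolean isComplex(String token) {
    return token.contains("://") || token.contains("@");
  }

  /** Splits on whitespace, keeping URLs and e-mail addresses as single tokens. */
  public static List<String> tokenize(String input) {
    List<String> tokens = new ArrayList<>();
    for (String raw : input.trim().split("\\s+")) {
      // Drop trailing sentence punctuation from every token.
      String token = raw.replaceAll("[.,;:!?]+$", "");
      if (!isComplex(token)) {
        // Plain words additionally lose any leading punctuation.
        token = token.replaceAll("^\\p{Punct}+", "");
      }
      if (!token.isEmpty()) {
        tokens.add(token);
      }
    }
    return tokens;
  }

  /** Tokenizes, removes stop words, and stems the remaining plain words. */
  public static List<String> process(String input) {
    List<String> out = new ArrayList<>();
    for (String token : tokenize(input)) {
      if (isComplex(token)) {
        out.add(token); // URLs and e-mail addresses pass through unchanged
        continue;
      }
      String lower = token.toLowerCase(Locale.ROOT);
      if (STOPWORD_FILTER.test(lower)) {
        out.add(STEMMER.apply(lower));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(
        process("The clusters of documents, see https://example.org/docs."));
  }
}
```

Keeping the filter and the stemmer as small functional values mirrors the idea of a swappable set of language-specific components: a different language would supply a different stop word list and stemming rule without changing the pipeline itself.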