Package org.carrot2.clustering.lingo
Class LingoClusteringAlgorithm
java.lang.Object
org.carrot2.attrs.AttrComposite
org.carrot2.clustering.lingo.LingoClusteringAlgorithm
All Implemented Interfaces:
AcceptingVisitor, ClusteringAlgorithm
Lingo clustering algorithm. Implemented as described in: Stanisław Osiński, Dawid Weiss: A Concept-Driven Algorithm for Clustering Search Results. IEEE Intelligent Systems, vol. 20, no. 3, May/June 2005, pp. 48-54.
Field Summary
Fields
clusterBuilder: Configuration of the structure and labels of clusters.
desiredClusterCount: Determines the number of clusters to create.
dictionaries: Per-request overrides of language components (dictionaries).
matrixBuilder: Configuration of the size and contents of the term-document matrix.
matrixReducer: Configuration of the matrix decomposition method to use for clustering.
static final String NAME
preprocessing: Configuration of the text preprocessing stage.
final AttrString queryHint: Query terms used to retrieve documents being clustered.
scoreWeight: Balance between cluster score and size during cluster sorting.
Fields inherited from class org.carrot2.attrs.AttrComposite
attributes
Constructor Summary
Constructors
LingoClusteringAlgorithm()
Method Summary
Methods
<T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents): Performs Lingo clustering of documents.
Methods inherited from class org.carrot2.attrs.AttrComposite
accept
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.carrot2.attrs.AcceptingVisitor
accept
Methods inherited from interface org.carrot2.clustering.ClusteringAlgorithm
optionalLanguageComponents, supports
Field Details
NAME
- See Also:
scoreWeight
Balance between cluster score and size during cluster sorting. A value of 0.0 causes Lingo to sort clusters by size only; a value of 1.0 causes it to sort clusters by score only. Values in between balance the two criteria.
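The description fixes only the two endpoints of scoreWeight. As a hedged illustration, the sketch below assumes a weighted geometric blend, key = score^w * size^(1 - w) with w = scoreWeight, which reduces to size alone at 0.0 and to score alone at 1.0. The formula and the ScoreSizeSort and Candidate names are hypothetical; Carrot2's actual sort key may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: not Carrot2's actual cluster-sorting code.
final class ScoreSizeSort {
  // Minimal stand-in for a cluster: a label, its score and its document count.
  record Candidate(String label, double score, int size) {}

  /**
   * Assumed weighted geometric blend of score and size.
   * scoreWeight = 0.0 yields the size alone; scoreWeight = 1.0 yields the score alone.
   */
  static double sortKey(Candidate c, double scoreWeight) {
    if (scoreWeight < 0.0 || scoreWeight > 1.0) {
      throw new IllegalArgumentException("scoreWeight must be in [0, 1]");
    }
    return Math.pow(c.score(), scoreWeight) * Math.pow(c.size(), 1.0 - scoreWeight);
  }

  /** Returns a copy of the candidates sorted by descending sort key. */
  static List<Candidate> sort(List<Candidate> candidates, double scoreWeight) {
    List<Candidate> sorted = new ArrayList<>(candidates);
    sorted.sort(Comparator.comparingDouble((Candidate c) -> sortKey(c, scoreWeight)).reversed());
    return sorted;
  }
}
```

For a cluster with score 0.9 and size 3 against one with score 0.2 and size 10, a weight of 0.0 ranks the larger cluster first, while 1.0 ranks the higher-scoring one first.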
desiredClusterCount
Determines the number of clusters to create. The larger the value, the more clusters are created. The number of clusters actually produced is proportional to this value but may differ from it.
preprocessing
Configuration of the text preprocessing stage.
matrixBuilder
Configuration of the size and contents of the term-document matrix.
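For illustration only, the sketch below builds a small term-document matrix with one row per term and one column per document, weighting each cell with a common log tf-idf variant: (1 + ln tf) * ln(N / df). The weighting formula, the TermDocumentMatrix class and its build method are assumptions; the matrix builder configured by this field may use a different weighting scheme.

```java
import java.util.List;

// Hypothetical sketch: not the Carrot2 matrix builder. Rows are terms, columns are documents.
final class TermDocumentMatrix {
  /**
   * @param docs  documents, each already tokenized into terms
   * @param terms vocabulary; row i of the result corresponds to terms.get(i)
   * @return matrix A with A[i][j] = (1 + ln tf) * ln(N / df) when tf > 0, otherwise 0
   */
  static double[][] build(List<List<String>> docs, List<String> terms) {
    int n = docs.size();
    double[][] a = new double[terms.size()][n];
    for (int i = 0; i < terms.size(); i++) {
      String term = terms.get(i);
      int[] tf = new int[n]; // occurrences of the term in each document
      int df = 0;            // number of documents containing the term
      for (int j = 0; j < n; j++) {
        for (String token : docs.get(j)) {
          if (token.equals(term)) {
            tf[j]++;
          }
        }
        if (tf[j] > 0) {
          df++;
        }
      }
      if (df == 0) {
        continue; // term absent from every document: leave the row at zero
      }
      double idf = Math.log((double) n / df);
      for (int j = 0; j < n; j++) {
        if (tf[j] > 0) {
          a[i][j] = (1.0 + Math.log(tf[j])) * idf;
        }
      }
    }
    return a;
  }
}
```

Under this weighting, a term that occurs in every document gets a weight of zero, since ln(N / N) = 0.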
matrixReducer
Configuration of the matrix decomposition method to use for clustering.
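The Lingo paper cited at the top of this page derives abstract cluster concepts from a singular value decomposition of the term-document matrix. To show the kind of output such a decomposition yields, the sketch below swaps in plain power iteration, a simpler technique that approximates only the first left singular vector. It is not the decomposition code Carrot2 uses, and the DominantSingularVector class is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch (not Carrot2 code): power iteration approximating the
// dominant left singular vector of a term-document matrix A (terms x documents).
final class DominantSingularVector {
  /**
   * Repeats u <- A A^T u, normalizing after each step. Converges to the first left
   * singular vector when the largest singular value is strictly greater than the
   * second and the starting vector is not orthogonal to that singular vector.
   */
  static double[] leftSingularVector(double[][] a, int iterations) {
    int m = a.length;
    int n = a[0].length;
    double[] u = new double[m];
    Arrays.fill(u, 1.0 / Math.sqrt(m)); // unit-length starting vector
    for (int it = 0; it < iterations; it++) {
      double[] v = new double[n]; // v = A^T u, one entry per document
      for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
          v[j] += a[i][j] * u[i];
        }
      }
      double[] w = new double[m]; // w = A v = A A^T u, one entry per term
      for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
          w[i] += a[i][j] * v[j];
        }
      }
      double norm = 0.0;
      for (double x : w) {
        norm += x * x;
      }
      norm = Math.sqrt(norm);
      if (norm == 0.0) {
        return w; // A A^T u vanished (for example, A is all zeros)
      }
      for (int i = 0; i < m; i++) {
        u[i] = w[i] / norm;
      }
    }
    return u;
  }
}
```

Each entry of the returned vector weights one term, so the terms with the largest entries hint at the dominant concept in the collection. A full decomposition would also yield the further singular vectors, which this sketch does not compute.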
clusterBuilder
Configuration of the structure and labels of clusters.
dictionaries
Per-request overrides of language components (dictionaries).
Since:
4.1.0
queryHint
Query terms used to retrieve documents being clustered. The query is used as a hint to avoid creating trivial clusters consisting only of query words.
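A minimal sketch of the check this description implies: a candidate label counts as trivial when every one of its words also appears in the query. The tokenization (lower-casing, splitting on non-alphanumeric characters) and the QueryHintFilter class are assumptions for illustration, not Carrot2's implementation.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of the idea behind queryHint; not Carrot2's implementation.
final class QueryHintFilter {
  /** Lower-cases the text and splits it into words on runs of non-letter, non-digit characters. */
  static Set<String> words(String text) {
    return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
        .filter(w -> !w.isEmpty())
        .collect(Collectors.toSet());
  }

  /** True if the label is non-empty and every one of its words also occurs in the query. */
  static boolean isTrivial(String label, String queryHint) {
    Set<String> labelWords = words(label);
    return !labelWords.isEmpty() && words(queryHint).containsAll(labelWords);
  }
}
```

With the query "data mining", the label "Data Mining" is flagged as trivial, whereas "Data Mining Software" is not, because it adds a word beyond the query.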
Constructor Details
LingoClusteringAlgorithm
public LingoClusteringAlgorithm()
Method Details
requiredLanguageComponents
Specified by:
requiredLanguageComponents in interface ClusteringAlgorithm
Returns:
A set of classes required to be present in the LanguageComponents instance provided for clustering.
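The contract above amounts to a set-difference check: every required class must be among the components provided. The sketch below models the provided components as a hypothetical Map from component class to Supplier; that shape and the ComponentCheck class are assumptions for illustration, not the LanguageComponents API.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of checking a requiredLanguageComponents()-style contract.
// The Map<Class<?>, Supplier<?>> shape is an assumption, not Carrot2's API.
final class ComponentCheck {
  /** Returns the required classes that have no supplier among the provided components. */
  static Set<Class<?>> missing(Set<Class<?>> required, Map<Class<?>, Supplier<?>> provided) {
    Set<Class<?>> result = new LinkedHashSet<>(required);
    result.removeAll(provided.keySet());
    return result;
  }
}
```

An empty result means every required component is available; a non-empty one names what a caller would still need to supply before clustering.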
cluster
public <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
Performs Lingo clustering of documents.
Specified by:
cluster in interface ClusteringAlgorithm
Type Parameters:
T - Any subclass of Document. Clusters of objects of the same type are returned.
Parameters:
docStream - A stream of documents for clustering.
languageComponents - LanguageComponents with a set of suppliers for the required language-specific components.
Returns:
A list of top-level clusters (clusters can form a hierarchy via Cluster.getClusters()).
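Because cluster returns only top-level clusters, code consuming the result typically recurses through getClusters(). So that it runs without Carrot2 on the classpath, the sketch below uses a hypothetical Node record as a stand-in for Cluster, keeping only a label and the subcluster list; the ClusterTree helper is likewise hypothetical. With the real class, the same recursion would call getClusters() on each Cluster.

```java
import java.util.List;

// Hypothetical stand-in for org.carrot2.clustering.Cluster, reduced to a label
// and the subcluster list returned by getClusters().
record Node(String label, List<Node> subclusters) {
  List<Node> getClusters() {
    return subclusters;
  }
}

final class ClusterTree {
  /** Counts every cluster in the hierarchy, top-level clusters included. */
  static int countAll(List<Node> clusters) {
    int count = 0;
    for (Node c : clusters) {
      count += 1 + countAll(c.getClusters());
    }
    return count;
  }

  /** Appends one line per cluster to out, indented by two spaces per level. */
  static void render(List<Node> clusters, int depth, StringBuilder out) {
    for (Node c : clusters) {
      out.append("  ".repeat(depth)).append(c.label()).append('\n');
      render(c.getClusters(), depth + 1, out);
    }
  }
}
```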