Package | Description |
---|---|
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
org.apache.lucene.analysis.ar | Analyzer for Arabic. |
org.apache.lucene.analysis.bg | Analyzer for Bulgarian. |
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams. |
org.apache.lucene.analysis.ckb | Analyzer for Sorani Kurdish. |
org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters). |
org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words. |
org.apache.lucene.analysis.commongrams | Constructs n-grams for frequently occurring terms and phrases. |
org.apache.lucene.analysis.compound | A filter that decomposes compound words found in many Germanic languages into their word parts. |
org.apache.lucene.analysis.core | Basic, general-purpose analysis components. |
org.apache.lucene.analysis.cz | Analyzer for Czech. |
org.apache.lucene.analysis.de | Analyzer for German. |
org.apache.lucene.analysis.el | Analyzer for Greek. |
org.apache.lucene.analysis.en | Analyzer for English. |
org.apache.lucene.analysis.es | Analyzer for Spanish. |
org.apache.lucene.analysis.fa | Analyzer for Persian. |
org.apache.lucene.analysis.fi | Analyzer for Finnish. |
org.apache.lucene.analysis.fr | Analyzer for French. |
org.apache.lucene.analysis.ga | Analysis for Irish. |
org.apache.lucene.analysis.gl | Analyzer for Galician. |
org.apache.lucene.analysis.hi | Analyzer for Hindi. |
org.apache.lucene.analysis.hu | Analyzer for Hungarian. |
org.apache.lucene.analysis.hunspell | Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm. |
org.apache.lucene.analysis.id | Analyzer for Indonesian. |
org.apache.lucene.analysis.in | Analysis components for Indian languages. |
org.apache.lucene.analysis.it | Analyzer for Italian. |
org.apache.lucene.analysis.lv | Analyzer for Latvian. |
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStreams. |
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters. |
org.apache.lucene.analysis.nl | Analyzer for Dutch. |
org.apache.lucene.analysis.no | Analyzer for Norwegian. |
org.apache.lucene.analysis.pattern | Set of components for pattern-based (regex) analysis. |
org.apache.lucene.analysis.payloads | Provides various convenience classes for creating payloads on Tokens. |
org.apache.lucene.analysis.phonetic | Analysis components for phonetic search. |
org.apache.lucene.analysis.position | Filter for assigning position increments. |
org.apache.lucene.analysis.pt | Analyzer for Portuguese. |
org.apache.lucene.analysis.reverse | Filter to reverse token text. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.shingle | Word n-gram filters. |
org.apache.lucene.analysis.sinks | TeeSinkTokenFilter and implementations of TeeSinkTokenFilter.SinkFilter that might be useful. |
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball stemmers. |
org.apache.lucene.analysis.standard | Fast, general-purpose grammar-based tokenizers. |
org.apache.lucene.analysis.stempel | Stempel: an algorithmic stemmer. |
org.apache.lucene.analysis.sv | Analyzer for Swedish. |
org.apache.lucene.analysis.synonym | Analysis components for synonyms. |
org.apache.lucene.analysis.th | Analyzer for Thai. |
org.apache.lucene.analysis.tr | Analyzer for Turkish. |
org.apache.lucene.analysis.util | Utility functions for text analysis. |
org.apache.lucene.collation | Unicode collation support. |
org.apache.lucene.search.highlight | Classes that provide "keyword in context" features, typically used to highlight search terms in the text of results pages. |
org.apache.lucene.search.suggest.analyzing | Analyzer-based autosuggest. |
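All of the packages above build on the same TokenStream contract: a Tokenizer emits tokens and TokenFilters rewrite, remove, or inject them, while the caller pulls tokens one at a time through attributes. The following is a minimal sketch of consuming such a chain; the field name "body", the sample text, and the class name are arbitrary.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ConsumeTokenStream {
  public static void main(String[] args) throws IOException {
    Analyzer analyzer = new StandardAnalyzer();                 // tokenizer plus filter chain
    TokenStream ts = analyzer.tokenStream("body", "The Quick Brown Fox");
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();                                                 // required before incrementToken()
    while (ts.incrementToken()) {
      System.out.println(term.toString());                      // quick, brown, fox ("the" is a stop word)
    }
    ts.end();
    ts.close();
    analyzer.close();
  }
}
```

The later sketches on this page follow the same pattern. Note that constructor signatures vary across 4.x releases: older releases take a leading Version argument on many analysis components, while later 4.x releases added the Version-less forms used in these sketches.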
Modifier and Type | Class and Description |
---|---|
class | CachingTokenFilter: This class can be used if the token attributes of a TokenStream are intended to be consumed more than once. |
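As a hedged illustration of the two-pass use case (the analyzer, field name, and text are placeholders, and the reset() semantics of CachingTokenFilter shifted slightly between 4.x and 5.x, so consult the Javadoc of the release in use):

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TwoPassExample {
  static void analyzeTwice(Analyzer analyzer) throws IOException {
    TokenStream source = analyzer.tokenStream("body", "some text to analyze");
    source.reset();                                        // reset the underlying stream first
    CachingTokenFilter cache = new CachingTokenFilter(source);
    CharTermAttribute term = cache.addAttribute(CharTermAttribute.class);

    while (cache.incrementToken()) { /* first pass, e.g. count terms */ }

    cache.reset();                                         // rewinds the cache, not the source
    while (cache.incrementToken()) { /* second pass over the same tokens */ }

    cache.end();
    cache.close();
  }
}
```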
Modifier and Type | Class and Description |
---|---|
class | ArabicNormalizationFilter: A TokenFilter that applies ArabicNormalizer to normalize the orthography. |
class | ArabicStemFilter: A TokenFilter that applies ArabicStemmer to stem Arabic words. |
Modifier and Type | Class and Description |
---|---|
class | BulgarianStemFilter: A TokenFilter that applies BulgarianStemmer to stem Bulgarian words. |
Modifier and Type | Class and Description |
---|---|
class | BrazilianStemFilter: A TokenFilter that applies BrazilianStemmer. |
Modifier and Type | Class and Description |
---|---|
class | CJKBigramFilter: Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer. |
class | CJKWidthFilter: A TokenFilter that normalizes CJK width differences: folds fullwidth ASCII variants into the equivalent Basic Latin, and folds halfwidth Katakana variants into the equivalent kana. |
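A sketch of wiring these two filters into a CJK chain, roughly the shape of what CJKAnalyzer builds internally; the class name is made up and the Version-less constructors assume a late 4.x release:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cjk.CJKBigramFilter;
import org.apache.lucene.analysis.cjk.CJKWidthFilter;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class CjkChainExample {
  static Analyzer create() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new StandardTokenizer(reader);   // emits CJK text as single ideographs
        TokenStream result = new CJKWidthFilter(source);    // fold fullwidth/halfwidth variants
        result = new LowerCaseFilter(result);
        result = new CJKBigramFilter(result);               // join adjacent CJK terms into bigrams
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```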
Modifier and Type | Class and Description |
---|---|
class | SoraniNormalizationFilter: A TokenFilter that applies SoraniNormalizer to normalize the orthography. |
class | SoraniStemFilter: A TokenFilter that applies SoraniStemmer to stem Sorani words. |
Modifier and Type | Class and Description |
---|---|
class | ChineseFilter: Deprecated. (3.1) Use StopFilter instead, which has the same functionality. This filter will be removed in Lucene 5.0. |
Modifier and Type | Class and Description |
---|---|
class | WordTokenFilter: Deprecated. Use HMMChineseTokenizer instead. |
Modifier and Type | Method and Description |
---|---|
TokenFilter | SmartChineseWordTokenFilterFactory.create(TokenStream input): Deprecated. |
Modifier and Type | Class and Description |
---|---|
class | CommonGramsFilter: Constructs bigrams for frequently occurring terms while indexing. |
class | CommonGramsQueryFilter: Wraps a CommonGramsFilter, optimizing phrase queries by only returning single words when they are not members of a bigram. |
Modifier and Type | Method and Description |
---|---|
TokenFilter | CommonGramsQueryFilterFactory.create(TokenStream input): Creates a CommonGramsFilter and wraps it with a CommonGramsQueryFilter. |
TokenFilter | CommonGramsFilterFactory.create(TokenStream input) |
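A sketch of the index-time and query-time pairing described above; the common-word list is a made-up example and the Version-less constructors assume a late 4.x release:

```java
import java.io.Reader;
import java.util.Arrays;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.commongrams.CommonGramsFilter;
import org.apache.lucene.analysis.commongrams.CommonGramsQueryFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.util.CharArraySet;

public class CommonGramsExample {
  static final CharArraySet COMMON = new CharArraySet(Arrays.asList("the", "of", "a"), true);

  // Index-time chain: emits single words plus bigrams such as "the_quick".
  static Analyzer indexAnalyzer() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        TokenStream result = new CommonGramsFilter(source, COMMON);
        return new TokenStreamComponents(source, result);
      }
    };
  }

  // Query-time chain: keeps only the bigrams where they exist, so phrase queries stay compact.
  static Analyzer queryAnalyzer() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        CommonGramsFilter grams = new CommonGramsFilter(source, COMMON);
        TokenStream result = new CommonGramsQueryFilter(grams);
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```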
Modifier and Type | Class and Description |
---|---|
class | CompoundWordTokenFilterBase: Base class for decomposition token filters. |
class | DictionaryCompoundWordTokenFilter: A TokenFilter that decomposes compound words found in many Germanic languages. |
class | HyphenationCompoundWordTokenFilter: A TokenFilter that decomposes compound words found in many Germanic languages. |
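For illustration only, a sketch of decomposing compounds against a tiny in-memory dictionary; a real setup would load a full word list, and the Version-less constructor assumes a late 4.x release:

```java
import java.io.Reader;
import java.util.Arrays;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.util.CharArraySet;

public class CompoundExample {
  // Toy dictionary of word parts; "donaudampfschiff" would yield "donau", "dampf", "schiff".
  static final CharArraySet PARTS = new CharArraySet(Arrays.asList("donau", "dampf", "schiff"), true);

  static Analyzer create() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        TokenStream result = new LowerCaseFilter(source);
        result = new DictionaryCompoundWordTokenFilter(result, PARTS);  // keeps the original token, adds the parts
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```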
Modifier and Type | Class and Description |
---|---|
class | LowerCaseFilter: Normalizes token text to lower case. |
class | StopFilter: Removes stop words from a token stream. |
class | TypeTokenFilter: Removes tokens whose types appear in a set of blocked types from a token stream. |
class | UpperCaseFilter: Normalizes token text to UPPER CASE. |
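These core components compose in the usual way. A minimal sketch of a lowercase-plus-stopword chain with a custom stop set follows; the stop words themselves are arbitrary:

```java
import java.io.Reader;
import java.util.Arrays;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.util.CharArraySet;

public class CoreChainExample {
  static Analyzer create() {
    final CharArraySet stop = new CharArraySet(Arrays.asList("the", "and", "or"), true);
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        TokenStream result = new LowerCaseFilter(source);   // lower-case before stop word matching
        result = new StopFilter(result, stop);               // drops "the", "and", "or"
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```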
Modifier and Type | Class and Description |
---|---|
class | CzechStemFilter: A TokenFilter that applies CzechStemmer to stem Czech words. |
Modifier and Type | Class and Description |
---|---|
class | GermanLightStemFilter: A TokenFilter that applies GermanLightStemmer to stem German words. |
class | GermanMinimalStemFilter: A TokenFilter that applies GermanMinimalStemmer to stem German words. |
class | GermanNormalizationFilter: Normalizes German characters according to the heuristics of the German2 snowball algorithm. |
class | GermanStemFilter: A TokenFilter that stems German words. |
Modifier and Type | Class and Description |
---|---|
class | GreekLowerCaseFilter: Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma. |
class | GreekStemFilter: A TokenFilter that applies GreekStemmer to stem Greek words. |
Modifier and Type | Class and Description |
---|---|
class | EnglishMinimalStemFilter: A TokenFilter that applies EnglishMinimalStemmer to stem English words. |
class | EnglishPossessiveFilter: TokenFilter that removes possessives (trailing 's) from words. |
class | KStemFilter: A high-performance kstem filter for English. |
class | PorterStemFilter: Transforms the token stream as per the Porter stemming algorithm. |
Modifier and Type | Method and Description |
---|---|
TokenFilter | KStemFilterFactory.create(TokenStream input) |
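A sketch of a plain English stemming chain using PorterStemFilter (KStemFilter would slot into the same position); the class name and comments are illustrative:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class EnglishStemExample {
  static Analyzer create() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new StandardTokenizer(reader);
        TokenStream result = new LowerCaseFilter(source);   // the Porter stemmer expects lower-cased input
        result = new PorterStemFilter(result);               // "running" -> "run", "flies" -> "fli"
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```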
Modifier and Type | Class and Description |
---|---|
class | SpanishLightStemFilter: A TokenFilter that applies SpanishLightStemmer to stem Spanish words. |
Modifier and Type | Class and Description |
---|---|
class | PersianNormalizationFilter: A TokenFilter that applies PersianNormalizer to normalize the orthography. |
Modifier and Type | Class and Description |
---|---|
class | FinnishLightStemFilter: A TokenFilter that applies FinnishLightStemmer to stem Finnish words. |
Modifier and Type | Class and Description |
---|---|
class | FrenchLightStemFilter: A TokenFilter that applies FrenchLightStemmer to stem French words. |
class | FrenchMinimalStemFilter: A TokenFilter that applies FrenchMinimalStemmer to stem French words. |
class | FrenchStemFilter: Deprecated. (3.1) Use SnowballFilter with FrenchStemmer instead, which has the same functionality. This filter will be removed in Lucene 5.0. |
Modifier and Type | Class and Description |
---|---|
class | IrishLowerCaseFilter: Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., 'nAthair' becomes 'n-athair'). |
Modifier and Type | Class and Description |
---|---|
class | GalicianMinimalStemFilter: A TokenFilter that applies GalicianMinimalStemmer to stem Galician words. |
class | GalicianStemFilter: A TokenFilter that applies GalicianStemmer to stem Galician words. |
Modifier and Type | Class and Description |
---|---|
class | HindiNormalizationFilter: A TokenFilter that applies HindiNormalizer to normalize the orthography. |
class | HindiStemFilter: A TokenFilter that applies HindiStemmer to stem Hindi words. |
Modifier and Type | Class and Description |
---|---|
class | HungarianLightStemFilter: A TokenFilter that applies HungarianLightStemmer to stem Hungarian words. |
Modifier and Type | Class and Description |
---|---|
class | HunspellStemFilter: TokenFilter that uses Hunspell affix rules and words to stem tokens. |
Modifier and Type | Class and Description |
---|---|
class | IndonesianStemFilter: A TokenFilter that applies IndonesianStemmer to stem Indonesian words. |
Modifier and Type | Class and Description |
---|---|
class | IndicNormalizationFilter: A TokenFilter that applies IndicNormalizer to normalize text in Indian languages. |
Modifier and Type | Class and Description |
---|---|
class | ItalianLightStemFilter: A TokenFilter that applies ItalianLightStemmer to stem Italian words. |
Modifier and Type | Class and Description |
---|---|
class | LatvianStemFilter: A TokenFilter that applies LatvianStemmer to stem Latvian words. |
Modifier and Type | Class and Description |
---|---|
class | ASCIIFoldingFilter: This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. |
class | CapitalizationFilter: A filter to apply normal capitalization rules to Tokens. |
class | CodepointCountFilter: Removes words that are too long or too short from the stream. |
class | HyphenatedWordsFilter: When plain text is extracted from documents, many words are often hyphenated and broken across two lines. |
class | KeepWordFilter: A TokenFilter that only keeps tokens with text contained in the required words. |
class | KeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute. |
class | KeywordRepeatFilter: This TokenFilter emits each incoming token twice, once as a keyword and once as a non-keyword; in other words, once with KeywordAttribute.setKeyword(boolean) set to true and once set to false. |
class | LengthFilter: Removes words that are too long or too short from the stream. |
class | LimitTokenCountFilter: This TokenFilter limits the number of tokens while indexing. |
class | LimitTokenPositionFilter: This TokenFilter limits its emitted tokens to those with positions that are not greater than the configured limit. |
class | Lucene47WordDelimiterFilter: Deprecated. |
class | PatternKeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute. |
class | RemoveDuplicatesTokenFilter: A TokenFilter which filters out Tokens at the same position and with the same term text as the previous token in the stream. |
class | ScandinavianFoldingFilter: This filter folds Scandinavian characters åÅäæÄÆ -> a and öÖøØ -> o. |
class | ScandinavianNormalizationFilter: This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ. |
class | SetKeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute. |
class | StemmerOverrideFilter: Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming. |
class | TrimFilter: Trims leading and trailing whitespace from Tokens in the stream. |
class | TruncateTokenFilter: A token filter for truncating terms to a specific length. |
class | WordDelimiterFilter: Splits words into subwords and performs optional transformations on subword groups. |
Modifier and Type | Method and Description |
---|---|
TokenFilter | WordDelimiterFilterFactory.create(TokenStream input) |
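The keyword marker and repeat filters above are typically combined with a stemmer so that both the original and the stemmed form are indexed. A sketch of that idiom (the stemmer choice is just an example):

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.miscellaneous.KeywordRepeatFilter;
import org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter;

public class StemAndKeepOriginalExample {
  static Analyzer create() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        TokenStream result = new LowerCaseFilter(source);
        result = new KeywordRepeatFilter(result);            // emit each token twice (keyword and non-keyword)
        result = new PorterStemFilter(result);                // stems only the non-keyword copy
        result = new RemoveDuplicatesTokenFilter(result);     // drop copies the stemmer left unchanged
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```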
Modifier and Type | Class and Description |
---|---|
class | EdgeNGramTokenFilter: Tokenizes the given token into n-grams of given size(s). |
class | NGramTokenFilter: Tokenizes the input into n-grams of the given size(s). |
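Edge n-grams are a common way to get prefix ("search as you type") matching at index time. A sketch with illustrative gram sizes:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;

public class EdgeNGramExample {
  static Analyzer create() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        TokenStream result = new LowerCaseFilter(source);
        result = new EdgeNGramTokenFilter(result, 2, 10);   // "lucene" -> "lu", "luc", ..., "lucene"
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```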
Modifier and Type | Class and Description |
---|---|
class | DutchStemFilter: Deprecated. (3.1) Use SnowballFilter with DutchStemmer instead, which has the same functionality. This filter will be removed in Lucene 5.0. |
Modifier and Type | Class and Description |
---|---|
class | NorwegianLightStemFilter: A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words. |
class | NorwegianMinimalStemFilter: A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words. |
Modifier and Type | Class and Description |
---|---|
class | PatternCaptureGroupTokenFilter: Uses Java regexes to emit multiple tokens, one for each capture group in one or more patterns. |
class | PatternReplaceFilter: A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences with the specified replacement string. |
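A sketch of pattern-based token rewriting; the regex shown (collapsing runs of digits to a single '#') is only an example:

```java
import java.io.Reader;
import java.util.regex.Pattern;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.pattern.PatternReplaceFilter;

public class PatternReplaceExample {
  static Analyzer create() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        // Replace every run of digits inside a token with '#', e.g. "v1234beta" -> "v#beta".
        TokenStream result = new PatternReplaceFilter(source, Pattern.compile("\\d+"), "#", true);
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```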
Modifier and Type | Class and Description |
---|---|
class | DelimitedPayloadTokenFilter: Characters before the delimiter are the "token", those after are the payload. |
class | NumericPayloadTokenFilter: Assigns a payload to a token based on the Token.type(). |
class | TokenOffsetPayloadTokenFilter: Adds the OffsetAttribute.startOffset() and OffsetAttribute.endOffset() as the token's payload; the first 4 bytes are the start offset. |
class | TypeAsPayloadTokenFilter: Makes the Token.type() a payload. |
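A sketch of attaching per-token payloads supplied inline in the field text; the '|' delimiter and float weights are illustrative:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter;
import org.apache.lucene.analysis.payloads.FloatEncoder;

public class PayloadExample {
  static Analyzer create() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        // For input like "lucene|2.0 search|1.0": text before '|' is the term, the float after it becomes the payload.
        TokenStream result = new DelimitedPayloadTokenFilter(source, '|', new FloatEncoder());
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```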
Modifier and Type | Class and Description |
---|---|
class | BeiderMorseFilter: TokenFilter for Beider-Morse phonetic encoding. |
class | DoubleMetaphoneFilter: Filter for DoubleMetaphone (supporting secondary codes). |
class | PhoneticFilter: Create tokens for phonetic matches. |
Modifier and Type | Class and Description |
---|---|
class | PositionFilter: Deprecated. (4.4) PositionFilter makes TokenStream graphs inconsistent, which can cause highlighting bugs. Since its main use case was to make QueryParser generate boolean queries instead of phrase queries, it is now advised to use QueryParser.setAutoGeneratePhraseQueries(boolean) (for simple cases) or to override QueryParser.newFieldQuery. |
Modifier and Type | Class and Description |
---|---|
class | PortugueseLightStemFilter: A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words. |
class | PortugueseMinimalStemFilter: A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words. |
class | PortugueseStemFilter: A TokenFilter that applies PortugueseStemmer to stem Portuguese words. |
Modifier and Type | Class and Description |
---|---|
class | ReverseStringFilter: Reverse token string, for example "country" => "yrtnuoc". |
Modifier and Type | Class and Description |
---|---|
class | RussianLightStemFilter: A TokenFilter that applies RussianLightStemmer to stem Russian words. |
Modifier and Type | Class and Description |
---|---|
class | ShingleFilter: A ShingleFilter constructs shingles (token n-grams) from a token stream. |
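A sketch of producing word shingles; the sizes are illustrative. With min=2 and max=3, "please divide this" yields "please divide", "please divide this", and "divide this", plus the single words by default:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;

public class ShingleExample {
  static Analyzer create() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        ShingleFilter shingles = new ShingleFilter(source, 2, 3);  // 2- and 3-word shingles
        // shingles.setOutputUnigrams(false);                      // uncomment to drop the single words
        return new TokenStreamComponents(source, shingles);
      }
    };
  }
}
```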
Modifier and Type | Class and Description |
---|---|
class | TeeSinkTokenFilter: This TokenFilter provides the ability to set aside attribute states that have already been analyzed. |
Modifier and Type | Class and Description |
---|---|
class | SnowballFilter: A filter that stems words using a Snowball-generated stemmer. |
Modifier and Type | Method and Description |
---|---|
TokenFilter | SnowballPorterFilterFactory.create(TokenStream input) |
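A sketch of the Snowball route; the stemmer name string must match one of the bundled Snowball programs, and "English" is used here only as an example:

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class SnowballExample {
  static Analyzer create() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new StandardTokenizer(reader);
        TokenStream result = new LowerCaseFilter(source);
        result = new SnowballFilter(result, "English");   // loads the generated EnglishStemmer by name
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```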
Modifier and Type | Class and Description |
---|---|
class | ClassicFilter: Normalizes tokens extracted with ClassicTokenizer. |
class | StandardFilter: Normalizes tokens extracted with StandardTokenizer. |
Modifier and Type | Method and Description |
---|---|
TokenFilter | ClassicFilterFactory.create(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class | StempelFilter: Transforms the token stream as per the stemming algorithm. |
Modifier and Type | Class and Description |
---|---|
class | SwedishLightStemFilter: A TokenFilter that applies SwedishLightStemmer to stem Swedish words. |
Modifier and Type | Class and Description |
---|---|
class | SynonymFilter: Matches single- or multi-word synonyms in a token stream. |
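A sketch of building an in-memory synonym map and applying it at analysis time; the synonym pairs are made up:

```java
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.synonym.SynonymFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;

public class SynonymExample {
  static Analyzer create() throws IOException {
    SynonymMap.Builder builder = new SynonymMap.Builder(true);      // true = de-duplicate entries
    builder.add(new CharsRef("fast"), new CharsRef("quick"), true); // keep the original token too
    builder.add(new CharsRef("jog"), new CharsRef("run"), true);
    final SynonymMap map = builder.build();

    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new WhitespaceTokenizer(reader);
        TokenStream result = new LowerCaseFilter(source);
        result = new SynonymFilter(result, map, true);              // true = ignore case when matching
        return new TokenStreamComponents(source, result);
      }
    };
  }
}
```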
Modifier and Type | Class and Description |
---|---|
class | ThaiWordFilter: Deprecated. Use ThaiTokenizer instead. |
Modifier and Type | Class and Description |
---|---|
class | ApostropheFilter: Strips all characters after an apostrophe (including the apostrophe itself). |
class | TurkishLowerCaseFilter: Normalizes Turkish token text to lower case. |
Modifier and Type | Class and Description |
---|---|
class | ElisionFilter: Removes elisions from a TokenStream. |
class | FilteringTokenFilter: Abstract base class for TokenFilters that may remove tokens. |
Modifier and Type | Class and Description |
---|---|
class | CollationKeyFilter: Deprecated. Use CollationAttributeFactory instead, which encodes terms directly as bytes. This filter will be removed in Lucene 5.0. |
Modifier and Type | Class and Description |
---|---|
class | OffsetLimitTokenFilter: This TokenFilter limits the number of tokens while indexing by adding up the current offset. |
Modifier and Type | Class and Description |
---|---|
class | SuggestStopFilter: Like StopFilter, except it will not remove the last token if that token was not followed by some token separator. |
Copyright © 2000-2015 The Apache Software Foundation. All Rights Reserved.