public class DfaBuilder<MATCHRESULT extends java.io.Serializable>
extends java.lang.Object
Given a set of patterns and the desired result of matching each pattern, you can produce a DFA that will simultaneously match a sequence of characters against all of those patterns.
You can also build DFAs for multiple sets of patterns simultaneously. The resulting DFAs will be optimized to share states wherever possible.
When you build a DFA to match a set of patterns, you get a "start state" (a DfaState
) for
that pattern set. Each character of a string can be passed in turn to DfaState.getNextState(char)
,
which will return a new DfaState
.
DfaState.getMatch()
can be called at any time to get the MATCHRESULT (if any) for
the patterns that match the characters processed so far.
A DfaState
can be used with a StringMatcher
to find instances of patterns in strings,
or with other pattern-matching classes.
NOTE that building a Dfa is a complex procedure. You should typically do it only once for each pattern set you want to use. Usually you would do this in a static initializer.
You can provide a cache that can remember and recall built DFAs, which allows you to build DFAs during your build process in various ways, instead of building them at runtime. Or you can use the cache to store built DFAs on the first run of your program so they don't need to be built the next time... But this is usually unnecessary, since building DFAs is more than fast enough to do during runtime initialization.
Constructor and Description |
---|
DfaBuilder()
Create a new DfaBuilder without a
BuilderCache |
DfaBuilder(BuilderCache cache)
Create a new DfaBuilder, with a builder cache to bypass recalculation of pre-built DFAs
|
Modifier and Type | Method and Description |
---|---|
void |
addPattern(Matchable pat,
MATCHRESULT accept) |
DfaState<MATCHRESULT> |
build(DfaAmbiguityResolver<? super MATCHRESULT> ambiguityResolver)
Build DFA for a single language
|
java.util.List<DfaState<MATCHRESULT>> |
build(java.util.List<java.util.Set<MATCHRESULT>> languages,
DfaAmbiguityResolver<? super MATCHRESULT> ambiguityResolver)
Build DFAs for multiple languages simultaneously.
|
DfaState<MATCHRESULT> |
build(java.util.Set<MATCHRESULT> language,
DfaAmbiguityResolver<? super MATCHRESULT> ambiguityResolver)
Build DFA for a single language
|
static <MR> java.util.List<DfaState<MR>> |
buildFromNfa(Nfa<MR> nfa,
int[] nfaStartStates,
DfaAmbiguityResolver<? super MR> ambiguityResolver,
BuilderCache cache)
Build DFAs from a provided NFA
|
DfaState<java.lang.Boolean> |
buildReverseFinder()
Build the reverse finder DFA for all patterns that have been added to this builder
|
DfaState<java.lang.Boolean> |
buildReverseFinder(java.util.Set<MATCHRESULT> language)
Build the reverse finder DFA for a language
|
java.util.List<DfaState<java.lang.Boolean>> |
buildReverseFinders(java.util.List<java.util.Set<MATCHRESULT>> languages)
Build reverse finder DFAs for multiple languages simultaneously.
|
StringSearcher<MATCHRESULT> |
buildStringSearcher(DfaAmbiguityResolver<MATCHRESULT> ambiguityResolver)
Build a
StringSearcher for all the patterns that have been added to this builder |
void |
clear()
Reset this DFA builder by forgetting all the patterns that have been added
|
public DfaBuilder()
BuilderCache
public DfaBuilder(BuilderCache cache)
cache
- The BuilderCache to usepublic void clear()
public void addPattern(Matchable pat, MATCHRESULT accept)
public DfaState<MATCHRESULT> build(DfaAmbiguityResolver<? super MATCHRESULT> ambiguityResolver)
The resulting DFA matches ALL patterns that have been added to this builder
ambiguityResolver
- When patterns for multiple results match the same string, this is called to
combine the multiple results into one. If this is null, then a DfaAmbiguityException
will be thrown in that case.public DfaState<MATCHRESULT> build(java.util.Set<MATCHRESULT> language, DfaAmbiguityResolver<? super MATCHRESULT> ambiguityResolver)
The language is specified as a subset of available MATCHRESULTs, and will include patterns for each result in its set.
language
- set defining the languages to buildambiguityResolver
- When patterns for multiple results match the same string, this is called to
combine the multiple results into one. If this is null, then a DfaAmbiguityException
will be thrown in that case.public java.util.List<DfaState<MATCHRESULT>> build(java.util.List<java.util.Set<MATCHRESULT>> languages, DfaAmbiguityResolver<? super MATCHRESULT> ambiguityResolver)
Each language is specified as a subset of available MATCHRESULTs, and will include patterns for each result in its set.
Languages built simultaneously will be globally minimized and will share as many states as possible.
languages
- sets defining the languages to buildambiguityResolver
- When patterns for multiple results match the same string, this is called to
combine the multiple results into one. If this is null, then a DfaAmbiguityException
will be thrown in that case.public DfaState<java.lang.Boolean> buildReverseFinder()
The "reverse finder DFA" for a set of patterns is applied to a string backwards from the end, and will
produce a Boolean.TRUE
result at every position where a non-empty string match for one of the
patterns starts. At other positions it will produce null result.
For searching through an entire string, using a reverse finder with StringSearcher
is faster than matching
with just the DFA for the language, especially for strings that have no matches.
public DfaState<java.lang.Boolean> buildReverseFinder(java.util.Set<MATCHRESULT> language)
The language is specified as a subset of available MATCHRESULTs, and will include patterns for each result in its set.
The "reverse finder DFA" for a language is applied to a string backwards from the end, and will
produce a Boolean.TRUE
result at every position where a non-empty string in the language starts. At
other positions it will produce null result.
For searching through an entire string, using a reverse finder with StringSearcher
is faster than matching
with just the DFA for the language, especially for strings that have no matches.
language
- set defining the languages to buildpublic java.util.List<DfaState<java.lang.Boolean>> buildReverseFinders(java.util.List<java.util.Set<MATCHRESULT>> languages)
Each language is specified as a subset of available MATCHRESULTs, and will include patterns for each result in its set.
The "reverse finder DFA" for a language is applied to a string backwards from the end, and will
produce a Boolean.TRUE
result at every position where a non-empty string in the language starts. At
other positions it will produce null result.
For searching through an entire string, using a reverse finder with StringSearcher
is faster than matching
with just the DFA for the language, especially for strings that have no matches.
languages
- sets defining the languages to buildpublic StringSearcher<MATCHRESULT> buildStringSearcher(DfaAmbiguityResolver<MATCHRESULT> ambiguityResolver)
StringSearcher
for all the patterns that have been added to this builderambiguityResolver
- When patterns for multiple results match the same string, this is called to
combine the multiple results into one. If this is null, then a DfaAmbiguityException
will be thrown in that case.StringSearcher
for all the patterns in this builderpublic static <MR> java.util.List<DfaState<MR>> buildFromNfa(Nfa<MR> nfa, int[] nfaStartStates, DfaAmbiguityResolver<? super MR> ambiguityResolver, BuilderCache cache)
This method is used when you want to build the NFA yourself instead of letting this class do it.
Languages built simultaneously will be globally minimized and will share as many states as possible.
nfa
- The NFAnfaStartStates
- The return value will include the DFA states corresponding to these NFA states, in the same orderambiguityResolver
- When patterns for multiple results match the same string, this is called to
combine the multiple results into one. If this is null, then a DfaAmbiguityException
will be thrown in that case.cache
- If this cache is non-null, it will be checked for a memoized result for this NFA, and will be populated
with a memoized result when the call is complete.