Copyright | (c) 2020 Sam May |
---|---|
License | MPL-2.0 |
Maintainer | ag.eitilt@gmail.com |
Stability | experimental |
Portability | portable |
Safe Haskell | Safe-Inferred |
Language | Haskell98 |
Web.Willow.Common.Parser
Description
The existing parsing libraries are wonderful, but backtracking parsers have a bad habit of being strict in their output; sure, you might be able to operate over Data.ByteString.Lazy, but they all expect to consume their entire input before handing you their result. Data.Attoparsec's continuations fully lean into that---even though you don't have to provide all the input in one block, you can't get a value before closing it out. Text.Megaparsec does provide a reentrant form in runParser', but it also comes with comparatively heavyweight error and pretty-printing features.
For complicated formats, those all can indeed be desirable. However, the HTML algorithms have been optimized for minimal lookahead and certainly no output revocation---once something is shipped out, it's not going to be called back. Not taking advantage of that by using a lazy output type means that parsing would always be subject to the whims of slow or unreliable network connections. Moreover, the entire complexity of the parsing algorithm is built around never reaching a fatal failure condition, so error handling and especially recovery are unnecessary overhead.
And so, a custom parsing framework must be defined.
Synopsis
- type Parser stream = ParserT stream Maybe
- runParser :: Parser stream out -> stream -> Maybe (out, stream)
- newtype ParserT stream gather out = ParserT {
- runParserT :: stream -> gather (out, stream)
- type StateParser state stream = StateT state (Parser stream)
- class (Alternative m, Monad m, Stream stream token, Monoid stream) => MonadParser m stream token | m -> stream where
- end :: MonadParser trans stream token => trans ()
- satisfying :: MonadParser trans stream token => (out -> Bool) -> out -> trans out
- token :: (MonadParser trans stream token, Eq token) => token -> trans token
- chunk :: (MonadParser trans stream token, Eq stream) => stream -> trans stream
- class Monoid stream => Stream stream token | stream -> token where
Concrete types
runParser :: Parser stream out -> stream -> Maybe (out, stream) #
Set the constructed parser loose on a given input. Returns both the
resulting value and the remaining contents of the Stream
.
newtype ParserT stream gather out #
Encapsulation of an operation for transforming the head of a Stream
into
some other value. Standard usage, with similar behaviour to other
Text.Parsec-derived parsers, ("accept the first which matches") may be
obtained by instantiating gather
with Maybe
, or non-deterministic
parsing ("accept any of these") through []
.
Notably, this implementation is designed to allow laziness in both input and
output. For the best usage, therefore, consume as little input at a time as
possible, and so call runParser
often).
As part of this simplification, all Text.Parsec-style integrated state
(use StateT
) and Text.Megaparsec-style error
pretty-printing (build your position tracking into the stream
, and/or wrap
the output in Either
) has been stripped out.
Constructors
ParserT | |
Fields
|
Instances
Monad gather => MonadState stream (ParserT stream gather) # | Operates over the input that has not yet been processed. Note that this therefore provides the means for forcing an early end-of-stream:
|
Monad gather => MonadReader stream (ParserT stream gather) # | Performs an action on the current input without consuming it; i.e.
|
MonadError err gather => MonadError err (ParserT stream gather) # |
|
Defined in Web.Willow.Common.Parser Methods throwError :: err -> ParserT stream gather a # catchError :: ParserT stream gather a -> (err -> ParserT stream gather a) -> ParserT stream gather a # | |
MonadTrans (ParserT stream) # | |
Defined in Web.Willow.Common.Parser | |
Monad gather => Monad (ParserT stream gather) # | ( |
Functor gather => Functor (ParserT stream gather) # | |
Monad gather => MonadFix (ParserT stream gather) # | |
Defined in Web.Willow.Common.Parser | |
MonadFail gather => MonadFail (ParserT stream gather) # | |
Defined in Web.Willow.Common.Parser | |
Monad gather => Applicative (ParserT stream gather) # |
|
Defined in Web.Willow.Common.Parser Methods pure :: a -> ParserT stream gather a # (<*>) :: ParserT stream gather (a -> b) -> ParserT stream gather a -> ParserT stream gather b # liftA2 :: (a -> b -> c) -> ParserT stream gather a -> ParserT stream gather b -> ParserT stream gather c # (*>) :: ParserT stream gather a -> ParserT stream gather b -> ParserT stream gather b # (<*) :: ParserT stream gather a -> ParserT stream gather b -> ParserT stream gather a # | |
(Alternative gather, Monad gather) => Alternative (ParserT stream gather) # |
|
(Alternative gather, Monad gather) => MonadPlus (ParserT stream gather) # |
|
MonadIO gather => MonadIO (ParserT stream gather) # | |
Defined in Web.Willow.Common.Parser | |
MonadCont gather => MonadCont (ParserT stream gather) # | The parser the inner function generates is run over the remaining input after the argument function runs (thus generating the inner function). |
(Alternative gather, Monad gather, Stream stream token, Monoid stream) => MonadParser (ParserT stream gather) stream token # | |
Defined in Web.Willow.Common.Parser Methods lookAhead :: ParserT stream gather out -> ParserT stream gather out # avoiding :: ParserT stream gather out -> ParserT stream gather () # next :: ParserT stream gather token # nextChunk :: Word -> ParserT stream gather stream # push :: token -> ParserT stream gather () # | |
(Monad gather, Semigroup out) => Semigroup (ParserT stream gather out) # | ( |
(Monad gather, Monoid out) => Monoid (ParserT stream gather out) # |
|
type StateParser state stream = StateT state (Parser stream) #
Purely a convenience of the package rather than the module, the state machines described by the HTML standard all involve some degree of persistence, and so are built over a deeper monad stack. This could easily one of the most common transformers to add, anyway, no matter what input is being parsed.
Parsing combinators
class (Alternative m, Monad m, Stream stream token, Monoid stream) => MonadParser m stream token | m -> stream where #
Generalize the transformation of an input Stream
into a more meaningful
value. This class provides the basic building blocks from which more
expressive such parsers may be constructed.
See also the description of ParserT
for some of the design decisions.
Methods
Runs the argument parser on the current input, without consuming any
of it; these are identical semantics to saving and restoring the input
after running the computation, assuming the MonadState
instance
runs over the input stream (see ParserT
):
input <-get
a <- parserput
input
a <- lookAhead
parser
Succeeds if and only if the argument parser fails (the input is not consumed).
Retrieve the next token in the stream, whatever it may be. Identical
to
in all but type.uncons
nextChunk :: Word -> m stream #
Retrieve the next several tokens in the stream. Identical to
count
(with a safer index type) in the case
that gather
is a list [token]
.
If fewer tokens are in the input stream than asked for, returns what does remain in the input stream.
Prepend a token to the input stream to be processed next. Identical
to operating on the stream directly through MonadState
, if that
instance also exists.
stream <-get
put
$cons
tok stream
push
tok
Concatenate the given sequence with the existing input, processing the
argument before the older stream
.
Instances
MonadParser trans stream token => MonadParser (MaybeT trans) stream token # | |
Defined in Web.Willow.Common.Parser | |
MonadParser trans stream token => MonadParser (IdentityT trans) stream token # | |
Defined in Web.Willow.Common.Parser | |
(MonadParser trans stream token, Monoid except) => MonadParser (ExceptT except trans) stream token # | |
Defined in Web.Willow.Common.Parser Methods lookAhead :: ExceptT except trans out -> ExceptT except trans out # avoiding :: ExceptT except trans out -> ExceptT except trans () # next :: ExceptT except trans token # nextChunk :: Word -> ExceptT except trans stream # push :: token -> ExceptT except trans () # | |
MonadParser trans stream token => MonadParser (ReaderT reader trans) stream token # | |
Defined in Web.Willow.Common.Parser Methods lookAhead :: ReaderT reader trans out -> ReaderT reader trans out # avoiding :: ReaderT reader trans out -> ReaderT reader trans () # next :: ReaderT reader trans token # nextChunk :: Word -> ReaderT reader trans stream # push :: token -> ReaderT reader trans () # | |
(MonadParser trans stream token, MonadPlus trans) => MonadParser (StateT state trans) stream token # | |
Defined in Web.Willow.Common.Parser | |
(MonadParser trans stream token, MonadPlus trans) => MonadParser (StateT state trans) stream token # | |
Defined in Web.Willow.Common.Parser | |
(MonadParser trans stream token, Monoid writer) => MonadParser (WriterT writer trans) stream token # | |
Defined in Web.Willow.Common.Parser Methods lookAhead :: WriterT writer trans out -> WriterT writer trans out # avoiding :: WriterT writer trans out -> WriterT writer trans () # next :: WriterT writer trans token # nextChunk :: Word -> WriterT writer trans stream # push :: token -> WriterT writer trans () # | |
(MonadParser trans stream token, Monoid writer) => MonadParser (WriterT writer trans) stream token # | |
Defined in Web.Willow.Common.Parser Methods lookAhead :: WriterT writer trans out -> WriterT writer trans out # avoiding :: WriterT writer trans out -> WriterT writer trans () # next :: WriterT writer trans token # nextChunk :: Word -> WriterT writer trans stream # push :: token -> WriterT writer trans () # | |
(MonadParser trans stream token, Monoid accum, MonadPlus trans) => MonadParser (AccumT accum trans) stream token # | |
Defined in Web.Willow.Common.Parser | |
(Alternative gather, Monad gather, Stream stream token, Monoid stream) => MonadParser (ParserT stream gather) stream token # | |
Defined in Web.Willow.Common.Parser Methods lookAhead :: ParserT stream gather out -> ParserT stream gather out # avoiding :: ParserT stream gather out -> ParserT stream gather () # next :: ParserT stream gather token # nextChunk :: Word -> ParserT stream gather stream # push :: token -> ParserT stream gather () # | |
(MonadParser trans stream token, Monoid writer, MonadPlus trans) => MonadParser (RWST reader writer state trans) stream token # | |
Defined in Web.Willow.Common.Parser Methods lookAhead :: RWST reader writer state trans out -> RWST reader writer state trans out # avoiding :: RWST reader writer state trans out -> RWST reader writer state trans () # next :: RWST reader writer state trans token # nextChunk :: Word -> RWST reader writer state trans stream # push :: token -> RWST reader writer state trans () # | |
(MonadParser trans stream token, Monoid writer, MonadPlus trans) => MonadParser (RWST reader writer state trans) stream token # | |
Defined in Web.Willow.Common.Parser Methods lookAhead :: RWST reader writer state trans out -> RWST reader writer state trans out # avoiding :: RWST reader writer state trans out -> RWST reader writer state trans () # next :: RWST reader writer state trans token # nextChunk :: Word -> RWST reader writer state trans stream # push :: token -> RWST reader writer state trans () # |
end :: MonadParser trans stream token => trans () #
Succeeds if and only if the input is empty.
satisfying :: MonadParser trans stream token => (out -> Bool) -> out -> trans out #
Succeeds if and only if the value parsed by the argument parser satisfies the predicate. No further input is consumed.
token :: (MonadParser trans stream token, Eq token) => token -> trans token #
Expect a specific token from the Stream
, and fail if a different
token is found instead. Identical to running satisfying
with equality
in the (by far most likely) case that gather
is a Monad
in addition
to an Alternative
:
tok <-next
>>=
satisfying
(==
desired)
tok <- token
desired
chunk :: (MonadParser trans stream token, Eq stream) => stream -> trans stream #
Expect a specific sequence of tokens from the Stream
, and fail if
anything else is found instead, or if the Stream
doesn't have enough
characters before its end. Identical to running satisfying
with equality
over nextChunk
in the case that stream
is an Eq
(which all provided
instances are) and can easily provide a length
(which they do, unless the
sequence to test against also needs to be lazy).
stream <-nextChunk
(length
desired)>>=
satisfying
(==
desired)
stream <- chunk
desired
Supporting typeclasses
class Monoid stream => Stream stream token | stream -> token where #
A sequence of values which may be processed via a MonadParser
. This
class is essentially just a unification of the various list-like interfaces
(
, etc.) as Haskell's abstractions are slightly lacking
in that area.uncons
== head
>>>
Just (tok, str) == uncons (cons tok str)
True
Methods
cons :: token -> stream -> stream #
Prepend a token to the stream for proximate processing, before everything already in it.
consChunk :: stream -> stream -> stream #
As cons
, but append multiple tokens at once.
uncons :: stream -> Maybe (token, stream) #
Retrieve the next token from the stream.
This should only return Nothing
if the stream is actually empty---if
the next value is not available yet due to slow IO or other computation,
uncons
waits until it is.
unconsChunk :: Word -> stream -> (stream, stream) #
Retrieve the next several tokens from the stream.
If fewer tokens are in the input stream than asked for, the left side of
the return value is the (shorter than requested) entire input stream and
the right is mempty
.
The number of tokens remaining in the stream.
Instances
Stream ByteString Word8 # | |
Defined in Web.Willow.Common.Parser Methods cons :: Word8 -> ByteString -> ByteString # consChunk :: ByteString -> ByteString -> ByteString # uncons :: ByteString -> Maybe (Word8, ByteString) # unconsChunk :: Word -> ByteString -> (ByteString, ByteString) # chunkLen :: ByteString -> Word # | |
Stream ByteString Word8 # | |
Defined in Web.Willow.Common.Parser Methods cons :: Word8 -> ByteString -> ByteString # consChunk :: ByteString -> ByteString -> ByteString # uncons :: ByteString -> Maybe (Word8, ByteString) # unconsChunk :: Word -> ByteString -> (ByteString, ByteString) # chunkLen :: ByteString -> Word # | |
Stream Text Char # | |
Stream Text Char # | |
Stream [token] token # | |