From 24cdbf554b42ad808cd0688a6a604aa3ef355cdd Mon Sep 17 00:00:00 2001
From: Igor Dejanovic <igor.dejanovic@gmail.com>
Date: Mon, 29 Dec 2014 12:34:19 +0100
Subject: [PATCH] Update docs. Parser configuration.

---
 docs/index.rst | 67 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 64 insertions(+), 3 deletions(-)

diff --git a/docs/index.rst b/docs/index.rst
index 6969fb1..25a23ec 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -233,7 +233,7 @@ can parse the same language.
 
     parse_tree = parser.parse(input_expr)
 
-.. warning::
+.. note::
 
     Just remember that using textual PEG syntax imposes a slight overhead since the grammar must be parsed and the parser for your language must be built by semantic analysis of grammar parse tree.
     If you plan to instantiate your parser once and than use it many times this will not have that much of
@@ -432,7 +432,8 @@ list of all rules that was tried at reported location.
 Parser configuration
 --------------------
 
-There are some aspect of parsing that is not controlled by the grammar.
+There are some aspects of parsing that can be configured using parser and/or
+``ParsingExpression`` parameters.
 Arpeggio has some sane default behaviour but gives the user possibility to
 alter it. This section describes various parser parameters.
 
@@ -466,6 +467,34 @@ For example, to prevent a newline to be treated as whitespace you could write:
 
     parser = ParserPython(calc, ws='\t\r ')
 
+.. note::
+
+    These parameters can also be set at the ``Sequence`` level, so one could
+    write a grammar like this:
+
+    .. code:: python
+
+        def grammar(): return Sequence("one", "two", "three",
+                                       skipws=False), "four"
+        parser = ParserPython(grammar)
+
+
+Keyword handling
+~~~~~~~~~~~~~~~~
+The ``autokwd`` parameter performs a word boundary match for keyword-like
+matches. This parameter is disabled by default.
+
+.. code:: python
+
+    def grammar(): return "one", "two", "three"
+
+    parser = ParserPython(grammar, autokwd=True)
+
+    # If autokwd is enabled this should parse without error.
+    parser.parse("one two three")
+    # But this will not parse.
+    parser.parse("onetwothree")
+
 Comment handling
 ~~~~~~~~~~~~~~~~
 
@@ -504,7 +533,29 @@ Notice the removal of each non-terminal with single child.
 .. warning::
 
     Be aware that `semantic analysis <#Semantic analysis - Visitors>`_ operates on nodes of finished parse tree
-    and therefore on reduced tree some ``visit_xxx`` actions will not get called.
+    and therefore on reduced tree some ``visit_<rule_name>`` actions will not get called.
+
+Newline termination for Repetitions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+By default ``Repetition`` parsing expressions (i.e. ``ZeroOrMore`` and ``OneOrMore``)
+honor the ``skipws`` and ``ws`` settings, but there are situations where repetitions
+should not go past the end of the current line. For this purpose the ``eolterm``
+parameter was introduced; when set on a repetition, it ensures that the repetition
+terminates before entering a new line.
+
+.. code:: python
+
+    def grammar(): return first, second
+    def first(): return ZeroOrMore(["a", "b"], eolterm=True)
+    def second(): return "a"
+
+    # The first rule should match only the first line,
+    # so that the second rule will match "a" on the new line.
+    input = """a a b a b b
+    a"""
+
+    parser = ParserPython(grammar)
+    result = parser.parse(input)
 
 
 Semantic analysis - Visitors
@@ -621,6 +672,16 @@ Furthermore, child nodes can be filtered by rule name using name lookup.
     baz_created = children['baz']
 
 
+Post-processing in second calls
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A visitor may define a method named ``second_<rule_name>``. If such a
+method exists, it will be called after all parse tree nodes are processed, and
+it will be given the result of the ``visit_<rule_name>`` call.
+
+This is usually used when some additional post-processing is needed (e.g.
+reference resolving).
+
+
 Default actions
 ~~~~~~~~~~~~~~~
 For each parse tree node that does not have an appropriate ``visitor_xxx`` method a default action is performed.
-- 
2.18.0
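The ``autokwd`` option from the "Keyword handling" section can be illustrated without Arpeggio itself. The following is a toy model, not Arpeggio's implementation: it assumes that "keyword-like" means a literal made only of word characters and that the boundary is checked right after the literal. ``match_keyword`` and ``parse_sequence`` are hypothetical helpers.

```python
import re


def match_keyword(text, pos, keyword, autokwd):
    """Try to match ``keyword`` at ``pos``; return the new position or None.

    Hypothetical helper: with autokwd=True, a literal made only of word
    characters must also end on a word boundary, so "one" does not match
    the prefix of "onetwothree".
    """
    pattern = re.escape(keyword)
    if autokwd and re.fullmatch(r"\w+", keyword):
        pattern += r"\b"
    m = re.compile(pattern).match(text, pos)
    return m.end() if m else None


def parse_sequence(text, keywords, autokwd=False):
    """Toy parser: match ``keywords`` in order, skipping whitespace between them."""
    pos = 0
    for kw in keywords:
        # Skip leading whitespace, similar to the default skipws behaviour.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        pos = match_keyword(text, pos, kw, autokwd)
        if pos is None:
            return False
    # Succeed only if the rest of the input is whitespace.
    return text[pos:].strip() == ""
```

In this toy model, ``parse_sequence("one two three", ["one", "two", "three"], autokwd=True)`` succeeds while ``"onetwothree"`` is rejected; with ``autokwd=False`` the same toy parser accepts ``"onetwothree"``, which is the behaviour the option is meant to avoid.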
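The effect of ``eolterm`` on the ``first``/``second`` grammar from the "Newline termination for Repetitions" section can be sketched as a toy model. This is an assumption-laden illustration, not Arpeggio's code: items are single characters, and ``zero_or_more`` and ``parse`` are hypothetical helpers that only mimic greedy PEG repetition with whitespace skipping.

```python
def zero_or_more(text, pos, choices, eolterm=False):
    """Toy ZeroOrMore over single-character choices.

    Whitespace is skipped before each item; with eolterm=True the skipping
    stops at a newline, which ends the repetition. Returns (items, new_pos).
    """
    matches = []
    while True:
        p = pos
        while p < len(text) and text[p].isspace():
            if eolterm and text[p] == "\n":
                # Do not cross the end of the current line.
                return matches, pos
            p += 1
        if p < len(text) and text[p] in choices:
            matches.append(text[p])
            pos = p + 1
        else:
            return matches, pos


def parse(text, eolterm):
    """Toy model of: grammar = first, second; first = ZeroOrMore(["a", "b"]); second = "a"."""
    first, pos = zero_or_more(text, 0, {"a", "b"}, eolterm)
    # second: skip any whitespace (newlines included), then expect "a" and end of input.
    second_ok = text[pos:].strip() == "a"
    return first, second_ok
```

With ``eolterm=False`` the greedy repetition also consumes the ``a`` on the second line, leaving nothing for ``second``; with ``eolterm=True`` the repetition stops at the newline and ``second`` matches.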
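The "Post-processing in second calls" section mentions reference resolving as a typical use. The sketch below is a toy two-pass visitor, loosely modelled on the ``visit_<rule_name>`` / ``second_<rule_name>`` naming described in the patch; ``Node``, ``visit_tree`` and ``RefResolver`` are hypothetical and are not Arpeggio classes. A reference that appears before its definition is recorded in the first pass and resolved in the second.

```python
class Node:
    """Minimal parse tree node (hypothetical stand-in for Arpeggio's nodes)."""

    def __init__(self, rule_name, value=None, children=()):
        self.rule_name = rule_name
        self.value = value
        self.children = list(children)


def visit_tree(root, visitor):
    """Toy two-pass visit: visit_<rule> bottom-up, then second_<rule>."""
    visited = []  # (rule_name, result) pairs in post-order

    def walk(node):
        child_results = [walk(c) for c in node.children]
        method = getattr(visitor, "visit_" + node.rule_name, None)
        result = method(node, child_results) if method else child_results
        visited.append((node.rule_name, result))
        return result

    top = walk(root)
    # Second pass runs only after the whole tree has been processed.
    for rule_name, result in visited:
        second = getattr(visitor, "second_" + rule_name, None)
        if second:
            second(result)
    return top


class RefResolver:
    """Resolves references to names that may be defined later in the input."""

    def __init__(self):
        self.symbols = {}

    def visit_definition(self, node, children):
        obj = {"name": node.value}
        self.symbols[node.value] = obj
        return obj

    def visit_reference(self, node, children):
        # The target may not be defined yet, so only record its name.
        return {"ref_name": node.value, "target": None}

    def second_reference(self, ref):
        # All definitions are known now; resolve the reference in place.
        ref["target"] = self.symbols[ref["ref_name"]]


tree = Node("model", children=[Node("reference", "x"), Node("definition", "x")])
result = visit_tree(tree, RefResolver())
```

After the call, ``result[0]["target"]`` is the same object as ``result[1]``, even though the reference was visited before the definition existed.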