Commit 5261b981 authored Oct 22, 2014 by Igor Dejanovic
Docs update. Parse tree navigation and some fixes.
parent 75e2027a
Showing 1 changed file with 80 additions and 10 deletions
docs/index.rst
...
...
@@ -29,12 +29,12 @@ Example grammar in PEG notation:
.. code::

-    first = 'first' second+ EOF
-    second = 'a' / 'b'
+    first = 'foo' second+ EOF
+    second = 'bar' / 'baz'

-In this example ``first`` is the root rule. This rule will match a literal string ``first`` followed
+In this example ``first`` is the root rule. This rule will match a literal string ``foo`` followed
by one or more ``second`` rule (this is a rule reference) followed by end of input (``EOF``).
-``second`` rule is ordered choice and will match either ``a`` or ``b`` in that order.
+``second`` rule is ordered choice and will match either ``bar`` or ``baz`` in that order.
During parsing each successfully matched rule will create a parse tree node (see `Parse tree`_).
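To make the matching order concrete, here is a minimal hand-written sketch of the ``foo`` / ``bar`` / ``baz`` grammar in plain Python. It is only an illustration of sequence, one-or-more and ordered choice; it is not how Arpeggio is implemented, the function names are hypothetical, and it assumes no whitespace between tokens.

```python
# Hand-written matcher for:
#   first  = 'foo' second+ EOF
#   second = 'bar' / 'baz'
# Illustrative only; all names here are hypothetical.

def match_literal(text, pos, literal):
    """Return the position after `literal` if it occurs at `pos`, else None."""
    return pos + len(literal) if text.startswith(literal, pos) else None

def match_second(text, pos):
    """Ordered choice: try 'bar' first, fall back to 'baz' only if that fails."""
    for literal in ("bar", "baz"):
        end = match_literal(text, pos, literal)
        if end is not None:
            return end
    return None

def match_first(text):
    """Sequence: 'foo', then one or more `second`, then end of input."""
    pos = match_literal(text, 0, "foo")
    if pos is None:
        return False
    count = 0
    while True:
        end = match_second(text, pos)
        if end is None:
            break
        pos, count = end, count + 1
    # second+ needs at least one match; EOF needs the whole input consumed.
    return count >= 1 and pos == len(text)

print(match_first("foobarbaz"))  # True
print(match_first("foo"))        # False
print(match_first("foobarqux"))  # False
```

The last call fails because ``EOF`` requires that nothing is left after the final ``second`` match.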
...
...
@@ -178,6 +178,23 @@ An example of the ``calc`` grammar given in PEG syntax (``arpeggio.cleanpeg``):
expression = term (("+" / "-") term)*
calc = expression+ EOF
Each grammar rule is given as an assignment where the lhs is the rule name (e.g. ``number``) and the
rhs is a PEG expression. Literal string matches are given as strings (e.g. ``"+"``) while regex matches
are given as strings with prefix ``r`` (e.g. ``r'\d*\.\d*|\d+'``).
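As a quick check of what the example regex accepts, the sketch below tries the same pattern with Python's ``re`` module. It assumes the pattern is interpreted with standard ``re`` semantics; the helper ``match_at`` is a hypothetical name used only here.

```python
import re

# The regex from the example above: a decimal with a dot, or a plain integer.
number = re.compile(r'\d*\.\d*|\d+')

def match_at(text, pos=0):
    """Return the substring matched at `pos`, or None if nothing matches there."""
    m = number.match(text, pos)
    return m.group() if m else None

print(match_at("3.14"))    # 3.14
print(match_at("42"))      # 42
print(match_at(".5"))      # .5
print(match_at("2+3", 2))  # 3
print(match_at("abc"))     # None
```

Note that the first alternative also accepts a lone ``.``, since both digit groups are optional.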
Sequence is a space separated list of expressions. Ordered choice is a list of expressions separated by
``/`` (e.g. ``"+" / "-"``). A zero or more expression is specified by the ``*`` operator
(e.g. ``(( "*" / "/" ) factor)*``). One or more is specified by the ``+`` operator (e.g. ``expression+``).
``And`` and ``Not`` predicates are also supported. ``And`` predicate is specified by ``&`` operator
(e.g. ``&expression`` - not used in the above grammar). ``Not`` predicate is specified by ``!`` operator
(e.g. ``!expression`` - not used in the above grammar).
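Since the grammar above does not use predicates, a small plain-Python sketch may help show what they do: both look ahead without consuming input. This is a conceptual illustration with hypothetical helper names, not Arpeggio's API.

```python
def literal(s):
    """Match the string `s`; return the new position or None."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def and_pred(expr):
    """&expr: succeed only if expr matches here, but consume nothing."""
    def parse(text, pos):
        return pos if expr(text, pos) is not None else None
    return parse

def not_pred(expr):
    """!expr: succeed only if expr does NOT match here; consume nothing."""
    def parse(text, pos):
        return pos if expr(text, pos) is None else None
    return parse

def sequence(*exprs):
    """Match each expression in turn, threading the position through."""
    def parse(text, pos):
        for e in exprs:
            pos = e(text, pos)
            if pos is None:
                return None
        return pos
    return parse

# "a" only when it is followed by "b"; the "b" itself is not consumed.
a_before_b = sequence(literal("a"), and_pred(literal("b")))
# "a" only when it is NOT followed by "b".
a_not_before_b = sequence(literal("a"), not_pred(literal("b")))

print(a_before_b("ab", 0))      # 1
print(a_before_b("ac", 0))      # None
print(a_not_before_b("ac", 0))  # 1
print(a_not_before_b("ab", 0))  # None
```

In both successful cases the returned position is 1: only ``a`` was consumed, and the lookahead left the following character in place.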
In the rhs a rule reference is the name of another rule. The parser will try to match that rule at that
location.
Special rule ``EOF`` will match end of input string.
Creating a parser using PEG syntax is done by the class ``ParserPEG`` from the ``arpeggio.peg`` or
``arpeggio.cleanpeg`` modules.
...
...
@@ -226,13 +243,23 @@ The leaf nodes are terminals and they are matched by the string match or regex m
In the square brackets is the location in the input stream where the terminal/non-terminal is recognized.
Each parse tree node has the following attributes:
- **rule** - the parsing expression that created this node.
- **rule_name** - the name of the rule if it was the root rule or empty string otherwise.
- **position** - the position in the input stream where this node was recognized.
Terminal nodes
~~~~~~~~~~~~~~
-Terminals in Arpeggio are created by the specializations of the ``Match`` class:
+Terminals in Arpeggio are created by the specializations of the parsing expression ``Match`` class.
+There are two specializations of the ``Match`` class:
- ``StrMatch`` if the literal string is matched from the input or
- ``RegExMatch`` if a regular expression is used to match input.
To get the matched string from a terminal object, convert it to a string
(e.g. ``str(t)``, where ``t`` is of type ``Terminal``).
Non-terminal nodes
~~~~~~~~~~~~~~~~~~
Non-terminal nodes are non-leaf nodes of the parse tree. They are created by PEG grammar rules.
...
...
@@ -241,12 +268,55 @@ Children of non-terminals can be other non-terminals or terminals.
For example, nodes with the labels ``expression``, ``factor`` and ``term`` from the above parse
tree are non-terminal nodes created by the rules with the same names.
Parse tree navigation
~~~~~~~~~~~~~~~~~~~~~
Usually we want to transform the parse tree into a more usable form or to extract some data from it.
The parse tree can be navigated using the following approaches:
``NonTerminal`` inherits from ``list``. The elements of a ``NonTerminal`` are its child nodes.
So, you can use index access:
.. code:: python

    child = pt_node[2]

Or iteration:

.. code:: python

    for child in pt_node:
        ...
Additionally, you can access children by the child rule name:
For example:
.. code:: python
-TODO: Finish this section
    from arpeggio import ParserPython, ZeroOrMore

    # Grammar
    def foo(): return "a", bar, "b", baz, "c", ZeroOrMore(bar)
    def bar(): return "bar"
    def baz(): return "baz"

    # Parsing
    parser = ParserPython(foo)
    result = parser.parse("a bar b baz c bar bar bar")

    # Accessing parse tree nodes. All asserts will pass.
    # Index access
    assert result[1].rule_name == 'bar'
    # Access by rule name
    assert result.bar.rule_name == 'bar'

    print(len(result))
    assert len(result) == 8

    # There are 4 bar nodes matched from result (one at the beginning and
    # three from ZeroOrMore).
    # Dot access collects all NTs from the given path.
    assert len(result.bar) == 4
    # You could call dot access recursively, e.g. result.bar.baz if the
    # rule bar called baz. In that case all bars would be collected from
    # the root, and for each bar all baz nodes would be collected.

    # Verify position
    # The first bar is at position 2 and the second is at position 14.
    assert result.bar[0].position == 2
    assert result.bar[1].position == 14
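For readers curious how a ``list`` subclass can offer both index access and access by rule name, the following is a minimal sketch in plain Python. The ``Node`` class is a hypothetical stand-in written for this illustration; it is not Arpeggio's ``NonTerminal`` and makes no claim about how the library implements dot access.

```python
class Node(list):
    """Hypothetical stand-in for a parse tree node; children are the list items."""

    def __init__(self, rule_name, position, children=()):
        super().__init__(children)
        self.rule_name = rule_name
        self.position = position

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so rule_name and
        # position are unaffected. Collect the direct children whose rule
        # name equals `name`.
        if name.startswith("__"):
            raise AttributeError(name)
        matches = [child for child in self if child.rule_name == name]
        if not matches:
            raise AttributeError(name)
        return Node(name, matches[0].position, matches)

# Build a flat tree for the same input as above. Literals that have no named
# rule get an empty rule name in this sketch.
text = "a bar b baz c bar bar bar"
named = {"bar": "bar", "baz": "baz"}
children, pos = [], 0
for word in text.split(" "):
    children.append(Node(named.get(word, ""), pos))
    pos += len(word) + 1
root = Node("foo", 0, children)

print(len(root))             # 8
print(root[1].rule_name)     # bar
print(len(root.bar))         # 4
print(root.bar[1].position)  # 14
```

Returning the collected children wrapped in another ``Node`` is what lets both ``root.bar.rule_name`` and ``root.bar[1]`` work in this sketch.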
Grammar debugging
...
...