Accessing parse tree nodes


#1

My goal is to access the phrase branches of the parse tree and extract the relevant parts of a given branch in my C++ program. I am particularly interested in the tree structure of the input sentence, so plain POS tagging is not enough.

Unfortunately, I cannot see a simple way to do that in C++, since the parse_tree object keeps its root node as a private, uniquely owned member. I also could not find anything in the provided “visitor” and transformation classes that does what I need. I am hoping for a solution that does not require me to customize parse_tree or create a custom visitor class, because I would like to use the MeTA library out of the box, without any changes of my own.

I’m fairly new to the vast landscape of language processing, so my approach might be a bit backwards. Any suggestions are welcome.

Thanks!


#2

The visitor class is in the library for exactly this reason, though: to let people create custom visitors that perform whatever task they like on a parse_tree produced by the parser. In fact, almost everything we do with parse trees is implemented as a subclass of visitor in the library itself (see, for example, our implementation of evalb). So this is the approach I would recommend; the library API is designed to encourage you to write your own visitor objects.

If you think your particular task is something many other people would benefit from, however, then we should probably add it as a built-in visitor in the toolkit so that people don’t need to re-implement it every time. In that case, feel free to submit a pull request.

(Toward the very end of this tutorial you can see an example of a custom visitor. It uses the Python bindings, but the principle is the same in C++.)
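A minimal C++ sketch of the custom-visitor approach, aimed at the original goal of pulling out the words of every phrase with a given label (here `NP`). Because it is only an illustration, the tree and visitor types below (`node`, `leaf_node`, `internal_node`, `visitor`) are hypothetical stand-ins written for this example, not MeTA's actual headers; MeTA's real parse-tree and visitor classes have their own names and signatures, so check the library headers before adapting it. The helper `build_example_tree` is likewise hypothetical. The point is the pattern: the subclass overrides one call operator per node type and recurses through the children itself.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// ---- Hypothetical stand-ins for MeTA's parse-tree types (NOT the real API) ----
class leaf_node;
class internal_node;

// Visitor base: one overload per node type (double dispatch via accept()).
class visitor {
  public:
    virtual ~visitor() = default;
    virtual void operator()(leaf_node& n) = 0;
    virtual void operator()(internal_node& n) = 0;
};

class node {
  public:
    explicit node(std::string cat) : category_(std::move(cat)) {}
    virtual ~node() = default;
    const std::string& category() const { return category_; }
    virtual void accept(visitor& v) = 0;
  private:
    std::string category_;  // POS tag for leaves, phrase label (NP, VP, ...) otherwise
};

class leaf_node : public node {
  public:
    leaf_node(std::string tag, std::string word)
        : node(std::move(tag)), word_(std::move(word)) {}
    const std::string& word() const { return word_; }
    void accept(visitor& v) override { v(*this); }
  private:
    std::string word_;
};

class internal_node : public node {
  public:
    explicit internal_node(std::string label) : node(std::move(label)) {}
    void add_child(std::unique_ptr<node> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
    }
    const std::vector<std::unique_ptr<node>>& children() const { return children_; }
    void accept(visitor& v) override { v(*this); }
  private:
    std::vector<std::unique_ptr<node>> children_;
};

// ---- The custom visitor: collect the words of every phrase with a given label ----
class phrase_extractor : public visitor {
  public:
    explicit phrase_extractor(std::string label) : label_(std::move(label)) {}
    void operator()(leaf_node& n) override {
        // A word belongs to every matching phrase on the current root-to-leaf path.
        for (std::size_t idx : open_) phrases_[idx].push_back(n.word());
    }
    void operator()(internal_node& n) override {
        const bool match = n.category() == label_;
        if (match) {
            open_.push_back(phrases_.size());
            phrases_.emplace_back();
        }
        for (const auto& child : n.children()) child->accept(*this);
        if (match) {
            assert(!open_.empty());
            open_.pop_back();
        }
    }
    const std::vector<std::vector<std::string>>& phrases() const { return phrases_; }
  private:
    std::string label_;
    std::vector<std::size_t> open_;  // indices of matching phrases currently "open"
    std::vector<std::vector<std::string>> phrases_;
};

// Hypothetical helper: builds (NP (DT det) (NN noun)).
std::unique_ptr<internal_node> make_np(const std::string& det, const std::string& noun) {
    auto np = std::make_unique<internal_node>("NP");
    np->add_child(std::make_unique<leaf_node>("DT", det));
    np->add_child(std::make_unique<leaf_node>("NN", noun));
    return np;
}

// Builds (S (NP (DT the) (NN robot)) (VP (VBZ grabs) (NP (DT the) (NN cup))))
std::unique_ptr<internal_node> build_example_tree() {
    auto vp = std::make_unique<internal_node>("VP");
    vp->add_child(std::make_unique<leaf_node>("VBZ", "grabs"));
    vp->add_child(make_np("the", "cup"));
    auto root = std::make_unique<internal_node>("S");
    root->add_child(make_np("the", "robot"));
    root->add_child(std::move(vp));
    return root;
}
```

Applied to the example tree, `phrase_extractor np("NP"); root->accept(np);` collects two phrases, `the robot` and `the cup`. Nested phrases are handled by keeping a stack of "open" matches, so a word under an `NP` that sits inside another `NP` is recorded in both.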


#3

Thanks for your suggestions!

That’s the thing: I currently can’t see my visitor being something that many other people would benefit from. I would like to use MeTA as part of my own robot teleoperation framework, which is at an early and constantly changing stage of development. The framework is designed to be flexible and reusable across different setups, so in that case having my visitor built in would be a must. The other option would be to write the visitor anyway and fork the repo, but then my fork would likely become outdated. Difficult choice.


#4

Why not just distribute your visitor class with your library? I’m not seeing why it would have to be part of the MeTA codebase in order for you to distribute your framework; you can just place your visitor subclass in its own shared library and link against it as needed.
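To illustrate that split, here is a sketch in which the third-party tree types are left untouched and the visitor lives in your own code behind a plain function. As before, the types in the `thirdparty` namespace are hypothetical stand-ins, not MeTA's real headers, and the names `teleop`, `bracket_printer`, and `to_bracketed` are made up for this example. Only `to_bracketed` would need to appear in your library's public header; the visitor itself stays an implementation detail in an anonymous namespace.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// ---- Hypothetical stand-ins for the third-party (e.g. MeTA) tree types; never modified ----
namespace thirdparty {
class leaf_node;
class internal_node;
struct visitor {
    virtual ~visitor() = default;
    virtual void operator()(leaf_node& n) = 0;
    virtual void operator()(internal_node& n) = 0;
};
class node {
  public:
    explicit node(std::string cat) : category_(std::move(cat)) {}
    virtual ~node() = default;
    const std::string& category() const { return category_; }
    virtual void accept(visitor& v) = 0;
  private:
    std::string category_;
};
class leaf_node : public node {
  public:
    leaf_node(std::string tag, std::string word)
        : node(std::move(tag)), word_(std::move(word)) {}
    const std::string& word() const { return word_; }
    void accept(visitor& v) override { v(*this); }
  private:
    std::string word_;
};
class internal_node : public node {
  public:
    explicit internal_node(std::string label) : node(std::move(label)) {}
    void add_child(std::unique_ptr<node> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
    }
    const std::vector<std::unique_ptr<node>>& children() const { return children_; }
    void accept(visitor& v) override { v(*this); }
  private:
    std::vector<std::unique_ptr<node>> children_;
};
}  // namespace thirdparty

// ---- Your own code, built into your own library and linked against the third-party one ----
namespace teleop {
namespace {  // the visitor is an implementation detail, not part of your public header
class bracket_printer : public thirdparty::visitor {
  public:
    void operator()(thirdparty::leaf_node& n) override {
        out_ += "(" + n.category() + " " + n.word() + ")";
    }
    void operator()(thirdparty::internal_node& n) override {
        out_ += "(" + n.category();
        for (const auto& child : n.children()) {
            out_ += " ";
            child->accept(*this);
        }
        out_ += ")";
    }
    const std::string& result() const { return out_; }
  private:
    std::string out_;
};
}  // namespace

// The only symbol your library's header would need to declare.
std::string to_bracketed(thirdparty::node& root) {
    bracket_printer printer;
    root.accept(printer);
    return printer.result();
}
}  // namespace teleop
```

In a real build, the `teleop` part would be compiled into its own shared library and linked against the installed MeTA library, so MeTA itself never has to be forked or patched.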


#5

This obvious idea didn’t even cross my mind. I guess that’s the best way to do it, thanks a lot!