Using Ctags vs. complete C++ parser

Posted: Tue Sep 23, 2008 2:53 pm
by sreinst1
Hi,

The CodeLite Yacc parser was written in order to complement the information provided by Ctags. Were the following options ever considered (or maybe tried):

1. Extending Ctags itself to provide the missing information. For example, Ctags can now also report local variables (if the appropriate command-line option is specified), but it does not report the types of variables and functions.
2. Using a C++ parser to provide the entire information, both the part provided by Ctags and the complementary information about types and expression contexts.
There are several such open source parsers like the pair Elkhound & Elsa (http://www.cs.berkeley.edu/~smcpeak/elkhound/), the Harmonia project (http://harmonia.cs.berkeley.edu/harmonia/index.html), a javacc parser, etc.

Both options have the advantage that all the required information may be saved in the database and kept up to date all the time. Option 1 would also be a great contribution for other Ctags users, while option 2 would allow a more accurate implementation of the context-sensitive features.
I'm interested to know, because I was thinking myself of those directions as well as the one taken by CodeLite.

Shlomy

Re: Using Ctags vs. complete C++ parser

Posted: Tue Sep 23, 2008 3:05 pm
by eranif
sreinst1 wrote:1. Extending Ctags itself to provide the missing information. For example, Ctags can now also report local variables (if the appropriate command-line option is specified), but it does not report the types of variables and functions.
I am extending ctags (and have also submitted patches), for example bug fixes to the namespace handling.
CodeLite's version also extends the -I functionality (which allows CL to overcome some macro/define errors), plus a few other things. Of course, not all of these were accepted by the ctags developers, so I "forked" my version - the sources are available in the CodeLite source tree.
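
As an illustration of the kind of macro problem that -I addresses, here is a minimal sketch. The names MYLIB_API, DECLARE_GETTER, Widget and Point are hypothetical and not taken from CodeLite. A tool that scans the source without running the preprocessor may mistake MYLIB_API for the class name, or may miss the member functions generated by DECLARE_GETTER. Stock Exuberant Ctags accepts a list of such identifiers through -I (for example, -I MYLIB_API) so that they are skipped; what CodeLite's fork adds on top of that is not shown here.

```cpp
#include <string>

// Hypothetical export macro. Real projects often expand it to
// __declspec(dllexport) or a visibility attribute; here it expands to nothing.
#define MYLIB_API

// Without preprocessing, "MYLIB_API" sits where a class name is expected.
class MYLIB_API Widget {
public:
    std::string name() const { return "widget"; }
};

// Hypothetical macro that generates a member function. The function name
// get_x never appears literally in the source text.
#define DECLARE_GETTER(type, field) \
    type get_##field() const { return field; }

struct Point {
    int x = 1;
    int y = 2;
    DECLARE_GETTER(int, x)
    DECLARE_GETTER(int, y)
};
```

The compiler sees perfectly ordinary declarations here; the difficulty exists only for tools that read the text before macro expansion.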
sreinst1 wrote:2. Using a C++ parser to provide the entire information, both the part provided by Ctags and the complementary information about types and expression contexts.
There are several such open source parsers like the pair Elkhound & Elsa (http://www.cs.berkeley.edu/~smcpeak/elkhound/), the Harmonia project (http://harmonia.cs.berkeley.edu/harmonia/index.html), a javacc parser, etc.
I did a lot of research before I decided to head that way. I looked at Elsa/Elkhound, and I even tried to implement an entire C++ grammar - but this is a tedious task.
The major drawbacks of Elsa or any full C++ parser:
- They are very slow
- They cannot recover from errors properly - and when writing an IDE, this is a must. All C++ parsers out there rely on the code being valid. For an IDE you cannot assume this, and you must continue parsing. (You should read the history of the Eclipse/CDT indexer...)
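
To make the error-recovery requirement concrete, here is a minimal sketch of panic-mode recovery. It is a hand-written toy, not CodeLite's yacc grammar: it accepts only statements of the form `<type> <name>;`, and the function names are hypothetical. On a malformed statement it discards input up to the next ';' and carries on. In yacc the same idea is usually expressed with the built-in `error` token.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// True if s looks like a C identifier.
bool isIdentifier(const std::string& s) {
    if (s.empty()) return false;
    unsigned char first = static_cast<unsigned char>(s[0]);
    if (!std::isalpha(first) && first != '_') return false;
    for (unsigned char c : s) {
        if (!std::isalnum(c) && c != '_') return false;
    }
    return true;
}

// Collects the names declared by statements of the form "<type> <name>;".
// A malformed statement is dropped and scanning resumes after the next ';',
// which serves as the synchronization point.
std::vector<std::string> collectDeclaredNames(const std::string& source) {
    std::vector<std::string> names;
    std::istringstream statements(source);
    std::string statement;
    while (std::getline(statements, statement, ';')) {
        std::istringstream words(statement);
        std::string type, name, extra;
        bool twoWords = static_cast<bool>(words >> type >> name);
        bool noExtra = !(words >> extra);
        if (twoWords && noExtra && isIdentifier(type) && isIdentifier(name)) {
            names.push_back(name);
        }
        // Otherwise: skip the broken statement and keep going.
    }
    return names;
}
```

Given `int a; float = ; char c; long d e f; double g;` this collects a, c and g: the two broken statements cost only themselves, not the rest of the file.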

Writing the parser in YACC & FLEX is a pretty simple task. It is also (and most importantly) very maintainable, since no code is involved, only well-known grammar syntax.

I also tried exploring GCC-XML (http://www.gccxml.org), which was too slow, as well as the GOLD parser and others I can't remember...

I finally decided that the yacc/ctags combination is the fastest and most reliable way to go.

Eran

Re: Using Ctags vs. complete C++ parser

Posted: Tue Sep 23, 2008 4:24 pm
by sreinst1
Thanks, this information is very valuable for me, it will save me a lot of redundant work.
I noticed that someone wrote a complete C++ grammar for ANTLR a while ago, which was also supposed to handle syntax errors nicely (it contained lots of grammar rules just for handling errors). Being just a plain grammar with very few actions, I imagine it would be quick enough. The downside is that it was last updated for some old version of ANTLR, which is probably quite far from the current version, and the current version does not accept it.

So, I guess I will follow your way. Thanks! You had very good ideas. I might just take them a little bit further and include support for multiple languages (btw, I noticed CodeLite does provide some support for other languages, e.g. "View as -> [language]").

Shlomy

Re: Using Ctags vs. complete C++ parser

Posted: Mon Oct 20, 2008 2:44 pm
by sreinst1
eranif wrote:I did a lot of research before I decided to head that way. I looked at Elsa/Elkhound, and I even tried to implement an entire C++ grammar - but this is a tedious task.
The major drawbacks of Elsa or any full C++ parser:
- They are very slow
- They cannot recover from errors properly - and when writing an IDE, this is a must. All C++ parsers out there rely on the code being valid. For an IDE you cannot assume this, and you must continue parsing. (You should read the history of the Eclipse/CDT indexer...)

Writing the parser in YACC & FLEX is a pretty simple task. It is also (and most importantly) very maintainable, since no code is involved, only well-known grammar syntax.

I also tried exploring GCC-XML (http://www.gccxml.org), which was too slow, as well as the GOLD parser and others I can't remember...

I finally decided that the yacc/ctags combination is the fastest and most reliable way to go.

Eran
Have you tried SourceNavigator? (http://sourcenav.sourceforge.net)
They have a database API you can use to run queries.
Shlomy

Re: Using Ctags vs. complete C++ parser

Posted: Sun Oct 26, 2008 3:32 pm
by count0
Regarding C++ parser, don't forget Boost.Wave:
http://www.boost.org/doc/libs/1_36_0/li ... index.html

Re: Using Ctags vs. complete C++ parser

Posted: Sun Oct 26, 2008 4:39 pm
by eranif
It is nice, however:
At the first steps it is not planned to make a very high performance or very small C++ preprocessor. If you are looking for these objectives you probably have to look at other places. Although our C++ preprocessor iterator works as expected and is usable as a reference implementation, for instance for testing of other preprocessor oriented libraries as the Boost Preprocessor library [7] et al. Nevertheless recent work has led to surprising performance enhancements (if compared with earlier versions). Wave is still somewhat slower than, for instance, EDG based preprocessors (Intel, Comeau) on simple input files, however, as complexity increases, time dilates exponentially on EDG. Preprocessing time dilates linearly under Wave, which causes it to easily outperform EDG based preprocessors when complexity increases.
Eran

Re: Using Ctags vs. complete C++ parser

Posted: Sun Oct 26, 2008 5:34 pm
by count0
IMHO, your concern with performance is reasonable, but C++ is a huge language, so making a fast, compact full parser is almost impossible.
You can see this in how Visual Studio creates its .ncb file: it is very slow.

Re: Using Ctags vs. complete C++ parser

Posted: Sat Oct 03, 2009 6:09 pm
by sreinst1
Sorry for continuing this a year later, but I just took a look at Boost.Wave and it seems to be a C++ preprocessor, not a C++ parser. Am I missing something?

Re: Using Ctags vs. complete C++ parser

Posted: Sat Oct 03, 2009 6:20 pm
by eranif
No, you are not missing anything:)

If a C++ parser were easy to find, all IDEs out there would be using it by now. The problem is that there are very few good parsers out there, and they all belong to commercial companies (Visual Assist, MS, SlickEdit); the rest are at the level of CodeLite's parser or below.

Boost is too slow and heavy for this task.

Eran

Re: Using Ctags vs. complete C++ parser

Posted: Sat Oct 31, 2009 11:09 am
by sreinst1
Eventually I realized that the use of a custom parser (like CodeLite's) is the preferred way to go in many cases.
Why? C++ code commonly contains many conditional-compilation directives (e.g. #ifdef WIN32). In many cases, the same function has multiple definitions - one for each platform. Ctags does not care about these - it simply adds all definitions to its output, without distinguishing between the compilation conditions. A parser would have to do a lot of work to support all the definitions and keep them associated with their preprocessor conditions. A compiler does not have this problem, because the preprocessor definitions are fixed for each compilation. I doubt there exists a parser that supports such preprocessor conditionals and is still fast.
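
A minimal sketch of the situation described above, with hypothetical function names. It tests the compiler-predefined _WIN32 rather than a project-defined WIN32, which is an assumption made so that the example builds without extra flags.

```cpp
#include <string>

// One definition per platform. The compiler only ever sees one of them,
// but a tool that ignores the #ifdef sees two definitions of pathSeparator().
#ifdef _WIN32
std::string pathSeparator() { return "\\"; }
#else
std::string pathSeparator() { return "/"; }
#endif

// Callers are unaffected by which definition was selected.
std::string joinPath(const std::string& dir, const std::string& file) {
    return dir + pathSeparator() + file;
}
```

A tag generator that does not evaluate the condition would list pathSeparator twice, and a "go to definition" feature then has to pick one or offer both.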

Another issue I noticed is that sometimes the preprocessor conditionals are not maintained correctly, leaving '{' and '}' unbalanced for some combinations of definitions, e.g.:

for (int i = 0; i < 3; i++)
{
#if COND1
    for (int j = 0; j < 3; j++)
    {
        doThis();
#endif
    doThat();
}
...

In this case, the code would not compile with COND1 enabled (it was probably okay when it was added, but was broken by later changes that did not maintain that branch). Ctags also breaks on such code, as the braces no longer match.
Did you do anything in CodeLite for this?
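
The imbalance in the fragment above can be checked mechanically. The following is a minimal sketch with hypothetical names, not something taken from CodeLite or Ctags: it understands only a single, non-nested `#if COND1` ... `#endif` pair at the start of a line, and it does not skip braces inside strings or comments.

```cpp
#include <sstream>
#include <string>

// Returns (number of '{') - (number of '}') over the lines that are active
// when COND1 has the given value. Zero means the braces balance.
int braceBalance(const std::string& source, bool cond1) {
    std::istringstream lines(source);
    std::string line;
    bool active = true;
    int balance = 0;
    while (std::getline(lines, line)) {
        // rfind(prefix, 0) == 0 is a "starts with" test.
        if (line.rfind("#if COND1", 0) == 0) { active = cond1; continue; }
        if (line.rfind("#endif", 0) == 0)    { active = true;  continue; }
        if (!active) continue;
        for (char c : line) {
            if (c == '{') ++balance;
            else if (c == '}') --balance;
        }
    }
    return balance;
}

// The fragment from the post, without the trailing "...".
const std::string kSnippet =
    "for (int i = 0; i < 3; i++)\n"
    "{\n"
    "#if COND1\n"
    "for (int j = 0; j < 3; j++)\n"
    "{\n"
    "doThis();\n"
    "#endif\n"
    "doThat();\n"
    "}\n";
```

With COND1 off the balance is 0; with COND1 on it is 1, i.e. one '{' is never closed. A tool that ignores the directives and reads every line sees the same surplus brace, which is why a brace-matching scanner loses track of scope from that point on.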