
C++ Source Reverse Engineer - How to write a parser ?


Hi,

I'm interested in reverse engineering C++ source code into a form more
comprehensible than the source itself.

I want to write a basic tool myself, so obviously I need to write a
parser for the source code.
Although this has some overlap with, say, a compiler, it would also seem
to be significantly different.

Can anyone provide me with links etc. on how one would go about writing
such a parser?
No doubt I would also need a reference to the syntax rules of C++ etc.

Secondly, can anyone recommend a good tool that currently exists to do
the job?

Thanks.

Herby wrote:
> Hi,

> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.

> I want to write a basic tool myself, so obviously I need to write a
> parser for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> to be significantly different.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

The gcc source.

--
Ian Collins.

Herby wrote:
> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.

> I want to write a basic tool myself, so obviously I need to write a
> parser for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> to be significantly different.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

If you have to ask this question, you should IMHO start with a smaller
project.

> Secondly, can anyone recommend a good tool that currently exists to do
> the job?

I don't know if it's a good one, because it's a bit outdated, but it may be
worth a try: gccxml uses the gcc frontend to parse the sources and creates
XML output which can be easily read.

Mathias

Herby wrote:
> Hi,

> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.

An ambitious goal, I think, but I hope you will succeed somehow :)
The problem that I see is that human-written source code is usually
the most comprehensible expression of an algorithm that still embodies all
the details of the algorithm itself. Of course you can find some sort of
compromise: for example, there are UML programs that are able to sketch
diagrams from the source files.

> I want to write a basic tool myself, so obviously I need to write a
> parser for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> to be significantly different.

You are right. Especially if you decide up front which abstraction level
you want to stop at, it may be much simpler.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?

I have a link and a suggestion. The link is:
http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/
Elsa is an open-source C/C++ parser; I think it's quite accurate. Of
course, that means you need to have a deep knowledge of the C/C++
syntax.

The suggestion is: look at the source code of some open-source UML editors.
They do something similar to what you are trying to do, so you can
probably find some interesting hints there.
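
To give a rough idea of the kind of abstraction a UML tool pulls out of the
source, here is a minimal sketch that lists class names and their direct
bases. It is not taken from any real UML editor: ClassInfo and
extract_classes are hypothetical names, and the scan is deliberately naive.
It does not handle comments, strings, macros, templates, qualified base
names such as std::exception, or "enum class".

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one class definition found in the source text.
struct ClassInfo {
    std::string name;
    std::vector<std::string> bases;
};

// Split text into identifiers and single-character tokens.
static std::vector<std::string> split_tokens(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c) || c == '_') {
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) ||
                    src[j] == '_'))
                ++j;
            out.push_back(src.substr(i, j - i));
            i = j;
        } else {
            out.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return out;
}

// Naive scan for: class|struct NAME [: [access] BASE {, [access] BASE}] {
std::vector<ClassInfo> extract_classes(const std::string& src) {
    std::vector<std::string> t = split_tokens(src);
    std::vector<ClassInfo> result;
    for (std::size_t i = 0; i + 1 < t.size(); ++i) {
        if (t[i] != "class" && t[i] != "struct") continue;
        ClassInfo info;
        info.name = t[i + 1];
        std::size_t j = i + 2;
        if (j < t.size() && t[j] == ":") {
            for (++j; j < t.size() && t[j] != "{" && t[j] != ";"; ++j) {
                const std::string& w = t[j];
                if (w == "," || w == "public" || w == "protected" ||
                    w == "private" || w == "virtual")
                    continue;
                info.bases.push_back(w);
            }
        }
        // Keep definitions (followed by '{'), skip forward declarations.
        if (j < t.size() && t[j] == "{") result.push_back(info);
    }
    return result;
}
```

Calling extract_classes on the text of a header gives a list you could dump
as an inheritance diagram. Everything this sketch ignores is exactly where a
real parser earns its keep.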

> No doubt I would also need a reference to the syntax rules of C++ etc.

The C++ standard. Everything is there, including the BNF syntax
specification of the language.

Regards,

Zeppe

On Jun 6, 10:58 am, Herby <prmarjo@gmail.com> wrote:

> Hi,

> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.

> I want to write a basic tool myself, so obviously I need to write a
> parser for the source code.
> Although this has some overlap with, say, a compiler, it would also seem
> to be significantly different.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

The classical approach (besides building one by hand) is to use tools
like lex and yacc (or bison). You should read up on compiler
construction, lexing and parsing. If I understand you correctly, what you
want to build is a compiler (translating your-own-language-tm into C++).
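
For a feel of the "by hand" approach, here is a minimal sketch of a
recursive-descent parser for a toy arithmetic grammar, with the lexing done
inline. ToyParser is a hypothetical name and the grammar is nowhere near
C++; a tool like yacc or bison would generate a table-driven parser from a
grammar description instead of code shaped like this.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hand-written recursive-descent parser for a toy grammar:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class ToyParser {
public:
    explicit ToyParser(const std::string& text) : text_(text), pos_(0) {}

    // Parse the whole input and return the computed value.
    long parse() {
        long value = expr();
        skip_spaces();
        if (pos_ != text_.size()) throw std::runtime_error("trailing input");
        return value;
    }

private:
    void skip_spaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
    }

    // Lexing step: consume character c if it comes next after whitespace.
    bool accept(char c) {
        skip_spaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    long expr() {
        long v = term();
        for (;;) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }

    long term() {
        long v = factor();
        for (;;) {
            if (accept('*')) v *= factor();
            else if (accept('/')) {
                long d = factor();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;
            } else return v;
        }
    }

    long factor() {
        if (accept('(')) {
            long v = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        skip_spaces();
        if (pos_ >= text_.size() ||
            !std::isdigit(static_cast<unsigned char>(text_[pos_])))
            throw std::runtime_error("expected number");
        long v = 0;
        while (pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_])))
            v = v * 10 + (text_[pos_++] - '0');
        return v;
    }

    std::string text_;
    std::size_t pos_;
};
```

Each grammar rule becomes one member function, which is what makes the
hand-written style easy to follow for small grammars. For example,
ToyParser("(1 + 2) * 3").parse() evaluates the expression with the usual
precedence.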

If you want to stay inside C++ you can use boost::spirit, which is
similar to using yacc, but without the need for an extra tool.

Note that spirit is a library that basically takes a modified form of
the EBNF syntax and embeds it into C++. Take a close look at how it is
implemented, because the technique used might be a better approach to
solving your problem (just a wild guess, since I do not know what
problem you are trying to solve).
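
To illustrate the general idea of embedding a grammar in C++ through
operator overloading, here is a minimal sketch. It is NOT Spirit's API or
implementation (Spirit is built on templates and is far more capable); the
names Parser, ch and matches are hypothetical, and this version uses
std::function for brevity.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// A parser takes the input and a position. On success it advances the
// position and returns true; on failure it leaves the position unchanged.
struct Parser {
    std::function<bool(const std::string&, std::size_t&)> run;
};

// Primitive: match one specific character.
Parser ch(char c) {
    return Parser{[c](const std::string& s, std::size_t& pos) -> bool {
        if (pos < s.size() && s[pos] == c) { ++pos; return true; }
        return false;
    }};
}

// Sequence: a >> b matches a followed by b.
Parser operator>>(Parser a, Parser b) {
    return Parser{[a, b](const std::string& s, std::size_t& pos) -> bool {
        std::size_t saved = pos;
        if (a.run(s, pos) && b.run(s, pos)) return true;
        pos = saved;  // backtrack on failure
        return false;
    }};
}

// Alternative: a | b tries a first, then b.
Parser operator|(Parser a, Parser b) {
    return Parser{[a, b](const std::string& s, std::size_t& pos) -> bool {
        return a.run(s, pos) || b.run(s, pos);
    }};
}

// Repetition: *a matches a zero or more times, like {a} in EBNF.
Parser operator*(Parser a) {
    return Parser{[a](const std::string& s, std::size_t& pos) -> bool {
        for (;;) {
            std::size_t before = pos;
            // Stop on failure, or if a matched without consuming input.
            if (!a.run(s, pos) || pos == before) break;
        }
        return true;
    }};
}

// True if p consumes the entire input.
bool matches(const Parser& p, const std::string& input) {
    std::size_t pos = 0;
    return p.run(input, pos) && pos == input.size();
}
```

With these definitions the EBNF rule list = digit { "," digit } can be
written almost literally as digit >> *(ch(',') >> digit), where digit is
itself built from alternatives such as ch('0') | ch('1').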

If you go the spirit route there is also boost::wave, which is a full
implementation of the C++ preprocessor (in fact, IIRC, the only FULL
implementation of it). Someone told me that there is also a person
working on a full C++ parser using spirit, but I have not yet
seen any further details on it.

--
Fabio Fracassi

"Herby" <prmarjo@gmail.com> wrote in message

news:1181120334.748152.164790@o5g2000hsb.googlegroups.com...

> Hi,

> I'm interested in reverse engineering C++ source code into a form more
> comprehensible than the source itself.

> I want to write a basic tool myself, so obviously I need to write a
> parser for the source code.

You don't need to write a parser to do reverse engineering.
It is probably true that to do reverse engineering, you will need a parser.

Building a C++ parser is a lot harder than people who have not
done it think it is. You need a lexer covering all the dark-corner
requirements of the standard.
You need a preprocessor. You need a non-standard parsing
engine, because C++ isn't LALR and yacc won't work.
You need a grammar not just for ANSI C++ but for the dialect
of C++ you actually have (Sun? GNU? Microsoft?).
If you are a realist, you'll need a scope-accurate symbol table telling
you where names are defined and what they are defined as. Expect building
a robust parser to take several man-years at a minimum; we have
considerably more than that in ours to address the above issues.
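
A small illustration of why the symbol table matters: the statement
"a * b;" declares b as a pointer if a names a type, but is a multiplication
if a names a variable, so the parser cannot decide from syntax alone. The
sketch below is a hypothetical, heavily simplified scope-aware table
(SymbolTable, NameKind and classify_star_statement are made-up names), not
how DMS or any real front end does it.

```cpp
#include <map>
#include <string>
#include <vector>

enum class NameKind { Unknown, Type, Variable };

// Hypothetical scope-aware symbol table: one map per nested scope.
class SymbolTable {
public:
    SymbolTable() { enter_scope(); }  // start with the global scope

    void enter_scope() { scopes_.emplace_back(); }
    void leave_scope() { if (scopes_.size() > 1) scopes_.pop_back(); }

    void declare(const std::string& name, NameKind kind) {
        scopes_.back()[name] = kind;
    }

    // The innermost declaration wins, as with C++ name hiding.
    NameKind lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return NameKind::Unknown;
    }

private:
    std::vector<std::map<std::string, NameKind>> scopes_;
};

// Classify the statement "lhs * rhs;" given what lhs names right now.
std::string classify_star_statement(const SymbolTable& table,
                                    const std::string& lhs) {
    switch (table.lookup(lhs)) {
        case NameKind::Type:     return "declaration";  // rhs: pointer to lhs
        case NameKind::Variable: return "expression";   // multiplication
        default:                 return "unknown";
    }
}
```

The same text can flip between the two readings as scopes open and close,
which is why the table has to be scope accurate and consulted while
parsing.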

> Although this has some overlap with, say, a compiler, it would also seem
> to be significantly different.

Ours captures comments and most preprocessor conditionals unexpanded.

> Can anyone provide me with links etc. on how one would go about writing
> such a parser?
> No doubt I would also need a reference to the syntax rules of C++ etc.

Check comp.compilers and various conferences on reverse engineering.
You won't find a lot of specific detail; you'll find tantalizing hints of
how to solve problems, but that won't remove the sweat
equity required. I've been down that route.

> Secondly, can anyone recommend a good tool that currently exists to do
> the job?

Depends on what you mean by "reverse engineering".
If what you want are all the above features packaged in a form in
which you can construct a reverse engineering tool,
then DMS may suit your needs:

www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html

If you mean "a tool that does reverse engineering", then Scientific
Toolworks may have what you want.

> Thanks.

--
Ira Baxter, CTO
www.semanticdesigns.com

Guys, thanks for all the interesting responses.

I have worked as a software developer for 10+ years, mostly in
maintenance mode on medium to large C++ projects. Usually these
projects do not have any kind of design roadmap to guide you into
them.
I feel this is much more the reality.

At best you have some kind of source browser within your IDE: find all
references, go to definition, etc.

In this time I have come up with some ideas of my own that build on
these, and I would really like to try them out. So I am reversing the
source into something more abstract, allowing me to reason more
effectively about the source I may be about to modify.

http://www.objectmentor.com/resources/downloads.html

The above link has a script that gives some design quality metrics for
a set of header files.
It's a good start, but I'd like to write something proper and take the
idea much further...
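
As a sketch of what a metric over a set of header files could look like,
the code below counts #include "..." dependencies between headers and
derives Robert C. Martin's instability figure, I = Ce / (Ca + Ce). This is
an assumption made for illustration: it is not necessarily what the linked
script computes, and Coupling and measure_coupling are hypothetical names.
It only sees lines that start with #include "name", so macros, conditional
compilation and system headers are ignored.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical per-header result.
struct Coupling {
    int fan_out = 0;  // Ce: headers in the set that this one includes
    int fan_in = 0;   // Ca: headers in the set that include this one
    double instability() const {
        int total = fan_in + fan_out;
        return total == 0 ? 0.0 : static_cast<double>(fan_out) / total;
    }
};

// headers maps a file name to its text. Only lines of the form
// #include "name" are considered; <system> includes are ignored.
std::map<std::string, Coupling> measure_coupling(
        const std::map<std::string, std::string>& headers) {
    std::map<std::string, Coupling> result;
    for (const auto& h : headers) result[h.first];  // every header appears

    for (const auto& h : headers) {
        std::set<std::string> seen;  // count each dependency only once
        std::istringstream lines(h.second);
        std::string line;
        while (std::getline(lines, line)) {
            std::size_t start = line.find_first_not_of(" \t");
            if (start == std::string::npos ||
                line.compare(start, 8, "#include") != 0)
                continue;
            std::size_t open = line.find('"', start + 8);
            if (open == std::string::npos) continue;
            std::size_t close = line.find('"', open + 1);
            if (close == std::string::npos) continue;
            std::string dep = line.substr(open + 1, close - open - 1);
            // Skip self-includes, repeats, and headers outside the set.
            if (dep == h.first || !headers.count(dep) ||
                !seen.insert(dep).second)
                continue;
            ++result[h.first].fan_out;
            ++result[dep].fan_in;
        }
    }
    return result;
}
```

A header with instability near 1 depends on others but nothing depends on
it; one near 0 is depended upon heavily and is riskier to change. A proper
tool would get the dependencies from a real parser rather than from a line
scan.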

Again, these are some of the tools on the market:
http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis

So I hope this makes it clear what I'm trying to achieve.
