C++ Source Reverse Engineer - How to write a parser?
Hi,
I'm interested in reverse engineering C++ source code into a form more comprehensible than the source itself. I want to write a basic tool myself, so obviously I need to write a parser for the source code. Although this has some overlap with, say, a compiler, it would also seem significantly different.

Can anyone provide me with links on how one would go about writing such a parser? No doubt I would also need a reference to the syntax rules of C++.

Secondly, can anyone recommend a good tool that currently exists to do the job?

Thanks.
Herby wrote:
> Hi,
> Im interested in Reverse Engineering C++ source code into a form more
> comprehensible than the source itself.
> I want to write a basic one myself, obviously i need to write a parser
> for the source code.
> Although this has some overlap with say a compiler it would also seem
> significantly different too.
> Can anyone provide me with links etc on how one would go about writing
> such a parser?
> No doubt i would also need a reference to the syntax rules of C++ etc.
The gcc source.

-- 
Ian Collins.
Herby wrote:
> Im interested in Reverse Engineering C++ source code into a form more
> comprehensible than the source itself.
> I want to write a basic one myself, obviously i need to write a parser
> for the source code.
> Although this has some overlap with say a compiler it would also seem
> significantly different too.
> Can anyone provide me with links etc on how one would go about writing
> such a parser?
> No doubt i would also need a reference to the syntax rules of C++ etc.
If you have to ask this question, you would IMHO be better off starting with a smaller project.

> Secondary can anyone recommend a good tool that currently exists to do
> the job?
I don't know if it's a good one, because it's a bit outdated, but it may be worth a try: gccxml uses the gcc frontend to parse the sources and creates XML output which can be easily read.

Mathias
Herby wrote:
> Hi,
> Im interested in Reverse Engineering C++ source code into a form more
> comprehensible than the source itself.
An ambitious goal, I think, but I hope you will succeed somehow :) The problem I see is that human-written source code is usually the most comprehensible expression of an algorithm that still embodies all of its details. Of course you can find some sort of compromise: for example, there are UML programs that can sketch diagrams from the source files.

> I want to write a basic one myself, obviously i need to write a parser
> for the source code.
> Although this has some overlap with say a compiler it would also seem
> significantly different too.
You are right. Especially once you decide which abstraction level you want to stop at, it may be much simpler.

> Can anyone provide me with links etc on how one would go about writing
> such a parser?
I have a link and a suggestion.

The link is: http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/
Elsa is an open-source C/C++ parser; I think it's quite accurate. Of course, working with it means you need a deep knowledge of C/C++ syntax.

The suggestion is: look at the source code of some open-source UML editors. They do something similar to what you are trying to do, so you can probably find some interesting hints there.

> No doubt i would also need a reference to the syntax rules of C++ etc.
The C++ standard. Everything is there, including the BNF syntax specification of the language.

Regards,
Zeppe
On Jun 6, 10:58 am, Herby <prmarjo@gmail.com> wrote:
> Hi,
> Im interested in Reverse Engineering C++ source code into a form more
> comprehensible than the source itself.
> I want to write a basic one myself, obviously i need to write a parser
> for the source code.
> Although this has some overlap with say a compiler it would also seem
> significantly different too.
> Can anyone provide me with links etc on how one would go about writing
> such a parser?
> No doubt i would also need a reference to the syntax rules of C++ etc.
The classical approach (besides building one by hand) is to use tools like lex and yacc (or bison). You should read up on compiler construction, lexing and parsing; what you want to build is a compiler, if I understand you correctly (translating your-own-language-tm into C++).

If you want to stay inside C++ you can use boost::spirit, which is similar to using yacc but without the need for an extra tool. Note that spirit is a library that basically takes a modified form of EBNF syntax and embeds it into C++. Take a close look at how it is implemented, because the technique used might be a better approach to solving your problem (just a wild guess, since I do not know what problem you are trying to solve).

If you go the spirit route there is also boost::wave, which is a full implementation of the C++ preprocessor (in fact, IIRC, the only FULL implementation of it). Someone told me that there is also a person working on a full C++ parser using spirit, but I have not yet seen any further detail on it.

-- 
Fabio Fracassi
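For readers wondering what the "building one by hand" route looks like at its very first step, here is a minimal sketch of a hand-written lexer. It is only an illustration under loud assumptions: the names `TokenKind`, `Token` and `tokenize` are made up for this example, and it recognises just identifiers, integer literals and single-character punctuation. A lexer for real C++ would also have to deal with comments, string and character literals, multi-character operators and the preprocessor.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for this sketch; real C++ needs many more.
enum class TokenKind { Identifier, Number, Punctuation };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits the input into identifiers, integer literals and single-character
// punctuation, skipping whitespace. Comments, string literals and
// multi-character operators are deliberately NOT handled.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: letter or underscore, then alphanumerics.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Integer literal: a run of decimal digits.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else {
            // Anything else is treated as one punctuation character.
            ++i;
            tokens.push_back({TokenKind::Punctuation, src.substr(start, 1)});
        }
    }
    return tokens;
}
```

Calling `tokenize("int x = 42;")` produces the tokens `int`, `x`, `=`, `42` and `;`. Note that keywords come out as plain identifiers here; telling them apart is one of the many things a generated lexer (lex) or a library like spirit would normally take care of.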
"Herby" <prmarjo@gmail.com> wrote in message news:1181120334.748152.164790@o5g2000hsb.googlegroups.com...
> Hi,
> Im interested in Reverse Engineering C++ source code into a form more
> comprehensible than the source itself.
> I want to write a basic one myself, obviously i need to write a parser
> for the source code.
You don't need to write a parser to do reverse engineering, although it is probably true that you will need to have one.

Building a C++ parser is a lot harder than people who have not done it think it is. You need a lexer covering all the standard's dark-corner requirements. You need a preprocessor. You need a non-standard parsing engine, because C++ isn't LALR and yacc won't work. You need a grammar not just for ANSI C++ but for the dialect of C++ you actually have (Sun? GNU? Microsoft?). If you are a realist, you'll need a scope-accurate symbol table telling you where names are defined and what they are defined as.

Expect building a robust parser to take several man-years at a minimum; we have considerably more than that in ours to address the above issues.

> Although this has some overlap with say a compiler it would also seem
> significantly different too.
Ours captures comments and most preprocessor conditionals unexpanded.

> Can anyone provide me with links etc on how one would go about writing
> such a parser?
> No doubt i would also need a reference to the syntax rules of C++ etc.
Check comp.compilers and various conferences on reverse engineering. You won't find a lot of specific detail; you'll find tantalizing hints of how to solve problems but that won't remove the sweat equity required. I've been down that route.

> Secondary can anyone recommend a good tool that currently exists to do
> the job?
Depends on what you mean by "reverse engineering". If what you want are all the above features packaged in a form in which you can construct a reverse engineering tool, then DMS may suit your needs:
www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html

If you mean "a tool that does reverse engineering", then Scientific Toolworks may have what you want.

> Thanks.
-- 
Ira Baxter, CTO
www.semanticdesigns.com
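A small illustration of why the symbol table mentioned above matters: in C++ the statement `T * p;` declares a pointer `p` if `T` names a type, but is a multiplication expression if `T` names a variable. The grammar alone cannot decide which. The sketch below is not how any real front end is implemented; `StatementKind`, `classifyStarStatement` and the `knownTypes` set are hypothetical names, with the set standing in for a real, scope-accurate symbol table.

```cpp
#include <set>
#include <string>

// Hypothetical result type for this sketch.
enum class StatementKind { Declaration, Expression };

// Classifies a statement of the shape `lhs * rhs;`.
// `knownTypes` is a stand-in for a scope-accurate symbol table: it holds the
// names that denote types at the point where the statement appears.
StatementKind classifyStarStatement(const std::string& lhs,
                                    const std::set<std::string>& knownTypes) {
    // If the left-hand name is a type, `lhs * rhs;` declares a pointer
    // named rhs; otherwise it is an expression multiplying lhs by rhs.
    return knownTypes.count(lhs) != 0 ? StatementKind::Declaration
                                      : StatementKind::Expression;
}
```

With `knownTypes = {"T"}`, `classifyStarStatement("T", knownTypes)` yields `StatementKind::Declaration`; with an empty set it yields `StatementKind::Expression`. In real code the set changes with every scope, typedef, class and template, which is a large part of why the effort adds up the way the post describes.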
Guys, thanks for all the interesting responses.

I have worked as a software developer for 10+ years, mostly in maintenance mode on medium to large C++ projects. Usually these projects do not have any kind of design roadmap to guide you into them; I feel this is much more the reality. At best you have some kind of source browser within your IDE: find all references, go to definition, etc. In this time I have come up with some ideas of my own that build on these, and I would really like to try them out. So I am reversing the source into something more abstract that allows me to reason more effectively about the source I may be about to modify.

http://www.objectmentor.com/resources/downloads.html

The above link is to a script that gives some design quality metrics for a set of header files. It's a good start, but I'd like to write something proper and take the idea much further.

Again, these are some of the tools on the market:
http://en.wikipedia.org/wiki/List_of_tools_for_static_code_analysis

So I hope this makes it clear what I'm trying to achieve.