Thread: Compiler Help
View Single Post
Old 07-19-2005, 08:44 PM   #2 (permalink)
TheHeadFL
 
Ultra Techie

Join Date: Jul 2005

Posts: 530

TheHeadFL

Send a message via AIM to TheHeadFL
Default

Well, when I did this back in my OS/Compiler design classes, we had to use lex and yacc to do it. (Tools that generate lexical analyzer and grammar parser)

But basically, are you asking how to write your own lexical analyzer? Well, I am not sure of the STL libraries in C++ provide a tokenizer, but I am pretty sure that they do. You would need to parse all the tokens in your input and generate a big list of token objects.

So, basically, you break down your input into all the corresponding tokens.

[Keep in mind all of the following lexical analyzer commentary I am writing here is for writing a homemade easy rigged-type version. Normally this would be done using regular expressions but that is a whole OTHER can of worms]

You let the STL library function (assuming it exists, otherwise, just go grab a library off google to help with this) find the actual individual tokens, then you compare each token and determine the type of token it is (i.e. reserved word, operator, whatever) and assign it to a token object with a corresponding token type attribute. You are going to need to define some rules for the tokens, in other words, what are your reserved words, and, can your variables start with a number or not. You're also going to need to tell the built-in tokenizer to stop parsing text when it encounters things like whitespace, your operators, etc.

Then, once you have all of these things worked out, you've got your input parsed into a bunch of token objects that have a corresponding type which fits into a grammar somehow. So, to represent, you might have a list like <reserved word> <variable> <operator> <variable>, etc.

Then, you take all of this and run it through a parser which can go through all the tokens you've classified and determine if the input matches your grammar, which you would have to define somehow. Mostly this is done in formal language definition form, and the grammar parser is made by yacc or bison or something. But since you're going to do it manually, you'll want to set up a bunch of types of things....

So in your rules specification, you would want to have maybe... (Forgive my syntax, its been a LONG time, but you get the general idea)
<expression> = <expression> <operator> <variable> | <expression> <operator> <expression> | <variable> <operator> <variable>.

And so on and so forth. You break your language down into that kind of grammar and such, and you write a parser to examine the tokens and fit them into the grammar. If they don't fit, you have a syntax error.

I hope that I kinda made it somewhat clearer, but if I haven't I apologize. Writing lexical analyzers and grammar parsers is a huge pain in the rear and basically few people do this kind of thing from scratch anymore. There isn't any sense in reinventing the wheel. But, of course, writing a compiler is already a pretty complicated thing to do, so this is just another aspect of that.
__________________
Desktop machine: 2 x Opteron 246, Asus K8N-DL, 2GB PC3200 ECC Reg., XFX GeForce 6600GT, 74gb WD Raptor, 2 x 19\" LCDs, Windows XP x64
Server machine: Intel P4 3.0GHz 2MB EM64T, ECS i865pe, 1GB PC3200, 36gb WD Raptor, Windows Server 2003
Laptop: Dell Inspiron 9100 (Intel P4 3.2GHz 1MB Prescott, i865pe, 512MB PC3200, Mobility Radeon 9700, DVD+R/DL Burner), Windows XP
Linux: P3 450Mhz, 386MB ram, Slackware 10.1 (Running mySQL/Apache)
TheHeadFL is offline