Introduction
Brainf**k (replace ** with u and c and you get the actual name - but I don't want to cause offence) is a super minimal programming language. Managed COBOL is a very large programming language. I just love the idea of translating BF into Managed COBOL.
BF (as I shall call it from here on) contains only 8 symbols and is derived from P" which contains only 6. Both languages are Turin Complete and neither is of any utility in actual programming. P" was created as a exercise; the font of all knowledge (ahem) Wikipedia has this to say about it 'P'' was the first "GOTO-less" imperative
structured programming language to be proven
1,2 Turing-complete.'
My idea here is:
- Make a translator from BF to Managed COBOL in Managed COBOL.
- The result should be able to be used to create classes where the BF code is a method body.
- Demonstrate these being compile and called from COBOL with Visual COBOL 2010.
- Demonstrate some simple abstract symbol tree optimizations.
- Demonstrate BF being called from C# and (using a developmental compiler) Java [Later post].
- Go the whole-hog and translate BF to COBOL and then call from F# [Later post].
Yes - I want to prove that Managed COBOL can do fun stuff like making it possible to call BF from F#!
For more on Managed COBOL please see my blog.
What is BF and how can we translate it to COBOL?
BF has a notion of a continuous area of bytes which hold integers. There is a pointer into those bytes referred to as the data-pointer. There is also the program which is a sequence of symbols. All symbols are legal, but anything other than the 8 BF symbols are ignored. The program notionally moves from the start to the end of the symbols but does have a looping construct. P" and BF are the same to this point, however, BF extends the model into having some form of IO. This is normally associated with keyboard and display, but I will use the IO extensions of BF to provide the mechanism for passing in and returning data when BF is invoked from other languages.
Here as the BF symbols:
> move the data pointer right (higher value)
< move the data pointer left (lower value)
+ increment by 1 the byte at the current data pointer
- decrement by 1 the byte at the current data pointer
[ begin loop
] close loop
. move the value at the data pointer to the output
, move the next input value to the value at the data pointer
Notes: Loops continue to execute the symbols contained repeatedly until the data pointed to by the data pointer is 0 at the start of an iteration. Bytes can take the decimal values 0-255. Incrementing a value 255 will result in 0; decrementing a value 0 will result in 255. Loops can be nested.
Arguments And Returns
To allow my BF routine to be invoked from other languages, I have the idea of using the IO symbols. A table of bytes will be passed to BF. And BF will 'return' a new table of bytes (for the non COBOL heads, a table is pretty much the same thing as an array in C#).
The passed in table is the arguments to the routine and the passed back table is the return. BF has an input pointer which starts at the start on the arguments table; each time BF reads an bytes from the arguments the pointer is incremented. Incrementing the input pointer beyond the end of the table causes the pointer to reset to 1 (the start).
The returned table is initialized as empty then each time there is an output from BF the output bytes value is added to the end of the output table. (The implementation does not actually append to a table, it appends to a list and converts the result to a table, but the logical effect is the same.)
Compilation
The compilation process involves simply iterating over to input bytes and each one being converted into an abstract symbol tree (AST) token or ignored. The AST can then be translated into COBOL and finally wrapped in some boilerplate COBOL to give it the appropriate environment. Optimization can happen to the AST.
The only part of BF which requires any serious thought (and only a few seconds at that) is the [] looping construct. Fortunately, modern COBOL dialects have the in-line, nested perform syntax. The [] BF syntax directly translated to...
until data(data-pointer) = 0
...
end-perform
... assuming that the data area is simply a table of bytes and there is a local-storage item data-pointer which points into that table.
OK - Let's write a compiler!
I have chosen not to implement the normal file read and write stuff which a compiler does. This compiler will accept a string and return a string. The input string being BF and the output being the source for a Managed COBOL method body. This is the most straight forward implementation which can then be added to other programs to compile files or other source of source.
The compiler is massive overkill. BF is such a simple language that a very simple compiler will do. My aim with this compiler it to make it have a very clean and structured design. I have not actually implemented a full AST but rather just a list (a tree with only one branch). This is possible because there is no calling structure and no variables. It is really a testament to the efficiency of languages with only conditional loops.
Also, I have represented each of the BF symbols as an object (I call them tokens) which 'knows' the symbol and the required COBOL to enact that symbol. These objects all implement a common interface (BFToken). By moving the symbol definition and COBOL representation to a polymorphic type, it is possible to move the responsibility for understanding the lex from the main compiler loop and into these token objects. Although BF has only 8 symbols, the approach would allow additions of new symbols without changing the lexer or parser at all. This was a design feature I used in VSPL to some success.
Here is the code defining the token class interface and an instance of a token class:
interface-id BFToken as "BF.BFToken".
method-id get property BFSymbol.
procedure division returning ret as string.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
end method.
end interface.
class-id BFIncr as "BF.BFIncr" implements type BFToken.
01 source-code.
03 filler pic x(29) value "if data-block(data-ptr) = 255".
03 filler pic x(02) value x"0D0A".
03 filler pic x(34) value " move 0 to data-block(data-ptr)".
03 filler pic x(02) value x"0D0A".
03 filler pic x(04) value "else".
03 filler pic x(02) value x"0D0A".
03 filler pic x(34) value " add 1 to data-block(data-ptr)".
03 filler pic x(02) value x"0D0A".
03 filler pic x(06) value "end-if".
method-id get property BFSymbol.
procedure division returning ret as string.
move "+" to ret.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
move source-code to ret
end method.
end class.
The AST is just a list of already created token classes. Here is the code which does the lexing/parsing and creates the AST:
table of type BFToken
(
new BFDecr
new BFIncr
new BFLeft
new BFRight
new BFInput
new BFOutput
new BFLoopStart
new BFLoopEnd
).
01 separator string value x"0D0A".
method-id compile public.
01 current-char character.
01 current-token type BFToken.
*> Use a list as a dicationary is overkill when we only
*> have 8 memebers and this is a trivial implementation
01 ast type List[type BFToken] value new List[type BFToken].
01 code-builder type StringBuilder value new StringBuilder.
procedure division using by value
source-code as string
returning generated-code as string.
*> These loops make up the parser and lexer
*> all in one
perform varying current-char through source-code::ToCharArray
*> respocibility for what characters make up the language
*> is devolved to the token objects
perform varying current-token through tokens
if current-token::BFSymbol::Chars(0) equals current-char
invoke ast::Add(current-token)
end-if
end-perform
end-perform
Once this loop has constructed the AST it can be interrogated directly to get the COBOL source to inject into the boiler plate.
By going to the effort of laying out the COBOL to be generated in the value clauses of COBOL group items we cause the generated COBOL code to be human readable and have a clean representation in the compiler source as well.
Here is an example of the generated code:
01 data-block binary-short occurs 32768.
01 data-ptr binary-long value 1.
01 input-ptr binary-long value 1.
01 output-list type List[binary-short] value new List[binary-short].
01 data-block-size binary-long.
01 input-block-size binary-long.
procedure division
using by value input-block as binary-short occurs any
returning output-block as binary-short occurs any.
set data-block-size to data-block::Length
set input-block-size to input-block::Length
if data-block(data-ptr) = 255
move 0 to data-block(data-ptr)
else
add 1 to data-block(data-ptr)
end-if
if data-block(data-ptr) = 255
move 0 to data-block(data-ptr)
else
add 1 to data-block(data-ptr)
end-if
if data-block(data-ptr) = 255
move 0 to data-block(data-ptr)
else
add 1 to data-block(data-ptr)
end-if
if data-block(data-ptr) = 255
move 0 to data-block(data-ptr)
else
add 1 to data-block(data-ptr)
end-if
Visual COBOL 2010 - this makes playing around like this really straight forward!
Below is the compiler in Visual COBOL 2010 and then the source code for the compiler. I have included the entire compiler in one source file.
Here is my working compiler with the null optimizer:
$set sourceformat(variable)
$set ilusing(BF)
class-id BFTester as "BF.BFTester".
method-id main static.
01 compiler type BF.BFCompiler value new BF.BFCompiler.
procedure division.
display compiler::compile("++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.")
display "All done"
end method.
end class.
class-id BFCompiler as "BF.BFCompiler".
01 optimizer type BFOptimizer value new NullBFOptimizer.
01 tokens type BFToken occurs any value
table of type BFToken
(
new BFDecr
new BFIncr
new BFLeft
new BFRight
new BFInput
new BFOutput
new BFLoopStart
new BFLoopEnd
).
01 separator string value x"0D0A".
method-id compile public.
01 current-char character.
01 current-token type BFToken.
*> Use a list as a dicationary is overkill when we only
*> have 8 memebers and this is a trivial implementation
01 ast type List[type BFToken] value new List[type BFToken].
01 code-builder type StringBuilder value new StringBuilder.
procedure division using by value
source-code as string
returning generated-code as string.
*> These loops make up the parser and lexer
*> all in one
perform varying current-char through source-code::ToCharArray
*> respocibility for what characters make up the language
*> is devolved to the token objects
perform varying current-token through tokens
if current-token::BFSymbol::Chars(0) equals current-char
invoke ast::Add(current-token)
end-if
end-perform
end-perform
*> Now we have the ast we optimize it
invoke optimizer::Optimize(ast)
*> Now we generate
*> First put in boiler plate
invoke code-builder::Append("01 data-block binary-short occurs 32768.")
invoke code-builder::Append(separator)
invoke code-builder::Append("01 data-ptr binary-long value 1.")
invoke code-builder::Append(separator)
invoke code-builder::Append("01 input-ptr binary-long value 1.")
invoke code-builder::Append(separator)
invoke code-builder::Append("01 output-list type List[binary-short] value new List[binary-short].")
invoke code-builder::Append(separator)
invoke code-builder::Append("01 data-size binary-long.")
invoke code-builder::Append(separator)
invoke code-builder::Append("01 input-size binary-long.")
invoke code-builder::Append(separator)
invoke code-builder::Append("procedure division")
invoke code-builder::Append(separator)
invoke code-builder::Append(" using by value input-block as binary-short occurs any")
invoke code-builder::Append(separator)
invoke code-builder::Append(" returning output-block as binary-short occurs any.")
invoke code-builder::Append(separator)
invoke code-builder::Append("set data-block-size data-block::Length")
invoke code-builder::Append(separator)
invoke code-builder::Append("set input-block-size input-block::Length")
invoke code-builder::Append(separator)
perform varying current-token through ast
invoke code-builder::Append(current-token::COBOLCode)
invoke code-builder::Append(separator)
end-perform
*> Final boiler plate
invoke code-builder::Append("set output-block to output-list::ToArray")
invoke code-builder::Append(separator)
set generated-code to code-builder::ToString
goback.
end method.
end class.
interface-id BFToken as "BF.BFToken".
method-id get property BFSymbol.
procedure division returning ret as string.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
end method.
end interface.
class-id BFIncr as "BF.BFIncr" implements type BFToken.
01 source-code.
03 filler pic x(29) value "if data-block(data-ptr) = 255".
03 filler pic x(02) value x"0D0A".
03 filler pic x(34) value " move 0 to data-block(data-ptr)".
03 filler pic x(02) value x"0D0A".
03 filler pic x(04) value "else".
03 filler pic x(02) value x"0D0A".
03 filler pic x(34) value " add 1 to data-block(data-ptr)".
03 filler pic x(02) value x"0D0A".
03 filler pic x(06) value "end-if".
method-id get property BFSymbol.
procedure division returning ret as string.
move "+" to ret.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
move source-code to ret
end method.
end class.
class-id BFDecr as "BF.BFDecr" implements type BFToken.
01 source-code.
03 filler pic x(27) value "if data-block(data-ptr) = 0".
03 filler pic x(02) value x"0D0A".
03 filler pic x(36) value " move 255 to data-block(data-ptr)".
03 filler pic x(02) value x"0D0A".
03 filler pic x(04) value "else".
03 filler pic x(02) value x"0D0A".
03 filler pic x(40) value " subtract 1 from data-block(data-ptr)".
03 filler pic x(02) value x"0D0A".
03 filler pic x(06) value "end-if".
method-id get property BFSymbol.
procedure division returning ret as string.
move "-" to ret.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
move source-code to ret
end method.
end class.
class-id BFDecr as "BF.BFRight" implements type BFToken.
01 source-code.
03 filler pic x(30) value "if data-ptr = data-block-size".
03 filler pic x(02) value x"0D0A".
03 filler pic x(22) value " move 1 to data-ptr".
03 filler pic x(02) value x"0D0A".
03 filler pic x(04) value "else".
03 filler pic x(02) value x"0D0A".
03 filler pic x(21) value " add 1 to data-ptr".
03 filler pic x(02) value x"0D0A".
03 filler pic x(06) value "end-if".
method-id get property BFSymbol.
procedure division returning ret as string.
move ">" to ret.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
move source-code to ret
end method.
end class.
class-id BFDecr as "BF.BFLeft" implements type BFToken.
01 source-code.
03 filler pic x(15) value "if data-ptr = 1"
03 filler pic x(02) value x"0D0A".
03 filler pic x(37) value " move data-block-size to data-ptr".
03 filler pic x(02) value x"0D0A".
03 filler pic x(04) value "else".
03 filler pic x(02) value x"0D0A".
03 filler pic x(28) value " subtract 1 from data-ptr".
03 filler pic x(02) value x"0D0A".
03 filler pic x(06) value "end-if".
method-id get property BFSymbol.
procedure division returning ret as string.
move "<" to ret.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
move source-code to ret
end method.
end class.
class-id BFDecr as "BF.BFInput" implements type BFToken.
01 source-code.
03 filler pic x(51) value "move input-block(input-ptr) to data-block(data-ptr)"
03 filler pic x(02) value x"0D0A".
03 filler pic x(32) value "if input-ptr = input-block-size".
03 filler pic x(02) value x"0D0A".
03 filler pic x(23) value " move 1 to input-ptr".
03 filler pic x(02) value x"0D0A".
03 filler pic x(04) value "else".
03 filler pic x(02) value x"0D0A".
03 filler pic x(22) value " add 1 to input-ptr".
03 filler pic x(02) value x"0D0A".
03 filler pic x(06) value "end-if".
method-id get property BFSymbol.
procedure division returning ret as string.
move "," to ret.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
move source-code to ret
end method.
end class.
class-id BFDecr as "BF.BFOutput" implements type BFToken.
method-id get property BFSymbol.
procedure division returning ret as string.
move "." to ret.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
move "invoke output-list::Add(data-block(data-ptr))" to ret
end method.
end class.
class-id BFDecr as "BF.BFLoopStart" implements type BFToken.
01 source-code pic x(55) value "perform with test before until data-block(data-ptr) = 0".
method-id get property BFSymbol.
procedure division returning ret as string.
move "[" to ret.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
move source-code to ret
end method.
end class.
class-id BFDecr as "BF.BFLoopEnd" implements type BFToken.
01 source-code pic x(11) value "end-perform".
method-id get property BFSymbol.
procedure division returning ret as string.
move "]" to ret.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
move source-code to ret
end method.
end class.
interface-id BFOptimizer as "BF.BFOptimizer".
method-id Optimize.
procedure division using by value ast
as type System.Collections.Generic.List[type BFToken].
end method.
end interface.
class-id NullBFOptimizer as "BF.NullBFOptimizer" implements type BFOptimizer.
method-id Optimize.
procedure division using by value ast
as type System.Collections.Generic.List[type BFToken].
*> Do nothing - does not optimize
end method.
end class.
I ran this compiler which was compiling a simple 'Hello World' program. I directed the output to a text file and then cut and paste the COBOL into a Visual Studio 2010 Managed COBOL project. When I ran the project, it produced the expected array (which in ASCII says Hello World! followed by a new line).
Simple Optimization
In the above implementation I have used a null optimiser; it does not do anything. The optimiser has the AST passed to it and is expected to perform all optimisations in place. I have been doing this project for fun (lunch is for lightweights etc.) and so I have not implemented much in the way of an optimiser. One thing which is really simple but insanely verbose about BF is its need of long sections of identical symbols. My optimiser takes chains of incr tokens and converts them into a single token which performs the same operation but in one step. To do this I have added a new token which is not added to the lexer but which is usable by the generator. This token replaces the inefficient incr chains in the AST.
The optimiser works by iterating over the AST and adding all non incr tokens directly to a new AST. When it encounters incr tokens, it counts over them and when the next non incr token is found it creates a single token to replace the chain of incr tokens and adds this new token to the new AST. Once the entire AST has been iterated over, it replaced the contents of the AST with the new AST.
class-id BFIncrMany as "BF.BFIncrMany" implements type BFToken.
01 source-code string.
method-id new.
01 source-builder type StringBuilder value new StringBuilder.
procedure division using by value how-many as binary-long.
invoke source-builder::AppendFormat("add {0} to data-block(data-ptr)" how-many)
invoke source-builder::Append(x"0D0A")
invoke source-builder::AppendFormat("compute data-block(data-ptr) = data-block(data-ptr) b-and 255")
set source-code to source-builder::ToString
end method.
method-id get property BFSymbol.
procedure division returning ret as string.
move "Not Implemented" to ret.
end method.
method-id get property COBOLCode.
procedure division returning ret as string.
move source-code to ret
end method.
end class.
class-id IncrBFOptimizer as "BF.IncrBFOptimizer" implements type BFOptimizer.
method-id Optimize.
01 in-incr condition-value value false.
01 how-many binary-long.
01 current-token type BFToken.
01 swapsies type List[type BFToken] value new List[Type BFToken].
procedure division using by value ast
as type System.Collections.Generic.List[type BFToken].
perform varying current-token through ast
if in-incr
if current-token::BFSymbol = "+"
*> This simple logic means very long chains of
*> + could overflow - but this is demo only
add 1 to how-many
else
set in-incr to false
invoke swapsies::Add(new BFIncrMany(how-many))
end-if
else
if current-token::BFSymbol = "+"
add 1 to how-many
set in-incr to true
else
invoke swapsies::Add(current-token)
end-if
end-if
end-perform
invoke ast::Clear
perform varying current-token through swapsies
invoke ast::Add(current-token)
end-perform
end method.
end class.
Now we can see how the large chains of increment token generated code have been replaced by simple additions:
01 data-block binary-short occurs 32768.
01 data-ptr binary-long value 1.
01 input-ptr binary-long value 1.
01 output-list type List[binary-short] value new List[binary-short].
01 data-size binary-long.
01 input-size binary-long.
procedure division
using by value input-block as binary-short occurs any
returning output-block as binary-short occurs any.
set data-block-size data-block::Length
set input-block-size input-block::Length
add 10 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
if data-ptr = data-block-size
move 1 to data-ptr
else
add 1 to data-ptr
end-if
add 17 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
add 27 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
add 30 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
add 31 to data-block(data-ptr)
compute data-block(data-ptr) = data-block(data-ptr) b-and 255
if data-ptr = 1
move data-block-size to data-ptr
else
subtract 1 from data-ptr
end-if
if data-ptr = 1
move data-block-size to data-ptr
else
subtr...
I really enjoyed that. I got back to my nerd roots!
If you got as far as the bottom of this post - thanks for reading and keep having fun!