I know how compilers translate C or Pascal to machine code, but not how OOP compilers translate newer concepts such as:
1-Generics (methods, types, classes)
2-Delegates
3-Dynamic binding
4-Inheritance (may be easy)
5-Using a child as a parent and a parent as a child (polymorphism)
I am asking about native languages such as C++, not C# or Java.
------
Thanks in advance
Posted
Comments
pasztorpisti 24-May-13 4:26am    
SA's answer already points this out, but I would emphasize an important point: the compilation of a native and a "non-native" language really isn't different. From the perspective of a compiler, it's just parsing the code (compiler frontend) and then generating/emitting machine code (assembly, or .NET bytecode) for the target platform (compiler backend). The basic building blocks are the same.
Sergey Alexandrovich Kryukov 24-May-13 9:52am    
That's right, but an essential difference is bytecode/CIL and JIT. The final result is the same, but it actually affects the code. I did not even pay attention to the term "native language". We can understand from the examples what the OP meant, but it's a confusing term. "Native language" suggests that someone learned C++ before saying "mama" and "papa"... :-)
—SA
pasztorpisti 24-May-13 12:28pm    
:-) :-) :-) Sure, there is indeed no "native language". The OP probably refers to a non-dynamic language with a compiler that emits high-performance code in the assembly of real-world bare metal. If we had a .NET processor or a Java processor (I'm not sure, but I've heard that a Java processor exists...) then we could treat C# and Java as "native languages". In the case of .NET and Java, things are more complicated because the compiler emits just .NET/Java bytecode. The JIT itself isn't really part of the compiler - I treat it as an implementation detail of the virtual machine, which is indeed another compiler, one that translates from bytecode to the assembly of the host architecture with a balance between optimization and compilation time.
Sergey Alexandrovich Kryukov 24-May-13 14:29pm    
Indeed. Good point about the concept of a processor with bytecode/CIL as its native language. In a way, a number of traditional general-purpose hardware processors themselves work like that: their instruction set is only the outer shell accessible for programming, but there is also microcode inside.
—SA

1 solution

Very briefly:

  1. Generics: it depends on the platform and technology. In all cases, generics are not represented in machine code. In languages like Ada or C++ (outside .NET), instantiation works much like preprocessing: since all the instantiations of generic types and methods are statically known across the whole program, only the actual concrete types are generated into machine code, and they are fully equivalent to non-generic types and functions.

    On platforms based on bytecode or CIL (Java, .NET), generics are preserved in the bytecode/CIL produced as the result of compilation. The generation of machine code happens at run time. Please see:
    http://en.wikipedia.org/wiki/Just-in-time_compilation[^].

    Typically, JIT compilation happens on a per-method basis. As soon as some call is first needed, the generic type is instantiated as the code goes, and the generic function is instantiated and generated in machine code. Not only do incomplete types not physically exist, but a method may never be generated in machine code at all if it exists in bytecode/CIL but is never actually called during a given run. A pretty interesting mechanism.

    Classes are a very different issue. Classes with no virtual methods are like regular structures, and their static methods are like regular non-OOP functions; there is no difference from the machine-code perspective. Non-static (instance) methods are nearly the same, with one simple twist: they are fully equivalent to static methods except that they take one implicit parameter, a reference or pointer to the object representing the instance of the class. I explained this in further detail in my past answers:
    What makes static methods accessible?[^],
    Catch 22 - Pointers to interface objects die when function using them is made static.[^],
    C# windows base this key word related and its uses in the application[^].

    But virtual functions are completely different. It's better to discuss them in connection with dynamic binding (dispatch).
  2. Now, delegates.

    Delegate types do not exist in machine code. They are just notations for required method signatures, used only by compilers for validation. The delegate instances have very different runtime types, not even related to the delegate types.

    The anatomy of the types of delegate instances (I have to resort to this ugly term to avoid confusion with delegate types) varies between systems. Here is how the simplest delegate is represented: it is a structure of 1) some method with a "this" parameter, which can be null during the call (see my links above); 2) a reference/pointer to an instance of some type implementing the delegate method (this type is unrelated to the delegate type and to the type of the delegate instance); this instance can be null and is passed as the "this" parameter during invocation of the delegate instance; and 3) naturally, the instance of that type itself, since the reference mentioned in item 2 should reference some object. Naturally, the instance does not exist if the reference is null.
  3. A delegate instance with a null "this" is created when the delegate method is static.

    Probably the most complex variant of delegate instances is implemented by .NET. Instead of the one method described above, it carries a whole collection of other (single-method, also called single-cast) delegate instances. This collection is called the "invocation list". Also, delegate instances are immutable: when some code adds a new item to the invocation list, a brand-new instance with all the old items plus the new one is created. The old instance is later discarded (garbage-collected) when it becomes unreachable. It can remain reachable in the meanwhile, because some code in a different thread might be invoking it.

    The invocation of such a (in the general case, multi-cast) delegate instance causes calls to all the single-cast delegate instances in the list.

    Please see the section on the nature of delegate instances in my CodeProject article: Dynamic Method Dispatcher[^].

  4. You probably mean dynamic dispatch: http://en.wikipedia.org/wiki/Dynamic_binding_%28computer_science%29[^].

    See also: http://en.wikipedia.org/wiki/Late_binding[^].

    In classical OOP, it is done via a virtual method table. These tables differ between implementations (multiple inheritance, where supported, makes them much more complex). A virtual method table is nothing but a structure of method addresses and some other data (such as RTTI: http://en.wikipedia.org/wiki/RTTI[^]).

    The mechanism is pretty simple, but it's hard to explain in words or even in code. The best way to understand this central OOP mechanism is simple: take some late-binding code and run it under the debugger, step by step. Those who first really get it usually experience a small intellectual shock. All programmers should understand it well. (Too many, maybe even the majority, don't; they hold programmers' positions but are not really developers.)

    A virtual method table is actually an object, one per type (class/structure).

    Please see: http://en.wikipedia.org/wiki/Virtual_method_table[^],
    http://en.wikipedia.org/wiki/Virtual_function[^].
  5. Inheritance. Actually, not so easy: this is the heart of OOP. If virtual methods are involved, inheritance creates a new virtual method table. If some methods are overridden, the respective members of the table are replaced. It's important to understand: the mechanism of call dispatch is dynamic, but all the tables are statically known by the end of compilation, so they never change at runtime. In machine code, they are just static structures.

    Remember what I mentioned about instance methods above: they pass an implicit "this" parameter. It points to the type instance, and the instance holds a pointer/reference to its type's virtual method table. So, if a method is virtual, it is called indirectly. The compile-time type of a variable is one thing, but its run-time type may be different, so some methods are dispatched to implementations from ancestor types and some to types later in the hierarchy. Does that give you a basic hint at how OOP actually works?

    This is hard to appreciate without an example showing how the technique is applied. I don't think I should try to explain it all at once.
  6. Polymorphism is a consequence of the late binding explained above; it is not itself a separate mechanism. It takes place when you have some set of objects of the same compile-time type (it can be a common ancestor type, but not only that, because there are also interfaces, which you did not mention) but different runtime types. At this point, the discussion of machine-code representation is over, because everything is already described above.

    The remaining considerations are only about the use of this machine code:

    Imagine you traverse the whole set in some loop. You call some methods, but only those available in the common parent class (others are not applicable, as some objects won't have them). When you call those methods, they call other methods, which are late-bound and specific to the different runtime types. This way, you handle the whole set through some common interface, but the different objects respond to your calls in their own specific ways.

    This leaves interfaces out of the picture. Interfaces provide a somewhat different kind of polymorphism. I think enough is enough, but you can see my past answers:
    When we use abstract and when we use interface...?[^],
    Difference between abstract class and interface if they have same no of methods and var[^],
    How to decide to choose Abstract class or an Interface[^],
    and this one is about interface-based polymorphism: Interfaces and Polymorphism[^].


That's all for now. I tried hard but wrote it very quickly, almost without planning, so if something is not clear (or even not quite accurate), sorry. Your follow-up questions will be welcome.

—SA
Comments
pasztorpisti 24-May-13 4:18am    
+5, great effort
Sergey Alexandrovich Kryukov 24-May-13 9:54am    
Thank you very much. Actually, I wrote it with the speed of fast typing, so it could be somewhat inaccurate in some places...

—SA
H.Brydon 26-May-13 23:19pm    
I gave you a +5 with a very fast click. :-)
Sergey Alexandrovich Kryukov 27-May-13 0:38am    
Thank you, Harvey.
—SA
ibrahim_ragab 24-May-13 8:17am    
+5 Of course.
Now another small question:
if we have the following generic extension method
--------

/// <summary>Tells whether the items of the given array, starting at start, equal values.</summary>
/// <returns>True if the items in the range equal values.</returns>
public static bool RangeEqual<T>(this T[] array, int start, params T[] values)
{
    bool result = false;
    if (start >= 0 && start + values.Length <= array.Length)
    {
        result = true;
        for (int i = 0; i < values.Length; i++)
        {
            result &= array[i + start].Equals(values[i]);
            if (!result) break;
        }
    }
    return result;
}
--------
now we can call this function like an instance method on any array type.
Can this feature (generic extension methods) be provided by a natively compiled language, or is it only supported in managed code?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
