Monday, February 4, 2013

Theory of Computing - Part 2b: The Software (cont.)

Compiled Programs

At this point, computers are still very large, very specialized machines. A program is written in assembly language for the specific device. You can share programs between identical computer models, but sharing programs between different types of computer depends upon whether they share a processor instruction set and assembly language. Of course, most computers are specialized enough that no one cares, but some early computer scientists come to the conclusion that it would be very useful if you could take a program in a single language and build it for any device you want. Thus the compiler is born.

So what is a compiler and how is it different from an assembler program?
At their core, both an assembler and a compiler are computer programs that take a human readable computer program and convert it into low level instructions that can be executed by a computer.
However, a compiler is more complicated in that:

  1. The programming instructions a compiler takes in do not directly correspond to the operation codes required by any processor.
  2. The output of a compiler may not be in any machine specific language. Instead, it can be an intermediate form similar to assembly, called object code, which is then converted into machine specific instructions.
  3. As you will see, a compiler generally has more steps involved when translating from the programming language to the binary executable, and the process has become more involved over time.

How a compiler works

As mentioned earlier, a compiler is a computer program that takes another computer program as input. However, unlike an assembler, this program can be written in a language that is independent of the processor instructions, and can therefore be more human readable. The trade-off is that going from "X = Y + Z;" to something that can be run on a computer takes many more steps.

The first step is to read through the program and break it down into pieces that can be converted into object code. This is called parsing the program, and it involves taking certain key words and symbols and replacing them with one or more non-ambiguous symbols (often called tokens) which can be used to differentiate identifiers (like "X", "Y", "Z" in the example above), operators (like "+" and "=") or structural statements (like the ";" which represents the end of the statement) that are present in the program. From there, you can tell if the programmer has made errors which result in something that can't be converted into machine code.
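
To make the idea concrete, here is a minimal sketch in Python of that first step, breaking "X = Y + Z;" into labeled pieces. The category names (IDENTIFIER, OPERATOR, END) and the tokenize function are made up for this illustration and are not taken from any real compiler. A real compiler does a great deal more than this.

```python
import re

# Hypothetical token categories, chosen to match the examples in the text.
TOKEN_PATTERNS = [
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),  # names like X, Y, Z
    ("NUMBER", r"\d+"),                          # whole numbers
    ("OPERATOR", r"[=+\-*/]"),                   # = + - * /
    ("END", r";"),                               # end of the statement
    ("SPACE", r"\s+"),                           # whitespace, thrown away
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def tokenize(source):
    """Return a list of (category, text) pairs, or raise SyntaxError."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            # Nothing in the language starts with this character.
            raise SyntaxError(
                f"unexpected character {source[position]!r} at position {position}"
            )
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


for category, text in tokenize("X = Y + Z;"):
    print(category, text)
```

Feeding it something the made-up language does not allow, such as "X = Y @ Z;", raises an error instead of producing pieces, which is the kind of programmer mistake this step is able to catch.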

After parsing the program and determining that it is valid, the compiler analyzes and optimizes the intermediate code. Depending upon the compiler, it then converts the result into object code, and later into machine executable code.
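
As a rough illustration of the conversion step, the Python sketch below turns the pieces of "X = Y + Z;" into assembly-like instructions. The LOAD, ADD, SUB and STORE mnemonics belong to an imaginary one-register machine invented for this example, and compile_statement is a hypothetical helper that only understands statements of this one shape. It skips analysis and optimization entirely.

```python
# Hypothetical mnemonics for an imaginary one-register machine.
OPCODES = {"+": "ADD", "-": "SUB"}


def compile_statement(tokens):
    """Translate tokens shaped like  target = left op right ;  into pseudo-assembly."""
    if len(tokens) != 6 or tokens[1] != "=" or tokens[5] != ";":
        raise SyntaxError("expected: target = left op right ;")
    target, _, left, op, right, _ = tokens
    if op not in OPCODES:
        raise SyntaxError(f"unsupported operator {op!r}")
    for name in (target, left, right):
        if not name.isidentifier():
            raise SyntaxError(f"{name!r} is not a valid variable name")
    return [
        f"LOAD {left}",            # copy the first operand into the register
        f"{OPCODES[op]} {right}",  # combine the register with the second operand
        f"STORE {target}",         # write the register back to the target
    ]


for instruction in compile_statement("X = Y + Z ;".split()):
    print(instruction)
```

The point is only that one readable statement becomes several low level instructions, and that a different imaginary machine could be targeted by swapping out the instructions that are emitted.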

Variations on this theme

This has basically brought us up to the 50s and early 60s in terms of chronology. Computer software has moved from a completely hardware dependent model that talked directly to the processor to a device independent model that is completely oblivious to what the processor needs and is only interested in satisfying the compiler. While developments in language and hardware have allowed programmers to harness more powerful machines without needing to get involved in the nuts and bolts of the processor, the concept of how programming works is not that different. You take the problem you want to solve, break it into steps, describe those steps using the programming language you want, and let the compiler convert it into machine language. However, there are two important variations that have allowed programmers to move away from the need for a static compiler altogether. They are interpreted languages and virtual machine coding.

Interpreted languages

As mentioned before, a compiler is a computer program that takes a different computer program as input and converts it into machine language. Early on, many programmers asked themselves, what would happen if we simply eliminated that last step? Instead of outputting machine language, what if we just had it execute the instructions as if they were part of its own process?
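
A toy version of that idea, sketched in Python: instead of producing instructions for later, the interpret function below reads "X = Y + Z ;" and carries it out immediately, using a dictionary as the program's memory. The function name, the statement shape it accepts, and the two supported operators are all assumptions made for this example.

```python
import operator

# The interpreter's own code does the work, so each operator maps to a function.
OPERATIONS = {"+": operator.add, "-": operator.sub}


def interpret(statement, variables):
    """Execute one statement shaped like  target = left op right ;  right away."""
    tokens = statement.split()
    if len(tokens) != 6 or tokens[1] != "=" or tokens[5] != ";":
        raise SyntaxError("expected: target = left op right ;")
    target, _, left, op, right, _ = tokens
    if op not in OPERATIONS:
        raise SyntaxError(f"unsupported operator {op!r}")
    # No machine code is written out; the result is computed here and now.
    variables[target] = OPERATIONS[op](variables[left], variables[right])
    return variables


memory = {"Y": 2, "Z": 3}
interpret("X = Y + Z ;", memory)
print(memory["X"])  # prints 5
```

Notice that the statement has to be read and checked every single time it runs, which hints at where the extra processor time of an interpreted program comes from.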

The result was a way of developing programs that was simple and device independent, and that allowed computer programmers and operators to solve small problems without the overhead of a compiled program. As interpreted languages have evolved, they have also allowed programmers to be almost completely divorced from the boundaries of the machine. As long as the person who wrote the interpreter knew when and how to convert and define your code, you could write code that completely blurred the boundaries between letters, numbers, operations, etc. Much of the Internet is built on device independent code like this. The main downside is that interpreted code generally takes more memory and processor time than the same code would take if compiled directly. That makes it a poor fit for the memory and data intensive applications that computers were originally designed to be good at, but great for smaller and more casual applications.

Virtual machines

The other question some programmers asked themselves is, what would happen if, instead of outputting machine code for a specific device, we had the compiler output binary code made up of instructions for a "virtual" processor, and then had a program on each device quickly convert those instructions into device specific instruction codes? This is the concept of a virtual machine, and it solved the problem of device independent code taking up too much memory and processor time. However, you then have to make your language generic enough that it can be used on everything from a computing array to a cell phone processor, which still makes it less popular and useful for many applications. So, while there are a number of virtual machines, the one used most often is Java, with its Java Virtual Machine.
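
Here is a minimal sketch in Python of what a "virtual" processor can look like. The three instructions (PUSH_VAR, ADD, STORE_VAR) and the run function are invented for this example and have nothing to do with Java's real instruction set. To keep it short, this sketch carries out each virtual instruction directly rather than converting it to device specific codes first.

```python
def run(bytecode, variables):
    """Execute instructions for an imaginary stack-based virtual processor."""
    stack = []
    for opcode, argument in bytecode:
        if opcode == "PUSH_VAR":
            # Put the value of a variable on top of the stack.
            stack.append(variables[argument])
        elif opcode == "ADD":
            # Replace the top two values with their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "STORE_VAR":
            # Take the top value off the stack and save it in a variable.
            variables[argument] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {opcode!r}")
    return variables


# What a compiler for this imaginary machine might output for "X = Y + Z;".
PROGRAM = [
    ("PUSH_VAR", "Y"),
    ("PUSH_VAR", "Z"),
    ("ADD", None),
    ("STORE_VAR", "X"),
]

print(run(PROGRAM, {"Y": 2, "Z": 3})["X"])  # prints 5
```

The same PROGRAM list could be handed to a run function written for any kind of device, which is the whole appeal: the program is compiled once and only the small virtual processor has to be rewritten for each machine.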

Software concluded.

Hopefully, this has given you some insight into how computer software works, why a computer has software at all, and why you are now able to write in a single language and have it run on many different kinds of machines without any problems. The next topic will cover what an operating system is, why it is necessary, and what it does for programmers and users.