API: Application Programming Interface
An API is a documented group of software components that application programmers can call as needed to extend what their programs can do. In the context of an operating system, the system APIs are a phone book of features that are available for your application to use. These APIs allow the programmer to access hardware and devices, take input from and present output to the user and, in the case of a Graphical User Interface, draw windows, cursors, and other interface elements without having to write that code directly.
Unlike other aspects of computing, which only care about the compiled binary instructions and are agnostic about how you reach them, APIs are language dependent. While the code that runs when your program uses the API is a compiled binary, the function call that uses that code has to be written in a language the API supports (the API's own language, or one that provides bindings for it) and linked to your code at build time before it will work.
Function Calls Explained
While it may work slightly differently for different languages, all modern languages have a way for a programmer to place blocks of code in a separate area called a function. This separation allows the code to be reusable and to be independent of whatever is happening in the main body of code that is calling the function.
Each function has a signature or header that gives it a specific name and tells the compiler (and the programmer) what the function needs in order to run properly. For example, a function that returns the average of two numbers could be called "average(x, y)". It could require two floating point numbers, "x" and "y", as input values and return a floating point value as its output.
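As a minimal sketch, here is how the "average" example above might look in C. The choice of C and of the double type is an assumption made for illustration; the text only asks for floating point inputs and output.

```c
/* Signature (prototype): the name "average", two floating point
   inputs, and a floating point result. This line alone is enough
   for the compiler to check any call to the function. */
double average(double x, double y);

/* Definition: the block of code that runs when the function is called. */
double average(double x, double y)
{
    return (x + y) / 2.0;
}
```

Any code that can see the signature can now write something like average(2.0, 4.0) without knowing how the body is implemented.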
Older systems had "subroutines" instead of "functions". The two work similarly, but a major difference lies in the way the processor handles the execution of their instructions. In the older systems, the subroutine was called using a jump command, which tells the processor to go to a specific address in the program memory and execute the instructions there. A different jump command at the end of the subroutine then told the processor to jump back to the location that called it.
Modern programs instead push the return address (the address of the instruction that follows the call) onto a data structure in memory known as a "stack", or in this context "the call stack". When given the command to return, the processor pops that address back off the call stack so that it knows which memory location to go to and continue executing.
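The push and pop behavior can be modeled with a small toy in C. This is only an illustration under stated assumptions: the names call_stack, push_return_address, and pop_return_address are hypothetical, the capacity of 16 is arbitrary, and a real processor does this in hardware with a stack pointer register rather than with a struct.

```c
#include <stddef.h>

#define STACK_CAPACITY 16   /* arbitrary limit chosen for this toy model */

/* Toy call stack that stores nothing but return addresses. */
struct call_stack {
    size_t addresses[STACK_CAPACITY];
    size_t depth;           /* number of entries currently stored */
};

/* "Call": remember where execution should resume.
   Returns 0 on success, -1 if the stack is already full. */
int push_return_address(struct call_stack *s, size_t address)
{
    if (s->depth == STACK_CAPACITY)
        return -1;
    s->addresses[s->depth++] = address;
    return 0;
}

/* "Return": hand back the most recently saved address.
   Returns 0 on success, -1 if the stack is empty. */
int pop_return_address(struct call_stack *s, size_t *address)
{
    if (s->depth == 0)
        return -1;
    *address = s->addresses[--s->depth];
    return 0;
}
```

Because the last address pushed is the first one popped, nested calls unwind in the reverse of the order in which they were made.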
This means that the function no longer has to keep track of the specific memory location to return to, unlike with subroutines. Also, each entry created on the call stack holds its own separate copy of the variables used by that call. This means a function can be called multiple times by different actively running sections of code without one section interfering with what is happening in another, even if they are both making calls to the same function at roughly the same time. Of course, too many levels of function calls can create a different error, known as a "stack overflow", which typically causes the kernel to kill the process. This generally only happens with recursive function calls that are not properly ended and result in a kind of infinite loop. See this article for more details.
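A short recursive function in C shows both points. The name sum_to is hypothetical and chosen only for this sketch.

```c
/* Adds up the numbers 1..n recursively. Every active call has its
   own copy of n and rest on the call stack, so the inner calls
   cannot overwrite the values the outer calls are still using. */
unsigned long sum_to(unsigned int n)
{
    if (n == 0)             /* base case: this is what ends the recursion */
        return 0;

    unsigned long rest = sum_to(n - 1);
    return n + rest;
}
```

Calling sum_to(4) creates five stack entries (for n = 4, 3, 2, 1, 0) before the calls start returning. Without the base case, each call would make another call and never return, and the stack would eventually run out of space.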
API Calls In Detail
The first step to making an API call correctly happens when you are writing and compiling the code. Each API declares its available functions in header files that your program includes when it is built, and the compiled API code is then linked to your program. These header files tell the compiler what each function needs in order to be called correctly.
Assuming you have correctly linked your code and made your call to the API, then at run time the processor will run the code exposed by the API exactly the same as any normal function created by your program. That is, it will push the return address and the variables used by the code calling the API function onto the stack, and then execute the API code, which has already been loaded into memory by the kernel either at system startup, for common functions, or at application startup. When the code is finished executing, it will issue a return command, which pops the stack in order to get the memory location and variable state, and loads the values returned by the function into the memory locations indicated when you made the function call. The processor will then go to the next instruction in memory and continue execution from there.
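As a hedged illustration, the C standard library can stand in for a system API here: the header string.h supplies the declaration of strlen, and the compiled library code is linked in when the program is built. The wrapper name_length is a hypothetical helper written for this sketch, not part of any API.

```c
#include <string.h>   /* declares: size_t strlen(const char *s); */

/* Our own code calls the library function exactly as it would call
   a function we wrote ourselves. The header told the compiler what
   strlen needs; the linker connects the call to the compiled code. */
size_t name_length(const char *name)
{
    return strlen(name);
}
```

Nothing in name_length depends on how strlen is implemented, only on the signature published in the header.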
So, in summary, an API is a way for your code to call a function that is provided by the operating system and to add its functionality just as you would import a function you created on your own, only without needing to know too many of the details about how it works. As long as you call the API correctly, it should work with your code and allow it to use the system to extend your program's functionality.