Compiling programs on Linux with GCC
A compiler is a special program that processes instructions written in a particular programming language and transforms them into machine language or code.
Generally, a programmer writes instructions in a high-level language such as Pascal or C, using a text editor. The file that is created contains what we call the program source.
The programmer then executes the appropriate language compiler, specifying the name of the file containing the source instructions.
When executed, the compiler first analyzes all of the language's instructions syntactically, one after the other, and then, in one or more successive stages or "steps", builds the output code.
Traditionally, compilation output is referred to as object code or sometimes object module. Object code is machine code that the processor can execute.
Traditionally, in some systems and languages, an additional step is required after compilation.
This step resolves the relative locations of instructions and data when more than one object module is combined into a single program and the modules reference one another. This process is known as linking.
The most widely used compiler on Linux is the GNU Compiler Collection (GCC). It compiles ANSI C code as well as C++, Java, and Fortran. GCC supports various levels of error checking in source code, produces debugging information, and can also optimize the object file it produces.
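For example, a typical invocation that exercises these features might look like the command below; -Wall enables a broad set of warnings, -g produces debugging information, and -O2 turns on optimization (test.c here is just a placeholder source file):
# gcc -Wall -g -O2 test.c -o test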
Compilation involves up to four stages, always in this order: preprocessing, compilation proper, assembly, and linking.
GCC is capable of preprocessing and compiling multiple files into one or more assembler files.
From the assembler files, one or more object files are generated, and these are linked against libraries (linking) to produce an executable file.
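As a quick illustration, assuming a hypothetical source file hello.c, the four stages can be run one at a time with the commands below; each stage is described in detail later in this section:
# gcc -E hello.c -o hello.i
# gcc -S hello.i
# gcc -c hello.s
# gcc hello.o -o hello
The first command only preprocesses the source, the second translates it into assembly (hello.s), the third assembles it into an object file (hello.o), and the last one links the object file into the executable hello.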
PRE-PROCESSING
Pre-processing is responsible for expanding macros and including the header files in the source file. The result is a file containing the expanded source code.
COMPILATION
The next stage, called "compilation proper", is responsible for translating the preprocessed source code into assembly language for a specific processor.
ASSEMBLER
The next stage is called assembly. In this step, GCC converts the assembly code into machine code for a specific processor and stores it in an object file.
If there are calls to external functions in this code, gcc leaves their addresses undefined, to be filled in later during the linking stage.
LINKER
The last stage called linking, or linker, is responsible for linking object files to create an executable file.
It does this by filling in the addresses of the undefined functions in the object files with the addresses of the operating system’s external libraries.
This is necessary because executable files need many external system functions and C libraries in order to run.
Libraries can be linked to the executable dynamically, by filling in the addresses of the library functions in the external calls, or statically, when the library functions are copied into the executable.
In the first case, the program will use libraries in a shared way and will be dependent on them to function.
This scheme saves resources because a library used by many programs needs to be loaded into memory only once. The executable size will be small.
However, if the installed library is of a different version than the executable requires, the program will not run until the appropriate version of the library is installed.
In the second case, the program is independent, since the functions it needs are in its code.
With this scheme, a change in library version will not affect the program. The downside is the size of the executable and the need for more resources.
The GCC - GNU Compiler Collection
The GNU Compiler Collection (GCC) is a complete compiler suite for the ANSI C language, with support for K&R C, C++, Objective-C, Java, and Fortran. GCC also offers different levels of error checking for source code, debugging information, and optimization of the object program.
It also supports the modern Intel IA-64 processor platform. Version 7.0 includes new APIs and C++ libraries. The manual was also substantially rewritten and improved.
Preprocessing
Preprocessing is responsible for expanding the macros and including the header files in the source file. The result is a file containing the expanded source code.
Preprocessing the simple example below will generate a C file with over 800 lines of expanded code.
See what this simple Hello World-style program, which prints "Linux Certification!" on the screen, looks like:
#include <stdio.h>

int main()
{
    printf("Linux Certification!\n");
    return 0;
}
GCC’s “-E” option tells the compiler to only pre-process the source code.
# gcc -E test.c -o test.i
Notice that the preprocessed file test.i contains 841 lines of preprocessed code:
# cat test.i | wc
841 2074 16879
Compiling in GCC
The next stage, called "compilation proper", is responsible for translating the preprocessed source code into assembly language for a specific processor.
To let you see the result of the compilation stage, GCC's "-S" (capital) option stops after compilation and generates the assembly code for a specific processor.
The code below is the preprocessed example above, compiled to assembly for an Intel Xeon (x86-64) processor:
# gcc -S test.i
# cat test.s
        .file   "test.c"
        .section        .rodata
.LC0:
        .string "Linux Certification!"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $.LC0, %edi
        call    puts
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-28)"
        .section        .note.GNU-stack,"",@progbits
Assembling the code in gcc
The next stage is called assembly. In this step, GCC converts the assembly code into machine code for a specific processor and stores it in an object file. If there are calls to external functions in this code, gcc leaves their addresses undefined, to be filled in later during the linking stage.
The following command assembles the assembly code from the previous example into machine code:
# gcc -c test.s
The as command is the GNU assembler that gcc invokes behind the scenes to perform this step.
The result will be a "test.o" file containing the example's machine instructions, with an undefined reference to the external library function.
The "call puts" line in the assembly code (gcc replaces the simple printf call with puts, since the string just ends in a newline) indicates that this function is defined in a library and that its address must be resolved at the linking stage.
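One way to confirm this, assuming the test.o file generated above, is the nm command, which lists the symbols in an object file; the output should look roughly like the following, with the undefined external function marked "U":
# nm test.o
0000000000000000 T main
                 U puts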
Linking the code with the libraries
The last stage called linking, or linker, is responsible for linking object files to create an executable file.
It does this by filling in the addresses of the undefined functions in the object files with the addresses of the operating system’s external libraries.
This is necessary because executable files need many external system functions and C libraries to run.
Libraries can be linked to the executable dynamically, by filling in the addresses of the library functions in the external calls, or statically, when the library functions are copied into the executable.
In the first case, the program will use libraries in a shared way and will be dependent on them to function.
This scheme saves resources because a library used by many programs needs to be loaded into memory only once. The executable size will be small.
In the second case, the program is independent, since the functions it needs are in its code.
With this scheme, a change in library version will not affect the program. The downside is the size of the executable and the need for more resources.
Internally, the linking step is very complex, but GCC does it transparently through the command:
# gcc test.o -o test
The result will be an executable called test.
# ./test
Linux Certification!
This program was dynamically linked against shared libraries. The ldd command shows which libraries are linked to the executable:
# ldd test
linux-vdso.so.1 => (0x00007ffd69cff000)
libc.so.6 => /lib64/libc.so.6 (0x00007f8ee7991000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8ee7d5e000)
The libc.so.6 and ld-linux-x86-64.so.2 libraries are referenced dynamically by the test program.
The compiler's "-static" option copies the external functions from the libraries into the executable:
# gcc -static test.c -o test1
The ldd command will report that the test1 executable has no dynamic links:
# ldd test1
not a dynamic executable
The difference in the size of the executables is very large. The test program, which uses dynamic linking, is 8,279 bytes.
Test1 is 1,884,604 bytes. Therefore, only copy the libraries into the executable if strictly necessary.
Compiling the right way
Once the software has been downloaded from the Internet and the contents of its package extracted, it is time to compile the source code.
It is very common for developers, when creating their software and distributing its source code, to include two special files that greatly simplify compiling the program: configure and Makefile.
Configure
The “configure” is a script that the developer creates to check that the machine has all the necessary requirements to compile the software. This generally involves checking that the required compiler exists and that all necessary libraries are installed.
configure also allows the user to enable or disable features of the software at compile time.
You can see the options that configure accepts with the "--help" option:
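# ./configure --help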
Once configure is executed, it checks all the programs, libraries, and dependencies needed to compile the software.
It will create a special file called Makefile, which contains the software’s compilation directives.
If there is a problem or a missing dependency, configure will alert the user so that the dependency can be satisfied; configure can then be executed again, until the Makefile is generated.
Makefile
The Makefile is a script-like file containing the commands for compiling the software, customized for the machine in question, with the options that configure enabled or disabled.
For the software to be compiled, the make utility is required to read the contents of the Makefile and trigger the software compilation process.
Make
The make utility is necessary to compile multiple source code files for a project. It uses a description file generally named Makefile. The contents of this file contain rules that define the dependencies between source files and the commands required for compilation.
From this description file, make creates sequences of commands that are interpreted by the shell. Generally, the gcc compiler is invoked with several options that resolve the dependencies on other files, objects, and libraries.
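As an illustration, a minimal, hypothetical Makefile for a small project with two source files (main.c and util.c are placeholder names) might contain rules like the ones below. Each rule names a target, the files it depends on, and the command that rebuilds it; note that the command lines must be indented with a tab character:
CC = gcc
CFLAGS = -Wall -O2

# The final program depends on the two object files.
program: main.o util.o
	$(CC) $(CFLAGS) -o program main.o util.o

# Each object file depends on its source file and a shared header.
main.o: main.c util.h
	$(CC) $(CFLAGS) -c main.c

util.o: util.c util.h
	$(CC) $(CFLAGS) -c util.c

# "make clean" removes the generated files.
clean:
	rm -f program *.o
When make runs, it compares file timestamps against these dependencies and rebuilds only the targets whose prerequisites have changed.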
Even the smallest software projects contain several files that are interdependent, and the make command and Makefile greatly facilitate the process of compiling software.
To compile the software, simply type make in the software project’s current directory:
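# make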
In this way, make will read the Makefile and complete the entire process of compiling the software.
Errors may occur at the time of compilation, mainly due to lack of libraries or problems with the version of the libraries, which were not foreseen by the developer when creating the configure.
Once the software is compiled, make’s “install” directive can be used to install the newly compiled software in the appropriate directories on Linux:
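# make install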
Once this is done, the software will be properly installed on the system.
Programs that are built this way were generally packaged using a set of programs referred to as autotools. This suite includes autoconf, automake, and many other programs, all of which work together to make the life of a software maintainer significantly easier. The end user doesn’t see these tools, but they take the pain out of setting up an installation process that will run consistently across different Linux distributions.