What are symbols and symbol visibility

A symbol is one of the basic terms used when talking about object files, linking, and so on. In C/C++, a symbol is the entity that corresponds to most user-defined variables and functions, with names mangled by namespace, class or struct name, and so on. For example, a C/C++ compiler generates symbols in an object file when a programmer defines non-static global variables or non-static functions. The linker uses these symbols to decide whether different modules (object files, dynamic shared libraries, executables) share the same data or code.

Though both variables and functions may be shared among modules, variable sharing is more common among object files. For example, a programmer may declare a variable in a.c:

extern int shared_var;

And, define it in b.c:

int shared_var;

Thus, the symbol shared_var appears in both compiled objects, a.o and b.o, and after the linker’s resolution the reference in a.o shares the address of the definition in b.o. However, it is rare to share variables among shared libraries and executables. For such modules, it is very common to make only functions visible to others. Sometimes we call such functions an API, as the module is deemed to provide such interfaces for others to call into. We also say such symbols are exported, since they are visible to other modules. Notice that such visibility only takes effect at dynamic linking time, since shared libraries are commonly loaded as part of the memory image when the program runs. Therefore, symbol visibility is an attribute of all global symbols for dynamic linking.
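The declaration/definition split described above can be sketched in a single file. This is only an illustration: in a real project the two parts would live in a.c and b.c, and the initial value 42 and the helper read_shared are assumptions made for the example, not part of the original code.

```cpp
// Sketch only: both halves are shown in one file so it compiles on its own.

// --- would be in a.c: declaration only, leaves an undefined reference ---
extern int shared_var;

// Hypothetical helper that uses the variable through the declaration.
int read_shared() {
    return shared_var;  // resolved by the linker to the definition below
}

// --- would be in b.c: the single definition that owns the storage ---
int shared_var = 42;    // initial value chosen for illustration
```

Because both names refer to one symbol, a write through shared_var is observed by read_shared.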

Why we need to control symbol visibility

On different platforms, the XL C/C++ compiler might either export all the symbols in a module or none of them. For example, when creating Executable and Linkable Format (ELF) shared libraries on the IBM PowerLinux™ platform, all the symbols are exported by default. When creating an XCOFF library on AIX running on the POWER platform, the current XL C/C++ compiler may choose not to export any symbols without the assistance of a tool. There are also other ways that allow a programmer to determine symbol visibility one by one. (That is what we will introduce in the next part of this series.) However, exporting all the symbols in a module is generally not recommended. Programmers should export only the symbols that are needed. This benefits not only library security but also dynamic linking time.

When programmers choose to export all symbols, there is a high risk of symbol collisions at link time, especially when modules are developed by different programmers. Because a symbol is a low-level concept, it has no notion of scope. As soon as you link against a library that uses the same symbol names as your own code, the library might accidentally override your symbols when the linker resolves them (hopefully with some warning or error information). In most cases, the library designer never expected such symbols to be used from outside. Therefore, exporting only a limited set of carefully chosen, meaningful symbol names helps a lot with such issues.

For C++ programming, there is a growing demand for performance. However, due to dependencies on other libraries and the use of specific C++ features such as templates, the compiler and linker tend to use and generate a huge number of symbols. Exporting all of them can slow down the program and cost a lot of memory. Exporting a limited number of symbols can reduce the loading and linking time for dynamic shared libraries. Furthermore, it also enables optimization from the compiler’s perspective, which means more efficient code can be generated.

The above drawbacks of exporting all symbols explain why controlling symbol visibility matters. In this article, we provide solutions for keeping the symbols in a dynamic shared object (DSO) under control. Users can identify different ways to solve the same problem, and we also propose which one should be preferred on a specific platform.

Ways to control symbol visibility

In the discussions below, we will make use of the following C++ code snippet:

Listing 1. a.C


int myintvar = 5;

int func0 () {
  return ++myintvar;
}

int func1 (int i) {
  return func0() * i;
}

In a.C, we define one variable myintvar, and two functions func0 and func1. By default, when creating a shared library on the AIX platform, the compiler and linker along with the CreateExportList tool would make all three symbols visible. We can check it from the Loader Symbol Table Information with the dump binary tool:

$ xlC -qpic a.C -qmkshrobj -o libtest.a
$ dump -Tv libtest.a

                        ***Loader Symbol Table Information***
[Index]      Value      Scn     IMEX Sclass   Type           IMPid Name

[0]     0x20000280    .data      EXP     RW SECdef        [noIMid] myintvar
[1]     0x20000284    .data      EXP     DS SECdef        [noIMid] func0__Fv
[2]     0x20000290    .data      EXP     DS SECdef        [noIMid] func1__Fi

Here, “EXP” means the symbol is “exported”. The function names func0 and func1 are mangled according to the C++ mangling rules. (However, the original names are not hard to guess.) The -T option of the dump tool shows the Loader Symbol Table Information, which is used by the dynamic linker. In this case, all the symbols in a.C are exported. But from the perspective of a library writer, we may want to export only func1. The global symbol myintvar and the function func0 only keep or change internal state, that is, they are used only locally. Thus, making them invisible is important for the library writer.
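Mangled names do not always have to be guessed. The sketch below assumes a GCC or Clang toolchain (for example on Linux), where the Itanium C++ ABI helper abi::__cxa_demangle is available. That scheme encodes func1(int) as _Z5func1i, which differs from the XL-style names such as func1__Fi in the dump output above, and this helper does not decode those. The wrapper name demangle is hypothetical.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-mangled
// name, or the input unchanged if it is not a valid mangled name.
std::string demangle(const char* name) {
    int status = 0;
    char* out = abi::__cxa_demangle(name, nullptr, nullptr, &status);
    std::string result = (status == 0 && out != nullptr) ? out : name;
    std::free(out);   // the buffer is allocated with malloc; free(nullptr) is fine
    return result;
}
```

Plain C-style names such as myintvar are not mangled, so the helper simply returns them as they are.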

There are at least three ways to achieve this goal: using the static keyword, defining the GNU visibility attribute, and using an export list. Each of them has unique functionality and possibly drawbacks as well. We look into them now.

1. Using the static keyword

The static keyword in C/C++ is an overloaded keyword, as it can specify both the scope and the storage of a variable. For scope, it disables external linkage for the symbol in the file. That means a symbol declared with the static keyword can never be linked against from other files, as the compiler does not expose it to the linker as a global symbol. It is a language-level control and the simplest way to hide a symbol.

Let us add the static keyword to the above case:

Listing 2. a.C with the static keyword


static int myintvar = 5;

static int func0 () {
  return ++myintvar;
}

int func1 (int i) {
  return func0() * i;
}

When we generate the shared library and look into the Loader Symbol Table Information again, it works as expected:

$ xlC -qpic a.C -qmkshrobj -o libtest.a
$ dump -Tv libtest.a

                        ***Loader Symbol Table Information***
[Index]      Value      Scn     IMEX Sclass   Type           IMPid Name

[0]     0x20000284    .data      EXP     DS SECdef        [noIMid] func1__Fi

Now, only func1 is exported, as the information shows. However, though the static keyword can hide a symbol, it also imposes an extra rule: the variable or function can only be used within the file where it is defined. Suppose we declare the variable in another file, b.C:

extern int myintvar;

and then build libtest.a from both a.o and b.o. The linker would display an error message stating that myintvar referenced in b.C cannot be resolved, because it did not find a definition with external linkage elsewhere. That breaks the data/code sharing inside the same module, which the programmer would generally require. Thus, static is better suited to controlling the visibility of variables and functions inside a file than to controlling the visibility of low-level symbols. In fact, most programmers would not rely on the static keyword to control symbol visibility. Therefore, we can consider the second method:
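In C++, an unnamed namespace gives names internal linkage in much the same way as static, so it carries the same file-scope restriction. The following is a minimal sketch of that alternative, not the method used in the listings above; it reuses the names from Listing 1.

```cpp
// Names inside an unnamed namespace have internal linkage, like static.
namespace {
int myintvar = 5;

int func0() {
    return ++myintvar;
}
}  // unnamed namespace

// Only func1 has external linkage and can be seen by other files.
int func1(int i) {
    return func0() * i;
}
```

As with static, another source file that declares extern int myintvar; would fail to link against this definition.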

2. Defining the visibility attribute (GNU only)

The next candidate to control symbol visibility is to use the visibility attribute. The ELF application binary interface (ABI) defines the visibility of symbols. Generally, it defines four classes, but in most cases, only two of them are more commonly used:

STV_DEFAULT - Symbols defined with it will be exported. In other words, it declares that symbols are visible everywhere.
STV_HIDDEN - Symbols defined with it will not be exported and cannot be used from other objects.

Notice that the visibility attribute is an extension of GNU C/C++ only. Thus, currently, PowerLinux customers can use it as a GNU attribute for symbols. Here is an example for our case:

int myintvar __attribute__ ((visibility ("hidden")));
int __attribute__ ((visibility ("hidden"))) func0 () {
  return ++myintvar;
}
...

To define a GNU attribute, you need to write __attribute__ followed by the content in double parentheses. You can specify the visibility of a symbol as visibility("hidden"). In the above case, we mark myintvar and func0 with hidden visibility. This prevents them from being exported from the library, while they can still be shared among source files. In fact, hidden symbols do not appear in the dynamic symbol table, but are kept in the symbol table for static linking purposes. That is well-defined behavior and achieves our goal. It is clearly more flexible than the static keyword solution.

Notice that, for a variable specified with the visibility attribute, also declaring it as static might confuse the compiler. As a result, the compiler would display a warning message.
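A complete version of the hidden-visibility example might look like the sketch below. The macro names LIB_HIDDEN and LIB_EXPORT are hypothetical and only keep the file compilable with compilers that lack the GNU attribute; with GCC or Clang they expand to the attribute described above.

```cpp
// Hypothetical wrapper macros around the GNU visibility attribute.
#if defined(__GNUC__)
#  define LIB_HIDDEN __attribute__((visibility("hidden")))
#  define LIB_EXPORT __attribute__((visibility("default")))
#else
#  define LIB_HIDDEN
#  define LIB_EXPORT
#endif

// Hidden: usable from every source file of the library, but not exported.
LIB_HIDDEN int myintvar = 5;

LIB_HIDDEN int func0() {
    return ++myintvar;
}

// Default visibility: this is the only symbol meant to be exported.
LIB_EXPORT int func1(int i) {
    return func0() * i;
}
```

Unlike the static version, another source file in the same library could still declare extern int myintvar; and link against it.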

The ELF ABI also defines other visibility modes:

STV_PROTECTED: The symbol is visible outside the current executable or shared object, but it may not be overridden. In other words, if a protected symbol in a shared library is referenced by other code in the same shared library, that code will always reference the symbol in the shared library, even if the executable defines a symbol with the same name.
STV_INTERNAL: The symbol is not accessible outside the current executable or shared library.
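With the GNU attribute, these two additional classes can be requested as "protected" and "internal". The sketch below assumes GCC or Clang targeting ELF; the macro and function names are hypothetical and are not taken from the listings in this article.

```cpp
// Hypothetical macros; they expand to nothing on non-ELF or non-GNU toolchains.
#if defined(__GNUC__) && defined(__ELF__)
#  define LIB_PROTECTED __attribute__((visibility("protected")))
#  define LIB_INTERNAL  __attribute__((visibility("internal")))
#else
#  define LIB_PROTECTED
#  define LIB_INTERNAL
#endif

// Internal: not accessible outside the module that defines it.
LIB_INTERNAL int scale_factor() {
    return 3;
}

// Protected: exported, but calls from inside the module always bind here.
LIB_PROTECTED int scaled(int x) {
    return scale_factor() * x;
}
```

The difference between protected and default visibility only shows up when another module defines a symbol with the same name.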

Notice that currently this method is not yet supported by the XL C/C++ compiler, even on the PowerLinux platform. Still, there is another way out.

3. Using the export list

The above two solutions take effect at the source-code level and only require support from the compiler. However, because symbol visibility mainly concerns dynamic linking, users also need a way to tell the linker to perform similar work. The solution for the linker is the export list.

The export list can be generated automatically by the compiler (or related tools, such as CreateExportList) at the time of creating the shared library. It can also be written manually by the developer. An export list is passed to the linker as input through a linker option. However, as the compiler driver does all this routine work, the programmer seldom needs to care about the detailed options.

The idea of the export list is to explicitly instruct the linker about the symbols that can be exported from the object files through an external file. GNU people named such an external file as “export map”. We can write an export map for our case:

{
global: func1;
local: *;
};

The above description tells the linker that only the func1 symbol is going to be exported, and other symbols (matched by *) are local. The programmer can also explicitly list func0 or myintvar as local symbols (local: func0; myintvar;). But obviously, the catch-all (*) is more convenient. Generally speaking, using the catch-all (*) to mark all symbols as local and picking out only the ones that need to be exported is highly recommended because it is safer. It prevents users from forgetting to keep some symbols local and also avoids duplication in both lists, which may cause unexpected behavior.

To generate a DSO with this method, the programmer has to pass the export map file with the --version-script linker option:

$ gcc -shared -o libtest.so a.C -fPIC -Wl,--version-script=exportmap

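We can read the ELF object file with the readelf binary utility and its -s option: readelf -s libtest.so

It would show that only func1 is globally visible for this module (entries in section .dynsym), and other symbols are hidden as local. Note that GNU ld matches export map patterns against the names stored in the object file, so when a.C is compiled as C++ the func1 entry has to match the mangled name, or be listed inside an extern "C++" block, for the export to take effect.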

For the IBM AIX OS linker, a similar export list is provided. To be exact, the export list is called the export file on AIX.

Writing an export file is simple. The programmer just needs to put the symbols that are needed to be exported into the export file. In our case, it is just as simple as shown below:

func1__Fi  // symbol name

Thus, when we specify the export file with a linker option, the only symbol we want to export is added into the “loader symbol table” for XCOFF, while the others are kept as un-exported.

On AIX 6.1 and later versions, the programmer may also append a visibility attribute to describe the visibility of symbols in the export file. The AIX linker accepts four such visibility attribute types:

export: Symbol is exported with the global export attribute.
hidden: Symbol is not exported.
protected: Symbol is exported but cannot be rebound (preempted), even if runtime linking is being used.
internal: Symbol is not exported. The address of the symbol must not be provided to other programs or shared objects, but the linker does not verify this.

The distinction between export and hidden is obvious. However, the distinction between export and protected is subtle. We discuss it in more detail in the section on symbol preemption.

Anyway, the above four keywords are available in the export file. By appending them (with a blank) to the tail of symbol, it will provide different granularity controlling of symbol visibility. In this case, we can also specify symbol visibility (on AIX 6.1 and later versions) as shown below:

func1__Fi export
func0__Fv hidden
myintvar hidden

This informs the linker that only func1__Fi (that is, func1) will be exported, and the others will not be exported.

You may notice that, unlike the GNU export map, the symbols listed in the export file are all mangled names. Mangled names do not look friendly, because the programmer may not be aware of the mangling rules. But they do help the linker do name resolution quickly. To close this gap, AIX provides a tool to help the programmer.
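One common way to sidestep mangled names in an export list is to give the exported functions C linkage, so that the symbol name is the plain source name. This is a general C++ technique rather than something this article's toolchain requires, and it assumes the exported functions are not overloaded. A minimal sketch based on Listing 1:

```cpp
// Internal helpers stay private to this file.
static int myintvar = 5;

static int func0() {
    return ++myintvar;
}

// C linkage: the symbol is emitted without C++ name mangling, so an
// export list entry can refer to it by its source name.
extern "C" int func1(int i) {
    return func0() * i;
}
```

The trade-off is that functions with C linkage cannot be overloaded, and their parameter types are no longer encoded in the symbol name.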

In short, if the programmer specifies the -qmkshrobj option while invoking the XL C/C++ compiler, the compiler driver invokes the CreateExportList tool after the compiler has successfully generated the object file. The tool automatically generates the export file that holds the mangled symbol names. The compiler driver then passes the export file to the linker to process the symbol visibility setting. Considering this example, if we invoke:

$ xlC -qpic a.C -qmkshrobj -o libtest.a

The libtest.a library is generated with all the symbols exported (this is the default). Though it does not achieve our goal, at least the whole process is transparent to the programmer. The programmer can also choose to save the generated export file, which can then be modified manually. For example, if the export file name you want is exportfile, then -qexpfile=exportfile is the option you need to pass to the XL C/C++ compiler driver.

$ xlC -qmkshrobj -o libtest.a a.o -qexpfile=exportfile

In this case, you can find out all the symbols as shown below:

func0__Fv
func1__Fi
myintvar

Based on our requirement, we can either simply remove the lines for myintvar and func0, or append the hidden visibility keyword to them. We then save the export file and use the linker option -bE:exportfile to pass the refined export file back.

$ xlC -qmkshrobj -o libtest.a a.o -bE:exportfile

That completes all the steps. Now the generated DSO has only func1__Fi (that is, func1) exported:

$ dump -Tv libtest.a

                        ***Loader Symbol Table Information***
[Index]      Value      Scn     IMEX Sclass   Type           IMPid Name

[0]     0x20000284    .data      EXP     DS SECdef        [noIMid] func1__Fi

Alternatively, the programmer can also use the CreateExportList utility to explicitly generate the export file as shown below:

$ CreateExportList exportfile a.o

In our case, it works exactly as the one above.

For the new format on AIX 6.1 and later versions, appending the visibility keyword to symbols one by one might require more effort. However, the XL C/C++ compiler is planning some changes to make life easier for programmers. (Related information will be provided in the next part of this series.)

In the export list solution, all the information is kept in the export list and programmers do not need to change the source file. It separates the work of code development from library development. However, we might face an issue with such a process. As the source file is unmodified, the binary code that the compiler generates might not be optimal. The compiler misses the chance to optimize symbols that are not exported, because it lacks that information. This can either increase the size of the generated binary or slow down symbol resolution. However, this is not a major issue for most applications.
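Where the GNU extensions are available, the compiler can be given the missing information in the source itself. The sketch below uses the GCC visibility pragma to hide everything in a region by default and then marks only func1 as exported. It assumes GCC or Clang; the guards make the pragma and attribute disappear on other compilers. A similar effect can be requested for a whole translation unit with the GCC option -fvisibility=hidden.

```cpp
// Everything declared between push and pop gets hidden visibility.
#if defined(__GNUC__)
#pragma GCC visibility push(hidden)
#endif

int myintvar = 5;

int func0() {
    return ++myintvar;
}

#if defined(__GNUC__)
#pragma GCC visibility pop
#endif

// Explicitly exported entry point.
#if defined(__GNUC__)
__attribute__((visibility("default")))
#endif
int func1(int i) {
    return func0() * i;
}
```

Because the compiler now knows that func0 and myintvar cannot be referenced or overridden from outside the module, it has more freedom when generating code for them.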

The following table compares all the above solutions.

Table 2. Comparison of each solution

Solution	Advantage	Disadvantage
static keyword	Simple; language-level support	Restricts the variable or function to the file scope where it is defined
Export list	Eliminates the restriction of the static keyword; no source code changes are needed; can associate version information; offers different granularity of symbol visibility control (AIX)	Needs extra effort to modify the export file; needs basic knowledge of symbol mangling; lacks optimization information
Visibility attribute	Eliminates the restriction of the static keyword; more visibility choices (four) to control exported symbols	Requires more work at the coding stage to set each symbol's visibility

Symbol preemption

As we mentioned above, there is a subtle distinction between the visibility keywords export and protected. And the subtle distinction is about symbol preemption. Symbol preemption occurs when the symbol address resolved at link time is replaced with another symbol address resolved at runtime (notice that runtime linking is optional on AIX though). Conceptually, runtime linking would resolve undefined and non-deferred symbols in shared modules after the program execution has begun. It is a mechanism for providing runtime definitions (these function definitions are not available at link time) and symbol rebinding capabilities. On AIX, when the main program is linked with the -brtl flag or when preloaded libraries are specified with the LDR_CNTRL environment variable, the program is able to use the runtime linking facility. Compiling with -brtl adds a reference to the dynamic linker to the program, which will be called by the program's startup code (/lib/crt0.o) when the program begins to run. Shared object input files are listed as dependents in the program loader section in the same order as they are specified in the command line. When the program begins to run, the system loader loads these shared objects so that their definitions are available to the dynamic linker.

Thus, the functionality of redefining the items in shared objects at runtime is called symbol preemption. Symbol preemption is only possible on AIX when runtime linking is used. Imports bound to a module at link time can be rebound to another module at runtime. Whether a local definition can be preempted by an imported instance depends on the way the module was linked. However, a non-exported symbol can never be preempted at runtime. When the runtime loader loads a component, all the symbols within the component that have the default visibility are subject to preemption by symbols of the same name in components that are already loaded. Note that because the main program image is always loaded first, none of the symbols defined by it will be preempted (redefined).

A protected symbol is exported, but it is not preemptible. In contrast, a symbol with the export attribute is exported and can be preempted (if runtime linking is used).

For default symbols, there is a difference between Linux® and AIX. The GNU compilers and ELF file format define a default visibility, which is used for symbols that are exported and preemptible. This is similar to the exported visibility defined on AIX.

The following code takes the AIX platform as an example.

Listing 3. func.C


#include <stdio.h>
void func_DEFAULT(){
        printf("func_DEFAULT in the shared library, Not preempted\n");
}

void func_PROC(){
        printf("func_PROC in the shared library, Not preempted\n");
}

Listing 4. invoke.C


extern void func_DEFAULT();
extern void func_PROC();

void invoke(){
        func_DEFAULT();
        func_PROC();
}

Listing 5. main.C


#include <stdio.h>

extern void func_DEFAULT();
extern void func_PROC();
extern void invoke();

int main(){
        invoke();
        return 0;
}

void func_DEFAULT(){
        printf("func_DEFAULT redefined in main program, Preempted ==> EXP\n");
}

void func_PROC(){
        printf("func_PROC redefined in main program, Preempted ==> EXP\n");
}

In the above description, we defined func_DEFAULT and func_PROC both in func.C and main.C. They have the same names but with different behaviors. A function invoke from invoke.C will call func_DEFAULT and func_PROC in sequence. We will use the following exportlist code to see if symbols are exported and how they are exported.

Listing 6. exportlist


func_DEFAULT__Fv export
func_PROC__Fv protected
invoke__Fv

If you are using a linker version prior to AIX 6.1, you may use a blank space instead of export, and the symbolic keyword instead of the protected keyword. The commands for building the libtest.so library and the main executable are listed in the following code:

# Generate position-independent code suitable for use in shared libraries.
$ xlC -c func.C invoke.C -qpic

# Generate the shared library; exportlist controls symbol visibility.
$ xlC -G -o libtest.so func.o invoke.o -bE:exportlist

$ xlC -c main.C

# -brtl enables runtime linking.
$ xlC main.o -L. -ltest -brtl -bexpall -o main

Basically, we construct libtest.so from func.o and invoke.o. We use exportlist to mark func_DEFAULT and invoke as exported symbols, and func_PROC as exported but protected. Thus libtest.so has two plainly exported symbols and one protected symbol. For the main program, we export all the symbols from main.C (with -bexpall) and link it against libtest.so. Notice that we use the -brtl flag to enable runtime linking for libtest.so.

The next step is to invoke the main program.

$ ./main
func_DEFAULT redefined in main program, Preempted ==> EXP
func_PROC in the shared library, Not preempted

Here we see something interesting: func_DEFAULT is the version from main.C, while func_PROC is the version from libtest.so (func.C). The func_DEFAULT symbol is preempted because the local version from libtest.so (we say it is local because the calling function invoke, from invoke.C, is in the same module as func_DEFAULT from func.C) is replaced by the one from another module. However, the same does not happen to func_PROC, which is specified with protected visibility in the export file.

Notice that a symbol that preempts others must itself be exported. Suppose we remove the -bexpall option while building the executable main. The output is then as shown below:

$ xlC main.o -L. -ltest -brtl -o main   # -brtl enables runtime linking
$ ./main
func_DEFAULT in the shared library, Not preempted
func_PROC in the shared library, Not preempted

Here no preemption happens. All calls resolve to the versions defined inside the library module.

In fact, to check if a symbol is exported or even protected at runtime, we can make use of the dump utility:

$ dump -TRv libtest.so
libtest.so:

                        ***Loader Section***

                        ***Loader Symbol Table Information***
[Index]      Value      Scn     IMEX Sclass   Type           IMPid Name

[0]     0x00000000    undef      IMP     DS EXTref   libc.a(shr.o) printf
[1]     0x2000040c    .data      EXP     DS SECdef        [noIMid] func_DEFAULT__Fv
[2]     0x20000418    .data      EXP     DS SECdef        [noIMid] func_PROC__Fv
[3]     0x20000424    .data      EXP     DS SECdef        [noIMid] invoke__Fv

                        ***Relocation Information***
             Vaddr      Symndx      Type      Relsect    Name
        0x2000040c  0x00000000   Pos_Rel      0x0002     .text
        0x20000410  0x00000001   Pos_Rel      0x0002     .data
        0x20000418  0x00000000   Pos_Rel      0x0002     .text
        0x2000041c  0x00000001   Pos_Rel      0x0002     .data
        0x20000424  0x00000000   Pos_Rel      0x0002     .text
        0x20000428  0x00000001   Pos_Rel      0x0002     .data
        0x20000430  0x00000000   Pos_Rel      0x0002     .text
        0x20000434  0x00000003   Pos_Rel      0x0002     printf
        0x20000438  0x00000004   Pos_Rel      0x0002     func_DEFAULT__Fv
        0x2000043c  0x00000006   Pos_Rel      0x0002     invoke__Fv

This is the output from libtest.so. We can see that func_DEFAULT__Fv and func_PROC__Fv are both exported. However, func_PROC__Fv does not have any relocation entry. This means that the loader has no way to replace the address of func_PROC in the TOC, and that TOC entry is where the function invoke transfers control to. Therefore, func_PROC cannot be preempted, which tells us that it is protected.
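The role of the relocation entry can be pictured with a small toy model. This is only an illustration of the idea, not how the AIX loader is implemented: every type and function name below is hypothetical, a Slot stands in for a TOC entry, and its relocatable flag stands in for the presence of a relocation that the loader could use to rebind it.

```cpp
#include <string>

// Toy model only; it does not reflect the real loader's data structures.
using Fn = std::string (*)();

struct Slot {
    Fn target;         // address the caller jumps through (stand-in for a TOC entry)
    bool relocatable;  // true if a relocation exists that allows rebinding
};

std::string lib_default()  { return "library func_DEFAULT"; }
std::string lib_proc()     { return "library func_PROC"; }
std::string main_default() { return "main func_DEFAULT"; }
std::string main_proc()    { return "main func_PROC"; }

// Rebind a slot only if it carries a relocation, mimicking preemption.
void rebind(Slot& slot, Fn replacement) {
    if (slot.relocatable) {
        slot.target = replacement;
    }
}

// Mimics invoke(): calls both functions through their slots after the
// "main program" has tried to preempt each of them.
std::string invoke_model() {
    Slot def{lib_default, true};   // export: has a relocation, preemptible
    Slot proc{lib_proc, false};    // protected: no relocation, stays fixed
    rebind(def, main_default);
    rebind(proc, main_proc);
    return def.target() + "; " + proc.target();
}
```

In this model the attempt to rebind the protected slot is silently ignored, which mirrors the program output shown earlier, where only func_DEFAULT was replaced.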

In practice, symbol preemption is rarely used. It makes it possible to replace a symbol dynamically at run time, but it can also open security holes. If you do not want key symbols in your library to be preempted (but still need to export them for use), make them protected for safety.

Acknowledgments

We would like to thank Dr. Jinsong Ji for reviewing and providing valuable suggestions on this article.

Part 1 - Introduction to symbol visibility
