Alias analysis proceeds in 3 main phases:
This phase walks the use-def chains in the SSA web looking for three things:
P_i = &VAR
The concept of `escaping' is the same one used in the Java world. When a pointer or an ADDR_EXPR escapes, it means that it has been exposed outside of the current function. So, assignment to global variables, function arguments and returning a pointer are all escape sites.
This is where we are currently limited. Since not everything is renamed into SSA, we lose track of escape properties when a pointer is stashed inside a field in a structure, for instance. In those cases, we are assuming that the pointer does escape.
We use escape analysis to determine whether a variable is call-clobbered. Simply put, if an ADDR_EXPR escapes, then the variable is call-clobbered. If a pointer P_i escapes, then all the variables pointed-to by P_i (and its memory tag) also escape.
We have two classes of memory tags. Memory tags associated with the pointed-to data type of the pointers in the program. These tags are called “type memory tag” (TMT). The other class are those associated with SSA_NAMEs, called “name memory tag” (NMT). The basic idea is that when adding operands for an INDIRECT_REF *P_i, we will first check whether P_i has a name tag, if it does we use it, because that will have more precise aliasing information. Otherwise, we use the standard type tag.
In this phase, we go through all the pointers we found in points-to analysis and create alias sets for the name memory tags associated with each pointer P_i. If P_i escapes, we mark call-clobbered the variables it points to and its tag.
This pass will compare the alias set of every type memory tag and every addressable variable found in the program. Given a type memory tag TMT and an addressable variable V. If the alias sets of TMT and V conflict (as computed by may_alias_p), then V is marked as an alias tag and added to the alias set of TMT.
For instance, consider the following function:
foo (int i) { int *p, *q, a, b; if (i > 10) p = &a; else q = &b; *p = 3; *q = 5; a = b + 2; return *p; }
After aliasing analysis has finished, the type memory tag for
pointer p
will have two aliases, namely variables a
and
b
.
Every time pointer p
is dereferenced, we want to mark the
operation as a potential reference to a
and b
.
foo (int i) { int *p, a, b; if (i_2 > 10) p_4 = &a; else p_6 = &b; # p_1 = PHI <p_4(1), p_6(2)>; # a_7 = V_MAY_DEF <a_3>; # b_8 = V_MAY_DEF <b_5>; *p_1 = 3; # a_9 = V_MAY_DEF <a_7> # VUSE <b_8> a_9 = b_8 + 2; # VUSE <a_9>; # VUSE <b_8>; return *p_1; }
In certain cases, the list of may aliases for a pointer may grow too large. This may cause an explosion in the number of virtual operands inserted in the code. Resulting in increased memory consumption and compilation time.
When the number of virtual operands needed to represent aliased loads and stores grows too large (configurable with --param max-aliased-vops), alias sets are grouped to avoid severe compile-time slow downs and memory consumption. The alias grouping heuristic proceeds as follows:
may-aliases(T) = { V1, V2, V3, ..., Vn }
This means that every statement that references T, will get
n
virtual operands for each of the Vi tags. But, when
alias grouping is enabled, we make T an alias tag and add it
to the alias set of all the Vi variables:
may-aliases(V1) = { T } may-aliases(V2) = { T } ... may-aliases(Vn) = { T }
This has two effects: (a) statements referencing T will only get a single virtual operand, and, (b) all the variables Vi will now appear to alias each other. So, we lose alias precision to improve compile time. But, in theory, a program with such a high level of aliasing should not be very optimizable in the first place.
may-aliases(T) = { V1, V2, V3 } may-aliases(R) = { V2, V4 }
In step (2) we would have reverted the aliases for T as:
may-aliases(V1) = { T } may-aliases(V2) = { T } may-aliases(V3) = { T }
But note that now V2 is no longer aliased with R. We could add R to may-aliases(V2), but we are in the process of grouping aliases to reduce virtual operands so what we do is add V4 to the grouping to obtain:
may-aliases(V1) = { T } may-aliases(V2) = { T } may-aliases(V3) = { T } may-aliases(V4) = { T }