How I think about C99 strict aliasing rules
Recently I was asked how I review C99 code for problems that arise from failing to follow the so called “strict aliasing rules”. I struggled to answer, so I thought I would write a post to hopefully make my explanation more coherent.
The strict aliasing rules can be surprising because the way optimizers take advantage of them doesn’t mesh well with the popular belief that pointers are “just numbers”. Ultimately, I think there are practical benefits to understanding the rules even if you disagree with them. Popular compilers such as GCC and Clang take advantage of the rules so knowing them can help with debugging if for nothing else.
This is just my simplified model for how C compilers make use of the rules, and I don’t claim that it’s 100% correct. However, I have found it useful enough to spot problems in code and when the code looked fine, used it to spot a problem in the compiler.
What are the rules, anyways?
The relevant part in C99 is §6.5p7, but in my head it basically boils down to
“two value accesses are disjoint when the types are different, except when one
of the types is a char
type”. Yes, many subtleties are out the window and it’s
not going to get me into WG14, but I think it’s a useful level of understanding
regardless.
What happens when the optimizer can see that a write is disjoint with respect to a read? It can decide to reorder the program and do the read first if it seems profitable for performance.
Here is a sample where we can see GCC making use of the aliasing rules:
unsigned
reorder(unsigned *foo)
{
*foo = 0;
short *ptr = (short *)foo;
*ptr = 1;
return *foo;
}
Compiling with -O2
gives the following:
reorder:
mov DWORD PTR [rdi], 1
xor eax, eax
ret
The code is using an idiom to zero out eax
, but the gist is that it’s returning zero.
GCC’s output has a different order than our source program; *ptr = 1
seems to
have no effect on the final read from foo
, even though one might understand
foo
and ptr
as having the same address and expect *ptr = 1
to happen
before return *foo
, as ordered in the source code. Adding to the surprise, GCC has combined the two
indirect writes into one, seemingly via the understanding that foo
and ptr
have the same address! There seems to be some strange contradiction.
Compile with -O2 -fno-strict-aliasing
and we get something different:
reorder:
mov DWORD PTR [rdi], 1
mov eax, 1
ret
Ah ha! By default, when GCC gets to use the powers granted to it by the
standard, it can assume that short
writes have no effect on unsigned
reads,
but -fno-strict-aliasing
tells GCC to forget about that part of the standard.
GCC is organized as optimization passes and separate passes don’t necessarily
share information. The strange inconsistency we saw when we compiled with default
options is likely a result of this – the mov
and the xor
are likely coming from
two separate parts of the compiler that don’t share the same understanding of our
program.
The bug reporting guide for GCC has a section
about -fno-strict-aliasing
, perhaps because many people have been surprised
by this optimization:
To disable optimizations based on alias-analysis for faulty legacy code, the option
-fno-strict-aliasing
can be used as a work-around.
Oof. Okay GCC, type-based alias analysis is great and useful, but no need to judge this hard.
Snap back to reality
Let’s go look at a practical example in CRuby where we did not follow the rules. If you’d like to follow along, you can grab this commit and build with the following commands:
$ ./autogen.sh
$ ./configure cflags=-flto LDFLAGS=-flto
$ make -j8 miniruby
I’ll be using GCC 11.2.0 on a GNU/Linux distribution.
This example has to do with an output parameter, where we expect a function to do a write using the out parameter before returning. The call site looks like this:
typedef unsigned long VALUE;
typedef unsigned long ID;
typedef struct st_table st_table;
int rb_id_table_lookup(struct rb_id_table *tbl, ID id, VALUE *valp);
// ^^^^
// out param of interest
void
do_lookup(struct rb_id_table *const_cache, ID id)
{
st_table *ics;
if (rb_id_table_lookup(constant_cache, id, (VALUE *) &ics)) {
// successful lookup
st_foreach(ics, iterator_fn, 0);
}
}
When rb_id_table_lookup()
returns 1, it indicates that it has written through valp
:
//... inside rb_id_table_lookup()
if (index >= 0) {
*valp = tbl->items[index].val;
return TRUE;
}
else {
return FALSE;
}
Let’s focus on the code path where the lookup succeeds and break it down into a sequence of accesses by type:
- write
unsigned long
akaVALUE
throughvalp
- read
st_table *
using theics
local variable
Uh oh, unsigned long
and st_table *
are distinct types, so by the aliasing
rules the compiler is free to assume that the two accesses have no relation. If it decides
to reorder and do the read before the write, that would betray our
intention – we want to make use of the output from the successful lookup so
we always want the write to happen first!
Does GCC tell us anything about this mismatch between our intention and what we wrote? Why yes:
vm_method.c:146:9: warning: ‘ics’ may be used uninitialized [-Wmaybe-uninitialized]
146 | st_foreach(ics, rb_clear_constant_cache_for_id_i, (st_data_t) NULL);
| ^
class.c: In function ‘clear_constant_cache_i’:
vm_method.c:143:15: note: ‘ics’ declared here
143 | st_table *ics;
| ^
That’s a bit of a strange warning to get if you expect accesses to happen in source
code order. I suspect what has happened under the hood is that GCC considered
putting the read before the write and while evaluating that schedule GCC detects that
it reads from an uninitialized variable. I think GCC only sees this read-before-write schedule
when interpreting the aliasing rules strictly because adding -fno-strict-aliasing
makes the warning disappear.
The fix for this issue makes the code write and read through the same type. If you’re in the mood for an exercise, you can imagine what the code change looks like before looking at the patch.
Takeaways
This post tries to build intuition for spotting strict aliasing issues. The analysis I showed involves distilling the program under review into accesses by type, sort of like taking a projection of it. The CRuby example is interprocedural and to get all the requisite information for our analysis we needed to reference two functions in separate files. Similarly, GCC issues a warning about the code only when we build with link time optimization, where it can reason about the two functions in separate translation units together.
Have fun coding in ISO C and be careful casting pointers!