Channel: Recent Questions - Stack Overflow

Why does GCC generate a mov of the array's beginning on every loop iteration when accessing an array with []? (-O3, x86)



I created a sample to study TLB access/miss statistics. The sample writes 1 to every 4096th byte of an array of 10'000 * 4096 bytes. I expected to see only 10'000 TLB stores, but the generated assembly reloads the beginning of the array on every iteration, resulting in 10'000 TLB loads in addition to the stores. The code is compiled with -O3.

When I looked at the assembly, I noticed that the for-loop looks like this:

  1. load the beginning of the array into a register
  2. store 1 at the beginning of the array plus the current offset
  3. increase the index
  4. jump to step 1

Question: Why is step 1 executed on every single iteration? The beginning of the array does not change. I expected it to be loaded once, with the jump going back to step 2.

C code

(main just calls this test_function 10K times):

    #define PAGESIZE 4096
    #define PAGES 10000

    char *data = (char *) malloc(PAGES * PAGESIZE);

    inline void test_function()
    {
        for (int i = 0; (i < PAGES * PAGESIZE); i += (PAGESIZE)) {
            data[i] = 1;
        }
    }

Generated assembly with gcc and -O3

    1070:       mov    rdx,QWORD PTR [rip+0x2fa1]        # 4018 <data>
    1077:       mov    BYTE PTR [rdx+rax*1],0x1
    107b:       add    rax,0x1000
    1081:       cmp    rax,0x2710000
    1087:       jne    1070 <main+0x10>
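For comparison, here is a minimal sketch of how the expected "load once" loop could be written in the source. It is not part of the original sample: the function name test_function_hoisted, the local variable base, and the returned counter are hypothetical additions for illustration. The likely reason it behaves differently is that data is a global char *, and a store through a char * is allowed to alias any object, including the pointer data itself, so the compiler cannot assume data is unchanged after data[i] = 1. A local copy whose address is never taken cannot be affected by that store, which should allow the compiler to keep it in a register.

```cpp
#include <cstddef>
#include <cstdlib>

#define PAGESIZE 4096
#define PAGES 10000

// Same global as in the sample: any store through a char* may legally
// modify this pointer, as far as the compiler can tell.
char *data = (char *) malloc(PAGES * PAGESIZE);

// Hypothetical variant (not from the original sample): read the global
// pointer once into a local before the loop.
// Returns the number of pages written, only so the result can be checked.
std::size_t test_function_hoisted()
{
    char *base = data;            // single read of the global pointer
    if (base == nullptr) {
        return 0;                 // allocation failed, nothing to touch
    }

    std::size_t touched = 0;
    for (int i = 0; i < PAGES * PAGESIZE; i += PAGESIZE) {
        base[i] = 1;              // this store cannot alias the local `base`
        ++touched;
    }
    return touched;
}
```

Whether the reload actually disappears depends on the compiler and flags, so the disassembly and the perf counters would need to be checked again after such a change.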

perf stats for 100'000 repetitions

Per function call we can see:

  • 10K L1 cache loads, 10K L1 cache stores
  • 10K TLB loads, 10K TLB stores
  • ~0 TLB load misses, 10K TLB store misses

So the load of the array's beginning always hits in the TLB, but it is still executed on every iteration. Why?

        1000184312      L1-dcache-load:u                                              (66.60%)
        1001155723      L1-dcache-stores:u                                            (66.63%)
        1010296235      dTLB-loads:u                                                  (66.61%)
        1000451484      dTLB-stores:u                                                 (66.69%)
             42124      dTLB-loads-misses:u       #    0.00% of all dTLB cache accesses  (66.79%)
         998312626      dTLB-stores-misses:u                                          (66.68%)
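The per-call figures above follow from dividing the raw counters by the 100'000 repetitions. The sketch below redoes that arithmetic at compile time and also checks that the loop bound in the disassembly (0x2710000) equals PAGES * PAGESIZE. The helper per_call and the constant REPETITIONS are hypothetical names introduced only for this check.

```cpp
#include <cstdint>

constexpr std::int64_t PAGESIZE = 4096;
constexpr std::int64_t PAGES = 10000;
constexpr std::int64_t REPETITIONS = 100000;  // from "perf stats for 100'000 repetitions"

// Loop bound seen in the disassembly: cmp rax,0x2710000
static_assert(PAGES * PAGESIZE == 0x2710000, "loop bound is PAGES * PAGESIZE");

// Hypothetical helper: average of a raw perf counter per function call,
// rounded to the nearest integer.
constexpr std::int64_t per_call(std::int64_t total)
{
    return (total + REPETITIONS / 2) / REPETITIONS;
}

// Roughly one load and one store per page, per call.
static_assert(per_call(1000184312) == 10002, "L1-dcache loads per call");
static_assert(per_call(1000451484) == 10005, "dTLB stores per call");

// Load misses are negligible; store misses are about one per page.
static_assert(per_call(42124) == 0, "dTLB load misses per call");
static_assert(per_call(998312626) == 9983, "dTLB store misses per call");
```

The counters are sampled (note the ~66% multiplexing percentages in the perf output), so the per-call values are only approximately 10K.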


Intel(R) Core(TM) i7-10610U CPU

Ubuntu 22.04.3 LTS

g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Compilation line: g++ ./tlb.cpp -O3 -g -o gcc.out
