Why does GCC generate a mov of the array's beginning on every loop iteration when accessing an array with []? (-O3, x86)

Description

I created a sample to study TLB access/miss statistics. The sample writes 1 to every 4096th element of an array of 10'000 * 4096 bytes. I expected to see only 10'000 TLB stores per call, but the generated assembly reloads the beginning of the array (the pointer data) on every iteration, which results in 10'000 TLB loads in addition to the stores. The code is compiled with -O3.

When I looked into the assembly, I noticed that the for-loop looks like this:

1. Move the beginning of the array into a register
2. Store 1 at the beginning of the array plus the index offset
3. Increase the index
4. Jump to step 1

Question: Why is step 1 executed on every single iteration? The beginning of the array does not change. I expected it to be loaded once, with the jump going to step 2.

C code

(main just calls this test_function 100K times):

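    #include <cstdlib>  // malloc

    #define PAGESIZE 4096
    #define PAGES 10000

    // Global, non-const pointer to the array
    char *data = (char *) malloc(PAGES * PAGESIZE);

    inline void test_function()
    {
        // Touch one byte per 4096-byte page
        for (int i = 0; (i < PAGES * PAGESIZE); i += (PAGESIZE)) {
            data[i] = 1;
        }
    }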
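For comparison with the loop above, here is a minimal sketch of the behaviour I expected, written at source level. It rests on an assumption: a store through a char * is allowed to alias any object, including the global pointer data itself, so the compiler may not be able to prove that data is unchanged after data[i] = 1. The function touch_pages and the constant kPageSize are hypothetical names, not part of my program. The base pointer lives in a local here, so the store cannot modify it. Whether this actually removes the per-iteration load would still need to be confirmed in the disassembly.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPageSize = 4096;  // PAGESIZE in the question

// Hypothetical variant of test_function: the base pointer is a by-value
// parameter, so a store through it cannot change the pointer itself.
// Writes 1 to the first byte of each page and returns the page count.
inline std::size_t touch_pages(char *base, std::size_t pages)
{
    assert(base != nullptr);
    std::size_t touched = 0;
    for (std::size_t i = 0; i < pages * kPageSize; i += kPageSize) {
        base[i] = 1;
        ++touched;
    }
    return touched;
}
```

With the sizes from the question this would be called as touch_pages(data, 10000), which copies the global pointer once per call.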

Generated assembly with gcc and -O3

    1070:       mov    rdx,QWORD PTR [rip+0x2fa1]        # 4018 <data>
    1077:       mov    BYTE PTR [rdx+rax*1],0x1
    107b:       add    rax,0x1000
    1081:       cmp    rax,0x2710000
    1087:       jne    1070 <main+0x10>
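The immediates in the listing above can be checked against the source constants. This is only a sanity check that the loop in the disassembly is the loop from the C code; the names kAsmStride, kAsmBound and iterations_per_call are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;   // PAGESIZE in the C code
constexpr std::uint64_t kPages = 10000;     // PAGES in the C code

// Immediates copied from the disassembly.
constexpr std::uint64_t kAsmStride = 0x1000;    // add rax,0x1000
constexpr std::uint64_t kAsmBound = 0x2710000;  // cmp rax,0x2710000

static_assert(kAsmStride == kPageSize, "stride is one page");
static_assert(kAsmBound == kPages * kPageSize, "bound is the array size");

// Number of loop iterations per call implied by the assembly constants.
inline std::uint64_t iterations_per_call()
{
    assert(kAsmStride != 0);  // guard the division
    return kAsmBound / kAsmStride;
}
```

So each call runs the five-instruction loop 10'000 times, with one load of data and one byte store per iteration.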

perf stats for 100'000 repetitions

Per function call we can see:

10K L1 cache loads, 10K L1 cache stores
10K TLB loads, 10K TLB stores
~0 TLB load misses, 10K TLB store misses

So the load of the array's beginning always hits in the TLB, but it is still performed. Why?

        1000184312      L1-dcache-load:u                                              (66.60%)
        1001155723      L1-dcache-stores:u                                            (66.63%)
        1010296235      dTLB-loads:u                                                  (66.61%)
        1000451484      dTLB-stores:u                                                 (66.69%)
             42124      dTLB-loads-misses:u       #    0.00% of all dTLB cache accesses  (66.79%)
         998312626      dTLB-stores-misses:u                                          (66.68%)
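The per-call figures quoted above come from dividing these totals by the 100'000 repetitions. A small sketch of that arithmetic follows; per_call and kRepetitions are illustrative names. The percentages in parentheses indicate that the counters were multiplexed, so the totals are scaled estimates rather than exact counts.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kRepetitions = 100000;  // calls to test_function

// Average number of events per call, rounded to the nearest integer.
inline std::uint64_t per_call(std::uint64_t total,
                              std::uint64_t calls = kRepetitions)
{
    assert(calls != 0);
    return (total + calls / 2) / calls;
}
```

For example, per_call(1010296235) for dTLB-loads gives 10103 and per_call(998312626) for dTLB-stores-misses gives 9983, both close to the 10'000 pages touched per call, while per_call(42124) for dTLB-loads-misses rounds to 0.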

Platform

Intel(R) Core(TM) i7-10610U CPU

Ubuntu 22.04.3 LTS

g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Compilation line: g++ ./tlb.cpp -O3 -g -o gcc.out
