Straightforward C code takes a long time to run, any methods of optimising it?

I wrote a basic code that utilises a leapfrog integrator (aka kick-drift-kick) to simulate gravitational systems. So far it has worked successfully at simulating orbits in any potential you input.

However, for a proper test-particle simulation I want to increase the particle count to ~1 million, which is where the main issue manifests - it is remarkably slow, approximately 35 minutes for 1 Myr (the timestep at which the simulation progresses is 0.001 Myr, so 1000 steps in a for-loop).

Most of my coding experience is in Python, so my concern is that I missed something key in my C code that slows down the code so significantly. I tested different optimisation methods, like parallelising my time loop so I can integrate more particles at a single time, but that either marginally improved the runtime or non at all.

Below I am pasting the code with all the necessary comments, but I cut down on the code size for this question as some of the potentials I use are very bulky and take up space.I would be thankful for guidance and suggestions, I want to truly understand how C operates and not limit myself to half-baked code, I might have missed a key C coding practice and added artificial barriers for the code to run fast etc.

Some burning questions are:

Are while-loops are slower than for-loops when performing orbit integration?
Are the if-conditions I use to save particle information are slowing down my code?
Is there a way to create a buffer array to dump particle coordinates/velocities into to save at a later date? I am worried that a 20x1,000,000x7 double array would be too massive to be computational efficient.
If there are no particle-particle interactions to compute, would using OpenMP be considered superfluous or can it still speed up the computations?

Edit: I was unaware how compiling can affect the performance of my code, so I will add this information here: I just compile it with gcc-13 with no other options, unless I was testing OpenMP in which case I added -fopenmp.

Here is the code

#include <stdlib.h>#include <math.h>// #include <omp.h> //previous experiments, touched on at the end//Model Parameters//Haloconst double Mh=2e12; //Mass of haloconst double ah=16e3; //Halo paramterconst double ch=0.53980675319058735; //Halo paramter c=15.3, simplified for formula value//t_force parametersconst double Mt_f=2.5*3.43*1e10; //Maximum massconst double R_t=(15*2.5)*1000; // Fixed paramater determining the scale of the potentialconst double mdt=2000; //Mass gain rate (units = Myr)// Units#define pi 3.141592653589793#define G 0.00449857598201285// Sim Setupconst double theta=75*pi/180.; //Inclination of disc, explained further downconst double CCD=1; // Period of file saving, explained further downconst double dt=0.001; //Fixed time step (units = Myr)const double runtime=4e3; //Total runtime (units = Myr)const int timelim=runtime/dt; //Integer limit in for-loopchar name[999]="SIMNAME"; //Simulation snapshot name format//Number of particles#define pno 1000000//Norms of vectors norm()double norm(double x[3])    {      return sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);    }// Function that rotates particles around x-axisvoid rotate(double x[3],double v[3],double theta){    double yd,zd,vy,vz;    yd=x[1]*cos(theta)-x[2]*sin(theta);//    zd=x[1]*sin(theta)+x[2]*cos(theta);//    x[1]=yd;//    x[2]=zd;//    vy=v[1]*cos(theta)-v[2]*sin(theta);    vz=v[1]*sin(theta)+v[2]*cos(theta);    v[1]=vy;    v[2]=vz;}//Mass function of t_force - a time dependant potential (only increasing mass, other parameters stay constant)double Mt(double t)    {        double mass;        if (t<mdt) {mass=Mt_f*t/mdt;}        else {mass=Mt_f;}        return mass;    }//Calculating force for every particle (only potentials, no particle-particle interactions). h_force is a static potential representing an NFW halovoid calculate_force(double x[3], double a[3], double mt_c){    //Normss    double md=norm(x);    //Forces    double t_force[3]={0.,0.,0.};    double h_force;    t_force[0]=...; // ... is indicative of long stretches of code. These lines use the x[3] array and constants defined earlier    t_force[1]=...;    t_force[2]=...;    for (int ind=0;ind<3;ind++)    {        h_force=...;        a[ind]=t_force[ind]+h_force;    }}//Calculating force for every particle, for setting up initial conditionsvoid force(double x[3], double a[3]){    //Norms    double md=norm(x);    //Forces    double h_force;    for (int ind=0;ind<3;ind++)    {        h_force=...;        a[ind]=h_force;    }}// Assigning velocities to particles void initial_cond(double x[3],double v[3], double a[3]){    //Norms    double md=norm(x);    double phi=atan2(x[1],x[0]);    //Forces    force(x,a);    v[0]=sqrt(norm(a)*md)*cos(phi+pi/2);    v[1]=sqrt(norm(a)*md)*sin(phi+pi/2);    v[2]=0;}//'Kick' part of a Leapfrog integratorvoid kick(double x[3], double v[3], double a[3], double dt, double t){     calculate_force(x, a, t);    for (int k=0;k<3;k++)    {        v[k]+=a[k]*dt/2.;    }}//'Drift' part of Leapfrog integratorvoid drift(double x[3], double v[3], double dt){    for (int k=0;k<3;k++)    {         x[k]+=v[k]*dt;    }}// Function to save particles positions/velocities/energy/etc to a filevoid snapshot( char nom[999], int snapshot,char *n1, char *n2, char *n3, double x[pno][3], double v[pno][3],double mt_c){    FILE *fill;    char NAM[999];    sprintf(NAM, "%s_%.12s_%.12s_%.12s.%06d.dat", nom,n1,n2,n3,snapshot);    fill = fopen(NAM,"w");        fprintf(fill,"%.16f\t%.16f\t%.16f\t%.16f\t%.16f\t%.16f\t%.16f\n", 0.0,0.0,0.0,0.0,0.0,0.0,0.0);    for (int j=0;j<pno;j++)            {            fprintf(fill,"%.16f\t%.16f\t%.16f\t%.16f\t%.16f\t%.16f\t%.16f\n", x[j][0],x[j][1],x[j][2],v[j][0],v[j][1],v[j][2],                calculate_erg(x[j],v[j],mt_c));            }    fclose(fill);}/////////////////////Finished defining functions///////////////////////Core of codeint main(int argc, char **argv){    //Setting coordinates and velocities, prior to launch    double t, mt_c;    double mt_c;    int step=1;    // Coordinates, velocities, and accelerations of toy particles    static double x[pno][3];    static double v[pno][3];    static double a[pno][3];    // Loading particle coordinates from .IC file, getting the initial velocities and writing them into the aformentioned arrays    FILE *f1;    char IC[999];    sprintf(IC,"NFW.IC");    char output_H[9];    char output_TM[9];    char output_TR[9];    f1 = fopen(IC,"r");        for (int j=0;j<pno;j++)    {        fscanf(f1,"%lf\t%lf\t%lf\n", &x[j][0],&x[j][1],&x[j][2]);        initial_cond(x[j],v[j],a[j]);        // Rotating the coordiantes and velocities, my test requires a tilted disc of toy particles        rotate(x[j],v[j],pi/2-theta);    }    // Writing down simulation parameters for filenames    snprintf(output_H,9,"%f",log10(Mh));    snprintf(output_TM,9,"%f",log10(Mt_f));    snprintf(output_TR,9,"%f",log10(R_t));    // Setting initial mass of t_force potential to 0, saving the particles as an "initial conditions" file    mt_c=0;    snapshot(name,0,output_H,output_TM,output_TR,x,v,mt_c);    // Main orbit integration    // Prior to this it was a while-loop but from previous work in Python    // I was cautious it would slow computation    for (int k=0;k<timelim;k+=1)    {        // Get current simulation time and t_force mass to use in force calculations        t=k*dt;        mt_c=Mt(t);        // Leapfrog integrator loop        // Prior to this version, I added a "#pragma omp parallel for" but it only slowed the code down        for (int j=0;j<pno;j++)        {            kick(x[j],v[j],a[j],dt,mt_c);            drift(x[j],v[j],dt);            kick(x[j],v[j],a[j],dt,mt_c);        }        // When time meets the condition below - save snapshot.         // I am cautious that this is a bottleneck, but many runs were inconclusive        if (t>CCD*step)        {               snapshot(name,step,output_H,output_TM,output_TR,x,v,mt_c);            step+=1;        }        t+=dt;    }}

Straightforward C code takes a long time to run, any methods of optimising it?

Trending Articles

Bath man appears in court charged with attempted murder of a man...

MACLEAN, Allan

Black Angus Grilled Artichokes

Practice Sheet of Right form of verbs for HSC Students

Police blotter for Jan. 12

99 God Status for Whatsapp, Facebook

Rajasthan Board 12th Science Result 2018 name wise- RBSE 12th commerce result...

Notorious Naushad of Ippa gang nabbed

Child Kidnapping: Amy McNeil was kidnapped on her way to school by 5 adults;...

Sonible Smartlimit v1.1.5-R2R

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

मतलबी दोस्त स्टेट्स | Matlabi Dost Status in Hindi – Selfish Friends Status

Arrow Flash 2 – Sinhala Dubbed – Episode 23 – 20th March 2016

[GET] AI Traffic Goldmine

[E² Plugin] HDF-Radio

Universal Multi-Patch v1.3 By RADIXX11

IWAN – Thanks and Praise ( Throw Back Thursday )

RONALD P SONDERGAARD Arrested by Miami-Dade County Corrections on Mar 03, 2017

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

HSSC Excise & Taxation Inspector Result 2017 Scorecard/ Category Wise Merit List