Introduction
The primary question I am asking here is: "Why does System::arraycopy feel so slow?" Arrays::copyOf is also mentioned, since (as noted in the background) my version of Java uses System::arraycopy in its implementation.
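The delegation described above can be sketched as follows. This is a hypothetical helper with the same overall shape (allocate, then bulk-copy with System.arraycopy), not a copy of the JDK source; the real Arrays.copyOf may differ in detail, and the class name CopyOfSketch is made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper illustrating a copyOf built on System.arraycopy.
// It is not the JDK source; the real Arrays.copyOf may contain extra shortcuts.
final class CopyOfSketch {
    static int[] copyOf(int[] original, int newLength) {
        // Allocate a zero-filled array of the requested length...
        int[] copy = new int[newLength];
        // ...then bulk-copy as many elements as both arrays can hold.
        System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
        return copy;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3};
        System.out.println(Arrays.toString(copyOf(src, 5)));                       // [1, 2, 3, 0, 0]
        System.out.println(Arrays.equals(copyOf(src, 2), Arrays.copyOf(src, 2)));  // true
    }
}
```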
Background
I was developing a fast method for creating deep copies of an array in Java. I did some tests to confirm that Arrays.copyOf(array, length) and Object::clone() were equivalent in speed (https://stackoverflow.com/a/15962949/21941522). After a bit of research I realized that my JDK's (Eclipse Adoptium jdk-21.0.1.12-hotspot) implementation of Arrays::copyOf was basically just a fancy call to System::arraycopy, so I started using that instead. After about 1.5 hours of coding and testing I ended up with the following (NOTE: Lines 17-19 will become very important later):
    import java.lang.reflect.Array;

    class ArrayDeepCopy {

        /** Returns a deep copy of {@code array}, or {@code null} if it is {@code null}. */
        public static Object deepCopy(Object array) {
            if (array == null) {
                return null;
            }
            if (!array.getClass().isArray()) {
                throw new IllegalArgumentException("Argument is not an array");
            }
            return deepCopyV5(array);
        }

        static Object deepCopyV5(Object array) {
            final Class<?> componentClazz = array.getClass().componentType();
            final int length = Array.getLength(array);
            if (!componentClazz.isArray()) {
                // Innermost level: primitives or references, copied shallowly.
                Object ret = Array.newInstance(componentClazz, length); //Line 17
                System.arraycopy(array, 0, ret, 0, length);             //Line 18
                return ret;                                             //Line 19
            } else {
                // Nested arrays: allocate the outer array and deep-copy each element.
                Object[] retObjectArray = (Object[]) Array.newInstance(componentClazz, length);
                Object[] oObjectArray = (Object[]) array;
                for (int i = 0; i < length; i++) {
                    retObjectArray[i] = deepCopy(oObjectArray[i]); // deepCopy also handles null rows
                }
                return retObjectArray;
            }
        }
    }
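The recursive branch matters because clone() (like a single System::arraycopy call) copies only the outer level of a nested array. A minimal sketch using only JDK calls shows the difference for an int[][]; the class ShallowVsDeep and its helper deepCopyRows are hypothetical and only handle the two-level case.

```java
// Shows that clone() on an int[][] copies only the outer array:
// the rows are shared, so a deep copy must copy each row as well.
final class ShallowVsDeep {
    static int[][] deepCopyRows(int[][] original) {
        int[][] copy = original.clone();   // new outer array, same row references
        for (int i = 0; i < copy.length; i++) {
            // Replace each shared row with its own copy (null rows stay null).
            copy[i] = copy[i] == null ? null : copy[i].clone();
        }
        return copy;
    }

    public static void main(String[] args) {
        int[][] original = {{1, 2}, {3, 4}};

        int[][] shallow = original.clone();
        shallow[0][0] = 99;                      // also changes original[0][0]
        System.out.println(original[0][0]);      // 99

        original[0][0] = 1;
        int[][] deep = deepCopyRows(original);
        deep[0][0] = 99;                         // original is untouched
        System.out.println(original[0][0]);      // 1
    }
}
```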
Out of curiosity I wrote the following helper method as an alternative means of shallow copying arrays:
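    /**
     * @apiNote this function has no type or length safety. Use with discretion and care.
     */
    // Lives in ArrayDeepCopy, which imports java.lang.reflect.Array.
    private static Object dumbCopy(Object array, Class<?> componentType, int length) {
        // Dispatch on the component type name; each branch copies element by element.
        switch (componentType.getName()) {
            case "boolean":
                boolean[] retBooleanArray = new boolean[length];
                boolean[] oBooleanArray = (boolean[]) array;
                for (int i = 0; i < length; i++) retBooleanArray[i] = oBooleanArray[i];
                return retBooleanArray;
            case "byte":
                byte[] retByteArray = new byte[length];
                byte[] oByteArray = (byte[]) array;
                for (int i = 0; i < length; i++) retByteArray[i] = oByteArray[i];
                return retByteArray;
            case "char":
                char[] retCharArray = new char[length];
                char[] oCharArray = (char[]) array;
                for (int i = 0; i < length; i++) retCharArray[i] = oCharArray[i];
                return retCharArray;
            case "short":
                short[] retShortArray = new short[length];
                short[] oShortArray = (short[]) array;
                for (int i = 0; i < length; i++) retShortArray[i] = oShortArray[i];
                return retShortArray;
            case "int":
                int[] retIntegerArray = new int[length];
                int[] oIntegerArray = (int[]) array;
                for (int i = 0; i < length; i++) retIntegerArray[i] = oIntegerArray[i];
                return retIntegerArray;
            case "long":
                long[] retLongArray = new long[length];
                long[] oLongArray = (long[]) array;
                for (int i = 0; i < length; i++) retLongArray[i] = oLongArray[i];
                return retLongArray;
            case "float":
                float[] retFloatArray = new float[length];
                float[] oFloatArray = (float[]) array;
                for (int i = 0; i < length; i++) retFloatArray[i] = oFloatArray[i];
                return retFloatArray;
            case "double":
                double[] retDoubleArray = new double[length];
                double[] oDoubleArray = (double[]) array;
                for (int i = 0; i < length; i++) retDoubleArray[i] = oDoubleArray[i];
                return retDoubleArray;
            default:
                // Any reference component type: copy the references.
                Object[] retObjectArray = (Object[]) Array.newInstance(componentType, length);
                Object[] oObjectArray = (Object[]) array;
                for (int i = 0; i < length; i++) retObjectArray[i] = oObjectArray[i];
                return retObjectArray;
        }
    }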
I then replaced Lines 17-19 with the single statement return dumbCopy(array, componentClazz, length);
Strange Findings
This change made my algorithm run 1.8 times faster. I then tested a type-safe version of dumbCopy against Arrays::copyOf; even with all the added checks I still got slightly better performance.
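The type-safe version of dumbCopy is not shown in the question. As a hypothetical sketch only, assuming Java 21 pattern matching for switch, such a version could dispatch on the runtime array type instead of trusting caller-supplied componentType and length arguments. The class name TypeSafeCopy is made up for illustration.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

// Hypothetical type-safe counterpart to dumbCopy (not the author's code).
// Requires Java 21: pattern matching for switch.
final class TypeSafeCopy {

    /** Shallow copy of any array; returns null for null and rejects non-arrays. */
    static Object shallowCopy(Object array) {
        return switch (array) {
            case null -> null;
            case boolean[] a -> { boolean[] r = new boolean[a.length]; for (int i = 0; i < a.length; i++) r[i] = a[i]; yield r; }
            case byte[] a -> { byte[] r = new byte[a.length]; for (int i = 0; i < a.length; i++) r[i] = a[i]; yield r; }
            case char[] a -> { char[] r = new char[a.length]; for (int i = 0; i < a.length; i++) r[i] = a[i]; yield r; }
            case short[] a -> { short[] r = new short[a.length]; for (int i = 0; i < a.length; i++) r[i] = a[i]; yield r; }
            case int[] a -> { int[] r = new int[a.length]; for (int i = 0; i < a.length; i++) r[i] = a[i]; yield r; }
            case long[] a -> { long[] r = new long[a.length]; for (int i = 0; i < a.length; i++) r[i] = a[i]; yield r; }
            case float[] a -> { float[] r = new float[a.length]; for (int i = 0; i < a.length; i++) r[i] = a[i]; yield r; }
            case double[] a -> { double[] r = new double[a.length]; for (int i = 0; i < a.length; i++) r[i] = a[i]; yield r; }
            case Object[] a -> {
                // Keep the runtime component type (a String[] stays a String[]);
                // the references are copied, not the objects they point to.
                Object[] r = (Object[]) Array.newInstance(a.getClass().componentType(), a.length);
                for (int i = 0; i < a.length; i++) r[i] = a[i];
                yield r;
            }
            default -> throw new IllegalArgumentException("Not an array: " + array.getClass().getName());
        };
    }

    public static void main(String[] args) {
        int[] ints = {1, 2, 3};
        int[] copy = (int[]) shallowCopy(ints);
        System.out.println(Arrays.toString(copy) + " " + (copy != ints)); // [1, 2, 3] true
    }
}
```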
However, this comparison was somewhat biased and unfair, since Arrays::copyOf and what I was using do not do the same thing. So the most logical next step was to implement my own version of System::arraycopy.
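One detail any re-implementation of System::arraycopy has to get right is the bounds check: the naive test pos + length > arrayLength can overflow int and wrongly pass. A minimal sketch of an overflow-safe check follows; RangeCheck.checkRange is a hypothetical helper, not a JDK API, and the same check would be applied to both the source and destination ranges.

```java
// Hypothetical helper (not part of the JDK): bounds check for an
// arraycopy-style range that cannot overflow. The naive form
// "pos + length > arrayLength" wraps around to a negative number when
// pos + length exceeds Integer.MAX_VALUE, and then wrongly passes.
final class RangeCheck {
    static void checkRange(int arrayLength, int pos, int length) {
        if (length < 0)
            throw new ArrayIndexOutOfBoundsException("length " + length + " is negative");
        if (pos < 0)
            throw new ArrayIndexOutOfBoundsException("index " + pos + " out of bounds for length " + arrayLength);
        // arrayLength - pos cannot overflow because both operands are non-negative here.
        if (length > arrayLength - pos)
            throw new ArrayIndexOutOfBoundsException(
                    "last index " + ((long) pos + length) + " out of bounds for length " + arrayLength);
    }

    public static void main(String[] args) {
        int pos = 2, length = Integer.MAX_VALUE;
        System.out.println(pos + length > 10);   // false: the sum overflowed
        try {
            checkRange(10, pos, length);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rejected");      // the safe check catches it
        }
    }
}
```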
Stranger Findings
So I carefully wrote a method that closely mimics System::arraycopy's behaviour and error handling. This is what I came up with:
    import java.lang.reflect.Array;

    class MyArrayCopy {
        /**
         * Plain-Java re-implementation of {@link System#arraycopy}.
         *
         * @param src     the source array.
         * @param srcPos  starting position in the source array.
         * @param dest    the destination array.
         * @param destPos starting position in the destination data.
         * @param length  the number of array elements to be copied.
         * @throws IndexOutOfBoundsException if copying would cause
         *         access of data outside array bounds.
         * @throws ArrayStoreException if an element in the {@code src}
         *         array could not be stored into the {@code dest} array
         *         because of a type mismatch.
         * @throws NullPointerException if either {@code src} or
         *         {@code dest} is {@code null}.
         */
        public static void arraycopy(Object src, int srcPos, Object dest, int destPos, int length) {
            if (src == null || dest == null) throw new NullPointerException();
            if (!src.getClass().isArray())
                throw new ArrayStoreException("arraycopy: source type " + src.getClass().getName() + " is not an array");
            if (!dest.getClass().isArray())
                throw new ArrayStoreException("arraycopy: destination type " + dest.getClass().getName() + " is not an array");
            if (length < 0)
                throw new ArrayIndexOutOfBoundsException("arraycopy: length " + length + " is negative");
            final int srcLength = Array.getLength(src);
            final int destLength = Array.getLength(dest);
            // Upper bounds are checked as "length > arrayLength - pos" so that pos + length cannot overflow.
            if (srcPos < 0)
                throw new ArrayIndexOutOfBoundsException("arraycopy: source index " + srcPos + " out of bounds for " + describe(src, srcLength));
            if (length > srcLength - srcPos)
                throw new ArrayIndexOutOfBoundsException("arraycopy: last source index " + ((long) srcPos + length) + " out of bounds for " + describe(src, srcLength));
            if (destPos < 0)
                throw new ArrayIndexOutOfBoundsException("arraycopy: destination index " + destPos + " out of bounds for " + describe(dest, destLength));
            if (length > destLength - destPos)
                throw new ArrayIndexOutOfBoundsException("arraycopy: last destination index " + ((long) destPos + length) + " out of bounds for " + describe(dest, destLength));
            Class<?> componentType = src.getClass().componentType();
            // Same array with the destination range starting after the source range:
            // copy through a temporary array first, as System.arraycopy's contract requires.
            if (src == dest && srcPos < destPos && length > 0) {
                Object tmp = Array.newInstance(componentType, length);
                arraycopy(src, srcPos, tmp, 0, length);
                src = tmp;
                srcPos = 0;
            }
            int i = -1;
            try {
                // Cast both arrays to the source's array type and copy element by element;
                // a failed cast means the component types are incompatible.
                switch (componentType.getName()) {
                    case "boolean":
                        boolean[] retBooleanArray = (boolean[]) dest;
                        boolean[] oBooleanArray = (boolean[]) src;
                        while (++i < length) retBooleanArray[destPos + i] = oBooleanArray[srcPos + i];
                        return;
                    case "byte":
                        byte[] retByteArray = (byte[]) dest;
                        byte[] oByteArray = (byte[]) src;
                        while (++i < length) retByteArray[destPos + i] = oByteArray[srcPos + i];
                        return;
                    case "char":
                        char[] retCharArray = (char[]) dest;
                        char[] oCharArray = (char[]) src;
                        while (++i < length) retCharArray[destPos + i] = oCharArray[srcPos + i];
                        return;
                    case "short":
                        short[] retShortArray = (short[]) dest;
                        short[] oShortArray = (short[]) src;
                        while (++i < length) retShortArray[destPos + i] = oShortArray[srcPos + i];
                        return;
                    case "int":
                        int[] retIntegerArray = (int[]) dest;
                        int[] oIntegerArray = (int[]) src;
                        while (++i < length) retIntegerArray[destPos + i] = oIntegerArray[srcPos + i];
                        return;
                    case "long":
                        long[] retLongArray = (long[]) dest;
                        long[] oLongArray = (long[]) src;
                        while (++i < length) retLongArray[destPos + i] = oLongArray[srcPos + i];
                        return;
                    case "float":
                        float[] retFloatArray = (float[]) dest;
                        float[] oFloatArray = (float[]) src;
                        while (++i < length) retFloatArray[destPos + i] = oFloatArray[srcPos + i];
                        return;
                    case "double":
                        double[] retDoubleArray = (double[]) dest;
                        double[] oDoubleArray = (double[]) src;
                        while (++i < length) retDoubleArray[destPos + i] = oDoubleArray[srcPos + i];
                        return;
                    default:
                        Object[] retObjectArray = (Object[]) dest;
                        Object[] oObjectArray = (Object[]) src;
                        while (++i < length) retObjectArray[destPos + i] = oObjectArray[srcPos + i];
                }
            } catch (ClassCastException e) {
                throw new ArrayStoreException("arraycopy: type mismatch: can not copy "
                        + componentType.getName() + "[] into " + dest.getClass().componentType().getName() + "[]");
            }
        }

        // Formats an array for the error messages above, e.g. "int[10]".
        private static String describe(Object array, int length) {
            return array.getClass().componentType().getName() + "[" + length + "]";
        }
    }
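The System::arraycopy documentation states that when src and dest refer to the same array, copying behaves as if the source range were first copied to a temporary array. A plain element-by-element forward loop does not behave that way for overlapping ranges, so a faithful mimic has to treat this case explicitly. The small sketch below (class and method names are hypothetical) shows the difference:

```java
import java.util.Arrays;

// Compares System.arraycopy with a naive forward loop when source and
// destination are the same array and the ranges overlap.
final class OverlapDemo {
    // Naive element-by-element forward copy.
    static void forwardCopy(int[] src, int srcPos, int[] dest, int destPos, int length) {
        for (int i = 0; i < length; i++) dest[destPos + i] = src[srcPos + i];
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        System.arraycopy(a, 0, a, 1, 4);
        System.out.println(Arrays.toString(a));   // [1, 1, 2, 3, 4]

        int[] b = {1, 2, 3, 4, 5};
        forwardCopy(b, 0, b, 1, 4);
        System.out.println(Arrays.toString(b));   // [1, 1, 1, 1, 1]
    }
}
```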
After repeated testing on my machine I found less than 5% variance in measured performance between System::arraycopy and MyArrayCopy::arraycopy (for fastest, slowest and mean times). I found this surprising, because System::arraycopy is a native method annotated @IntrinsicCandidate with no Java body, which I took to mean it is implemented in hand-written assembly.
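For reference, reflection confirms that System::arraycopy is declared native, i.e. it has no Java body. Note that @IntrinsicCandidate is documented as HotSpot-specific and only marks a method that may be (but is not guaranteed to be) intrinsified, so reflection cannot tell whether an intrinsic is actually used at run time. A minimal sketch (the class name NativeCheck is hypothetical):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Checks via reflection whether System.arraycopy is declared native
// (implemented outside Java, so it has no Java method body).
final class NativeCheck {
    static boolean isArraycopyNative() {
        try {
            Method m = System.class.getMethod("arraycopy",
                    Object.class, int.class, Object.class, int.class, int.class);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isArraycopyNative()); // true
    }
}
```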
This leads me to my core question: why is System::arraycopy so slow? From my understanding, a method implemented in low-level machine code should (in theory) be significantly faster than my high-level Java code. I have neither the technical knowledge nor the proficiency in assembly to understand why System::arraycopy and MyArrayCopy::arraycopy perform comparably, and this does not make sense to me.
Benchmarking
Here is the (JMH) micro-benchmark I used:
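    import java.util.Arrays;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.annotations.Warmup;

    /**
     * Based on <a href="https://www.baeldung.com/java-system-arraycopy-arrays-copyof-performance">...</a>.
     */
    @BenchmarkMode(Mode.AverageTime)
    @State(Scope.Thread)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Warmup(iterations = 8, time = 1)
    @Fork(1)
    @Measurement(iterations = 32, time = 1)
    public class Main {
        // small array, medium size array, large array
        @Param({"16", "1048576", "268435456"})
        public int SIZE;

        private int[] src;
        private final static Random RNG = new Random();

        // Fill the source array with random values once per trial.
        @Setup
        public void setup() {
            src = new int[SIZE];
            for (int i = 0; i < SIZE; i++) {
                src[i] = RNG.nextInt();
            }
        }

        // Each benchmark allocates a fresh target array and copies src into it.
        @Benchmark
        public int[] systemArrayCopyBenchmark() {
            int[] target = new int[SIZE];
            System.arraycopy(src, 0, target, 0, SIZE);
            return target;
        }

        @Benchmark
        public int[] arraysCopyOfBenchmark() {
            return Arrays.copyOf(src, SIZE);
        }

        @Benchmark
        public int[] myArrayCopyBenchmark() {
            int[] target = new int[SIZE];
            MyArrayCopy.arraycopy(src, 0, target, 0, SIZE);
            return target;
        }

        public static void main(String[] args) throws Exception {
            org.openjdk.jmh.Main.main(args);
        }
    }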
I used JMH version 1.37 and VM version JDK 21.0.1, OpenJDK 64-Bit Server VM, 21.0.1+12-LTS.
These are the results:
    Benchmark                         (SIZE)  Mode  Cnt          Score          Error  Units
    Main.arraysCopyOfBenchmark            16  avgt   32         13,098 ±        0,613  ns/op
    Main.arraysCopyOfBenchmark       1048576  avgt   32     512901,199 ±    18530,241  ns/op
    Main.arraysCopyOfBenchmark     268435456  avgt   32  314768066,667 ± 12220991,158  ns/op
    Main.myArrayCopyBenchmark             16  avgt   32         19,256 ±        1,239  ns/op
    Main.myArrayCopyBenchmark        1048576  avgt   32    1046029,690 ±    63247,568  ns/op
    Main.myArrayCopyBenchmark      268435456  avgt   32  295916640,625 ±  8129828,629  ns/op
    Main.systemArrayCopyBenchmark         16  avgt   32         14,097 ±        0,856  ns/op
    Main.systemArrayCopyBenchmark    1048576  avgt   32     606517,109 ±    75605,950  ns/op
    Main.systemArrayCopyBenchmark  268435456  avgt   32  259920011,250 ±  8760026,025  ns/op
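To put the table in relative terms, each mean score can be divided by the System::arraycopy score for the same SIZE. This sketch only copies the mean values from the table (decimal commas written as points) and ignores the error columns, so it is not a statistical comparison; the class name ScoreRatios is hypothetical.

```java
// Ratios of the mean scores (ns/op) from the JMH table, relative to
// System.arraycopy at the same SIZE. A ratio below 1.0 means faster
// than System.arraycopy; error margins are ignored.
final class ScoreRatios {
    static final int[] SIZES = {16, 1_048_576, 268_435_456};
    static final double[] SYSTEM = {14.097, 606517.109, 259920011.250};
    static final double[] COPY_OF = {13.098, 512901.199, 314768066.667};
    static final double[] MINE = {19.256, 1046029.690, 295916640.625};

    static double ratio(double score, double baseline) {
        return score / baseline;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("SIZE=%d  copyOf/system=%.3f  mine/system=%.3f%n",
                    SIZES[i], ratio(COPY_OF[i], SYSTEM[i]), ratio(MINE[i], SYSTEM[i]));
        }
    }
}
```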
Edits:
- corrected "digression" to "discretion", added Benchmarking section, edited language used to be more concise and less opinionated/emotional, removed sub-question on asking for suggestions on how to improve my deepcopy algorithm, updated benchmarking.