The main issue with the partitioning approach in Spring Batch is sharing data between the manager (master) instance and the worker (slave) instances. The manager instance splits the eligible data into partitions and sends some information, such as the partition ID and metadata about the partitioned data, to the workers. The workers then share the partitioned step between them: a single step (read, process, write) that one instance could run on its own is split up and executed by several workers. The metadata the workers need to start their steps is therefore important, and how that metadata is shared is a main concern.

Spring Batch uses its own tables for this. The partition information and anything else we need to complete the batch processing is stored in the `ExecutionContext`, which is persisted to the related batch tables as central storage, and the workers read this data from the tables and use it in their step stages. For example, we put the partition number, min ID, and max ID into the `ExecutionContext` (and therefore into the related table) and fetch them during the batch partitioning process.
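To make the current setup concrete, here is a minimal sketch of how the manager could compute the min ID and max ID for each partition. It is plain Java with no Spring classes: the class name `RangePartitionSketch` and the key names `minId` and `maxId` are only illustrative, and the nested map is a stand-in for the `Map<String, ExecutionContext>` that a Spring Batch `Partitioner` returns. It also assumes the IDs are numeric and roughly contiguous.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Spring Batch Partitioner: plain maps replace ExecutionContext.
class RangePartitionSketch {

    // Splits the inclusive ID range [minId, maxId] into at most gridSize contiguous ranges.
    static Map<String, Map<String, Long>> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be at least 1");
        }
        Map<String, Map<String, Long>> partitions = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long rangeSize = (total + gridSize - 1) / gridSize; // ceiling division
        long start = minId;
        int number = 0;
        while (start <= maxId) {
            long end = Math.min(start + rangeSize - 1, maxId);
            // Only these two small values per partition would be persisted for the worker.
            Map<String, Long> context = new LinkedHashMap<>();
            context.put("minId", start);
            context.put("maxId", end);
            partitions.put("partition" + number, context);
            start = end + 1;
            number++;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Prints one line per partition, starting with: partition0 -> {minId=1, maxId=25}
        partition(1, 100, 4).forEach((name, ctx) -> System.out.println(name + " -> " + ctx));
    }
}
```

Each worker receives only two numbers, which is why this variant stores so little, but it also means each worker has to run its own query for the rows in its range.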
If we want to pass a list of IDs instead, we have to put all the IDs of each partition into the `ExecutionContext` object, which means they are stored in the Spring Batch step execution context table. A partition can easily contain thousands of IDs or more, and storing that much data there is not feasible.
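As a rough illustration of the size concern, the sketch below splits a list of IDs into partitions and joins each partition into a single comma-separated string. This is plain Java; the class `IdListPartitionSketch` and both method names are hypothetical, and storing the list as one string value is only an assumption about how the IDs might be placed in the `ExecutionContext`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical helper, not part of Spring Batch.
class IdListPartitionSketch {

    // Splits IDs that were fetched once into at most gridSize consecutive sublists.
    static List<List<Long>> split(List<Long> ids, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be at least 1");
        }
        List<List<Long>> partitions = new ArrayList<>();
        int chunk = (ids.size() + gridSize - 1) / gridSize; // ceiling division
        for (int from = 0; from < ids.size(); from += chunk) {
            int to = Math.min(from + chunk, ids.size());
            partitions.add(new ArrayList<>(ids.subList(from, to)));
        }
        return partitions;
    }

    // Joins one partition's IDs into the single string value that would have to be stored.
    static String toContextValue(List<Long> ids) {
        return ids.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // 100,000 made-up IDs, standing in for the result of the single query.
        List<Long> ids = LongStream.rangeClosed(1, 100_000).boxed().collect(Collectors.toList());
        for (List<Long> part : split(ids, 4)) {
            System.out.println(part.size() + " ids -> "
                    + toContextValue(part).length() + " characters to persist");
        }
    }
}
```

The stored value grows linearly with the number of IDs in a partition, whereas the min ID and max ID pair stays the same size no matter how many rows a partition covers.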
The second approach is to publish these IDs to a dedicated topic on the Kafka broker. For this we would need to develop custom code to produce and consume the messages, and we might have to bypass Spring Batch's own facilities. The crucial problem with this approach, though, is the volume of data that would have to be put into Kafka.
Neither approach is ideal for me. My question is: how can I query the data once, split it into partitions, and send that data to the workers so that I don't have to query again in the item reader step? Right now I pass `minId` and `maxId` to the workers, and in the item reader step I use these values to query again and fetch the related data. I think it would be perfect if I could pass a list of IDs for each partition instead of `minId` and `maxId`, wouldn't it?