I am tackling the problem of semantic segmentation of large-resolution metal microstructure images that look similar to this: [Microstructure image to be segmented]. The classes used are particle, particle boundary, and pore (and one or two more that are less relevant here).
No architecture has been chosen yet, but the first try will likely be an architecture similar to U-Net. From the microscope, the images come out at roughly 1200 × 10000 pixels. I would love to hear the opinion of some experienced practitioners on how best to determine the image tile size. Intuitively, I am hesitant to use very small tile sizes like 256 × 256, since I worry the model's performance might suffer if each tile contains too few uncropped particles (as can be seen in the picture). On the other hand, I also don't know whether large sizes like 1200 × 1200 might be problematic, since I see most applications using smaller tiles. Computationally it would of course be more demanding, but the GPU resources I have should be sufficient for tiles of that size. What could be a good initial guess for the tile size?
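To make the tiling question concrete, here is a minimal sketch of how I imagine cutting an image into overlapping tiles (NumPy only; the 512 tile size and 448 stride are placeholder values, not a recommendation):

```python
import numpy as np

def tile_image(img, tile=512, stride=448):
    """Split a (H, W) or (H, W, C) array into overlapping square tiles.

    tile and stride are placeholder values; the overlap (tile - stride)
    helps avoid artifacts at tile borders when predictions are stitched
    back together. This simple version drops the ragged right/bottom
    edges; in practice they would need padding or a shifted last tile.
    """
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, max(h - tile, 0) + 1, stride):
        for x in range(0, max(w - tile, 0) + 1, stride):
            tiles.append(img[y:y + tile, x:x + tile])
    return tiles

# e.g. one raw 1200 x 10000 microscope image
img = np.zeros((1200, 10000), dtype=np.uint8)
tiles = tile_image(img)
```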
I am also unsure what operations can be applied to the images once the labeling has been done. Is it generally possible to apply any transformation to the dataset, as long as the same one is applied to both the image and the label? I know augmentation is possible, but what about downscaling the images to a lower resolution, for example: would that have to happen before labeling?
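What I have in mind is something like the following sketch (NumPy only; `augment_pair` and its parameters are hypothetical), where the same geometric operation is applied to the image and its label mask, and the mask is downscaled by plain subsampling so its integer class IDs stay intact:

```python
import numpy as np

def augment_pair(img, mask, flip=True, downscale=2):
    """Apply the same geometric ops to an image and its label mask.

    Downscaling here is plain subsampling (nearest-neighbour), which
    preserves the mask's integer class IDs; smooth interpolation such
    as bilinear would blend class IDs and should not be used on masks.
    """
    if flip:
        img, mask = img[:, ::-1], mask[:, ::-1]          # horizontal flip
    if downscale > 1:
        img = img[::downscale, ::downscale]              # subsample image
        mask = mask[::downscale, ::downscale]            # subsample mask identically
    return img, mask

# toy example: a 4x4 image and a binary mask derived from it
img = np.arange(16, dtype=np.float32).reshape(4, 4)
mask = (img > 7).astype(np.uint8)
img2, mask2 = augment_pair(img, mask)
```

The check that matters is that `mask2` still lines up with `img2` after the transform, i.e. image and label stay pixel-aligned.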
And lastly: right now the images do not have matching dimensions ([Raw image dimensions]). Should they be labeled in this form, or should I crop them to e.g. 1200 × 1200 squares before setting up the labeling job for those images?
I would greatly appreciate any help.