I have a data set with 0-1000ish patients, each with a unique identifier and each with a bunch of different entries. I am very new to pandas, and to Python for that matter, and I am wondering how I could use pandas to make sure there are no duplicate patient IDs. I can't use .groupby, as this would miss any duplicates by just shoving them all into one big group. Could one use .is_unique? My thought is that because there are multiple entries for each patient, that won't work either, as every ID legitimately repeats hundreds of times.
Another feature of the data is timestep: when timestep = 0, this indicates a new patient and a new patient timeline.
So basically I am trying to use pandas to go through the data and find any cases where the patient ID has been used before and the timestep is also equal to zero, as this would indicate a mistake in the patient ID allocation and thus a duplicate patient.
I'm just not sure how to implement pandas in this scenario, so any advice would be much appreciated!
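To make the check described above concrete, here is a minimal sketch of one way it might look. It assumes the columns are called patient_id and timestep (both names are made up for illustration) and uses toy data, so it is only a starting point and not necessarily the right approach. The idea is that each patient should have exactly one row with timestep equal to zero, so any ID with more than one such row has been allocated twice.

```python
import pandas as pd

# Hypothetical toy data; the column names "patient_id" and "timestep"
# are assumptions, not the real names in the data set.
df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "timestep":   [0, 1, 2, 0, 1, 0, 1, 0, 1],
})

# Keep only the rows that start a timeline (timestep == 0).
starts = df[df["timestep"] == 0]

# Count how many timelines start under each patient ID.
start_counts = starts["patient_id"].value_counts()

# Any ID with more than one start has been used for two timelines.
duplicate_ids = sorted(start_counts[start_counts > 1].index.tolist())
print(duplicate_ids)
```

In this toy data, patient 3 has two rows with timestep 0, so it is the only ID flagged. To see the offending rows instead of just the IDs, starts[starts["patient_id"].duplicated(keep=False)] should do it.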
The data is sorted by ascending patient ID, so I created a loop that essentially returns False if it finds an ID that is not greater than or equal to the prior ID! A really bad way of doing this, I know, but I actually made a bunch of test functions and it worked pretty well. But yeah, I feel like this is bad coding and I should learn how to do it properly.
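For reference, a rough sketch of that kind of loop, assuming the IDs are available as a plain sequence. The function name ids_are_sorted is hypothetical and this is a reconstruction of the idea, not the exact code. The last line shows what I believe is the pandas equivalent, the is_monotonic_increasing property, which also allows equal neighbouring values.

```python
import pandas as pd

def ids_are_sorted(ids):
    """Return False as soon as an ID is smaller than the one before it."""
    previous = None
    for current in ids:
        if previous is not None and current < previous:
            return False
        previous = current
    return True

# Hypothetical example IDs, repeated because each patient has many entries.
ids = [1, 1, 2, 2, 3]

print(ids_are_sorted(ids))        # sorted, repeats allowed
print(ids_are_sorted([1, 3, 2]))  # 2 comes after 3, so not sorted

# The same check without an explicit loop, using pandas.
print(pd.Series(ids).is_monotonic_increasing)
```

Note that this only confirms the IDs never go backwards; on its own it does not say whether an ID was given to two different patients.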