I have two pandas DataFrames that look like this:
df1 records the students, their mock exam scores, and the mock exam dates:
ID  Mock_Date   Student_ID  Mock_score
1   14/3/2020   792         213
2   9/5/2020    792         437
3   17/8/2020   792         435
4   4/1/2022    14598       112312
5   29/12/2022  14350       4325
6   3/10/2019   621         523
7   12/8/2020   621         876
8   5/5/2022    621         4324
9   6/9/2022    621         5432
10  6/3/2022    455         34
df2 records the students, their actual exam scores, and the exam dates:
Student_ID  Date       Score
324         14/2/2019  543
792         14/2/2019  9785
792         3/11/2019  7690
621         3/11/2019  324
12          16/3/2020  34234
792         16/3/2020  4235
14598       16/3/2020  975
792         9/5/2020   427
792         17/8/2020  876
621         17/8/2020  986
I want to merge df1 with df2 using the following logic: for each row in df2 (the actual exam score of a particular student), use the row from df1 for the same student whose mock exam date is the closest date strictly before the actual exam date. If no such row exists, fill in NaN. So the desired output looks like:
Student_ID  Date       Score  Mock_Date  Mock_score
324         14/2/2019  543    NaN        NaN
792         14/2/2019  9785   NaN        NaN
792         3/11/2019  7690   NaN        NaN
621         3/11/2019  324    3/10/2019  523         #last occurrence before 3/11 is 3/10
12          16/3/2020  34234  NaN        NaN
792         16/3/2020  4235   14/3/2020  213         #last occurrence before 16/3 is 14/3
14598       16/3/2020  975    NaN        NaN
792         9/5/2020   427    14/3/2020  213         #last occurrence before 9/5 is 14/3
792         17/8/2020  876    9/5/2020   437         #last occurrence before 17/8 is 9/5
621         17/8/2020  986    12/8/2020  876
I have no idea how to even start. Thanks in advance.
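
One possible starting point for the logic described above is pandas' `merge_asof`, which performs a "nearest key" join within groups. The sketch below is not a definitive solution. It assumes the dates are day-first strings (`d/m/Y`), that "just before" means strictly earlier (which is what the 9/5/2020 row in the desired output suggests), and that the original row order of df2 should be kept. The variable names `left`, `right`, and `merged` are illustrative only.

```python
import pandas as pd

# Sample data copied from the tables above.
df1 = pd.DataFrame({
    "ID": range(1, 11),
    "Mock_Date": ["14/3/2020", "9/5/2020", "17/8/2020", "4/1/2022", "29/12/2022",
                  "3/10/2019", "12/8/2020", "5/5/2022", "6/9/2022", "6/3/2022"],
    "Student_ID": [792, 792, 792, 14598, 14350, 621, 621, 621, 621, 455],
    "Mock_score": [213, 437, 435, 112312, 4325, 523, 876, 4324, 5432, 34],
})
df2 = pd.DataFrame({
    "Student_ID": [324, 792, 792, 621, 12, 792, 14598, 792, 792, 621],
    "Date": ["14/2/2019", "14/2/2019", "3/11/2019", "3/11/2019", "16/3/2020",
             "16/3/2020", "16/3/2020", "9/5/2020", "17/8/2020", "17/8/2020"],
    "Score": [543, 9785, 7690, 324, 34234, 4235, 975, 427, 876, 986],
})

# Assumption: dates are day-first; parse them so they compare chronologically.
df1["Mock_Date"] = pd.to_datetime(df1["Mock_Date"], format="%d/%m/%Y")
df2["Date"] = pd.to_datetime(df2["Date"], format="%d/%m/%Y")

# merge_asof requires both frames to be sorted by their date keys.
# reset_index() keeps df2's original row order in an "index" column.
left = df2.reset_index().sort_values("Date", kind="stable")
right = df1.sort_values("Mock_Date")[["Student_ID", "Mock_Date", "Mock_score"]]

merged = pd.merge_asof(
    left,
    right,
    left_on="Date",
    right_on="Mock_Date",
    by="Student_ID",            # only match mocks of the same student
    direction="backward",       # look for the latest mock at or before the exam...
    allow_exact_matches=False,  # ...and exclude same-day mocks (strictly before)
)

# Restore df2's original row order and drop the helper column.
merged = merged.sort_values("index").drop(columns="index").reset_index(drop=True)
print(merged)
```

Rows without an earlier mock exam come back with `NaT` in `Mock_Date` and `NaN` in `Mock_score`, so `Mock_score` becomes a float column. If a mock taken on the exam day itself should count, dropping `allow_exact_matches=False` would include it.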