Merging two pandas DataFrames without a primary key, using the latest date instead

I have two pandas DataFrames that look like this:

df1 records each student's mock exam score and the mock exam date:

ID    Mock_Date     Student_ID    Mock_score
1     14/3/2020     792           213
2     9/5/2020      792           437
3     17/8/2020     792           435
4     4/1/2022      14598         112312
5     29/12/2022    14350         4325
6     3/10/2019     621           523
7     12/8/2020     621           876
8     5/5/2022      621           4324
9     6/9/2022      621           5432
10    6/3/2022      455           34

df2 records each student's actual exam score and the exam date:

Student_ID    Date         Score
324           14/2/2019    543
792           14/2/2019    9785
792           3/11/2019    7690
621           3/11/2019    324
12            16/3/2020    34234
792           16/3/2020    4235
14598         16/3/2020    975
792           9/5/2020     427
792           17/8/2020    876
621           17/8/2020    986

I want to merge df1 into df2 using the following logic: for each row in df2 (an actual exam score of a particular student), use the row from df1 for the same student with the latest mock exam date strictly before the actual exam date; if no such row exists, fill the mock columns with NaN. The desired output looks like this:

Student_ID    Date         Score    Mock_Date    Mock_score
324           14/2/2019    543      NaN          NaN
792           14/2/2019    9785     NaN          NaN
792           3/11/2019    7690     NaN          NaN
621           3/11/2019    324      3/10/2019    523     #last occurrence before 3/11 is 3/10
12            16/3/2020    34234    NaN          NaN
792           16/3/2020    4235     14/3/2020    213     #last occurrence before 16/3 is 14/3
14598         16/3/2020    975      NaN          NaN
792           9/5/2020     427      14/3/2020    213     #last occurrence before 9/5 is 14/3
792           17/8/2020    876      9/5/2020     437     #last occurrence before 17/8 is 9/5
621           17/8/2020    986      12/8/2020    876

I have no idea even how to start. Thanks in advance.
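
One possible starting point, offered as a sketch rather than a definitive answer: the "closest earlier date per student" logic described above looks like what pandas' `merge_asof` is meant for. The code below rebuilds the two sample frames from the question and assumes that the dates are day-first (`%d/%m/%Y`) and that "before" means strictly before, so a mock taken on the exam date itself is not matched (hence `allow_exact_matches=False`). The variable name `merged` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "ID": range(1, 11),
    "Mock_Date": ["14/3/2020", "9/5/2020", "17/8/2020", "4/1/2022", "29/12/2022",
                  "3/10/2019", "12/8/2020", "5/5/2022", "6/9/2022", "6/3/2022"],
    "Student_ID": [792, 792, 792, 14598, 14350, 621, 621, 621, 621, 455],
    "Mock_score": [213, 437, 435, 112312, 4325, 523, 876, 4324, 5432, 34],
})
df2 = pd.DataFrame({
    "Student_ID": [324, 792, 792, 621, 12, 792, 14598, 792, 792, 621],
    "Date": ["14/2/2019", "14/2/2019", "3/11/2019", "3/11/2019", "16/3/2020",
             "16/3/2020", "16/3/2020", "9/5/2020", "17/8/2020", "17/8/2020"],
    "Score": [543, 9785, 7690, 324, 34234, 4235, 975, 427, 876, 986],
})

# Assumption: dates are day/month/year. merge_asof needs real datetimes.
df1["Mock_Date"] = pd.to_datetime(df1["Mock_Date"], format="%d/%m/%Y")
df2["Date"] = pd.to_datetime(df2["Date"], format="%d/%m/%Y")

# Both frames must be sorted by their date key. A stable sort keeps the
# original relative order of rows that share the same date.
left = df2.sort_values("Date", kind="stable")
right = df1.sort_values("Mock_Date", kind="stable")[
    ["Student_ID", "Mock_Date", "Mock_score"]
]

merged = pd.merge_asof(
    left,
    right,
    left_on="Date",
    right_on="Mock_Date",
    by="Student_ID",            # only match rows of the same student
    direction="backward",       # look for the latest mock on or before the exam date
    allow_exact_matches=False,  # assumption: exclude mocks on the exam date itself
)

print(merged)
```

On the sample data this should give the same Mock_Date and Mock_score values as the desired output above, with NaT/NaN where a student has no earlier mock. If a mock taken on the exam date itself should count, dropping `allow_exact_matches=False` would be the change to try.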