With a dataframe that has a datetime index, I am used to selecting a quarter with syntax such as df.loc["2014-Q1"]
to grab the data for the first quarter of 2014 (Jan, Feb, Mar).
This works in most cases, but I ran into a problem when I used it on a resampled dataframe. I am unsure whether this is expected behaviour from pandas or a corner-case bug.
I am using pandas 2.1.1 in python 3.12.
The following initial code produces the expected result:
import pandas as pd
df = pd.DataFrame(index=pd.date_range(start="2014-01-01", end="2023-01-01", freq="M"))
df.loc["2014-Q1"]
does return the expected dataframe (empty, with indices in the first quarter of 2014):
Empty DataFrame
Columns: []
Index: [2014-01-31 00:00:00, 2014-02-28 00:00:00, 2014-03-31 00:00:00]
However, after resampling I get unexpected behaviour.
The following raises an error:
df.resample("QS").sum().loc["2014-Q1"]
It essentially tells me that the key cannot be found:
File ~/anaconda3/envs/py3/lib/python3.12/site-packages/pandas/core/indexes/datetimes.py:613, in DatetimeIndex.get_loc(self, key)
    611     return self._partial_date_slice(reso, parsed)
    612 except KeyError as err:
--> 613     raise KeyError(key) from err
    615 key = parsed
    617 elif isinstance(key, dt.timedelta):
    618     # GH#20464

KeyError: '2014-Q1'
When I started digging into this, I found that on the resampled dataframe, df.loc[f"{year}-Q{quarter}"]
can in fact look up data for the previous year. Because my dataframe has no index entries for 2013, the "2014-Q1" lookup finds nothing, which would explain the KeyError.
Using the same minimal example, I tried
df.resample("QS").sum().loc["2015-Q1"]
and the data it returns is for 2014!
Empty DataFrame
Columns: []
Index: [2014-01-01 00:00:00]
Is this normal behaviour after the resampling, or is it a bug in pandas?
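
In the meantime, one possible workaround is to avoid the quarter string altogether and filter on the index's own year and quarter attributes. This is only a minimal sketch under a few assumptions: the helper name select_quarter and the value column are made up for illustration, and the index uses month-start ("MS") rather than the month-end frequency from my example so that it runs the same way on different pandas versions.

```python
import pandas as pd

# Assumption: month-start index and a dummy "value" column, for illustration only.
idx = pd.date_range(start="2014-01-01", end="2022-12-01", freq="MS")
df = pd.DataFrame({"value": 1}, index=idx)

# Same resampling step as in the question.
quarterly = df.resample("QS").sum()


def select_quarter(frame, year, quarter):
    """Hypothetical helper: keep rows whose timestamp lies in the given calendar quarter."""
    mask = (frame.index.year == year) & (frame.index.quarter == quarter)
    return frame[mask]


q1_2014 = select_quarter(quarterly, 2014, 1)
q1_2015 = select_quarter(quarterly, 2015, 1)
print(q1_2014)
print(q1_2015)
```

Because the mask compares calendar year and quarter directly, it does not depend on how the string "2014-Q1" is parsed. An explicit date slice such as quarterly.loc["2014-01-01":"2014-03-31"] should select the same rows.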