I have a pandas DataFrame with various features, including a datetime feature. It looks like this:
               DD SSCL1 SEG_CLASS_CODE  FCLCLD  PASS_BK  SA  AU  DTD  DAY_OF_YEAR
    0  2018-01-01     C              C       1        0   0  18   -1            1
    1  2018-01-01     C              C       0        0   7  26   -1            1
    2  2018-01-01     C              C       0        0   9  18   -1            1
    3  2018-01-01     C              C       1       10   0  18   -1            1
    4  2018-01-01     C              C       0        9   1  18   -1            1
I need to use the DD column to train the model. The problem is: how do I encode this column?
I can't use cyclic feature encoding, as described in "How to handle date variable in machine learning data pre-processing", because in the domain I am training the model for, 2020 is not the same as 2018, and February 2022 is not the same as February 2023. In other words, the same month or day in different years cannot be treated as equivalent.
My idea is to somehow transform the datetime to an int, for example the total number of days, hours, minutes or seconds, but I do not know which starting point to use (maybe January 1st, 1970, as usual). The easiest way seems to be dataset['DD'].apply(lambda x: x.value), which gives me something like this:
    0          1514764800000000000
    1          1514764800000000000
    2          1514764800000000000
    3          1514764800000000000
    4          1514764800000000000
                      ...
    1450583    1577577600000000000
    1450584    1577664000000000000
    1450585    1577664000000000000
    1450586    1577145600000000000
    1450587    1577232000000000000
    Name: DD, Length: 1450588, dtype: int64
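These values are nanoseconds since 1970-01-01. If whole days are enough, a minimal sketch of the alternative I have in mind could look like the following. The tiny DataFrame, the `reference` variable and the `DD_days` column name are made up for illustration, and the Unix epoch is only an assumed starting point (any fixed date would just shift every value by a constant):

```python
import pandas as pd

# Hypothetical miniature of the dataset; only the DD column matters here.
dataset = pd.DataFrame(
    {"DD": pd.to_datetime(["2018-01-01", "2018-01-02", "2019-12-31"])}
)

# Assumed starting point: the Unix epoch. Any other fixed date would only
# shift all values by the same constant.
reference = pd.Timestamp("1970-01-01")

# Whole days elapsed since the reference date, stored as an integer column.
dataset["DD_days"] = (dataset["DD"] - reference).dt.days

print(dataset)
```

Unlike a cyclic encoding, this keeps the ordering across years, so 2018-01-01 and 2019-01-01 get different values.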
After that I would like to apply MinMaxScaler or StandardScaler.
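A rough sketch of the scaling step, assuming scikit-learn's `MinMaxScaler` and a hypothetical `DD_days` column that already holds the number of days since 1970-01-01 (the three values are made-up examples, not my real data):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical day counts standing in for the integer-encoded DD column.
days = pd.DataFrame({"DD_days": [17532, 17533, 18261]})

scaler = MinMaxScaler()

# fit_transform expects 2-D input, so a one-column DataFrame is passed
# instead of a Series; ravel() flattens the result back to 1-D.
days["DD_scaled"] = scaler.fit_transform(days[["DD_days"]]).ravel()

print(days)
```

Here the earliest date maps to 0 and the latest to 1. If the scaler is fitted only on training data, dates after the training period would map to values above 1.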
So, are there any ways to encode datetime according to my requirements?