Problems encountered when taking a grouped diff of a pandas date column
Example:
import datetime
import pandas as pd

df = pd.DataFrame()
df['A'] = [1, 1, 2]
df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)]
df['C'] = df.groupby('A').B.diff()
df['C'] = df.C.dt.days
Error:
Traceback (most recent call last):
  File 'D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\series.py', line 2820, in _make_dt_accessor
    return maybe_to_datetimelike(self)
  File 'D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\indexes\accessors.py', line 84, in maybe_to_datetimelike
    'datetimelike index'.format(type(data)))
TypeError: cannot convert an object of type to a datetimelike index

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File 'D: / learning /pandas_test/pandas_learn_20190102.py', line 49, in <module>
    test2()
  File 'D: / learning /pandas_test/pandas_learn_20190102.py', line 32, in test2
    df['C'] = df.C.dt.days
  File 'D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\generic.py', line 3077, in __getattr__
    return object.__getattribute__(self, name)
  File 'D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\base.py', line 243, in __get__
    return self.construct_accessor(instance)
  File 'D:\python_virtualenv\common\lib\site-packages\pandas-0.20.3-py3.6-win-amd64.egg\pandas\core\series.py', line 2822, in _make_dt_accessor
    raise AttributeError('Can only use .dt accessor with datetimelike '
AttributeError: Can only use .dt accessor with datetimelike values
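Cause:
Because B holds plain datetime.date objects, it is stored with object dtype, and the grouped diff returns an object column as well. The result of the grouped diff is:
   A           B                C
0  1  2018-01-02              NaT
1  1  2018-01-03  1 days 00:00:00
2  2  2018-01-03              NaN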
The column dtypes are:
A     int64
B    object
C    object
dtype: object
The expected dtypes were:
A              int64
B             object
C    timedelta64[ns]
dtype: object
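The cause can be checked in isolation with a minimal sketch. The names `dates` and `dt_accessor_works` are illustrative and not from the original script. It only shows that a Series of `datetime.date` objects has object dtype and that the `.dt` accessor rejects it. It does not reproduce the grouped diff, whose behavior on object columns may differ between pandas versions.

```python
import datetime

import pandas as pd

# A Series built from datetime.date objects is stored as generic Python objects.
dates = pd.Series([datetime.date(2018, 1, 2), datetime.date(2018, 1, 3)])
print(dates.dtype)

# The .dt accessor only accepts datetimelike dtypes, so it fails here.
try:
    dates.dt.days
    dt_accessor_works = True
except AttributeError as exc:
    dt_accessor_works = False
    print(exc)
```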
Solution:
At first I tried to force the object column to timedelta with astype:
df['C'] = df.C.astype(pd.Timedelta)
This code raises no error, but the dtype of column C does not change, so it has no effect.
In the end, there are two approaches.
Convert B to a datetime column beforehand:
df = pd.DataFrame()
df['A'] = [1, 1, 2]
df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)]
df.B = pd.to_datetime(df.B)
df['C'] = df.groupby('A').B.diff()
df['C'] = df.C.dt.days
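The first approach can also be wrapped in a small reusable function. The sketch below is one possible shape for it. The helper name `add_day_gaps` and its parameters are hypothetical and not part of the original code. It converts the date column on a copy, so the caller's DataFrame is left unchanged.

```python
import datetime

import pandas as pd


def add_day_gaps(df, group_col, date_col, out_col):
    """Hypothetical helper: days between consecutive dates within each group."""
    result = df.copy()
    # Convert first so the grouped diff yields a timedelta column.
    result[date_col] = pd.to_datetime(result[date_col])
    gaps = result.groupby(group_col)[date_col].diff()
    # Rows without a previous date in their group end up as NaN.
    result[out_col] = gaps.dt.days
    return result


df = pd.DataFrame({
    "A": [1, 1, 2],
    "B": [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)],
})
out = add_day_gaps(df, "A", "B", "C")
print(out)
```

Only the second row has a previous date in its group, so it is the only row with a numeric gap.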
Add a type conversion to the diff result:
df = pd.DataFrame()
df['A'] = [1, 1, 2]
df['B'] = [datetime.date(2018, 1, 2), datetime.date(2018, 1, 3), datetime.date(2018, 1, 3)]
df['C'] = df.groupby('A').B.diff()
df['C'] = pd.to_timedelta(df.C, unit='d').dt.days
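To see what the conversion in the second approach does by itself, the sketch below builds by hand an object column that mirrors the mixed NaT / Timedelta / NaN result shown earlier, rather than relying on the grouped diff. The names `mixed`, `converted` and `days` are illustrative. It assumes `pd.to_timedelta` maps both kinds of missing value to NaT, which is the behavior I would expect from current pandas releases.

```python
import pandas as pd

# Hand-built stand-in for the object column produced by the grouped diff.
mixed = pd.Series([pd.NaT, pd.Timedelta(days=1), float("nan")], dtype=object)
print(mixed.dtype)

# to_timedelta turns the object column into a real timedelta column.
converted = pd.to_timedelta(mixed)
print(converted.dtype)

# With a timedelta dtype, the .dt accessor is available.
days = converted.dt.days
print(days)
```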