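I bought 「Python 実践 データ加工/可視化 100本ノック」 and am working through it from start to finish. Today I'm doing Knock 50.
The materials are available from the following page.
Python実践 データ加工/可視化 100本ノック|サポート|秀和システム
Knock 50: Compute a moving average and plot it
First, build the data for the moving average.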
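from glob import glob
import pandas as pd
# NOTE: the glob pattern is missing from the listing; 'data/*.csv' is a placeholder assumption
files = glob('data/*.csv')
files.sort()
data=[]
for iii in files:
    # Read each CSV file (assumed step; this line is missing from the listing)
    tmp = pd.read_csv(iii)
    data.append(tmp)
data = pd.concat(data,ignore_index=True)
# The .dt accessors below need a datetime column (assumed conversion step)
data['receive_time'] = pd.to_datetime(data['receive_time'])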
# Knock 43: extract the date from receive_time
data['receive_date']= data['receive_time'].dt.date
# Knock 44: add the day of the week (number and name)
data['dayofweek'] = data['receive_time'].dt.dayofweek
data['day_name'] = data['receive_time'].dt.day_name()
# Knock 45: keep only the records from 2021-01-20 through 2021-01-22
import datetime as dtt
data_extract = data.loc[(data['receive_time']>=dtt.datetime(2021,1,20))&
                        (data['receive_time']<dtt.datetime(2021,1,23))].copy()
# Knock 46: build a per-second timestamp column
data_extract['receive_time_sec'] = data_extract['receive_time'].dt.round('s')
# print(data_extract.head())
# print(len(data_extract))
# print(len(data_extract['receive_time_sec'].unique()))
# Rows whose rounded timestamp collides with another row
dupli_data = data_extract[data_extract['receive_time_sec'].duplicated(keep=False)]
# print(dupli_data)
# Truncate to the second with floor instead of rounding
data_extract['receive_time_sec'] = data_extract['receive_time'].dt.floor('s')
# print(len(data_extract))
# print(len(data_extract['receive_time_sec'].unique()))
# Knock 47: build a complete per-second time axis and left-join the data onto it
# print(pd.date_range('2021-01-15','2021-01-16',freq='s'))
min_receive = data_extract['receive_time_sec'].min()
max_receive = data_extract['receive_time_sec'].max()
date1 = pd.date_range(min_receive,max_receive,freq='s')
base_data = pd.DataFrame({'receive_time_sec':date1})
data_base_extract = pd.merge(base_data,data_extract,on='receive_time_sec',how='left')
# Knock 48: forward-fill the seconds that have no record
data_base_extract.sort_values('receive_time_sec',inplace=True)
data_base_extract = data_base_extract.ffill()
# Knock 49: take per-second differences of in1/out1 and sum them per hour
data_analytics = data_base_extract[['receive_time_sec','in1','out1']].copy()
# print(data_analytics.head())
# Shift by one row so each row can see the previous second's values
data_before_1sec = data_analytics.shift(1)
# print(data_before_1sec.head())
data_before_1sec.columns=['receive_time_sec_b1sec','in1_b1sec','out1_b1sec']
data_analytics = pd.concat([data_analytics,data_before_1sec],axis=1)
# print(data_analytics.head())
# Difference from the previous second
data_analytics['in1_calc']=data_analytics['in1']-data_analytics['in1_b1sec']
data_analytics['out1_calc']=data_analytics['out1']-data_analytics['out1_b1sec']
# Hour-level key (YYYYMMDDHH) used for aggregation
data_analytics['date_hour'] = data_analytics['receive_time_sec'].dt.strftime('%Y%m%d%H')
viz_data = data_analytics[['date_hour','in1_calc','out1_calc']].groupby('date_hour',as_index=False).sum()
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10,5))
plt.xticks(rotation=90)
#plt.show()
# Knock 50: 3-row (3-hour) moving average of the hourly totals
viz_data = data_analytics[['date_hour','in1_calc','out1_calc']].groupby('date_hour',as_index=False).sum()
viz_data_rolling = viz_data[['in1_calc','out1_calc']].rolling(3).mean()
print(viz_data_rolling.head())
viz_data_rolling['date_hour'] = viz_data['date_hour']
viz_data_rolling = pd.melt(viz_data_rolling,id_vars='date_hour',value_vars=['in1_calc','out1_calc'])
sns.lineplot(x=viz_data_rolling['date_hour'],y=viz_data_rolling['value'],hue=viz_data_rolling['variable'])
plt.show()
The melt function is used to reshape the data from wide format to long format.
pandas.melt(df,
id_vars=None,
value_vars=None,
var_name=None,
value_name='value',
col_level=None)
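- df: the DataFrame to reshape
- id_vars: the column(s) to keep as identifier variables
- value_vars: the column(s) to melt; if omitted, every column other than id_vars
- var_name: the name of the variable column; if omitted, the column is named "variable"
- value_name: the name of the value column; if omitted, the column is named "value"
- col_level: the column level to melt when the columns are a MultiIndex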
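As a small illustration of the parameters above, here is a minimal sketch that melts a tiny wide table. The values are made up for the example and are not taken from the book's data; only the column names mirror the ones used in this post.

```python
import pandas as pd

# Hypothetical wide table: one row per hour, one column per series.
wide_df = pd.DataFrame({
    'date_hour': ['2021012000', '2021012001'],
    'in1_calc': [1.0, 2.0],
    'out1_calc': [3.0, 4.0],
})

# Long format: one row per (date_hour, series) pair.
# The melted column names end up in 'variable', their values in 'value'.
long_df = pd.melt(wide_df, id_vars='date_hour',
                  value_vars=['in1_calc', 'out1_calc'])
print(long_df)
```

The long layout is what lets seaborn's lineplot draw one line per series through the hue argument.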
Output
   in1_calc  out1_calc
0       NaN        NaN
1       NaN        NaN
2  1.666667   2.333333
3  0.666667   0.333333
4  0.666667   1.000000
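The first two rows are NaN because rolling(3) needs three observations before it can produce a mean. The sketch below reproduces that behavior on a made-up series (not the book's data) and, as one possible alternative, shows min_periods=1, which averages over however many values are available so far.

```python
import pandas as pd

# Made-up hourly totals, for illustration only.
hourly = pd.Series([1.0, 2.0, 2.0, 0.0, 1.0])

# A window of 3 with the default min_periods: the first two results are NaN.
rolled = hourly.rolling(3).mean()

# With min_periods=1 the early rows use the partial window instead.
rolled_partial = hourly.rolling(3, min_periods=1).mean()

print(rolled)
print(rolled_partial)
```

Whether the partial-window values are appropriate depends on how the chart will be read; the default keeps every plotted point a true 3-row average.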