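34. Time Series Data in Pandas: the date_range Function
In pandas, the date_range function generates a set of times, that is, a sequence of timestamps. It is a bit like the range function, except that its arguments are times rather than integers.
- freq sets the time interval between consecutive timestamps.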
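import pandas as pd
# Note: the aliases "H", "M", "Y" and "AS" match the pandas version that produced
# the output below; pandas 2.2 and later deprecate them in favour of "h", "ME", "YE" and "YS".
# Every second day
cur0 = pd.date_range('2018-12-16', '2019-01-01', freq="2D")
print(cur0)
# Weekly; a bare "W" is anchored on Sunday (W-SUN)
cur1 = pd.date_range('12/16/2018', '2019-01-01', freq="W")
print(cur1)
# Every six hours, starting from a timestamp with a time component
cur2 = pd.date_range('2018-12-16 17:30:30', '2019-01-01', freq="6H")
print(cur2)
# Month ends
cur3 = pd.date_range('2018-12-16', '2019-08-01', freq="M")
print(cur3)
# Year ends
cur4 = pd.date_range('2010-12-16', '2019-01-01', freq="Y")
print(cur4)
# Year starts
cur5 = pd.date_range('2010', '2019', freq="AS")
print(cur5)
Output: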
DatetimeIndex(['2018-12-16', '2018-12-18', '2018-12-20', '2018-12-22',
'2018-12-24', '2018-12-26', '2018-12-28', '2018-12-30',
'2019-01-01'],
dtype='datetime64[ns]', freq='2D')
DatetimeIndex(['2018-12-16', '2018-12-23', '2018-12-30'], dtype='datetime64[ns]', freq='W-SUN')
DatetimeIndex(['2018-12-16 17:30:30', '2018-12-16 23:30:30',
'2018-12-17 05:30:30', '2018-12-17 11:30:30',
'2018-12-17 17:30:30', '2018-12-17 23:30:30',
'2018-12-18 05:30:30', '2018-12-18 11:30:30',
'2018-12-18 17:30:30', '2018-12-18 23:30:30'],
dtype='datetime64[ns]', freq='6H')
DatetimeIndex(['2018-12-31', '2019-01-31', '2019-02-28', '2019-03-31',
'2019-04-30', '2019-05-31', '2019-06-30', '2019-07-31'],
dtype='datetime64[ns]', freq='M')
DatetimeIndex(['2010-12-31', '2011-12-31', '2012-12-31', '2013-12-31',
'2014-12-31', '2015-12-31', '2016-12-31', '2017-12-31',
'2018-12-31'],
dtype='datetime64[ns]', freq='A-DEC')
DatetimeIndex(['2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
'2014-01-01', '2015-01-01', '2016-01-01', '2017-01-01',
'2018-01-01', '2019-01-01'],
dtype='datetime64[ns]', freq='AS-JAN')
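One difference from range is visible in the first result above: the end date 2019-01-01 is part of the output. The following is a minimal sketch of that behaviour; the second end date (2018-12-31) is an arbitrary value chosen for illustration and does not come from the original examples.

```python
import pandas as pd

# Python's range excludes its stop value.
ints = list(range(0, 10, 2))
print(ints)

# date_range keeps the end point when it lands exactly on the frequency grid.
days = pd.date_range('2018-12-16', '2019-01-01', freq='2D')
print(days[-1])

# An end point that is off the grid is simply never reached.
off_grid = pd.date_range('2018-12-16', '2018-12-31', freq='2D')
print(off_grid[-1])
```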
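freq="2D" means an interval of two days, freq='6H' an interval of six hours, and freq='M' steps month by month, landing on each month end. Commonly used values for the freq parameter of date_range are listed in the table below.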
Alias | Description |
---|---|
B | business day frequency |
C | custom business day frequency |
D | calendar day frequency |
W | weekly frequency |
M | month end frequency |
SM | semi-month end frequency (15th and end of month) |
BM | business month end frequency |
CBM | custom business month end frequency |
MS | month start frequency |
SMS | semi-month start frequency (1st and 15th) |
BMS | business month start frequency |
CBMS | custom business month start frequency |
Q | quarter end frequency |
BQ | business quarter end frequency |
QS | quarter start frequency |
BQS | business quarter start frequency |
A, Y | year end frequency |
BA, BY | business year end frequency |
AS, YS | year start frequency |
BAS, BYS | business year start frequency |
BH | business hour frequency |
H | hourly frequency |
T, min | minutely frequency |
S | secondly frequency |
L, ms | milliseconds |
U, us | microseconds |
N | nanoseconds |
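
A quick way to get a feel for an alias in the table is to print the first few dates it generates. The sketch below assumes an arbitrary start date (2019-01-01, a Tuesday) and periods=3. It only uses day-based and start-anchored aliases, because newer pandas releases have renamed several of the other spellings (for example "M" and "H").

```python
import pandas as pd

start = '2019-01-01'  # arbitrary start date for illustration (a Tuesday)

samples = {}
for alias in ['D', 'B', 'W', 'MS', 'SMS', 'QS']:
    # Generate the first three timestamps produced by each alias.
    samples[alias] = pd.date_range(start, periods=3, freq=alias)
    print(alias, list(samples[alias].strftime('%Y-%m-%d')))
```

With this start date, "W" skips ahead to the first Sunday, while "MS", "SMS" and "QS" begin on 2019-01-01 itself because that date already lies on their grid.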
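In the table, T stands for minutes and B for business days. Next, date_range can be used to build a time series.
import numpy as np
import pandas as pd
# Business days only (Monday to Friday)
cur0 = pd.date_range('2018-12-16', '2019-02-05', freq="B")
# print(cur0, len(cur0))
# One random value per timestamp
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts[:14])
Output: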
2018-12-17 0.128278
2018-12-18 -0.128049
2018-12-19 0.872805
2018-12-20 -0.809540
2018-12-21 -0.104894
2018-12-24 0.720047
2018-12-25 0.965698
2018-12-26 0.926640
2018-12-27 -1.505794
2018-12-28 0.246031
2018-12-31 -0.536505
2019-01-01 1.609414
2019-01-02 0.459005
2019-01-03 0.347774
Freq: B, dtype: float64
The first column of the output contains no Saturdays or Sundays: freq = "B" generates business days only.
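The weekend claim can be checked programmatically rather than by eye. This is a small sketch that rebuilds the same index; it also shows that public holidays such as 2018-12-25 are kept, since "B" only skips weekends, and that pd.bdate_range (which defaults to business-day frequency) gives the same index.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2018-12-16', '2019-02-05', freq='B')
ts = pd.Series(np.random.randn(len(idx)), index=idx)

# dayofweek uses Monday=0 ... Sunday=6, so every business day is below 5.
print((ts.index.dayofweek < 5).all())

# "B" knows nothing about holidays: Christmas Day is still in the index.
print(pd.Timestamp('2018-12-25') in ts.index)

# bdate_range is a shortcut whose default frequency is 'B'.
same = pd.bdate_range('2018-12-16', '2019-02-05')
print(idx.equals(same))
```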
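The next example generates dates that all fall on the same day of the week, here Wednesday.
import pandas as pd
# W-WED: weekly frequency anchored on Wednesday
cur0 = pd.date_range('2018-12-16', '2019-02-05', freq="W-WED")
print(cur0)
Output: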
DatetimeIndex(['2018-12-19', '2018-12-26', '2019-01-02', '2019-01-09',
'2019-01-16', '2019-01-23', '2019-01-30'],
dtype='datetime64[ns]', freq='W-WED')
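To confirm that every generated date really is a Wednesday, the index can report its own weekday names. This is a minimal sketch; the W-FRI line is an extra illustration that is not part of the original example.

```python
import pandas as pd

wed = pd.date_range('2018-12-16', '2019-02-05', freq='W-WED')

# day_name() returns the weekday name of each timestamp.
print(wed.day_name().unique())

# dayofweek uses Monday=0, so Wednesday is 2.
print(wed.dayofweek.unique())

# Changing the anchor changes the weekday, e.g. W-FRI for Fridays.
fri = pd.date_range('2018-12-16', '2019-02-05', freq='W-FRI')
print(fri[0], fri[0].day_name())
```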
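- periods sets the number of timestamps to generate.
import numpy as np
import pandas as pd
# Five timestamps, 2 hours 20 minutes apart, starting from the given time
cur0 = pd.date_range('2018-12-16 18:30:34', periods=5, freq='2h20min')
vi = np.random.randn(len(cur0))
ts = pd.Series(vi, index=cur0)
print(ts)
Output: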
2018-12-16 18:30:34 -0.289575
2018-12-16 20:50:34 -0.782106
2018-12-16 23:10:34 0.152276
2018-12-17 01:30:34 -0.661511
2018-12-17 03:50:34 -1.676650
Freq: 140T, dtype: float64
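periods can also be combined with an end point instead of a start point, or with both end points and no freq at all. The sketch below illustrates both variants; the dates in the second call are arbitrary values chosen for illustration.

```python
import pandas as pd

# Count backwards: five timestamps, 140 minutes apart, ending at the given time.
back = pd.date_range(end='2018-12-17 03:50:34', periods=5, freq='140min')
print(back)

# With start, end and periods but no freq, the timestamps are spaced evenly.
even = pd.date_range('2018-12-16', '2018-12-17', periods=5)
print(even)
```

The first call reproduces the same five timestamps as the example above, only anchored at the last one rather than the first.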