31. Pandas data grouping: the transform function
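Some pandas functions behave differently from one version to another, which can be a nuisance, so start by checking which version of pandas is installed: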
$ python
>>> import pandas as pd
>>> print(pd.__version__)
0.17.1
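Inside a script rather than an interactive session, the same check can be done in code. The sketch below is only an illustration: `version_tuple` is a hypothetical helper (not part of pandas), it assumes a numeric "major.minor.patch" version string and compares only major and minor, and the 0.20 threshold refers to `DataFrame.transform`.

```python
import pandas as pd

def version_tuple(v):
    """Hypothetical helper: '0.17.1' -> (0, 17). Only major and minor are kept."""
    major, minor = v.split(".")[:2]
    return int(major), int(minor)

current = version_tuple(pd.__version__)
print("pandas", pd.__version__)

# DataFrame.transform first appeared in 0.20; groupby(...).transform is older
if current < (0, 20):
    print("DataFrame.transform is missing; consider upgrading pandas")

# Checking for the method directly avoids parsing version strings at all
print("DataFrame.transform available:", hasattr(pd.DataFrame, "transform"))
```

The `hasattr` check is usually the more robust of the two, since it tests for the feature itself instead of relying on the version format.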
To upgrade pandas:
$ sudo pip install -U pandas
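Or install a specific version:
$ sudo pip install pandas==x.y.z
Here x.y.z is the pandas version you want. Versions matter for this chapter because the standalone DataFrame.transform and Series.transform methods were only added in pandas 0.20, while the transform used here, called on the result of groupby, has been available since well before that. Applied after groupby, transform works on all the data of each group and returns a result with the same index as the original DataFrame, so every original row gets its own transformed value.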
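To make "one value per original row" concrete, the minimal sketch below contrasts a reduction with transform on a small made-up table (illustrative data, not the chapter's example).

```python
import pandas as pd

# Illustrative data: two rows per fruit
df = pd.DataFrame({
    "fruit": ["apple", "pearl", "apple", "pearl"],
    "price": [4.0, 5.0, 6.0, 7.0],
})

g = df.groupby("fruit")["price"]

# A reduction returns one value per group, indexed by the group key
per_group = g.mean()            # apple -> 5.0, pearl -> 6.0

# transform returns one value per original row, with df's own index
per_row = g.transform("mean")   # 5.0, 6.0, 5.0, 6.0

# Because the index matches df, the result can be stored as a new column directly
df["group_mean"] = per_row
print(per_group)
print(df)
```

This index alignment is the main reason to reach for transform instead of an aggregation when the per-group result has to sit next to the original rows.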
import pandas as pd

# Each fruit is supplied by suppliers 101, 102 and 103
idx = [101, 101, 101, 102, 102, 102, 103, 103, 103]
idx += [101, 102, 103] * 3
name = ["apple", "pearl", "orange", "apple", "pearl", "orange", "apple", "pearl", "orange"]
name += ["apple"] * 3 + ["pearl"] * 3 + ["orange"] * 3
price = [4.1, 5.3, 6.3, 4.20, 5.4, 6.0, 4.5, 5.5, 6.8]
price += [4] * 3 + [5] * 3 + [6] * 3
df0 = pd.DataFrame({"fruit": name, "price": price, "supplier": idx})
print("df", "*" * 30)
print(df0)

def p_data(o):
    # Print each group's key followed by its first three rows
    for key, group in o:
        print(key)
        print(group[:3])

# Group by a single column name rather than a one-element list, so that
# newer pandas versions yield plain keys such as "apple" instead of tuples
dg1 = df0.groupby("fruit")
print("1", "*" * 30)
p_data(dg1)  # p_data prints by itself and returns None, so it is not wrapped in print

def f1(x):
    return x + 1

def f2(x):
    return x + 100

print("2", "*" * 30)
print(dg1["price"].transform(f1)[:3])
print("3", "*" * 30)
print(dg1["supplier"].transform(f2)[:3])
print("4", "*" * 30)
print(dg1.transform(f2)[:3])
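In the example, print(dg1.transform(f2)[:3]) adds 100 to every value of the non-key columns (price and supplier) in each group of dg1; the grouping column fruit does not appear in the result. The output confirms this: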
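The function given to transform receives the data of one group at a time rather than a single number. With an element-wise function such as f2 this makes no visible difference, but a function that uses group statistics depends on it. The sketch below uses a made-up four-row subset; the lambda that subtracts the group mean is an illustrative choice, not part of the original example.

```python
import pandas as pd

df0 = pd.DataFrame({
    "fruit": ["apple", "pearl", "apple", "pearl"],
    "price": [4.1, 5.3, 4.5, 5.5],
    "supplier": [101, 101, 102, 102],
})
dg = df0.groupby("fruit")

def f2(x):
    return x + 100

# Element-wise function: grouping does not change the values, so the result
# matches adding 100 to the non-key columns of the whole frame
shifted = dg.transform(f2)

# Group-dependent function: x is the whole price Series of one group,
# so x.mean() is that group's mean price
centered = dg["price"].transform(lambda x: x - x.mean())

print(shifted)
print(centered)
```

After centering, the prices within each fruit group sum to (approximately) zero, which would not happen if the lambda saw one value at a time.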
df ******************************
     fruit  price  supplier
0    apple    4.1       101
1    pearl    5.3       101
2   orange    6.3       101
3    apple    4.2       102
4    pearl    5.4       102
5   orange    6.0       102
6    apple    4.5       103
7    pearl    5.5       103
8   orange    6.8       103
9    apple    4.0       101
10   apple    4.0       102
11   apple    4.0       103
12   pearl    5.0       101
13   pearl    5.0       102
14   pearl    5.0       103
15  orange    6.0       101
16  orange    6.0       102
17  orange    6.0       103
1 ******************************
apple
   fruit  price  supplier
0  apple    4.1       101
3  apple    4.2       102
6  apple    4.5       103
orange
    fruit  price  supplier
2  orange    6.3       101
5  orange    6.0       102
8  orange    6.8       103
pearl
   fruit  price  supplier
1  pearl    5.3       101
4  pearl    5.4       102
7  pearl    5.5       103
2 ******************************
0    5.1
1    6.3
2    7.3
Name: price, dtype: float64
3 ******************************
0    201
1    201
2    201
Name: supplier, dtype: int64
4 ******************************
   price  supplier
0  104.1       201
1  105.3       201
2  106.3       201