31. Grouping Data in Pandas: the transform Function

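Some pandas functions behave differently from one version to the next, which can be a nuisance. Start by checking which version of pandas is installed.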

$ python
>>> import pandas as pd
>>> print(pd.__version__)
0.17.1
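
The same check can also be made from inside a script, so that it reports an outdated installation before any other code runs. The following is only a minimal sketch: the helper version_tuple and the threshold MIN_VERSION are hypothetical names introduced for this illustration, and the threshold value is an assumption rather than a documented requirement.

```python
import re

import pandas as pd


def version_tuple(version):
    """Return the leading numeric parts of a version string as a tuple of ints."""
    # Keep only the leading "N.N.N" part, so suffixes such as "rc1" are ignored.
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split("."))


# Hypothetical minimum version, chosen only for this illustration.
MIN_VERSION = (0, 20, 0)

installed = version_tuple(pd.__version__)
if installed < MIN_VERSION:
    print("pandas", pd.__version__, "is older than required; consider upgrading")
else:
    print("pandas", pd.__version__, "is recent enough")
```

Comparing tuples of integers avoids the trap of comparing version strings directly, where "0.9.0" would sort after "0.17.1".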

Upgrade pandas:

$ sudo pip install -U pandas

Or install a specific version:

$ sudo pip install pandas==x.y.z

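Here x.y.z is the pandas version number you want to install. The examples in this chapter were written for pandas 0.20 or later (DataFrame.transform and Series.transform were added in 0.20; the groupby transform used below is older). The transform function applies a function to all the data in each group produced by groupby.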

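import pandas as pd
# Build 18 rows: 9 hand-written rows followed by 9 generated ones.
idx = [101, 101, 101, 102, 102, 102, 103, 103, 103]
idx += [101, 102, 103] * 3
name = ["apple", "pearl", "orange", "apple", "pearl", "orange", "apple", "pearl", "orange"]
name += ["apple"] * 3 + ["pearl"] * 3 + ["orange"] * 3
price = [4.1, 5.3, 6.3, 4.20, 5.4, 6.0, 4.5, 5.5, 6.8]
price += [4] * 3 + [5] * 3 + [6] * 3
df0 = pd.DataFrame({"fruit": name, "price": price, "supplier": idx})
print("df", "*" * 30)
print(df0)
def p_data(o):
    # Print each group's name followed by its first three rows.
    for name, group in o:
        print(name)
        print(group[:3])
# Group by a single column name so that each group name is a plain string.
dg1 = df0.groupby("fruit")
print("1", "*" * 30)
# p_data prints by itself and returns None, so its result is not printed.
p_data(dg1)
def f1(x):
    return x + 1
def f2(x):
    return x + 100
# Apply f1 to the price column of every group.
print("2", "*" * 30)
print(dg1["price"].transform(f1)[:3])
# Apply f2 to the supplier column of every group.
print("3", "*" * 30)
print(dg1["supplier"].transform(f2)[:3])
# Apply f2 to all remaining columns of every group.
print("4", "*" * 30)
print(dg1.transform(f2)[:3])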

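In the example, the call dg1.transform(f2)[:3] adds 100 to every value in every group of dg1, and the output below confirms this. The result keeps the row index of the original data, and the grouping column fruit does not appear in it.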

df ******************************
     fruit  price  supplier
0    apple    4.1       101
1    pearl    5.3       101
2   orange    6.3       101
3    apple    4.2       102
4    pearl    5.4       102
5   orange    6.0       102
6    apple    4.5       103
7    pearl    5.5       103
8   orange    6.8       103
9    apple    4.0       101
10   apple    4.0       102
11   apple    4.0       103
12   pearl    5.0       101
13   pearl    5.0       102
14   pearl    5.0       103
15  orange    6.0       101
16  orange    6.0       102
17  orange    6.0       103
1 ******************************
apple
   fruit  price  supplier
0  apple    4.1       101
3  apple    4.2       102
6  apple    4.5       103
orange
    fruit  price  supplier
2  orange    6.3       101
5  orange    6.0       102
8  orange    6.8       103
pearl
   fruit  price  supplier
1  pearl    5.3       101
4  pearl    5.4       102
7  pearl    5.5       103
2 ******************************
0    5.1
1    6.3
2    7.3
Name: price, dtype: float64
3 ******************************
0    201
1    201
2    201
Name: supplier, dtype: int64
4 ******************************
   price  supplier
0  104.1       201
1  105.3       201
2  106.3       201
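
The functions f1 and f2 above work element by element, so they would give the same values without any grouping. transform becomes more useful when the function depends on the whole group. The following is a minimal sketch of that idea, assuming a small made-up DataFrame: the data and the column names group_mean and diff_from_mean are illustrative and not part of the original example.

```python
import pandas as pd

# Small made-up data set in the same spirit as df0 (values are illustrative).
df = pd.DataFrame({
    "fruit": ["apple", "pearl", "apple", "pearl"],
    "price": [4.0, 5.0, 6.0, 7.0],
})

grouped = df.groupby("fruit")["price"]

# agg collapses each group to a single value: one row per group.
group_mean = grouped.agg("mean")

# transform broadcasts the per-group result back to every original row,
# so the result has the same index as df.
mean_per_row = grouped.transform("mean")

# A function that uses the whole group: subtract the group's own mean.
diff_per_row = grouped.transform(lambda s: s - s.mean())

# Because the index matches, the results can be stored as new columns.
df["group_mean"] = mean_per_row
df["diff_from_mean"] = diff_per_row

print(group_mean)
print(df)
```

Here agg returns two rows (one per fruit), while transform returns four rows aligned with df: both apple rows receive the apple mean 5.0 and both pearl rows receive the pearl mean 6.0.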