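13. Column operations on a pandas DataFrame

This chapter looks at how to modify a DataFrame: renaming, adding, concatenating, replacing, and deleting columns.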

13.1 Renaming columns with rename

Calling rename on a DataFrame returns a new DataFrame; the original DataFrame is left unchanged.

import pandas as pd
import numpy as np
# Build a 10 x 5 frame holding the integers 10..59.
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21, "<- dataframe")
# rename returns a new frame; df1 keeps its original labels.
df2 = df1.rename(columns={"ax": "close", "bx": "open"})
print(df2)
print("*" * 21, "<- dataframe")

Output:

   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   close  open  cx  dx  ex
a     10    11  12  13  14
b     15    16  17  18  19
c     20    21  22  23  24
d     25    26  27  28  29
e     30    31  32  33  34
f     35    36  37  38  39
g     40    41  42  43  44
h     45    46  47  48  49
i     50    51  52  53  54
j     55    56  57  58  59
********************* <- dataframe

To modify the DataFrame itself instead, pass inplace=True.

import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21, "<- dataframe")
# inplace=True changes df1 directly; the call itself returns None.
df1.rename(columns={"ax": "close", "bx": "open"}, inplace=True)
print(df1)
print("*" * 21, "<- dataframe")

Output:

   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   close  open  cx  dx  ex
a     10    11  12  13  14
b     15    16  17  18  19
c     20    21  22  23  24
d     25    26  27  28  29
e     30    31  32  33  34
f     35    36  37  38  39
g     40    41  42  43  44
h     45    46  47  48  49
i     50    51  52  53  54
j     55    56  57  58  59
********************* <- dataframe
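
Besides a dict, rename also accepts a function as the columns mapper, which is convenient when every label follows the same rule. The following is a minimal sketch on a smaller frame than the one above; the names small, upper, and renamed, and the label "zz", are made up for illustration.

```python
import numpy as np
import pandas as pd

# A two-row frame with the same column labels as the examples above.
small = pd.DataFrame(np.arange(10, 20).reshape(2, 5),
                     columns=["ax", "bx", "cx", "dx", "ex"],
                     index=["a", "b"])

# A callable mapper is applied to every column label.
upper = small.rename(columns=str.upper)

# A dict mapper only touches the labels it lists; a label that is not in
# the frame (here "zz") is ignored by default.
renamed = small.rename(columns={"ax": "close", "zz": "unused"})

print(list(upper.columns))
print(list(renamed.columns))
print(list(small.columns))  # the original frame is unchanged
```

Because neither call uses inplace=True, small keeps its original labels throughout.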

13.2 Adding a column

In pandas, a column can be added to a DataFrame with [] assignment, the insert function, or loc[].

  • [] assignment appends the new column at the end of the original DataFrame.
import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21, "<- dataframe")
# One value per row, as a 1-D array.
nval = np.arange(100, 110)
df1["fx"] = nval
print(df1)

Output:

   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   ax  bx  cx  dx  ex   fx
a  10  11  12  13  14  100
b  15  16  17  18  19  101
c  20  21  22  23  24  102
d  25  26  27  28  29  103
e  30  31  32  33  34  104
f  35  36  37  38  39  105
g  40  41  42  43  44  106
h  45  46  47  48  49  107
i  50  51  52  53  54  108
j  55  56  57  58  59  109

  • The insert function places the new column at a given position.
import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21, "<- dataframe")
# One value per row, as a 1-D array.
nval = np.arange(100, 110)
df1["fx"] = nval
print(df1)
print("*" * 21, "<- dataframe")
# Insert "gx" at column position 1 (positions start at 0); df1 is modified in place.
df1.insert(1, "gx", nval)
print(df1)
print("*" * 21, "<- dataframe")

Output:

   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   ax  bx  cx  dx  ex   fx
a  10  11  12  13  14  100
b  15  16  17  18  19  101
c  20  21  22  23  24  102
d  25  26  27  28  29  103
e  30  31  32  33  34  104
f  35  36  37  38  39  105
g  40  41  42  43  44  106
h  45  46  47  48  49  107
i  50  51  52  53  54  108
j  55  56  57  58  59  109
********************* <- dataframe
   ax   gx  bx  cx  dx  ex   fx
a  10  100  11  12  13  14  100
b  15  101  16  17  18  19  101
c  20  102  21  22  23  24  102
d  25  103  26  27  28  29  103
e  30  104  31  32  33  34  104
f  35  105  36  37  38  39  105
g  40  106  41  42  43  44  106
h  45  107  46  47  48  49  107
i  50  108  51  52  53  54  108
j  55  109  56  57  58  59  109
********************* <- dataframe
  • loc[] can also add a new column.
import pandas as pd
import numpy as np
val = np.arange(10, 60).reshape(10, 5)
col = ["ax", "bx", "cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print(df1)
print("*" * 21, "<- dataframe")
# One value per row, as a 1-D array.
nval = np.arange(100, 110)
# Select all rows and a column label that does not exist yet.
df1.loc[:, "ix"] = nval
print(df1)

Output:

   ax  bx  cx  dx  ex
a  10  11  12  13  14
b  15  16  17  18  19
c  20  21  22  23  24
d  25  26  27  28  29
e  30  31  32  33  34
f  35  36  37  38  39
g  40  41  42  43  44
h  45  46  47  48  49
i  50  51  52  53  54
j  55  56  57  58  59
********************* <- dataframe
   ax  bx  cx  dx  ex   ix
a  10  11  12  13  14  100
b  15  16  17  18  19  101
c  20  21  22  23  24  102
d  25  26  27  28  29  103
e  30  31  32  33  34  104
f  35  36  37  38  39  105
g  40  41  42  43  44  106
h  45  46  47  48  49  107
i  50  51  52  53  54  108
j  55  56  57  58  59  109
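
The three approaches differ mainly in where the new column ends up. The following minimal sketch puts them side by side on a two-row frame; the frame and the names new_values and duplicate_rejected are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10, 16).reshape(2, 3),
                  columns=["ax", "bx", "cx"], index=["a", "b"])
new_values = np.array([100, 101])  # one value per row

df["fx"] = new_values              # appended as the last column
df.insert(1, "gx", new_values)     # placed at position 1, in place
df.loc[:, "ix"] = new_values       # also appended as the last column

print(list(df.columns))

# By default insert refuses a label that already exists in the frame.
duplicate_rejected = False
try:
    df.insert(0, "gx", new_values)
except ValueError as exc:
    duplicate_rejected = True
    print("insert refused:", exc)
```

Only insert lets the caller choose the position; [] and loc[] always append when the label is new, and overwrite the column when it already exists.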

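13.3 Concatenating columns with concat

The pandas concat function joins several DataFrames into one larger DataFrame. With axis=1 the inputs are placed side by side and their rows are matched by index label.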

import pandas as pd
import numpy as np
val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print(df1)
print("*" * 21, "<- dataframe")
print(df2)
print("*" * 21, "<- dataframe")
# df2[5:] holds only rows f..j and df1[:5] only rows a..e.
df3 = pd.concat([df1, df2[5:], df1[:5], df2], axis=1)
print(df3)

Output:

   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
********************* <- dataframe
   cx  dx  ex
a  50  51  52
b  53  54  55
c  56  57  58
d  59  60  61
e  62  63  64
f  65  66  67
g  68  69  70
h  71  72  73
i  74  75  76
j  77  78  79
********************* <- dataframe
   ax  bx  cx  cx  dx  ex  ax  bx  cx  cx  dx  ex
a  10  11  12 NaN NaN NaN  10  11  12  50  51  52
b  13  14  15 NaN NaN NaN  13  14  15  53  54  55
c  16  17  18 NaN NaN NaN  16  17  18  56  57  58
d  19  20  21 NaN NaN NaN  19  20  21  59  60  61
e  22  23  24 NaN NaN NaN  22  23  24  62  63  64
f  25  26  27  65  66  67 NaN NaN NaN  65  66  67
g  28  29  30  68  69  70 NaN NaN NaN  68  69  70
h  31  32  33  71  72  73 NaN NaN NaN  71  72  73
i  34  35  36  74  75  76 NaN NaN NaN  74  75  76
j  37  38  39  77  78  79 NaN NaN NaN  77  78  79

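As the output shows, when the concatenated DataFrames do not have the same rows, that is, when one of them lacks a given row label, the cells with no data in that row are filled with NaN.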
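
If the NaN padding is not wanted, concat can instead keep only the row labels that every input shares by passing join="inner". The following minimal sketch uses two small made-up frames (left and right) rather than df1 and df2 above.

```python
import numpy as np
import pandas as pd

idx = list("abcd")
left = pd.DataFrame(np.arange(8).reshape(4, 2),
                    columns=["ax", "bx"], index=idx)
right = pd.DataFrame(np.arange(10, 18).reshape(4, 2),
                     columns=["cx", "dx"], index=idx)

# Default join="outer": keep every row label and fill the gaps with NaN.
outer = pd.concat([left, right[2:]], axis=1)

# join="inner": keep only the row labels present in every input.
inner = pd.concat([left, right[2:]], axis=1, join="inner")

print(outer)
print(inner)
print(outer.dtypes)  # columns that received NaN are stored as floats
```

Since NaN is a floating-point value, the padded columns in the outer result no longer have an integer dtype, while the inner result keeps the original integers.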

13.4 Replacing the contents of a column

The values of a column can be replaced by assigning to it.

import pandas as pd
import numpy as np
val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print(df1[:3])
print("*" * 21, "<- dataframe")
print(df2[:3])
print("*" * 21, "<- dataframe")
# Overwrite df1's cx column with df2's; bracket access works for any label.
df1["cx"] = df2["cx"]
print(df1[:3])

The cx column of df1 now holds the values of the cx column of df2, as the output shows:

   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
********************* <- dataframe
   cx  dx  ex
a  50  51  52
b  53  54  55
c  56  57  58
********************* <- dataframe
   ax  bx  cx
a  10  11  50
b  13  14  53
c  16  17  56
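
One detail worth keeping in mind is that assigning a Series matches values by index label, not by position. In the example above both frames share the same index, so the difference is invisible. The following minimal sketch, with a made-up Series named other whose labels are stored in reverse order, makes it visible.

```python
import pandas as pd

df = pd.DataFrame({"ax": [10, 13, 16], "cx": [12, 15, 18]},
                  index=list("abc"))
# Same labels as df, but stored in reverse order.
other = pd.Series([56, 53, 50], index=list("cba"))

df["cx"] = other             # matched by label: a gets 50, c gets 56
by_label = df["cx"].tolist()

df["cx"] = other.to_numpy()  # plain array: assigned by position
by_position = df["cx"].tolist()

print(by_label)
print(by_position)
```

Converting to a plain array with to_numpy() is one way to ask for positional assignment when the labels should be disregarded.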

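13.5 Deleting columns

A column can be removed from a DataFrame with the del statement, the DataFrame's pop function, or its drop function. del modifies the original DataFrame directly. pop also removes the column from the original DataFrame and returns the removed column as a Series. drop can remove several columns at once and, by default, returns a new DataFrame while leaving the original unchanged.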

import pandas as pd
import numpy as np
val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print(df1[:3])
print("*" * 21)
print(df2[:3])
# del removes the column from df1 in place.
del df1["cx"]
print("*" * 21)
print(df1[:3])
# pop removes the column from df2 in place and returns it as a Series.
df3 = df2.pop("cx")
print("+" * 21)
print(df2[:3])
print("-" * 21)
print(df3[:3])
print("/" * 21)
# Rebuild df1, then drop two columns; drop returns a new frame.
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df4 = df1.drop(["ax", "cx"], axis=1)
print(df1[:3])
print(df4[:3])

Output:

   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
*********************
   cx  dx  ex
a  50  51  52
b  53  54  55
c  56  57  58
*********************
   ax  bx
a  10  11
b  13  14
c  16  17
+++++++++++++++++++++
   dx  ex
a  51  52
b  54  55
c  57  58
---------------------
a    50
b    53
c    56
Name: cx, dtype: int64
/////////////////////
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
   bx
a  11
b  14
c  17
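
drop also takes the labels through a columns keyword, and its errors parameter controls what happens when a label is missing. The following is a minimal sketch on a small made-up frame; the names trimmed, safe, and popped, and the label "zz", are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10, 19).reshape(3, 3),
                  columns=["ax", "bx", "cx"], index=list("abc"))

# columns= is equivalent to passing the labels together with axis=1.
trimmed = df.drop(columns=["ax", "cx"])

# errors="ignore" skips labels that do not exist instead of raising KeyError.
safe = df.drop(columns=["cx", "zz"], errors="ignore")

# pop removes the column in place and hands it back as a Series.
popped = df.pop("bx")

print(list(trimmed.columns))
print(list(safe.columns))
print(list(df.columns), popped.tolist())
```

Both drop calls leave df untouched; only the final pop changes it.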