14. Pandas DataFrame Row Operations
This chapter covers the common operations on the rows of a DataFrame.
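14.1 Adding rows with append
The append method attaches one DataFrame to the end of another and returns a new DataFrame. Where the columns differ, cells with no corresponding data are filled with NaN. append does not modify the original DataFrame. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat takes its place.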
import pandas as pd
import numpy as np
# Two 10x3 frames that share only the "cx" column
val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print("*" * 21)
print(df1[:3])
print("*" * 21)
print(df2[:3])
print("*" * 21)
# DataFrame.append requires pandas < 2.0; it returns a new frame
df3 = df1.append(df2)
print(df1)  # df1 is unchanged
print("*" * 21)
print(df3)
Output:
*********************
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
*********************
   cx  dx  ex
a  50  51  52
b  53  54  55
c  56  57  58
*********************
   ax  bx  cx
a  10  11  12
b  13  14  15
....
i  34  35  36
j  37  38  39
*********************
    ax   bx  cx   dx   ex
a   10   11  12  NaN  NaN
b   13   14  15  NaN  NaN
....
i   34   35  36  NaN  NaN
j   37   38  39  NaN  NaN
a  NaN  NaN  50   51   52
b  NaN  NaN  53   54   55
....
i  NaN  NaN  74   75   76
j  NaN  NaN  77   78   79
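On pandas 2.0 and later, where DataFrame.append no longer exists, the same row append can be sketched with pd.concat. This is a minimal illustration that rebuilds the df1 and df2 frames from the example above; the printed shapes are only there to show what changes and what does not.

```python
import numpy as np
import pandas as pd

idx = list("abcdefghij")
df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3),
                   columns=["ax", "bx", "cx"], index=idx)
df2 = pd.DataFrame(np.arange(50, 80).reshape(10, 3),
                   columns=["cx", "dx", "ex"], index=idx)

# Stack df2 below df1. The result has the union of both column sets,
# and cells with no source data are filled with NaN.
df3 = pd.concat([df1, df2])

print(df3.shape)          # (20, 5)
print(list(df3.columns))  # ['ax', 'bx', 'cx', 'dx', 'ex']
print(df1.shape)          # (10, 3): df1 itself is not modified
```

As with append, the row labels a to j appear twice in the result because the original index values are kept.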
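14.2 Joining rows with concat
The pandas concat function can join DataFrames either column-wise or row-wise; the axis argument selects which. With axis=0 the frames are joined row-wise, one stacked below the other.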
import pandas as pd
import numpy as np
# Two 10x3 frames that share only the "cx" column
val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print("*" * 21)
print(df1[:3])
print("*" * 21)
print(df2[:3])
print("*" * 21)
# axis=0 stacks df2 below df1 and returns a new frame
df3 = pd.concat([df1, df2], axis=0)
print(df1[:3])  # df1 is unchanged
print("*" * 21)
print(df3[:2])   # first two rows come from df1
print(df3[-2:])  # last two rows come from df2
Output:
*********************
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
*********************
   cx  dx  ex
a  50  51  52
b  53  54  55
c  56  57  58
*********************
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
*********************
   ax  bx  cx   dx   ex
a  10  11  12  NaN  NaN
b  13  14  15  NaN  NaN
    ax   bx  cx  dx  ex
i  NaN  NaN  74  75  76
j  NaN  NaN  77  78  79
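To make the role of axis concrete, the following sketch compares the shapes produced by axis=0 and axis=1 on the same two frames. The ignore_index=True call is an addition not used in the example above; it is a standard pd.concat option that replaces the repeated row labels with a fresh 0..n-1 index.

```python
import numpy as np
import pandas as pd

idx = list("abcdefghij")
df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3),
                   columns=["ax", "bx", "cx"], index=idx)
df2 = pd.DataFrame(np.arange(50, 80).reshape(10, 3),
                   columns=["cx", "dx", "ex"], index=idx)

# axis=0: rows are stacked, columns are the union of both frames
rows = pd.concat([df1, df2], axis=0)
# axis=1: columns are placed side by side, rows are matched on the index
cols = pd.concat([df1, df2], axis=1)
# ignore_index=True discards the original row labels
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)

print(rows.shape)                  # (20, 5)
print(cols.shape)                  # (10, 6)
print(renumbered.index.tolist()[:3])  # [0, 1, 2]
```

With axis=1 the shared column name cx appears twice in the result, since concat does not merge columns of the same name.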
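14.3 Replacing the contents of a row
Assigning to df.loc[label] replaces the data of the row with that label. When the right-hand side is a Series, such as df2.loc["c"], pandas aligns it with the target columns by label, so in the example below only cx receives a value while ax and bx become NaN. A plain list is assigned by position.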
import pandas as pd
import numpy as np
# Two 10x3 frames that share only the "cx" column
val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print("*" * 21)
print(df1[:3])
print("*" * 21)
print(df2[:3])
print("*" * 21)
# Series assignment: values are matched to df1's columns by label
df1.loc["a"] = df2.loc["c"]
print(df1[:3])
# List assignment: values are written left to right by position
df1.loc["a"] = [11, 22, 33]
print(df1[:3])
Output:
*********************
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
*********************
   cx  dx  ex
a  50  51  52
b  53  54  55
c  56  57  58
*********************
    ax   bx  cx
a  NaN  NaN  56
b   13   14  15
c   16   17  18
   ax  bx  cx
a  11  22  33
b  13  14  15
c  16  17  18
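The NaN values in the first result come from label alignment. The sketch below uses smaller 3x3 frames (an assumption made for brevity) to show the alignment on its own with reindex, and then one way to copy a row by position instead, by stripping the labels with to_numpy() before assigning.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 19).reshape(3, 3),
                   columns=["ax", "bx", "cx"], index=list("abc"))
df2 = pd.DataFrame(np.arange(50, 59).reshape(3, 3),
                   columns=["cx", "dx", "ex"], index=list("abc"))

# What label alignment does to the source row: only "cx" exists in
# both frames, so "ax" and "bx" have nothing to take a value from.
aligned = df2.loc["c"].reindex(df1.columns)
print(aligned.isna().tolist())  # [True, True, False]

# Positional copy: to_numpy() drops the column labels, so the three
# values are written into ax, bx, cx from left to right.
df1.loc["a"] = df2.loc["c"].to_numpy()
print(df1.loc["a"].tolist())    # [56, 57, 58]
```

Which behaviour is wanted depends on whether the two frames are meant to share column meanings or only a column count.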
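14.4 Deleting rows
As with columns, there are several ways to delete rows from a DataFrame.
- drop deletes the specified rows, given as a list of row labels. It returns a new DataFrame with those rows removed; the original DataFrame is not affected.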
import pandas as pd
import numpy as np
val1 = np.arange(10, 40).reshape(10, 3)
val2 = np.arange(50, 80).reshape(10, 3)
col1 = ["ax", "bx", "cx"]
col2 = ["cx", "dx", "ex"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val1, columns=col1, index=idx)
df2 = pd.DataFrame(val2, columns=col2, index=idx)
print("*" * 21)
print(df1[:3])
print("*" * 21)
print(df2[:3])
print("*" * 21)
# drop returns a new frame without rows "a", "c" and "f"
df3 = df1.drop(["a", "c", "f"])
print(df1[:3])  # df1 still contains all rows
print(df3[:3])
Output:
*********************
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
*********************
   cx  dx  ex
a  50  51  52
b  53  54  55
c  56  57  58
*********************
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
   ax  bx  cx
b  13  14  15
d  19  20  21
e  22  23  24
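Two variations on drop may be useful in practice. The sketch below, which rebuilds df1 from the example above, passes the labels through the index keyword to make it explicit that rows are meant, and uses errors="ignore" so that a label missing from the frame (the hypothetical "zz" here) is skipped instead of raising KeyError.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3),
                   columns=["ax", "bx", "cx"], index=list("abcdefghij"))

# Same deletion as df1.drop(["a", "c", "f"]), with the axis spelled out
kept = df1.drop(index=["a", "c", "f"])
print(kept.index.tolist())  # ['b', 'd', 'e', 'g', 'h', 'i', 'j']

# "zz" is not a row label; errors="ignore" skips it silently
safe = df1.drop(index=["a", "zz"], errors="ignore")
print(len(safe))            # 9

# In both cases df1 keeps all ten rows
print(len(df1))             # 10
```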
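- Assigning the result of a slice to a new DataFrame is an indirect way of deleting rows.
- Selecting rows with a boolean mask is likewise an indirect way of deleting rows.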
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
col = ["ax", "bx", "cx"]
idx = list("abcdefghij")
df1 = pd.DataFrame(val, columns=col, index=idx)
print("*" * 21)
print(df1)
print("*" * 21)
# Boolean selection: keep only the rows where ax > 33
bs = df1["ax"] > 33
df2 = df1[bs]
print(df2)
# Slicing: keep only the first three rows
df2 = df1[:3]
print(df2)
Output:
*********************
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
d  19  20  21
e  22  23  24
f  25  26  27
g  28  29  30
h  31  32  33
i  34  35  36
j  37  38  39
*********************
   ax  bx  cx
i  34  35  36
j  37  38  39
   ax  bx  cx
a  10  11  12
b  13  14  15
c  16  17  18
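The boolean selection above keeps the rows that match the condition. To delete the matching rows instead, the mask can be inverted with ~. The sketch below shows this on the same df1, along with a positional slice using iloc; the .copy() calls are an optional precaution for the case where the result will be modified later.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3),
                   columns=["ax", "bx", "cx"], index=list("abcdefghij"))

bs = df1["ax"] > 33

# ~bs inverts the mask: rows where ax > 33 are left out
without_large = df1[~bs].copy()
print(without_large.index.tolist())    # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Positional slice: everything from the fourth row onward,
# which leaves out the first three rows
without_first3 = df1.iloc[3:].copy()
print(without_first3.index.tolist())   # ['d', 'e', 'f', 'g', 'h', 'i', 'j']

# df1 is untouched by either selection
print(len(df1))                        # 10
```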