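10. Accessing a pandas DataFrame
A DataFrame is a two-dimensional data type in which each column is a Series. You can select a column first and then a row within it, or use accessors such as loc, iloc, and at to reach the data directly.
10.1 Selecting columns with []
Using [] on a DataFrame gives a different result from using [] on a Series: on a DataFrame, [] selects all the data of one field, that is, one column, whereas on a Series it returns the value stored for one row.
- Selecting a single column from a DataFrame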
import pandas as pd
import numpy as np
# 30 consecutive integers arranged as 10 rows by 3 columns
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
print("dataframe", "*" * 11)
print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print("series", "*" * 14)
print(ss)
print(df1["ax"])   # one column, returned as a Series
print(ss["a"])     # one value
Program output:
dataframe ***********
ax bx cx
0 10 11 12
1 13 14 15
2 16 17 18
3 19 20 21
4 22 23 24
5 25 26 27
6 28 29 30
7 31 32 33
8 34 35 36
9 37 38 39
series **************
a 2
b 3
c 1
d 4
dtype: int64
0 10 # output of df1["ax"]
1 13
2 16
3 19
4 22
5 25
6 28
7 31
8 34
9 37
Name: ax, dtype: int64
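2 # output of ss["a"]
- Selecting multiple columns from a DataFrame: if [] is given a list of column names, several columns are selected at once, just as a list of labels selects several values from a Series.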
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns=idx)
#print("dataframe", "*" * 11)
#print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
#print("series", "*" * 14)
#print(ss)
print(df1[["ax", "cx"]])   # a list of column names selects several columns
print(ss[["a", "d"]])      # a list of labels selects several values
Program output:
ax cx # output of df1[["ax", "cx"]]
0 10 12
1 13 15
2 16 18
3 19 21
4 22 24
5 25 27
6 28 30
7 31 33
8 34 36
9 37 39
a 2 # output of ss[["a", "d"]]
d 4
dtype: int64
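The difference between passing one name and passing a list of names shows up in the type of the result. The following is a minimal sketch using the same df1 as above; the variable names one_col, two_cols, and still_2d are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

one_col = df1["ax"]            # a single name returns a Series
two_cols = df1[["ax", "cx"]]   # a list of names returns a DataFrame
still_2d = df1[["ax"]]         # a one-element list still returns a DataFrame

print(type(one_col).__name__)
print(type(two_cols).__name__)
print(still_2d.shape)
```

Wrapping a single name in a list is a common way to keep the result two-dimensional.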
10.2 Selecting rows with loc[]
In a DataFrame, loc[] selects the rows identified by their labels.
- Selecting a single row by label.
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")   # row labels a..j
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print("series", "*" * 14)
print(ss)
print(df1.loc["a"])   # the row labeled "a", returned as a Series
print(ss["a"])
Program output:
dataframe ***********
ax bx cx
a 10 11 12
b 13 14 15
c 16 17 18
d 19 20 21
e 22 23 24
f 25 26 27
g 28 29 30
h 31 32 33
i 34 35 36
j 37 38 39
series **************
a 2
b 3
c 1
d 4
dtype: int64
ax 10 # output of df1.loc["a"]
bx 11
cx 12
Name: a, dtype: int64
2 # output of ss["a"]
- Passing a list of labels inside loc[] selects several rows at once.
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
#print("dataframe", "*" * 11)
#print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
#print("series", "*" * 14)
#print(ss)
print(df1.loc[["a", "c"]])   # rows labeled "a" and "c"
print(ss[["a", "c"]])
Program output:
ax bx cx # output of df1.loc[["a","c"]]
a 10 11 12
c 16 17 18
a 2 # output of ss[["a", "c"]]
c 1
dtype: int64
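Because loc[] works on labels only, an integer is not treated as a position when the row labels are strings. The sketch below assumes the same df1 with labels a to j; the flag found is a name made up for this illustration. The lookup fails with KeyError on recent pandas releases, and older releases may raise TypeError instead, so both are caught.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3),
                   columns=["ax", "bx", "cx"], index=list("abcdefghij"))

row = df1.loc["a"]           # Series named after the row label
rows = df1.loc[["a", "c"]]   # DataFrame with two rows

try:
    df1.loc[0]               # 0 is not one of the labels a..j
    found = True
except (KeyError, TypeError):
    found = False

print(row.name, rows.shape, found)
```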
10.3 Selecting rows with iloc[]
The difference from loc[] is that iloc[] takes positions, whereas loc[] takes labels.
- Selecting a single row with iloc[].
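import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
#print("dataframe", "*" * 11)
#print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
#print("series", "*" * 14)
#print(ss)
print(df1.iloc[1])   # the row at position 1 (label "b")
# ss[1] relied on a positional fallback that newer pandas deprecates;
# ss.iloc[1] states the positional intent explicitly
print(ss.iloc[1])
Program output: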
ax 13 # output of df1.iloc[1]
bx 14
cx 15
Name: b, dtype: int64
3 # value at position 1 of ss
- To select several rows with iloc[], pass a list of the row positions inside [].
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))
print("series", "*" * 14)
print(ss)
print(df1.iloc[[0, 1, 3]])   # rows at positions 0, 1 and 3
# positional selection on the Series written explicitly with iloc
print(ss.iloc[[0, 1, 3]])
Program output:
dataframe ***********
ax bx cx
a 10 11 12
b 13 14 15
c 16 17 18
d 19 20 21
e 22 23 24
f 25 26 27
g 28 29 30
h 31 32 33
i 34 35 36
j 37 38 39
series **************
a 2
b 3
c 1
d 4
dtype: int64
ax bx cx # output of df1.iloc[[0, 1, 3]]
a 10 11 12
b 13 14 15
d 19 20 21
a 2 # values at positions 0, 1, 3 of ss
b 3
d 4
dtype: int64
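The label versus position distinction matters most when the row labels are themselves integers. The sketch below uses a small hypothetical frame, not part of the original examples, whose labels are deliberately out of order.

```python
import pandas as pd

# Hypothetical frame whose integer labels are deliberately out of order
df = pd.DataFrame({"v": [100, 200, 300]}, index=[2, 0, 1])

by_label = df.loc[0, "v"]     # the row whose label is 0
by_position = df.iloc[0, 0]   # the first row, whatever its label

print(by_label)     # 200
print(by_position)  # 100
```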
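10.4 Selecting a single value by label with at[]
A DataFrame has both rows and columns; giving at[] a row label and a column label selects the value stored at that row and column.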
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
print(df1.at["b", "bx"])   # row label first, then column label
Program output:
dataframe ***********
ax bx cx
a 10 11 12
b 13 14 15
c 16 17 18
d 19 20 21
e 22 23 24
f 25 26 27
g 28 29 30
h 31 32 33
i 34 35 36
j 37 38 39
14 # output of df1.at["b", "bx"]
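For a single cell, at[] returns the same scalar as loc[] given one row label and one column label. The following sketch assumes the df1 and ss used throughout this section; cell_at, cell_loc, and value are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3),
                   columns=["ax", "bx", "cx"], index=list("abcdefghij"))
ss = pd.Series([2, 3, 1, 4], index=list("abcd"))

cell_at = df1.at["b", "bx"]     # row label first, column label second
cell_loc = df1.loc["b", "bx"]   # loc returns the same scalar
value = ss.at["a"]              # a Series needs only one label

print(cell_at, cell_loc, value)  # 14 14 2
```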
10.5 Selecting a single value by position with iat[]
Giving iat[] a row position and a column position selects the value stored at that position.
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
print(df1.iat[1, 2])   # row position 1, column position 2; prints 15
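Positions are zero-based, so the 1 in iat[] refers to the second row (label b) and the 2 refers to the third column (cx); the statement prints 15.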
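The zero-based counting can be checked by translating the positions back into labels. This is a small sketch on the same df1; row_pos, col_pos, and the other variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3),
                   columns=["ax", "bx", "cx"], index=list("abcdefghij"))

row_pos, col_pos = 1, 2
cell = df1.iat[row_pos, col_pos]

row_label = df1.index[row_pos]      # label of the second row
col_label = df1.columns[col_pos]    # label of the third column
same_cell = df1.at[row_label, col_label]

print(cell, row_label, col_label)   # 15 b cx
```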
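10.6 Mixed selection with ix[]
Inside ix[], labels and positions can be mixed. Note that ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so the example below runs only on older versions.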
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
lbl = list("abcdefghij")
df1 = pd.DataFrame(val, columns=idx, index=lbl)
print("dataframe", "*" * 11)
print(df1)
# ix requires pandas older than 1.0
print(df1.ix[[0, 1, 2], ["ax", "cx"]])   # row positions, column labels
print(df1.ix[["ax", "cx"], [0, 1, 2]])   # "ax" and "cx" are treated as row labels
Program output:
dataframe ***********
ax bx cx
a 10 11 12
b 13 14 15
c 16 17 18
d 19 20 21
e 22 23 24
f 25 26 27
g 28 29 30
h 31 32 33
i 34 35 36
j 37 38 39
ax cx # output of df1.ix[[0,1,2], ["ax","cx"]]
a 10 12
b 13 15
c 16 18
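ax bx cx # output of df1.ix[["ax","cx"], [0,1,2]]
ax NaN NaN NaN
cx NaN NaN NaN
The result of df1.ix[["ax","cx"], [0,1,2]] shows that at, iat, ix, and similar accessors take the row information first and the column information second: "ax" and "cx" are looked up as row labels, no such rows exist, and the result is filled with NaN.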
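On pandas versions where ix is no longer available, the same mixed selection (row positions combined with column labels) can be written with loc[] or iloc[] by first converting one side. The sketch below shows two possible ways, not the only ones; via_loc and via_iloc are illustrative names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3),
                   columns=["ax", "bx", "cx"], index=list("abcdefghij"))

# Option 1: turn the row positions into labels, then use loc
via_loc = df1.loc[df1.index[[0, 1, 2]], ["ax", "cx"]]

# Option 2: turn the column labels into positions, then use iloc
col_positions = df1.columns.get_indexer(["ax", "cx"])
via_iloc = df1.iloc[[0, 1, 2], col_positions]

print(via_loc)
print(via_loc.equals(via_iloc))
```

Both forms make it explicit which axis is addressed by label and which by position, which is the ambiguity that ix left implicit.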