3. The index of a pandas Series

3.1 The Series index

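When you create a Series, you can either pass an index to the Series constructor or leave it out. Once the Series exists, its index is available through the index attribute using dot syntax (for example, t.index).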

import pandas as pd

s = pd.Series([2, 4, 5, 6])                 # no index given
print("s:")
print(s)
val = [2, 4, 5, 6]
idx = "hello the cruel world".split()       # string labels
t = pd.Series(val, index=idx, name="col_name")
print("t:")
print(t)

Output:

s:
0    2
1    4
2    5
3    6
dtype: int64
t:
hello    2
the      4
cruel    5
world    6
Name: col_name, dtype: int64
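The index attribute of both Series can be inspected directly. The following is a minimal sketch that repeats the example's values. It shows that the default index is 0 to len() - 1, that a given index is kept as passed, and that name= belongs to the Series rather than to its index:

```python
import pandas as pd

s = pd.Series([2, 4, 5, 6])  # no index given
t = pd.Series([2, 4, 5, 6], index="hello the cruel world".split(), name="col_name")

# Default index: consecutive integers 0 .. len(s) - 1
print(list(s.index))   # [0, 1, 2, 3]
# Explicit index: exactly the list passed as index=
print(list(t.index))   # ['hello', 'the', 'cruel', 'world']
# name= is stored on the Series; the index itself stays unnamed
print(t.name)          # col_name
print(t.index.name)    # None
```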

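s was created without an index, so pandas assigned it the positions 0 to len(s) - 1 as its index. t was given an explicit index ("labels" is the more fitting term here), but a labeled Series still has integer positions as well. A Series with an explicit index therefore offers two ways to address each element. The first is the explicit index, usually supplied through the constructor's index parameter and sometimes called the labels; for strings its dtype is object. The second is the implicit position, running from 0 to len() - 1, which is always an integer. In many applications Series and DataFrames carry meaningful string labels. The integer index 0 to len() - 1 is convenient while learning, but giving each Series or DataFrame a string index is the more common practice.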
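Both addressing schemes can be observed on the labeled Series t. The following is a minimal sketch that reuses the labels from the example. It uses Index.get_loc to translate a label into its position, and plain indexing on t.index to go the other way:

```python
import pandas as pd

t = pd.Series([2, 4, 5, 6], index="hello the cruel world".split())

# Label-based access (explicit index)
by_label = t.loc["cruel"]
# Position-based access (implicit positions 0 .. len(t) - 1)
by_position = t.iloc[2]
print(by_label, by_position)       # 5 5

# Translate a label to its position
print(t.index.get_loc("cruel"))    # 2
# Translate a position to its label
print(t.index[2])                  # cruel
```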

import pandas as pd

s = pd.Series([2, 4, 5, 6])
val = [2, 4, 5, 6]
idx = "hello the cruel world".split()
t = pd.Series(val, index=idx, name="col_name")
print(s.index)
print(t.index)

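Output (from an older pandas version under Python 2; recent versions print RangeIndex(start=0, stop=4, step=1) for s.index and drop the u prefixes):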

Int64Index([0, 1, 2, 3], dtype='int64')
Index([u'hello', u'the', u'cruel', u'world'], dtype='object')

The index of t has dtype object, while the index of s has dtype int64. The values of both are int64. What happens if the data passed as the index is itself int64?

import pandas as pd

val = [2, 4, 5, 6]
ii = range(10, 14)
s0 = pd.Series(val)
s1 = pd.Series(val, index=ii)
idx = "hello the cruel world".split()
t = pd.Series(val, index=idx)
print(s0.index)
print(s1.index)
print(t.index)
print(s0[0])
print(s1[10])                  # s1[0] would raise KeyError: 0 is not a label of s1
print(t.iloc[0], t["hello"])   # older pandas also accepted t[0]; use iloc for positions

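For s1, both the explicit index and the implicit positions are integers. In that case s1[...] looks values up by label only, that is, by the elements of ii. s1[10] works because 10 is in ii, but s1[0] raises a KeyError even though position 0 exists. This can be surprising. An int64 value normally denotes a position, and here it is being used as a label. That mismatch is the source of the confusion. To avoid it, do not pass int64 data to the index parameter; if integer positions are all you need, simply omit index. In most uses the index parameter is a list of strings, and a Series like s1 is best avoided.

To summarize, there are two ways to create a Series. You can leave out the index parameter, and pandas will automatically assign the positions 0 to len() - 1 as the index. Alternatively, and more commonly, you can pass a list of strings as labels for the elements.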
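The s1 behavior described above can be checked directly. The following is a minimal sketch with the same values as the example. It shows that [] on an integer index is label-based, and that iloc is the way to ask for "the first element":

```python
import pandas as pd

s1 = pd.Series([2, 4, 5, 6], index=range(10, 14))

# With an integer index, [] looks up labels
print(s1[10])        # 2

# 0 is a position of s1 but not a label, so [] raises KeyError
try:
    s1[0]
    failed = False
except KeyError:
    failed = True
print(failed)        # True

# iloc always works with positions, whatever the index contains
print(s1.iloc[0])    # 2
```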

3.2 Using the index

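pandas provides loc, iloc, at, iat and ix for accessing the elements of a Series by position or by label. The indexers with an i (iloc, iat) work with positions, and those without an i (loc, at) work with labels. ix accepted either. loc and iloc accept single keys, lists and slices, while at and iat fetch a single value only. ix was deprecated in pandas 0.20 and removed in pandas 1.0. None of these are functions: they are indexer attributes of the Series and are used with square brackets.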

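import pandas as pd

val = [2, 4, 5, 6]
ii = range(10, 14)
s0 = pd.Series(val)
s1 = pd.Series(val, index=ii)
idx = "hello the cruel world".split()
t = pd.Series(val, index=idx)
print("s0", "*" * 11)
print(s0.iloc[0])       # 2, position 0
print(s0.iat[3])        # 6, position 3
print(s0.loc[0])        # 2, label 0 (same as the position here)
print(s0.at[3])         # 6, label 3
print("s1", "*" * 11)
print(s1.iloc[1])       # 4, position 1
print(s1.iat[2])        # 5, position 2
print(s1.loc[11])       # 4, label 11
print(s1.at[12])        # 5, label 12
# print(s1.iloc[11])    # IndexError: only positions 0-3 exist
# print(s1.iat[12])     # IndexError
# print(s1.loc[1])      # KeyError: 1 is not a label of s1
# print(s1.at[2])       # KeyError
print("t", "*" * 12)
print(t.iloc[0])        # 2
print(t.iat[2])         # 5
print(t.loc["hello"])   # 2
print(t.at["cruel"])    # 5
# t.ix[1] and t.ix["the"] no longer work (ix was removed in pandas 1.0):
print(t.iloc[1])        # 4, replaces t.ix[1]
print(t.loc["the"])     # 4, replaces t.ix["the"]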
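Because ix no longer exists, code that relied on its mixed behavior has to choose loc or iloc explicitly. The helper ix_like below is hypothetical and is not part of pandas. It is only a rough approximation of the old rule for scalar keys: an integer index treats every key as a label, while any other index treats integers as positions and everything else as labels. It is a sketch, not the original implementation:

```python
import pandas as pd
from pandas.api.types import is_integer, is_integer_dtype


def ix_like(series, key):
    """Hypothetical helper roughly approximating the removed .ix for scalar keys."""
    if is_integer_dtype(series.index):
        # Integer index: keys are always treated as labels
        return series.loc[key]
    if is_integer(key):
        # Non-integer index with an integer key: treat the key as a position
        return series.iloc[key]
    # Any other key is treated as a label
    return series.loc[key]


t = pd.Series([2, 4, 5, 6], index="hello the cruel world".split())
s1 = pd.Series([2, 4, 5, 6], index=range(10, 14))

print(ix_like(t, 1))       # 4 (position 1)
print(ix_like(t, "the"))   # 4 (label "the")
print(ix_like(s1, 11))     # 4 (label 11)
```

In new code it is clearer to write t.iloc[1] or t.loc["the"] directly, so the reader does not have to know which rule applies.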

3.3 Selecting with several index values

Several index values can be collected in a list and used to extract multiple elements from a Series at once.

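import pandas as pd

val = [2, 4, 5, 6]
ii = range(10, 14)
s0 = pd.Series(val)
s1 = pd.Series(val, index=ii)
idx = "hello the cruel world".split()
t = pd.Series(val, index=idx)
print("s0", "*" * 11)
print(s0[[0, 1, 2]])
print("s1", "*" * 11)
print(s1[[10, 11, 12]])
# Older pandas: s1[[0, 1, 2]] returned NaN for the missing labels.
# pandas >= 1.0 raises KeyError; reindex gives the old result explicitly.
print(s1.reindex([0, 1, 2]))
print("t", "*" * 12)
print(t.iloc[[0, 1, 2]])                  # positions; older pandas also accepted t[[0, 1, 2]]
print(t[["hello", "the", "cruel"]])
print("-" * 14)
print(s0.iloc[[0, 2]], end="\n\n")
print(s1.loc[[10, 12]], end="\n\n")
print(t.loc[["hello", "the", "cruel"]])   # t.ix[...] in older pandas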

Output:

s0 ***********
0    2
1    4
2    5
dtype: int64
s1 ***********
10    2
11    4
12    5
dtype: int64
0   NaN
1   NaN
2   NaN
dtype: float64
t ************
hello    2
the      4
cruel    5
dtype: int64
hello    2
the      4
cruel    5
dtype: int64
--------------
0    2
2    5
dtype: int64 

10    2
12    5
dtype: int64 

hello    2
the      4
cruel    5
dtype: int64

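Why does selecting s1 with the list [0, 1, 2] give the following result?

0   NaN
1   NaN
2   NaN
dtype: float64

Because the index of s1 is integer-typed, the list is matched against labels, not positions. s1 does have positions 0, 1 and 2, but its labels are 10, 11, 12 and 13, so none of the requested labels exists. Older pandas versions filled each missing label with NaN, which forces the dtype to float64. Since pandas 1.0, s1[[0, 1, 2]] raises a KeyError instead, and s1.reindex([0, 1, 2]) is the explicit way to get the NaN-filled result.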
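In current pandas, a list selection with missing labels has to be handled explicitly. The following is a minimal sketch with the same s1. It covers three cases: selecting when every label exists, asking for NaN through reindex, and keeping only the labels that are present by using Index.isin:

```python
import pandas as pd

s1 = pd.Series([2, 4, 5, 6], index=range(10, 14))

# All labels present: .loc with a list works
print(s1.loc[[10, 12]].tolist())          # [2, 5]

# No requested label present: pandas >= 1.0 raises KeyError instead of returning NaN
try:
    s1.loc[[0, 1, 2]]
    raised = False
except KeyError:
    raised = True
print(raised)                             # True

# Explicitly ask for NaN where a label is missing
filled = s1.reindex([0, 1, 2])
print(filled.isna().all(), filled.dtype)  # True float64

# Keep only the requested labels that actually exist
wanted = [0, 10, 12]
print(s1.loc[s1.index.isin(wanted)].tolist())  # [2, 5]
```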

3.4 Recommendations

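When creating a Series, avoid using int64 values as labels. String labels are the common choice in later pandas work and are often necessary. Integers are best reserved for positions, which always start at 0, so an integer index that does not start at 0 (like s1 above) is best avoided. If you do work with integer labels, use loc for labels and iloc for positions explicitly rather than plain [].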
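If a Series already has integer labels such as s1, it can be brought in line with this advice. The following is a minimal sketch showing two options. reset_index(drop=True) discards the labels in favor of positions 0 to len() - 1. Index.astype(str) keeps the values but turns them into string labels:

```python
import pandas as pd

s1 = pd.Series([2, 4, 5, 6], index=range(10, 14))

# Option 1: drop the integer labels and fall back to positions 0 .. len - 1
s_pos = s1.reset_index(drop=True)
print(list(s_pos.index))    # [0, 1, 2, 3]

# Option 2: keep the information but turn the labels into strings
s_str = s1.copy()
s_str.index = s_str.index.astype(str)
print(list(s_str.index))    # ['10', '11', '12', '13']

# With string labels, a string key is unambiguously a label
print(s_str["10"])          # 2
```

Which option fits depends on whether the original integers carry meaning (such as IDs) that should be preserved.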