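9. Pandas DataFrame Attributes
The previous chapter showed how to convert other kinds of data into a pandas DataFrame. This chapter introduces some of the DataFrame's attributes. For the demonstrations, first load the iris.data file used in the previous chapter into a DataFrame: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data The file is in CSV format, so it can be read with the read_csv function.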
import pandas as pd
fn = "iris.data"
# iris.data has no header row, so the column names are supplied via names=
cols_name = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']
df = pd.read_csv(fn, names=cols_name)
print(df)
Program output:
   sepal length  sepal width  petal length  petal width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
5           5.4          3.9           1.7          0.4  Iris-setosa
6           4.6          3.4           1.4          0.3  Iris-setosa
.....
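If iris.data has not been downloaded yet, the same read_csv call can be tried on an in-memory copy of a few rows. The following is only a sketch under that assumption: it feeds the first three rows printed above through io.StringIO, and the names sample and df_sample are made up for this illustration.

```python
import io

import pandas as pd

# The first three rows of the output above, written as raw CSV text.
# The data has no header line, so names= must supply the column labels.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
cols_name = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']
df_sample = pd.read_csv(sample, names=cols_name)

print(df_sample)
```

Because read_csv accepts any file-like object as well as a path, the rest of the call is unchanged from the file-based version.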
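9.1 The columns attribute
The columns attribute returns the DataFrame's column labels. It is an Index object, the same type pandas uses for row labels, but it describes the column axis; the row labels are available separately through the index attribute.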
import pandas as pd
fn = "iris.data"
cols_name = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']
df = pd.read_csv(fn, names = cols_name)
print(df[:3])      # first three rows
print(df.columns)  # column labels
Program output:
   sepal length  sepal width  petal length  petal width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
Index(['sepal length', 'sepal width', 'petal length', 'petal width',
       'class'],
      dtype='object')
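In this program, the columns of the resulting DataFrame object df are set through the names parameter of read_csv. If the DataFrame is instead created with the pandas DataFrame constructor, use the columns parameter to specify its column labels.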
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns = idx)
print(df1)
print(df1.columns)  # column labels: ax, bx, cx
print(df1.index)    # row labels: the default integer index
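The output of this program is not shown here. As a rough sketch of what to expect: the column labels come from the columns argument, and since no index argument is passed, pandas falls back to a default integer index from 0 to 9. The list() conversions are only there to make the labels easy to compare and are not part of the original program.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"])

# Column labels come from the columns= argument.
print(list(df1.columns))   # ['ax', 'bx', 'cx']

# No index= was passed, so the row labels default to 0, 1, ..., 9.
print(list(df1.index))     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```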
9.2 The shape attribute
The shape attribute describes the shape of the DataFrame as a tuple: (number of rows, number of columns).
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns = idx)
print(df1)
print(df1.shape)
Program output:
   ax  bx  cx
0  10  11  12
1  13  14  15
2  16  17  18
3  19  20  21
4  22  23  24
5  25  26  27
6  28  29  30
7  31  32  33
8  34  35  36
9  37  38  39
(10, 3)
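Since shape is an ordinary tuple, it can be unpacked or indexed directly. The short sketch below assumes the same df1 as above; the names n_rows and n_cols are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

n_rows, n_cols = df1.shape   # shape is a plain (rows, columns) tuple
print(n_rows, n_cols)        # 10 3

# len() of a DataFrame counts rows, so it matches shape[0].
print(len(df1) == df1.shape[0])           # True
print(len(df1.columns) == df1.shape[1])   # True
```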
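9.3 The size attribute
The size attribute returns the number of values (elements) in the DataFrame, that is, the number of rows multiplied by the number of columns.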
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns = idx)
print(df1.shape)  # (10, 3)
print(df1.size)   # 30, i.e. 10 rows * 3 columns
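The relationship between size and shape can be checked directly. This is a small sketch on the same df1, with a single-column selection added as an extra illustration that is not in the original program.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

rows, cols = df1.shape
print(df1.size == rows * cols)   # True: 30 == 10 * 3

# A single column is a Series, and its size is just its length.
print(df1["ax"].size)            # 10
```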
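9.4 The values attribute
The values attribute returns the DataFrame's data as a NumPy array, without the index and column labels. The rows and columns of the array are in the same order as the DataFrame's index and columns.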
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns = idx)
print(df1.values)  # the underlying data as a NumPy array
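To make the return type concrete, the sketch below inspects the result of values on the same df1. It also calls to_numpy(), a method available in pandas 0.24 and later that returns the same data; the variable name arr is only for this illustration.

```python
import numpy as np
import pandas as pd

val = np.arange(10, 40).reshape(10, 3)
df1 = pd.DataFrame(val, columns=["ax", "bx", "cx"])

arr = df1.values
print(type(arr))                  # <class 'numpy.ndarray'>
print(arr.shape)                  # (10, 3)

# The labels are gone, but the numbers match the array the frame was built from.
print(np.array_equal(arr, val))   # True

# to_numpy() gives the same data (assumes pandas >= 0.24).
print(np.array_equal(df1.to_numpy(), arr))   # True
```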
9.5 The dtypes attribute
The dtypes attribute gives the data type of each column in the DataFrame.
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns = idx)
print(df1.dtypes)
Program output:
ax    int64
bx    int64
cx    int64
dtype: object
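The result of dtypes is itself a Series indexed by column name, which is why the last line reads dtype: object: that is the type of the Series holding the per-column types. The sketch below uses a made-up two-column frame (df2, with hypothetical columns count and ratio) to show columns with different types and a lookup by column name.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df2 = pd.DataFrame({
    "count": np.array([1, 2, 3], dtype="int64"),
    "ratio": np.array([0.5, 1.5, 2.5], dtype="float64"),
})

print(df2.dtypes["count"])   # int64
print(df2.dtypes["ratio"])   # float64

# astype returns a new DataFrame; df2 itself is left unchanged.
df3 = df2.astype({"count": "float64"})
print(df3.dtypes["count"])   # float64
```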
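9.6 The ndim attribute
The ndim attribute of a DataFrame means the same as ndim in NumPy: the number of dimensions (axes). For a DataFrame it is always 2.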
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns = idx)
print(df1.shape)
print(df1.ndim)
Program output:
(10, 3)
2
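For comparison, a single column selected from the DataFrame is a one-dimensional Series. The sketch below, again on df1, is an added illustration and not part of the original program.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])

print(df1.ndim)          # 2: rows and columns
print(df1["ax"].ndim)    # 1: a single column is a Series

# ndim is the length of the shape tuple, as in NumPy.
print(df1.ndim == len(df1.shape))   # True
print(df1.values.ndim)              # 2
```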
9.7 The T attribute
The T attribute of a DataFrame returns its transpose.
import pandas as pd
import numpy as np
val = np.arange(10, 40).reshape(10, 3)
idx = ["ax", "bx", "cx"]
df1 = pd.DataFrame(val, columns = idx)
print("df1", "*" * 13)
print(df1)
print("df1.T", "*" * 11)
print(df1.T)
Program output:
df1 *************
   ax  bx  cx
0  10  11  12
1  13  14  15
2  16  17  18
3  19  20  21
4  22  23  24
5  25  26  27
6  28  29  30
7  31  32  33
8  34  35  36
9  37  38  39
df1.T ***********
     0   1   2   3   4   5   6   7   8   9
ax  10  13  16  19  22  25  28  31  34  37
bx  11  14  17  20  23  26  29  32  35  38
cx  12  15  18  21  24  27  30  33  36  39
That is, columns become rows and rows become columns.
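The swap can also be checked on the labels and shape rather than by reading the printed tables. The following sketch assumes the same df1; the name df_t is chosen only for this illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(10, 40).reshape(10, 3), columns=["ax", "bx", "cx"])
df_t = df1.T

print(df_t.shape)           # (3, 10): the shape is reversed
print(list(df_t.index))     # ['ax', 'bx', 'cx']: old columns are now row labels
print(list(df_t.columns))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# The value at (row 0, column 'ax') moves to (row 'ax', column 0).
print(df1.loc[0, "ax"] == df_t.loc["ax", 0])   # True

# Transposing twice gives back the original frame.
print(df_t.T.equals(df1))   # True
```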