NumPy Data Types

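NumPy supports a richer set of data types than Python and gives fine-grained control over them. Understanding data types helps you optimize memory usage, improve computational efficiency, and avoid subtle bugs.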

1. Basic Data Types in NumPy

NumPy represents data types with dtype objects. The table below lists the commonly used types, their type codes, and their sizes:

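Type name	Code	Description	Size (bytes)
`bool_`	`?`	Boolean (True or False)	1
`int8`	`i1`	Signed 8-bit integer	1
`int16`	`i2`	Signed 16-bit integer	2
`int32`	`i4`	Signed 32-bit integer	4
`int64`	`i8`	Signed 64-bit integer	8
`uint8`	`u1`	Unsigned 8-bit integer	1
`uint16`	`u2`	Unsigned 16-bit integer	2
`uint32`	`u4`	Unsigned 32-bit integer	4
`uint64`	`u8`	Unsigned 64-bit integer	8
`float16`	`f2`	Half-precision float	2
`float32`	`f4`	Single-precision float	4
`float64`	`f8`	Double-precision float (same precision as Python float)	8
`complex64`	`c8`	Complex number made of two 32-bit floats	8
`complex128`	`c16`	Complex number made of two 64-bit floats	16
`bytes_`	`S`	Fixed-length byte string	N (for length N)
`str_`	`U`	Fixed-length Unicode string	4N (for length N)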

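Note: Python's default integer type (int) maps to NumPy's int64 or int32 depending on the platform (int32 on Windows with NumPy versions before 2.0, int64 on most other platforms). Python's float maps to float64.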
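The sizes in the table can be checked directly with np.dtype. The sketch below is illustrative. The expected_sizes dictionary is a hypothetical helper built from the table. The default integer type that the last line prints depends on your platform and NumPy version.

```python
import numpy as np

# Type codes from the table above mapped to their expected sizes in bytes
# (expected_sizes is a hypothetical helper built only for this check)
expected_sizes = {
    '?': 1,
    'i1': 1, 'i2': 2, 'i4': 4, 'i8': 8,
    'u1': 1, 'u2': 2, 'u4': 4, 'u8': 8,
    'f2': 2, 'f4': 4, 'f8': 8,
    'c8': 8, 'c16': 16,
}

for code, size in expected_sizes.items():
    dt = np.dtype(code)
    # dt.name gives the full type name, e.g. 'int32' for 'i4'
    print(f"{code:>4} -> {dt.name:<10} itemsize={dt.itemsize}")
    assert dt.itemsize == size

# Fixed-length strings: itemsize grows with the declared length
print(np.dtype('S5').itemsize)   # 5 (1 byte per character)
print(np.dtype('U5').itemsize)   # 20 (4 bytes per character)

# Python's built-in types map to NumPy defaults (int is platform-dependent)
print(np.dtype(int), np.dtype(float))
```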

2. Inspecting an Array's Data Type: the `dtype` Attribute

Every NumPy array has a dtype attribute that returns its data type object.

import numpy as np

arr1 = np.array([1, 2, 3])
print("arr1.dtype =", arr1.dtype)

arr2 = np.array([1.0, 2.0, 3.0])
print("arr2.dtype =", arr2.dtype)

arr3 = np.array([True, False])
print("arr3.dtype =", arr3.dtype)

Output:

arr1.dtype = int64
arr2.dtype = float64
arr3.dtype = bool
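Printing the dtype is convenient interactively, but code usually needs to test it. A short sketch of two common ways to do this, using standard NumPy calls. The first is an exact comparison. The second is a category check with np.issubdtype.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int32)

# A dtype compares equal to a type object, a type name, or a type code
print(arr.dtype == np.int32)   # True
print(arr.dtype == 'int32')    # True
print(arr.dtype == 'i4')       # True

# np.issubdtype checks against an abstract category instead of one exact type
print(np.issubdtype(arr.dtype, np.integer))    # True
print(np.issubdtype(arr.dtype, np.floating))   # False
```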

3. Specifying the Data Type When Creating an Array

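You can set the data type explicitly by passing the dtype argument when you create an array. This lets you control memory use and precision.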

# Specify int32
a = np.array([1, 2, 3], dtype=np.int32)
print("int32 array:", a.dtype)

# Specify float32
b = np.array([1.5, 2.5, 3.5], dtype=np.float32)
print("float32 array:", b.dtype)

# Use the short type code
c = np.array([1, 2, 3], dtype='i4')   # i4 means int32
print("Short code i4:", c.dtype)

Output:

int32 array: int32
float32 array: float32
Short code i4: int32
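The dtype argument is not limited to np.array. A brief sketch using np.zeros, np.ones, np.arange and np.full, which all accept it. The variable names are illustrative only.

```python
import numpy as np

# The same dtype argument is accepted by other array constructors
zeros_u8 = np.zeros(5, dtype=np.uint8)
ones_f32 = np.ones((2, 3), dtype='f4')
seq_i2 = np.arange(4, dtype='int16')

print(zeros_u8.dtype, zeros_u8)   # uint8 [0 0 0 0 0]
print(ones_f32.dtype)             # float32
print(seq_i2.dtype, seq_i2)       # int16 [0 1 2 3]

# A dtype object can be built once and reused
dt = np.dtype('float32')
halves = np.full(3, 0.5, dtype=dt)
print(halves)                     # [0.5 0.5 0.5]
```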

4. Converting Data Types: the `astype()` Method

The astype() method converts an array to a new data type. It returns a new array (a copy) and leaves the original array unchanged.

arr = np.array([1, 2, 3])   # int64 by default on most platforms
print("Original dtype:", arr.dtype)

# Convert to float64
arr_float = arr.astype(np.float64)
print("Converted to float64:", arr_float.dtype)

# Convert to bool
arr_bool = arr.astype(bool)
print("Converted to bool:", arr_bool.dtype)
print("bool array:", arr_bool)

# Convert to int32 (float-to-int conversion truncates the fractional part)
float_arr = np.array([1.2, 2.7, 3.0])
int_arr = float_arr.astype(np.int32)
print("Float to int:", int_arr)

Output:

Original dtype: int64
Converted to float64: float64
Converted to bool: bool
bool array: [ True  True  True]
Float to int: [1 2 3]

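Note: when floats are converted to integers, the fractional part is truncated toward zero rather than rounded. For example, 2.7 becomes 2 and -1.7 becomes -1. If the data contains NaN or infinity, the conversion does not raise an error. The resulting integers are undefined, meaning meaningless, platform-dependent values. Recent NumPy versions emit a RuntimeWarning in this case.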
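When rounding rather than truncation is wanted, round first and then cast. It is also worth filtering out non-finite values before casting. A minimal sketch, assuming round-half-to-even (the behavior of np.rint) is acceptable:

```python
import numpy as np

vals = np.array([1.2, 2.7, -1.7, -2.5])

# astype truncates toward zero: 2.7 -> 2, -1.7 -> -1, -2.5 -> -2
truncated = vals.astype(np.int32)

# For nearest-integer semantics, round first.
# np.rint rounds halves to the nearest even value, so -2.5 -> -2
rounded = np.rint(vals).astype(np.int32)

print("truncated:", truncated)   # [ 1  2 -1 -2]
print("rounded:  ", rounded)     # [ 1  3 -2 -2]

# Guard against NaN/inf before casting, since their integer result is undefined
data = np.array([1.5, np.nan, np.inf])
finite = np.isfinite(data)
print("all finite:", finite.all())                  # False
print("safe part:", data[finite].astype(np.int64))  # [1]
```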

5. Attributes of the `dtype` Object

A dtype object has many useful attributes of its own that describe the type.

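dt = np.dtype('float64')
print("dtype object:", dt)
print("Name:", dt.name)
print("Item size (bytes):", dt.itemsize)
print("Character code:", dt.char)
print("Type:", dt.type)   # the corresponding NumPy scalar type

Output:

dtype object: float64
Name: float64
Item size (bytes): 8
Character code: d
Type: <class 'numpy.float64'>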
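The type table gives sizes in bytes but not value ranges. np.iinfo and np.finfo report the numeric limits of a dtype. A brief sketch:

```python
import numpy as np

# np.iinfo reports the representable range of an integer dtype
for t in (np.int8, np.uint8, np.int16, np.int32):
    info = np.iinfo(t)
    print(f"{info.dtype}: min={info.min}, max={info.max}")

# np.finfo reports floating-point limits such as machine epsilon and the largest value
for t in (np.float32, np.float64):
    info = np.finfo(t)
    print(f"{info.dtype}: eps={info.eps}, max={info.max}")
```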

6. Complex Types

NumPy supports two complex types. In complex64, the real and imaginary parts are each a 32-bit float. In complex128, each part is a 64-bit float.

# Create a complex array
comp_arr = np.array([1+2j, 3+4j, 5+6j])
print("Default complex type:", comp_arr.dtype)

# Specify complex64
comp64 = np.array([1+2j, 3+4j], dtype=np.complex64)
print("complex64 itemsize:", comp64.itemsize)  # 8 bytes (4 for the real part + 4 for the imaginary part)

# Access the real and imaginary parts
print("Real part:", comp_arr.real)
print("Imaginary part:", comp_arr.imag)

Output:

Default complex type: complex128
complex64 itemsize: 8
Real part: [1. 3. 5.]
Imaginary part: [2. 4. 6.]
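Each complex element is stored as two floats, so the .real and .imag arrays take the matching float dtype. A small sketch showing this with standard attributes:

```python
import numpy as np

comp64 = np.array([1 + 2j, 3 + 4j], dtype=np.complex64)
comp128 = comp64.astype(np.complex128)

# The parts of complex64 are float32; the parts of complex128 are float64
print(comp64.real.dtype, comp64.imag.dtype)     # float32 float32
print(comp128.real.dtype, comp128.imag.dtype)   # float64 float64

# itemsize doubles along with the precision of each part
print(comp64.itemsize, comp128.itemsize)        # 8 16
```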

7. String Types

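NumPy can also store strings, in two forms: |S is a fixed-length byte string, and <U is a fixed-length Unicode string. You can set the length in the dtype when you create the array. If you omit it, NumPy uses the length of the longest element.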

# Create a byte-string array of length 5
str_arr = np.array(['hello', 'world'], dtype='S5')
print("Byte-string array:", str_arr)
print("dtype:", str_arr.dtype)  # '|S5'

# Create a Unicode array (maximum length 10)
uni_arr = np.array(['你好', 'NumPy'], dtype='U10')
print("Unicode array:", uni_arr)
print("dtype:", uni_arr.dtype)  # '<U10'

Output:

Byte-string array: [b'hello' b'world']
dtype: |S5
Unicode array: ['你好' 'NumPy']
dtype: <U10

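Note: every element of a string array has the same fixed width. Strings longer than that width are silently truncated. Shorter strings are padded with null characters, which are stripped when elements are read back.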
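The sketch below shows the inferred width, silent truncation and per-character storage cost. It uses the standard str_ dtype. The fruit names are example data only.

```python
import numpy as np

# Width inferred from the longest element: 'banana' has 6 characters
fruits = np.array(['fig', 'banana'])
print(fruits.dtype)            # <U6

# An explicit width shorter than the data silently truncates
short = np.array(['fig', 'banana'], dtype='U3')
print(short)                   # ['fig' 'ban']

# Assigning a longer string into an existing array also truncates
fruits[0] = 'strawberry'
print(fruits[0])               # strawb

# Unicode strings take 4 bytes per character: 6 * 4 = 24
print(fruits.dtype.itemsize)   # 24
```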

8. Caveats for Type Conversion

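Overflow: when you convert to a smaller integer type, values outside the target range wrap around. No error is raised; only the low-order bits are kept.
Precision loss: converting from a higher-precision float to a lower-precision float loses precision.
Boolean conversion: nonzero values become True, and zero becomes False.
Complex to real: astype(float) discards the imaginary part and emits a ComplexWarning instead of raising an error. Use .real to make this explicit.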

# Overflow example
big_int = np.array([300], dtype=np.int16)  # 300 is within the int16 range (-32768 to 32767)
print("int16 300:", big_int)

# Converting to int8 overflows (range -128 to 127)
overflow = big_int.astype(np.int8)
print("int8 overflow result:", overflow)  # 300 - 256 = 44

# Boolean conversion
nums = np.array([0, 1, -1, 3.5])
print("Boolean conversion:", nums.astype(bool))

Output:

int16 300: [300]
int8 overflow result: [44]
Boolean conversion: [False  True  True  True]
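Silent wraparound can be caught before it happens. The sketch below uses only standard NumPy and warnings calls. It checks values against np.iinfo, asks np.can_cast about the dtypes, and passes casting='safe' to astype so that narrowing casts are refused. It also records the ComplexWarning raised by a complex-to-float cast.

```python
import warnings
import numpy as np

big = np.array([300, 100], dtype=np.int16)

# Check the values against the target range before a narrowing cast
info = np.iinfo(np.int8)
fits = (big >= info.min) & (big <= info.max)
print("fits in int8:", fits)   # [False  True]

# np.can_cast answers the question for the dtypes only, not the values
print(np.can_cast(np.int16, np.int8))                        # False ('safe' rule)
print(np.can_cast(np.int16, np.int8, casting='same_kind'))   # True

# astype with casting='safe' refuses the narrowing cast outright
refused = False
try:
    big.astype(np.int8, casting='safe')
except TypeError as exc:
    refused = True
    print("refused:", exc)

# Complex -> float keeps only the real part and emits a ComplexWarning
z = np.array([1 + 2j, 3 + 0j])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    real_only = z.astype(float)
print(real_only)                                # [1. 3.]
print([w.category.__name__ for w in caught])    # ['ComplexWarning']
```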

9. Data Types and Performance

The choice of data type can significantly affect memory footprint and computation speed:

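If the data has a small range (such as 0-255), using uint8 instead of int64 reduces memory use by a factor of 8.
float32 uses half the memory of float64 and is often faster. On GPUs, float32 is usually much faster than float64; note that GPU computing goes through other libraries, because NumPy itself runs on the CPU.
However, types that are too small can overflow or lack precision, so the choice is a trade-off.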
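The factor of 8 follows from itemsize. The sketch below measures it with the nbytes attribute and also shows the overflow side of the trade-off. The array size of one million is an arbitrary choice for illustration.

```python
import numpy as np

n = 1_000_000
as_int64 = np.zeros(n, dtype=np.int64)
as_uint8 = np.zeros(n, dtype=np.uint8)

# nbytes = number of elements * itemsize
print("int64:", as_int64.nbytes, "bytes")            # 8000000
print("uint8:", as_uint8.nbytes, "bytes")            # 1000000
print("ratio:", as_int64.nbytes // as_uint8.nbytes)  # 8

# The trade-off: uint8 arithmetic between arrays wraps around silently
a = np.array([200], dtype=np.uint8)
print(a + a)   # [144]  (400 - 256)
```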

10. Structured Data Types (Advanced)

NumPy also supports compound data types (similar to C structs) for working with tabular data. For example:

# Define a structured dtype: name (string), age (int32), height (float64)
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('height', 'f8')])

# Create a structured array
people = np.array([('Alice', 30, 165.5), ('Bob', 25, 175.2)], dtype=dt)
print("Structured array:\n", people)
print("Ages of everyone:", people['age'])

Output:

Structured array:
 [('Alice', 30, 165.5) ('Bob', 25, 175.2)]
Ages of everyone: [30 25]
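Fields of a structured array can be inspected, sorted and filtered much like table columns. A short sketch reusing the same dtype, relying on np.sort's order parameter and boolean masks. The itemsize comment assumes the default unaligned layout, which has no padding.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('height', 'f8')])
people = np.array([('Alice', 30, 165.5), ('Bob', 25, 175.2)], dtype=dt)

print(dt.names)        # ('name', 'age', 'height')
print(dt.itemsize)     # 52 = 10*4 (U10) + 4 (i4) + 8 (f8)

# Sort whole records by one field
by_age = np.sort(people, order='age')
print(by_age['name'])  # ['Bob' 'Alice']

# Filter records with a boolean mask built from a field
tall = people[people['height'] > 170]
print(tall['name'])    # ['Bob']
```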

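Summary: data types are the foundation of efficient computation in NumPy. dtype gives you flexible control over how data is stored and converted, which lets you optimize performance and memory use. The next chapter covers how to access array elements with indexing and slicing.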
