NumPy Random Number Generation

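Random number generation is a basic building block in scientific computing, machine learning, and simulation. NumPy's random module provides a wide range of random number functions covering many probability distributions, and it is fast. This chapter explains how to generate random numbers with NumPy.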

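1. Setting the random seed

A random seed initializes the random number generator. Using the same seed reproduces the same sequence of random numbers, which matters for debugging and for reproducing results.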

import numpy as np

# Set the random seed
np.random.seed(42)

# Generate random numbers
print(np.random.rand(3))   # same result on every run

Output:

[0.37454012 0.95071431 0.73199394]

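Note: np.random.seed sets the global state used by the legacy functions (np.random.rand, np.random.randint, and so on), so it affects every later call to them. In the new interface, the seed is passed when the Generator object is created.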
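As a quick check of the reproducibility claim, the following sketch reseeds the legacy global generator and compares two draws. The variable names first and second are illustrative only.

```python
import numpy as np

# Seed the legacy global generator and draw three values.
np.random.seed(42)
first = np.random.rand(3)

# Reseeding with the same value restarts the same sequence.
np.random.seed(42)
second = np.random.rand(3)

print(np.array_equal(first, second))  # True
```

Because the seed lives in global state, any other code that calls np.random functions between the two draws would shift the sequence.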

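2. Legacy and new random generator interfaces

NumPy 1.17 introduced a new random number generation system. The recommended entry point is np.random.default_rng(), which creates a Generator object and makes the underlying bit generator explicit and easier to manage.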

# Legacy interface (still available, but the new interface is recommended)
from numpy.random import RandomState
rng_old = RandomState(42)
print(rng_old.rand(3))

# New interface (recommended)
rng = np.random.default_rng(42)   # pass the seed
print(rng.random(3))               # uniform on [0, 1)

Output:

[0.37454012 0.95071431 0.73199394]
[0.77395605 0.43887844 0.85859792]

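The two interfaces use different algorithms: the legacy RandomState is based on the Mersenne Twister (MT19937), while default_rng uses the more modern PCG64. The same seed therefore gives different numbers in each interface. The remaining examples mainly use the new interface.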
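To make the difference concrete, here is a small sketch that inspects the bit generator behind a Generator and confirms that the two interfaces disagree for the same seed. The names legacy, modern, legacy_draw, and modern_draw are chosen for this illustration.

```python
import numpy as np
from numpy.random import RandomState

legacy = RandomState(42)             # legacy interface
modern = np.random.default_rng(42)   # new interface

# Name of the bit generator class behind the new Generator object.
print(type(modern.bit_generator).__name__)

# Same seed, different algorithms: the two streams do not match.
legacy_draw = legacy.rand(3)
modern_draw = modern.random(3)
print(np.allclose(legacy_draw, modern_draw))  # False
```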

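3. Basic random number generation

3.1 Uniform distribution

random(size): draws uniformly distributed random numbers from [0, 1).
uniform(low, high, size): draws uniformly distributed random numbers from [low, high).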

rng = np.random.default_rng(42)

# Three random numbers in [0, 1)
print("rng.random(3):", rng.random(3))

# A 2x3 matrix of random numbers in [0, 1)
print("rng.random((2,3)):\n", rng.random((2,3)))

# Uniform samples in [5, 10)
print("rng.uniform(5,10,4):", rng.uniform(5,10,4))

Output:

rng.random(3): [0.77395605 0.43887844 0.85859792]
rng.random((2,3)):
 [[0.69736803 0.09417735 0.97562235]
 [0.7611397  0.78606431 0.12811363]]
rng.uniform(5,10,4): [5.42467924 7.36137497 9.79611933 5.87953153]
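The half-open interval [low, high) can be checked empirically. The sketch below uses a separate generator named demo_rng (a name introduced here, with an arbitrary seed) so that it does not disturb the state of rng used in the tutorial.

```python
import numpy as np

demo_rng = np.random.default_rng(0)  # arbitrary seed for this sketch

# Draw a 1000x3 array of samples from [5, 10).
samples = demo_rng.uniform(5, 10, size=(1000, 3))

print(samples.shape)        # (1000, 3)
print(samples.min() >= 5)   # True: every sample is at least low
print(samples.max() < 10)   # True: every sample is below high
```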

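3.2 Normal distribution

normal(loc, scale, size) draws from a normal distribution, where loc is the mean (default 0) and scale is the standard deviation (default 1). With the defaults it samples the standard normal distribution.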

# Standard normal distribution (mean 0, standard deviation 1)
print("rng.normal(size=5):", rng.normal(size=5))

# Custom mean and standard deviation
print("rng.normal(100, 10, 3):", rng.normal(100, 10, 3))

Output:

rng.normal(size=5): [0.54071741 0.55154385 1.11561399 1.0675796  0.8570842 ]
rng.normal(100, 10, 3): [106.79757552 100.11441007 101.91608112]
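With only a handful of samples it is hard to see what loc and scale do. A rough sanity check, assuming a large sample and a throwaway generator called demo_rng, is to compare the sample mean and standard deviation with the requested parameters:

```python
import numpy as np

demo_rng = np.random.default_rng(0)  # arbitrary seed for this sketch

# Large sample with mean 100 and standard deviation 10.
samples = demo_rng.normal(loc=100, scale=10, size=100_000)

# Both statistics should land close to the requested parameters.
print("sample mean:", samples.mean())
print("sample std: ", samples.std())
```

The printed values should be close to 100 and 10, with small deviations that shrink as the sample size grows.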

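3.3 Random integers

integers(low, high, size, endpoint=False) draws random integers from [low, high); with endpoint=True, high is included. Pass endpoint by keyword, because the full signature has a dtype parameter between size and endpoint.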

# Five integers in [0, 10)
print("rng.integers(0,10,5):", rng.integers(0,10,5))

# Integers in [1, 6], with 6 included
print("rng.integers(1,6,4,endpoint=True):", rng.integers(1,6,4,endpoint=True))

# Two-dimensional integer array
print("rng.integers(0,5,(2,3)):\n", rng.integers(0,5,(2,3)))

Output:

rng.integers(0,10,5): [0 1 4 2 5]
rng.integers(1,6,4,endpoint=True): [2 5 5 2]
rng.integers(0,5,(2,3)):
 [[4 2 1]
 [2 4 4]]
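A common use of endpoint=True is simulating a six-sided die. The following sketch, using an illustrative generator demo_rng and variable names of our choosing, rolls the die many times and counts each face:

```python
import numpy as np

demo_rng = np.random.default_rng(0)  # arbitrary seed for this sketch

# 10,000 rolls of a fair die: integers 1 through 6, with 6 included.
rolls = demo_rng.integers(1, 6, size=10_000, endpoint=True)

# Count how often each face appears.
faces, counts = np.unique(rolls, return_counts=True)
print(faces)
print(counts)
```

Each face should appear roughly 10,000 / 6 times; without endpoint=True the value 6 would never occur.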

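3.4 Other common distributions

Function	Distribution	Parameters
`binomial(n, p, size)`	Binomial	n: number of trials, p: probability of success
`poisson(lam, size)`	Poisson	lam: the rate parameter λ
`exponential(scale, size)`	Exponential	scale: 1/λ
`gamma(shape, scale, size)`	Gamma	shape: shape parameter, scale: scale parameter
`beta(a, b, size)`	Beta	a, b: shape parameters
`chisquare(df, size)`	Chi-square	df: degrees of freedom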

# Binomial: 10 coin flips with heads probability 0.5, repeated for 5 experiments
print("rng.binomial(10,0.5,5):", rng.binomial(10,0.5,5))

# Poisson: λ = 3
print("rng.poisson(3,6):", rng.poisson(3,6))

Output:

rng.binomial(10,0.5,5): [7 5 7 4 3]
rng.poisson(3,6): [3 3 4 3 2 2]
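The exponential row in the table is easy to misread, because the function takes the scale 1/λ rather than the rate λ. A minimal sketch, assuming a rate of 2 events per unit time and an illustrative generator demo_rng:

```python
import numpy as np

demo_rng = np.random.default_rng(0)  # arbitrary seed for this sketch

rate = 2.0  # λ: events per unit time (value chosen for illustration)

# exponential expects scale = 1 / λ, not λ itself.
samples = demo_rng.exponential(scale=1 / rate, size=100_000)

# The mean of an exponential distribution equals its scale, here 0.5.
print("sample mean:", samples.mean())
```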

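4. Random sampling and shuffling

4.1 `choice`: random selection from a given array

choice(a, size, replace=True, p=None) draws elements at random from the array a. replace controls whether sampling is done with replacement, and p gives the probability weight of each element.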

population = np.array(['red', 'blue', 'green', 'yellow'])

# Draw 3 elements (repeats allowed)
print("choice with replace:", rng.choice(population, size=3))

# Sampling without replacement
print("choice without replace:", rng.choice(population, size=3, replace=False))

# With explicit probabilities
prob = [0.1, 0.2, 0.3, 0.4]
print("choice with probability:", rng.choice(population, size=5, p=prob))

Output:

choice with replace: ['red' 'green' 'yellow']
choice without replace: ['yellow' 'green' 'blue']
choice with probability: ['yellow' 'yellow' 'green' 'yellow' 'blue']
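To see that the p argument really controls the frequencies, one can draw many samples and compare the observed proportions with the weights. This sketch samples indices instead of strings so that np.bincount can tally them; demo_rng, n_draws, idx, and freq are names introduced for the illustration.

```python
import numpy as np

demo_rng = np.random.default_rng(0)  # arbitrary seed for this sketch

population = np.array(['red', 'blue', 'green', 'yellow'])
prob = [0.1, 0.2, 0.3, 0.4]

# Passing an integer n to choice samples from 0..n-1, i.e. from indices.
n_draws = 100_000
idx = demo_rng.choice(len(population), size=n_draws, p=prob)

# Observed proportion of each index.
freq = np.bincount(idx, minlength=len(population)) / n_draws
for name, f in zip(population, freq):
    print(f"{name}: {f:.3f}")
```

The printed proportions should be close to 0.1, 0.2, 0.3, and 0.4 respectively.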

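4.2 `shuffle` and `permutation`

shuffle reorders an array in place. permutation returns a shuffled copy, or, when given an integer n, a random permutation of 0 to n-1.

arr = np.arange(10)
print("Original array:", arr)

# shuffle reorders in place
rng.shuffle(arr)
print("After shuffle:", arr)

# permutation returns a copy and leaves the original untouched
arr2 = np.arange(10)
perm = rng.permutation(arr2)
print("permutation result:", perm)
print("Original unchanged:", arr2)

Output:

Original array: [0 1 2 3 4 5 6 7 8 9]
After shuffle: [4 1 8 3 0 9 6 5 7 2]
permutation result: [3 8 1 4 2 9 7 5 6 0]
Original unchanged: [0 1 2 3 4 5 6 7 8 9]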
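Calling permutation with an integer is handy when two arrays must be shuffled in the same order, for example features and their labels. The arrays below are made-up toy data, and demo_rng, features, labels, and order are names chosen for this sketch.

```python
import numpy as np

demo_rng = np.random.default_rng(0)  # arbitrary seed for this sketch

# Toy data: row i of features is [2*i, 2*i + 1], and its label is 10*i.
features = np.arange(10).reshape(5, 2)
labels = np.arange(5) * 10

# One random ordering of the row indices 0..4.
order = demo_rng.permutation(len(features))

# Indexing both arrays with the same order keeps rows and labels paired.
shuffled_features = features[order]
shuffled_labels = labels[order]
print(shuffled_features)
print(shuffled_labels)
```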

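4.3 Shuffling multidimensional arrays

For a multidimensional array, shuffle by default reorders only along the first axis (the rows); the contents of each row stay together.

mat = np.arange(12).reshape(4,3)
print("Original matrix:\n", mat)
rng.shuffle(mat)
print("After shuffle (rows reordered):\n", mat)

Output:

Original matrix:
 [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
After shuffle (rows reordered):
 [[ 6  7  8]
 [ 3  4  5]
 [ 9 10 11]
 [ 0  1  2]]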
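Generator.shuffle also takes an axis argument, so whole columns can be reordered instead of whole rows. A brief sketch, assuming a NumPy version whose Generator.shuffle supports axis and using an illustrative generator demo_rng:

```python
import numpy as np

demo_rng = np.random.default_rng(0)  # arbitrary seed for this sketch

mat = np.arange(12).reshape(4, 3)

# axis=1 reorders the columns in place; each column stays intact.
demo_rng.shuffle(mat, axis=1)
print(mat)
```

Every row receives the same column order, so the values within a column still differ by 3 from one row to the next.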

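5. Advanced Generator usage

The Generator object of the new interface offers further useful methods:

permuted(x, axis=None): shuffles each slice along the given axis independently and returns a shuffled copy. Unlike shuffle, which moves whole sub-arrays as units, it does not keep rows or columns together. With axis=None the flattened array is shuffled.
bytes(n): returns n random bytes.
Several bit generators are supported (PCG64, MT19937, and others); to choose one, pass a bit generator instance to np.random.Generator.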

# permuted example: shuffle each column independently
mat = np.arange(12).reshape(4,3)
print("Original matrix:\n", mat)
mat_permuted = rng.permuted(mat, axis=0)   # axis=0: values move within their own column
print("permuted (axis=0):\n", mat_permuted)

# Random bytes
print("rng.bytes(8):", rng.bytes(8))

Output:

Original matrix:
 [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
permuted (axis=0):
 [[ 9  7  5]
 [ 3  4  8]
 [ 6  1  2]
 [ 0 10 11]]
rng.bytes(8): b'\x82\x1f\x97\xd4\xb9\xf2\xd0\xdd'
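The point about choosing a bit generator can be sketched as follows. PCG64, MT19937, and SFC64 are real classes in numpy.random; the dictionary generators and the seed value are only for illustration.

```python
import numpy as np
from numpy.random import Generator, MT19937, PCG64, SFC64

seed = 42  # arbitrary seed for this sketch

# Wrap each bit generator in a Generator to get the full sampling API.
generators = {
    "PCG64": Generator(PCG64(seed)),
    "MT19937": Generator(MT19937(seed)),
    "SFC64": Generator(SFC64(seed)),
}

# Same seed, different bit generators: each produces its own stream.
for name, gen in generators.items():
    print(name, gen.random(3))
```

All three objects expose the same methods (random, normal, integers, and so on); only the underlying stream of bits differs.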

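6. Reproducibility and parallel generation

When random numbers are generated in parallel, give each thread or process its own Generator instance. The child seeds can be chosen by hand or, preferably, derived from a single SeedSequence with spawn, which NumPy provides for this purpose.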

from numpy.random import SeedSequence, Generator, PCG64

# Create the root seed sequence
ss = SeedSequence(12345)

# Derive 3 generators for 3 processes
child_seeds = ss.spawn(3)
rngs = [Generator(PCG64(s)) for s in child_seeds]

# Each generator produces its own independent stream
for i, r in enumerate(rngs):
    print(f"rng{i}:", r.random(3))

Output:

rng0: [0.22733602 0.31675834 0.79736546]
rng1: [0.67625467 0.39110955 0.33281393]
rng2: [0.59830875 0.18673419 0.67275604]
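Spawned streams are also reproducible: rebuilding the children from the same root seed gives the same numbers. The sketch below wraps this in a small helper, make_streams, which is a hypothetical name introduced here and not part of NumPy. It passes each child SeedSequence straight to default_rng, which accepts one in place of an integer seed.

```python
import numpy as np
from numpy.random import SeedSequence, default_rng

def make_streams(root_seed, n_streams):
    """Build n_streams generators from one root seed (hypothetical helper)."""
    children = SeedSequence(root_seed).spawn(n_streams)
    return [default_rng(child) for child in children]

# Two separate runs that start from the same root seed.
run_a = [g.random(3) for g in make_streams(12345, 3)]
run_b = [g.random(3) for g in make_streams(12345, 3)]

# Each child stream is reproduced exactly across runs.
print(all(np.array_equal(a, b) for a, b in zip(run_a, run_b)))  # True

# Different children within one run produce different numbers.
print(np.array_equal(run_a[0], run_a[1]))  # False
```

In a real parallel job, each worker would receive one child seed (or one generator) rather than all of them being consumed in a single process as here.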

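Tips:

For most applications, np.random.default_rng() is all you need; it currently uses the PCG64 bit generator by default.
When setting a seed, prefer default_rng(seed) over the global np.random.seed, so that other code is not affected.
When raw generation speed matters most, a specific bit generator such as SFC64 can be passed to Generator explicitly; PCG64 is already the default.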

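Summary

NumPy's random module is powerful and flexible. With the new Generator interface we can draw from many distributions, sample and shuffle data, and keep results reproducible and safe to generate in parallel. These tools provide a solid base for data simulation and machine learning experiments. The next chapter covers NumPy's statistical functions.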
