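NumPy Array Joining and Splitting

In practical data processing, you often need to join several arrays into one or split one array into several sub-arrays. NumPy provides a rich set of joining and splitting functions that handle arrays of any dimensionality.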

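I. Joining Arrays

1. `np.concatenate()` - Join along an existing axis

concatenate is the most basic joining function: it joins a sequence of arrays along the specified axis. All arrays must have the same shape except along the axis being joined.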

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# Join along axis=0 (vertically)
c = np.concatenate((a, b), axis=0)
print("concatenate axis=0:\n", c)

# Join along axis=1 (horizontally); b must first be reshaped to match a's row count
b_resized = b.reshape(2, 1)   # shape becomes (2, 1)
d = np.concatenate((a, b_resized), axis=1)
print("concatenate axis=1:\n", d)

# Join 1-D arrays
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.concatenate((x, y))
print("1D concatenate:", z)

Output:

concatenate axis=0:
 [[1 2]
 [3 4]
 [5 6]]
concatenate axis=1:
 [[1 2 5]
 [3 4 6]]
1D concatenate: [1 2 3 4 5 6]
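
The shape requirement described above is easiest to see when it is violated. The following is a minimal sketch, reusing the same `a` and `b` as the example; the flag name `mismatch_raised` is only an illustrative choice.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

# axis=0 works: the non-joined axis (axis 1) has length 2 in both arrays.
rows = np.concatenate((a, b), axis=0)
print(rows.shape)

# axis=1 fails: the non-joined axis (axis 0) has length 2 in a but 1 in b.
try:
    np.concatenate((a, b), axis=1)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)
```

This is why the example reshapes `b` to (2, 1) before joining along axis=1.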

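2. `np.vstack()` and `np.hstack()` - Vertical and horizontal stacking

For arrays with two or more dimensions, vstack is equivalent to concatenate(axis=0) and hstack to concatenate(axis=1), with a more convenient call. 1-D arrays are a special case: vstack first reshapes each array of shape (N,) to (1, N), while hstack joins them along axis 0.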

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# vstack: stack vertically, adding rows (1-D arrays become 2-D)
v = np.vstack((a, b))
print("vstack:\n", v)

# hstack: stack horizontally, adding columns (1-D arrays stay 1-D)
h = np.hstack((a, b))
print("hstack:", h)

# 2-D example
c = np.array([[1, 2], [3, 4]])
d = np.array([[5, 6]])
print("vstack 2D:\n", np.vstack((c, d)))
print("hstack 2D:\n", np.hstack((c, d.T)))   # d.T transposes d to shape (2, 1)

Output:

vstack:
 [[1 2 3]
 [4 5 6]]
hstack: [1 2 3 4 5 6]
vstack 2D:
 [[1 2]
 [3 4]
 [5 6]]
hstack 2D:
 [[1 2 5]
 [3 4 6]]
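
The relationship between these shortcuts and concatenate can be checked directly. This is a small sketch with illustrative arrays `m`, `n`, `p`, and `q` (not from the examples above), comparing results with `np.array_equal`.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])

# For 2-D inputs the shortcuts match concatenate along axis 0 and axis 1.
same_v = np.array_equal(np.vstack((m, n)), np.concatenate((m, n), axis=0))
same_h = np.array_equal(np.hstack((m, n)), np.concatenate((m, n), axis=1))

p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

# For 1-D inputs hstack joins along axis 0 (the only axis) ...
same_h_1d = np.array_equal(np.hstack((p, q)), np.concatenate((p, q), axis=0))
# ... while vstack first promotes each input to shape (1, 3).
same_v_1d = np.array_equal(
    np.vstack((p, q)),
    np.concatenate((p[np.newaxis, :], q[np.newaxis, :]), axis=0),
)
print(same_v, same_h, same_h_1d, same_v_1d)
```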

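3. `np.dstack()` - Stack along depth (the third axis)

dstack stacks arrays along the third axis (axis=2). Inputs with fewer than three dimensions are promoted first: a 2-D array of shape (rows, cols) becomes (rows, cols, 1), and a 1-D array of shape (N,) becomes (1, N, 1).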

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
c = np.dstack((a, b))
print("dstack shape:", c.shape)
print("dstack:\n", c)

# 1-D example
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.dstack((x, y))   # each input is promoted to (1, 3, 1), then stacked
print("1D dstack shape:", z.shape)

Output:

dstack shape: (2, 2, 2)
dstack:
 [[[1 5]
  [2 6]]

 [[3 7]
  [4 8]]]
1D dstack shape: (1, 3, 2)
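
The promotion step can be made visible with `np.atleast_3d`. The sketch below rebuilds the dstack result by hand under the assumption that dstack behaves like concatenate along axis=2 on the promoted inputs; the names `manual` and `matches` are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
x = np.array([1, 2, 3])

# atleast_3d shows the promotion dstack applies before joining.
shape_2d = np.atleast_3d(a).shape   # 2-D (2, 2) is promoted to 3-D
shape_1d = np.atleast_3d(x).shape   # 1-D (3,) is promoted to 3-D
print(shape_2d, shape_1d)

# Promote by hand, join along axis=2, and compare with dstack.
manual = np.concatenate([np.atleast_3d(arr) for arr in (a, a * 10)], axis=2)
matches = np.array_equal(manual, np.dstack((a, a * 10)))
print(matches)
```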

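4. `np.stack()` - Stack along a new axis

stack inserts a new dimension at the specified axis position and joins the arrays along it, so the result has one more dimension than the inputs. All input arrays must have exactly the same shape.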

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Stack along new axis 0; result shape is (2, 3)
s0 = np.stack((a, b), axis=0)
print("stack axis=0:\n", s0)

# Stack along new axis 1; result shape is (3, 2)
s1 = np.stack((a, b), axis=1)
print("stack axis=1:\n", s1)

# 2-D arrays
c = np.array([[1, 2], [3, 4]])
d = np.array([[5, 6], [7, 8]])
s2 = np.stack((c, d), axis=2)   # new axis goes last; shape is (2, 2, 2)
print("stack axis=2 shape:", s2.shape)

Output:

stack axis=0:
 [[1 2 3]
 [4 5 6]]
stack axis=1:
 [[1 4]
 [2 5]
 [3 6]]
stack axis=2 shape: (2, 2, 2)
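
The difference between stack and concatenate comes down to whether a dimension is added. A minimal sketch, with illustrative names `expanded`, `same`, and `stack_raised`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# concatenate keeps the number of dimensions; stack adds one.
concat_shape = np.concatenate((a, b)).shape
stack_shape = np.stack((a, b)).shape
print(concat_shape, stack_shape)

# stack along axis=1 matches concatenating inputs that each gained a new axis 1.
expanded = [np.expand_dims(arr, axis=1) for arr in (a, b)]   # each has shape (3, 1)
same = np.array_equal(np.stack((a, b), axis=1), np.concatenate(expanded, axis=1))

# Inputs with different shapes are rejected.
try:
    np.stack((a, np.array([7, 8])))
    stack_raised = False
except ValueError:
    stack_raised = True
print(same, stack_raised)
```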

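5. `np.column_stack()` and `np.row_stack()`

column_stack stacks 1-D arrays as columns into a 2-D array; 2-D inputs are joined as-is, as with hstack. row_stack is an alias for vstack; it is deprecated as of NumPy 2.0, so prefer vstack in new code.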

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.column_stack((a, b))
print("column_stack:\n", c)   # shape (3,2)

d = np.row_stack((a, b))
print("row_stack:\n", d)      # shape (2,3)

Output:

column_stack:
 [[1 4]
 [2 5]
 [3 6]]
row_stack:
 [[1 2 3]
 [4 5 6]]
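
To show how column_stack differs from hstack on 1-D inputs, and how it can append a 1-D array as an extra column, here is a short sketch. The `features` and `table` arrays are illustrative and not part of the examples above.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

h_shape = np.hstack((a, b)).shape          # one long 1-D array
c_shape = np.column_stack((a, b)).shape    # each input becomes a column
print(h_shape, c_shape)

# A 2-D block and a 1-D array can be mixed: the 1-D input is treated as a (3, 1) column.
features = np.arange(6).reshape(3, 2)
table = np.column_stack((features, a))
print(table)
```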

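II. Splitting Arrays

1. `np.split()` - Split along a given axis

split divides an array into sub-arrays along the specified axis. The second argument is either an integer (the number of equal sections) or a list of indices at which to cut.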

arr = np.arange(12).reshape(3, 4)
print("Original array:\n", arr)

# Equal split: divide into 2 equal sections along axis=1 (the axis length must be divisible)
first, second = np.split(arr, 2, axis=1)
print("split into 2 along axis=1:")
print("Piece 1:\n", first)
print("Piece 2:\n", second)

# Split points: cut before column indices 1 and 3, giving columns [:1], [1:3], and [3:]
s1, s2, s3 = np.split(arr, [1, 3], axis=1)
print("split at [1,3] along axis=1:")
print("Piece 1:\n", s1)
print("Piece 2:\n", s2)
print("Piece 3:\n", s3)

Output:

Original array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
split into 2 along axis=1:
Piece 1:
 [[0 1]
 [4 5]
 [8 9]]
Piece 2:
 [[ 2  3]
 [ 6  7]
 [10 11]]
split at [1,3] along axis=1:
Piece 1:
 [[0]
 [4]
 [8]]
Piece 2:
 [[ 1  2]
 [ 5  6]
 [ 9 10]]
Piece 3:
 [[ 3]
 [ 7]
 [11]]
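
Two properties of split are worth checking by hand: an integer section count must divide the axis length evenly, and joining the pieces along the same axis restores the original. A minimal sketch, with illustrative names `uneven_raised` and `restored`:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# 4 columns cannot be divided into 3 equal sections, so split raises.
try:
    np.split(arr, 3, axis=1)
    uneven_raised = False
except ValueError as exc:
    uneven_raised = True
    print("ValueError:", exc)

# Splitting at explicit indices and concatenating along the same axis restores the array.
pieces = np.split(arr, [1, 3], axis=1)
restored = np.concatenate(pieces, axis=1)
print([p.shape for p in pieces])
print(np.array_equal(restored, arr))
```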

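2. `np.array_split()` - Allow unequal splits

array_split works like split but accepts a section count that does not divide the axis length evenly. In that case the leading sub-arrays each receive one extra element.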

arr = np.arange(10)
print("Original array:", arr)

# Split into 3 parts (10 is not divisible by 3)
parts = np.array_split(arr, 3)
for i, part in enumerate(parts):
    print(f"part {i}:", part)

Output:

Original array: [0 1 2 3 4 5 6 7 8 9]
part 0: [0 1 2 3]
part 1: [4 5 6]
part 2: [7 8 9]
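
The sizing rule can be written out as arithmetic: for an axis of length L split into n sections, the first L % n pieces have L // n + 1 elements and the rest have L // n. The helper `expected_sizes` below is hypothetical, written only to illustrate that rule; it is not a NumPy function.

```python
import numpy as np

def expected_sizes(length, sections):
    """Hypothetical helper: piece sizes when the first length % sections pieces get one extra element."""
    base, extra = divmod(length, sections)
    return [base + 1] * extra + [base] * (sections - extra)

arr = np.arange(10)
actual = [len(part) for part in np.array_split(arr, 3)]
print(actual, expected_sizes(10, 3))
```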

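3. `np.hsplit()` - Horizontal split (by columns)

For arrays with two or more dimensions, hsplit is a shortcut for split(axis=1). A 1-D array is split along axis 0.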

arr = np.arange(12).reshape(3, 4)
print("Original array:\n", arr)

# Split into 2 pieces
h1, h2 = np.hsplit(arr, 2)
print("hsplit 2:\n", h1, "\n", h2)

# Split at the given indices
h1, h2, h3 = np.hsplit(arr, [1, 3])   # cut before column indices 1 and 3
print("hsplit at [1,3]:")
print(h1, h2, h3, sep='\n')

Output:

Original array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
hsplit 2:
 [[0 1]
 [4 5]
 [8 9]]
 [[ 2  3]
 [ 6  7]
 [10 11]]
hsplit at [1,3]:
[[0]
 [4]
 [8]]
[[ 1  2]
 [ 5  6]
 [ 9 10]]
[[ 3]
 [ 7]
 [11]]

4. `np.vsplit()` - Vertical split (by rows)

vsplit is a shortcut for split(axis=0). The array must have at least two dimensions.

arr = np.arange(12).reshape(4, 3)
print("Original array:\n", arr)

v1, v2 = np.vsplit(arr, 2)
print("vsplit 2:\n", v1, "\n", v2)

v1, v2, v3 = np.vsplit(arr, [1, 3])   # cut before row indices 1 and 3
print("vsplit at [1,3]:")
print(v1, v2, v3, sep='\n')

Output:

Original array:
 [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
vsplit 2:
 [[0 1 2]
 [3 4 5]]
 [[ 6  7  8]
 [ 9 10 11]]
vsplit at [1,3]:
[[0 1 2]]
[[3 4 5]
 [6 7 8]]
[[ 9 10 11]]

5. `np.dsplit()` - Split along depth (the third axis)

dsplit splits an array along the third axis (axis=2). The array must have at least three dimensions.

arr = np.arange(24).reshape(2, 3, 4)
print("Original array shape:", arr.shape)

# Split into 2 sections along the third axis
d1, d2 = np.dsplit(arr, 2)
print("dsplit 2: d1 shape", d1.shape, "d2 shape", d2.shape)

# Split at index 1
d1, d2 = np.dsplit(arr, [1])
print("dsplit at [1]: d1 shape", d1.shape, "d2 shape", d2.shape)

Output:

Original array shape: (2, 3, 4)
dsplit 2: d1 shape (2, 3, 2) d2 shape (2, 3, 2)
dsplit at [1]: d1 shape (2, 3, 1) d2 shape (2, 3, 3)
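
As a quick check of the dimension requirement and of how dsplit relates to split and dstack, consider the following sketch. The names `same_parts`, `dsplit_raised`, and `round_trip` are illustrative.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

# dsplit gives the same pieces as split along axis=2.
d_parts = np.dsplit(arr, [1])
s_parts = np.split(arr, [1], axis=2)
same_parts = all(np.array_equal(d, s) for d, s in zip(d_parts, s_parts))

# A 2-D array has no third axis, so dsplit rejects it.
try:
    np.dsplit(np.arange(6).reshape(2, 3), 3)
    dsplit_raised = False
except ValueError:
    dsplit_raised = True

# dstack joins the pieces back along the third axis.
round_trip = np.array_equal(np.dstack(d_parts), arr)
print(same_parts, dsplit_raised, round_trip)
```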

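III. A Note on Views and Copies

Joining functions (concatenate, stack, vstack, and so on) always return a new array (a copy), so modifying the joined array does not affect the original arrays.
Splitting functions (split, hsplit, vsplit, dsplit, array_split) return a list of views into the original array: each sub-array shares memory with the original, so modifying a sub-array also modifies the original.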

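# Example: splitting returns views
arr = np.arange(12).reshape(3, 4)
pieces = np.hsplit(arr, 2)
pieces[0][0, 0] = 999
print("Original array after modifying a sub-array:\n", arr)   # the original array changes

# Example: joining returns a copy
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.hstack((a, b))
c[0] = 999
print("Original array a after modifying the joined array:", a)   # the original array is unchanged

Output:

Original array after modifying a sub-array:
 [[999   1   2   3]
 [  4   5   6   7]
 [  8   9  10  11]]
Original array a after modifying the joined array: [1 2 3]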

Tip: if you need independent copies after splitting, call the copy() method, for example sub_arr = arr[:2].copy().
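
Rather than modifying an element and inspecting the original, memory sharing can be tested directly with `np.shares_memory`. A minimal sketch, with illustrative names `left`, `right`, `independent`, and `joined`:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

left, right = np.hsplit(arr, 2)
view_shares = np.shares_memory(arr, left)          # split pieces are views of arr

independent = left.copy()
copy_shares = np.shares_memory(arr, independent)   # a copy owns its own data

joined = np.hstack((left, right))
joined_shares = np.shares_memory(arr, joined)      # joining allocates a new array

independent[0, 0] = -1                             # writes to the copy only
print(view_shares, copy_shares, joined_shares, arr[0, 0])
```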

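Summary

This section covered the common NumPy functions for joining and splitting arrays. The joining functions include concatenate, vstack, hstack, dstack, stack, and column_stack; the splitting functions include split, hsplit, vsplit, dsplit, and array_split. Understanding how they differ, and which return views and which return copies, helps you build and take apart datasets flexibly. Next, we will look at array operations in NumPy.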
