Appendix E — dplyr 速查表

dplyr 函数配合管道操作符使用，并且要求数据满足整洁格式。

在整洁数据中：

每个变量独占一列
每个观测值或样本独占一行
管道操作符 x |> f(y) 等价于 f(x,y)

library(dplyr)

E.1 汇总观测值

对列应用汇总函数生成汇总统计表。汇总函数接收向量输入并返回单个值输出（参见汇总函数章节）。

summarize(.data, ...)：创建汇总统计表

::: {.cell}
```
mtcars |> summarize(avg = mean(mpg))
```
::: {.cell-output .cell-output-stdout}
```
# A tibble: 1 × 1
    avg
  <dbl>
1  20.1
```
::: :::
count(.data, ..., wt = NULL, sort = FLASE, name = NULL)：按 ... 变量分组的行数统计。类似函数还有 tally(), add_count(), add_tally()。

::: {.cell}
```
mtcars |> count(cyl)
```
::: {.cell-output .cell-output-stdout}
```
# A tibble: 3 × 2
    cyl     n
  <dbl> <int>
1     4    11
2     6     7
3     8    14
```
::: :::

E.2 分组观测

group_by(.data, ..., .add = FALSE, .drop = TRUE) 创建分组副本，dplyr 函数会对每个分组单独操作后合并结果

::: {.cell}
```
mtcars |>
  group_by(cyl) |>
  summarize(avg = mean(mpg))
```
::: {.cell-output .cell-output-stdout}
```
# A tibble: 3 × 2
    cyl   avg
  <dbl> <dbl>
1     4  26.7
2     6  19.7
3     8  15.1
```
::: :::

rowwise(.data, ...) 将数据转换为逐行分组模式，便于按行运算（适合处理列表类列数据），详见 tidyr 速查表中的列表列工作流

::: {.cell}

starwars |>
  rowwise() |>
  mutate(film_count = length(films))

::: {.cell-output .cell-output-stdout}

# A tibble: 87 × 15
# Rowwise: 
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 6 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>, film_count <int>

::: :::

ungroup(x, ...) 返回非分组数据副本

E.3 操作观测

E.3.1 提取观测

行操作函数返回满足条件的行子集表

filter(.data, ..., .preserve = FALSE) 筛选符合逻辑条件的行

::: {.cell}

mtcars |> filter(mpg > 20)

::: {.cell-output .cell-output-stdout}

# A tibble: 14 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6 160     110  3.9   2.62  16.5     0     1     4     4
 2  21       6 160     110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6 258     110  3.08  3.22  19.4     1     0     3     1
 5  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
 6  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
 7  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
 8  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
 9  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
10  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
11  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
12  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
13  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
14  21.4     4 121     109  4.11  2.78  18.6     1     1     4     2

::: :::

distinct(.data, ..., .keep_all = FALSE) 删除重复行

::: {.cell}
```
mtcars |> distinct(gear)
```
::: {.cell-output .cell-output-stdout}
```
# A tibble: 3 × 1
   gear
  <dbl>
1     4
2     3
3     5
```
::: :::

slice(.data, ...,, .preserve = FALSE) 按位置选择行

::: {.cell}

mtcars |> slice(10:15)

::: {.cell-output .cell-output-stdout}

# A tibble: 6 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
2  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
3  16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
4  17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
5  15.2     8  276.   180  3.07  3.78  18       0     0     3     3
6  10.4     8  472    205  2.93  5.25  18.0     0     0     3     4

::: :::

slice_sample(.data, ..., n, prop, weight_by = NULL, replace = FALSE) 随机抽样行，支持数量抽样（n）和比例抽样（prop）

::: {.cell}

mtcars |> slice_sample(n = 5, replace = TRUE)

::: {.cell-output .cell-output-stdout}

# A tibble: 5 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  26       4  120.    91  4.43  2.14  16.7     0     1     5     2
2  19.2     8  400    175  3.08  3.84  17.0     0     0     3     2
3  10.4     8  472    205  2.93  5.25  18.0     0     0     3     4
4  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
5  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4

::: :::

slice_min(.data, order_by, ..., n, prop, with_ties = TRUE) 与 slice_max() 选择极值行

::: {.cell}

mtcars |> slice_min(mpg, prop = 0.25)

::: {.cell-output .cell-output-stdout}

# A tibble: 8 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  10.4     8  472    205  2.93  5.25  18.0     0     0     3     4
2  10.4     8  460    215  3     5.42  17.8     0     0     3     4
3  13.3     8  350    245  3.73  3.84  15.4     0     0     3     4
4  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
5  14.7     8  440    230  3.23  5.34  17.4     0     0     3     4
6  15       8  301    335  3.54  3.57  14.6     0     1     5     8
7  15.2     8  276.   180  3.07  3.78  18       0     0     3     3
8  15.2     8  304    150  3.15  3.44  17.3     0     0     3     2

::: :::

slice_head(.data, ..., n, prop) 与 slice_tail() 选择首尾行

::: {.cell}

mtcars |> slice_head(n = 5)

::: {.cell-output .cell-output-stdout}

# A tibble: 5 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2

::: :::

E.3.1.1 适用于 `filter()` 的逻辑运算符

==
<
<=
is.na()
%in%
|
xor()
!=
>
>=
!is.na()
!
&
其他运算符参见 ?base::Logic 和 ?Comparison

E.3.2 排序观测

arrange(.data, ..., .by_group = FALSE) 按列值升序排列，配合 desc() 实现降序

::: {.cell}

mtcars |> arrange(mpg)

::: {.cell-output .cell-output-stdout}

# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  10.4     8  472    205  2.93  5.25  18.0     0     0     3     4
 2  10.4     8  460    215  3     5.42  17.8     0     0     3     4
 3  13.3     8  350    245  3.73  3.84  15.4     0     0     3     4
 4  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 5  14.7     8  440    230  3.23  5.34  17.4     0     0     3     4
 6  15       8  301    335  3.54  3.57  14.6     0     1     5     8
 7  15.2     8  276.   180  3.07  3.78  18       0     0     3     3
 8  15.2     8  304    150  3.15  3.44  17.3     0     0     3     2
 9  15.5     8  318    150  2.76  3.52  16.9     0     0     3     2
10  15.8     8  351    264  4.22  3.17  14.5     0     1     5     4
# ℹ 22 more rows

:::

mtcars |> arrange(desc(mpg))

::: {.cell-output .cell-output-stdout}

# A tibble: 32 × 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
 2  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
 3  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
 4  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
 5  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
 6  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
 7  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
 8  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
 9  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
10  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
# ℹ 22 more rows

::: :::

E.3.3 添加观测

add_row(.data, ..., .before = NULL, .after = NULL) 添加一行或多行数据

::: {.cell}

cars |> add_row(speed = 1, dist = 1)

::: {.cell-output .cell-output-stdout}

   speed dist
1      4    2
2      4   10
3      7    4
4      7   22
5      8   16
6      9   10
7     10   18
8     10   26
9     10   34
10    11   17
11    11   28
12    12   14
13    12   20
14    12   24
15    12   28
16    13   26
17    13   34
18    13   34
19    13   46
20    14   26
21    14   36
22    14   60
23    14   80
24    15   20
25    15   26
26    15   54
27    16   32
28    16   40
29    17   32
30    17   40
31    17   50
32    18   42
33    18   56
34    18   76
35    18   84
36    19   36
37    19   46
38    19   68
39    20   32
40    20   48
41    20   52
42    20   56
43    20   64
44    22   66
45    23   54
46    24   70
47    24   92
48    24   93
49    24  120
50    25   85
51     1    1

::: :::

E.4 操作变量

E.4.1 提取变量

列操作函数返回列的子集（向量或表格）

pull(.data, var = -1, name = NULL, ...) 按名称或位置提取列向量

::: {.cell}

mtcars |> pull(wt)

::: {.cell-output .cell-output-stdout}

 [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070
[13] 3.730 3.780 5.250 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840
[25] 3.845 1.935 2.140 1.513 3.170 2.770 3.570 2.780

::: :::

select(.data, ...) 提取列生成新表

::: {.cell}

mtcars |> select(mpg, wt)

::: {.cell-output .cell-output-stdout}

# A tibble: 32 × 2
     mpg    wt
   <dbl> <dbl>
 1  21    2.62
 2  21    2.88
 3  22.8  2.32
 4  21.4  3.22
 5  18.7  3.44
 6  18.1  3.46
 7  14.3  3.57
 8  24.4  3.19
 9  22.8  3.15
10  19.2  3.44
# ℹ 22 more rows

::: :::

relocate(.data, ..., .before = NULL, .after = NULL) 调整列位置

::: {.cell}

mtcars |> relocate(mpg, cyl, after = last_col())

::: {.cell-output .cell-output-stdout}

# A tibble: 32 × 11
     mpg   cyl after  disp    hp  drat    wt  qsec    vs    am  gear
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6     4  160    110  3.9   2.62  16.5     0     1     4
 2  21       6     4  160    110  3.9   2.88  17.0     0     1     4
 3  22.8     4     1  108     93  3.85  2.32  18.6     1     1     4
 4  21.4     6     1  258    110  3.08  3.22  19.4     1     0     3
 5  18.7     8     2  360    175  3.15  3.44  17.0     0     0     3
 6  18.1     6     1  225    105  2.76  3.46  20.2     1     0     3
 7  14.3     8     4  360    245  3.21  3.57  15.8     0     0     3
 8  24.4     4     2  147.    62  3.69  3.19  20       1     0     4
 9  22.8     4     2  141.    95  3.92  3.15  22.9     1     0     4
10  19.2     6     4  168.   123  3.92  3.44  18.3     1     0     4
# ℹ 22 more rows

::: :::

E.4.1.1 `select()` 和 `across()` 辅助函数

mtcars |> select(mpg:cyl)

# A tibble: 32 × 2
     mpg   cyl
   <dbl> <dbl>
 1  21       6
 2  21       6
 3  22.8     4
 4  21.4     6
 5  18.7     8
 6  18.1     6
 7  14.3     8
 8  24.4     4
 9  22.8     4
10  19.2     6
# ℹ 22 more rows

contains(match)
num_range(prefix, range)
:，例如 mpg:cyl
ends_with(match)
all_of(x) 或 any_of(x, ..., vars)
!，例如 !gear
starts_with(match)
matches(match)
everything()

E.4.2 批量操作多列

df <- tibble(x_1 = c(1, 2), x_2 = c(3, 4), y = c(4, 5))

across(.cols, .fun, ..., .name = NULL) 对多列进行统一汇总或转换操作

::: {.cell}
```
df |> summarize(across(everything(), mean))
```
::: {.cell-output .cell-output-stdout}
```
# A tibble: 1 × 3
    x_1   x_2     y
  <dbl> <dbl> <dbl>
1   1.5   3.5   4.5
```
::: :::

c_across(.cols) 在行式数据中跨列运算

::: {.cell}

df |> 
  rowwise() |>
  mutate(x_total = sum(c_across(1:2)))

::: {.cell-output .cell-output-stdout}

# A tibble: 2 × 4
# Rowwise: 
    x_1   x_2     y x_total
  <dbl> <dbl> <dbl>   <dbl>
1     1     3     4       4
2     2     4     5       6

::: :::

E.4.3 创建新变量

对列应用向量化函数（接收向量输入并返回等长向量，参见向量化函数章节）生成新列

mutate(.data, ..., .keep = "all", .before = NULL, .after = NULL) 创建新列，类似函数 add_column()

::: {.cell}

mtcars |> mutate(gpm = 1 / mpg)

::: {.cell-output .cell-output-stdout}

# A tibble: 32 × 12
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb    gpm
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4 0.0476
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4 0.0476
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1 0.0439
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1 0.0467
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 0.0535
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1 0.0552
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4 0.0699
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2 0.0410
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2 0.0439
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4 0.0521
# ℹ 22 more rows

:::

mtcars |> mutate(mtcars, gpm = 1 / mpg, .keep = "none")

::: {.cell-output .cell-output-stdout}

# A tibble: 32 × 12
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb    gpm
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4 0.0476
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4 0.0476
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1 0.0439
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1 0.0467
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2 0.0535
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1 0.0552
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4 0.0699
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2 0.0410
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2 0.0439
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4 0.0521
# ℹ 22 more rows

::: :::

rename(.data, ...) 重命名列，rename_with() 可通过函数批量重命名

::: {.cell}

mtcars |> rename(miles_per_gallon = mpg)

::: {.cell-output .cell-output-stdout}

# A tibble: 32 × 11
   miles_per_gallon   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1             21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2             21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3             22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4             21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5             18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6             18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7             14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8             24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9             22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10             19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# ℹ 22 more rows

::: :::

E.5 向量化函数

E.5.1 适用于 `mutate()`

E.5.2 位移操作

dplyr::lag() 向前位移
dplyr::lead() 向后位移

E.5.3 累积计算

dplyr::cumall() 累积逻辑与
dply::cumany() 累积逻辑或
cummax() 累积最大值
dplyr::cummean() 累积均值
cummin() 累积最小值
cumprod() 累积乘积
cumsum() 累积求和

E.5.4 排序计算

dplyr::cume_dist() 累积分布比例
dplyr::dense_rank() 紧密排名（无间隔）
dplyr::min_rank() 最小排名
dplyr::ntile() 分位数分组
dplyr::percent_rank() 标准化排名
dplyr::row_number() 行编号（首位优先）

E.5.5 数学运算

+, -, /, ^, %/%, %% 算术运算符
log(), log2(), log10() 对数函数
<, <=, >, >=, !=, == 比较运算符
dplyr::between() 区间判断
dplyr::near() 近似相等判断

E.5.6 扩展功能

dplyr::case_when() 多条件判断

::: {.cell}

starwars |>
  mutate(type = case_when(
    height > 200 | mass > 200 ~ "large",
    species == "Droid" ~ "robot",
    TRUE ~ "other"
  ))

::: {.cell-output .cell-output-stdout}

# A tibble: 87 × 15
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
 1 Luke Sk…    172    77 blond      fair       blue            19   male  mascu…
 2 C-3PO       167    75 <NA>       gold       yellow         112   none  mascu…
 3 R2-D2        96    32 <NA>       white, bl… red             33   none  mascu…
 4 Darth V…    202   136 none       white      yellow          41.9 male  mascu…
 5 Leia Or…    150    49 brown      light      brown           19   fema… femin…
 6 Owen La…    178   120 brown, gr… light      blue            52   male  mascu…
 7 Beru Wh…    165    75 brown      light      blue            47   fema… femin…
 8 R5-D4        97    32 <NA>       white, red red             NA   none  mascu…
 9 Biggs D…    183    84 black      light      brown           24   male  mascu…
10 Obi-Wan…    182    77 auburn, w… fair       blue-gray       57   male  mascu…
# ℹ 77 more rows
# ℹ 6 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>, type <chr>

::: :::

dplyr::coalesce() 首有效值选取
dplyr::if_else() 元素级条件判断
dplyr::na_if() 特定值替换为 NA
pmax() 元素级最大值
pmin() 元素级最小值

E.6 汇总函数

E.6.1 适用于 `summarize()`

E.6.2 计数类

dplyr::n() 计数
dplyr::n_distinct() 唯一值计数
sum(!is.na()) 非空值计数

E.6.3 位置度量

mean() 均值
median() 中位数

E.6.4 逻辑运算

mean() 真值比例
sum() 真值数量

E.6.5 顺序提取

dplyr::first() 首元素
dplyr::last() 末元素
dplyr::nth() 指定位置元素

E.6.6 极值提取

quantile() 分位数
min() 最小值
max() 最大值

E.6.7 离散度量

IQR() 四分位距
mad() 绝对中位差
sd() 标准差
var() 方差

E.7 行名处理

整洁数据不使用行名（存储于列外部的变量）。需要行名操作时：

tibble::rownames_to_column() 行名转列

::: {.cell}
```
a <- rownames_to_column(mtcars, var = "C")
```
:::

tibble::columns_to_rownames() 列转行名

::: {.cell}

column_to_rownames(a, var = "C")

::: {.cell-output .cell-output-stdout}

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

::: :::

配套函数 tibble::has_rownames() 和 tibble::remove_rownames()

E.8 表格合并

x <- tribble(
   ~A,  ~B, ~C,
  "a", "t",  1,
  "b", "u",  2,
  "c", "v",  3
)

y <- tribble(
   ~A,  ~B, ~D,
  "a", "t",  3,
  "b", "u",  2,
  "d", "w",  1
)

E.8.1 横向合并

bind_cols(..., .name_repair) 横向拼接表格（不自动匹配列名）

E.8.2 纵向叠加

bind_rows(..., .id = NULL) 纵向合并，.id 参数可添加来源标识列

E.8.3 关系连接

扩展连接通过匹配键合并表格列：

left_join(x, y) 左连接
right_join(x, y) 右连接
inner_join(x, y) 内连接
full_join(x, y) 全连接

过滤连接按另一个表筛选当前表：

semi_join(x, y) 交集筛选
anti_join(x, y) 差集筛选

嵌套连接：

nest_join(x, y) 内连接结果嵌套存储

E.8.4 连接键设置

多列匹配：by = join_by(col1, col2)

::: {.cell}

left_join(x, y, by = join_by(A))

::: {.cell-output .cell-output-stdout}

# A tibble: 3 × 5
  A     B.x       C B.y       D
  <chr> <chr> <dbl> <chr> <dbl>
1 a     t         1 t         3
2 b     u         2 u         2
3 c     v         3 <NA>     NA

:::

left_join(x, y, by = join_by(A, B))

::: {.cell-output .cell-output-stdout}

# A tibble: 3 × 4
  A     B         C     D
  <chr> <chr> <dbl> <dbl>
1 a     t         1     3
2 b     u         2     2
3 c     v         3    NA

::: :::

跨名匹配：by = join_by(col1 == col2)

::: {.cell}

left_join(x, y, by = join_by(C == D))

::: {.cell-output .cell-output-stdout}

# A tibble: 3 × 5
  A.x   B.x       C A.y   B.y  
  <chr> <chr> <dbl> <chr> <chr>
1 a     t         1 d     w    
2 b     u         2 b     u    
3 c     v         3 a     t

::: :::

同名后缀参数：suffix

::: {.cell}

left_join(x, y, by = join_by(C == D), suffix = c("1", "2"))

::: {.cell-output .cell-output-stdout}

# A tibble: 3 × 5
  A1    B1        C A2    B2   
  <chr> <chr> <dbl> <chr> <chr>
1 a     t         1 d     w    
2 b     u         2 b     u    
3 c     v         3 a     t

::: :::

E.8.5 集合运算

intersect(x, y) 交集
setdiff(x, y) 差集
union(x, y) 并集（去重）
setequal() 集合相等判断

# 设置随机种子
set.seed(123)

# 生成两个随机字母向量
x = sample(letters, 5)
y = sample(letters, 8)
print(x)

[1] "o" "s" "n" "c" "j"

print(y)

[1] "r" "v" "k" "e" "t" "n" "w" "s"

# 交集  
intersect(x, y)

[1] "s" "n"

# 差集
setdiff(x, y)

[1] "o" "c" "j"

# 并集
union(x, y)

 [1] "o" "s" "n" "c" "j" "r" "v" "k" "e" "t" "w"

# 集合相等
setequal(x, y)

[1] FALSE

E.1 汇总观测值

E.2 分组观测

E.3 操作观测

E.3.1 提取观测

E.3.1.1 适用于 filter() 的逻辑运算符

E.3.2 排序观测

E.3.3 添加观测

E.4 操作变量

E.4.1 提取变量

E.4.1.1 select() 和 across() 辅助函数

E.4.2 批量操作多列

E.4.3 创建新变量

E.5 向量化函数

E.5.1 适用于 mutate()

E.5.2 位移操作

E.5.3 累积计算

E.5.4 排序计算

E.5.5 数学运算

E.5.6 扩展功能

E.6 汇总函数

E.6.1 适用于 summarize()

E.6.2 计数类

E.6.3 位置度量

E.6.4 逻辑运算

E.6.5 顺序提取

E.6.6 极值提取

E.6.7 离散度量

E.7 行名处理

E.8 表格合并

E.8.1 横向合并

E.8.2 纵向叠加

E.8.3 关系连接

E.8.4 连接键设置

E.8.5 集合运算

E.3.1.1 适用于 `filter()` 的逻辑运算符

E.4.1.1 `select()` 和 `across()` 辅助函数

E.5.1 适用于 `mutate()`

E.6.1 适用于 `summarize()`