Introduction#
We’re delighted to announce the release of dplyr 0.8.2 on CRAN 🍉!
This is a minor maintenance release in the 0.8.* series, addressing a collection of issues since the 0.8.1 and 0.8.0 versions.
top_n() and top_frac()#
top_n() has been around for a long time in dplyr, as a convenient wrapper around filter() and min_rank(), to select top (or bottom) entries in each group of a tibble.
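To make the "wrapper" description concrete, here is a minimal sketch that compares top_n() with a hand-written filter() and min_rank() call. The `scores` tibble and its column names are made up for illustration, and the hand-written version is only an approximation of what top_n() does internally, not its actual source.

```r
library(dplyr)

# Hypothetical toy data, made up for this illustration
scores <- tibble(
  team  = c("a", "a", "a", "b", "b", "b"),
  score = c(1, 5, 3, 2, 9, 4)
)

# The shortcut: the two highest scores in each team
via_top_n <- scores %>%
  group_by(team) %>%
  top_n(2, score)

# Roughly the same selection written out with filter() and min_rank()
via_filter <- scores %>%
  group_by(team) %>%
  filter(min_rank(desc(score)) <= 2)

via_top_n
via_filter
```

Both pipelines should keep the same rows here, because the toy data has no ties within a team.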
In this release, top_n() is no longer limited to a constant number of entries per group: its n argument is now quoted, to be evaluated later in the context of each group.
Here is the top half of the countries in each continent, i.e. n() / 2, ranked by life expectancy in 2007.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
gapminder::gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  top_n(n() / 2, lifeExp)
#> # A tibble: 70 x 6
#> # Groups: continent [5]
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Algeria Africa 2007 72.3 33333216 6223.
#> 2 Argentina Americas 2007 75.3 40301927 12779.
#> 3 Australia Oceania 2007 81.2 20434176 34435.
#> 4 Austria Europe 2007 79.8 8199783 36126.
#> 5 Bahrain Asia 2007 75.6 708573 29796.
#> 6 Belgium Europe 2007 79.4 10392226 33693.
#> 7 Benin Africa 2007 56.7 8078314 1441.
#> 8 Canada Americas 2007 80.7 33390141 36319.
#> 9 Chile Americas 2007 78.6 16284741 13172.
#> 10 China Asia 2007 73.0 1318683096 4959.
#> # … with 60 more rows
top_frac() is a new convenience shortcut for the top n percent, given as a fraction, i.e.
gapminder::gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  top_frac(0.5, lifeExp)
#> # A tibble: 70 x 6
#> # Groups: continent [5]
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Algeria Africa 2007 72.3 33333216 6223.
#> 2 Argentina Americas 2007 75.3 40301927 12779.
#> 3 Australia Oceania 2007 81.2 20434176 34435.
#> 4 Austria Europe 2007 79.8 8199783 36126.
#> 5 Bahrain Asia 2007 75.6 708573 29796.
#> 6 Belgium Europe 2007 79.4 10392226 33693.
#> 7 Benin Africa 2007 56.7 8078314 1441.
#> 8 Canada Americas 2007 80.7 33390141 36319.
#> 9 Chile Americas 2007 78.6 16284741 13172.
#> 10 China Asia 2007 73.0 1318683096 4959.
#> # … with 60 more rows
tbl_vars() and group_cols()#
tbl_vars() now returns a dplyr_sel_vars object that keeps track of the grouping variables. This information powers group_cols(), which can now be used in every function that uses tidy selection of columns.
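As a small sketch of the two functions on their own, the snippet below inspects tbl_vars() on a grouped copy of iris and uses group_cols() inside dplyr's own select(). The object names `by_species`, `vars` and `only_groups` are illustrative, and the printed form of `vars` may differ between dplyr versions.

```r
library(dplyr)

by_species <- iris %>% group_by(Species)

# Column names of the grouped data; per the description above, the
# returned object also keeps track of the grouping variables
vars <- tbl_vars(by_species)
class(vars)

# group_cols() used as a tidy-selection helper: keep only the grouping columns
only_groups <- by_species %>% select(group_cols())
names(only_groups)
```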
Functions in the tidyverse and beyond may start to use the tbl_vars() / group_cols() duo, starting with tidyr and this pull request:
# pak::pkg_install("tidyverse/tidyr#668")
iris %>%
  group_by(Species) %>%
  tidyr::gather("flower_att", "measurement", -group_cols())
#> # A tibble: 600 x 3
#> # Groups: Species [3]
#> Species flower_att measurement
#> <fct> <chr> <dbl>
#> 1 setosa Sepal.Length 5.1
#> 2 setosa Sepal.Length 4.9
#> 3 setosa Sepal.Length 4.7
#> 4 setosa Sepal.Length 4.6
#> 5 setosa Sepal.Length 5
#> 6 setosa Sepal.Length 5.4
#> 7 setosa Sepal.Length 4.6
#> 8 setosa Sepal.Length 5
#> 9 setosa Sepal.Length 4.4
#> 10 setosa Sepal.Length 4.9
#> # … with 590 more rows
group_split(), group_map(), group_modify()#
group_split() always keeps a ptype attribute to track the prototype of the splits.
mtcars %>%
  group_by(cyl) %>%
  filter(hp < 0) %>%
  group_split()
#> list()
#> attr(,"ptype")
#> # A tibble: 0 x 11
#> # … with 11 variables: mpg <dbl>, cyl <dbl>, disp <dbl>, hp <dbl>,
#> # drat <dbl>, wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>, gear <dbl>,
#> # carb <dbl>
group_map() and group_modify() benefit from this in the edge case where there are no groups.
mtcars %>%
  group_by(cyl) %>%
  filter(hp < 0) %>%
  group_map(~.x)
#> list()
#> attr(,"ptype")
#> # A tibble: 0 x 10
#> # … with 10 variables: mpg <dbl>, disp <dbl>, hp <dbl>, drat <dbl>,
#> # wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>, gear <dbl>, carb <dbl>
mtcars %>%
  group_by(cyl) %>%
  filter(hp < 0) %>%
  group_modify(~.x)
#> # A tibble: 0 x 11
#> # Groups: cyl [0]
#> # … with 11 variables: cyl <dbl>, mpg <dbl>, disp <dbl>, hp <dbl>,
#> # drat <dbl>, wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>, gear <dbl>,
#> # carb <dbl>
Thanks#
Thanks to all contributors for this release.
@abirasathiy, @ajkroeg, @alejandroschuler, @anuj2054, @arider2, @arielfuentes, @artidata, @BenPVD, @bkmontgom, @brodieG, @cderv, @clanker, @clemenshug, @CSheehan1, @danielecook, @dannyparsons, @daskandalis, @davidbaniadam, @DavisVaughan, @deliciouslytyped, @earowang, @fkatharina, @hadley, @Hardervidertsie, @iago-pssjd, @IndrajeetPatil, @jackdolgin, @jangorecki, @jimhester, @jjesusfilho, @jonjhitchcock, @jxu, @krlmlr, @laresbernardo, @lionel-, @LukeGoodsell, @madmark81, @MarkusBerroth, @matheus-donato, @mattfidler, @MatthieuStigler, @md0u80c9, @michaelhogersosis, @MikeJohnPage, @MJL9588, @moodymudskipper, @mwillumz, @Nelson-Gon, @qdread, @randomgambit, @rcorty, @romainfrancois, @romatik, @spressi, @sstoeckl, @stephLH, @urskalbitzer, @vpanfilov, and @ZahraEconomist.




