This function is a wrapper around the system command ps that can be used
to benchmark (peak) memory and CPU usage of parallel R code. By taking
snapshots of the memory usage of R processes at a regular interval, the
function dynamically builds up a profile of their usage of system resources.
Usage
syrup(expr, interval = 0.5, peak = FALSE, env = caller_env())
Arguments
- expr
An expression.
- interval
The interval at which to take snapshots of resource usage. In practice, there's an overhead on top of each of these intervals.
- peak
Whether to return rows for only the "peak" memory usage. Interpreted as the id with the maximum rss sum. Defaults to FALSE, but it may be helpful to set peak = TRUE for potentially very long-running processes so that the tibble doesn't grow too large.
- env
The environment to evaluate expr in.
Value
A tibble with columns id and time and a number of columns from ps::ps()
output describing memory and CPU usage. Notably, the process ID pid, parent
process ID ppid, percent CPU usage pct_cpu, and resident set size rss
(a measure of memory usage).
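As a rough illustration of how the "peak" snapshot could be identified from these columns, the base R sketch below sums rss within each id and keeps the rows for the id with the largest total. The snaps data frame and its values are made up for illustration, and rss is stored as plain numbers rather than the byte class shown in real output; this is not necessarily how syrup() computes the peak internally.

```r
# Hypothetical snapshot data shaped like syrup() output (values are made up).
# Two processes (pid 101 and 102) are observed in three snapshots (id 1 to 3).
snaps <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3),
  pid = c(101, 102, 101, 102, 101, 102),
  rss = c(100, 50, 120, 90, 110, 60)
)

# Total resident set size across all processes within each snapshot.
rss_by_id <- tapply(snaps$rss, snaps$id, sum)

# The "peak" snapshot is the id with the largest summed rss.
peak_id <- as.numeric(names(rss_by_id)[which.max(rss_by_id)])

# Keep only the rows belonging to that snapshot.
peak_rows <- snaps[snaps$id == peak_id, ]
```

Summing within id rather than taking the single largest rss value matters for parallel code, where memory is spread across several worker processes in the same snapshot.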
Details
While much of the verbiage in the package assumes that the supplied
expression will be distributed across CPU cores, nothing about this package
requires that the expression provided to syrup() be run in parallel. Said
another way, syrup() will work just fine with "normal," sequentially-run
R code (as in the examples). That said, there are many better, more
fine-grained tools for the job in the case of sequential R code, such as
Rprofmem(), the profmem package, the bench package, and packages in
the R-prof GitHub organization.
Loosely, the function works by:
- Setting up another R process (call it sesh) that queries system information using ps::ps() at a regular interval,
- Evaluating the supplied expression,
- Reading the queried system information back into the main process from sesh,
- Closing sesh, and then
- Returning the queried system information.
Note that information on the R process sesh is filtered out from the results
automatically.
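The steps above can be sketched roughly as follows. This is an illustration only and not syrup's actual source: it assumes the callr and ps packages are installed, the name poll_sketch and the stop-file signalling are hypothetical, and unlike syrup() it keeps rows for all processes rather than only R processes.

```r
# A rough sketch of the polling approach; NOT syrup's actual implementation.
# Assumes the callr and ps packages are installed.
poll_sketch <- function(expr, interval = 0.5, env = parent.frame()) {
  expr <- substitute(expr)
  stop_file <- tempfile()  # hypothetical signal: once it exists, polling stops

  # Set up another R process that snapshots ps::ps() at a regular interval.
  sesh <- callr::r_bg(
    function(interval, stop_file) {
      snaps <- list()
      id <- 0L
      repeat {
        id <- id + 1L
        snap <- as.data.frame(ps::ps()[c("pid", "ppid", "name", "rss", "vms")])
        snaps[[id]] <- cbind(id = id, time = Sys.time(), snap)
        if (file.exists(stop_file)) break
        Sys.sleep(interval)
      }
      do.call(rbind, snaps)
    },
    args = list(interval = interval, stop_file = stop_file)
  )
  sesh_pid <- sesh$get_pid()

  # Evaluate the supplied expression in the main process.
  eval(expr, env)

  # Signal the poller to stop and read its results back into the main process.
  # The background process exits on its own once its function returns.
  file.create(stop_file)
  sesh$wait()
  res <- sesh$get_result()
  unlink(stop_file)

  # Drop rows describing the polling process itself, then return the rest.
  res[res$pid != sesh_pid, , drop = FALSE]
}

res <- poll_sketch(Sys.sleep(1), interval = 0.2)
```

Polling from a separate process means snapshots keep being taken while the main process is busy evaluating the expression, which is presumably why a second session is used at all.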
Examples
# pass any expression to syrup. first, sequentially:
res_syrup <- syrup({res_output <- Sys.sleep(1)})
res_syrup
#> # A tibble: 3 × 8
#>      id time                  pid  ppid name  pct_cpu       rss       vms
#>   <dbl> <dttm>              <int> <int> <chr>   <dbl> <bch:byt> <bch:byt>
#> 1     1 2024-07-18 14:12:10  5678  1602 R          NA     269MB     928MB
#> 2     2 2024-07-18 14:12:11  5678  1602 R           0     269MB     928MB
#> 3     3 2024-07-18 14:12:11  5678  1602 R           0     269MB     928MB
# to snapshot memory and CPU information more (or less) often, set `interval`
syrup(Sys.sleep(1), interval = .01)
#> # A tibble: 13 × 8
#>       id time                  pid  ppid name  pct_cpu       rss       vms
#>    <dbl> <dttm>              <int> <int> <chr>   <dbl> <bch:byt> <bch:byt>
#>  1     1 2024-07-18 14:12:12  5678  1602 R          NA     274MB     933MB
#>  2     2 2024-07-18 14:12:12  5678  1602 R           0     274MB     933MB
#>  3     3 2024-07-18 14:12:12  5678  1602 R           0     274MB     933MB
#>  4     4 2024-07-18 14:12:12  5678  1602 R           0     274MB     933MB
#>  5     5 2024-07-18 14:12:12  5678  1602 R           0     274MB     933MB
#>  6     6 2024-07-18 14:12:12  5678  1602 R           0     274MB     933MB
#>  7     7 2024-07-18 14:12:12  5678  1602 R           0     274MB     933MB
#>  8     8 2024-07-18 14:12:12  5678  1602 R           0     274MB     933MB
#>  9     9 2024-07-18 14:12:13  5678  1602 R           0     274MB     933MB
#> 10    10 2024-07-18 14:12:13  5678  1602 R           0     274MB     933MB
#> 11    11 2024-07-18 14:12:13  5678  1602 R           0     274MB     933MB
#> 12    12 2024-07-18 14:12:13  5678  1602 R           0     274MB     933MB
#> 13    13 2024-07-18 14:12:13  5678  1602 R           0     274MB     933MB
# use `peak = TRUE` to return only the snapshot with
# the highest memory usage (as `sum(rss)`)
syrup(Sys.sleep(1), interval = .01, peak = TRUE)
#> # A tibble: 1 × 8
#>      id time                  pid  ppid name  pct_cpu       rss       vms
#>   <dbl> <dttm>              <int> <int> <chr>   <dbl> <bch:byt> <bch:byt>
#> 1     1 2024-07-18 14:12:14  5678  1602 R          NA     275MB     934MB
# results from syrup are more---or maybe only---useful when
# computations are evaluated in parallel. see package README
# for an example.