
This function is a wrapper around the system command ps that can be used to benchmark (peak) memory and CPU usage of parallel R code. By taking snapshots of the memory usage of R processes at a regular interval, the function dynamically builds up a profile of their usage of system resources.

Usage

syrup(expr, interval = 0.5, peak = FALSE, env = caller_env())

Arguments

expr

An expression.

interval

The interval, in seconds, at which to take snapshots of resource usage. In practice, there's an overhead on top of each of these intervals, so snapshots arrive somewhat less often than interval alone would suggest.

peak

Whether to return rows for only the "peak" memory usage. The peak is interpreted as the snapshot id with the maximum sum of rss across processes. Defaults to FALSE, but setting peak = TRUE may be helpful for potentially very long-running processes so that the tibble doesn't grow too large.

env

The environment to evaluate expr in.

Value

A tibble with columns id and time and a number of columns from ps::ps() output describing memory and CPU usage. Notable among these are the process ID pid, parent process ID ppid, percent CPU usage pct_cpu, and resident set size rss (a measure of memory usage).
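
The peak rule described for the peak argument (the id with the maximum rss sum) can be illustrated without running syrup() at all. The sketch below uses a small hand-made data frame in place of real output; the values, the pid numbers, and the names snaps, rss_by_id, and peak_id are all hypothetical, and this is not necessarily how the package computes the peak internally.

```r
# Hypothetical snapshots: two processes observed at three snapshot ids.
# A real syrup() result has more columns; only id, pid and rss matter here.
snaps <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3),
  pid = c(101, 102, 101, 102, 101, 102),
  rss = c(200e6, 50e6, 260e6, 120e6, 240e6, 60e6)  # bytes
)

# Sum rss across all processes within each snapshot id.
rss_by_id <- tapply(snaps$rss, snaps$id, sum)

# The "peak" is the id whose summed rss is largest.
peak_id <- as.numeric(names(rss_by_id)[which.max(rss_by_id)])

# Keep only the rows belonging to that snapshot.
peak_rows <- snaps[snaps$id == peak_id, ]
peak_rows
```

Here the second snapshot has the largest total, so both of its rows are kept. The same summary can be applied after the fact to a result obtained with peak = FALSE.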

Details

While much of the verbiage in the package assumes that the supplied expression will be distributed across CPU cores, nothing about this package requires the expression provided to syrup() to be run in parallel. Said another way, syrup() will work just fine with "normal," sequentially-run R code (as in the examples). That said, there are many better, more fine-grained tools for the job in the case of sequential R code, such as Rprofmem(), the profmem package, the bench package, and packages in the R-prof GitHub organization.

Loosely, the function works by:

  • Setting up another R process (call it sesh) that queries system information using ps::ps() at a regular interval,

  • Evaluating the supplied expression,

  • Reading the queried system information back into the main process from sesh,

  • Closing sesh, and then

  • Returning the queried system information.

Note that information on the R process sesh is filtered out from the results automatically.
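
The steps above can be sketched roughly as follows. This is a simplified illustration of the polling pattern, not the package's actual source: it assumes the callr and ps packages are installed, and the function poll_while(), the stop_flag file used to signal the background process, and the choice of columns are all hypothetical.

```r
# A simplified sketch of the polling pattern; NOT syrup's actual source.
# Assumes the callr and ps packages are installed.
poll_while <- function(expr, interval = 0.5) {
  # Hypothetical signal: once this file exists, the poller stops.
  stop_flag <- tempfile()

  # 1. Start a background R process that snapshots ps::ps() repeatedly.
  sesh <- callr::r_bg(
    function(interval, stop_flag) {
      snaps <- list()
      id <- 0
      repeat {
        id <- id + 1
        snap <- as.data.frame(ps::ps()[c("pid", "ppid", "name", "rss", "vms")])
        snap$id <- id
        snap$time <- Sys.time()
        snaps[[id]] <- snap
        if (file.exists(stop_flag)) break
        Sys.sleep(interval)
      }
      do.call(rbind, snaps)
    },
    args = list(interval = interval, stop_flag = stop_flag)
  )
  sesh_pid <- sesh$get_pid()

  # 2. Evaluate the supplied expression in the main process.
  force(expr)

  # 3. Signal the poller, wait for it to finish, and read its results back.
  file.create(stop_flag)
  sesh$wait()
  res <- sesh$get_result()

  # 4. Filter out the poller's own rows before returning.
  res[res$pid != sesh_pid, , drop = FALSE]
}

res <- poll_while(Sys.sleep(1), interval = 0.25)
```

Unlike syrup(), this sketch returns rows for every process that ps::ps() reports rather than only R processes, and it does not compute CPU usage. Signalling through a temporary file is just one simple way to tell a background process to stop.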

Examples

# pass any expression to syrup. first, sequentially:
res_syrup <- syrup({res_output <- Sys.sleep(1)})

res_syrup
#> # A tibble: 3 × 8
#>      id time                  pid  ppid name  pct_cpu       rss       vms
#>   <dbl> <dttm>              <int> <int> <chr>   <dbl> <bch:byt> <bch:byt>
#> 1     1 2024-07-03 16:44:52  5724  1611 R          NA     268MB     927MB
#> 2     2 2024-07-03 16:44:52  5724  1611 R           0     268MB     927MB
#> 3     3 2024-07-03 16:44:53  5724  1611 R           0     268MB     927MB

# to snapshot memory and CPU information more (or less) often, set `interval`
syrup(Sys.sleep(1), interval = .01)
#> # A tibble: 14 × 8
#>       id time                  pid  ppid name  pct_cpu       rss       vms
#>    <dbl> <dttm>              <int> <int> <chr>   <dbl> <bch:byt> <bch:byt>
#>  1     1 2024-07-03 16:44:54  5724  1611 R          NA     268MB     928MB
#>  2     2 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#>  3     3 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#>  4     4 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#>  5     5 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#>  6     6 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#>  7     7 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#>  8     8 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#>  9     9 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#> 10    10 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#> 11    11 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#> 12    12 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#> 13    13 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB
#> 14    14 2024-07-03 16:44:54  5724  1611 R           0     268MB     928MB

# use `peak = TRUE` to return only the snapshot with
# the highest memory usage (as `sum(rss)`)
syrup(Sys.sleep(1), interval = .01, peak = TRUE)
#> # A tibble: 1 × 8
#>      id time                  pid  ppid name  pct_cpu       rss       vms
#>   <dbl> <dttm>              <int> <int> <chr>   <dbl> <bch:byt> <bch:byt>
#> 1     1 2024-07-03 16:44:55  5724  1611 R          NA     269MB     928MB

# results from syrup are more---or maybe only---useful when
# computations are evaluated in parallel. see package README
# for an example.