R (rlang) and Plots everywhere

Data science with Jupyter notebooks can sometimes get exhausting. What about debugging, version control, code review and so on? Coming from a software engineering background, it feels like losing 50% of the tooling you were used to.

To mitigate those problems I recently switched partially from Python to R, which improved things considerably. For local Python coding, JetBrains PyCharm is my tool of choice, with Jupyter notebooks for remote coding. With R it is RStudio Desktop locally and RStudio Server remotely, which is almost the desktop version running in a browser. This makes it possible to develop and analyze data from any device with a browser.

RStudio Server with running R-Notebook sample

Setup Overview

To set up such an R environment I rely on containers with Podman or Docker and on my beloved Traefik CE reverse proxy. Additionally, I added Shiny as a publishing system and cron jobs for automatic data integration.

Overview of the setup; all services inside a container

User cron jobs are executed inside the RStudio Server container and update the datasets. Shiny serves web pages with nice tables and plots. To make deployment easy, a subdirectory is mapped into the Shiny server; this works, but it could be improved further.

Update: Full example setup available at https://gitlab.com/2er0/pres/tree/1st-vhlug.

RStudio Server setup

The Rocker project provides ready-to-use images with base setups for RStudio Server and Shiny. Thanks, people, for the nice base images. For my setup, I extended the image with the system libraries and R packages I regularly need. This saves time whenever the container is recreated, because the packages no longer have to be installed manually; the container build takes care of it.

FROM rocker/tidyverse:latest

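# Install system libraries, tools (wget, cron, vim) and extra R packages.
# Note: "/etc/init.d/cron start" only takes effect during the build; the
# cron daemon also has to be started when the container itself is running.
RUN apt-get update -qq \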
    && apt-get install -y libcurl4-gnutls-dev \
       libcairo2-dev libxt-dev wget gdebi-core \
       pandoc pandoc-citeproc libudunits2-dev \
       libjpeg-dev libgdal-dev cron vim \
    && /etc/init.d/cron start \
    && Rscript -e \
     "devtools::install_github('bnosac/cronR')" \
    && install2.r --error --deps TRUE \
       tidyr stringr

The rocker/tidyverse image already ships R libraries such as tidyverse, dplyr and devtools. What it does not provide is a command-line tool like wget for fetching data from a web page, which is why wget is installed here.
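With cronR installed in the image, a dataset update job could look roughly like the following sketch. The helper names `update_dataset` and `schedule_update`, the example URL and all paths are assumptions made for illustration and are not part of the original setup; only `cronR::cron_rscript` and `cronR::cron_add` come from the cronR package. The job only fires if the cron daemon is actually running inside the container.

```r
# Hypothetical helper: download a CSV file and store it as an .rds file.
update_dataset <- function(url, dest) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)
  download.file(url, tmp, quiet = TRUE)
  data <- read.csv(tmp, stringsAsFactors = FALSE)
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  saveRDS(data, dest)
  invisible(nrow(data))
}

# Hypothetical helper: register an R script in the user's crontab via cronR.
# Nothing is scheduled until this function is called.
schedule_update <- function(script, id = "update-dataset") {
  cmd <- cronR::cron_rscript(script)
  cronR::cron_add(command = cmd, frequency = "daily", at = "3AM", id = id)
}

# Example calls (placeholders, not run here):
# update_dataset("https://example.org/data.csv", "~/pro/data/latest.rds")
# schedule_update("/home/{user}/jobs/update.R")
```

Keeping the download logic in a plain function makes it easy to run the same code interactively in RStudio before handing it over to cron.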

Shiny Server setup

Rocker also provides an image with the Shiny server. The image is ready to use; only the server config has to be modified or replaced. The default config contains the following listen directive, which has to be adapted:

# original
server {
  listen 3838;
  …

# fixed
server {
  listen 3838 0.0.0.0;
  …

Shiny needs an address on which to listen for new requests, and 0.0.0.0 means every IPv4 address inside the container. It is also best practice to set the executing user for the Shiny server with the run_as setting; see the Shiny Server documentation for details.
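Put together, a complete shiny-server.conf could look roughly like the sketch below. It follows the layout of the default config shipped with Shiny Server, with the listen address added; the user name `shiny` and the directories are assumptions taken from that default and may differ in your image.

```text
# run all applications as the unprivileged user "shiny" (assumed user name)
run_as shiny;

server {
  # listen on port 3838 on all IPv4 addresses of the container
  listen 3838 0.0.0.0;

  location / {
    # directory containing the Shiny apps (mapped as a volume)
    site_dir /srv/shiny-server;
    # directory for the application logs
    log_dir /var/log/shiny-server;
    # show an index of the available apps at the base URL
    directory_index on;
  }
}
```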

Additional packages

To install the required R libraries on the Shiny server, a small workaround is needed. Usually, the user running Shiny is unprivileged and therefore not allowed to install R libraries system-wide. One solution is to check for a directory that only exists in the RStudio container: if it is missing, the script is being executed by the Shiny server. In that case, a user-writable directory is added as a library path.

# check if executed on the Shiny server:
# "~/pro" only exists in the RStudio container
if (!dir.exists("~/pro")) {
  user_lib <- "/home/{run_as}/{some-dir}"
  # create the user library if it does not exist yet
  dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)
  # add the user library path to the running process
  .libPaths(c(user_lib, .libPaths()))
}

# install the library if it is missing, then load it
if (!requireNamespace("DT", quietly = TRUE)) {
  install.packages("DT", repos = "https://cloud.r-project.org/")
}
library(DT)
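If an app needs more than one package, the same check can be wrapped in a small helper. This is only a sketch: `ensure_packages` is a hypothetical name, and it assumes the user library path has already been prepended with `.libPaths()`, so that the first library path is writable by the Shiny user.

```r
# Hypothetical helper: install whichever of the given packages are missing
# into the first library path and return the names that had to be installed.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org/") {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing, lib = .libPaths()[1], repos = repos)
  }
  invisible(missing)
}

# Example call (not run here):
# ensure_packages(c("DT", "ggplot2"))
# library(DT)
```

The first request after a container rebuild is slower because the packages are installed on demand; later requests find them in the user library.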

Tying it all together with docker-compose

To avoid exposing the internal services directly, I suggest using a reverse proxy such as Traefik CE or NGINX. With a Compose file all services can be set up together, e.g.:

version: "3.7"

services:
  rstudio:
    build: .
    # local dockerfile for building the image
    # with pre installed R libraries
    # RStudio @ port 8787
    # look up possible environment variables
    # such as USER & PASSWORD
    networks:
      - proxy
    volumes:
      - {some-volume}:/home/{user}/pro
      # some volume for deploying Shiny-apps
      # to the Shiny server
  rshiny:
    image: rocker/shiny:latest
    # Shiny server @ port 3838
    networks:
      - proxy
    volumes:
      - {conf}:/etc/shiny-server/shiny-server.conf
      # map the config into the shiny space
      - {some-volume}:/srv/shiny-server/
      # map some volume between rstudio and 
      # shiny server for deploying Shiny-apps
    user: shiny
networks:
  proxy:
    external: true
    name: proxy_net
    # external network from where traefik can
    # reach the service for proxying the
    # requests
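Because the same volume is mounted at ~/pro in the RStudio container and at /srv/shiny-server/ in the Shiny container, deploying an app amounts to copying its directory into ~/pro. A minimal sketch from within RStudio, assuming a hypothetical helper `deploy_app` and an app directory that contains an app.R:

```r
# Hypothetical helper: copy an app directory into the shared volume so that
# the Shiny server can pick it up under the directory's name.
deploy_app <- function(app_dir, shared_dir = "~/pro") {
  stopifnot(dir.exists(app_dir))
  target <- file.path(path.expand(shared_dir), basename(app_dir))
  dir.create(target, recursive = TRUE, showWarnings = FALSE)
  files <- list.files(app_dir, full.names = TRUE, all.files = TRUE, no.. = TRUE)
  copied <- file.copy(files, target, recursive = TRUE, overwrite = TRUE)
  invisible(all(copied))
}

# Example call (not run here); with the default Shiny Server layout the app
# should then be reachable under /hello/ on the Shiny server:
# deploy_app("~/projects/hello")
```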

Result

The RStudio Server login screen is served behind the reverse proxy, which handles the TLS 1.3 session with automatic Let’s Encrypt certificates.

RStudio with authentication screen and TLS-1.3 session behind a reverse proxy

Finally, here is the desired RStudio in the browser, running remotely on some server or in some cloud, wherever you want.

RStudio Server with running R-Notebook sample

And here is the Shiny server, currently serving the default Shiny sample application.

Base example of a Shiny application, served by the Shiny server

Links

fhLUG: Lightning Talk
R (rlang) / RStudio
Reverse Proxy
Container