Installing your own R packages

Installing your own R packages on the cluster.

Introduction

Please also refer to the FAQ on installing popular R packages here: http://otrs.pik-potsdam.de/otrs/customer.pl?Action=CustomerFAQZoom;ItemID=177

In most cases, installing your own R packages on the cluster is straightforward. Typically, install.packages() works fine from within R, as does R CMD INSTALL <package archive> from the command line.

You may, however, come across a package that depends on a system library which we have installed in a non-standard location.

This guide walks through the installation of a few such packages. The intention is not to provide a foolproof guide for installing a particular package, or set of packages, but to show typical errors that you might see, how to start interpreting them, and how to modify the installation to correct them.

Example 1 - package "rgeos"

We'll start by attempting to install the R package "rgeos". This is an interface to the "Geometry Engine - Open Source" (GEOS) library (https://trac.osgeo.org/geos/), a C++ library which we have installed in the cluster system libraries directory (/p/system/) and made available as a module (geos/3.6.1).

First, we load a recent R module:

# our optimised R was built with the Intel compilers, so we load this first:
module load intel/2018.1
# now load the version of R we need:
module load R/3.4.4

Before going any further, check that you only have these two modules loaded:

module list

Currently Loaded Modulefiles:
  1) intel/2018.1   2) R/3.4.4
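
The same check can be scripted, which is handy at the top of a job script. This is a minimal sketch, assuming the module system exports the colon-separated LOADEDMODULES variable (Environment Modules and Lmod both do); only_modules_loaded is a hypothetical helper name, not part of the module system.

```shell
# Succeeds only if the colon-separated list in $1 contains exactly the
# modules named in the remaining arguments (order does not matter).
only_modules_loaded() {
    loaded=$(printf '%s\n' "$1" | tr ':' '\n' | grep -v '^$' | sort)
    shift
    wanted=$(printf '%s\n' "$@" | grep -v '^$' | sort)
    [ "$loaded" = "$wanted" ]
}

# Usage: compare what is loaded against the two modules from this example.
if only_modules_loaded "${LOADEDMODULES:-}" intel/2018.1 R/3.4.4; then
    echo "module environment is clean"
else
    echo "unexpected module set: ${LOADEDMODULES:-<none>}" >&2
fi
```

If the check fails, "module purge" followed by the two "module load" commands above gives a clean starting point.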

First error: a missing library

Now we start R and use install.packages():

> install.packages("rgeos")
Installing package into '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4'
(as 'lib' is unspecified)
trying URL 'https://ftp.gwdg.de/pub/misc/cran/src/contrib/rgeos_0.3-28.tar.gz'
Content type 'application/octet-stream' length 252833 bytes (246 KB)
==================================================
downloaded 246 KB

* installing *source* package 'rgeos' ...
** package 'rgeos' successfully unpacked and MD5 sums checked
configure: CC: icc
configure: CXX: icpc
configure: rgeos: 0.3-28
checking for /usr/bin/svnversion... yes
configure: svn revision: 572
checking for geos-config... no
no
configure: error: geos-config not found or not executable.
ERROR: configuration failed for package 'rgeos'
* removing '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/rgeos'
* restoring previous '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/rgeos'

The downloaded source packages are in
        '/p/tmp/R_tmp/RtmpkJJdOQ/downloaded_packages'
Warning message:
In install.packages("rgeos") :
  installation of package 'rgeos' had non-zero exit status

We see that the installation failed (the non-zero exit status warning at the bottom). Working backwards, we see ERROR: configuration failed for package 'rgeos', and the line before that gives us a precise error: configure: error: geos-config not found or not executable.
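
For long logs, it can help to let grep pick out the error lines instead of scrolling. The sketch below assumes the installer output has been saved to a file or is piped in; show_install_errors and install.log are hypothetical names used only for illustration.

```shell
# Print the lines of an installation log (read from stdin) that contain
# "error:" in any capitalisation, each prefixed with its line number.
show_install_errors() {
    grep -n -i 'error:' || echo "no error lines found"
}

# Demonstration on the tail of the rgeos log shown above:
printf '%s\n' \
    "checking for geos-config... no" \
    "no" \
    "configure: error: geos-config not found or not executable." \
    "ERROR: configuration failed for package 'rgeos'" | show_install_errors

# On the cluster, the log could be captured first, for example with
#   R CMD INSTALL <package archive> > install.log 2>&1
# and then inspected with
#   show_install_errors < install.log
```

The first matching line is usually the most specific one; later lines tend to be consequences of it.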

The R installation doesn't know where to find the components of the base GEOS library, in particular the program which can inform it about the setup (geos-config). We know that GEOS is installed via a module, which should also set this path up correctly.

So, let's quit R, and check:

module avail geos

geos/3.3.3 geos/3.5.0 geos/3.6.1

There are several versions, so we'll pick the latest. Does it provide geos-config?

module load geos/3.6.1

which geos-config
/p/system/packages/geos/3.6.1/bin/geos-config

Yes! So we load the GEOS module before starting R again.
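
The "is the helper on my PATH?" test is worth repeating for any package whose configure step looks for a "*-config" program. A minimal sketch follows; require_tool is a hypothetical helper name, and the hint it prints is only a suggestion.

```shell
# Report where a helper program lives, or fail with a hint if it is not on PATH.
require_tool() {
    if path=$(command -v "$1"); then
        echo "$1 found at $path"
    else
        echo "$1 not found; is the right module loaded?" >&2
        return 1
    fi
}

# Usage: run this in the same shell from which R will be started.
require_tool geos-config || echo "load the GEOS module first (e.g. module load geos/3.6.1)"
```

R inherits its environment from the shell that starts it, so a tool that is visible here should also be visible to install.packages().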

Now, module list will show three loaded modules:

module list
Currently Loaded Modulefiles:
  1) intel/2018.1   2) R/3.4.4        3) geos/3.6.1

Success!

Now, let's start R again, and run install.packages('rgeos')

* installing *source* package 'rgeos' ...
** package 'rgeos' successfully unpacked and MD5 sums checked
configure: CC: icc -std=c99
configure: CXX: icpc
configure: rgeos: 0.3-28
checking for /usr/bin/svnversion... yes
configure: svn revision: 572
checking for geos-config... /p/system/packages/geos/3.6.1/bin/geos-config
checking geos-config usability... yes
configure: GEOS version: 3.6.1
checking geos version at least 3.2.0... yes
checking geos-config clibs... yes
checking geos_c.h  presence and usability... yes
checking geos: linking with libgeos_c... yes
configure: PKG_CPPFLAGS:  -I/p/system/packages/geos/3.6.1/include
configure: PKG_LIBS:  -L/p/system/packages/geos/3.6.1/lib -lgeos -L/p/system/packages/geos/3.6.1/lib -lgeos_c
configure: creating ./config.status
config.status: creating src/Makevars
** libs
icpc  -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -daal=parallel -qopenmp -xCORE-AVX2 -fPIC  -c dummy.cc -o dummy.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c init.c -o init.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c local_stubs.c -o local_stubs.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos.c -o rgeos.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_R2geos.c -o rgeos_R2geos.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_R2geosMP.c -o rgeos_R2geosMP.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_bbox.c -o rgeos_bbox.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_buffer.c -o rgeos_buffer.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_coord.c -o rgeos_coord.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_geos2R.c -o rgeos_geos2R.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_linearref.c -o rgeos_linearref.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_misc.c -o rgeos_misc.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_poly2nb.c -o rgeos_poly2nb.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_predicate_binary.c -o rgeos_predicate_binary.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_predicate_unary.c -o rgeos_predicate_unary.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_topology.c -o rgeos_topology.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_topology_binary.c -o rgeos_topology_binary.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_validate.c -o rgeos_validate.o
icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include"
    -I/usr/local/include   -fpic  -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC  -c rgeos_wkt.c -o rgeos_wkt.o
icpc -shared -L/p/system/packages/R/3.4.4/lib64/R/lib -qopenmp -o rgeos.so dummy.o init.o local_stubs.o rgeos.o rgeos_R2geos.o rgeos_R2geosMP.o rgeos_bbox.o
    rgeos_buffer.o rgeos_coord.o rgeos_geos2R.o rgeos_linearref.o rgeos_misc.o rgeos_poly2nb.o rgeos_predicate_binary.o
      rgeos_predicate_unary.o rgeos_topology.o rgeos_topology_binary.o rgeos_validate.o rgeos_wkt.o -L/p/system/packages/geos/3.6.1/lib -lgeos
        -L/p/system/packages/geos/3.6.1/lib -lgeos_c -L/p/system/packages/R/3.4.4/lib64/R/lib -lR
installing to /home/linstead/R/x86_64-pc-linux-gnu-library/3.4/rgeos/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (rgeos)

The downloaded source packages are in
    '/p/tmp/R_tmp/RtmpoxQvcB/downloaded_packages'

We see no errors or warnings, and near the beginning, we see our change to the compiler options taking effect: configure: CC: icc -std=c99

Example 2 - missing libraries, but no helper config program

In the rgeos example, the R package needs to know where to find the GEOS (C++) library. It used a handy tool provided by GEOS, geos-config, to get these settings. But not every library comes with such a tool. In this example, we'll install a package that needs to be told explicitly where to find a library.
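
To see what such a helper actually reports, you can query it yourself. The sketch below assumes the geos module is loaded so that geos-config is on the PATH; show_config_flags is a hypothetical helper name. The options --version, --cflags and --clibs are ones geos-config accepts; other "*-config" tools may use different option names.

```shell
# Print what a "*-config" helper reports for each given option,
# or a short note if the helper is not on PATH.
show_config_flags() {
    tool=$1
    shift
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not on PATH"
        return 0
    fi
    for opt in "$@"; do
        printf '%s %s -> %s\n' "$tool" "$opt" "$("$tool" "$opt")"
    done
}

show_config_flags geos-config --version --cflags --clibs
```

When a library ships no helper like this, the same information (include and library directories) has to be passed to the installer by hand.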

install.packages("udunits2")

Let's try to install the R package udunits2. This package provides an interface in R to the udunits library, a C library for the manipulation and conversion of units of physical quantities. The latest version is installed on the cluster in /p/system/packages/udunits/2.2.26/ (and available via module udunits/2.2.26). First we'll try a simple install.packages('udunits2') and see what happens.

* installing *source* package 'udunits2' ...
** package 'udunits2' successfully unpacked and MD5 sums checked
checking for gcc... icc -std=c99
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether icc -std=c99 accepts -g... yes
checking for icc -std=c99 option to accept ISO C89... none needed
checking for XML_ParserCreate in -lexpat... yes
checking how to run the C preprocessor... icc -std=c99 -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking udunits2.h usability... no
checking udunits2.h presence... no
checking for udunits2.h... no
checking for ut_read_xml in -ludunits2... no
-----Error: libudunits2.a not found-----
     If the udunits2 library is installed in a non-standard location,
     use --configure-args='--with-udunits2-lib=/usr/local/lib' for example,
     or --configure-args='--with-udunits2-include=/usr/include/udunits2'
     replacing paths with appropriate values for your installation.
     You can alternatively use the UDUNITS2_INCLUDE and UDUNITS2_LIB
     environment variables.
     If udunits2 is not installed, please install it.
     It is required for this package.
ERROR: configuration failed for package 'udunits2'
* removing '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/udunits2'
* restoring previous '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/udunits2'

The downloaded source packages are in
    '/p/tmp/R_tmp/RtmpZSRurP/downloaded_packages'
Warning message:
In install.packages("udunits2") :
  installation of package 'udunits2' had non-zero exit status

There's a lot of information here, but it tells us fairly explicitly what the problem is (Error: libudunits2.a not found), and indeed, /p/system isn't a standard location.

Simply loading the udunits/2.2.26 module won't be enough though - we need to tell R directly where to find the library (the error message tells us how!).

The method suggested (--configure-args='--with-udunits2-lib=/usr/local/lib') applies for installations done directly from source (i.e. if you download the source code, unpack it, configure and make it).

We can pass these options to install.packages inside R, though, which is more convenient. (Please read the documentation for install.packages, available in R via ?install.packages.)

We first load the UDUNITS module (module load udunits/2.2.26), then:

install.packages("udunits2", configure.args='--with-udunits2-lib=/p/system/packages/udunits/2.2.26/lib --with-udunits2-include=/p/system/packages/udunits/2.2.26/include')

Notice that we passed the recommended configuration options via the configure.args parameter to install.packages.
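
For comparison, here is how the same options could be assembled for the command-line route (R CMD INSTALL with --configure-args). This is a sketch only: udunits2_configure_args is a hypothetical helper, and the final command is printed rather than run, since the package archive name depends on what you downloaded.

```shell
# Build the two udunits2 configure options from an installation prefix.
udunits2_configure_args() {
    prefix=${1%/}    # drop a trailing slash, if any
    printf '%s %s\n' \
        "--with-udunits2-lib=$prefix/lib" \
        "--with-udunits2-include=$prefix/include"
}

args=$(udunits2_configure_args /p/system/packages/udunits/2.2.26)
# Show the command that would be run from the shell:
echo "R CMD INSTALL --configure-args='$args' <package archive>"
```

Deriving both options from one prefix keeps the lib and include paths in step if you switch to a different udunits version.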

How do we know what the correct paths are? By looking at the udunits module:

module show udunits/2.2.26
-------------------------------------------------------------------
/p/system/modulefiles/tools/udunits/2.2.26:

module-whatis    Enable usage for udunits version 2.2.26
setenv           UDUNITSROOT /p/system/packages/udunits/2.2.26
setenv           UDUNITS2_LIB /p/system/packages/udunits/2.2.26/lib
setenv           UDUNITS2_INCLUDE /p/system/packages/udunits/2.2.26/include
module           load intel/2018.1
module           load compiler/gnu/7.3.0
module           load expat bison
prepend-path     PATH /p/system/packages/udunits/2.2.26/bin
prepend-path     INCLUDE /p/system/packages/udunits/2.2.26/include
prepend-path     LD_LIBRARY_PATH /p/system/packages/udunits/2.2.26/lib
prepend-path     MANPATH /p/system/packages/udunits/2.2.26/share/man
-------------------------------------------------------------------

We see a few things here, but the INCLUDE and LD_LIBRARY_PATH parts tell us where the components are.
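
If a module file is long, the path-related lines can be filtered out automatically. A minimal sketch, assuming the "module show" output has the layout shown above; module_paths is a hypothetical helper name. On many installations "module show" writes to stderr, hence the 2>&1 in the usage comment.

```shell
# Read "module show" output on stdin and print "VARIABLE PATH" for every
# setenv / prepend-path line.
module_paths() {
    awk '$1 == "setenv" || $1 == "prepend-path" { print $2, $3 }'
}

# Demonstration on two lines taken from the output above:
printf '%s\n' \
    "module           load expat bison" \
    "prepend-path     INCLUDE /p/system/packages/udunits/2.2.26/include" \
    "prepend-path     LD_LIBRARY_PATH /p/system/packages/udunits/2.2.26/lib" | module_paths

# On the cluster:
#   module show udunits/2.2.26 2>&1 | module_paths
```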

Example 3: Rmpi

Rmpi is an R interface to the MPI library for writing parallel applications. On the cluster, the supported MPI libraries are those bundled with the Intel Cluster tools, most recently the intel/2018.1 and intel/2018.3 modules.

Let's try a simple installation (install.packages("Rmpi"))

* installing *source* package 'Rmpi' ...
** package 'Rmpi' successfully unpacked and MD5 sums checked
checking for gcc... icc -std=c99
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether icc -std=c99 accepts -g... yes
checking for icc -std=c99 option to accept ISO C89... none needed
checking for pkg-config... /usr/bin/pkg-config
checking if pkg-config knows about OpenMPI... no
checking how to run the C preprocessor... icc -std=c99 -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking mpi.h usability... no
checking mpi.h presence... no
checking for mpi.h... no
configure: error: "Cannot find mpi.h header file"
ERROR: configuration failed for package 'Rmpi'
* removing '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/Rmpi'
* restoring previous '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/Rmpi'

The downloaded source packages are in
        '/p/tmp/R_tmp/RtmpLx7H93/downloaded_packages'
Warning message:
In install.packages("Rmpi") :
  installation of package 'Rmpi' had non-zero exit status

As expected, it fails :)

The error:

checking for mpi.h... no
configure: error: "Cannot find mpi.h header file"

gives us a hint that the package installer can't find components of the MPI library.

But this installer isn't as helpful as udunits2 above, which gave us a suggestion about what to try next. Where do we go from here? A good starting point is always to head for the documentation for the package in question: Rmpi documentation

But sadly, not all documentation is complete or up to date! For this package (a fairly important R package in high-performance computing circles), the linked PDF has no installation instructions, and the linked README contains a URL which no longer exists.

So we must dig further.

Examining the source code of an R package.

The CRAN page for the package contains a link to the source code, so we'll fetch this, unpack it, and look at the options for configuring it.

# fetch the code from the CRAN link "Package source":
wget https://cran.r-project.org/src/contrib/Rmpi_0.6-7.tar.gz
# unpack it:
tar xvzf Rmpi_0.6-7.tar.gz
cd Rmpi
# look at the options for configuration:
./configure --help
.
.
# some interesting looking parameters:
Optional Packages:

  --with-Rmpi-include=INCLUDE_PATH
                          location of MPI header files
  --with-Rmpi-libpath=LIB_PATH
                          location of MPI library files
  --with-Rmpi-type=MPI_TYPE
                          the type of MPI: OPENMPI,LAM,MPICH,MPICH2, or CRAY
  --with-mpi=LIB_PATH     location of top-level MPI directory

Strangely, these options are listed under "Optional packages", though MPI seems to be mandatory (as one would expect for an interface to an MPI library!).

So, after looking into the module file for the Intel tools, and the directories it points to, we can construct the correct installer command.
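
One way to do that directory search is with find. The sketch below lists every directory under a given root that contains mpi.h; find_header_dirs is a hypothetical helper and INTEL_ROOT a hypothetical variable. Take the real root from "module show intel/2018.1".

```shell
# List every directory below $1 that contains a file named $2.
find_header_dirs() {
    find "$1" -type f -name "$2" 2>/dev/null | sed 's|/[^/]*$||' | sort -u
}

# Usage: INTEL_ROOT is an assumption; set it to the path the module file shows.
find_header_dirs "${INTEL_ROOT:-/p/system/packages/intel}" mpi.h
```

If several directories turn up, pick the include directory that sits next to the library directory you pass as --with-Rmpi-libpath.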

install.packages("Rmpi", configure.args=c(
    "--with-Rmpi-include=/p/system/packages/intel/parallel_studio_xe_2018_update1/compilers_and_libraries/linux/mpi/include64/",
    "--with-Rmpi-libpath=/p/system/packages/intel/parallel_studio_xe_2018_update1/compilers_and_libraries/linux/mpi/lib64/",
    "--with-Rmpi-type=OPENMPI"))

Note that we set --with-Rmpi-type to OPENMPI, not INTEL. The configuration options don't offer an Intel type, but OpenMPI is a close alternative.

An aside: testing MPI code on a login node

To load and test Rmpi on a login node, you'll need to set:

export I_MPI_FABRICS=shm:shm

before starting R. (Do not set this for jobs submitted via SLURM, though.)
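
If you keep this setting in a shared startup script, it can be guarded so that it never applies inside a job. This is a sketch under the assumption that SLURM_JOB_ID is set in the environment of every SLURM job (which is SLURM's standard behaviour); set_login_node_fabric is a hypothetical helper name.

```shell
# Set I_MPI_FABRICS only when we are not inside a SLURM job.
set_login_node_fabric() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        export I_MPI_FABRICS=shm:shm
    fi
}

set_login_node_fabric
echo "I_MPI_FABRICS=${I_MPI_FABRICS:-<unset>}"
```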

Example 4: rgdal

The following modules need to be loaded (with "module load").

  • an R version (e.g. R/3.4.4 or R/3.3.2)
  • GDAL (e.g. gdal/2.2.4)
  • PROJ4 (e.g. proj4/5.0.1)

Then, in R:

install.packages("rgdal", configure.args=c('--with-proj-include=/p/system/packages/proj4/5.0.1/include','--with-proj-lib=/p/system/packages/proj4/5.0.1/lib'))

Note that the path to the PROJ4 library must match the version loaded. Use "module show" to find the correct path if you're not using version 5.0.1.

The paths to the GDAL libraries are found automatically at install time via the "gdal-config" tool, once a gdal module is loaded.

Other combinations of versions of R, PROJ4, and GDAL may also be possible, but the above versions are known to install correctly.
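
Because a mistyped PROJ4 path only shows up as a configure failure, it can save time to check the directories before starting R. A minimal sketch; proj_configure_args is a hypothetical helper, and the prefix is the one used in this example.

```shell
# Check that a library prefix has the include/ and lib/ directories the
# configure options will point at; print the options if it does.
proj_configure_args() {
    prefix=${1%/}
    for d in "$prefix/include" "$prefix/lib"; do
        if [ ! -d "$d" ]; then
            echo "missing directory: $d" >&2
            return 1
        fi
    done
    printf '%s\n' "--with-proj-include=$prefix/include" "--with-proj-lib=$prefix/lib"
}

proj_configure_args /p/system/packages/proj4/5.0.1 ||
    echo "adjust the prefix to match the proj4 module you loaded"
```

The two printed options are the same strings that go into the configure.args vector above.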

Conclusions.

Installing R packages can sometimes be quite tricky, and require special knowledge about underlying tools and libraries. It's rarely the case that a package cannot be installed at all.

Here are some general guidelines, based on our experience with this problem:

  • Read the error messages carefully. Sometimes they tell us directly what is missing, what is expected, or what to try next.
  • Ask a colleague. At PIK, many R packages are in common use among scientists. You may find that a colleague has experience with the problem you're having.
  • Read the package documentation. Though not always useful, it may give you a hint about configuration options, without having to dig into the sources.
  • Dig into the sources. Have a look at what options the configure script in the source code accepts. These will often give you a hint about what to try, especially if you have an explicit error.
  • Ask IT services (cluster-support@pik-potsdam.de). We have some experience with a few problematic packages. However, given the huge range of packages available for R, and not being professional R programmers ourselves, it may take us a long time to investigate a particular error message. We can certainly advise on which modules are available, and can install third-party libraries on request.

Document converted to ReStructured Text (.rst) with pandoc -f markdown -t rst README.md > README.rst
