Introduction

This is a basic introduction to R, showcasing features and idiosyncrasies that I’ve needed/stumbled upon while automatize some matrix computations. For completeness, I’ve included the resulting script at the end.

Installation

Should be packaged by your distribution; e.g.:

(root)# pacman -S r
...

R, Rscript

There are two main ways to start a R session: either use the R binary, or Rscript. The former is a regular top-level, while the latter helps with running standalone scripts. Examples are provided below and in the next section.

As a side note, those two functions might come in handy to terminate a script/session:

stop(), which terminates a script with an error message, and returning to the parent shell with a non-zero error code:
```
$ echo 'stop("ko"); print("hello");' | Rscript /dev/stdin
Error: ko
Execution halted
$ echo $?
1
```

q()/quit(), which terminates a script, with a zero exit code:

$ echo 'q(); print("hello");' | Rscript /dev/stdin
$ echo $?
0

Note: The function print() does essentially what you’d expect it to do. It will be demonstrated later.

Documentation

There’s >3k pages reference manual (.pdf). Fortunately, it can be queried easily from the CLI: inline documentation for function is available via help(function) within a R prompt. This can be scripted:

$ echo "help(help)" | R -q --no-save < /dev/stdin | sed $'s/[^[:print:]\t]//g'
> help(help)
help                   package:utils                   R Documentation

_D_o_c_u_m_e_n_t_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     ‘help’ is the primary interface to the help systems.

_U_s_a_g_e:

     help(topic, package = NULL, lib.loc = NULL,
          verbose = getOption("verbose"),
          try.all.packages = getOption("help.try.all.packages"),
          help_type = getOption("help_type"))

_A_r_g_u_m_e_n_t_s:

   topic: usually, a name or character string specifying the topic for
          which help is sought.  A character string (enclosed in
...

Or better, with Rscript:

% echo "help(help)" | Rscript /dev/stdin | sed $'s/[^[:print:]\t]//g'
...

Comments

# Note the single comment-form, as in sh(1).

# Multi-line comments are simply a sequence
# of single line comment.

Variables, strings, complex numbers

There are (at least) two different variables assignment operators, =, <- and ->. There are subtle differences; <- seems to be the most idiomatic.

The four following assignments forms are equivalent; note how the last one could be convenient in a “meta-programming” setting; observe the various ways to access/print a variable’s value:

x <- 42
y = 42
42 -> z
assign("t", 42)

x
print(y)
cat(z, ",", t, "\n")

[1] 42
[1] 42
42 , 42

Strings can be represented either with simple or double quotes; single quotes are naturally escaped within a double-quoted strings, and vice-versa. Double-quoted strings are the “default”:

x <- "'hello'"
y = '"world"'

print(x)
print(y)

[1] "'hello'"
[1] "\"world\""

Let’s conclude this section with some elementary complex numbers manipulation, observe how an isolated i is interpreted as a variable:

1i
Im(2+6i)
Re(2+6i)
Mod(4i)
1+i

[[1] 0+1i
[1] 6
[1] 2
[1] 4
Error: object 'i' not found
Execution halted

Vectors, matrices

Assuming a linear memory model (say, C’s arrays), how should one go about representing matrices?

Well, it depends.

Conceptually, a list of lists would do. But there are other options. A common one is to equip a simple list with some dimensional meta-data, telling us where to break either columns or rows. For example [1 2 3 4 5 6] (this informal list notation with square brackets in not valid R) could be interpreted as a 2×3 matrix:

1 2 3
4 5 6

Or as following 3x2 matrix

1 2
3 4
5 6

You may think that a list of lists is “simpler”, but there are performance advantages to keeping it flat, especially as your “matrices” grow in size, or dimension (tensor).

That’s to say, for a software intended for scientific computation, the flat list approach is likely to be favored, and this is indeed the idiomatic way to implement matrices in R.

Such simple “non-recursive” lists are called “atomic vectors” in R. A key characteristic, besides the fact that they their elements cannot be atomic vectors, is that all their elements must be have the same type (it’s likely that they map to some C array): you can’t have strings mixed with complex numbers for example.

They are created with the c() primitive.

For example c(c(1, 2), c(3, 4)) is the flat list [1 2 3 4], not a list of lists: there’s some automatic flattening going on. Here’s some more automagic: c(c(1, 2), c("a", "b")) implicitly coerces the integers to strings: this becomes the flat list ["1" "2" "3" "4"]

And as usual, automagic stuff is erratic, so be careful:

> xs <- c(1, 2, 3, 4)
> xs[1]
[1] 1
> class(xs[1])
[1] "numeric"
> xs[2] <- 2i
> class(xs[1])
[1] "complex"
> xs
[1] 1+0i 0+2i 3+0i 4+0i
> xs[1]
[1] 1+0i

ys <- c("1", "2", "3")
> class(ys[1])
[1] "character"
> ys
[1] "1" "2" "3"
> ys[1] <- 1
> class(ys[1])
[1] "character"
> ys[1] <- 1i
> ys
[1] "0+1i" "2"    "3"
> class(ys[1])
[1] "character"

Note: This previous examples also demonstrate:

The use of class() to determine the type of a variable;
The use of [n] to access individual elements of a vector;
More subtle, but the fact that vector indexing starts at 1, not at 0.

Let’s get back to this idea of implementing matrices, and see how we can transform a flat list into a matrix. The dimensional meta-data can be accessed and set via the dim() function. The matrix() function, besides wrapping a call to dim(), allows to re-arrange the input vector so as to be able to interpret the dimension not just as a column information (default), but as a row:

m <- c(1:6)
m
dim(m) <- c(2, 3)
m
dim(m) <- c(3, 2)
m
m <- matrix(c(1:6), 2, 3, byrow = TRUE)
m
dim(m)
m <- matrix(c(1:6), nrow=2, ncol=3, byrow = FALSE)
m
dim(m)

[1] 1 2 3 4 5 6
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[1] 2 3
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[1] 2 3

Note: Mind the use of the colon : to generate list of numbers, and the fact that we can used named parameters without naming name, assuming we keep them in the correct order.

A variety of matrix operations are available. Here’s a small example demonstrating the transpose (t()), complex conjugation (Conj(); combining both, we have the dagger operator then), and matrix multiplication (infix operator %*%)

psi     <-  matrix(c(1+1i, 2+1i, 3+1i, 4+1i), 1, 4)
psistar <-  Conj(t(psi))
psistar %*% psi

     [,1] [,2]  [,3]  [,4]
[1,] 2+0i 3-1i  4-2i  5-3i
[2,] 3+1i 5+0i  7-1i  9-2i
[3,] 4+2i 7+1i 10+0i 13-1i
[4,] 5+3i 9+2i 13+1i 17+0i

CLI arguments

The full command line corresponding to the call to the R executable is available as a vector of strings via commandArgs():

(shell)# R -q --no-save
> commandArgs()
[1] "/usr/lib64/R/bin/exec/R" "-q"
[3] "--no-save"
> is.vector(commandArgs()); class(commandArgs())
[1] TRUE
[1] "character"

Parameters for a R script can be distinguished from parameters to the R executable by the use of the --args flags on one hand, and of trailingOnly = TRUE parameter to commandArgs() on the other:

(shell)# R -q --no-save --args foo bar baz --save
> commandArgs()
[1] "/usr/lib64/R/bin/exec/R" "-q"
[3] "--no-save"               "--args"
[5] "foo"                     "bar"
[7] "baz"                     "--save"
> commandArgs(trailingOnly = TRUE)
[1] "foo"    "bar"    "baz"    "--save"
> ^D

Note that --save wasn’t interpreted by R. Rscript’s arguments are conveniently wrapped with a --args (remember, this is just a frontend to the R executable):

(shell)# echo 'commandArgs()' | Rscript /dev/stdin foo bar baz --no-save
[1] "/usr/lib64/R/bin/exec/R" "--no-echo"
[3] "--no-restore"            "--file=/dev/stdin"
[5] "--args"                  "foo"
[7] "bar"                     "baz"
[9] "--no-save"
(shell)# echo 'commandArgs(trailingOnly = TRUE)' | Rscript /dev/stdin foo bar baz --no-save
[1] "foo"       "bar"       "baz"       "--no-save"

Functions/Procedures

You’ve got a “regular” function(arg0, arg1, ...) { ... return(...) } syntax to create anonymous functions; coupled with assignment to actually name the function. Here’s a recursive factorial:

fact <- function(n) { if (n <= 1) return(1) else return(n*fact(n-1)) }
fact(5)

[1] 120

Note: The use of parentheses after return is mandatory. The return itself is optional.

Iteration, conditions

We’ve just saw recursivity, which provides us with elementary iteration. When acting on vectors/lists, one can rely on the various *apply() implementation:

apply(), which is a fold acting on either a matrix’s columns rows, or individual elements (“margin”)

Usage:

  apply(X, MARGIN, FUN, ..., simplify = TRUE)

Arguments:

     X: an array, including a matrix.

MARGIN: a vector giving the subscripts which the function will be
        applied over.  E.g., for a matrix ‘1’ indicates rows, ‘2’
        indicates columns, ‘c(1, 2)’ indicates rows and columns.
        Where ‘X’ has named dimnames, it can be a character vector
        selecting dimension names.

   FUN: the function to be applied: see ‘Details’.  In the case of
        functions like ‘+’, ‘%*%’, etc., the function name must be
        backquoted or quoted.

   ...: optional arguments to ‘FUN’.

simplify: a logical indicating whether results should be simplified if
        possible.

For example:

m <- matrix(c(1:8), 2, 4)
> m
   [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
> apply(m, 1, function(x) { cat("row", x, "\n") })
row 1 3 5 7
row 2 4 6 8
NULL
> apply(m, 2, function(x) { cat("column", x, "\n") })
column 1 2
column 3 4
column 5 6
column 7 8
NULL
> apply(m, c(1, 2), function(x) { cat("row/column", x, "\n") })
row/column 1
row/column 2
row/column 3
row/column 4
row/column 5
row/column 6
row/column 7
row/column 8
NULL
> apply(m, 1, function(x) { return(sum(x)) })
[1] 16 20
> apply(m, 2, function(x) { return(sum(x)) })
[1]  3  7 11 15

lapply() or sapply()/vapply(): “regular” apply functions acting on either vectors or lists. The former returns a list (“list” apply), the two others return a vector. There are some further subtlety that can hardly be covered more thoroughly here than they are in the manual.
tapply() is yet another apply-like function, but this one allows to work on ragged-arrays (non-rectangular). Again, you may want to refer to the man page.

Besides, you’ve got all your basic loops but the for(;;):

# foreach
for (i in 5:1) { cat(i, "\n") }
5
4
3
2
1
# While loop; note the absence of ++/-- operators
> i <- 5; while (i > 0) { cat(i, "\n"); i <- i-1 }
5
4
3
2
1

Regular conditions work, with optional braces for single line, but there are some unusual syntactical subtleties:

# syntax error
if (x == 0)
	0
else
	x

# syntax error
if (x == 0) 0
else x

# syntax error
if (x == 0) {0}
else        {x}

# OK
if (x == 0)
0 else x

# OK
if (x == 0) 0 else x

# OK
if (x == 0) {0
}else       {x}

Installing packages

By default, packages are installed to /usr/lib/R/library, which shouldn’t be writeable by an ordinary user.

More precisely, packages are looked for in the locations registered in/via .libPaths(), which should contain /usr/lib/R/library by default.

There are two ways to “import” a package, require() and library(). The former returns a boolean, allowing us to check whether the package has been installed.

The following excerpt installs to a temporary location a package, if necessary, and loads it.

tmp <- "/tmp/R/lib"
.libPaths(tmp)

# mkdir(1) -p $tmp
# NOTE: the = signs below are used not assignments, but the
# syntax to set named parameters.
dir.create(tmp, recursive = TRUE, mode = "0755")

loadpkg <- function(p) {
	# By default, require(foo) and require("foo") are equivalent:
	# meaning, require(p) will try to install a package named "p", and
	# not the package referenced by the value of the variable p.
	#
	# The character.only parameter allows to change that.
	# The same behavior applies for library()
	if (!require(p, character.only = TRUE)) {
		install.packages(p, repos='http://cran.us.r-project.org', lib=tmp)

		# Now that the package is installed, we can import it, and as we
		# have no use for "require"'s returned value, we can rely on "library"
		# instead.
		library(p, character.only = TRUE)
	}
}
loadpkg("xtable")

$\LaTeX$ matrix export: xtable, bashful

It’s possible to export an R matrix to $\LaTeX$ via the xtable R package. By default, the package is more intended to export general tables: a few options are required to output a bare matrix.

It’s possible then to have a R script printing $\LaTeX$ to a file, and \input{} this file from your .tex. Running the R script can be performed either in a Makefile, or directly in the .tex via the bashful for example. The former seems to be preferable as soon as you’ve got more than one or two results, as I don’t think there’s some caching mechanism built-in bashful, and running extra-processes slows down the $\LaTeX$ compilation.

I cover in particular complex ($\mathbb{C}$) matrix export in a separate article; let me inline the relevant code sample:

library("xtable")

xtable <- function(x, ...) {
	if (class(x[[1]]) == "complex") {
		z <- sapply(x, function(y) {
			if (y == 0)     return(as.character(0))
			if (Im(y) == 0) return(as.character(Re(y)))

			t <- ""
			if (Re(y) != 0) t <- as.character(Re(y))

			if (Im(y) == 1) {
				if (Re(y) == 0) t <- "i"
				else            t <- paste(t, "+i")
			} else if (Im(y) == -1)
				t <- paste(t, "-i")
			else {
				if (Re(y) == 0)     t <- paste(Im(y), "i")
				else if (Im(y) > 0) t <- paste(t, "+", Im(y), "i")
				else                t <- paste(t, Im(y), "i")
			}
			return(t)
		})
		dim(z) <- dim(x)
		xtable::xtable(z, ...)
	} else
		xtable::xtable(x, ...)
}

sigma_y  <- matrix(c(0, 1i, -1i, 0), 2, 2)

m <- xtable(
	sigma_y,
	align=rep("",ncol(sigma_y)+1))

digits(m) <- xdigits(m)

print(m,
	floating = FALSE, tabular.environment = "pmatrix",
	hline.after=NULL, include.rownames=FALSE, include.colnames=FALSE
)

% latex table generated in R 4.3.0 by xtable 1.8-4 package
% Thu Jun 15 01:48:46 2023
\begin{pmatrix}{}
  0 &  -i \\
   i & 0 \\
  \end{pmatrix}

Script example

#!/bin/Rscript

# Quirky, but does the job.

# Computes expectation values for a composite
# system built from two quantum spins.
#
# Used with either with 1 or 5 arguments:
#	<operator> [wave-function (uu, ud, du, dd)]
#
# Each argument is parsed and evaluated, so you can
# use "tau_x %*% sigma_x" as an operator for example.
#
# If only an operator is provided (1 arg), its 4x4 matrix form
# is displayed as LaTeX on stdout.
#
# Otherwise, evaluates the expectation value for the operator
# for a system in a state described by the wave-function.

tmp <- "/tmp/R/lib"
.libPaths(tmp)
dir.create(tmp, recursive = TRUE, mode = "0755")

loadpkg <- function(p) {
	if (!require(p, character.only = TRUE)) {
		install.packages(p, repos='http://cran.us.r-project.org', lib=tmp)
		library(p, character.only = TRUE)
	}
}

# 2x2 identity and Pauli matrices
id2 = matrix(c(1,  0,  0,  1), 2, 2)
pz  = matrix(c(1,  0,  0, -1), 2, 2)
px  = matrix(c(0,  1,  1,  0), 2, 2)
py  = matrix(c(0, 1i, -1i, 0), 2, 2)

# Ugprade subsystem spin component operators to the composite system
sigma_z = kronecker(pz, id2)
tau_z   = kronecker(id2, pz)

sigma_x = kronecker(px, id2)
tau_x   = kronecker(id2, px)

sigma_y = kronecker(py, id2)
tau_y   = kronecker(id2, py)

# Expectation value computation
avg <- function(L, psi) {
	return((Conj(t(psi)) %*% L %*% psi)[1])
}

# Evaluate CLI arguments (e.g. "interpret" "tau_z %*% sigma_z"
# as the corresponding observable)
#
# Note that we need a list (lapply), as some of the arguments will be
# vectors already.
args <- lapply(
	commandArgs(trailingOnly = TRUE),
	function(x) { return(eval(parse(text=x))) }
)

# LaTeX export
loadpkg("xtable")

# See https://tales.mbivert.com/on-exporting-r-complex-matrix-latex/
xtable <- function(x, ...) {
	if (class(x[[1]]) == "complex") {
		z <- sapply(x, function(y) {
			if (y == 0)     return(as.character(0))
			if (Im(y) == 0) return(as.character(Re(y)))

			t <- ""
			if (Re(y) != 0) t <- as.character(Re(y))

			if (Im(y) == 1) {
				if (Re(y) == 0) t <- "i"
				else            t <- paste(t, "+i")
			} else if (Im(y) == -1)
				t <- paste(t, "-i")
			else {
				if (Re(y) == 0)     t <- paste(Im(y), "i")
				else if (Im(y) > 0) t <- paste(t, "+", Im(y), "i")
				else                t <- paste(t, Im(y), "i")
			}
			return(t)
		})
		dim(z) <- dim(x)
		xtable::xtable(z, ...)
	} else
		xtable::xtable(x, ...)
}

if (length(args) == 1) {
	x <- xtable(args[[1]], align=rep("",ncol(args[[1]])+1))

	# 1.00 -> 1, in our peculiar case
	digits(x) <- xdigits(x)
	print(x,
		floating = FALSE, tabular.environment = "pmatrix",
		hline.after=NULL, include.rownames=FALSE, include.colnames=FALSE
	)
	q()
} else if (length(args) != 5) stop("Incomplete wave function")

x <- avg(args[[1]], unlist(args[2:5]))

# avoids some 0+0i; refinable
if (x == 0) {cat(0, "\n")} else {cat(x, "\n")}

Comments

By email, at mathieu.bivert chez:

On R's Language: Introduction

Table Of Contents

Introduction

Installation

R, Rscript

Documentation

Comments

Variables, strings, complex numbers

Vectors, matrices

CLI arguments

Functions/Procedures

Iteration, conditions

Installing packages

\(\LaTeX\) matrix export: xtable, bashful

Script example

Comments