Introduction
This is a basic introduction to R, showcasing features and idiosyncrasies that I’ve needed/stumbled upon while automatize some matrix computations. For completeness, I’ve included the resulting script at the end.
Installation
Should be packaged by your distribution; e.g.:
(root)# pacman -S r
...
R, Rscript
There are two main ways to start a R session: either use the R
binary,
or Rscript
. The former is a regular top-level, while the latter helps with
running standalone scripts. Examples are provided below and in the next section.
As a side note, those two functions might come in handy to terminate a script/session:
stop()
, which terminates a script with an error message, and returning to the parent shell with a non-zero error code:$ echo 'stop("ko"); print("hello");' | Rscript /dev/stdin Error: ko Execution halted $ echo $? 1
q()/quit()
, which terminates a script, with a zero exit code:$ echo 'q(); print("hello");' | Rscript /dev/stdin $ echo $? 0
Note: The function print()
does essentially what you’d expect
it to do. It will be demonstrated later.
Documentation
There’s >3k pages reference manual (.pdf). Fortunately, it can
be queried easily from the CLI: inline documentation for function
is
available via help(function)
within a R prompt. This can be scripted:
$ echo "help(help)" | R -q --no-save < /dev/stdin | sed $'s/[^[:print:]\t]//g'
> help(help)
help package:utils R Documentation
_D_o_c_u_m_e_n_t_a_t_i_o_n
_D_e_s_c_r_i_p_t_i_o_n:
‘help’ is the primary interface to the help systems.
_U_s_a_g_e:
help(topic, package = NULL, lib.loc = NULL,
verbose = getOption("verbose"),
try.all.packages = getOption("help.try.all.packages"),
help_type = getOption("help_type"))
_A_r_g_u_m_e_n_t_s:
topic: usually, a name or character string specifying the topic for
which help is sought. A character string (enclosed in
...
Rscript
:
% echo "help(help)" | Rscript /dev/stdin | sed $'s/[^[:print:]\t]//g'
...
Comments
# Note the single comment-form, as in sh(1).
# Multi-line comments are simply a sequence
# of single line comment.
Variables, strings, complex numbers
There are (at least) two different variables assignment
operators, =
, <-
and ->
. There are subtle differences;
<-
seems to be the most idiomatic.
The four following assignments forms are equivalent; note how the last one could be convenient in a “meta-programming” setting; observe the various ways to access/print a variable’s value:
x <- 42
y = 42
42 -> z
assign("t", 42)
x
print(y)
cat(z, ",", t, "\n")
[1] 42
[1] 42
42 , 42
Strings can be represented either with simple or double quotes; single quotes are naturally escaped within a double-quoted strings, and vice-versa. Double-quoted strings are the “default”:
x <- "'hello'"
y = '"world"'
print(x)
print(y)
[1] "'hello'"
[1] "\"world\""
Let’s conclude this section with some elementary complex numbers
manipulation, observe how an isolated i
is interpreted as a
variable:
1i
Im(2+6i)
Re(2+6i)
Mod(4i)
1+i
[[1] 0+1i
[1] 6
[1] 2
[1] 4
Error: object 'i' not found
Execution halted
Vectors, matrices
Assuming a linear memory model (say, C’s arrays), how should one go about representing matrices?
Well, it depends.
Conceptually, a list of lists would do. But there are other
options. A common one is to equip a simple list with some
dimensional meta-data, telling us where to break either columns or rows.
For example [1 2 3 4 5 6]
(this informal list notation with square
brackets in not valid R) could be interpreted as a 2×3
matrix:
1 2 3
4 5 6
Or as following 3x2 matrix
1 2
3 4
5 6
You may think that a list of lists is “simpler”, but there are performance advantages to keeping it flat, especially as your “matrices” grow in size, or dimension (tensor).
That’s to say, for a software intended for scientific computation, the flat list approach is likely to be favored, and this is indeed the idiomatic way to implement matrices in R.
Such simple “non-recursive” lists are called “atomic vectors” in R. A key characteristic, besides the fact that they their elements cannot be atomic vectors, is that all their elements must be have the same type (it’s likely that they map to some C array): you can’t have strings mixed with complex numbers for example.
They are created with the c()
primitive.
For example c(c(1, 2), c(3, 4))
is the flat list [1 2 3 4]
, not
a list of lists: there’s some automatic flattening going on. Here’s some
more automagic: c(c(1, 2), c("a", "b"))
implicitly coerces the integers
to strings: this becomes the flat list ["1" "2" "3" "4"]
And as usual, automagic stuff is erratic, so be careful:
> xs <- c(1, 2, 3, 4)
> xs[1]
[1] 1
> class(xs[1])
[1] "numeric"
> xs[2] <- 2i
> class(xs[1])
[1] "complex"
> xs
[1] 1+0i 0+2i 3+0i 4+0i
> xs[1]
[1] 1+0i
ys <- c("1", "2", "3")
> class(ys[1])
[1] "character"
> ys
[1] "1" "2" "3"
> ys[1] <- 1
> class(ys[1])
[1] "character"
> ys[1] <- 1i
> ys
[1] "0+1i" "2" "3"
> class(ys[1])
[1] "character"
Note: This previous examples also demonstrate:
- The use of
class()
to determine the type of a variable; - The use of
[n]
to access individual elements of a vector; - More subtle, but the fact that vector indexing starts at 1, not at 0.
Let’s get back to this idea of implementing matrices, and see how we
can transform a flat list into a matrix. The dimensional meta-data can
be accessed and set via the dim()
function. The matrix()
function,
besides wrapping a call to dim()
, allows to re-arrange the input
vector so as to be able to interpret the dimension not just as a column
information (default), but as a row:
m <- c(1:6)
m
dim(m) <- c(2, 3)
m
dim(m) <- c(3, 2)
m
m <- matrix(c(1:6), 2, 3, byrow = TRUE)
m
dim(m)
m <- matrix(c(1:6), nrow=2, ncol=3, byrow = FALSE)
m
dim(m)
[1] 1 2 3 4 5 6
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[1] 2 3
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[1] 2 3
Note: Mind the use of the colon :
to generate
list of numbers, and the fact that we can used named parameters
without naming name, assuming we keep them in the correct order.
A variety of matrix operations are available. Here’s a small example
demonstrating the transpose (t()
), complex conjugation (Conj()
;
combining both, we have the dagger operator then), and matrix
multiplication (infix operator %*%
)
psi <- matrix(c(1+1i, 2+1i, 3+1i, 4+1i), 1, 4)
psistar <- Conj(t(psi))
psistar %*% psi
[,1] [,2] [,3] [,4]
[1,] 2+0i 3-1i 4-2i 5-3i
[2,] 3+1i 5+0i 7-1i 9-2i
[3,] 4+2i 7+1i 10+0i 13-1i
[4,] 5+3i 9+2i 13+1i 17+0i
CLI arguments
The full command line corresponding to the call to the R
executable
is available as a vector of strings via commandArgs()
:
(shell)# R -q --no-save
> commandArgs()
[1] "/usr/lib64/R/bin/exec/R" "-q"
[3] "--no-save"
> is.vector(commandArgs()); class(commandArgs())
[1] TRUE
[1] "character"
Parameters for a R script can be distinguished from parameters
to the R
executable by the use of the --args
flags on one
hand, and of trailingOnly = TRUE
parameter to commandArgs()
on the other:
(shell)# R -q --no-save --args foo bar baz --save
> commandArgs()
[1] "/usr/lib64/R/bin/exec/R" "-q"
[3] "--no-save" "--args"
[5] "foo" "bar"
[7] "baz" "--save"
> commandArgs(trailingOnly = TRUE)
[1] "foo" "bar" "baz" "--save"
> ^D
Note that --save
wasn’t interpreted by R
. Rscript
’s arguments
are conveniently wrapped with a --args
(remember, this is just a frontend
to the R
executable):
(shell)# echo 'commandArgs()' | Rscript /dev/stdin foo bar baz --no-save
[1] "/usr/lib64/R/bin/exec/R" "--no-echo"
[3] "--no-restore" "--file=/dev/stdin"
[5] "--args" "foo"
[7] "bar" "baz"
[9] "--no-save"
(shell)# echo 'commandArgs(trailingOnly = TRUE)' | Rscript /dev/stdin foo bar baz --no-save
[1] "foo" "bar" "baz" "--no-save"
Functions/Procedures
You’ve got a “regular” function(arg0, arg1, ...) { ... return(...) }
syntax to create anonymous functions; coupled with assignment to
actually name the function. Here’s a recursive factorial:
fact <- function(n) { if (n <= 1) return(1) else return(n*fact(n-1)) }
fact(5)
[1] 120
Note: The use of parentheses after return
is mandatory.
The return
itself is optional.
Iteration, conditions
We’ve just saw recursivity, which provides us with elementary
iteration. When acting on vectors/lists, one can rely on the various
*apply()
implementation:
apply()
, which is a fold acting on either a matrix’s columns rows, or individual elements (“margin”)For example:Usage: apply(X, MARGIN, FUN, ..., simplify = TRUE) Arguments: X: an array, including a matrix. MARGIN: a vector giving the subscripts which the function will be applied over. E.g., for a matrix ‘1’ indicates rows, ‘2’ indicates columns, ‘c(1, 2)’ indicates rows and columns. Where ‘X’ has named dimnames, it can be a character vector selecting dimension names. FUN: the function to be applied: see ‘Details’. In the case of functions like ‘+’, ‘%*%’, etc., the function name must be backquoted or quoted. ...: optional arguments to ‘FUN’. simplify: a logical indicating whether results should be simplified if possible.
m <- matrix(c(1:8), 2, 4) > m [,1] [,2] [,3] [,4] [1,] 1 3 5 7 [2,] 2 4 6 8 > apply(m, 1, function(x) { cat("row", x, "\n") }) row 1 3 5 7 row 2 4 6 8 NULL > apply(m, 2, function(x) { cat("column", x, "\n") }) column 1 2 column 3 4 column 5 6 column 7 8 NULL > apply(m, c(1, 2), function(x) { cat("row/column", x, "\n") }) row/column 1 row/column 2 row/column 3 row/column 4 row/column 5 row/column 6 row/column 7 row/column 8 NULL > apply(m, 1, function(x) { return(sum(x)) }) [1] 16 20 > apply(m, 2, function(x) { return(sum(x)) }) [1] 3 7 11 15
lapply()
orsapply()/vapply()
: “regular” apply functions acting on either vectors or lists. The former returns a list (“list” apply), the two others return a vector. There are some further subtlety that can hardly be covered more thoroughly here than they are in the manual.tapply()
is yet another apply-like function, but this one allows to work on ragged-arrays (non-rectangular). Again, you may want to refer to the man page.
Besides, you’ve got all your basic loops but the for(;;)
:
# foreach
for (i in 5:1) { cat(i, "\n") }
5
4
3
2
1
# While loop; note the absence of ++/-- operators
> i <- 5; while (i > 0) { cat(i, "\n"); i <- i-1 }
5
4
3
2
1
Regular conditions work, with optional braces for single line, but there are some unusual syntactical subtleties:
# syntax error
if (x == 0)
0
else
x
# syntax error
if (x == 0) 0
else x
# syntax error
if (x == 0) {0}
else {x}
# OK
if (x == 0)
0 else x
# OK
if (x == 0) 0 else x
# OK
if (x == 0) {0
}else {x}
Installing packages
By default, packages are installed to /usr/lib/R/library
,
which shouldn’t be writeable by an ordinary user.
More precisely, packages are looked for in the locations
registered in/via .libPaths()
, which should contain
/usr/lib/R/library
by default.
There are two ways to “import” a package, require()
and library()
.
The former returns a boolean, allowing us to check whether the package
has been installed.
The following excerpt installs to a temporary location a package, if necessary, and loads it.
tmp <- "/tmp/R/lib"
.libPaths(tmp)
# mkdir(1) -p $tmp
# NOTE: the = signs below are used not assignments, but the
# syntax to set named parameters.
dir.create(tmp, recursive = TRUE, mode = "0755")
loadpkg <- function(p) {
# By default, require(foo) and require("foo") are equivalent:
# meaning, require(p) will try to install a package named "p", and
# not the package referenced by the value of the variable p.
#
# The character.only parameter allows to change that.
# The same behavior applies for library()
if (!require(p, character.only = TRUE)) {
install.packages(p, repos='http://cran.us.r-project.org', lib=tmp)
# Now that the package is installed, we can import it, and as we
# have no use for "require"'s returned value, we can rely on "library"
# instead.
library(p, character.only = TRUE)
}
}
loadpkg("xtable")
\(\LaTeX\) matrix export: xtable, bashful
It’s possible to export an R matrix to \(\LaTeX\) via the xtable
R
package. By default, the package is more intended to export general
tables: a few options are required to output a bare matrix.
It’s possible then to have a R script printing \(\LaTeX\) to a file, and \input{}
this file from your .tex
. Running the R script can be performed either
in a Makefile
, or directly in the .tex
via the bashful
for example. The former seems to be preferable as soon as you’ve got more than
one or two results, as I don’t think there’s some caching mechanism built-in
bashful
, and running extra-processes slows down the \(\LaTeX\) compilation.
I cover in particular complex (\(\mathbb{C}\)) matrix export in a separate article; let me inline the relevant code sample:
library("xtable")
xtable <- function(x, ...) {
if (class(x[[1]]) == "complex") {
z <- sapply(x, function(y) {
if (y == 0) return(as.character(0))
if (Im(y) == 0) return(as.character(Re(y)))
t <- ""
if (Re(y) != 0) t <- as.character(Re(y))
if (Im(y) == 1) {
if (Re(y) == 0) t <- "i"
else t <- paste(t, "+i")
} else if (Im(y) == -1)
t <- paste(t, "-i")
else {
if (Re(y) == 0) t <- paste(Im(y), "i")
else if (Im(y) > 0) t <- paste(t, "+", Im(y), "i")
else t <- paste(t, Im(y), "i")
}
return(t)
})
dim(z) <- dim(x)
xtable::xtable(z, ...)
} else
xtable::xtable(x, ...)
}
sigma_y <- matrix(c(0, 1i, -1i, 0), 2, 2)
m <- xtable(
sigma_y,
align=rep("",ncol(sigma_y)+1))
digits(m) <- xdigits(m)
print(m,
floating = FALSE, tabular.environment = "pmatrix",
hline.after=NULL, include.rownames=FALSE, include.colnames=FALSE
)
% latex table generated in R 4.3.0 by xtable 1.8-4 package
% Thu Jun 15 01:48:46 2023
\begin{pmatrix}{}
0 & -i \\
i & 0 \\
\end{pmatrix}
Script example
#!/bin/Rscript
# Quirky, but does the job.
# Computes expectation values for a composite
# system built from two quantum spins.
#
# Used with either with 1 or 5 arguments:
# <operator> [wave-function (uu, ud, du, dd)]
#
# Each argument is parsed and evaluated, so you can
# use "tau_x %*% sigma_x" as an operator for example.
#
# If only an operator is provided (1 arg), its 4x4 matrix form
# is displayed as LaTeX on stdout.
#
# Otherwise, evaluates the expectation value for the operator
# for a system in a state described by the wave-function.
tmp <- "/tmp/R/lib"
.libPaths(tmp)
dir.create(tmp, recursive = TRUE, mode = "0755")
loadpkg <- function(p) {
if (!require(p, character.only = TRUE)) {
install.packages(p, repos='http://cran.us.r-project.org', lib=tmp)
library(p, character.only = TRUE)
}
}
# 2x2 identity and Pauli matrices
id2 = matrix(c(1, 0, 0, 1), 2, 2)
pz = matrix(c(1, 0, 0, -1), 2, 2)
px = matrix(c(0, 1, 1, 0), 2, 2)
py = matrix(c(0, 1i, -1i, 0), 2, 2)
# Ugprade subsystem spin component operators to the composite system
sigma_z = kronecker(pz, id2)
tau_z = kronecker(id2, pz)
sigma_x = kronecker(px, id2)
tau_x = kronecker(id2, px)
sigma_y = kronecker(py, id2)
tau_y = kronecker(id2, py)
# Expectation value computation
avg <- function(L, psi) {
return((Conj(t(psi)) %*% L %*% psi)[1])
}
# Evaluate CLI arguments (e.g. "interpret" "tau_z %*% sigma_z"
# as the corresponding observable)
#
# Note that we need a list (lapply), as some of the arguments will be
# vectors already.
args <- lapply(
commandArgs(trailingOnly = TRUE),
function(x) { return(eval(parse(text=x))) }
)
# LaTeX export
loadpkg("xtable")
# See https://tales.mbivert.com/on-exporting-r-complex-matrix-latex/
xtable <- function(x, ...) {
if (class(x[[1]]) == "complex") {
z <- sapply(x, function(y) {
if (y == 0) return(as.character(0))
if (Im(y) == 0) return(as.character(Re(y)))
t <- ""
if (Re(y) != 0) t <- as.character(Re(y))
if (Im(y) == 1) {
if (Re(y) == 0) t <- "i"
else t <- paste(t, "+i")
} else if (Im(y) == -1)
t <- paste(t, "-i")
else {
if (Re(y) == 0) t <- paste(Im(y), "i")
else if (Im(y) > 0) t <- paste(t, "+", Im(y), "i")
else t <- paste(t, Im(y), "i")
}
return(t)
})
dim(z) <- dim(x)
xtable::xtable(z, ...)
} else
xtable::xtable(x, ...)
}
if (length(args) == 1) {
x <- xtable(args[[1]], align=rep("",ncol(args[[1]])+1))
# 1.00 -> 1, in our peculiar case
digits(x) <- xdigits(x)
print(x,
floating = FALSE, tabular.environment = "pmatrix",
hline.after=NULL, include.rownames=FALSE, include.colnames=FALSE
)
q()
} else if (length(args) != 5) stop("Incomplete wave function")
x <- avg(args[[1]], unlist(args[2:5]))
# avoids some 0+0i; refinable
if (x == 0) {cat(0, "\n")} else {cat(x, "\n")}
Comments
By email, at mathieu.bivert chez: