Linux Goodies

In Pursuit Of The Perfect O/S



A Review of the Free R Mathematics Language

R, the Free Statistics Language

R mesh plot

The plot you see above is an example of a mesh plot produced by the matrix scripting language R.

R is a language created and used by mathematicians. It is an open source clone of the commercial language S. R is an object oriented language, and every declared function is an object. The object oriented nature makes some syntax seem peculiar, but that's because things are done by object functions instead of by language intrinsics, as in most other languages.

Object oriented languages can accomplish some things that are difficult or impossible in non object oriented languages. As an example, the object nature of R provides the ability to pass a function name to a function, and have that passed function executed within the called function.
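
As a minimal sketch of passing a function to a function, the example below hands both a built-in function and an anonymous function to a helper. The helper name apply_to_columns and the sample matrix are made up for illustration.

```r
# Hypothetical helper: applies whatever function it is given to each column.
apply_to_columns <- function(mat, fn) {
  result <- numeric(ncol(mat))
  for (j in seq_len(ncol(mat))) {
    result[j] <- fn(mat[, j])   # call the function that was passed in
  }
  result
}

m <- cbind(c(1, 2, 3), c(10, 20, 30))

# Pass the built-in mean function by name.
col_means <- apply_to_columns(m, mean)

# Pass an anonymous function defined on the spot.
col_ranges <- apply_to_columns(m, function(v) max(v) - min(v))
```

The helper never needs to know which function it received; it simply calls fn on each column.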

In Linux, R has a simple command line interface. The Windows version comes with a GUI. R has a nice history mechanism that lets the user scroll back through previous commands and modify them if desired. By default, the history carries over to the next invocation of R only if the workspace is saved on exit.

R is heavily populated with statistical functions, and also contains some signal processing functions such as filtering, interpolation, and regression. R programs can be run in batch mode, but when that is done there is no user interaction. In batch mode, input parameters must come from files, and output goes to a file. The user can create a file named .Rprofile in their home directory and use it to auto-load any R modules of their own that they often use. That makes R routines quickly accessible: start the language with just R, then execute the functions.
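
A .Rprofile along these lines might look like the sketch below. The file name mytools.R and its location are assumptions for illustration; substitute your own file of routines.

```r
# ~/.Rprofile: R runs this file at startup.
# The path below is hypothetical; point it at your own file of routines.
tools <- file.path(Sys.getenv("HOME"), "R", "mytools.R")

# Only load the file if it is actually there, so startup never fails.
if (file.exists(tools)) source(tools)
```

The file.exists check is a design choice for safety: a missing file then does nothing instead of producing an error every time R starts.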

R Syntax

R has the most convenient methodology for providing function calls with variable arguments. The argument list in a created function simply has the default values for optionally passed parameters defined in the function declaration statement. No logic needs to be created in the function to determine if parameters were passed or not. Most other math languages require the programmer to do some coding within a function to deal with parameters that may or may not be passed. The defaulting or parsing of passed parameters in R is done completely by the R system. The following function declaration illustrates the method.

scalemat <- function(mat, sf=2){
    m2 <- mat * sf
}

In the code example, the function scalemat will return a matrix scaled by a provided scale factor (sf). In the example, the sf parameter is defaulted to 2. Within the code the parameter sf can simply be used, and it will have either the default value, if not passed to the function, or the passed value. Notice also that no return statement is used. The value of the last expression evaluated in a function is automatically the one passed back to the calling routine.
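
To show the defaulting in action, here is a short sketch that calls a scalemat function three ways. The function is restated so the example runs on its own, and the sample matrix is made up.

```r
# Restated so this example is self-contained.
scalemat <- function(mat, sf = 2) {
  mat * sf   # the value of the last expression is what the function returns
}

m <- rbind(c(1, 2), c(3, 4))

doubled <- scalemat(m)           # sf not passed, so the default of 2 is used
tripled <- scalemat(m, sf = 3)   # sf passed by name
halved  <- scalemat(m, 0.5)      # sf passed by position
```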

R has a list construct that can be used to package multiple non-similar arguments under one name. The elements of the list can be named when created, and the list can be returned by a function. The individual elements of the list can be accessed by index or by name, if names were assigned. The following code snippet illustrates the use of a list. Variable x is assigned a list of 3 elements. Element a is a scalar, element b is an array, and element c is a string.

x <- list(a=10, b=c(10,11,12), c="label")
u <- x$a
v <- x$b
w <- x$c
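
As a further sketch, a function can return a list, and the caller can pick elements out by name or by position. The function name describe_vec and its fields are hypothetical.

```r
# Hypothetical function that returns several unlike values in one list.
describe_vec <- function(v) {
  list(n = length(v), total = sum(v), label = "summary")
}

s <- describe_vec(c(10, 11, 12))

by_name   <- s$total        # access by name
by_index  <- s[[2]]         # the same element, accessed by position
by_string <- s[["label"]]   # access by name held in a string
element_names <- names(s)   # the names assigned when the list was created
```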

Unlike MATLAB, R does not auto-load user defined modules just because they are referenced. Modules have to be loaded with a source command or listed in the .Rprofile file. I find it works best to package related routines into module libraries so that when an R module is loaded with the source command, all relevant routines are loaded at once.
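
The sketch below illustrates loading a module with the source command. So that it runs anywhere, it first writes a tiny stand-in module to a temporary file; a real module would live in a directory of your own, and the function names here are made up.

```r
# Write a small stand-in "module" holding two related routines.
module_file <- tempfile(fileext = ".R")
writeLines(c(
  "double_it <- function(x) x * 2",
  "add_one   <- function(x) x + 1"
), module_file)

# One source call defines every routine in the file.
source(module_file)

result <- add_one(double_it(5))
```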

R uses some interesting syntax that takes a bit of getting used to. Even the equal sign commonly used for assignment in other languages is different in R. A couple of example R equations are listed below:

x <- c(10,20,30,40)
y <- rbind(c(1,2),c(11,12),c(15,19))

As you can see, the <- operator is used, instead of the more common equal sign, to store values into variables. The first equation in the example stores an array of numbers into a variable named x. The c(...) operator is a function that creates the array. Notice the second equation. It creates a matrix of 3 rows and 2 columns. The c(...) operator makes arrays, and the rbind operator combines arrays and matrices as rows. There is also a cbind operator that combines arrays and matrices as columns.
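
A quick way to see the difference between rbind and cbind is to check the dimensions of the results. This sketch reuses the same three arrays for both.

```r
y <- rbind(c(1, 2), c(11, 12), c(15, 19))   # each array becomes a row
z <- cbind(c(1, 2), c(11, 12), c(15, 19))   # each array becomes a column

dim_y <- dim(y)   # rows, then columns
dim_z <- dim(z)

# Elements are addressed as [row, column].
from_y <- y[2, 1]   # first value of the second array
from_z <- z[1, 2]   # the same value, now in row 1, column 2
```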

Unlike MATLAB and Octave, R mathematical operations default to scalar (cell by cell) operations. Special operators are used to specify matrix operations. The following example shows a cell by cell multiply of matrix A by matrix B, followed by the same multiply done with the matrix multiply operator.

Scalar multiply:
C <- A * B

Matrix multiply:
C <- A %*% B
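
With small made-up matrices, the two operators give visibly different answers, as this sketch shows.

```r
A <- rbind(c(1, 2), c(3, 4))
B <- rbind(c(5, 6), c(7, 8))

# Cell by cell: each element of A times the matching element of B.
C_scalar <- A * B      # rows are (5, 12) and (21, 32)

# True matrix product: rows of A times columns of B.
C_matrix <- A %*% B    # rows are (19, 22) and (43, 50)
```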

For help, the user can type help(topic) for specific documented help topics, or help.search("subject") for a list of possible help topics pertinent to the supplied subject.

R comes with many function libraries, and even more can be obtained from the Comprehensive R Archive Network, known as CRAN. The CRAN website offers documentation, FAQs, and downloads of many contributed packages.

R Input/Output

R has a flexible, though different looking, collection of I/O routines. It was fairly easy, for example, to create a function that could examine ASCII files that consist of columns of numbers separated by some character, such as a tab, a comma, or a colon. The routine can determine the separator and with a single instruction read in the entire file as a matrix with the scan command, specifying the separator. It is likewise easy to read in an entire binary file as a matrix using the readBin command. Various data types can be read with the readBin command, and the user can specify if the data file to be read is little or big endian. This feature allows R to work with data that may have been created on a different computer platform.
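
The following is a rough sketch of that kind of reader, not the routine described above. It writes a small colon separated file to a temporary location, guesses the separator from the first line, and reads the whole file with one scan call. The candidate separator list and the variable names are assumptions.

```r
# Create a small sample file so the example is self-contained.
data_file <- tempfile(fileext = ".txt")
writeLines(c("1:2:3", "4:5:6"), data_file)

# Guess the separator: the first candidate that appears in the first line.
first_line <- readLines(data_file, n = 1)
candidates <- c("\t", ",", ":")
found <- candidates[sapply(candidates, function(s) grepl(s, first_line, fixed = TRUE))]
sep <- found[1]

# Count the columns in the first line so the values can be shaped into a matrix.
ncols <- length(strsplit(first_line, sep, fixed = TRUE)[[1]])

# scan returns the numbers row by row as one vector.
values <- scan(data_file, sep = sep, quiet = TRUE)
mat <- matrix(values, ncol = ncols, byrow = TRUE)
```

Because scan reads the file row by row, byrow = TRUE is needed for the matrix to match the layout of the file.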

When I was presented with the problem of analyzing some spreadsheet data, I was pleasantly surprised to find that R has a read.csv command that can read and create an intelligible list of data from comma separated files (CSV files). R can also make many spreadsheet style graphs such as bar charts and pie charts. An illustration can be seen on the Linux Survey page. If you check out the R graphs, take time to fill out the survey if you wish.
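
As a small sketch of read.csv, the example below writes a made-up three line CSV file to a temporary location and reads it back. The column names and values are invented for illustration.

```r
# Create a tiny CSV file with a header row.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("name,score,passed",
             "alpha,90,TRUE",
             "beta,72,FALSE"), csv_file)

sheet <- read.csv(csv_file)       # the header row supplies the column names

col_names  <- names(sheet)
mean_score <- mean(sheet$score)   # columns are accessed by name with $
n_rows     <- nrow(sheet)
```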

For output, R has the commonly used print command for writing to the screen, but with a twist. To output multiple values in a single print command, one must put the message together with the paste command first. Below is an example of how to print out a message containing text and values:

# How to print a composite message
print(paste("X = ",x,"Y = ",Y))

You might notice in the previous example that the # character signifies to R that the rest of the line is a comment only.
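
One detail worth knowing is that paste puts a space between its arguments by default, so literal trailing spaces in the text get doubled. The sketch below uses made-up values to show both forms.

```r
x <- 3
y <- 4.5

# Trailing spaces inside the strings plus the default separator give double spaces.
msg <- paste("X = ", x, "Y = ", y)

# Dropping the trailing spaces lets the default separator do the spacing.
tidy <- paste("X =", x, "Y =", y)

print(msg)
print(tidy)
```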

Writing to ASCII files is done with the write command, which has many optional arguments to control output format. To write out a binary file, one uses the writeBin command, which lets the user output integers, floating point, complex, and other data types. The writeBin command also lets the user specify whether the output is to be in little or big endian.
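
The sketch below writes a small matrix as comma separated text with write, then writes and re-reads a binary file with writeBin and readBin using an explicit byte order. The file names are temporary files and the values are made up.

```r
# ASCII output: write emits data column by column, so transpose to get rows.
mat <- rbind(c(1, 2, 3), c(4, 5, 6))
txt_file <- tempfile(fileext = ".txt")
write(t(mat), txt_file, ncolumns = 3, sep = ",")
text_lines <- readLines(txt_file)

# Binary output: three 8 byte floating point values in big endian order.
values <- c(1.5, -2.25, 1e6)
bin_file <- tempfile(fileext = ".bin")
con <- file(bin_file, "wb")
writeBin(values, con, size = 8, endian = "big")
close(con)

# Read them back, stating the same size and byte order.
con <- file(bin_file, "rb")
restored <- readBin(con, what = "double", n = length(values),
                    size = 8, endian = "big")
close(con)

file_bytes <- file.size(bin_file)
```

Stating the endian argument on both sides is what makes the file portable between platforms with different native byte orders.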

R Graphics

The upper image is an R color contour map of the sin(x)/x function. R can make labeled line contours as well. The lower image is an R color contour map with a line contour map overlaid, which shows that the line contour and color contour features can be combined. Users can make such maps with many options, including lines only, colors only, and different color schemes.
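
A plot of this general kind can be sketched with the image and contour functions. This is not the code behind the images on this page; the grid size, the color scheme, and the choice to evaluate sin(r)/r over the distance r from the origin are all assumptions.

```r
# Build a grid and evaluate sin(r)/r over the distance from the origin.
x <- seq(-10, 10, length.out = 81)
y <- x
r <- sqrt(outer(x^2, y^2, "+"))

# At r = 0 the expression is 0/0, so use the limiting value of 1 there.
z <- ifelse(r == 0, 1, sin(r) / r)

# Color map first, then line contours drawn on top of it.
image(x, y, z, col = terrain.colors(32))
contour(x, y, z, add = TRUE)
```

When run non-interactively, R sends the plot to a file instead of opening a window.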

R scatter plot

R has an extensive integrated graphics library for doing 2D and 3D graphics. Being statistical in nature, R also offers a number of plots that help in the statistical analysis of data. For example, given a matrix with related data in columns, a simple call to a routine called pairs will produce a window full of scatter graphs that plot each column versus each other column. This allows a quick qualitative determination of whether any of the columns are correlated with one another.
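
As a sketch with synthetic data, the example below builds a matrix in which column b is deliberately tied to column a while column c is independent, then draws the scatter graphs. The cor function gives the matching numeric view. The data and column names are made up.

```r
set.seed(1)   # make the random data repeatable

a <- rnorm(50)
m <- cbind(a = a,
           b = 2 * a + rnorm(50, sd = 0.1),   # strongly related to a
           c = rnorm(50))                     # unrelated to a

pairs(m)                     # one scatter graph for every pair of columns
correlations <- cor(m)       # correlation matrix for the same columns
```

In the plot, the a versus b panel should show a tight line while the panels involving c show a shapeless cloud.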

R also provides mechanisms for obtaining mouse position and button information from graphics windows. R can present image graphics as well, and do so with respectable speed. While R doesn't come with image I/O routines, the Comprehensive R Archive Network has downloadable R routines for loading and saving the FITS file format. FITS is a commonly used, color capable graphic format used in astronomy.

As it happens, there are many utilities in Linux that can handle FITS files, along with just about every other graphics format you might have heard of. The convert utility from the ImageMagick package can convert just about any graphics format to any other. It can also enhance, crop, or perform many other operations on the image during conversion. So having even just the FITS file format available for R is sufficient in Linux, given the ability of the convert utility to turn anything else into FITS format.

I converted a little utility from PDL to R that steps through a sequence of web cam astro-photos, allowing me to select a reference point for frame alignment and, if desired, to crop images. The graphic display of the images is reasonably fast, and mouse control is easy to use. I created this program in several matrix languages, and found that not all did the task well. But R handles the problem nicely, giving me a handy utility for cropping, aligning, and stacking lunar and planetary images. See 6 Inch Reflector Astrophotography for examples of the images this technique can produce.


In summary, I find R to be an excellent choice for a scripting matrix language. The syntax takes a bit of getting used to, but the speed and functionality of the language are impressive. It is well documented, and supports a wide variety of graphics presentation methods. It handles a wide range of data investigation techniques, including statistics, regression, filtering, and signal processing. It has flexible enough I/O capabilities to handle different data formats, making it quite applicable for data processing tasks. It is even capable of being used for some image processing.

Below is my subjective evaluation of some characteristics of R.

Pros:

  • Freely available for MacOS, Windows, and Linux.

  • Very similar to the commercial language S.

  • Has a very large software archive (CRAN).

  • Especially good for working on statistical and time-series problems.

  • R is not limited to 2 dimensional arrays.

  • R has a richer variety of data types than many matrix languages, such as character, logical, integer, complex, and double.

  • R supports more data forms than just multi-dimensional matrices, such as arrays and lists.

  • R has a good enough collection of file I/O routines to allow a user to move files to and from almost any external utility.

  • R can import comma separated value (CSV) files from spreadsheets.

  • R has the easiest method of creating functions with a variable number of arguments that I've ever seen.

  • R has 2D and 3D graphics support, and mouse clicks on graphs can return information to the R script.

  • R can make bar charts and pie charts in addition to common math language graphs.

  • R has support to help in the generation of reports based upon analysis results.

  • R works well interactively and can be run in batch mode.

Cons:

  • The use of the <- symbol instead of the more common = for data assignment can take a bit of getting used to.

  • While R is very fast at performing matrix operations, it slows down considerably when loops are used extensively.

  • For interactive use, R is a bit slow on high density graphs, like photographic images.

  • R doesn't directly read any graphic file formats, though FITS file packages are available from the CRAN archive. I found adding PNM graphics file routines to be very easy. R can save a graph in jpeg, png, tiff, and bmp formats.

  • Because more data types are available, there's a steeper learning curve than with, say, Octave.
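
On the point about loops, the usual remedy is to express the work as whole-array operations where possible. The sketch below computes the same sum of squares both ways. The data is made up and no timings are claimed; on large inputs the array form is typically the faster of the two.

```r
v <- seq(1, 100000)

# Loop form: one element at a time.
loop_total <- 0
for (val in v) {
  loop_total <- loop_total + val^2
}

# Array form: square every element at once, then add them up.
array_total <- sum(v^2)
```

Both forms give the same answer, so switching to the array form is usually a safe first step when a script runs slowly.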