Matrix Languages Head to Head
Contents
Intro
Matrix Language Advantages
Language Availability
Language Speed
Language Styles
Language Features
Benchmark Program
Language Syntax
Octave Notes
Scilab Notes
Euler Toolbox Notes
Yorick Notes
R Notes
PDL Notes
A Benchmark and Subjective Assessment of Matrix Languages
Are you searching for the best matrix language for your upcoming project?
How about for those testy math or engineering classes? Perhaps you're just
browsing, and would like to learn what matrix languages are all about. I've
reviewed a number of matrix languages on different pages of this web site, but
on this web page I will combine the individual reviews into more of a head to
head review.
That head to head will show results of a benchmark produced by each of
6 matrix languages, each running an image processing program on the same data.
I will also give my subjective assessment of each language as to what I see as
its particular strengths and weaknesses. Hopefully the assessment will give you
some insight for picking the language best suited to your programming style and
your projects.
Why Matrix Languages in the First Place?
If you're new to the matrix language world, you may wonder why people in
math, engineering and science flock to such languages. The concepts offered by
matrix languages are discussed at Why Matrix Languages.
Basically there is a desire to handle numbers on a computer easily, yet with
speed. Easily tends to mean with a language that doesn't require you to do the
old edit, compile, link, and run sequence. Interpreted languages avoid all
that, so programming with them is much easier and code can be developed much
faster. You just write code and execute.
The potential problem with interpreted languages is lack of speed. It takes
much longer for an interpreted language to process large amounts of data,
especially if a lot of looping is involved. Matrix languages solve that
problem with compiled routines that can process large matrices of data very
quickly. In that way, the looping is done down in the heart of compiled
functions.
There are two trade-offs to matrix languages. One is that you must learn to
vectorize your solutions so that you do little to no looping within your
programs. That means you must learn to express your solutions as matrix
equations. The other limitation is that in order to do this speed magic, matrix
languages must potentially hold large amounts of data in memory. One nice thing
is that modern computers tend to have huge amounts of memory, making matrix
languages very useful and popular.
Here's a vectorization example. The first table shows a program to create
one constant random vector and a large number of additional vectors to be
dotted (dot product) with the constant vector. The program prints out the result
on each 50000th cycle. The first language listing may look strange. It's a
Forth-like tool I wrote years ago before computers were big enough to run
matrix languages effectively. The program is interpreted, working on a record
of data at a time, and writing out the result as each record is processed. No
big memory needed, and no compilation either. The catch with this utility? It
uses a Reverse Polish Notation (RPN) language.
Non-Vectorized Solution in Forth-Like Language
0 = n | : initialize counter |
rnd rnd rnd v= e | : make initial random vector |
do | : start loop |
n 1 + = n | : increment counter |
rnd rnd rnd v= b | : make new random vector |
v@ b v@ e vdot = x | : do dot product |
n 50000 == if a . x . cr 0 = n then | : print, reset counter |
loop | : repeat loop |
Though the language is likely unfamiliar, the concept is clear. Create a
constant vector, then in a loop, produce a new random vector and dot product
the constant vector with the new one. On a specified count, print results to
the screen. In this cryptic language, the . means print. Each command
has to be re-interpreted on each loop. Even though the language does a
pseudo-compile to speed things up, it still can't keep up with a compiled
program. But it is very easy to code and run, and is about as fast as a modern
spreadsheet.
Here's a vectorized solution for the Euler Toolbox program:
Euler Toolbox Vectorized Solution
n=1000000; | : n = number of vectors |
y = random(3,1); | : create random column vector |
x = random(n,3); | : create matrix of n vectors |
z = x.y; | : dot all rows of x with y |
i = 1:50000:n; | : create index for every 50000th |
z[i] | : print each 50000th result |
Note that there are no explicit loops in the vectorized version. All of the
random vectors needed are generated with a single statement. All dot
products are produced with a single statement. The indices of the output
values are generated in a single statement. Just referencing the resulting vector
(z) with the index variable and no trailing semicolon lists every 50000th
result.
Get the idea? To vectorize, you create matrices of results with
matrix operators and matrix equations. There may be no loops in the
program at all. The result is a program that runs very fast.
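For readers who don't have any of these languages installed, the same contrast can be sketched in plain Python. This is only an illustration of the shape of the two approaches, not a benchmark: plain Python lists have no compiled matrix operators, so the whole-array version gains no speed here. The function names (dot, looped, whole_array) and the smaller vector count are made up for this sketch.

```python
import random

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))

def looped(y, rows, step):
    """Record-at-a-time style: one dot product per pass, keep every step-th result."""
    kept = []
    count = 0
    for row in rows:
        count += 1
        x = dot(row, y)
        if count == step:        # same test as the counter in the Forth-like listing
            kept.append(x)
            count = 0
    return kept

def whole_array(y, rows, step):
    """Matrix style: build all results first, then pick results by index."""
    z = [dot(row, y) for row in rows]   # stands in for z = x.y
    return z[step - 1::step]            # every step-th result, counting from 1

rng = random.Random(0)                  # seeded so both styles see the same data
n, step = 20000, 5000                   # smaller than the article's 1000000 and 50000
y = [rng.random() for _ in range(3)]
rows = [[rng.random() for _ in range(3)] for _ in range(n)]

print(looped(y, rows, step))
print(whole_array(y, rows, step))
```

Both calls print the same four values. In a real matrix language the second form is the fast one, because the loop hidden inside the dot-product statement runs in compiled code.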
Decide on Your Criteria
What's Available?
To choose the best language for your purpose, it seems to me that there are
about 4 things to consider. First of all, you have to consider what's
available. I'm assuming we're talking free languages here, and Linux has
likely the most available free languages. In this article, I discuss Octave,
Scilab, Euler Toolbox, Yorick, R, and the PDL (Perl Data Language).
There are certainly others, including but not limited to
FreeMat, Tela, and NumPy. I've reviewed all of the
languages compared on this page on other pages of this site, in addition
to Tela:
Octave
Euler Toolbox
Yorick
R
PDL
Scilab
Tela
Language Speed
There are a lot of benchmarks out there for most of the common languages.
Some, like the Euler Math Toolbox (EMT for short), are less represented.
I wrote one application for image processing that works the languages pretty
hard, and converted that to each of the languages discussed here. Details
are presented later, but for this particular task, impressions are
as follows, listed in order of speed:
Yorick | : Fastest, at least 2.4x faster than the others |
Octave | : Moderately fast, took 2.4x as long as Yorick |
PDL | : Took about 3x as long as Yorick |
Euler Toolbox | : Took about 3x as long as Yorick |
R | : Took about 3x as long as Yorick |
Scilab | : Took nearly 6x as long as Yorick |
In truth, you should only use this benchmark data as a rough guide. It's
only one comparison, and though it used a number of matrix functions in each
language, it surely isn't an extensive examination. Other tests using different
features will likely give different results. But it is possibly useful as you
begin your search, if speed is an important criterion for you.
Style
You might be most interested in programming style. Each language is different
in that regard. Some may look like what you've used before, others may look
quite foreign. The following table gives some gross insight into the
respective language styles (more detail later):
Octave | : Very Similar to Matlab, Fortran-like |
Scilab | : Somewhat Similar to Matlab |
Euler Toolbox | : Basic-like, simple and easy to learn |
Yorick | : Very C-like |
R | : Roughly C-like |
PDL | : C-like, but PDL objects are unique |
Features Emphasized
It may be useful to consider which features best suit your particular
needs, irrespective of language style. Each language is good in a
general purpose sense, like general data processing. But each excels at
something. The following table gives some hints:
Octave | : Signal Processing & file i/o, large user base |
Scilab | : Signal Processing, Xcos graphical modeling utility |
Euler Toolbox | : Simplicity and instructional ability |
Yorick | : Speed and scientific support |
R | : Statistics, has large user base |
PDL | : Image file flexibility and all-in-one shop |
That's very concise and likely in need of elaboration, which will come
later.
The Benchmark
I've worked with all of the languages compared in this article in the past,
with the exception of the Euler Toolbox. I'd played with it before, but I
thought I needed to take on a couple of bigger projects to get more familiar
with it. One of the programs I wrote in the Euler Toolbox was a new look at an
old Yorick program. The Yorick program is one I've used to process lunar and
planetary images taken with some of my telescopes and my Celestron NexImage
astro-camera. Some examples of processed images can be viewed at ETX Astro Photos.
The camera is a web cam modified for astro-photography use. It slides into
the focuser of a telescope, replacing the eyepiece. It produces avi
movie files of my selected targets. My goal is to align and correlate each
successive frame with the first frame of a target movie, and ultimately average
all the acceptable (chosen by correlation) frames together to produce an image
free of web cam pixelization, and mostly free of atmospheric distortion. In
matrix language programming, the result isn't a big program in lines of code,
but a program that gives the languages lots of work to do, as correlating
images takes a lot of mathematical processing.
I don't believe any of the matrix languages I use can read avi movie
files directly. To solve that problem I used the Linux mplayer program
to split out each frame of the movie to a pnm image file. The pnm family
of file formats is very simple. Some of the matrix languages could already
read it, and a reader is easy to code in the languages that lack one.
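To give a feel for how little code a pnm reader needs, here is a minimal sketch in Python. It is not the reader used in any of the benchmark programs. It assumes the binary variants (P5 grayscale and P6 color) with a maximum sample value of 255 or less, and the function name read_pnm is made up for this example.

```python
def read_pnm(data):
    """Parse a binary PGM (P5) or PPM (P6) image held in a bytes object.

    Returns (width, height, channels, pixels); pixels is a bytes object in
    row-major order. Only files with maxval <= 255 are handled.
    """
    pos = 0
    tokens = []
    while len(tokens) < 4:                       # magic, width, height, maxval
        while data[pos:pos + 1].isspace():       # skip whitespace between tokens
            pos += 1
        if data[pos:pos + 1] == b"#":            # header comment runs to end of line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1                                     # one whitespace byte precedes the raster

    magic = tokens[0]
    width, height, maxval = (int(t) for t in tokens[1:])
    if magic not in (b"P5", b"P6") or maxval > 255:
        raise ValueError("only binary P5/P6 with maxval <= 255 is supported")
    channels = 1 if magic == b"P5" else 3
    size = width * height * channels
    pixels = data[pos:pos + size]
    if len(pixels) != size:
        raise ValueError("file is shorter than its header claims")
    return width, height, channels, pixels

# A tiny 3x2 grayscale image built in memory instead of read from disk.
sample = b"P5\n# test frame\n3 2\n255\n" + bytes([0, 50, 100, 150, 200, 250])
print(read_pnm(sample))
```

For a real frame, the bytes would come from opening the file in binary mode and reading it whole. The matrix languages then only need to reshape the pixel data to height by width.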
The old Yorick program is an early prototype, and as I experimented and
tuned my solution, the code became pretty unreadable. But the concept that
evolved is simple enough, so I wrote the Euler version from scratch, taking
advantage of all that I'd learned in making the prototype. The result is a
program of much cleaner code, and it produces images every bit as good as the
ones produced by the old Yorick program.
Since the program isn't too big in the sense of lines of code,
and does employ quite a bit of file i/o as well as math calculations,
I thought it would make a good exercise for programming again in the
other matrix languages I have available. In addition, it seemed that
it would provide a pretty good benchmark for comparing the languages.
The exercise also proved that though each of these languages has some
emphasis that makes it most suitable for certain problems, all of the languages
are good for general purpose work. In two of the languages, Euler Toolbox and
Octave, I did have to write routines to handle image files, but otherwise all
languages had the features that could be packaged to do what I needed.
Though I haven't looked, I suspect that I could have found image file
i/o routines for Octave.
The Benchmark Program
The goal of the benchmark program is to improve a single image, say of the
moon, like this:
Raw Albategnius Crater Image
You might think you could just use a photo manipulation program like Gimp to
sharpen the raw image, but that doesn't work. Even just a small amount of
sharpening on a single image will produce terrible pixel noise as shown
below.
Sharpened Single Frame
But if a few dozen images are aligned and averaged together, the graininess
goes away, and sharpening produces images that reveal the detail nearly down to
the resolution of the particular telescope.
Processed Albategnius Crater Image
The Benchmark Results
The much cleaner Euler Toolbox program was translated into the PDL language,
the R language, the Yorick language, the Octave language, and the Scilab
language. In the benchmark test, each program processed 154 images of Jupiter,
taken with my NexStar 5SE and the Celestron NexImage camera. Each image (frame)
was 640x480 pixels. Each image was aligned with the first image, and a
correlation coefficient was calculated against that first image. Then each
program used a correlation criterion to select images to combine into a final
output. It's possible that I didn't use each language's optimal way of solving
the problem, but I did solve it the same way in each. The table below shows the
time required for each program to do the required operations.
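The select-and-average step can be sketched in a few lines of Python. The benchmark source isn't reproduced in this article, so this is a stand-in under stated assumptions: frames are flat lists of pixel values that have already been aligned, the correlation measure is the ordinary Pearson coefficient, and the 0.9 cutoff and the function names are made up for illustration.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length pixel lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

def stack_frames(frames, threshold):
    """Average the frames that correlate well enough with the first frame."""
    reference = frames[0]
    kept = [f for f in frames if correlation(reference, f) >= threshold]
    averaged = [sum(pixels) / len(kept) for pixels in zip(*kept)]
    return averaged, len(kept)

# Three tiny 4-pixel "frames": the reference, a good match, and a poor match.
frames = [
    [10, 20, 30, 40],
    [12, 22, 32, 42],
    [40, 10, 30, 20],
]
averaged, used = stack_frames(frames, threshold=0.9)   # 0.9 is an assumed cutoff
print(used, averaged)
```

The third frame correlates poorly with the first and is dropped, so only two frames contribute to the average. In the matrix languages, each sum over pixels is a single whole-array statement rather than a Python loop.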
Language | Time (sec) | Yorick Ratio |
Yorick | 34 | 1.0 |
Octave | 83 | 2.4 |
PDL | 106 | 3.1 |
Euler | 110 | 3.2 |
R | 116 | 3.4 |
Scilab | 199 | 5.9 |
The Yorick Ratio column shows the ratio of each language's time
to the Yorick solution time, which was the fastest. So Octave, for example,
took 2.4 times as long to solve this particular problem as Yorick did.
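The ratio column is simple arithmetic on the time column, as this short Python sketch shows. The times are the ones from the table above; the variable names are just for illustration.

```python
# Run times in seconds, copied from the benchmark table.
times = {"Yorick": 34, "Octave": 83, "PDL": 106, "Euler": 110, "R": 116, "Scilab": 199}

fastest = min(times.values())                      # Yorick's 34 seconds
ratios = {name: t / fastest for name, t in times.items()}

# Print the table again, fastest first, with ratios to one decimal place.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:8s} {times[name]:4d} {ratio:4.1f}")
```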
I've typically found this type of speed result to be true. Yorick often
outperforms the other languages on the types of problems I work on, though just
doing specific matrix tests sometimes doesn't reveal that. I was pleasantly
surprised to find that the old GTK Linux version of the Euler Toolbox kept up
well with the speed of the other languages. The difference in speed on this
problem between the Euler Toolbox, R, and the PDL is quite insignificant.
Of course, this is just a single benchmark, but it includes file i/o,
matrix math, and graphics. Certainly applying the languages to a different
problem might yield different results. But if speed is something of a
concern, this test might suggest the order in which you consider
languages discussed here.
The bottom line is, for most problems there isn't a terribly significant
speed difference between the languages tested. I suggest looking deeper to pick
your language.
Some Syntactical Differences
If speed isn't a game changer, personal programming preferences and style
might be. The examples shown below are of a simple and contrived problem, to
compute miles per gallon given distance and gallons, and return the input data
as well as the answer. The multiple return requirement is just to show how the
different languages make that available.
All of the languages can simply pack compatibly sized objects into a larger
matrix and return that. But in this contrived case I wanted to show how in
principle you could return multiple arguments that may not be compatible enough
to share a matrix. In all examples, the names passed and returned don't have to
be the same, just match in number of arguments. Here's how the language
functions look:
Function Declaration - Euler Toolbox
function getmpg(dis, gal)
mpg = dis/gal;
return {dis, gal, mpg};
endfunction
To Use Function:
{dis, gal, mpg} = getmpg(100,20);
|
Very straightforward, much like Fortran or BASIC might look. Notice that
when multiple arguments are returned, they are enclosed within curly brackets.
Likewise, the call statement must use curly brackets with the expected number
of returned items.
Function Declaration - Octave
function [dis, gal, mpg] = getmpg(dis, gal)
mpg = dis/gal;
endfunction
To Use Function:
[dis, gal, mpg] = getmpg(100,20);
|
As you can see, Octave doesn't look much different from Euler Toolbox,
except the return arguments are indicated in the function declaration line, and
there is no return statement. Whatever is listed in the declaration will be
returned.
Function Declaration - Octave using Structure
function x = getmpg(dis, gal)
x.dis = dis;
x.gal = gal;
x.mpg = dis/gal;
endfunction
To Use Function:
x = getmpg(100,20);
To access values:
x.dis for distance
x.gal for gallons
x.mpg for mpg
|
Octave can also return different data types in a single container
using its structure technique. While in this case all variables
were scalars, they could be different in size and type and be returned
in this manner. Similarly, Octave can return a cell array with
cells holding different types of elements. Individual cells are read back
with curly-brace indexing, such as x{1}, and the cell2mat routine can
convert a cell array of compatible elements back into a matrix.
Function Declaration - Scilab
function [dis, gal, mpg] = getmpg(dis, gal)
mpg = dis/gal;
endfunction
To Use Function:
[dis, gal, mpg] = getmpg(100,20);
|
The general layout of functions in Scilab, as shown here, is quite like
(exactly like in this small case) Octave. The function declaration line is the
same, and the reference to the function is the same as in Octave.
Function Declaration - Scilab using List
function x = getmpg(dis, gal)
mpg = dis/gal;
x = list(dis, gal, mpg);
endfunction
To Use Function:
x = getmpg(100,20);
To access values:
dis = x(1);
gal = x(2);
mpg = x(3);
|
Scilab can also return dissimilar elements in a list. While in this case
all returned variables are of the same type and size, with a list that's not a
requirement. You can then access the returned variables of the list by
index. While the list extraction nomenclature in Scilab is different than
that in R, the list concept in both languages is similar.
Function Declaration - Yorick
struct data{double *dis, *gal, *mpg;}
func getmpg(dis, gal){
mpg = dis/gal;
ret = data(dis=&dis, gal=&gal, mpg=&mpg);
return ret;
}
To Use Function:
x = getmpg(100,20);
To access values:
*x.dis
*x.gal
*x.mpg
|
Yorick looks a bit different. First of all, Yorick can only return one
argument, and if multiple values of different shape are desired, they can be
passed back as a C-style struct. Note the use of C-style pointer
nomenclature. You can get around the struct by having a function simply
pass back an array that holds pointers to the multiple arguments you wish
returned, as shown below:
Function Declaration - Yorick, w/o struct
func getmpg(dis, gal){
mpg = dis/gal;
ret = [&dis, &gal, &mpg];
return ret;
}
To Use Function:
x = getmpg(100,20);
To access values:
*x(1)
*x(2)
*x(3)
|
Again, this is a contrived situation, because the 3 scalar values could
easily be handed back in an array without the pointers. But this example shows
how you could hand back multiple variables by reference that may not be
scalars, rather arrays or matrices of different size. The passed arrays would
then be dereferenced with the pointer (*) indicator and the appropriate pointer
array index. Note that the struct gives a name to each item, the simple
pointer array does not. But either can be used to return more than one
argument.
Function Declaration - R
getmpg <- function(dis, gal){
mpg <- dis/gal;
list(dis=dis, gal=gal, mpg=mpg);
}
To Use Function:
x = getmpg(100,20);
To access values:
x$dis
x$gal
x$mpg
|
R doesn't require a return statement. The value of the last expression
evaluated before the end of the routine is what's returned, though an explicit
return() is also available. If multiple values are desired to be
returned, a list can be created. The items in the list don't have to be named
as in the above example. You could just use list(dis, gal, mpg). If not
equated to names in a list, the arguments can be accessed by indexing the
result, like x[[1]] for dis, x[[2]] for gal, etc.
Function Declaration - PDL Returning Array
sub getmpg{
my ($dis, $gal) = @_;
# or, equivalently:
# my $dis = shift;
# my $gal = shift;
my $mpg = $dis/$gal;
return($dis, $gal, $mpg);
}
To Use Function:
($dis, $gal, $mpg) = getmpg(100,20);
or:
@x = getmpg(100,20);
Then to access values:
$x[0]
$x[1]
$x[2]
|
Perl, as you can see, is different. The sub statement is what
declares a function (or subroutine). In most modern languages, passed values
are automatically placed into local variables. In Perl, an array of arguments
named @_ is always passed to subroutines. Programmers must
either use the shift statement to pull values one at a time from the array
into variables, or use a list assignment of the form my (...) = @_ to
unpack the whole @_ array at once.
A potential gotcha in Perl is that variables are by default global.
So the my operator explicitly declares a variable to be local. If
multiple values are to be handed back, they can be put into an array (between
parentheses). If an array is returned, a single array variable may receive the
return array, and the variable must begin with the @ symbol to designate
it as an array variable. Individual variables ($ variables) can be used within
parentheses to directly unpack the array into individual variables rather than
using an @ array variable that will have multiple elements.
The above example places the variables $dis, $gal, and $mpg into an array
for the return value. The array elements however, can in general be different
size or types of elements. In this case they are all single valued numeric
variables.
Function Declaration - PDL Returning Hash
sub getmpg{
my ($dis, $gal) = @_;
my $mpg = $dis/$gal;
return("dis"=>$dis, "gal"=>$gal, "mpg"=>$mpg);
}
To Use Function:
%x = getmpg(100,20);
Then to access values:
$x{"dis"}
$x{"gal"}
$x{"mpg"}
|
In the above illustration, I show another method of passing back multiple
values from a Perl routine. Multiple elements can be passed back in a
hash construct, which is much like what other languages call a record or
structure. Notice that just as Perl uses the sigil $ to indicate a single
valued variable or piddle, and @ to indicate an array, it uses % to
indicate a hash.
Creating a hash looks much like creating an array except that
labels are associated with variables in the hash. Then when using the hash, the
labels can be used to reference the elements rather than referencing by numeric
index.
As indicated in the description of the contrived problem, each of these
languages can make it simple to just return such simple scalar variables in an
array, like return [dis, gal, mpg]. The nomenclature for each language
is slightly different, but all allow such a simple solution. The following
examples show how in each language you can put all 3 scalar variables into a
simple array called x, so the single array variable x can be returned. Thus, no
need in this case for multiple argument returns, lists, or structs.
Combine Variables into Array for Return
Euler Toolbox | : x = [dis, gal, mpg]; |
Octave | : x = [dis, gal, mpg] |
Scilab | : x = [dis, gal, mpg] |
Yorick | : x = [dis, gal, mpg] |
R | : x = c(dis, gal, mpg) |
PDL | : $x = pdl($dis, $gal, $mpg) |
Here's just a sampling of the syntax used
to do a few matrix operations in each of these languages:
Scalar Multiply Two Matrices
z = x .* y | : Octave |
z = x .* y | : Scilab |
z = x * y | : Euler |
z = x * y | : Yorick |
z <- x * y | : R |
$z = $x * $y | : PDL |
Even in the simple example above, you can see a difference in nomenclature
between the languages. First of all, Octave and Scilab assume that basic math
operators are matrix operations. So if you simply want to multiply each
element of x by the corresponding element in y, Octave and Scilab need the
.* operator, the preceding dot designating the operation as a scalar
(element-by-element) one. The other languages treat basic math operators as
scalar operators by default, and use something different to indicate a matrix
operation.
But whoa! PDL variables are actually piddles or objects, and are
designated by a leading dollar sign. A leading at sign (@) designates a
variable as a Perl array. See Review
PDL for more information about the PDL's peculiar syntax.
Matrix Multiply Two Matrices
z = x * y | : Octave |
z = x * y | : Scilab |
z = x . y | : Euler |
z = x(,+) * y(+,) | : Yorick |
z <- x %*% y | : R |
$z = $x x $y | : PDL |
Now things get interesting. Since most of the languages use a simple
* as a scalar multiplier, how do they signify a matrix multiply? For
Octave and Scilab, it's simple -- they just drop the leading dot. For Euler,
dot is the matrix multiply operator. And what's with Yorick and R? Here,
the PDL actually does something simple, it uses x as the operator.
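If the difference between the two kinds of multiply is unfamiliar, this small Python sketch spells both out on 2x2 matrices stored as nested lists. It uses no matrix library, and the function names are made up for this example.

```python
def elementwise_multiply(x, y):
    """What the article calls a scalar multiply: z[i][j] = x[i][j] * y[i][j]."""
    return [[a * b for a, b in zip(xrow, yrow)] for xrow, yrow in zip(x, y)]

def matrix_multiply(x, y):
    """True matrix product: z[i][j] = sum over k of x[i][k] * y[k][j]."""
    ycols = list(zip(*y))                       # columns of y
    return [[sum(a * b for a, b in zip(xrow, ycol)) for ycol in ycols] for xrow in x]

x = [[1, 2], [3, 4]]
y = [[5, 6], [7, 8]]
print(elementwise_multiply(x, y))   # [[5, 12], [21, 32]]
print(matrix_multiply(x, y))        # [[19, 22], [43, 50]]
```

The two results differ in every position, which is why mixing up the two operators when porting code between these languages produces wrong answers without any error message.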
Scale Sub-matrix by Scalar
x(1:2,1:2) *= 10 | : Octave |
x(1:2,1:2) = x(1:2,1:2)*10 | : Scilab |
x[1:2,1:2] = x[1:2,1:2]*10 | : Euler |
x(1:2,1:2) *= 10 | : Yorick |
x[1:2,1:2] <- x[1:2,1:2]*10 | : R |
($tmp = $x->slice("0:1,0:1")) *= 10 | : PDL |
This operation doesn't vary so much, except for the PDL. You can see that
some languages use parentheses for matrix indices, and some use brackets. Some
(Octave, Yorick, and PDL) have a *= operator which does both the
multiply and the store. Scilab, Euler and R don't have that convenience.
But again, what's up with the PDL? This strange nomenclature is the result
of the fact that PDL piddles aren't really matrices, but objects. So in some
cases, simple math statements don't work. Even indexing into a piddle uses the
interesting slice object function operator. In PDL there's also a
dice operator. One speaks of slicing and dicing piddles, if
that makes you interested. The odd nomenclature shown above for scaling
a sub-matrix portion makes more sense if broken into two statements:
$tmp = $x->slice("0:1,0:1");
$tmp *= 10;
|
That looks a bit less confusing. First a handle into the sub-matrix
is created, then that handle is scaled. The nomenclature in the previous
comparison list shows how one can combine both PDL statements into a single
statement. Also note that while all of the other languages index the 1st
element of an array or matrix with value 1, PDL starts with 0.
PDL also reverses the more common [row,col] indexing with [col,row], columns
being the first index. Some people think of it as [x,y].
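The index bookkeeping is easy to get wrong when porting code, so here is a small Python sketch of the translation. It assumes only what is described above: the other languages use 1-based [row,col] ranges, while a PDL slice string is 0-based with columns first and inclusive ends. The helper name pdl_slice_string is hypothetical.

```python
def pdl_slice_string(row_first, row_last, col_first, col_last):
    """Convert a 1-based, inclusive [row,col] range to a PDL-style slice string."""
    cols = f"{col_first - 1}:{col_last - 1}"    # PDL lists the column range first
    rows = f"{row_first - 1}:{row_last - 1}"    # and counts from 0
    return f"{cols},{rows}"

print(pdl_slice_string(1, 2, 1, 2))   # 0:1,0:1  (x(1:2,1:2) in Octave)
print(pdl_slice_string(1, 2, 3, 4))   # 2:3,0:1  (rows 1-2, columns 3-4)
```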
There are certainly more syntactical differences between the languages, but
in general you'll find that Octave, Scilab, and Euler Toolbox nomenclature are
similar to one another. They all declare functions similar to the Euler sample
shown earlier. Yorick code looks very much like C code. R uses brackets like C,
but the language constructs are different.
The most unique is the PDL. Since the matrix holding container is an
object, not a matrix as in other languages, PDL leads to some quite strange
operations. The strange syntax has some advantages, but isn't so easy to learn.
Even with these simple examples, you can see that the PDL offers more than one
way to manipulate things. More to learn, but more likely you'll find a form of
expression that suits you.
Other Than Syntax, What's Different
Each of these languages has something that gives it some unique
character or capability. That's true of other languages as well, these
just happen to be the ones I have access to.
The Octave Flavor
Octave, as you may have read, is highly compatible with the commercial
language Matlab. It was created back in the 90s by John W. Eaton, and named
after one of his professors (Octave Levenspiel). Many companies and schools use
Matlab, not so many individuals do. Why? Matlab isn't cheap, and all of the
languages presented here are free. So Octave is free. When I say Octave
is highly compatible with Matlab, I mean that its syntax for programming and
even common i/o, graphics, and math functions are similar. If you know one,
you can easily learn the other. You can even easily port your code from one to
the other in most cases.
What else? Octave is heavily loaded with signal processing
functions. If you're going to do signal analysis or filtering, it's a good
choice. I worked for years doing time series analysis, and occasionally needed
some specific filter functions. Octave was my go-to for this because it has a
big collection of such functions.
You may have Octave available in your current Linux package installer. If
not, try GNU Octave. If you
happen to use Puppy Linux, try Slacko
Archive for a package called mathslacko.sfs. A kind Puppy Linux
user named Emil put together Gnuplot, Maxima, Octave, and R in an easy to use
package for Slacko Puppy, and the documentation says that at least parts of it
work in other versions.
Octave is the language of this selection that is most indicative of
what I call The Linux way of thinking. By that I mean it makes
the most of what's generally already available to augment its power.
Rather than have an internally developed graphics pack, it uses Gnuplot,
which is available on nearly every Linux system. Octave programs can
easily be run in batch mode using the same technique as with a BASH
file, or any other shell scripting language. The first line of a
typical Linux script file has what's called a bang statement
which tells Linux what utility runs the script. It's called a bang
statement because its second character is an exclamation point.
Here are a few examples of that concept:
#!/bin/sh | : Indicates the system shell (often Bash) |
#!/bin/tcsh | : Indicates C Shell |
#!/usr/bin/octave | : Indicates Octave |
#!/usr/bin/perl | : Indicates Perl/PDL |
#!/usr/bin/yorick -i | : Indicates Yorick |
In each of the above cases, the code following the bang
statement would be interpreted by the indicated utility. None of the
other languages integrate with Linux quite that simply, though
Yorick comes close. Most, like the Yorick example, need not
only the bang statement, but some additional parameters. Some
also need some prep commands in the following code.
Octave basically works with 2 dimensional matrices. Functions can
return one or more arguments. In addition, Octave has a list type
container that can hold data of dissimilar size.
Octave is one of the most flexible in its support of file i/o. In
addition to simple load and save functions for matrices, it has a
decent complement of C style file functions, allowing the user
to likely read and write nearly any file structure, ASCII or
binary.
As mentioned before, Octave uses Gnuplot as its primary plotting
utility. Gnuplot can produce 2d and 3d plots with considerable
flexibility, and can also show images. The version I was using for
this project (version 3.2.3) didn't give me the ability to use the
mouse to select points on an image plot. It would do that on
a line plot, but not an image plot. Since I needed that ability for
this image processing task, I had my Octave program call an external
program (PDL) to do this task, and get the results from the
external utility.
The user interface for most of these languages, including Octave, is
bleak to some programmers. They present a blank screen within which you type
commands. Of course, for all of them you can use an external editor to create
programs. In Octave you generally create what's called m files. One nice
feature of Octave is that you can reference any m file function within an
interactive session or within a program without explicitly loading it. Octave
finds the functions if stored in the m file path. Most other languages require
you to load the functions or libraries by specific command, either by hand or
within functions that require them.
The Scilab Flavor
Scilab is another language considered reasonably compatible with the
commercial language Matlab. Scilab was created by the French Institute for
Research in Computer Science and Automation. If not already available in your
package manager, you can get it at the Scilab Download site. It is
designed to be a general purpose matrix language, with particular support for
signal processing, statistics, and fluid dynamics. It includes full 2d and 3d
graphics support.
Scilab also includes Xcos, which provides a graphical method
of laying out dynamic process solutions, similar to a flowchart. It's
very handy for engineers in some fields of endeavor.
Scilab doesn't just present, even in Linux, a blank screen ready for
user input. Instead it includes a GUI, with a left side column showing
contents of the current working directory, and a right side column with
a variable browser and a command history window. In the largest column in
the center of the screen is the command window for user input.
Though like Octave in many ways, Scilab doesn't auto-load functions when
referenced, they must be loaded by the operator, or loaded explicitly within a
function that needs them. So as with the other languages described here other
than Octave (and PDL with the AutoLoader option), the general mode is to make
library files that contain several functions, and load them when needed.
Scilab works with 2 dimensional matrices. Functions can
return one or more arguments. In addition, Scilab has a list type
container that can hold data of dissimilar size.
Scilab has many of the Octave style i/o functions, but they are named
slightly differently, such as mopen to open a file instead of
fopen. In fact, a number of functions common in name between Octave and
Matlab have different names in Scilab. To help with that, Scilab includes a
considerable number of help screens to give guidance on converting Matlab code
to Scilab code, as well as tables which show which Scilab functions do what
specific Matlab functions do. Needless to say, Scilab, while comparable in
many ways, isn't as close syntactically to Matlab as is Octave.
I noticed that in working with Scilab, when it encounters a syntax error
when loading a file, it specifies the line number of the error without counting
any comment lines included in the code file. If several functions are in the
same library file, the error line indicated is not from the beginning of the
file, but from the beginning of the particular function with the error.
This makes coding difficult using an external editor. But Scilab includes a
syntax color highlighting editor that counts lines the way the Scilab loader
does. So it's better to use the Scilab editor to create code as it makes
debugging easier.
The Euler Toolbox Flavor
I should note that my experience with Euler Toolbox is with the GTK Linux
version. That version is a bit behind the Windows version.
Windows is the O/S for which the developer currently does all his development.
You can run the up to date Windows version in Linux via Wine or Windows in a
Virtual Machine. I use the native Linux version because it is sufficient for my
needs, is more convenient, and runs faster because it's a native version.
My Puppy Linux version doesn't have the Euler Toolbox in its archive, so
I got it from the Debian archive. Puppy Linux
has a utility that can unpack a Debian deb file, and it installed
easily. The GTK Linux version is also available at the Sourceforge GTK Euler Toolbox site. The
Windows version is available at the Euler Math Toolbox site.
I consider Euler Math Toolbox (EMT) to be sort of an Octave Lite
language. It works with two dimensional matrices as does Octave, but
does not have a list type container (at least in Linux). Like Octave,
it allows a function to return more than one argument; each can be a
matrix, and each can be a different size.
EMT is not as highly integrated into Linux as Octave is. It is possible
to run EMT programs in a batch mode, but the only arguments you can hand to EMT
when starting it are the names of Euler function files. So a batch program
needing operator input would have to get it from a file or ask the user for
it.
EMT has far fewer file i/o features than Octave. It handles ASCII files very well. As to
binary, it can only read and write byte and integer files, not floating point.
So files from other languages must be converted to ASCII for EMT to use
them.
Unlike all of the other languages, EMT does not provide a function for
passing commands to the Linux system. There is a function named exec
ostensibly for this purpose, but though documented, it is not functional in
Linux. A solution I've used is to create a Linux pipe (fifo), and use a
receiver program (I use a simple bash script) that listens to the pipe.
Whatever comes to it from the pipe is handed to the system for execution. I
then created a simple EMT function that takes a string argument and writes it
to the pipe. This combination does what I expected the non-functional
exec function to do.
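The pipe arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual script I use: the pipe location, the quit sentinel, and the shell lines standing in for the EMT side are all hypothetical. In the real setup, the EMT function would write its string argument to the pipe instead.

```shell
# Hypothetical pipe and output locations, chosen only for this sketch.
fifo="${TMPDIR:-/tmp}/emt_cmd_$$.fifo"
out="${TMPDIR:-/tmp}/emt_out_$$.txt"
mkfifo "$fifo"

# Receiver: read lines from the pipe and hand each one to the system.
# The "quit" sentinel is an assumption that lets the sketch terminate.
receiver() {
    while IFS= read -r cmd; do
        [ "$cmd" = "quit" ] && break
        # Keep the command's stdin away from the pipe itself.
        sh -c "$cmd" < /dev/null
    done < "$fifo"
}

# Run the receiver in the background, capturing what the commands print.
receiver > "$out" &

# Stand-in for the EMT side: write command strings to the pipe.
{
    echo 'echo hello from the pipe'
    echo 'quit'
} > "$fifo"

wait
result=$(cat "$out")
rm -f "$fifo" "$out"
echo "$result"    # prints: hello from the pipe
```

A long-running receiver would normally reopen the pipe in an outer loop, so that it keeps listening after each writer closes it.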
EMT has little string support. One thing I needed for the test
program was the ability to have an array of file names, one file name for each
frame I wanted to process. It was possible to read the list of file names in as
character arrays, and have a matrix of those, with each row holding the
characters of a file name. A simple function to re-concatenate the characters
into a string gave me the feature I needed. The Windows version of EMT has much
more string support, including the capability of keeping strings in arrays.
EMT has a pretty solid collection of math functions built in, including
general matrix manipulation, linear algebra, polynomial solutions, interval and
exact solution functions, and statistical functions. It also includes an FFT
utility.
What EMT has in spades is graphics capability. It can do 2d and very
impressive 3d plots, as well as display images. It gave me the ability to
interact with an image that I needed for the image processing program. EMT also
uses one of the easiest to learn language syntaxes, with few arcane
operators.
EMT also has, at least with respect to the other languages tested
here, a unique notebook feature. All commands entered into the
EMT window (as well as their outputs) can be saved as a notebook file.
Comments can be inserted anywhere within this notebook before saving,
to document the activity. The notebook files can be reloaded, and
the cursor keys will step up and down through the commands, skipping
over the command outputs, making it easy to run and/or modify and
run previous commands.
The creator of EMT, Dr. Rene Grothmann, is a professor. He created
EMT to use for mathematics instruction, and the notebook files
are a wonderful tool for that purpose. I've found that they are also
a handy tool for product development. I can reload a notebook and
be quickly back at a project with all of the exploratory commands I
used, and notes. Of course, the language is also fully capable of
doing industrial work as well.
What I like especially about EMT is that it is quite a small and easy
to install package. With some of the larger packages, it seems that when trying
to install them into Linux flavors that don't have them already in their
archives, you may spend a long time hunting down requirements. Less so with
EMT.
The Yorick Flavor
Yorick is perhaps my favorite flavor. Probably because it uses
a distinctly C like style, and I've programmed a lot in C and
Java. So the Yorick style for the most part seems natural to me.
Debian and some of its derivatives have Yorick available through their
respective package managers. It's also available at the Yorick Homepage. I've successfully
downloaded from there and found it easy to get working in Puppy Linux.
As with Octave, the Yorick screen interface is very simplistic. Yorick
comes up with a blank screen, ready for the user to type instructions. In
fact, Yorick doesn't even have history support to allow the user to
up-arrow to previous commands. There is a utility commonly available in
many Linux versions that can solve this problem. It's named rlwrap.
You can use it with Yorick like this:
rlwrap -c yorick
Rlwrap runs Yorick and the interface looks the same, except that it
now provides a history function as well as a file name
completion function.
The creator of Yorick is physicist David Munro, and Yorick reflects that in
the collection of science applicable functions. Yorick does 2 dimensional
matrices, but that's not the limit. It can go to 7 dimensions and perhaps
beyond. That, by the way, is why the Yorick matrix multiply operator looks
so strange. When multiplying matrices of over 2 dimensions, there must be a way
to specify what's actually to be multiplied.
While Octave and EMT primarily work with double precision floating
point for all variables, Yorick has most of the data types available
in C. You can have byte arrays, integer arrays, string arrays, and of
course floating point arrays. This adds flexibility, but you must be
careful that you know what type of variable you're doing arithmetic
with. Dividing one integer variable into another integer value when you're
thinking floating point may well give you the wrong answer.
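The integer pitfall is easy to demonstrate outside of Yorick. The following is a shell analogy, not Yorick code: shell arithmetic is integer only, so it truncates in the same way C style integer types do, while awk (used here only as a convenient floating point calculator) gives the value most people have in mind.

```shell
# Integer arithmetic truncates: $(( )) works only on integers.
a=7
b=2
int_quot=$(( a / b ))

# Forcing floating point (here via awk) keeps the fractional part.
float_quot=$(awk -v a="$a" -v b="$b" 'BEGIN { printf "%.1f", a / b }')

echo "integer: $int_quot  float: $float_quot"
# prints: integer: 3  float: 3.5
```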
Yorick functions can only return a single value. That can be a number,
variable, matrix -- or a struct. The struct is useful if you need to
return a collection of dissimilar things, as in C. The awkward thing is that
the structs must be globally declared, but then can be used in functions. For
example, to declare a struct to store 2 double precision matrices, which may be
of different sizes,
struct data{double *x, *y;}
will do the trick. This struct declaration must occur outside of any
function declaration, but is then available for use within a function.
The struct capability makes Yorick quite flexible, but clearly having a bit
of C programming in your background is a plus.
Yorick has integrated graphics functions, all with cryptic names, like plg,
plt, plm, for plot x versus y, plot text, plot mesh. Strange names, but there
are plenty of routines (this is just a sample), so Yorick is quite plot
capable. It is likely that some help file searching will be necessary.
It's best to make libraries of related functions, as Yorick requires an
include command for each library. The include commands can be part of Yorick
programs, so the programs take care of themselves. Yorick is batchable, and
like Octave, EMT, and PDL, all functionality in a batch file is preserved. That
is, a batch file can interact with the user and present graphics.
The R Flavor
R is another language that is very popular, and so is likely to be in
many Linux package archives. If you can find it there, it's likely the
best way to install it. If not, you can get it at the Cran-R-Project. If you happen to use Slacko Puppy Linux,
you can get the mathslacko.sfs file at ibiblio Slacko Packages.
Hopefully the few code snippets give you a flavor of R syntax. It uses
brackets like C and Java, but is otherwise unique in its expressions. In
general, the matrix manipulation functions look and work like those of most
other matrix languages, apart from the <- assignment operator used instead
of an equals symbol.
In Linux, R also presents just a blank screen for user input. It does
have built in history support. I believe the Windows version has
a GUI interface.
R can have variables of almost any type, byte, integer, and float. It
has conversion functions to convert from one to the other. As with the
other languages that have such variable flexibility, you must be conscious
of what type of variables you're doing math with. Dividing integers by
integers may not give you what you expect.
R has a reasonable assortment of file i/o functions, and I've never run
across anything I couldn't read or write. The functions aren't as simply
constructed as the C style i/o functions of Octave, however. R i/o functions
allow a number of options by name, which makes some calls a bit verbose. R can
read and write both ASCII and binary files. A sample of reading an ASCII file
follows:
x <- scan(fname,what=double(0),skip=3,sep=c("\t"));
Very flexible, but not easy to remember.
R, like Octave and PDL, has a large number of followers, and supplemental
functions for all three languages are probably available to simplify your
programming challenges. For example, there was a package for R that allowed me
to use it to load image files for my project.
R is heavily loaded with statistical math routines, and is used a lot in the
statistics field. I've used R for time series analysis and image processing
as well, so it makes a good general purpose matrix language.
R has a plentiful assortment of 2d and 3d graphics routines, making
it useful for generating plots of all manner. Like the i/o routines, the
plot functions often have a lot of named options one can use to tailor
a graph.
R can be run in batch mode, but it's not as convenient as using Octave
for that purpose. To the R developers, batch means batch. That is,
R goes to a dark place and processes data, and writes it out to a file
(or files). But in batch it does not communicate with the user, and cannot
present graphics while it's running. That capability is only available from
the interactive R environment.
The PDL Flavor
PDL is a matrix math extension that can be loaded into Perl. PDL is offered
by the developers as a One Stop Shop. By that they mean that within Perl
you have perhaps the most effective text handling tool available, as well as a
very capable report generating language. With the PDL extension added, you also
get a full featured math package. In other languages, you may often find
yourself resorting to an external program to convert data into some kind of
digestible form before getting at it with your chosen matrix language. With
PDL, you have all that capability in one package.
If PDL isn't already in your Linux distro, check out the CPAN site for how to
install it in your Linux system (you probably already have Perl). Without a
package manager entry, PDL may be challenging to get installed. In that event,
check all of the module requirements of PDL and be sure you have them
installed before attempting to install PDL itself.
If PDL is not already in your archive, the easiest way to install it (assuming
you've installed all of the PDL requirements) is to use the cpan command. This
command gets the code directly from the CPAN site, compiles it, and installs
it. The CPAN site also provides access to countless supplemental packages to
further enhance PDL. Usually PDL can be installed as follows:
cpan -i PDL
After entering this command, go grab a soda or cup of coffee, sit back,
and relax. It will likely take a while.
PDL provides a user interface utility called perldl. Perldl, like
most of the other Linux matrix languages, presents a blank screen ready for
user typing. The basic perldl doesn't have history support, but that's
available with an additional PDL extension. You can, as I do, just
run perldl with rlwrap as described in the Yorick section.
rlwrap -c perldl
You can also use the cpan directive, likely already in your system,
to install the ReadLine supplement to PDL, giving it a history
function. Just do the following:
cpan -i Term::ReadLine::Perl
PDL doesn't auto-load referenced functions in its basic form. You must
put loading commands (the use command) in library files so that
they'll have the functions they need. There is a PDL module named
AutoLoader that you can use, or even have always loaded
by default by editing the .perldlrc file. With this enhancement,
PDL, like Octave, will auto-load functions when they are referenced. This
works for individual function files, but libraries will still need to be
loaded by use commands.
PDL is also a language that can work with variables of many different
types. There are bytes, integers, strings, and floats. So the same precaution
mentioned before applies to PDL: be sure you know what kind of variables you're
doing math with.
Like Octave, PDL makes a lot of use of external utilities that already exist.
One of the main plotting utilities used by PDL is PGPLOT, for example.
Like Octave and R, PDL has a pretty large on-line repository of donated
libraries, possibly helping you with your problem solving. And, like Octave,
Perl (with PDL) is easily batchable. To run a Perl or PDL script in batch, make
the first line a bang (#!) line that names the Perl interpreter on your system.
As with a shell script, Perl can be run with just the appropriate
bang line, followed by the program. That's one of the things about
Octave, Yorick, and Perl that I like most. I like to create programs that each
do some particular thing to data, like transform it into another coordinate
frame, or compute regression coefficients on the data and output results. Then
in a shell script file I may run 2 or 3 scripts or programs in a row, each
doing their magic, with the final one giving the result I am seeking. With
this procedure, script languages can be mixed and matched to solve a problem,
with each offering its best capability.
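The mix and match procedure can be sketched as below. The stage names and data are hypothetical, and the two stages are written with sh and awk only so the sketch runs on any system. A Perl/PDL, Octave, or Yorick stage would fit in the same slot, with its own bang line naming that interpreter (for Perl, commonly #!/usr/bin/perl, though the path varies by system). The stages here are connected by a pipe; intermediate files would work as well.

```shell
# Scratch directory for the two hypothetical stage scripts.
work=$(mktemp -d)

# Stage 1: multiply every input value by 10.
cat > "$work/scale" <<'EOF'
#!/bin/sh
awk '{ print $1 * 10 }'
EOF

# Stage 2: sum the values it receives.
cat > "$work/total" <<'EOF'
#!/bin/sh
awk '{ s += $1 } END { print s }'
EOF

# The bang line plus the execute bit let each file run like any program.
chmod +x "$work/scale" "$work/total"

# Chain the stages: each does one job, the last one gives the result.
result=$(printf '1\n2\n3\n' | "$work/scale" | "$work/total")
echo "$result"    # prints: 60

rm -rf "$work"
```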
Note that while the .perldlrc file can be used to set what modules to
autoload when perldl is used, batch executed files don't use the
.perldlrc file. So a batch perl/pdl program will need to have all
necessary use modules explicitly listed, including a use PDL;
instruction.
Perl is also quite flexible in file i/o capability, but like R
can be a bit of a struggle to get set up. ASCII i/o is pretty simple.
Binary is quite doable, but not so simple. If you use some kind of
locally standard structure, you can set up your Perl routines and
forget about them. If you have to fiddle with the structure of the
i/o routines a lot, it can get tedious.
Perl is used a lot in the astronomy community for image processing
work. There, the odd syntax is actually advantageous. For example, if
you have a Perl piddle $x that contains an image,
$y = $x->slice("100:300,250:400");
doesn't do what you might expect. You may be thinking, as with
other languages, that $y is a new data object composed of a
sub-section of $x.
Wrong!
In the PDL nomenclature, $y is a handle to reference
the sub-section of the $x piddle. Changing $y in some
way will actually change that subsection of $x.
This ploy can make some image work very handy as well as saving memory.
Odd though it looks, it can be advantageous. If you actually want a
new matrix that is a subsection of the old one, not just a handle into it,
use the copy operator:
$y = $x->slice("100:300,250:400")->copy;
In support of its image processing functionality, PDL includes
the most comprehensive library of image file i/o functions of the languages
reviewed here. So if working with images is your desire, the PDL may be your
best solution. Perl is also a language with a lot of variation in expression.
If you like expressive freedom, again Perl and the PDL may be your best solution.
I've successfully used the PDL for time series processing, regression, and
filtering, in addition to image processing. It, like the others, is well
adapted for general purpose matrix processing problem solving.
There are a couple of drawbacks to using Perl as your go-to matrix language.
One is obvious from some of the code snippets -- there's likely some learning
to do. The other is, while the PDL offers an interactive interface
(perldl), it doesn't offer a development interface that's as handy for
program creation and testing as some of the other languages.
With any of the others, you may work on code in one window, then in the
matrix language window, reload and try out functions by hand. A handy way to
know that each function is doing what you expect.
With perldl, you can load your program and exercise portions of it, but
you can't reload it. You have to get out of perldl and back in again
to load your changes. Better than conventional programming, but not as
handy as just being able to reload your code in place.
While that situation is true of libraries, it's not necessarily true if you
use the AutoLoader extension and individual function files. You can set
up your .perldlrc file to always use AutoLoader, and to also set
autoloading to always rescan for new code versions like this:
$PDL::AutoLoader::Rescan=1, which makes PDL operate much like Octave.
With this setup, you can modify a .pdl function file, similar to an
Octave m file, and the next use of it in the perldl utility will reflect
the changes.
Though a bit odd in ways, Perl with the PDL is probably the most
flexible and far-reaching of matrix languages. Perl users know the term
TIMTOWTDI, which means: There Is More Than One Way To Do It.
That flexibility can take a bit of learning, but all you need is to pick
from the flexible nomenclature what fits your style, and never look back.