In the Financial, Math, and Statistical areas,
users have a lot of choices. But I primarily work in the matrix area, and
that part is a bit weak, compared to the matrix languages I'm familiar with.
In addition to a generous collection of intrinsic functions, Google Sheets
has additional features that extend the spreadsheets' capabilities. It has, for
example, pivot tables, which is a feature that provides powerful ways to
build summary sheets from large data sets. Plus, I can publish sheets,
such as this one that presents a collection of Astronomical
Tables that are popular with amateur astronomers. Included are data and
summary sheets for the Messier, Caldwell, and Herschel 400 astronomical target lists.
The pivot tables feature was used to create the summary pages for the
different star object collections. Each summary is available as a tab at the
top of the main sheet, and each shows how many of each type of star object are
listed in the collection. Publishing it as a web page was just a click under
the spreadsheet File menu. This is just a very limited example of
what can be accomplished with the pivot tables feature.
Also available in Google Sheets is a powerful custom formatting tool that
lets users set color, font, and other formatting aspects on a cell or array of
cellls based on a wide range of criteria. Plus, I can apply names to ranges of
the spread sheet for a much easier way to reference ranges in equations and
functions.
Like most spreadsheets, Google Sheets is primarily cell
oriented. One enters equations into cells, and then copies them to all other
cells where the same equation is desired. For example: the equation
=sin(A1)*10 would automatically become =sin(B1)*10 if copied to
the cell to the right, or =sin(A2)*10 if copied to the cell beneath the
initial equation cell. By thus entering equations over ranges of a spreadsheet,
matrix operations can be accomplished.
Matrix languages tend to be, as the name implies, matrix
oriented. A single command is used to specify an operation over a range
of matrix elements. A matrix may be named X for example, and the
equation: Y = sin(X)*10 would automatically apply the equation to
every element of the X matrix, storing the result in the Y
matrix.
So matrix languages tend to be more terse, and equations treat entire
matrices as algebraic objects instead of each element (or spreadsheet
cell) as an object.
In Google Sheets' defense, however, I must mention that in the minimal
collection of array operators available, there is one called
ARRAYFORMULA, which allows a user to write an equation that refers to a
range instead of an individual cell. So if I had, say, 100 cells in the range
A1 to A100, the ARRAYFORMULA operator could be used as follows:
=ARRAYFORMULA(sin(A1:A100)*10), which would propagate 100 values in a
column, starting at the ARRAYFORMULA cell.
This operator is very handy for us old matrix language users, in that it
lets us express a range solution in a single equation. Very handy if I'm
tweaking parameters or trying different equations. Combining it with the Google
Sheets feature of letting the user apply names to ranges lets me express
equations similar to that of matrix languages. If, for example, the range
A1:A100 is named X using the range naming feature (Named
Ranges under the Data menu), the new formula would become
ARRAYFORMULA(sin(X)*10).
Google Sheets includes matrix multiply, transpose, and inverse functions in
its collection, so many matrix solutions can be obtained by using these
operators and the ARRAYFORMULA range operator. But I found these not to be
enough like the approach I generally took to solve problems with matrix
languages. So I explored the Google Sheets scripting feature. I found it
worked well, and for the matrix solutions I sought, was very easy to
implement.
Extend Google Sheets With Scripts
Along the top menu bar of a Google Sheets spreadsheet is an entry called
Tools. Clicking that reveals a number of utilities, one of which is
labeled Script Editor. That utility will open another tab to an editing
screen where I can create scripts tied to the spreadsheet from which it was
spawned. A typical function is already defined, to give one a start. The
scripts are written in Javascript, though rather than being executed on the
client
computer, they are executed on Google servers.
The new (as of 2014 or so) version of Google Sheets makes passing range
expressions, cell values, and fixed parameters easy for script users.
Individual cells and fixed parameters are passed to script functions much like
one is used to when writing functions in any language. Ranges are simply passed
as matrices, which Javascript can handle easily.
For example, if I wanted a utility to scale a range of values by some
constant, the code could look like this:
function msca(m1,sf){ |
var nr = m1.length; |
var nc = m1[0].length; |
var m = new Array(nr); |
var i; |
var j; |
|
for(i=0 ; i<nr ; ++i){ |
m[i] = new Array(nc); |
for (j=0 ; j<nc ; ++j){ |
m[i][j] = m1[i][j] * sf; |
} |
} |
|
return m; |
} |
As you can see from the coding example, a script looks similar to any
routine you've likely written in many languages like FORTRAN or C, and received
parameters are simple and easy to work with. The routine can be passed a range
of any size, as it can determine the size before operating. The nr or
number of rows value is the length of the matrix, and the nc or number
of columns values is the length of the first row (m1[0]). I can refer to
the new function in my spreadsheet as follows: =msca(A1:C200,20) to
scale a range by the factor of 20, as an example. The constant 20 could be a
value in a cell, say D1, and called thus: =msca(A1:C200,D1).
The cool thing about user script functions is that they can be called by
other user script functions, not just referenced within the spreadsheet. So I
can write pretty complex operations, broken up into a number of easy to read
and maintain functions. Or, as I've done, create a fairly extensive library of
extensions (or multiple libraries).
How To Create Script Libraries
I found it easy to create custom scripts with the script editor, but I
wanted to create some general scripts that I might wish to use in many
spreadsheets. I didn't want to have to cut and paste scripts from one
spreadsheet to another, I wanted to be able to reference a library.
Sure enough, it's possible to do that. From within the Script Editor
under the File menu, I can select Manage Versions, which
lets me save a script as a library. This library can be used by other sheets.
To use a saved library in a new spreadsheet, I need the Project Key
for the library. This key is found under the Files, Project
Properties click in the script editor used to create the library. Project
Properties opens a small window with information about the particular script,
including a rather long sequence of characters next to the label: Project
key.
In a spreadsheet needing access to a saved library, I select the
Resources menu from the spreadsheet's script editor window. From the
list display, I select Libraries. This brings up a window that allows me
to enter Project keys from libraries that are desired. This only has to
be done once, because the library selections are saved with the
spreadsheet. If I revise a library from the script editor associated with
the spreadsheet where I developed the code, I might need to visit
spreadsheets using the library, select from the respective script editors
Resources
selection, and select the latest revision of the revised
library.
To actually be able to call a function that's in a library listed in
Resources, I have to write a dummy
function that references the library
function, and then return the result. For example, I have a library I called
matlib, and within it is a function named mrand. The mrand
function returns an array of uniform random numbers. To use it, first I add the
library Project Key to my Script Editor Resources-Libraries menu. Then in my
script editor, I write a small function that calls the library mrand
function:
function myrand(nr,nc){ |
var m = matlib.mrand(nr,nc); |
return m; |
} |
The local spreadsheet defined myrand function references the library
version matlib.mrand and returns the result. The dummy function may use
the same name as the library function if desired, mrand in this case. I must
have such a local function for each library function I wish to reference from
my spreadsheet.
Since I've already added the matlib key to my resources, I can further
create any other functions that reference matlib functions. All matlib
routines are at my disposal because the matlib key is in my spreadsheet
resources.
Given how easy it was to write most routines and interface with the
spreadsheet, I created a healthy library of matrix routines to mimic much of
what I had previously done with matrix languages. I added routines to do the
basics, including add, subtract, scale, scalar multiply, matrix multiply,
inverse, and transpose matrices. Then I scanned through the matrix languages
with which I'm familiar, and added routines to accomplish some of what was
intrinsic in them, like matrix builders that will return an identity matrix, a
matrix of some constant, a matrix of sequential values, or a matrix of random
numbers.
Armed with this collection, I'm able to attack problems in much the way I
was accustomed, even though working in a spreadsheet. It gives me many more
extensions that work on spreadsheet ranges (matrices) instead of working
with cell equations.
So What Can You Do With Custom Scripts?
As the 3D plot illustrated, I was able to add some interesting graphics
ability to the spreadsheet. At the time of this writing, Google Sheets has no
contour plotting function either. But using the spreadsheets Custom
Formatting feature, I'm able to set up a color scheme for a 40x40 grid area
of the spreadsheet such that the colors used for the cells depended upon what
value the cells contain. I created a function that will take any array,
stretch or compress it to a 40x40 cell array, normalize it to match the values
in the custom formatting table, then return the values to the custom formatted
area of the spreadsheet.
The result? color contour maps, like the one
below:
Color Contour Plot
Not high resolution, for sure. But a usable color contour that when combined
with the 3D plotting feature, adds the ability to do some interesting work
outside of the more common financial exploration commonly pursued with Google
Sheets. As a matter of fact, the 3D terrain plot, shown again below, is a view
of the region presented in the contour plot, viewed from the left (or west)
side. Notice the low area (blue in the contour plot) to the left foreground in
the 3d plot, and the mountain peaks (white in the contour plot) to the
right.
Terrain 3D Plot
When all of that worked well, I added additional features, including
interpolation, derivative, integration, FFT, PSD, and filter functions. A
pretty good starting point for attacking the time-series type problems I often
did with matrix languages.
As to speed, Google Sheets intrinsic functions are very fast, yielding
virtually instant results when operating over thousands of cells. The custom
functions are slower, but still return results in just a few seconds when
operating over, say, 5000 rows and half a dozen columns. In fact, the
requirement by Google is that scripts return in under 30 seconds, and I've had
few occasions when functions took that long.
I've read of scripts that are exceedingly slow, but invariably that
problem occurred when users referenced spreadsheet cells within loops, rather
than operating as my matrix utilities do, on entire matrices passed in bulk. My
utilities receive all data at the start, and return a completed matrix at the
end, doing all work on the bulk-received data. That works well.
Why Google Sheets Over A Matrix Language?
So why bother with a spreadsheet when I have access to matrix languages?
Actually, there are a few reasons. First, the Google Sheets with scripting
ability provides a serious tool for Chromebook users, who generally do not
have access to the faster matrix languages. Chromebooks already offer
great portability, long battery life, and quick boot up. With Google Sheets,
they offer mathematics and programming as well.
Second, Google Sheets is available to me on all of my computers, as well
as is my most recent work. I can start a project on a desk top, add a little
from my laptop when it's more convenient, and even do the work on my Chromebook.
The omnipresent access is pretty handy.
Third, though not as much for me, work on Google Sheets is easily shared or
available to groups. I can see that if I were working on a team project, or was
teaching some math technique to a class, it would be very handy.
Google Scripts Guidelines
If you decide to check out Google Sheets for its scripting feature, you
might find some of the following guidelines helpful.
Work on bulk data
As the illustrated code suggests, pass in entire sections (ranges) of
the spreadsheet, and work through the elements of those. Loading values takes
time, but if there's only one initial load (arguments passed in) and one
return, then the code works much faster.
Mind the inner loops
Most of the work in a routine is done on the inner loops. Making inner
loops most efficient will pay dividends. As an illustration, I created
a function that lets me pass in an equation as a string, plus up to three
matrices that can be used in the equation. I find it easier to work with
than the aforementioned ARRAYFORMULA (though ARRAYFUNCTION operates
faster). My function could be called thus:
=solve(x sin sqr y cos sqr + sqrt
,a1:a100,b1:b100)
This equation takes the sine of each cell in the a
array, squares it,
takes the cosine of each corresponding element of the b
array, squares
it, adds them together, and takes the square root of the result. It will do
this through the range of the input arrays, and return a solution array of the
size of the input arrays. I find it easier to pass the arrays in, and then be
able to reference them with x, y, and z (if all three arrays are passed), than
using range nomenclature with ARRAYFORMULA. You may have noticed that
solve is a reverse polish (RPN) solver, which is much easier to
implement.
When I first created solve, I ended up with a lengthy switch
statement to handle all of the possible equation function references, and that
being referenced on the inner loop. The routine has 3 nested loops. Looping
over rows, then columns , then applying each equation element to the
cells in the columns. The delay caused by referencing the large switch
statement for each operation on each cell was intolerable.
Javascript, being the strange animal that it is, allowed me to move my big
switch statement up before the loops that were applying the equation, wherein
the switch statement is looped only once through the equation elements, with
the respective case
statments storing an anonymous function for each of
the specified equation elements into a function array. Each element of the new
array is then a function to do the referenced equation operation, like +
for add, sqrt
for square root. Sort of a poor man's compile.
Now the inner loop, where the equation is applied, consists of only running
functions in an array, no if's or switches. The looping portion of the code is
shown below:
for(i=0 ; i<nr ; ++i){
for(j=0 ; j<nc ; ++j){
for(k=0 ; k<nops ; ++k){
f[k]();
}
m[i][j] = st.pop();
}
}
|
In the code snippet the functions in the f
array work with a data
array named st (used as an RPN stack), The first loop goes through the
nr
rows, the next loop the nc
columns, and the inner loop through
the nops
number of operations specified in the equation. The use of
anonymous functions stored in the f
array makes the inner loop run quite
fast. After execution, the m
array, containing the results left on the
st stack after each set of operations is performed, is returned.
Library Caveats
Google has published guidelines for writing effective and efficient
custom functions. One thing they caution against is the use of libraries.
This is primarily a caution for Cell
oriented functions. If for
every cell in a large array, a call is made to a function that must
reference a library function, the delays can become significant and frustrating.
However, I've found that there is little effect with the functions I've
written, in that they all work with bulk-passed arrays, and tend to be called
only once to operate on hundreds to thousands of cells. With this technique,
the overhead of calling a library function is swamped by the moving of the
bulk data to and from the function.
In fact, as of this writing, I've found that the primary delay in using
custom functions is the movement of the bulk data. Replacing the matrix return
statement in a complex custom function with a return of a Done
message
reveals that the functions actually run very fast. The time delay is primarily
the return of bulk data from the Google server to the open spreadsheet.
Reducing Scripting Delays
So a sub-optimal approach when using scripts would be to use several scripts
whose inputs are dependent on one another, causing the delays to add up.
Script 2 may have to wait while the data transfers to/from servers finish for
script 1, then script 3 waits for the delays from script 2, etc.
To get around this, write an uber script that is passed data and then
internally runs the other scripts, returning all desired results in a final
transfer. This way there is only one to
and one from
transfer
delay between spreadsheet and servers, and little if any delay between
script calls within the uber script.
Clear as mud? Here's an example. I was checking out the function of an
FFT PSD routine and an autocorrelation PSD routine, both of which return
a Power Spectral Density computation from input data. For the data, I was
running a script that produced uniform random numbers passed through a simple
filter, the filter being used to control the frequency distribution of the
random sequence. Below is a graph of what I was examining. The noisy (blue)
line is the FFT PSD results, and the smooth (red) line is the
autocorrelation PSD results.
Initially, I had a script that produced the random numbers, a script that
applied the filter, a script that did the FFT PSD on the filtered output, and
a script that did the autocorrelation PSD on the filtered output. The filter
script had to wait for the random numbers, and the PSD scripts had to wait
for the filter output. So there were three data movement delays if I
regenerated the random numbers, or at best two delays if I re-used the
same random sequence but re-ran the filter script with a different
coefficient.
To speed the process up, I rewrote the filter using Google Sheets intrinsic
equations, so that the filter process happened almost immediately when the
filter coefficient was user modified. So the only pause was the two PSD
scripts, which ran in tandem since they didn't depend upon one another. The
overall time for an analysis run was cut in half.
But just as fast was an uber script that was passed the number of points I
wanted, and the filter value. The uber script internally called a random number
script, a filter script, and the two PSD scripts, returning the PSD data as
two columns. This script was as fast as the version re-using the same random
sequence and doing filtering with intrinsic equations. So more work could be
done, generating random numbers each time, yet run just as fast as long as only
one data transfer between servers and the spreadsheet was required.
So a general rule of thumb is: combine dependent scripts into an uber script
so they are executed in sequence within the uber script rather than waiting on
one another in the spreadsheet. Another point is that summary type scripts work
even better, in that the amount of returned data is less. Larger data returns
cause bigger delays.
Summary
If you have a Chromebook and have the itch to write some code, but
aren't into web page making -- check out the scripting utility of Google
Sheets. You get all the fun of coding, but none of the headaches of creating
web pages. You can let the spreadsheet handle all of the display issues and
file handling -- you get to have fun just creating solutions. As a plus,
you'll have access to your projects from virtually any computer, anywhere.