Linux Goodies

In Pursuit Of The Perfect O/S


HOME

ASTRO SCI FI CALC MENU

A Review of The Incredible Puppy Linux System

Amazon Computers

Google Sheets Review

Sinx over x 3d plot with Google Sheets

Sin(x)/x 3D Plot

The above Mexican Hat plot looks familiar, right? It's the sin(x)/x function that occurs in mathematics on occasion. The 3D version is always eye-catching.

So what's special about this particular plot?

It was made with Google Sheets, the freely available spreadsheet offered by Google. If you didn't think Google Sheets had 3D plotting ability, you'd be right, at the time of this article anyway. But Google Sheets does have a considerable number of plotting functions, and one will plot all rows or columns of a selected area of the spreadsheet. To get hidden line 3D plots, I had only to make a small custom function that makes a special matrix out of one I want to see in 3D, with the special matrix having the hidden line feature. Then I'm able to use the built in multi-row or column plot feature of Google Sheets on the special matrix to make the above rendition.

The default Google Sheets plot routine makes each line a different color, but it is possible to change the color of each line with the chart editing menu to whatever color desired -- one line at a time. In this case, instead I used Mtpaint to make the lines the color I wanted.

Ok, so a contrived function can make a 3D plot of an arcane mathematical function. If that's what you might be thinking, think again. The routine takes any topographic (matrix of elevations or values), and makes a corresponding 3D hidden line matrix ready for plotting. Below is a 3D rendering of a section of mountainous terrain:

Terrain 3d plot with Google Sheets

Terrain 3D Plot

Google Sheets Review On A Linux Site?

Okay, but what's a Google Sheet article doing on a Linux site anyway?

I will endeavor to explain.

Recently I purchased a Chromebook. Prior to purchase, I'd studied Chromixium enough to know pretty much what the Chromebook would do for me, so I bit. I reviewed both Chromixium and the Chromebook on this site. Chromixium is based on a minimal Ubuntu Linux operating system, and likewise the Chromebook OS is built on a similar minimal Linux system. So they fit well in the subject matter of this site.

I began to explore Google Sheets as a way of stretching the limits of what I could accomplish with the Chromebook. While I can't do a lot of programming on the Chromebook directly, as it supports only web-based programming with Javascript, CSS, and HTML, I can make heavy use of the Google Office products. And Google Sheets gives me the ability to write custom functions, even libraries of functions.

So by way of Google Sheets, I found I have the ability to do the kind of programming I'm used to even with my Chromebook, creating mathematical and data processing tools. I can do that by making extensive use of the Google Sheets scripting tools. I'll cover information about Google Sheets scripts in this article, but many more of Google Sheets secrets are revealed in Google Sheets Reference and Cheat Sheet: The unofficial cheat sheet reference for Google's free online spreadsheet application.

I figure that Google Sheets, with its scripting capability, fits well in this site just as the reviews of many other mathematical, Matrix Languages available to Linux users. The table below lists matrix languages reviewed here:

Euler Math Toolbox Review
Octave Language Review
PDL Language Review
R Language Review
Scilab Language Review
Tela Language Review
Yorick Language Review

Granted, a spreadsheet isn't quite like these languages, but it shares many characteristics, is available to Linux and Chromebook users, and many people primarily use a spreadsheet to do the things I and others tend to do with matrix languages. Since Google Sheets is a primary tool available to me on a Chromebook, I decided to see to what degree I could accomplish what I was used to doing with the matrix languages in the above table.

Google Sheets For Number Crunching?

Starting out, Google Sheets offers a lot. As the table below illustrates, at the time of this writing Google Sheets has 350 intrinsic functions, divided over the different categories as listed in the table.

CategoryNumber
Array15
Database 12
Date 23
Engineering 13
Filter 3
Financial 44
Google 15
Info 17
Logical 7
Lookup 15
Math 60
Operator 15
Parser 6
Statistical 73
Text 32
Grand Total 350

In the Financial, Math, and Statistical areas, users have a lot of choices. But I primarily work in the matrix area, and that part is a bit weak, compared to the matrix languages I'm familiar with.

In addition to a generous collection of intrinsic functions, Google Sheets has additional features that extend the spreadsheets' capabilities. It has, for example, pivot tables, which is a feature that provides powerful ways to build summary sheets from large data sets. Plus, I can publish sheets, such as this one that presents a collection of Astronomical Tables that are popular with amateur astronomers. Included are data and summary sheets for the Messier, Caldwell, and Herschel 400 astronomical target lists.

The pivot tables feature was used to create the summary pages for the different star object collections. Each summary is available as a tab at the top of the main sheet, and each shows how many of each type of star object are listed in the collection. Publishing it as a web page was just a click under the spreadsheet File menu. This is just a very limited example of what can be accomplished with the pivot tables feature.

Also available in Google Sheets is a powerful custom formatting tool that lets users set color, font, and other formatting aspects on a cell or array of cellls based on a wide range of criteria. Plus, I can apply names to ranges of the spread sheet for a much easier way to reference ranges in equations and functions.

Like most spreadsheets, Google Sheets is primarily cell oriented. One enters equations into cells, and then copies them to all other cells where the same equation is desired. For example: the equation =sin(A1)*10 would automatically become =sin(B1)*10 if copied to the cell to the right, or =sin(A2)*10 if copied to the cell beneath the initial equation cell. By thus entering equations over ranges of a spreadsheet, matrix operations can be accomplished.

Matrix languages tend to be, as the name implies, matrix oriented. A single command is used to specify an operation over a range of matrix elements. A matrix may be named X for example, and the equation: Y = sin(X)*10 would automatically apply the equation to every element of the X matrix, storing the result in the Y matrix.

So matrix languages tend to be more terse, and equations treat entire matrices as algebraic objects instead of each element (or spreadsheet cell) as an object.

In Google Sheets' defense, however, I must mention that in the minimal collection of array operators available, there is one called ARRAYFORMULA, which allows a user to write an equation that refers to a range instead of an individual cell. So if I had, say, 100 cells in the range A1 to A100, the ARRAYFORMULA operator could be used as follows: =ARRAYFORMULA(sin(A1:A100)*10), which would propagate 100 values in a column, starting at the ARRAYFORMULA cell.

This operator is very handy for us old matrix language users, in that it lets us express a range solution in a single equation. Very handy if I'm tweaking parameters or trying different equations. Combining it with the Google Sheets feature of letting the user apply names to ranges lets me express equations similar to that of matrix languages. If, for example, the range A1:A100 is named X using the range naming feature (Named Ranges under the Data menu), the new formula would become ARRAYFORMULA(sin(X)*10).

Google Sheets includes matrix multiply, transpose, and inverse functions in its collection, so many matrix solutions can be obtained by using these operators and the ARRAYFORMULA range operator. But I found these not to be enough like the approach I generally took to solve problems with matrix languages. So I explored the Google Sheets scripting feature. I found it worked well, and for the matrix solutions I sought, was very easy to implement.

Extend Google Sheets With Scripts

Along the top menu bar of a Google Sheets spreadsheet is an entry called Tools. Clicking that reveals a number of utilities, one of which is labeled Script Editor. That utility will open another tab to an editing screen where I can create scripts tied to the spreadsheet from which it was spawned. A typical function is already defined, to give one a start. The scripts are written in Javascript, though rather than being executed on the client computer, they are executed on Google servers.

The new (as of 2014 or so) version of Google Sheets makes passing range expressions, cell values, and fixed parameters easy for script users. Individual cells and fixed parameters are passed to script functions much like one is used to when writing functions in any language. Ranges are simply passed as matrices, which Javascript can handle easily.

For example, if I wanted a utility to scale a range of values by some constant, the code could look like this:

function msca(m1,sf){
  var nr = m1.length;
  var nc = m1[0].length;
  var m = new Array(nr);
  var i;
  var j;
 
  for(i=0 ; i<nr ; ++i){
    m[i] = new Array(nc);
    for (j=0 ; j<nc ; ++j){
      m[i][j] = m1[i][j] * sf;
    }
  }
 
  return m;
}

As you can see from the coding example, a script looks similar to any routine you've likely written in many languages like FORTRAN or C, and received parameters are simple and easy to work with. The routine can be passed a range of any size, as it can determine the size before operating. The nr or number of rows value is the length of the matrix, and the nc or number of columns values is the length of the first row (m1[0]). I can refer to the new function in my spreadsheet as follows: =msca(A1:C200,20) to scale a range by the factor of 20, as an example. The constant 20 could be a value in a cell, say D1, and called thus: =msca(A1:C200,D1).

The cool thing about user script functions is that they can be called by other user script functions, not just referenced within the spreadsheet. So I can write pretty complex operations, broken up into a number of easy to read and maintain functions. Or, as I've done, create a fairly extensive library of extensions (or multiple libraries).

One Script Caveat, Scripts Can't Reference Intrinsic Functions

There is a caveat, however. One can not reference Google Sheet intrinsic functions, like the built-in matrix routines, from within a user function. There are a number of interface functions available to script writers to make use of other Google apps, however, like the ability to get data from Google Analytics or Google Adsense.

So, while there is a matrix inverse routine as part of the intrinsic spreadsheet array functions, it is not callable from within a user defined script. I had to write one to be able to call it from within my other scripts. Fortunately, in this age of the Internet, it's usually not a long search to see how to write any specific routine.

How To Create Script Libraries

I found it easy to create custom scripts with the script editor, but I wanted to create some general scripts that I might wish to use in many spreadsheets. I didn't want to have to cut and paste scripts from one spreadsheet to another, I wanted to be able to reference a library.

Sure enough, it's possible to do that. From within the Script Editor under the File menu, I can select Manage Versions, which lets me save a script as a library. This library can be used by other sheets.

To use a saved library in a new spreadsheet, I need the Project Key for the library. This key is found under the Files, Project Properties click in the script editor used to create the library. Project Properties opens a small window with information about the particular script, including a rather long sequence of characters next to the label: Project key.

In a spreadsheet needing access to a saved library, I select the Resources menu from the spreadsheet's script editor window. From the list display, I select Libraries. This brings up a window that allows me to enter Project keys from libraries that are desired. This only has to be done once, because the library selections are saved with the spreadsheet. If I revise a library from the script editor associated with the spreadsheet where I developed the code, I might need to visit spreadsheets using the library, select from the respective script editors Resources selection, and select the latest revision of the revised library.

To actually be able to call a function that's in a library listed in Resources, I have to write a dummy function that references the library function, and then return the result. For example, I have a library I called matlib, and within it is a function named mrand. The mrand function returns an array of uniform random numbers. To use it, first I add the library Project Key to my Script Editor Resources-Libraries menu. Then in my script editor, I write a small function that calls the library mrand function:

    function myrand(nr,nc){
      var m = matlib.mrand(nr,nc);
      return m;
    }

The local spreadsheet defined myrand function references the library version matlib.mrand and returns the result. The dummy function may use the same name as the library function if desired, mrand in this case. I must have such a local function for each library function I wish to reference from my spreadsheet.

Since I've already added the matlib key to my resources, I can further create any other functions that reference matlib functions. All matlib routines are at my disposal because the matlib key is in my spreadsheet resources.

Given how easy it was to write most routines and interface with the spreadsheet, I created a healthy library of matrix routines to mimic much of what I had previously done with matrix languages. I added routines to do the basics, including add, subtract, scale, scalar multiply, matrix multiply, inverse, and transpose matrices. Then I scanned through the matrix languages with which I'm familiar, and added routines to accomplish some of what was intrinsic in them, like matrix builders that will return an identity matrix, a matrix of some constant, a matrix of sequential values, or a matrix of random numbers.

Armed with this collection, I'm able to attack problems in much the way I was accustomed, even though working in a spreadsheet. It gives me many more extensions that work on spreadsheet ranges (matrices) instead of working with cell equations.

So What Can You Do With Custom Scripts?

As the 3D plot illustrated, I was able to add some interesting graphics ability to the spreadsheet. At the time of this writing, Google Sheets has no contour plotting function either. But using the spreadsheets Custom Formatting feature, I'm able to set up a color scheme for a 40x40 grid area of the spreadsheet such that the colors used for the cells depended upon what value the cells contain. I created a function that will take any array, stretch or compress it to a 40x40 cell array, normalize it to match the values in the custom formatting table, then return the values to the custom formatted area of the spreadsheet.

The result? color contour maps, like the one below:

Color Contour plot with Google Sheets

Color Contour Plot

Not high resolution, for sure. But a usable color contour that when combined with the 3D plotting feature, adds the ability to do some interesting work outside of the more common financial exploration commonly pursued with Google Sheets. As a matter of fact, the 3D terrain plot, shown again below, is a view of the region presented in the contour plot, viewed from the left (or west) side. Notice the low area (blue in the contour plot) to the left foreground in the 3d plot, and the mountain peaks (white in the contour plot) to the right.

Terrain 3d plot with Google Sheets

Terrain 3D Plot

When all of that worked well, I added additional features, including interpolation, derivative, integration, FFT, PSD, and filter functions. A pretty good starting point for attacking the time-series type problems I often did with matrix languages.

As to speed, Google Sheets intrinsic functions are very fast, yielding virtually instant results when operating over thousands of cells. The custom functions are slower, but still return results in just a few seconds when operating over, say, 5000 rows and half a dozen columns. In fact, the requirement by Google is that scripts return in under 30 seconds, and I've had few occasions when functions took that long.

I've read of scripts that are exceedingly slow, but invariably that problem occurred when users referenced spreadsheet cells within loops, rather than operating as my matrix utilities do, on entire matrices passed in bulk. My utilities receive all data at the start, and return a completed matrix at the end, doing all work on the bulk-received data. That works well.

Why Google Sheets Over A Matrix Language?

So why bother with a spreadsheet when I have access to matrix languages?

Actually, there are a few reasons. First, the Google Sheets with scripting ability provides a serious tool for Chromebook users, who generally do not have access to the faster matrix languages. Chromebooks already offer great portability, long battery life, and quick boot up. With Google Sheets, they offer mathematics and programming as well.

Second, Google Sheets is available to me on all of my computers, as well as is my most recent work. I can start a project on a desk top, add a little from my laptop when it's more convenient, and even do the work on my Chromebook. The omnipresent access is pretty handy.

Third, though not as much for me, work on Google Sheets is easily shared or available to groups. I can see that if I were working on a team project, or was teaching some math technique to a class, it would be very handy.

Google Scripts Guidelines

If you decide to check out Google Sheets for its scripting feature, you might find some of the following guidelines helpful.

Work on bulk data

As the illustrated code suggests, pass in entire sections (ranges) of the spreadsheet, and work through the elements of those. Loading values takes time, but if there's only one initial load (arguments passed in) and one return, then the code works much faster.

Mind the inner loops

Most of the work in a routine is done on the inner loops. Making inner loops most efficient will pay dividends. As an illustration, I created a function that lets me pass in an equation as a string, plus up to three matrices that can be used in the equation. I find it easier to work with than the aforementioned ARRAYFORMULA (though ARRAYFUNCTION operates faster). My function could be called thus:

=solve(x sin sqr y cos sqr + sqrt,a1:a100,b1:b100)

This equation takes the sine of each cell in the a array, squares it, takes the cosine of each corresponding element of the b array, squares it, adds them together, and takes the square root of the result. It will do this through the range of the input arrays, and return a solution array of the size of the input arrays. I find it easier to pass the arrays in, and then be able to reference them with x, y, and z (if all three arrays are passed), than using range nomenclature with ARRAYFORMULA. You may have noticed that solve is a reverse polish (RPN) solver, which is much easier to implement.

When I first created solve, I ended up with a lengthy switch statement to handle all of the possible equation function references, and that being referenced on the inner loop. The routine has 3 nested loops. Looping over rows, then columns , then applying each equation element to the cells in the columns. The delay caused by referencing the large switch statement for each operation on each cell was intolerable.

Javascript, being the strange animal that it is, allowed me to move my big switch statement up before the loops that were applying the equation, wherein the switch statement is looped only once through the equation elements, with the respective case statments storing an anonymous function for each of the specified equation elements into a function array. Each element of the new array is then a function to do the referenced equation operation, like + for add, sqrt for square root. Sort of a poor man's compile.

Now the inner loop, where the equation is applied, consists of only running functions in an array, no if's or switches. The looping portion of the code is shown below:

for(i=0 ; i<nr ; ++i){
  for(j=0 ; j<nc ; ++j){
    for(k=0 ; k<nops ; ++k){
      f[k]();
    }
    m[i][j] = st.pop();
  }
}

In the code snippet the functions in the f array work with a data array named st (used as an RPN stack), The first loop goes through the nr rows, the next loop the nc columns, and the inner loop through the nops number of operations specified in the equation. The use of anonymous functions stored in the f array makes the inner loop run quite fast. After execution, the m array, containing the results left on the st stack after each set of operations is performed, is returned.

Library Caveats

Google has published guidelines for writing effective and efficient custom functions. One thing they caution against is the use of libraries. This is primarily a caution for Cell oriented functions. If for every cell in a large array, a call is made to a function that must reference a library function, the delays can become significant and frustrating.

However, I've found that there is little effect with the functions I've written, in that they all work with bulk-passed arrays, and tend to be called only once to operate on hundreds to thousands of cells. With this technique, the overhead of calling a library function is swamped by the moving of the bulk data to and from the function.

In fact, as of this writing, I've found that the primary delay in using custom functions is the movement of the bulk data. Replacing the matrix return statement in a complex custom function with a return of a Done message reveals that the functions actually run very fast. The time delay is primarily the return of bulk data from the Google server to the open spreadsheet.

Reducing Scripting Delays

So a sub-optimal approach when using scripts would be to use several scripts whose inputs are dependent on one another, causing the delays to add up. Script 2 may have to wait while the data transfers to/from servers finish for script 1, then script 3 waits for the delays from script 2, etc.

To get around this, write an uber script that is passed data and then internally runs the other scripts, returning all desired results in a final transfer. This way there is only one to and one from transfer delay between spreadsheet and servers, and little if any delay between script calls within the uber script.

Clear as mud? Here's an example. I was checking out the function of an FFT PSD routine and an autocorrelation PSD routine, both of which return a Power Spectral Density computation from input data. For the data, I was running a script that produced uniform random numbers passed through a simple filter, the filter being used to control the frequency distribution of the random sequence. Below is a graph of what I was examining. The noisy (blue) line is the FFT PSD results, and the smooth (red) line is the autocorrelation PSD results.

FFT PSD and Yule-Walker PSD

Initially, I had a script that produced the random numbers, a script that applied the filter, a script that did the FFT PSD on the filtered output, and a script that did the autocorrelation PSD on the filtered output. The filter script had to wait for the random numbers, and the PSD scripts had to wait for the filter output. So there were three data movement delays if I regenerated the random numbers, or at best two delays if I re-used the same random sequence but re-ran the filter script with a different coefficient.

To speed the process up, I rewrote the filter using Google Sheets intrinsic equations, so that the filter process happened almost immediately when the filter coefficient was user modified. So the only pause was the two PSD scripts, which ran in tandem since they didn't depend upon one another. The overall time for an analysis run was cut in half.

But just as fast was an uber script that was passed the number of points I wanted, and the filter value. The uber script internally called a random number script, a filter script, and the two PSD scripts, returning the PSD data as two columns. This script was as fast as the version re-using the same random sequence and doing filtering with intrinsic equations. So more work could be done, generating random numbers each time, yet run just as fast as long as only one data transfer between servers and the spreadsheet was required.

So a general rule of thumb is: combine dependent scripts into an uber script so they are executed in sequence within the uber script rather than waiting on one another in the spreadsheet. Another point is that summary type scripts work even better, in that the amount of returned data is less. Larger data returns cause bigger delays.

Summary



If you have a Chromebook and have the itch to write some code, but aren't into web page making -- check out the scripting utility of Google Sheets. You get all the fun of coding, but none of the headaches of creating web pages. You can let the spreadsheet handle all of the display issues and file handling -- you get to have fun just creating solutions. As a plus, you'll have access to your projects from virtually any computer, anywhere.