|
![](/i/fill.gif) |
On 23/02/2013 04:17 PM, Orchid Win7 v1 wrote:
> http://research.microsoft.com/en-us/um/people/simonpj/Papers/excel/excel.pdf
For those unable to read the paper, a short summary...
The number of people who use spreadsheets dwarfs the number of
professional computer programmers. And yet, some of these spreadsheets
are extremely complex, sophisticated constructions. In the main, these
are built not by trained programmers, but by accountants, salesmen,
purchasing associates, and anybody else who has a need to do complicated
numerical calculations.
A complex spreadsheet can be considered to be a "program", the formulas
in the cells being its "source code". And yet, accountants don't
generally sit down and learn to write code in C, C++, Java, Perl,
Python, R, or even something like Mathematica which is heavily
problem-oriented rather than machine-oriented. Why is that?
Arguably a spreadsheet is easier to comprehend. You type in a formula
and it makes a number. You can immediately *see* what is happening. In
traditional programming languages, you write a bunch of text, press go,
and maybe some kind of result is produced. It's far less visual, even if
you have a REPL to use. You don't literally *see* all the intermediate
calculation steps, unless you fire up a purpose-built debugger.
Which a spreadsheet, there is no separate coding and debugging phase.
You just have a grid of cells, which you can modify at will, at any
time, for any reason. This makes it easier to gradually construct a
sophisticated application from humble beginnings.
When you start to look at a spreadsheet as a program, you realise that
as a programming language, it's pretty damned weak. In particular, it is
impossible to define new data types, and it has impossible to define
subroutines. The *only* way to repeat a calculation on more than one
input is to literally copy and paste it. (And most spreadsheets have at
least moderately sophisticated mechanisms for doing intelligent copy and
pasting.)
As a trained computer programmer, this makes me gasp in horror. It is a
clear violation of the DRY principle. But as an accountant, or a
quantity surveyor, you're not going to know or care about such things.
You're just trying to produce this month's figures (or whatever).
There *is* of course a facility in Excel to define functions. However,
this requires you to learn Visual Basic, a completely different
programming language, with radically different semantics. It's a bit
like the way C lets you write macros, but in a totally different
language (CPP), whereas Lisp lets you write macros in Lisp.
So, the main thrust of the paper is "what if you could write Excel
functions in Excel?"
The mechanism they chose is to let you define a special "function
worksheet". Certain cells are the function's input, and a certain cell
is the function's output. Every time you call that function, notionally
a new copy of this function worksheet is dynamically created with the
appropriate cells filled in, and whatever ends up in the output cell is
copied back to the formula that mentions the function.
That's the gist of the idea. Naturally, if you edit one instance of the
function worksheet, the change affects *all* instances, automatically.
And therein lies the payoff.
Trouble is... Well, actually there's a whole *host* of problems. Problem
#1 is that your new function might be called twenty billion times.
You're not going to display twenty billion worksheet tabs! The solution
they came up with as that only one "instance" of each function sheet can
be visible at once.
Problem #2 is "how do we support recursion?" They chose the solution
"you don't". Which seems reasonable; how many people actually want to
use that feature? Do we really want to complicate the interface further
just to make those guys happy?
Problem #3 is how to support functions that take *ranges* of cells as
input. Their solution was to make a cell matrix a first-class value, so
that a single cell can contain an entire matrix. And since a matrix is
made up of cells, matrices can be nested arbitrarily deep.
In short, the solution is to UTTERLY BREAK THE SPREADSHEET MODEL! This
is the least convincing part of the paper for me. The cost of the user
learning this completely new mechanism seems too high compared to the
modest payoff of handling ranges of different sizes. I think a better
approach could be found.
So there we are. There's lots of screenshots, which gives the
*impression* that they had a working product. In 2003. But in the
conclusion they explain that these are simply mockups and simulations,
not a real, usable product. They make it clear that they are in
discussions with Microsoft about making this feature available in the
real commercial Excel product (Simon PJ works at Microsoft Research
Cambridge, after all). But, clearly, this never came to anything.
I think this could very easily be one of those features that nobody
knows about and nobody ever uses, unless you made it *extremely* simple
to comprehend and to use. For example, the vast majority of people who
use Word apparently have no idea what the hell a "mail merge" is, or
what Word has the capability to do such a thing. (Then again, most
people probably don't *need* it either...)
I think it would be very cool to define functions in Excel, but I doubt
Excel will be the one to pioneer this. I'm also not sure how much
anybody would use it...
Post a reply to this message
|
![](/i/fill.gif) |