Eighth Circle

From time to time I must delve into the arcane arts in order to peer into the dark mists of the unknown and divine what things shall be. In other words, come up with forecasts. My posts might slack off, as crunching numbers all day tends to leave me wanting to just kick back and chill.

It also means I’m mindful that Dante put diviners in the Eighth Circle of Hell, in Bolgia 4, conveniently located between the simonists and the crooked politicians. Niven and Pournelle updated this to include prognosticators of a more conventional sort. At other times I wonder if forecasting could be classified as hard science fiction, since it’s based on extrapolations of data rather than flights of fancy.

A vital part of the job involves comparing past forecasts with reality to keep them accurate. In a way it’s like firing ranging shots at a moving target from an early 20th Century battleship. Each shot gets you closer, but the target is moving as well and might be so rude as to change course. That’s the problem with any data model, whether it’s an equation fitted to the data and extrapolated forward, or a fancier computer model for more complex systems. Either way, it’s all a model, and with any model you pick what you think matters most in order to simulate the data and make predictions. Some of our equations are derived this way, which leads to the classic physics joke, “Assume a spherical cow.”
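To make that concrete, here’s a minimal sketch of the simplest case: fitting a straight line to some numbers and extrapolating from it. Both the data and the linear form are invented assumptions for illustration, not anyone’s actual forecast.

```python
import numpy as np

# Hypothetical monthly observations -- numbers invented purely
# for illustration.
months = np.array([1, 2, 3, 4, 5, 6])
values = np.array([10.2, 11.1, 11.9, 13.2, 13.8, 15.1])

# The "model" here is the assumption that the trend is a straight
# line; that choice is ours, not the data's.
slope, intercept = np.polyfit(months, values, 1)

# Extrapolate three months ahead. If the real trend changes
# course, these forecasts will quietly go wrong.
for m in (7, 8, 9):
    print(f"month {m}: forecast {slope * m + intercept:.1f}")
```

Even in this toy case the key point holds: the fit is only as good as the assumption that tomorrow behaves like yesterday.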

Yes, even our laws of science and equations are models. We don’t think of them as such, but laws are theories about how our world works that lend themselves to equations that model the data. The results agree with real data very well, so we know they’re accurate models, indeed.

Even so, you need enough data to use them correctly. Then you have to make sure your data is good and know its margin of error. Then you have to watch out for assuming greater precision than actually exists in the data. That’s very easy to do, especially in these days of computers and pocket calculators that can spit out streams of digits to the right of the decimal point. So, if your data is good to plus or minus 0.1, and your calculation from that data comes out to 0.768561, you can forget about any precision beyond the first decimal place, because it’s just not there. The best you can do is round it to 0.8 and note the margin of error is plus or minus 0.1.
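Here’s that rounding discipline as a quick sketch in code, using the figures from the example above:

```python
from decimal import Decimal, ROUND_HALF_UP

margin_of_error = 0.1    # the data is only good to +/- 0.1
raw_result = 0.768561    # what the calculator spits out

# Round to the same decimal place as the margin of error; any
# digits beyond that are noise, not precision.
rounded = Decimal(str(raw_result)).quantize(
    Decimal("0.1"), rounding=ROUND_HALF_UP)
print(f"{rounded} plus or minus {margin_of_error}")  # 0.8 plus or minus 0.1
```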

Sometimes you just don’t have good data but think you can estimate it. In that case, write down how you derived your estimate and why you think it’s valid, and make sure everyone knows it. That’s good practice even for solid data: note something like “The data was obtained from thus-and-so, with an accuracy of such-and-such.”

If you have to roll your own model, mathematical or computer, you also have to write down your assumptions: what drives the model, by how much, what factors you’re using, and why. Not only does this help someone else double-check the model, it helps you check it when you have real data to compare against, because usually your model will be off a little. The more complicated the system you’re modeling, the more likely you’ll have to revise it.
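One hypothetical way to do this, sketched below, is to keep the assumptions in the same file as the model so they travel together. The growth rate and its caveats are made up for the example.

```python
# A toy model with its assumptions written down next to the code,
# so they can be checked once real data arrives. Every figure
# below is invented for illustration.
ASSUMPTIONS = {
    "growth": "demand grows 2% per month (rough estimate, unverified)",
    "seasonality": "ignored -- assumed small relative to the trend",
    "horizon": "not sanity-checked beyond 12 months out",
}

def forecast_demand(month: int, base: float = 100.0) -> float:
    """Project demand assuming steady 2% monthly growth from a base."""
    return base * 1.02 ** month

if __name__ == "__main__":
    for factor, note in ASSUMPTIONS.items():
        print(f"{factor}: {note}")
    print(f"month 6 forecast: {forecast_demand(6):.1f}")
```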

You even need to double-check proven models, such as established equations and physical laws. In the sciences, it’s interesting when the model doesn’t match the data, because that means something else is going on, and that’s where discoveries are made.
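A minimal sketch of that kind of check, with invented numbers: line up old forecasts against the actuals and watch the residuals.

```python
# Compare past forecasts with what actually happened. The numbers
# are made up for illustration.
predicted = [15.8, 16.4, 17.1]
actual = [15.9, 16.1, 18.4]

for p, a in zip(predicted, actual):
    print(f"predicted {p:5.1f}  actual {a:5.1f}  residual {a - p:+.1f}")
# A residual that keeps growing is the model telling you that
# something else is going on.
```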

Remember, though, the dictum of Alfred Korzybski: “The map is not the territory.” We could say the model is not the thing it represents. You might be going “Duh!” but it’s surprisingly easy to become so attached to a model, particularly your own, that you forget it’s a representation of reality, not reality itself. You even see this in engineering, but there it tends not to last, because in engineering bad things can happen, and usually do, if the model is not valid. That’s why there’s testing, to make sure nothing was missed and the assumptions were valid. In the sciences, though … well, things can get contentious. More than once a faction has clung to a model even though it does not agree with reality. A classic example is phlogiston theory versus oxidation: Priestley never really gave up on phlogiston, even after Lavoisier disproved it. Such is the emotional attachment one can have to models.

More sinister is the mentality that realizes the model is not valid and tries to hide it. Mostly this is due to emotional investment; at worst it’s due to a desire to profit by it. Either way, scientists have been caught fudging data to prop up a thesis, and one study found fraud growing in scientific papers. Nor is this a problem only in science; it can occur in other disciplines as well, including engineering. That’s why, if someone isn’t willing to share their data or assumptions, big warning flags should go up. Maybe it’s just ego (that happens), or maybe it’s something else.

Either way, any modeling should be done with the utmost honesty, and with meticulous documentation. Any shame from being shown to be wrong is minor compared to the disgrace of being found a fraud. Not to mention the damage done to those who trusted in a bad model.

Dante’s The Divine Comedy was a work of fiction, but he did have a point. It’s something all of us involved in forecasting need to remember.
