There has been quite a dramatic development of late that, in my opinion, delivers the final proof that Horizon Online is flawed and does in fact contain software errors that result in unexplained losses to SubPostmasters. More on that in detail in the days and weeks to come.
First, though, I need to settle once and for all an argument, discussion, call it what you will, about what constitutes a SYSTEMIC ERROR in a computing system.
The definition from the Law Dictionary
What is SYSTEMIC ERROR?
Error affecting all items comprising the group in a similar manner and magnitude. They are caused by a flaw in the system and occur in the same direction and don’t cancel each other out. Also known as constant error.
Using this definition, it is easy to see why some people, like Paula Vennells and George Thomson of the NFSP, interpret the term ‘systemic’ as describing an error that affects ONLY the whole group and not just part of the group. Paula does at least admit that from time to time Horizon will throw up a ‘systemic’ error, but says that such errors are identified and fixed. She relies totally on the misguided belief that, because the system handles millions of transactions successfully every day, there is not and cannot be an error in the system that they do not know about and that could be classed as systemic.
What annoys me more than anything is her refusal to listen and to consider that her interpretation is wrong: that systemic errors do exist in the system at all times, and that the only difference is that they do not manifest themselves in a way she and her team can readily categorise as systemic.
Systemic errors can and do show themselves in different forms and can be split into subcategories. One such category is an Intermittent Error. It manifests itself infrequently in a random pattern across the network. It affects only one or a few of the nodes in the network at a time. The affected nodes are not necessarily part of a subgroup of the network and the only common group they belong to is the whole network. That is important with reference to the definition of a systemic error above.
All nodes within the Post Office Network use (as far as I know) the same version of the Horizon Online Computer Program. Any software bug in the system is therefore present at every node. Only when the bug is triggered and its effect becomes apparent can it be recorded as an error in the system.
So how could software errors be classified as intermittent? Well, the simplest example is an unexpected sequence of events that the programmer of the system failed to take into account when writing the software, and that the testers of the system failed to take into account when testing what he had written. As a former programmer I can attest to the unbelievable variety of ‘unexpected sequences of events’ that can occur, and you have to ‘try’ to deal with them all.
Hardware issues and communication problems can give rise to an unexpected sequence of events as can an unexpected sequence of keystrokes.
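To make that concrete, here is a minimal sketch in Python. It is not Horizon code; the Till class, its event names and the amounts (in pence) are all invented for illustration. Every branch runs the identical program, so the bug is present everywhere, but it only shows itself at a branch where a timeout happens to fall between a sale and its confirmation.

```python
class Till:
    """Hypothetical till. Every branch runs this same code."""

    def __init__(self):
        self.balance = 0      # running total in pence
        self.pending = None   # sale waiting for confirmation

    def handle(self, event, amount=0):
        if event == "sale":
            self.pending = amount
        elif event == "confirm":
            if self.pending is not None:
                self.balance += self.pending
                self.pending = None
        elif event == "timeout":
            # Latent bug: the sale is posted "to be safe" after a timeout,
            # but pending is not cleared, so a later confirm posts it again.
            if self.pending is not None:
                self.balance += self.pending


def run(events):
    """Feed a sequence of (event, amount) pairs to a fresh till."""
    till = Till()
    for event, amount in events:
        till.handle(event, amount)
    return till.balance


# The sequence the programmer and the testers had in mind.
expected_sequence = [("sale", 100), ("confirm", 0)]

# The sequence nobody tested: a communication timeout in the middle.
unexpected_sequence = [("sale", 100), ("timeout", 0), ("confirm", 0)]

print(run(expected_sequence))    # the sale is counted once
print(run(unexpected_sequence))  # the same sale is counted twice
```

The flaw sits in every copy of the program at all times. Whether a given branch ever sees it depends entirely on whether that branch is unlucky enough to hit the second sequence.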
Finding the problem…
An error that manifests itself across a wide number of nodes within a particular group on a regular basis can easily be identified. As an example, the current debacle over the new Free Lucky Dip lottery prize is, of course, limited to only those branches that offer the Lottery.
The error comes to the attention of POL by, and only by, the calls to the Help Desk. Several calls in rapid succession will alert POL and as the problem is easily identified they can take remedial action straight away – perhaps by issuing a notice to all branches telling them they are aware of the problem and it will be fixed in due course.
An intermittent problem is harder to identify. It happens infrequently. It happens to what appears to be a random selection of branches. Most importantly, the Help Desk depends on the Subpostmaster to report the error in a way that can be categorised, and on that description being consistent with other descriptions of the same error from other SPMRs. Without that, it is extremely difficult for the Help Desk to analyse logged calls over time (and that time could be measured in years) to see if a pattern is emerging.
If the intermittent error is eventually discovered by analysis of Help Desk Logs over a period of time, it is almost impossible to return to the scene of the original incidents and ask for supporting evidence or a more detailed description of the event.
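A small sketch of why consistent categorisation matters. The call records below are invented, not real Help Desk data, and the field names and the UNEXPLAINED_LOSS code are my own assumptions. The same three reports are counted twice: once by the free-text description the caller gave, and once by a consistent category code.

```python
from collections import Counter

# Invented call records spread across branches and years.
calls = [
    {"branch": "A", "year": 2010, "text": "Cash short after balancing",
     "code": "UNEXPLAINED_LOSS"},
    {"branch": "B", "year": 2012, "text": "Till shows money missing",
     "code": "UNEXPLAINED_LOSS"},
    {"branch": "C", "year": 2013, "text": "Stock figure doubled overnight",
     "code": "UNEXPLAINED_LOSS"},
    {"branch": "D", "year": 2013, "text": "Printer will not print",
     "code": "HARDWARE"},
]


def count_by(records, key):
    """Count how many records share each value of the given field."""
    return Counter(record[key] for record in records)


by_text = count_by(calls, "text")
by_code = count_by(calls, "code")

print(by_text.most_common(1))  # every free-text description is unique
print(by_code.most_common(1))  # the category code exposes the pattern
```

Counted by free text, every call looks like a one-off. Counted by a shared category, three calls from three branches over four years line up as the same problem.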
Fixing the Problem ….
Fixing a simple problem/bug/systemic error such as the Lottery Debacle is easy. The problem is identified and most importantly it can be replicated in a test environment.
Replication of the error is vital. If you cannot replicate the original error then you can never be sure you have fixed it.
In an Intermittent Error scenario you have to be able to replicate the unexpected sequence of events that occurred in the first instance. This is almost impossible to do. And if replication is impossible, then fixing the problem with 100% confidence is impossible too.
However, you can work backwards. If the cause cannot be determined, you can work from the effect and build in protection to ensure that the erroneous effect of the problem is rectified before it causes problems elsewhere. You can also build in error trapping routines to try to notice the event occurring and record as much detail as possible of what caused it, or even stop the program at that point to await Help Desk assistance.
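Working from the effect might look something like the sketch below. Again this is an illustration under my own assumptions, not how Horizon does it: the check_balance function, the BalanceMismatch exception and the idea of keeping a list of recent events are all hypothetical. The routine does not know what caused the discrepancy; it only notices that the recorded balance no longer matches the transactions, records what it can, and stops.

```python
import logging

logger = logging.getLogger("branch")


class BalanceMismatch(Exception):
    """Raised when the recorded balance disagrees with the transactions."""


def check_balance(recorded_balance, transactions, recent_events):
    """Trap the effect of an unknown bug before it spreads further.

    recorded_balance: what the system believes the balance is (pence)
    transactions:     the individual amounts that should make up the balance
    recent_events:    whatever recent history is available, for diagnosis
    """
    expected = sum(transactions)
    if recorded_balance != expected:
        # Record as much detail as possible about what led up to this.
        logger.error(
            "Balance mismatch: recorded=%s expected=%s recent_events=%s",
            recorded_balance, expected, recent_events,
        )
        # Stop here rather than carry the wrong figure forward.
        raise BalanceMismatch(
            f"recorded {recorded_balance}, expected {expected}"
        )
    return recorded_balance


# A healthy till passes the check unchanged.
check_balance(100, [60, 40], ["sale", "confirm", "sale", "confirm"])
```

A check like this does not fix the underlying bug, but it turns a silent loss into a logged event with supporting evidence captured at the time it happened.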
I see you have read this far. Thank you. If you didn’t understand the term ‘systemic error’ before I hope you do now. It is really important to do so. The error that is currently under investigation by POL is probably the most devastating piece of evidence against this organisation to date. I think it will highlight their incompetence and their misplaced arrogance. I am extremely hopeful it will lead to closure on the JFSA cases.
Just to remind you, these are a collection of two or three hundred SPMRs, many of whom have reported the same effect, in a random sequence over a period of years. An intermittent error? One which, as I hope I have shown above, could also be a Systemic Error?
More to follow in due course …..