Systemic Errors

There has been quite a dramatic development of late that, in my opinion, delivers the final proof that Horizon Online is flawed and does in fact contain software errors that result in unexplained losses to SubPostmasters. More on that in detail in the days and weeks to come.

First, though, I need to settle once and for all an argument, discussion, call it what you will, about what constitutes a SYSTEMIC ERROR in a computing system.

The definition from the Law Dictionary

What is SYSTEMIC ERROR?

Error affecting all items comprising the group in a similar manner and magnitude. They are caused by a flaw in the system and occur in the same direction and don’t cancel each other out. Also known as constant error.

Law Dictionary: What is SYSTEMIC ERROR? definition of SYSTEMIC ERROR (Black’s Law Dictionary)

Using this definition it is easy to see why some people, like Paula Vennells and George Thomson of the NFSP, interpret the term ‘systemic’ as one that ONLY applies to errors affecting the whole group and not just part of the group. Paula does at least admit that from time to time Horizon will throw up a ‘systemic’ error but that they are identified and fixed. She relies totally on the misguided belief that because the system handles millions of transactions successfully every day there is not, and cannot be, an error in the system that they do not know about that could be classed as systemic.

What annoys me more than anything is her refusal to listen and to consider that her interpretation is wrong: that systemic errors do exist in the system at all times, and the only difference is that they do not manifest themselves in a way she and her team can readily categorise as systemic.

Systemic errors can and do show themselves in different forms and can be split into subcategories.  One such category is an Intermittent Error.   It manifests itself infrequently in a random pattern across the network.   It affects only one or a few of the nodes in the network at a time.  The affected nodes are not necessarily part of a subgroup of the network and the only common group they belong to is the whole network.   That is important with reference to the definition of a systemic error above.

All nodes within the Post Office Network use (as far as I know) the same version of the Horizon Online Computer Program. Any software bug in the system is therefore present at each node. It is only when the bug is triggered and its effect becomes apparent that it can be recorded as an error in the system.

So how could software errors be classified as intermittent? Well, the simplest example is an unexpected sequence of events that the programmer of the system failed to take into account when writing the software and the testers of the system failed to take into account when testing what he had written. As a former programmer I can attest to the unbelievable variety of ‘unexpected sequences of events’ that can occur, and you have to ‘try’ to deal with them all.

Hardware issues and communication problems can give rise to an unexpected sequence of events as can an unexpected sequence of keystrokes.
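To make that concrete, here is a minimal sketch in Python. It is not Horizon code; every name in it (settle, simulate, the ‘timeout’ and ‘retry’ events) is made up for illustration, and the double-posting flaw is an assumed example rather than a known Horizon bug. It shows one flawed routine running identically at every branch, with the flaw only showing itself at the few branches where a rare sequence of events happens to occur.

```python
import random

EXPECTED = 100  # value of the single sale each hypothetical branch processes


def settle(events):
    """Return the balance recorded after processing a list of events.

    The same code runs at every branch, so the flaw is present everywhere.
    """
    balance = 0
    timed_out = False
    for event in events:
        if event == "sale":
            balance += 100
        elif event == "timeout":
            timed_out = True
        elif event == "retry" and timed_out:
            # Latent bug (assumed for illustration): the sale was already
            # posted before the timeout, so the retry posts it a second time.
            balance += 100
    return balance


def simulate(branches=1000, rare_chance=0.01, seed=1):
    """Return the branch numbers whose recorded balance is wrong."""
    rng = random.Random(seed)
    affected = []
    for branch in range(branches):
        events = ["sale"]
        # Only occasionally does a branch hit the unexpected sequence.
        if rng.random() < rare_chance:
            events += ["timeout", "retry"]
        if settle(events) != EXPECTED:
            affected.append(branch)
    return affected


if __name__ == "__main__":
    hit = simulate()
    print(f"{len(hit)} of 1000 branches show a discrepancy: {hit}")
```

The affected branches have nothing in common except that they all run the same program, which is the point made above about the only common group being the whole network.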

Finding the problem…

An error that manifests itself across a wide number of nodes within a particular group on a regular basis can easily be identified.   As an example the current debacle over the new Free Lucky Dip lottery prize is of course limited to only those branches that offer lottery.

The error comes to the attention of POL by, and only by, the calls to the Help Desk.   Several calls in rapid succession will alert POL and as the problem is easily identified they can take remedial action straight away – perhaps by issuing a notice to all branches telling them they are aware of the problem and it will be fixed in due course.

An intermittent problem is harder to identify. It happens infrequently. It happens to what appears to be a random selection of branches. Most importantly, the Help Desk depends on the Subpostmaster reporting the error in a way that can be categorised, and on that description being consistent with other descriptions of the same error from other SPMRs. Without that, it is extremely difficult for the Help Desk to analyse logged calls over time (and that time could be measured in years) to see if a pattern is emerging.

If the intermittent error is eventually discovered by analysis of Help Desk Logs over a period of time then it is almost impossible to return to the scene of the original incidents and ask for supporting evidence or a more detailed description of the event.
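The kind of analysis described above can be sketched in a few lines of Python. This is an illustration only: the record layout (year, branch, symptom), the function name recurring_symptoms and the threshold of three branches are all my assumptions, not anything taken from the real Help Desk system.

```python
from collections import defaultdict


def recurring_symptoms(calls, min_branches=3):
    """Group logged calls by symptom and keep those seen at several branches.

    `calls` is a list of (year, branch, symptom) tuples. The symptom text is
    only tidied for case and surrounding spaces, so two callers who describe
    the same fault in different words will NOT be grouped together.
    """
    branches_by_symptom = defaultdict(set)
    for _year, branch, symptom in calls:
        branches_by_symptom[symptom.strip().lower()].add(branch)
    return {
        symptom: sorted(branches)
        for symptom, branches in branches_by_symptom.items()
        if len(branches) >= min_branches
    }


if __name__ == "__main__":
    log = [
        (2010, "Branch A", "Unexplained shortfall"),
        (2012, "Branch B", "unexplained shortfall "),
        (2014, "Branch C", "UNEXPLAINED SHORTFALL"),
        (2014, "Branch D", "till does not balance"),
    ]
    print(recurring_symptoms(log))
```

Branch D may well have suffered the same fault, but because it was described differently it never joins the pattern. That is exactly why consistent categorisation of calls matters so much.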

Fixing the Problem ….

Fixing a simple problem/bug/systemic error such as the Lottery Debacle is easy.  The problem is identified and most importantly it can be replicated in a test environment.

Replication of the error is vital.  If you cannot replicate the original error then you can never be sure you have fixed it.

In an Intermittent Error scenario you have to be able to replicate the unexpected sequence of events that occurred in the first instance. This is almost impossible to do. And if replication is impossible then fixing the problem with 100% confidence is impossible too.

However, you can work backwards. If the cause cannot be determined you can work from the effect and build in protection to ensure that the erroneous effect of the problem is rectified before it causes problems elsewhere. You can also build in error trapping routines to try to notice the event occurring and record as much detail as possible about what caused it, or even stop the program at that point to await Help Desk assistance.
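As a rough illustration of working from the effect, here is a small Python sketch of an error trapping routine. The class and method names (GuardedLedger, DiscrepancyDetected, note, post) are hypothetical, and the check used, comparing the new balance against an independently calculated one, is just one assumed form such protection could take.

```python
from collections import deque


class DiscrepancyDetected(Exception):
    """Raised when a posting produces a balance that fails the check."""

    def __init__(self, expected, actual, recent_events):
        super().__init__(f"expected {expected}, got {actual}")
        self.expected = expected
        self.actual = actual
        self.recent_events = recent_events


class GuardedLedger:
    def __init__(self, history_size=20):
        self.balance = 0
        # Rolling record of the last few events, kept as evidence.
        self.recent_events = deque(maxlen=history_size)

    def note(self, event):
        """Record anything that happens at the terminal (keys, timeouts...)."""
        self.recent_events.append(event)

    def post(self, amount, apply):
        """Post `amount` using `apply`, the routine that may contain a bug."""
        self.note(f"post {amount}")
        expected = self.balance + amount      # independent calculation
        actual = apply(self.balance, amount)  # what the program really did
        if actual != expected:
            # Stop here, leave the old balance alone, and hand over the trail.
            raise DiscrepancyDetected(expected, actual, list(self.recent_events))
        self.balance = actual
```

The trap does not need to know what caused the fault. It only notices that the effect is wrong, refuses to carry the bad figure forward, and preserves the sequence of events leading up to it, which is the very evidence that is otherwise impossible to recover years later.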

…………………

I see you have read this far.  Thank you.  If you didn’t understand the term ‘systemic error’ before I hope you do now.  It is really important to do so.  The error that is currently under investigation by POL is probably the most devastating piece of evidence against this organisation to date.  I think it will highlight their incompetence and their misplaced arrogance.  I am extremely hopeful it will lead to closure on the JFSA cases.

Just to remind you, these are a collection of 2 or 3 hundred SPMRs, many of whom have reported the same effect, in a random sequence over a period of years. An intermittent error? Which, as I hope I have shown above, could also be a Systemic Error.

More to follow in due course …..

3 thoughts on “Systemic Errors”

  1. In a previous life, I was the HO end of a bespoke distributed system, admittedly serving a mere 100 branches.

    Even so, designed into the system was an error reporting routine so that, by the simple expedient of pressing a button, WHEN (not if) errors occurred, the menu, sub menu and steps would be logged and reported. The much broader scope, and external links, in Horizon make those errors that much more likely.

    We even had in place a facility for users to make system improvements, such as make the Dangerous Goods prompt ONCE per session, not every parcel of 30, and if enough users made the same suggestion and we could see it improved their efficiency, we would make those improvements as time allowed.

    And this was in the 1980s, when programmers were wizards who could talk to God!!

    The total absence of error tracking or improvement routines once again highlights the arrogant, anachronistic and authoritarian organisation that is POL.

  2. Would that include, then, the recent launch of the new Moneygram system, which not only crashed but also interfered with the AEI system and others, which had failures last week?
    This week’s attempted relaunch is being done in small blocks at a time.
    Not reached us yet, fortunately.
    Also the curious phenomenon of the Travel money card transactions on Wednesday gone, which, despite five or six attempts, gave a better rate for loading £500 than for £750.
    My customer was less than impressed.

  3. Actually to be fair to POL, while these are examples of systemic failure, they fall into the category of known bugs that will be rectified. Paula readily admits to these. The errors I am referring to are latent errors within the system that have yet to be identified and resolved. Paula insists that these do not exist. You need to be incredibly naive, stupid or both to think that. Not personal attributes you would consider appropriate for someone in charge of such a large organisation. Being in charge of the NFSP is a different matter altogether.
