Disclosure: Lessons From the UK Post Office Horizon Inquiry

March 7, 2024

If you have been following the Post Office Horizon Inquiry closely, you may have picked up on several issues with the e-disclosure process that led to the failure to disclose relevant documents within an appropriate time period.  In this article we summarise three key issues that arose, which are likely to be relevant for most large-scale disclosure exercises.

1. Use of Search Terms

The first key issue related to the drafting of search terms.  

For example, in response to a request for a “Copy of [the Post Office’s] Investigations Policy (together with all iterations of the same since 1999 that are within [the Post Office’s] custody and control),” the search terms used were: ‘Policy’ AND (‘Investigat*’ OR ‘Prosecut*’ OR ‘Whistle’). Unfortunately, this search term failed to capture the intended documents.

The latter half of the search term appears well drafted, employing both synonyms and “wildcards” (i.e. a character used to substitute one or more characters in a search string) in order to capture a wide range of file name variations. The first half of the term, however, is narrower: it uses neither wildcards nor synonyms, where a broader formulation such as (Polic* OR Guide* OR Protocol* OR Procedur* OR SOP*) was available. Similar omissions occurred in search terms for several other categories of documents described as “guidelines” or “guidance”. The lack of a wildcard such as “guid*” resulted in documents entitled “Guide” escaping review.
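The effect of the narrow first half can be illustrated with a minimal Python sketch. It is a simplified model only: the document titles are invented, the helper names are hypothetical, and real review platforms apply their own wildcard and indexing rules.

```python
import re

def term_to_regex(term: str) -> re.Pattern:
    """Translate one search term with an optional '*' wildcard into a regex."""
    # Escape the term, then turn the escaped '*' into "any further word characters".
    pattern = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(rf"\b{pattern}\b", re.IGNORECASE)

def matches_any(terms: list[str], text: str) -> bool:
    return any(term_to_regex(t).search(text) for t in terms)

def hits(first_half: list[str], second_half: list[str], title: str) -> bool:
    # Models: (first_half terms OR'd together) AND (second_half terms OR'd together)
    return matches_any(first_half, title) and matches_any(second_half, title)

# Hypothetical document titles, invented for illustration.
titles = [
    "Investigations Policy v3",
    "Investigation Policies 2004",
    "Prosecution Guide",
    "Investigator Procedures",
]

second_half = ["Investigat*", "Prosecut*", "Whistle"]
narrow = ["Policy"]                                            # as used
broad = ["Polic*", "Guide*", "Protocol*", "Procedur*", "SOP*"]  # as suggested above

narrow_hits = [t for t in titles if hits(narrow, second_half, t)]
broad_hits = [t for t in titles if hits(broad, second_half, t)]

print(narrow_hits)  # only the title containing the exact word "Policy"
print(broad_hits)   # all four titles
```

In this toy data set the narrow term finds one title out of four, while the broader formulation finds all of them.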

When teams prepare for document reviews, search terms are commonly both over-emphasised and under-appreciated. They are over-emphasised because many teams believe terms should be perfect and final before the review starts, when they should instead be treated as the starting point of an iterative process, subject to revision as the issues become clearer.

They are also under-appreciated because, while they need not be perfect, they must operate as intended. Poorly formed search terms will often create errors due to faulty syntax. For example, misplaced quotation marks or brackets can leave a search term syntactically valid yet substantively meaningless. When this occurs, the term can return zero search results with no indication of an error.
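A basic sense check of this kind can be sketched in Python. The sketch assumes straight double quotes and round brackets as the only delimiters, and the function names are hypothetical; it is not the validation logic of any particular review platform.

```python
def check_term_syntax(term: str) -> list[str]:
    """Return a list of human-readable problems found in a search term."""
    problems = []
    depth = 0
    in_quotes = False
    for ch in term:
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # brackets inside a quoted phrase are treated as literal text
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("closing bracket without a matching opening bracket")
                depth = 0
    if in_quotes:
        problems.append("unbalanced quotation marks")
    if depth > 0:
        problems.append("opening bracket never closed")
    return problems

def flag_for_review(term: str, hit_count: int) -> list[str]:
    """Combine the syntax check with a zero-hit warning."""
    problems = check_term_syntax(term)
    if hit_count == 0:
        problems.append("zero hits: confirm the term operates as intended")
    return problems

# A stray quotation mark is caught by the syntax check.
print(check_term_syntax('Policy AND ("Investigat* OR Prosecut*)'))

# Balanced quotes around the whole expression pass the syntax check, but the
# operators are now part of a literal phrase, so only the zero-hit warning fires.
print(flag_for_review('"Policy AND Investigat*"', hit_count=0))
```

The second example is the more dangerous case: nothing is syntactically wrong, so only a review of the hit counts reveals that the term is not doing what was intended.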

eDiscovery providers can tell which terms will be problematic due to complex arrangements of operators and nested brackets, and the more experienced among them will be sure to apply a sense check to those results.  

Issues related to noise words (e.g. “the”, “for”, “a”) and alphabet files (the list of characters, numbers or letters that are searchable in an index) are far more difficult to foresee and can easily go unnoticed without deploying a robust QC process.
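One possible QC step is to compare each term as typed with what the index can actually search. The Python sketch below is a simplified model under stated assumptions: the noise word list and the set of searchable characters are hypothetical, and characters outside that set are treated as word breaks, which may not match how a given platform behaves.

```python
import string

# Hypothetical index settings; real platforms ship their own lists.
NOISE_WORDS = {"the", "for", "a", "of"}
SEARCHABLE = set(string.ascii_lowercase + string.digits)

def index_tokens(text: str) -> list[str]:
    """Tokenise text the way this simplified index would."""
    # Characters outside the searchable alphabet act as word breaks here.
    cleaned = "".join(ch if ch in SEARCHABLE else " " for ch in text.lower())
    return [w for w in cleaned.split() if w not in NOISE_WORDS]

def qc_term(term: str) -> dict:
    """Compare what was typed with what the index can actually search."""
    typed = term.lower().split()
    effective = index_tokens(term)
    return {"typed": typed, "effective": effective, "changed": typed != effective}

for term in ["fit for purpose", "P&L", "horizon"]:
    print(term, "->", qc_term(term))
```

Terms flagged as changed are candidates for a closer look before the search results are relied upon.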

2. De-duplication and Family Documents

A second key issue arose due to the method of de-duplication deployed to reduce high hit counts. In the case of the Post Office, this involved (i) applying de-duplication to individual documents or attachments and (ii) essentially ignoring the family members.  

Of course, de-duplication is a standard practice for any document review when applied to complete families. For example, when Email 001, non-responsive to a search term, with a responsive attachment ATTX is sent to three recipients, the data set will contain four copies of the same document family (one in the sender’s mailbox and one in each of the three recipients’ mailboxes). In such cases it is expected that only one copy of the Email 001 family will become part of the document review, as the other three copies contain identical information.  

However, when attachment ATTX is then attached to Email 002 together with other non-responsive attachments, it becomes part of a new family. In the Post Office disclosure exercise, one of Emails 001 and 002 would have been removed, as ATTX was the only responsive document in both families. Although the other family documents were not responsive to the search terms, they may still have been relevant, and their exclusion is what occurred at the Post Office: crucial documents were incorrectly removed from the document review and thus excluded from disclosure.
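The difference between the two approaches can be sketched in Python. This is a simplified model of the problem described above, not a reconstruction of the Post Office workflow: the `Doc` structure, the document identifiers and the hashing scheme are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    doc_id: str
    family_id: str
    content: str
    responsive: bool  # True if the document hits a search term

def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def dedupe_by_document(docs: list[Doc]) -> list[Doc]:
    """Document-level: keep one copy of each responsive document, ignore family members."""
    seen, kept = set(), []
    for d in docs:
        if d.responsive and digest(d.content) not in seen:
            seen.add(digest(d.content))
            kept.append(d)
    return kept

def dedupe_by_family(docs: list[Doc]) -> list[Doc]:
    """Family-level: a family is a duplicate only if every member matches another family."""
    grouped: dict[str, list[Doc]] = {}
    for d in docs:
        grouped.setdefault(d.family_id, []).append(d)
    seen, kept = set(), []
    for members in grouped.values():
        if not any(m.responsive for m in members):
            continue  # no hit anywhere in the family
        family_hash = digest("|".join(sorted(digest(m.content) for m in members)))
        if family_hash not in seen:
            seen.add(family_hash)
            kept.extend(members)  # the whole family goes to review
    return kept

docs = [
    Doc("001-sender", "F1", "Email 001 body", False),
    Doc("ATTX-a", "F1", "ATTX contents", True),
    Doc("001-recipient", "F2", "Email 001 body", False),
    Doc("ATTX-b", "F2", "ATTX contents", True),
    Doc("002", "F3", "Email 002 body", False),
    Doc("ATTX-c", "F3", "ATTX contents", True),
    Doc("ATTY", "F3", "ATTY contents", False),
]

print([d.doc_id for d in dedupe_by_document(docs)])
print([d.doc_id for d in dedupe_by_family(docs)])
```

In this model the document-level approach sends a single copy of ATTX to review, so Email 002 and its other attachment are never seen. The family-level approach removes only the true duplicate (the recipient's copy of the Email 001 family) and keeps the Email 002 family intact.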

The appropriate response to high search term results is an iterative review methodology. Several tactics can be employed in tandem, but they generally involve identifying small sets of documents that fall into two broad categories: priority documents to review and problematic search results to analyse.

Techniques used to identify priority documents to review include running targeted searches (such as combining related search terms and short date ranges applicable to specific incidents) and calculating aggregate family-level hit counts in order to review families with the most hits. Either of these techniques will result in the early review of the documents most likely to be relevant, which allows teams to quickly form a clearer picture of both the issues and the broader data set. This clearer picture, in turn, provides an educated basis for further search term revisions, which will yield better results when identifying the next set of priority documents.
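As a minimal sketch of the second technique, family-level hit counts can be aggregated and ranked in a few lines of Python. The family identifiers and hit counts are invented, and the input format (one pair per document) is an assumption.

```python
from collections import Counter

# Hypothetical per-document results: (family_id, number of search term hits).
doc_hits = [
    ("FAM-A", 2), ("FAM-A", 0), ("FAM-A", 5),
    ("FAM-B", 1),
    ("FAM-C", 0), ("FAM-C", 3),
]

def rank_families(doc_hits: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Sum hits across each family and return families ordered by total hits."""
    totals: Counter = Counter()
    for family_id, hit_count in doc_hits:
        totals[family_id] += hit_count
    return totals.most_common()

review_queue = rank_families(doc_hits)
print(review_queue)  # [('FAM-A', 7), ('FAM-C', 3), ('FAM-B', 1)]
```

Families at the top of the queue would be reviewed first, with the results feeding the next round of search term revisions.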

The second category, analysing problematic search results, can be accomplished most simply by reviewing a small random sample of documents that hit on only the terms with the highest hit counts. A more exact approach is to perform a false positive analysis of those same terms. For example, the search term (Smith w/2 Engagement) may return numerous emails related to the future marriage of Elaine Smith and John White. When such a false positive pattern is identified, the search term can be revised to ((Smith w/2 Engagement) AND NOT “John White”). The false positive approach has the additional benefit of reassuring your adversary and the court that relevant documents will not be missed as a result of the revision.  
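The false positive analysis can be sketched as follows. The sketch assumes that w/2 means the two words appear within two word positions of each other in either order; proximity semantics differ between platforms, and the sample emails are invented.

```python
import re

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

def within(text: str, a: str, b: str, n: int) -> bool:
    """True if words a and b occur within n word positions of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def original_term(text: str) -> bool:
    # (Smith w/2 Engagement)
    return within(text, "smith", "engagement", 2)

def revised_term(text: str) -> bool:
    # ((Smith w/2 Engagement) AND NOT "John White")
    return original_term(text) and "john white" not in text.lower()

# Hypothetical emails, invented for illustration.
emails = [
    "The Smith engagement letter is attached for signature.",
    "News of the Smith engagement: Elaine and John White marry in June.",
    "Budget meeting moved to Friday.",
]

original_hits = [e for e in emails if original_term(e)]
revised_hits = [e for e in emails if revised_term(e)]
# Documents dropped by the revision: these are the ones to check before adopting it.
dropped = [e for e in original_hits if e not in revised_hits]

print(len(original_hits), len(revised_hits))
print(dropped)
```

Reviewing the `dropped` set, rather than only counting it, is what supports the statement that no relevant documents are lost by the revision.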

3. Documentation and Accountability

The third key issue related to the communication between the internal and external teams undertaking the disclosure exercise.

Successful management of even the smallest document review requires detailed documentation and comprehensive data tracking, from collection to processing to review and disclosure – and the scale of the Post Office disclosure would certainly have required this.  

The scope of data subject to disclosure included searches of 230 physical locations and hundreds more digital repositories with a date range spanning more than two decades, resulting in more than 54 million documents stored in at least four databases.

The response to the Post Office enquiry related to the disclosure exercise described a failure to properly map the data landscape, resulting in thousands of documents being disclosed mere days before scheduled witness testimony. Differences of recollection between the teams also arose in relation to the instructions received regarding the approach to de-duplication and review of family members.

The most successful disclosure projects are defined by close collaboration between the various teams involved in a disclosure exercise.

To that end, it is imperative that every decision is documented. This is true for many reasons. First, parties are often required to provide a detailed accounting of each step of the disclosure process once completed.  

Second, the larger and more complex a project is, and the longer it runs, the more time will invariably be spent investigating discrepancies and anomalies. With proper documentation, fewer issues arise and those that do can be quickly traced to their source. Without documentation, small issues are liable to turn into serious problems, which in turn can lead to teams struggling to establish how, where, when and why a problem happened. Worse yet, problems that disrupt complex workflows tend to have a ripple effect which is amplified when teams are not organised and communicating fully with each other.

Perhaps the most pressing form of documentation is the tracking of data and any associated gaps. If gaps exist they can usually be found by regularly updating the status of each piece of data expected to be collected. Gaps within the data (for example, where a custodian has no data for a period of months or years) must be identified and remedied as soon as possible. If the collection is problematic today, re-collection will be exponentially more difficult after custodians have departed or laptops have been replaced. The classic example of this scenario is an iPhone password – make sure you have all passwords now, even if processing the data is only tentative, as finding them years later is often impossible. 
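A simple gap check of the kind described above can be sketched in Python. It assumes that a list of document dates is available for each custodian and that a month with no documents counts as a gap; the dates and the expected date range are invented.

```python
from datetime import date

def month_range(start: date, end: date):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def find_gaps(doc_dates: list[date], start: date, end: date) -> list[tuple[int, int]]:
    """Return the months in the expected range for which no documents exist."""
    covered = {(d.year, d.month) for d in doc_dates}
    return [ym for ym in month_range(start, end) if ym not in covered]

# Hypothetical document dates collected for one custodian.
custodian_dates = [date(2010, 1, 14), date(2010, 2, 3), date(2010, 2, 20), date(2010, 5, 9)]

gaps = find_gaps(custodian_dates, date(2010, 1, 1), date(2010, 6, 30))
print(gaps)  # [(2010, 3), (2010, 4), (2010, 6)]
```

Running a check like this after each collection, and recording the result, gives an early warning while the custodian and the source devices are still available.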

Conclusion

Responding to a request from an Inquiry panel requires meticulous planning and careful coordination by and between all teams. The same can be said of all document reviews, however. All disclosures should employ the same basic strategy.

The above discussion only provides a basic outline of that strategy: careful, reasonable searches, followed by iterative rounds of review and revision, all guided by detailed planning and documentation.

In addition to these basic components, there are other “best practice” processes, beyond the scope of this article, which can greatly improve the efficiency of any review. These include email threading, repeated content filtering, textual near-duplicate identification, language identification, and automated translation.

A broad range of powerful, advanced tools is also now available, including AI in the form of machine learning, personal information recognition and redaction, concept clustering, and key phrase generation, which can be employed to great effect.

Any organisation facing a document review and disclosure – no matter the size – should ensure they have an early, detailed conversation with an experienced document review provider. Such conversations will invariably lead to increased collaboration and mutual understanding of a complicated process.  
