
Why open legal data and analytics are not without risks

BY Caroline Calomme (@CarolineCalomme) - 18 February 2020

Data analytics are increasingly being used to extract valuable insights from court decisions. However, this is not without risks, because detailed analyses can also help build profiles of judges or compile information on the parties. This blog encourages data protection experts to become curious about the data that they are the most familiar with: legal data.

Legal analytics

We are currently busy drafting regulations and guidelines for novel data processing practices – and rightly so – but, paradoxically, we tend to forget that the legal field itself is also very rich in data.

Think about every law and every court case ever published at the local, national and international level. Since a lot of data is the primary ingredient for a functioning AI application, legal data is actually a perfect match. Analyses of court documents in particular, whether relying on AI or more traditional statistical methods, can reveal useful insights, such as the expected outcome of a case, the chances of success or the evolution of sentencing over time.

Beyond assisting judges, legal analytics can bring valuable answers to academics (e.g. Euthority, our very own KUL project) and it can also be profitable for attorneys looking for a winning strategy: where and when to sue, in front of which judge and which arguments to use. Basically, forum shopping at its worst.

A famous example from Israel showed that hungry judges are likely to render harsher judgments. The finding came from comparing the times at which the rulings were handed down. In countries where the names of judges are published together with the judgment, we could accurately predict which judges give the longest sentences or which ones are the most likely to be influenced by how hungry they are.

Such practices could be disastrous for a judge’s career if the data demonstrate arbitrariness, and perhaps also for the State in liability proceedings for judicial decisions. Judge profiling could be even worse if combined with publicly available personal data.

Probably in an attempt to prevent this, the French Law 2019-222 on justice reform introduced in March 2019 a prohibition on further processing the judges’ names, even though they are public for policy reasons. You now risk a five-year prison sentence if you analyze (French) court judgments to predict how a judge is likely to rule in the future. The same applies if you simply evaluate, analyze or compare rulings based on the judges’ identity.

Open legal data

Whether your goal is laudable or questionable, there is nothing to analyze if you cannot find these precious legal data.

Making case-law public, and to some extent the names of the judges behind the decisions, is key for the transparency and accountability of the judicial branch. Considering that the texts are of limited use to the general public, which does not speak legalese, it is a somewhat symbolic gesture but a necessary one nonetheless.

According to a 2016 Council of Europe survey, not all European jurisdictions publish judicial decisions (e.g. Ireland or Poland), although the majority do. Considering that the decisions are information from the public sector, it is debatable whether this is in line with the EU guidelines on open data. Some only publish judgments of particular interest to the general public, such as the decisions from the highest court, while others publish all judgments except when there is a valid reason not to do so.

Yet, even if decisions are open by default, “open” can mean many things. In practice, open legal data is often implemented as public access to legal information instead of access to this information in a database format, i.e. open data. Concretely, this means that you can consult a court decision but, if you want to analyze thousands of them, you would need to retrieve the information manually.
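To give a sense of what the database format changes, here is a minimal sketch in Python. It assumes the decisions are already available as structured records; the field names and the values are invented for illustration and do not come from any real court database. Once the data looks like this, the evolution of sentencing over time takes only a few lines to compute.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up records: the field names and values are assumptions
# for illustration only, not taken from any real court database.
decisions = [
    {"court": "A", "year": 2017, "sentence_months": 12},
    {"court": "A", "year": 2018, "sentence_months": 18},
    {"court": "B", "year": 2018, "sentence_months": 6},
    {"court": "B", "year": 2018, "sentence_months": 12},
]


def mean_sentence_by_year(records):
    """Group sentences by year and return the average length per year."""
    by_year = defaultdict(list)
    for record in records:
        by_year[record["year"]].append(record["sentence_months"])
    return {year: mean(values) for year, values in sorted(by_year.items())}


result = mean_sentence_by_year(decisions)
print(result)
```

The same grouping would work just as easily on a judge's name as on a year, which is exactly why the format of the publication matters.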

So, in reality, not every jurisdiction has a machine-readable court database that makes potentially damaging analyses (easily) possible. Realistically, only the most patient organizations would make the effort to structure a massive quantity of data to carry out more sophisticated analyses. And let’s be very clear: insights highly beneficial to society as a whole can also be derived from the same data.

Hands off my legal data

Instead of removing the names of the judges, the French law recognizes in a roundabout way the imperative public interest in publishing them and attempts to protect the judiciary in another way. However, the balance is struck differently when this interest is weighed against the fundamental rights of the parties, witnesses or other individuals who are uniquely identifiable in court documents.

Known to many, the “Costeja case” is ironically an excellent illustration of this issue since Google, the defendant, used the name of the plaintiff who tried to enforce his right to be forgotten as the unofficial title of the case.

Even when their names are not mentioned, individuals often remain indirectly identifiable because of addresses, tax numbers, salient facts or other personal information that is part of the case. Thus the publication of judgments ‘as is’ can be a severe intrusion on privacy, especially in criminal law and family law cases.

Even in less sensitive matters, the 1998 Green Paper on public sector information already warned about the risk of case-law databases turning into “information files on individuals”. The Commission was right on point because until recently the credit score of Americans could drop dramatically if they ever lost a court case, including a small claim dispute about a few dollars.

Legal provisions requiring the removal of identifiable information do exist, but translating this requirement into practice is no small task. In Belgium, the law of 5 May 2019 introduced an electronic database of court decisions which does not include information that allows individuals to be identified directly or that disproportionately affects their rights. The database should be in place by September 2020, as long as the courts succeed in putting in place an effective (automatic?) pseudonymization system.
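To illustrate why this is harder than it sounds, here is a minimal sketch of a rule-based approach in Python. It is not the system the Belgian courts are building. It assumes that the party names are known in advance (for instance from the case metadata) and that any run of nine or more digits is an identifying number; both are assumptions made for this example, and the sample sentence and names are invented.

```python
import re


def pseudonymize(text, party_names):
    """Replace known party names and long digit runs with placeholders."""
    # Each known name gets a stable placeholder such as [PARTY_1].
    for index, name in enumerate(party_names, start=1):
        text = re.sub(re.escape(name), f"[PARTY_{index}]", text, flags=re.IGNORECASE)
    # Assumed rule: a run of nine or more digits is an identifying number.
    text = re.sub(r"\b\d{9,}\b", "[NUMBER]", text)
    return text


# Invented sample sentence; the names and the number are fictional.
sample = "Mr. Jan Peeters (tax number 123456789) sued Anna Maes."
cleaned = pseudonymize(sample, ["Jan Peeters", "Anna Maes"])
print(cleaned)
```

Rules like these only catch direct identifiers. An address written out in words, a misspelled name or a salient fact of the case would pass through untouched, so the individuals could still be indirectly identifiable.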

Case study for data protection

How can we balance the public interest in holding the judiciary accountable and in having access to legal information with the fundamental rights of judges and litigants? Can criminal charges be used as a safeguard to discourage further processing of personal data? Does privacy by design require court databases not to be machine-readable? This short introduction to legal analytics shows that legal technology raises pertinent questions for the data protection community, whose work could be of tremendous help in designing open legal data policies and systems for the courts as they embark on the digitalization process.

This article gives the views of the author(s), and does not represent the position of CiTiP, nor of the University of Leuven.
ABOUT THE AUTHOR — Caroline Calomme @CarolineCalomme

Caroline Calomme is a part-time researcher at KU Leuven CiTiP – imec. Caroline is helping develop a vision on the introduction of legal technology and relevant digital skills in the law curriculum at KU Leuven. She also supports existing teaching activities in that area and assists with the organization of new initiatives towards educating the ‘digital lawyer’.
