Using machines in law: Automated case-law summaries

BY Wessel Wijtvliet - 25 April 2019

The possibilities of automated textual analysis have grown in recent decades. In this contribution, I present the potential of a machine-driven technique to summarize case law. It is one example from the eclectic array of research techniques and methods that my colleagues and I use in the interdisciplinary EUTHORITY project on conflict and cooperation between domestic and supranational courts in the EU legal system.

Academics and practitioners alike are increasingly aware that technological innovation is reaching the shores of law. Considering the relevance of text to the legal discipline, automated textual analysis is one of the techniques that holds promise for the study of law. In this blog, I attempt to summarize the ECtHR case Bouyid v Belgium automatically as an illustration of such analysis. I rely on the expertise of human rights specialists to validate the relevance of the outcome.

Visualizing co-occurrence of words

The first step of textual analysis requires filtering out words that do not convey meaningful information on the topic of a text. The software package UDPipe is a handy tool for removing “stop-words” such as ‘the’, ‘or’ and ‘having’. Its application results in a cut-down version of Bouyid that constitutes the basis for further analysis.
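To make the filtering step concrete, here is a minimal sketch in plain Python. It is not UDPipe's interface: the tiny stop-word list and the function name `remove_stop_words` are assumptions chosen for illustration only.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real analysis would rely on
# a fuller list or on part-of-speech tags produced by a tool such as UDPipe.
STOP_WORDS = {"the", "or", "having", "a", "an", "of", "and", "to", "in", "by", "were"}

def remove_stop_words(text: str) -> list[str]:
    """Lower-case the text, split it into word tokens and drop stop-words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

sentence = "The applicants were slapped by police officers in the police station."
filtered = remove_stop_words(sentence)
print(filtered)
```

The remaining tokens are the raw material for the co-occurrence and ranking steps that follow.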

Words that frequently appear together could provide a first clue about the topic of the case. Figure 1 plots a network of nouns and adjectives that co-occur frequently within the range of a single sentence. The blue-dotted words (called ‘nodes’) and the thickness of the lines between them (called ‘edges’) visualize important co-occurring terms. Combinations like ‘human’ and ‘dignity’, ‘police’ and ‘officer’, and ‘use’ and ‘force’ pop out as especially relevant and give a first impression of Bouyid’s content.

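Figure 1. Network of nouns and adjectives that frequently co-occur within a sentence in Bouyid.
Figure 2. Word-pair cloud of the most relevant two-word combinations in Bouyid.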
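The counting behind such a co-occurrence network can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the figure: the toy sentences and the function name `cooccurrences` are hypothetical, and each sentence is assumed to be already reduced to its nouns and adjectives.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(sentences: list[list[str]]) -> Counter:
    """Count how often each unordered pair of distinct terms shares a sentence."""
    counts: Counter = Counter()
    for terms in sentences:
        # sorted(set(...)) counts each pair once per sentence, in a stable order
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts

# Hypothetical toy input: each inner list holds the nouns and adjectives of one sentence.
sentences = [
    ["police", "officer", "station"],
    ["human", "dignity"],
    ["police", "officer", "force"],
    ["human", "dignity", "force"],
]
pairs = cooccurrences(sentences)
print(pairs.most_common(2))
```

In a plot, each term would become a node and each count the thickness of the edge between two nodes.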

Key word extraction

A second way to shed further, though still tentative, light on the content of a text is key word extraction. The algorithm TextRank can plot a word-pair cloud displaying the most important terms that appear as a sequence of two words. The algorithm establishes links between words that follow one another and weighs these links by frequency of occurrence. Google’s PageRank algorithm subsequently determines the importance of each word pair (I discuss PageRank in the next section). Figure 2 depicts the most relevant combinations of words in Bouyid. Apparently, the case involves police officers, violence, degrading treatment and human dignity at a police station or by a police force linked to ‘Noode’. As will become clear, ‘Noode’ is part of a proper noun and refers to the municipality where the police operated, St.-Josse-ten-Noode.
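The linking step can be illustrated with a few lines of Python. This sketch covers only the weighting of links between adjacent words; the PageRank scoring is left out, and the token list and the function name `adjacent_pair_weights` are hypothetical.

```python
from collections import Counter

def adjacent_pair_weights(tokens: list[str]) -> Counter:
    """Link each word to the word that follows it; the weight is the pair's frequency."""
    return Counter(zip(tokens, tokens[1:]))

# Hypothetical token stream, assumed to be already stripped of stop-words.
tokens = ["police", "officer", "police", "station",
          "human", "dignity", "police", "officer"]
weights = adjacent_pair_weights(tokens)
print(weights.most_common(1))
```

A fuller implementation would not link words across sentence boundaries; this simplification is an assumption of the sketch.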

Summarizing the case

TextRank does not just extract key words from a text. An additional feature ranks full sentences in order of importance. The algorithm searches for content overlap by counting the number of similar terms two sentences share and then applies PageRank to determine their respective centrality. A sentence that addresses a specific issue serves as a recommendation to refer to other sentences that are alike. The number of recommendations that a sentence receives indicates its significance. However, TextRank does not merely count textual overlap. It feeds this initial result into PageRank, which attributes more weight to recommendations received from important sentences. A simple example can illustrate this process. I could determine the popularity of a person by reference to the number of Facebook friends. The outcome of such a popularity contest does not solely depend on the bare number of friends. The centrality of these friends in the larger Facebook community matters as well. Accordingly, a friendship with a well-connected person boosts popularity more than accepting a friendship request from a digital hermit.
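A minimal sketch of this ranking idea follows. It is not the TextRank package itself: similarity is simplified to the raw number of shared terms, the damping factor of 0.85 and the fixed number of iterations are conventional assumptions, and the toy sentences and function names are hypothetical.

```python
def overlap(a: set[str], b: set[str]) -> int:
    """Number of terms two sentences share."""
    return len(a & b)

def rank_sentences(sentences: list[set[str]],
                   damping: float = 0.85,
                   iterations: int = 50) -> list[float]:
    """Score sentences with a weighted PageRank over their term overlap."""
    n = len(sentences)
    # weights[i][j]: strength of the recommendation between sentences i and j
    weights = [[overlap(sentences[i], sentences[j]) if i != j else 0
                for j in range(n)] for i in range(n)]
    totals = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each sentence j passes on its score in proportion to its overlap with i,
            # so recommendations from important sentences count for more.
            received = sum(scores[j] * weights[j][i] / totals[j]
                           for j in range(n) if totals[j] > 0)
            new_scores.append((1 - damping) / n + damping * received)
        scores = new_scores
    return scores

# Hypothetical toy sentences, each reduced to a set of terms.
sentences = [
    {"police", "officer", "slap"},
    {"police", "station", "slap"},
    {"applicant", "dignity"},
    {"police", "officer", "station", "dignity"},
]
scores = rank_sentences(sentences)
print(max(range(len(scores)), key=scores.__getitem__))
```

In this toy example the last sentence shares terms with every other sentence and therefore receives the highest score, much like the well-connected friend in the Facebook analogy.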

TextRank deems the ten sentences below most central to the Bouyid case. They are listed in the order in which they appear in the original text, not in order of importance. Sentences 1-5 lay out the facts of the case and the procedure followed. Sentences 6-8 cover the sources of the investigation and the argument of the Government. The last two sentences state both the procedural and substantive conclusion in the case. The algorithmic output conveys that the Court considers that two applicants, at least one of whom is a minor, were slapped by police officers in the police station of St.-Josse-ten-Noode (a community in Brussels – editor’s note) and that this treatment violated their dignity to such an extent that it breached article 3.

  1. Alleging, in particular, that they were both slapped by police officers while they were in a police station, the applicants complained of degrading treatment and argued that they were victims of a violation of Article 3.
  2. They added that on 24 June 1999 the first applicant, then aged 13, had been “beaten” by another police officer in the police station, where he had been taken following a fight in the street.
  3. On 6 April 2001 and 12 July 2001 respectively, N. and the second applicant had been verbally abused by officers of the Saint-Josse-ten-Noode police force.
  4. February 2008 six members of the Bouyid family, including the two applicants, had filed a civil-party complaint with an investigating judge of the Brussels Court of First Instance concerning all their accusations against the Saint-Josse-ten-Noode police officers, in particular relating to facts that predated the events of 8 December 2003 and 23 February 2004.
  5. The applicants alleged that police officers had slapped them in the face while they were in the Saint-Josse-ten-Noode police station.
  6. The Chamber further noted that the Government had disputed the fact that the applicants had been slapped by police officers, and had submitted that the medical certificates provided did not establish that the injuries recorded had been caused by such slaps.
  7. Firstly, the investigation had been principally based on screening of the family’s behaviour, drawing on records prepared by the police station at which the officers of whom the applicants had complained were based.
  8. The Government emphasised that in the present case, although the applicants had submitted medical certificates attesting to injuries that might be compatible with the events of which they complained, it was only the applicants’ statements that suggested that those injuries were the consequence of a slap and that the slaps in question had been inflicted on both applicants by police officers.
  9. In conclusion, the slap administered to each of the applicants by the police officers while they were under their control in the Saint-Josse-ten-Noode police station did not correspond to recourse to physical force that had been made strictly necessary by their conduct, and thus diminished their dignity.
  10. The Court considers that the applicants’ allegations – as set out in the complaints lodged with the domestic authorities – that they were subjected to treatment breaching Article 3 of the Convention by officers at the Saint-Josse-ten-Noode police station were arguable.
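Selecting the most central sentences and then restoring their original order, as in the list above, can be sketched as follows. The scores are hypothetical placeholders and the function name `top_k_in_document_order` is an assumption for illustration.

```python
def top_k_in_document_order(scores: list[float], k: int) -> list[int]:
    """Return the positions of the k highest-scoring sentences, sorted by position."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)

# Hypothetical centrality scores for five sentences, indexed by position in the text.
scores = [0.10, 0.40, 0.05, 0.30, 0.15]
selected = top_k_in_document_order(scores, 3)
print(selected)
```

Presenting the selection in document order keeps the narrative of the judgment intact: facts first, conclusions last.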

Assessment of the results

According to the press release, the Court held “that the slap which the applicants had received from police officers while under their control at the police station had undermined their dignity and that there had been a violation of Article 3 (in that they had been subjected to degrading treatment).” The Court’s own summary thus corresponds to the automated output.

I further validate the results by consulting Koen Lemmens (Professor of Human Rights at KU Leuven) and Louise Reyntjens (Doctoral Candidate at KU Leuven and intimately familiar with Bouyid). Reyntjens confirms that the algorithm manages to capture the most important facts of the case, the arguments of the parties and the decision of the Court, though it fails to include the legal reasoning behind the judgment. Lemmens expresses his admiration for the algorithm’s summarization capabilities. He describes the ten sentences as a good indicator of the case for journalistic purposes. Echoing Reyntjens’s remark, he also observes that the algorithm does not include the legally relevant considerations about the severity threshold of Article 3 ECHR.

The TextRank algorithm manages to single out many, though not all, legally relevant sentences of Bouyid. The lawmaking part of the legal reasoning takes up a relatively short excerpt of the entire text, which may explain why the algorithm omits it from the result. Overall, automated textual analysis seems a promising technique to parse legal information. Its accuracy is bound to increase in the future. For example, when asked to find answers to questions on Wikipedia and formulate them in their own words, algorithms already perform slightly better than humans on specific benchmarks (link and link).

For questions and remarks regarding this contribution, please contact wessel.wijtvliet@kuleuven.be.

This article gives the views of the author(s), and does not represent the position of CiTiP, nor of the University of Leuven.
ABOUT THE AUTHOR — Wessel Wijtvliet

Dr. Wessel Wijtvliet (Leuven Centre for Legal Theory and Empirical Jurisprudence)
