Improving Legal Scholarship with Network-Based Search Tools

By Andrew Higgins*

I. Introduction

In recent decades, network-driven data analysis has produced major developments and insights in academic fields such as neuroscience,[1] sociology,[2] and information science.[3] The same tools have been used to develop precisely targeted product marketing, better recommendations on sites such as Pandora[4] and Amazon,[5] and efficient search algorithms such as Google’s PageRank.[6] Curiously, legal research is typically not network-based, even though network tools such as PageRank were inspired by tools of legal analysis (especially Shepard’s Citations, offered by Lexis).[7] It is a truism among legal scholars that statutes, enforcement, precedent, and interpretation are all deeply interconnected.[8] The significant role of stare decisis in contemporary legal practice makes it all the more puzzling that legal scholarship tends to be conducted in a linear or modular form. The aim of this article is to encourage a network-theoretic approach to the identification and interpretation of legal precedent, one that better fits the non-modular, network structure of law.

I begin by briefly reviewing the basic concepts and tools of network analysis. Following this introduction, I highlight an important shortcoming in the most common tools for legal scholarship and propose some concrete steps that could be taken to improve the methods used by lawyers and legal scholars to represent and interpret legal precedent. In particular, I argue that services such as WestlawNext and Lexis Advance could be improved if users were given more resources for going beyond simple Boolean searches. If properly integrated into the user interfaces of these services, network representations of legal precedent could make the process of searching and drawing from legal precedent more efficient, both in the time taken to conduct searches and in the accuracy of the results. I conclude by noting some directions for future research.

II. Background

Networks have two components: objects and relations.[9] The objects are called nodes or vertices, and the relations between those objects are called edges.[10] In the network represented in Figure 1, the nodes are the numbered entities (1–10) and the edges are the lines connecting those entities. Not all edges are equal. If, for example, we represented a friendship network, it would be useful to distinguish between close friends and acquaintances. To track the strength of friendship ties, we could give distinct edge weights to each (e.g., two for close friends and one for acquaintances). In Figure 1, edge weight is represented by the color of the edge, with black edges representing strong ties and gray edges weak ties. If our representation of the network were sensitive to edge weight, 9 would be spatially closer to 8 than 10.

Figure 1. An example network with ten nodes and seventeen edges.

The creation of network representations usually involves attraction and repulsion between nodes.[11] Edges between nodes act as attracting forces, with the edge weight determining the strength of the attraction.[12] In order to preserve spatial distance between nodes, this attraction is countered by a general repulsive force between all nodes. To avoid unlimited repulsion between disconnected nodes, a gravitational force pulls all nodes to the center.
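
The interplay of these three forces can be illustrated with a minimal sketch. The code below is not the algorithm of any particular visualization package; the force formulas (attraction proportional to edge weight times distance, repulsion inversely proportional to distance, and a weak linear pull toward the center), the parameter values, and the three-node example with nodes named hub, strong, and weak are all assumptions chosen for illustration.

```python
import math

def force_layout(positions, edges, steps=500, step_size=0.05,
                 repulsion=1.0, gravity=0.1):
    """Move every node a small step along its net force, `steps` times."""
    pos = {name: list(xy) for name, xy in positions.items()}
    names = list(pos)
    for _ in range(steps):
        force = {name: [0.0, 0.0] for name in names}
        # Repulsion: every pair of nodes pushes apart (magnitude repulsion / distance).
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist_sq = (dx * dx + dy * dy) or 1e-9  # avoid dividing by zero
                fx = repulsion * dx / dist_sq
                fy = repulsion * dy / dist_sq
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        # Attraction: each edge pulls its endpoints together (magnitude weight * distance).
        for a, b, weight in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            force[a][0] -= weight * dx
            force[a][1] -= weight * dy
            force[b][0] += weight * dx
            force[b][1] += weight * dy
        # Gravity: a weak pull toward the origin keeps the layout bounded.
        for name in names:
            force[name][0] -= gravity * pos[name][0]
            force[name][1] -= gravity * pos[name][1]
        for name in names:
            pos[name][0] += step_size * force[name][0]
            pos[name][1] += step_size * force[name][1]
    return pos

# Hypothetical example: one strong tie (weight 2) and one weak tie (weight 1).
start = {"hub": (0.0, 0.0), "strong": (1.0, 0.0), "weak": (-1.0, 0.5)}
ties = [("hub", "strong", 2.0), ("hub", "weak", 1.0)]
layout = force_layout(start, ties)
print("hub to strong:", math.dist(layout["hub"], layout["strong"]))
print("hub to weak:  ", math.dist(layout["hub"], layout["weak"]))
```

Under these assumptions, the node joined to the hub by the heavier edge settles nearer to it than the node joined by the lighter edge, which is the sense in which a layout can be sensitive to edge weight.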

The most significant properties of nodes, for present purposes, are their relational properties. Degree, a basic relational property, is equal to the number of the node’s edges.[13] In Figure 1, node 2 has a degree of four because it is related to four other nodes. Degree is a limited measure because it considers nodes only in relation to their nearest neighbors and is insensitive to the significance of each connection.[14] In a trade network, for example, it would be important to know not just which countries trade with which, but also the quantity of goods traded. To track this information, we should consider weighted degree, which sums the weights of a node’s edges, with each weight reflecting the significance of that relation. Even weighted degree, however, is highly limited, because it still ignores indirect connections. In analyzing a criminal or terrorist network, for example, we can learn something from the fact that A communicated with B, but we learn far more about A if we also know that B worked with C, D, and E, where these are high-level figures in the illicit organization.
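
The difference between degree and weighted degree can be made concrete with a small sketch. The trade network below is hypothetical: the country labels and trade volumes are invented for illustration, and the function names are my own.

```python
# Hypothetical trade network: (country, country, volume of goods traded).
trade = [
    ("A", "B", 5.0),
    ("A", "C", 1.0),
    ("A", "D", 1.0),
    ("B", "C", 4.0),
]

def degree(edges):
    """Number of edges attached to each node."""
    counts = {}
    for u, v, _weight in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts

def weighted_degree(edges):
    """Sum of the weights of the edges attached to each node."""
    totals = {}
    for u, v, weight in edges:
        totals[u] = totals.get(u, 0.0) + weight
        totals[v] = totals.get(v, 0.0) + weight
    return totals

print(degree(trade))           # A has the most trading partners
print(weighted_degree(trade))  # B handles the greatest volume
```

In this toy network, country A has the highest degree (three partners), while country B has the highest weighted degree (a total volume of nine), so the two measures single out different nodes.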

To track such indirect connections, we also need a measure of network centrality. Various centrality algorithms are used for different purposes, but they share an important common feature: sensitivity to a node’s position in the network as a whole.[15] Here I mention just three. The first, betweenness centrality, is a measure of how often a node lies on the shortest path between two other nodes.[16] Nodes with higher betweenness centrality are more likely to play an essential bridging role in connecting two otherwise separate groups of nodes. In Figure 1, node 8 has the highest betweenness centrality because 9 and 10 are only related to other nodes through 8. In a network of U.S. senators, with edges defined by voting records, centrist senators would have the highest betweenness centrality because they alone bridge the divide between the Republican and Democratic voting blocs. The second, Eigenvector centrality, measures the importance of a node by its connectedness to other nodes that themselves have high Eigenvector centrality.[17] This metric is similar to the third measure of centrality, Google’s PageRank metric for determining the relevance of websites in a search, which in turn was inspired by Shepardizing.[18] The PageRank for website W is determined by considering the number of other websites with links to W, with greater weight given to linking websites that are themselves frequently linked.[19] In Figure 1, node 7 has the highest Eigenvector centrality and PageRank because it has several connections with nodes that themselves have several connections. In a citation-based network, Eigenvector centrality is a measure of an author’s relative centrality to the discussion in their area of specialty.
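
To show how these measures can diverge, the following sketch computes betweenness centrality (using Brandes’ algorithm for unweighted graphs) and Eigenvector centrality (by repeated iteration, as described in note 17) on a small hypothetical network. The network is not the one in Figure 1: it consists of two triangles, a-b-c and e-f-g, joined through a single bridging node d, and the function names are my own.

```python
import math
from collections import deque

# Hypothetical network: two triangles (a-b-c and e-f-g) joined through node d.
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]
adj = {}
for u, v in links:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

def betweenness(adj):
    """Count, for each node, the shortest paths between other pairs that pass through it."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                       # nodes in order of distance from source
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        paths = {v: 0 for v in adj}      # number of shortest paths from source
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                     # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        share = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest nodes inward
            for v in preds[w]:
                share[v] += paths[v] / paths[w] * (1 + share[w])
            if w != source:
                score[w] += share[w]
    # In an undirected network every pair is visited twice, so halve the totals.
    return {v: s / 2 for v, s in score.items()}

def eigenvector_centrality(adj, iterations=100):
    """Repeatedly replace each node's score with the sum of its neighbors' scores."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: sum(score[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(x * x for x in new.values()))
        score = {v: x / norm for v, x in new.items()}
    return score

bc = betweenness(adj)
ev = eigenvector_centrality(adj)
print("highest betweenness:", max(bc, key=bc.get))
print("Eigenvector scores:", {v: round(x, 3) for v, x in sorted(ev.items())})
```

In this toy network the bridging node d has the highest betweenness centrality, because every shortest path between the two triangles runs through it, whereas nodes c and e have the highest Eigenvector centrality, because they are the best connected to other well-connected nodes.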

For present purposes, we can think of individual court opinions as nodes in the network. The most significant edges in the network are citations to previous court opinions, but one could also conceptualize the legal precedent framework with edges indicating similarity of content, geographical regions, or time periods. Whatever data are chosen as the basic structure of the network, legal scholars could, as I argue below, benefit from a network-theoretic reconceptualization of the legal terrain.

III. Analysis

Online research tools such as WestlawNext[20] and Lexis Advance[21] already have limited network-based approaches, but these services could be substantially improved by extending the user’s ability to visualize and digest the interconnected network of cases constituting current legal precedent. In this section I present several ways that these services could be enhanced. Each of the suggested changes would be relatively easy to implement and could significantly improve scholars’ and lawyers’ ability to identify the most relevant precedents. These suggestions apply to Westlaw, LexisNexis, and other similar services, but I will focus on the current user interface of WestlawNext and leave it to the reader to see how the suggestions would apply to other services.

For WestlawNext, a general inquiry usually begins with the user entering a citation, party names, keywords, or other information into a Boolean search.[22] While this process is fairly straightforward and efficient, it has a notable shortcoming. If, for example, your aim is to find cases involving pre-verbal infants causing harm, a search for “baby” will return just those cases where “baby” appears as a keyword or within the text; but, of course, cases mentioning “infant,” “toddler,” “small child,” or “newborn” could also prove relevant. Thus, these search engines could be improved by drawing on semantic network databases so that terms semantically close to the query are also given some weight.
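
A minimal sketch of this idea follows. It does not describe how WestlawNext or any other service actually works; the table of related terms, the weights attached to them, and the four case summaries are all hypothetical, and a production system would presumably draw its weights from a real semantic network rather than a hand-written table.

```python
import re

# Hypothetical table of related terms, with a weight for semantic closeness.
RELATED = {
    "baby": {"baby": 1.0, "infant": 0.9, "newborn": 0.9,
             "toddler": 0.7, "small child": 0.6},
}

# Hypothetical case summaries.
CASES = {
    "Case A": "The infant pulled a cord and a lamp struck the visitor.",
    "Case B": "The baby knocked a vase onto the plaintiff.",
    "Case C": "Two merchants disputed the terms of a shipping contract.",
    "Case D": "The babysitter sued over unpaid wages.",
}

def mentions(term, text):
    """True if the term appears in the text as a whole word or phrase."""
    return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None

def keyword_search(query):
    """Plain keyword search: only cases containing the exact query term."""
    return [name for name, text in CASES.items() if mentions(query, text)]

def expanded_search(query):
    """Score each case by the closest related term it mentions, best first."""
    terms = RELATED.get(query, {query: 1.0})
    scored = []
    for name, text in CASES.items():
        weights = [w for term, w in terms.items() if mentions(term, text)]
        if weights:
            scored.append((name, max(weights)))
    return sorted(scored, key=lambda item: (-item[1], item[0]))

print(keyword_search("baby"))
print(expanded_search("baby"))
```

Here the plain keyword search returns only the case that uses the word “baby,” while the expanded search also surfaces the case that mentions an “infant,” ranked slightly lower to reflect the weaker match.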

Once the user has found a relevant case, WestlawNext provides excellent network-based information in the form of KeyCite.[23] This tool allows users to immediately see a summary evaluation of how the case fits into the network of legal precedent, whether the case has been superseded, affirmed, distinguished, or received other treatments, and the significance of each related case. This information is analogous to knowing node degree, types of edges, nearest neighbors, and edge weight, but it is limited in the same way as these measures of node significance. A major shortcoming of the initial search results is that users are given a list of cases, C1–Cn, each related to the queried case, C0. Each of C1–Cn comes with specific information linking it to C0, but without any further information putting these cases in a broader legal context or showing how they might directly relate to one another. This is partially remedied by the diagrammatic representation of the case history on WestlawNext, wherein users see a small set of prior cases that have been granted rehearing, had their judgment reversed, etc., but a great opportunity is missed at this stage of the search. Along with learning how the case directly relates to prior cases, it would be valuable to have network-based representations of a greater diversity of relations and a ranking system more sensitive to a case’s position within the network of legal precedents. Researchers could benefit from visual representations of several clusters of cases relevant to their specific topics, where the edges would indicate important relations between these cases beyond explicit undermining or supporting relations. For example, one could selectively add or remove edges indicating similarity in semantic content, relevant statutes, or topics.

This would allow scholars to navigate the metaphorical legal space in a literal physical space that intuitively maps onto the conceptual distances between the various cases. When starting the research process, users would get an easily digestible, unified picture of the topic that highlights the most important judgments to consider in more detail; users already familiar with the legal landscape would be helped to identify the most important gaps in their knowledge. For most of the relevant criteria, both Westlaw and LexisNexis already possess the data, so these services could be improved simply by adding functionality to the user interface.

It would also be beneficial to use network-based measures of centrality as an indicator of the significance of cases, rather than relying on raw citation counts or on a vague sense of importance inherited from peers and educators. If one wished to know the most significant landmark cases on a specific issue, one could do far worse than seeking experts’ opinions, but a quantitative measure of citation counts may be a more reliable indicator of significance than even the intuitive judgments of experts. WestlawNext provides these citation numbers, but raw citation counts can be highly misleading as a method for ranking the significance of cases, for two reasons: these data are not sensitive to the relative importance of the court decisions citing the case in question, and some cases have received more citations simply because they were decided earlier. By analogy, in an academic citation network, being cited by the top scholar in the field is more important than being cited by ten small players. In the same way, court decisions cited in landmark cases are more significant than those cited by several less significant cases.

To gain a more accurate representation of the most significant cases, it would be better to have a system that mirrors academic rankings like the h-index[24] or Google’s algorithms for ranking websites. This could be implemented by WestlawNext and similar services by providing users with a significance score for case C that is simultaneously sensitive to all of these factors: (1) the number of cases citing and cited by C, (2) the significance of the cited cases to C and the significance of C in the court decisions citing it, and (3) the relative importance of the citing and cited cases. This sort of method has been tested by James Fowler et al., who found that inward relevance (one of many measures of network centrality) was a strong predictor of future citations.[25] Given the relative success of this and similar models for accurately identifying and predicting case significance, online archives such as Westlaw could improve the relevance of their search results by using network centrality for sorting and filtering results, and they could provide more meaningful information to users by including cases’ centrality scores in the listed search results.
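
As a rough illustration of how such a score can depart from raw citation counts, the sketch below applies a standard PageRank-style iteration to a small hypothetical citation network. This is neither Fowler et al.’s inward relevance measure nor anything WestlawNext is known to use; the case names, the citation links, the damping factor of 0.85, and the uniform treatment of cases that cite nothing are all assumptions made for the example.

```python
# Hypothetical citation network: each case maps to the cases it cites.
cites = {
    "P1": ["Landmark"], "P2": ["Landmark"], "P3": ["Landmark"],
    "P4": ["Landmark"], "P5": ["Landmark"],
    "Landmark": ["X", "Z"],   # the heavily cited case itself cites X and Z
    "Q1": ["Y"], "Q2": ["Y"],  # two otherwise uncited cases cite Y
    "X": [], "Y": [], "Z": [],
}

def citation_counts(cites):
    """Raw count of how many cases cite each case."""
    counts = {case: 0 for case in cites}
    for cited_cases in cites.values():
        for cited in cited_cases:
            counts[cited] += 1
    return counts

def pagerank(cites, damping=0.85, iterations=100):
    """PageRank-style score: a citation counts for more when the citing case scores highly."""
    cases = list(cites)
    n = len(cases)
    rank = {case: 1.0 / n for case in cases}
    for _ in range(iterations):
        # Cases that cite nothing spread their score evenly over all cases.
        dangling = sum(rank[case] for case in cases if not cites[case])
        new = {case: (1 - damping) / n + damping * dangling / n for case in cases}
        for case, cited_cases in cites.items():
            for cited in cited_cases:
                new[cited] += damping * rank[case] / len(cited_cases)
        rank = new
    return rank

counts = citation_counts(cites)
ranks = pagerank(cites)
for case in ("Landmark", "X", "Y"):
    print(case, counts[case], round(ranks[case], 4))
```

In this toy network, case Y has two citations and case X only one, yet X receives the higher score because its single citation comes from the heavily cited Landmark case. A ranking built on raw citation counts would reverse that order.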

IV. Recommendation

The advice offered above is specifically aimed at improving the efficiency of searches for relevant legal precedent, but these tools could be used in a greater variety of contexts. I conclude by briefly suggesting a few further possibilities. First, and closely related to the discussion above, the method of collecting and analyzing case precedent from a network perspective could be used by legal scholars to develop highly accurate pictures of the history and future of law. For example, Fowler et al. observed that, in the cases they reviewed, the Commerce Clause was the most significant legal issue in 1955, whereas First Amendment issues had become dominant in more recent years.[26]

By identifying and tracking the trends in law over the years, researchers could develop fine-grained, data-driven overviews of the history of the law while also developing accurate models for predicting future trends. Second, scholars could use network analysis to test for possible sources of bias in judicial decisions over the years by creating and analyzing social networks showing social or communication links between judges and lawyers that correlate, in a problematic way, with judges’ rulings. Finally, similar methods could be used to compare the structures of scientific and legal citation networks to see if the legal community’s structure is relevantly similar to the structure of the sciences.


*Ph.D., Philosophy, University of Illinois. Special thanks go to Laura Peet and Alexis Dyschkant for invaluable discussions regarding the nature and practice of law. I also wish to thank Jonathan Waskan and Jana Diesner for providing the empirical and theoretical tools needed to approach this topic.

[1] Simon Haykin, Neural Networks: A Comprehensive Foundation (1st ed. 1994).

[2] The SAGE Handbook of Social Network Analysis (John Scott & Peter Carrington eds., 2011).

[3] Ravindra Ahuja, Thomas Magnanti, & James Orlin, Network Flows: Theory, Algorithms, and Applications (1993).

[4] Pandora, (last visited Feb. 4, 2015).

[5] Amazon, (last visited Feb. 4, 2015).

[6] Ian Rogers, The Google Pagerank Algorithm and How it Works, Ian Rogers, (last visited Feb. 4, 2015).

[7] Eugene Garfield, Discovering Shepard’s Citations, WebOfStories, play/eugene.garfield/25;jsessionid=C829679D889485A4E6AF76C0C3286EF1 (last visited Feb. 4, 2015).

[8] Ronald Dworkin, Law’s Empire (1986).

[9] David Easley & Jon Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World 2 (2010).

[10] Id.

[11] Id. at 47.

[12] Id. at 53.

[13] Reinhard Diestel, Graph Theory 5 (3d ed. 2005).

[14] Easley & Kleinberg, supra note 9, at 434.

[15] Id. at 342.

[16] Id.

[17] See id. at 417. This may seem paradoxical, as Eigenvector centrality for any given node can only be determined in reference to the Eigenvector centrality of other nodes. The paradox is removed because this metric is calculated on the basis of several iterations of the algorithm.

[18] The Page Rank Algorithm, eFactory, (last visited Feb. 4, 2015).

[19] Id.

[20] WestlawNext, (last visited Feb. 4, 2015).

[21] LexisNexis, (last visited Feb. 4, 2015).

[22] WestlawNext, (last visited Feb. 4, 2015).

[23] Lexis Advance offers a similar service with Shepard’s, and its Map option mirrors WestlawNext’s case mapping function described later in the paragraph.

[24] Publish or Perish, Harzing, (last visited Feb. 4, 2015). The h-index is a measure of academics’ productivity and citation impact. A scholar is given a score of h where she has h papers with at least h citations each and her remaining papers have no more than h citations each.

[25] James Fowler et al., Network Analysis and the Law: Measuring the Legal Importance of Precedents at the U.S. Supreme Court, 15 Pol. Analysis 324–46 (2007).

[26] Id.