December 15, 2000
Hon. Thomas R. Phillips
Hon. Michael J. McCormick
Dear Chief Justice Phillips and Judge McCormick:
We have been asked by the Dallas Trial Lawyers' Association to review the Texas Judicial Council's November 1999 Performance Measures for Texas District and Appellate Courts. Because we conduct empirical studies of the judicial process, serve as peer reviewers of studies of the judicial process for professional journals, and have worked with organizations such as the Texas Bar Foundation and the Federal Judicial Center, we accepted the request to comment on this study.
We support the decision to reject the use of myth and stereotype as the basis for public policy decisions about the judicial process and to rely instead on information about the judicial process gained from empirical study. All empirical studies of the judicial process, however, are necessarily limited by financial, ethical, or practical considerations. For example, to assess judicial performance it might be desirable to assign a committee of judges to review each district and appellate judge's judicial performance for a year, but that would not be a financial, ethical, or practical possibility. Recognizing the limitations that face researchers, studies published in leading professional journals note the limitations of their methodology and the inferences to be drawn from their research. We address the limitations of this study that affect the use to which the information it has generated should be put.
The most important limitation of the study is that it is solely a quantitative study. It seeks answers to numerical questions about efficiency in the district and appellate courts, but not about the quality of the decisions made in those courts. While the observation that "justice delayed is justice denied" reminds us that timeliness is an important consideration in the judicial process, efficiency achieved through arbitrary and capricious decision making (e.g., plaintiffs win on Tuesdays and Thursdays, defendants on Mondays, Wednesdays, and Fridays) would strike no reasonable person as fair or just, even if it were efficient. Rule 1 of the Texas Rules of Civil Procedure articulates our accumulated wisdom about the appropriate balance that the judicial process should strive to reach:
Efficiency is a relevant consideration in the judicial process, but a study that focuses only on efficiency as a measure of judicial performance risks reinforcing this one aspect of judicial performance at the expense of other important considerations.
Measures exist to study these other important considerations, yet they have not been included in the study. The National Center for State Courts Performance Measures, which the Texas Judicial Council refers to in its 1999 Report on Performance Measures for Texas District and Appellate Courts, addresses five areas: access to justice; expedition and timeliness; equality, fairness, and integrity; independence and accountability; and public trust and confidence. (http://www.ncsc.dni.us/RESEARCH/tcps_web/index.html) Yet, the Texas Performance Measures address only expedition and timeliness, ignoring access to justice; equality, fairness, and integrity; independence and accountability; and public trust and confidence. One judge's response to Question 23 captures this concern with the Texas Judicial Council's failure to utilize the NCSC measures:
The decision to ignore readily available, nationally recognized means to study the ability of our courts to adjudicate cases fairly as well as efficiently casts a shadow over the study and seriously limits the use to which the data it produces might responsibly be put.
Other judges' responses also reflect their perception of this inadequacy in the study. One example is found in a response to Question 14C of the Survey of Judges Regarding Factors Impacting Case Disposition Activity, which seeks to measure the district attorney's propensity to plea bargain, asking for a rating ranging from a strong propensity to plea bargain criminal cases to a strong propensity to bring criminal cases to trial. A judge corrected the question and then responded to the corrected question: "I think they plea bargain in the right cases and go to trial on the right cases." That re-framed question is a powerful critique of the failure to address qualitative concerns in the study. Another judge's response to Question 1, which asks for criteria to measure the performance of a district court, reflects that judge's recognition of the importance of measuring public trust and confidence as well as efficiency. That judge noted that to measure the performance of a district court it was necessary to employ both "(1) Objective standard: Review of time spend activity pursing duties of the specific court (2) Subjective: Survey of those affected by and working with the court."
One judge added a lengthy and thoughtful addendum addressing the quantity vs. quality concern. In response to the first question asking for criteria that the judges believe "should be used to accurately reflect and measure the performance of the district court," the judge wrote:
Time and again, in written comments judges expressed very real concerns about the quantity versus quality issue. Wrote another judge, "How good a judge is, especially in their role as a judicial hearing officer, can and often may have no relation to case disposition statistics. A judge who will give fair hearings with adequate time for each side will dispose of fewer cases." Still another judge wrote,
In a strong criticism of a purely quantitative approach to measuring judicial performance, a judge wrote "If you want to use numbers as a measure of a court's performance, study the German Court system of the 1930's and 40's. They were very fast in dispensing rulings and handled an incredible number of cases. Judges were evaluated solely on numbers and the result was efficiency; but certainly not justice."
A second concern is that statistics can easily mask events over which the judge has no control, but which can for a year or more dramatically affect the individual judge's performance measures. Surely, the citizens of Texas would want to make allowances for the judge who wrote that his case dispositions were affected by a need to care for a son with lung cancer, or for another judge who wrote that his dispositions were affected by injuries sustained in a serious automobile accident. Yet, a focus on caseload disposition would not account for these important explanations for these judges' low dispositions. Nor would a purely quantitative focus explain the problem one judge faced in one of the counties in the judicial district where a prosecutor was unwilling to try jury cases. That, wrote the judge, "...caused the defense lawyers to become disinterested in seeking plea arrangements because they see little danger that he will actually prosecute the case if they simply stand pat. When pushed, he will more often than not file a motion to dismiss the case." And statistics can mask other factors as well. As one judge pointed out, family law cases can be remarkably time consuming and involve numerous hearings that do not result in a closed case. Similarly, wrote the judge, judges may have children in conservatorship for many years where the case remains open because the children are never adopted. Another judge also emphasized that family law judges dealt with very emotional cases on a regular basis and that contested child custody cases could be unusually time consuming and complex. Yet, the questionnaire does not note the complexity of such cases as it does for some other categories. As the judge wrote,
Perhaps a weighted caseload system can account for the problem of a judge who had a capital murder case that occupied two months simply for voir dire. But a weighted system would not capture the mountainous caseload management problem of a family law judge who reported that in one year he had a senior family law specialist die and had to deal with the reallocation of that lawyer's cases. Then the judge had two other family lawyers undergo chemotherapy, with all the problems of trying to make accommodations for those lawyers' needs. A judge may help out other courts, though that assistance would not show up in the judge's own statistics. And sometimes, closed cases may not be desirable. As one judge pointed out, it may not be desirable for judges to push people to finalize a divorce when neither side has requested a formal setting. Letting such matters work themselves out may be in the best interest of the parties; but, noted the judge, quantitative performance standards "will only lead to judges playing number games with statistics."
Another judge even illustrated how judges have improved their statistics. That judge wrote,
A third concern with the study and the data it produces is that a number of questions are ambiguous. For example, Question 8B of the Survey of Judges Regarding Factors Impacting Case Disposition Activity asks: "To what degree do you believe your current operating budget is adequate?" Against what standard should a judge measure his or her belief in the adequacy of the current operating budget -- other judges in similar size counties in Texas; any other county in Texas with which the judge is familiar; other courts around the country with which the judge is familiar; or the judge's idealized courtroom expectations? Given the question's absence of clarity about the standard against which these expectations should be measured, each respondent must decide the question to which he or she wishes to respond, but nowhere in the response is the judge required to set forth his or her understanding of the question.
Question 18A of the same questionnaire asks about the extent to which visiting judges were used. Judges are asked to respond -- very frequently, somewhat frequently, occasionally, somewhat infrequently, very infrequently, or not at all. Whether a judge who used a visiting judge on three occasions answers that question very frequently, somewhat frequently, occasionally, somewhat infrequently, or very infrequently confounds the judge's view about the appropriateness of using visiting judges with a statistical measure of its frequency. It is impossible to disentangle these two matters from the answer given.
And the wording of Question 3G which asks "Please indicate specifically how the number of counties served by your court, the geographical jurisdiction (e.g., urban or rural), the geographic size of your judicial district, and/or the calendaring system impacted your court's case disposition activity during FY 1999" led one judge to answer in exasperation, "Respectfully, this seems a peculiar question."
The ambiguity of the questions probably partially explains a further problem with the responses: their incompleteness. The greater the amount of missing data, the more doubtful is the value of any quantitative analysis. Especially where the effort is to measure the performance of individual judges, significant missing data undermines the overall portrait of a judge's performance. Unfortunately, there is a large amount of missing data in these questionnaires. In addition to the ambiguity of the questions, questionnaire responses suggest that some judges were hostile to the questionnaires. The large number of open-ended questions on the questionnaire probably reduced the response rate as well. And, some judges expressed the view that they lacked information to respond to some of the questions or that the questions did not seem appropriate to them.
In the absence of a code reflecting the region of the state to which the questionnaire was sent and an analysis of the response rate for that region, there is a risk of over- or under-representation of particular areas of the state in the results. By not accounting for these differential regional response rates, the study risks comparing cases and courts that are not comparable. Any study of disposition that measures the length of time to disposition across judges' dockets assumes that, on average, the cases are comparable from docket to docket and that each judge has a similar share of "simple" and "complex" cases. There are more meaningful ways to measure judicial performance that avoid these erroneous assumptions. For example, peer review of the handling of a sample of closed "simple" and "complex" cases would provide a more meaningful measure of judicial performance than collapsing and comparing incomparable cases.
In reference to the above point, the vast diversity of the Texas court system -- especially the district court system -- suggests the importance of appropriate statistical controls to properly measure the performance of judges compared to their peers, as opposed to district court judges vastly different in role and resources. Will the performance of district court judges be assessed in comparison with all the other district courts? Will there be statistical controls for jurisdiction? For resources available to the judges? For urbanization? Obviously, the appropriate way to assess any judge is with statistical controls that try to locate the judge among similarly situated judges rather than among a group of judges highly diverse in both the nature of their jurisdictions and their resources. One judge pointed out that rural courts fall in a different category from urban ones. One of the problems of the judge's rural court was that lawyers had cases set in different courts at the same time, and in rural Texas "the travel time makes it impossible to try both cases so one case is tried and the others are cancelled." In a related vein, the judge pointed out that in rural areas there were problems with the availability of witnesses. Since there were few medical witnesses, especially those dealing with mental health, doctors were often unavailable for testimony.
Still another problem is that the study relies on self-reporting without attempts to verify or corroborate the information sought. It asks judges how many hours they work, how much they travel, and what percentage of time they spend on various tasks, without any check on the accuracy of the judge's memory or the tendency of the respondent in any study to seek out the "correct" answers. The response of one judge notes this concern in more down-to-earth terms: "There is no objective way to determine who is working and who is not with great accuracy." Thus, it is not clear how accurate or reliable much of the information that the study collects about efficiency will be.
The final problem with this study is one that cannot be ignored in a state with an elected judiciary that has had intensely contested judicial elections for two decades. These measures can have significant political impact in an election campaign. In one sense, of course, this is desirable because it encourages accountability of judges. However, in another sense, these measures provide only an incomplete standard by which judges may be judged. Not only are the measures incomplete in that they focus on quantitative standards at the expense of qualitative standards, but there is incomplete quantitative data. It is unclear whether the data gathered will be subjected to the appropriate statistical controls so that the proper comparisons may be made. And, even if the data are properly used and analyzed by the Texas Judicial Council, there remains the problem of whether the data will be correctly used in the heat of a judicial campaign. Certainly, there seems little doubt that individual judicial performance evaluations can readily become political weapons in judicial campaigns and also little doubt that, rather than contributing to an improvement in judicial accountability, these data can become instruments to obscure, confound, and mislead the electorate. Indeed, the political volatility of some of these measures may create significant problems in ensuring the accuracy of responses. Where there are no independent checks on the answers of judges, there is a significant incentive for judges to exaggerate answers to questions. The question about the number of hours worked per week by judges is one that seems especially sensitive to political concerns and abuse. It is difficult (if not impossible) to independently measure the number of hours worked per week by judges. As a result, there will likely be a natural tendency on the part of the judges to generously estimate the length of their work-week.
A high number of hours worked, for example, will help protect the judge from political criticism and will be useful in promoting the judge's candidacy for reelection. Indeed, even ignoring the impact of individual performance measures on a judicial election, there remains the knowledge that responses to the questionnaire can affect the number of judges in that area. That alone is probably enough to affect the responses to the questionnaire. Of course, some responses can be independently checked for accuracy, and it is important that they are checked, although independent checking will increase the difficulty and cost of the individual judicial performance studies.
Perhaps the best way to conclude this review of the study and to capture its failure to develop comprehensive measures of judicial performance is with one judge's response to Question 23:
Comment on the November 2000 Draft Report
Our earlier letter examined the questionnaire and responses to the Texas Judicial Council Performance Measures. After preparing that letter, we were asked to comment on the November 2000 draft of the report on the District Court Performance Measures Pilot Project.
We were pleased to note that the authors of the draft report have identified a number of problems with the questionnaire, problems that we also noted in our evaluation of the questionnaire. Of particular value, we think, are the following comments made in the draft report:
This report reinforces our earlier concern with the enormous methodological problems in doing these performance evaluations and it serves to emphasize the caution that should be exercised in undertaking this research.