Daniel W. Shuman
Professor of Law

Telephone: (214) 768-2577
Fax: (214) 768-4330
Home phone: (214)-XXX-XXXX

December 15, 2000

Hon. Thomas R. Phillips
Chief Justice, Supreme Court of Texas
Supreme Court
P.O. Box 12248
Austin, Texas 78711

Hon. Michael J. McCormick
Presiding Judge, Court of Criminal Appeals
P.O. Box 12308
Austin, Texas 78711

RE: Texas Judicial Council
       Performance Measures for Texas District and Appellate Courts

Dear Chief Justice Phillips and Judge McCormick:

We have been asked by the Dallas Trial Lawyers' Association to review the Texas Judicial Council's November 1999 Performance Measures for Texas District and Appellate Courts.  Because we conduct empirical studies of the judicial process, serve as peer reviewers of studies of the judicial process for professional journals, and have worked with organizations such as the Texas Bar Foundation and the Federal Judicial Center, we accepted the request to comment on this study.

We support the decision to reject the use of myth and stereotype as the basis for public policy decisions about the judicial process and to rely instead on information about the judicial process gained from empirical study.  All empirical studies of the judicial process, however, are necessarily limited by financial, ethical, or practical considerations.  For example, to assess judicial performance it might be desirable to assign a committee of judges to review each district and appellate judge's performance for a year, but that would not be a financial, ethical, or practical possibility.  Recognizing the limitations that face researchers, studies published in leading professional journals note the limitations on their methodology and the inferences to be drawn from their research.  We address the limitations of this study that affect the use to which the information it has generated should be put.

The most important limitation of the study is that it is solely a quantitative study.  It seeks answers to numerical questions about efficiency in the district and appellate courts, but not about the quality of the decisions made in those courts.  While the observation that "justice delayed is justice denied" reminds us that timeliness is an important consideration in the judicial process, efficiency achieved through arbitrary and capricious decision making (e.g., plaintiffs win on Tuesdays and Thursdays, defendants on Mondays, Wednesdays, and Fridays) would strike no reasonable person as fair or just, even if it were efficient.  Rule 1 of the Texas Rules of Civil Procedure articulates our accumulated wisdom about the appropriate balance that the judicial process should strive to reach:

The proper objective of the rules of civil procedure is to obtain a just, fair, equitable, and impartial adjudication of the rights of litigants under established principles of substantive law.  To the end that this objective may be attained with as great expedition and dispatch and at the least expense both to the litigants and to the state as may be practicable, these rules shall be given a liberal construction.

Efficiency is a relevant consideration in the judicial process, but a study that focuses only on efficiency as a measure of judicial performance risks reinforcing this one aspect of judicial performance at the expense of other important considerations.

Measures exist to study these other important considerations, yet they have not been included in the study.  The National Center for State Courts Performance Measures, which the Texas Judicial Council refers to in its 1999 Report on Performance Measures for Texas District and Appellate Courts, address five areas: access to justice; expedition and timeliness; equality, fairness, and integrity; independence and accountability; and public trust and confidence.  Yet the Texas Performance Measures address only expedition and timeliness, ignoring access to justice; equality, fairness, and integrity; independence and accountability; and public trust and confidence.  One judge's response to Question 23 captures this concern with the Texas Judicial Council's failure to utilize the NCSC measures:

Even though my court has the lowest backlog and highest disposition rates of the civil dist. cts. in our county, it does not mean that litigants are getting more justice.  The Trial Court Performance Standards promulgated by the Natl Center for State Courts recognizes this, and what's why I think Texas should accept the national standards and not try to reinvent the wheel.  Texas should certainly not try to develop standards that equate justice with disposition rates!!!

The decision to ignore readily available, nationally recognized means to study the ability of our courts to adjudicate cases fairly as well as efficiently casts a shadow over the study and seriously limits the use to which the data it produces might responsibly be put.

Other judges' responses also reflect their perception of this inadequacy in the study.  One example is found in a response to Question 14C of the Survey of Judges Regarding Factors Impacting Case Disposition Activity, which seeks to measure the district attorney's propensity to plea bargain, asking for a rating from strong propensity to plea bargain criminal cases to strong propensity to bring criminal cases to trial.  A judge corrected the question and then responded to the corrected question: "I think they plea bargain in the right cases and go to trial on the right cases."  That re-framed question is a powerful critique of the failure to address qualitative concerns in the study.  Another judge's response to Question 1, which asks for criteria to measure the performance of a district court, reflects that judge's recognition of the importance of measuring public trust and confidence as well as efficiency.  That judge noted that to measure the performance of a district court it was necessary to employ both "(1) Objective standard: Review of time spend activity pursing duties of the specific court (2) Subjective: Survey of those affected by and working with the court."

One judge added a lengthy and thoughtful addendum addressing the quantity vs. quality concern.  In response to the first question asking for criteria that the judges believe "should be used to accurately reflect and measure the performance of the district court," the judge wrote:

This question does not lend itself to a good answer.

For this question, one first thinks of numbers of cases moved, however, that is not a valid measure of the performance of a court.  For example, I have one case involving two large corporations, who have no desire or impetus to settle the case and who continuously have hearings on discovery disputes.  This one case alone could consume an enormous amount of court time and research time thus depriving me of the ability to move other cases.  If a defendant in a civil case is able to pay large amounts of damages, a Plaintiff's attorney may pursue the case longer than if the defendant is essentially judgment proof and thus the case may last longer or move more quickly.  The quality of the attorney makes a difference as to whether a case is quickly moved.  Some criminal defense counsel move cases very quickly while others will always demand a trial.

One may think that the reversal rate of a judge on appeal is a good method of determining performance of a district court judge; however, counsel decide which cases to appeal -- not the judge.  If a counsel appeals a case which has no merit, and the case is thus affirmed on appeal, the record would reflect the judge performed well.  But if counsel did not appeal the case (perhaps his client could not afford it) and the case would have been affirmed had it been appealed, there will be no statistics to validate the judge's performance.  Likewise, one appellate counsel may find a reversible error in a transcript while another appellate counsel may not find it.  Two judges with the exact same record but with two different appellate counsels may thus have different outcomes on appeal.

Should one count the extrajudicial legal activities of the judge -- activity in the bar, judicial committees, speaking at conferences, writing legal papers?  How can that be quantified?

Time and again, in written comments, judges expressed very real concerns about the quantity versus quality issue.  Another judge wrote, "How good a judge is, especially in their role as a judicial hearing officer, can and often may have no relation to case disposition statistics.  A judge who will give fair hearings with adequate time for each side will dispose of fewer cases."  Still another judge wrote:

To be honest, I am not convinced that these proposed performance measures are in the best interests of the people we serve.  To me, one of the best performance measures is the esteem, or the lack thereof, that the citizens have for the courts that serve them.  If the citizens who are served by a court believe that they can go to that court and that justice will be done, then that court is performing well.  I do not believe that the quality of justice dispensed by a court can or should be judged by numerical statistics anymore than the quality of a legislator can be judged simply by the number of bills he files each session.  I am of the firm opinion that if we become to enamored with performance measures that can be counted on fingers and toes, we will lose sight of our real purpose for being, that is to see that justice is done fairly and impartially.

In a strong criticism of a purely quantitative approach to measuring judicial performance, a judge wrote, "If you want to use numbers as a measure of a court's performance, study the German Court system of the 1930's and 40's.  They were very fast in dispensing rulings and handled an incredible number of cases.  Judges were evaluated solely on numbers and the result was efficiency; but certainly not justice."

A second concern is that statistics can easily mask events over which the judge has no control, but which can for a year or more dramatically affect the individual judge's performance measures.  Surely, the citizens of Texas would want to make allowances for the judge who wrote that his case dispositions were affected by a need to care for a son with lung cancer or another judge who wrote that his dispositions were affected by injuries sustained in a serious automobile accident.  Yet a focus on caseload disposition would not account for these important explanations for these judges' low dispositions.  Nor would a purely quantitative focus explain the problem one judge faced in one of the counties in the judicial district where a prosecutor was unwilling to try jury cases.  That, wrote the judge, "...caused the defense lawyers to become disinterested in seeking plea arrangements because they see little danger that he will actually prosecute the case if they simply stand pat.  When pushed, he will more often than not file a motion to dismiss the case."  And statistics can mask other factors as well.  As one judge pointed out, family law cases can be remarkably time consuming and involve numerous hearings which don't result in a closed case.  Similarly, wrote the judge, judges may have children in conservatorship for many years where the case remains open because the children are never adopted.  Another judge also emphasized that family law judges deal with very emotional cases on a regular basis and that contested child custody cases can be unusually time consuming and complex.  Yet the questionnaire does not note the complexity of such cases as it does for some other categories.  As the judge wrote,

Our courts must determine the 'best interests if the children' ... and these cases usually take more time than non-children cases.  That standard precludes any approach that pushes the time factor as a major factor.  A contested child custody case may take two or more years before it is ready for trial and if a jury case, may take several weeks to try.  The same case tried before the bench may take 4 to 6 days.  In addition, each such case will be ordered to mediation and will be the subject of a social study.  Such studies, depending on the backlog, may take up to 6 months to complete.  If psychological exams are ordered, this may increase the time, especially if the exams are started prior to the family study.  In addition, all parties and children in contested custody cases are ordered to attend a parenting class.

Perhaps a weighted caseload system can account for the problem of a judge who had a capital murder case that occupied two months simply for voir dire!  But a weighted system would ignore the mountain of a caseload management problem of a family law judge who reported that in one year he had a senior family law specialist die and had to deal with the reallocation of that lawyer's cases.  Then the judge had two other family lawyers undergo chemotherapy with all the problems of trying to make accommodations for those lawyers' needs.  A judge may help out other courts, though that assistance would not show up in the judge's own statistics.  And sometimes, closed cases may not be desirable.  As one judge pointed out, it may not be desirable for judges to push people to finalize a divorce when neither side has requested a formal setting.  Letting such matters work themselves out may be in the best interest of the parties; but, noted the judge, quantitative performance standards "will only lead to judges playing number games with statistics."

Another judge even illustrated how judges have improved their statistics.  That judge wrote,

The 'number of cases closed' is not a good indicator in and of itself.  I practiced law in one court that abused this statistic.  It would call its docket.  If an attorney requested a continuance, the court would advise him/her that their case would start in one hour.  If an attorney announce [sic] ready, their case would be continued.  Either way, all the attorneys were in the hall trying to settle their cases and the court stats for closed files were impressive.  However the attorneys and the parties were unhappy with the process.  I practiced in another court that would set 60-70 cases on a weekly docket.  Attorneys would be forced to sit around most of the day when there was little chance of actually going to trial.  This also resulted in a lot of settled cases and good stats for the court, but created unhappy attorneys and litigants.

A third concern with the study and the data it produces is that a number of questions are ambiguous.  For example, Question 8B of the Survey of Judges Regarding Factors Impacting Case Disposition Activity asks: "To what degree do you believe your current operating budget is adequate?"  Against what standard should a judge measure his or her belief in the adequacy of the current operating budget -- other judges in similarly sized counties in Texas; any other county in Texas with which the judge is familiar; other courts around the country with which the judge is familiar; or the judge's idealized courtroom expectations?  Because the question is unclear about the standard against which these expectations should be measured, each respondent must decide the question to which he or she wishes to respond, but nowhere in the response is the judge's understanding of the question required to be set forth.

Question 18A of the same questionnaire asks about the time visiting judges were used.  Judges are asked to respond -- very frequently, somewhat frequently, occasionally, somewhat infrequently, very infrequently, or not at all.  Whether a judge who used a visiting judge on three occasions answers that question very frequently, somewhat frequently, occasionally, somewhat infrequently, or very infrequently confounds the judge's view about the appropriateness of using visiting judges with a statistical measure of its frequency.  It is impossible to disentangle these two matters from the answer given.

And the wording of Question 3G, which asks "Please indicate specifically how the number of counties served by your court, the geographical jurisdiction (e.g., urban or rural), the geographic size of your judicial district, and/or the calendaring system impacted your court's case disposition activity during FY 1999," led one judge to answer in exasperation, "Respectfully, this seems a peculiar question."

The ambiguity of the questions probably partially explains a problem with the responses: their incompleteness.  The greater the amount of missing data, the more doubtful is the value of any quantitative analysis.  Especially where the effort is to measure the performance of individual judges, significant missing data undermines the overall portrait of that judge's performance.  Unfortunately, there is a large amount of missing data in these questionnaires.  In addition to the ambiguity of the questions, questionnaire responses suggest that some judges were hostile to the questionnaires.  The large number of open-ended questions on the questionnaire probably reduced the response rate as well.  And some judges expressed the view that they lacked information to respond to some of the questions or that the questions did not seem appropriate to them.

In the absence of a code reflecting the region of the state to which the questionnaire was sent and an analysis of the response rate for that region, there is a risk of over- or under-representation of particular areas of the state in the results.  By not accounting for these differential regional response rates, the study risks comparing cases and courts that are not comparable.  Any study of disposition that measures the length of time to disposition across judges' dockets assumes that, on average, the cases are comparable from docket to docket and that each judge has a similar share of "simple" and "complex" cases.  There are more meaningful ways to measure judicial performance that avoid these erroneous assumptions.  For example, peer review of the handling of a sample of closed "simple" and "complex" cases would provide a more meaningful measure of judicial performance than collapsing and comparing incomparable cases.

In reference to the above point, the vast diversity of the Texas court system -- especially the district court system -- suggests the importance of appropriate statistical controls to properly measure the performance of judges compared to their peers as opposed to district court judges vastly different in role and resources.  Will the performance of district court judges be assessed in comparison with all the other district courts?  Will there be statistical controls for jurisdiction?  For resources available to the judges?  For urbanization?  Obviously, the appropriate way to assess any judge is with statistical controls to try to locate the judge among similarly situated judges rather than among a group of highly diverse judges in terms of both the natures of their jurisdictions and their resources.  One judge pointed out that rural courts fall in a different category from urban ones.  One of the problems of the judge's rural court was that lawyers had cases set in different courts at the same time and in rural Texas "the travel time makes it impossible to try both cases so one case is tried and the others are cancelled."  In a related vein, the judge pointed out that in rural areas there were problems with the availability of witnesses.  Since there were few medical witnesses, especially those dealing with mental health, doctors were often unavailable for testimony.

Still another problem is that the study relies on self-reporting without attempts to verify or corroborate the information sought.  It asks judges how many hours they work, how much they travel, and what percentage of time they spend on various tasks, without any check on the accuracy of the judge's memory or the tendency of the respondent in any study to seek out the "correct" answers.  The response of one judge notes this concern in more down to earth terms: "There is no objective way to determine who is working and who is not with great accuracy."  Thus, it is not clear how accurate or reliable much of the information that the study collects about efficiency will be.

The final problem with this study is one that cannot be ignored in a state with an elected judiciary that has had intensely contested judicial elections for two decades.  These measures can have significant political impact in an election campaign.  In one sense, of course, this is desirable because it encourages accountability of judges.  However, in another sense, these measures provide only an incomplete standard by which judges may be judged.  Not only are the measures incomplete in that they focus on quantitative standards at the expense of qualitative standards, but the quantitative data are themselves incomplete.  It is unclear whether the data gathered will be subjected to the appropriate statistical controls so that the proper comparisons may be made.  And, even if the data are properly used and analyzed by the Texas Judicial Council, there remains the problem of whether the data will be correctly used in the heat of a judicial campaign.  Certainly, there seems little doubt that individual judicial performance evaluations can readily become political weapons in judicial campaigns and also little doubt that, rather than contributing to an improvement in judicial accountability, these data can become instruments to obscure, confound, and mislead the electorate.  Indeed, the political volatility of some of these measures may create significant problems in ensuring the accuracy of responses.  Where there are no independent checks on the answers of judges, there is a significant incentive for judges to exaggerate answers to questions.  The question about the number of hours worked per week by judges is one that seems especially sensitive to political concerns and abuse.  It is difficult (if not impossible) to independently measure the number of hours worked per week by judges.  As a result, there will likely be a natural tendency on the part of the judges to generously estimate the length of their work-week.  A high number of hours worked, for example, will help protect the judge from political criticism and will be useful in promoting the judge's candidacy for reelection.  Indeed, even ignoring the impact of individual performance measures on a judicial election, there remains the knowledge that responses to the questionnaire can affect the number of judges in that area.  That alone is probably enough to affect the responses to the questionnaire.  Of course, some responses can be independently checked for accuracy and it is important that they are checked, although independent checking will increase the difficulty and cost of the individual judicial performance studies.

Perhaps the best way to conclude this review of the study and to capture its failure to develop comprehensive measures of judicial performance is with one judge's response to Question 23:

The quality vs. quantity of justice is not a valid comparison to make in most situations.  A variety of factors come into play with nearly every case.  Performance standards sound like standard business practice rather than more sensitive business of dealing with the lives of people in courtrooms.  If we forget that we will have forsaken the oath that we all adhere to as judges.

Daniel Shuman

Daniel W. Shuman
Professor of Law
Southern Methodist University

Anthony Champagne

Anthony Champagne
Professor of Government and Politics
University of Texas at Dallas

Comment on the November 2000 Draft Report

Our earlier letter examined the questionnaire and responses to the Texas Judicial Council Performance Measures.  After preparing that letter, we were asked to comment on the November 2000 draft of the report on the District Court Performance Measures Pilot Project.

We were pleased to note that the authors of the draft report have identified a number of problems with the questionnaire, problems that we also noted in our evaluation of the questionnaire.  Of particular value, we think, are the following comments made in the draft report:

  1. The need for consideration of qualitative factors.  "Qualitative factors (e.g., the quality and fairness of judicial decisions rendered, judicial temperament) also need to be considered when trying to evaluate such a subjective and qualitative matter as justice." (p. 4, with numerous concerns expressed throughout this report about reliance on quantitative measures only)
  2. The problems of focus on individual judges.  "The implementation of accurate, reliable performance measures for individual district courts is infeasible because of the complex structure and the geographic and funding diversity of Texas' trial court system." (p. 8)
  3. The problem of overly broad questions.  "The Office of Court Administration's statistical reporting system for the courts in Texas should be revised to add specificity to the data collected concerning the types of cases filed and their dispositions, and to ensure that the data is reported uniformly and consistently...." (p. 7)
  4. The problem of inaccurate or incomplete data.  This problem is explicitly noted (p. 15) and is well illustrated by the survey to sample courts.  The survey was submitted to 75 courts.  Only 43 courts responded and only 26 provided usable statistical data. (p. 10)
  5. The difficulties in making comparisons across courts.  "Only statistical comparisons of similarly-sized counties and/or counties with comparable caseloads should be made (on a countywide basis) with special notes given to the following factors:
  • Subject matter jurisdiction;
  • The use of an exchange of benches systems;
  • The use of a master calendar system;
  • The presence of multi-county districts;
  • Whether the district court has overlapping jurisdiction with statutory county courts or other district courts;
  • Complexity of cases." (p. 8)

This report reinforces our earlier concern with the enormous methodological problems in doing these performance evaluations, and it serves to emphasize the caution that should be exercised in undertaking this research.