

China Data

My Surveys

In my China research I analyze both original data that I collected and a wide range of secondary data. I have conducted six surveys in China: (1) my 2000 survey of Chinese lawyers in 25 cities; (2) the 2009 Chinese Legal Environment (CLE) Survey, conducted with Sida Liu; (3) the 2015 CLE Survey, a second wave covering the subsample of lawyers who participated in the 2009 CLE Survey; (4) the 2001 Beijing Law and Community Survey, conducted with Ben Read and colleagues at the Department of Sociology of Renmin University of China and supported by the Ford Foundation’s Beijing Office; (5) the 2002 Rural Law and Community Survey, with the same collaborators and funding source; and (6) the 2010 Rural Law and Community Survey, a repeated cross-sectional survey conducted in the same locations as the original 2002 survey.

More Original Datasets

I have also assembled several large datasets from print and web sources. First, I have compiled a dataset of vital statistics (population, births, deaths, and migration) drawn from local gazetteers, primarily from the 1980s and 1990s, covering approximately 1,400 counties.

Second, in 2005, long before “web scraping” became a familiar term, I downloaded and organized a considerable portion of the contents of the celebrated online lawyer forum hosted on the All-China Lawyers Association’s website: over 173,000 messages posted by over 8,000 users, many if not most of whom were lawyers. Owing to its politically sensitive content, the forum was unusable for most of 2006 and was shut down permanently in 2007. The relational database I created contains the complete text of each message; the information needed to reconstruct entire threads (thread ID, message ID, direct and indirect reply relationships, and posting dates and times); the username of each poster; and profile information associated with users, including geographic location, occupation, and firm name.

Third, I have created relational databases from court decisions published online. Almost a decade ago, Chinese courts began posting their written judgments online. Local courts’ efforts to make their decisions publicly accessible gained momentum and spread following signals from the Supreme People’s Court in 2009. Ultimately, in 2013 the Supreme People’s Court issued a rule (effective in 2014) requiring all courts to post their decisions, with a few exceptions. In conjunction with a project led by Benjamin Liebman, Rachel Stern, and Margaret Roberts, I created a dataset of over 1 million judgments issued between 2009 and 2015 by every court in Henan Province. It contains the complete text of each decision (including case details and outcomes); information about the plaintiffs and defendants (sex, age, ethnicity, and residence); their legal counsel (lawyer or other type of counsel, name, sex, firm, and city); case history (prior litigation); and court context (type of court, court size in terms of judges and docket, docket composition, and the social and economic characteristics of the locale). I built a similar database of over 3 million court decisions from Zhejiang Province.

Secondary Survey Data

Sources of secondary data I analyze from China include the Chinese General Social Survey, the China Family Panel Studies survey, the China Health and Retirement Longitudinal Study, the China Health and Nutrition Survey, the China Household Income Project, a 2007 survey of Chinese lawyers in eight provinces, and various government surveys, including individual-level census data.



US Data

I am a member of the After the JD (AJD) project’s Executive Coordinating Committee. The AJD is a large longitudinal national survey of law school graduates admitted to a state bar in 2000. We have been tracking the lives and careers of this cohort of JD holders through three waves of surveys (2002–03, 2007–08, and 2012–13).

I have also been analyzing data from the National Survey of College Graduates (NSCG), which likewise permit longitudinal analysis of the lives and careers of law school graduates. I use NSCG data for three main purposes: (1) to replicate AJD findings, (2) to compare the class of 2000 with earlier and more recent cohorts of law school graduates, and (3) to compare law school graduates with holders of other advanced degrees.

I am also an avid consumer of US Census data. I have been using individual-level census data (Public Use Microdata Samples and the American Community Survey), Economic Census data, and County Business Patterns data to study various dimensions of the US legal profession over the past several decades.

I have dabbled in the analysis of data from the Survey of Income and Program Participation, but found them to be poorly suited to the study of lawyers and law school graduates.



International Data

For about five years I have been trying to build a field I call legal demography, which I define as the study of legal professions using data that were not collected for the dedicated purpose of studying lawyers and law degree holders but whose occupational and/or educational codes are detailed enough to identify them. In my legal demography work I have been analyzing NSCG and US Census data (see “US Data”). I have taken legal demography global with the help of the IPUMS International collection, which includes over 150 country-years with identifiable members of the legal profession.

My research on trust in legal institutions has also gone global with data from various waves of the World Values Survey.
