Dataset
The corpus underlying this project consists of 1,535 U.S. newspaper records published between 1880 and 1885, collected from the Library of Congress Chronicling America archive. Records were retrieved through seven keyword searches targeting Chinese children, students, schools, and education. The corpus spans 323 distinct newspaper titles across all major U.S. regions.
Two versions of the corpus are used in the analysis: a deduplicated corpus of 1,100 original articles (reprints removed) and a full corpus of 1,535 articles (originals and reprints combined). Comparing the two reveals how the nineteenth-century reprinting network reshaped what readers actually encountered.
In this section
- Browse the Dataset: search and filter all 1,535 records, with direct links to Chronicling America
- Dataset Reference: column descriptions, summary statistics, and an example record