Dataset

The corpus underlying this project consists of 1,535 U.S. newspaper records published between 1880 and 1885, collected from the Library of Congress Chronicling America archive. Records were retrieved through seven keyword searches targeting Chinese children, students, schools, and education. The corpus spans 323 distinct newspaper titles across all major U.S. regions.

Two versions of the corpus are used in the analysis: a deduplicated MALLET corpus of 1,100 documents, in which repeated boilerplate and copied passages have been reduced, and a full corpus of 1,525 articles that combines originals and reprints. Comparing the two shows how the nineteenth-century reprinting network reshaped what readers actually encountered.

In This Section

Browse the Dataset: search and filter all 1,535 records, with direct links to Chronicling America
Dataset Reference: column descriptions, summary statistics, and an example record
