The Vermont Digital Newspaper Project last week added its first batch of digitized newspaper pages to a national database dedicated to providing searchable digital copies of historic newspapers from all over the nation.
Tom McMurdo, the project librarian for the state effort, said 25 states and Washington, D.C., are currently involved in the National Digital Newspaper Program. Vermont’s contribution last week included the database’s oldest available pages, some from 1836, the earliest year within Vermont’s range of funding.
The $391,552 grant for the Vermont project comes from the National Endowment for the Humanities and stipulates that the project must digitize 100,000 pages of Vermont newspapers published between 1836 and 1922. The earliest page available from anywhere in the nation, dated Jan. 5, 1836, is from The Rutland Herald. It features a follow-up story on New York’s Great Fire of December 1835.
Federal officials chose 1836 as a starting point in an effort to extend the realm of public information to the years before the Civil War, around which much work has already been done. The year 1922 is the last in the public domain, McMurdo said, so the project would need special permission from publishers to digitize any content published after that year.
The work outlined in the grant is the beginning of a much larger effort, said McMurdo.
“100,000 pages is just a drop in the bucket,” he said. “There are millions of pages of newspapers in Vermont that have not been digitized.”
McMurdo moved to Vermont from California, where he had been working with the California Digital Newspaper Collection since the inception of the national project in 2005. He was hired by UVM as a full-time librarian on the Vermont project. McMurdo said that as the initial 100,000 pages are completed, the project will seek additional funding to extend its efforts.
Digitizing the newspapers is much more complicated than simply scanning an image, McMurdo said. Because the effort also involves a searchable database of pages, it requires an additional step called optical character recognition, or OCR for short.
OCR software analyzes scanned pages and assigns digital text values to the characters on them, McMurdo said. Once these values are assigned, users can search the database of pages for specific terms. A search returns the set of pages that contain those terms.
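To illustrate how recognized text makes pages searchable, here is a minimal sketch of a word-to-page lookup. It is not the project's or Chronicling America's actual software; the page identifiers, sample text, and function names are hypothetical, and the sketch assumes a search should return only pages containing every term entered.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page IDs whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the page IDs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start with pages matching the first term, then narrow with each other term.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical OCR output keyed by made-up page identifiers.
pages = {
    "page-001": "Great fire in New York destroys the merchants exchange",
    "page-002": "Town meeting held in Rutland on Tuesday",
    "page-003": "Fire company formed in Rutland",
}
index = build_index(pages)
print(sorted(search(index, "Rutland fire")))
```

Because the lookup matches exact words, any word the OCR software misreads on a page will not be found by a search for the correct spelling.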
The process, McMurdo said, is not 100 percent accurate. If the digital scans are made from degraded microfilm that was itself made from degraded newspapers, OCR accuracy can be as low as 25 percent. With high-quality microfilm images, however, accuracy can reach 98 percent, McMurdo said.
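The article does not say how those accuracy figures are measured. One common approach, assumed here purely for illustration, is to compare OCR output against a hand-corrected transcription character by character, counting the insertions, deletions, and substitutions needed to turn one into the other. The function names and sample strings below are hypothetical.

```python
def edit_distance(a, b):
    """Count the single-character edits needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep a matching character
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_text):
    """Return the share of reference characters the OCR text got right (0.0 to 1.0)."""
    if not reference:
        raise ValueError("reference must not be empty")
    errors = edit_distance(reference, ocr_text)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical example: two misread letters in a 14-character masthead.
accuracy = character_accuracy("Rutland Herald", "Rutlaud Hcrald")
print(f"{accuracy:.0%}")
```

Under this measure, two wrong characters out of 14 leave roughly 86 percent accuracy, and both words in the sample would be missed by an exact-word search.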
The National Digital Newspaper Program publishes the searchable database at Chronicling America, a website where digitized pages from all over the country are available.
Chronicling America homepage: http://chroniclingamerica.loc.gov/
Vermont’s project is focusing primarily on 12 titles from around the state within the prescribed time period. A 12-member advisory committee of journalists, librarians, and historians from all over Vermont selected the publications, aiming to capture quality historical content while maintaining a good geographic spread, McMurdo said.
Vermont Digital Newspaper Project State Advisory Committee: http://vtdnp.wordpress.com/state-advisory-committee/
Vermont Digital Newspaper Project Blog: List of Focus Publications
Distinctly missing from that list are the Times Argus and daily editions of The Rutland Herald. Montpelier is covered by the State Journal and Watchman for the period from 1836 through 1910, though the Argus was “definitely in strong consideration,” said McMurdo. The Rutland Herald’s master negatives, the film required for digitization, are in the possession of a Michigan-based company called ProQuest. The company’s tagline is “Central To Research Around The World,” but McMurdo said it asked for “an exorbitant amount of money” in exchange for the master negatives requested by the Vermont Digital Newspaper Project.
ProQuest did not respond to a request for comment.
The Vermont Digital Newspaper Project is contracting out the labor-intensive microfilm scanning process, McMurdo said, to iArchives, a Utah-based company. Microfilm from the state archives is copied and the copies are sent to the company, where they are scanned and then sent back in digital form for processing. The state archive originals, he said, never leave the state.
VTDigger.org editor Nick Monsarrat is a member of the Vermont Digital Newspaper Project advisory board.