Challenge of digitizing old books and documents
Kolkata: 1818: Bengali journal Dik Darpan carries India's first advertisements, in which merchant navy operators seek to win trade consignments.
1890s: Rabindranath Tagore's family, famous for producing some of the best literary talents through generations, publishes journals featuring recipes written by women.
1860: Ishwar Chandra Gupta, a famous Bengali writer and social critic, writes about the menace of Kolkata’s brothels and the need to demarcate red-light areas in his weekly journal. British rulers formulate the Immoral Trafficking Act of 1862.
1880: India’s first medical journal Bhisak Darpan appears in Bengal, matching modern counterparts in its detailing—individual case studies, treatments and follow-ups.
All these apparently disconnected events would have remained unknown but for an initiative to recover and digitize a rotting pile of papers stacked in damp storerooms.
Abhijit Bhattacharya, documentation officer at the Centre for Studies in Social Sciences, Calcutta (CSSSC), an institute that works closely with the government to nurture research in the humanities, has taken up the challenge of converting old books and documents found in eastern India into digital form and making them freely accessible on the Internet.
In a project funded by the British Library, Bhattacharya and his team have so far collected 27,000 books printed before 1950 from different libraries and private collections. Of these, 3,850 books, comprising 380,000 pages and 24 terabytes of data, have already gone online.
"This is not rocket science; it is a simple use of existing technology to feed data in a universally accepted form," Bhattacharya said. What could help the project realize its potential, though, is its importance to scholars of Asian history around the world, he said.
Bhattacharya's digitized content, hosted on the website of the British Library's Endangered Archives Programme, which funds his project, has provided valuable material to researchers at leading universities such as Columbia University, the University of Michigan, Princeton University and Jawaharlal Nehru University in New Delhi.
CSSSC is also working on a similar archiving project with Germany’s University of Heidelberg. Across all its projects, the institute has so far digitized 1.2 billion pages, yielding 150 terabytes of data.
“The culture of preservation and archiving historical resources is poor in India, and only large-scale digitization can help conserve the priceless documents that are on the brink of decay,” said Shinjini Das, a post-doctoral research fellow at the University of Cambridge.
Digitization of documents “will alter the way historical research is done currently by both cutting down cost and time”, particularly at a time when funding for research on liberal arts and humanities was facing a crisis worldwide, she said.
India lagged behind developed countries by miles as far as digitization of academic documents is concerned, Bhattacharya said. While in the US, Canada and European countries, 80% of texts of historical relevance are converted into digital forms, in India, less than 15% is so far digitized.
Though institutes in eastern and southern India made some progress in this regard, the west and the north have vastly neglected the importance of preserving documents other than government data, he added.
In the developed world, the need to digitize historical resources was recognized as early as 1966, when the American Library Association formed a committee to draw up a plan.
"The committee realized that it was much better to make a single digital catalogue of a book rather than physically cataloguing it in thousands of libraries worldwide," Bhattacharya said. The breakthrough came in 1998, when a universally accepted standard called MARC 21 (Machine-Readable Cataloging for the 21st century) was developed for record-keeping.
The main wave of technological innovation in digital archiving began in the 1980s and continued until 2005. Since then, the system has worked smoothly with occasional upgrades and tweaks, Bhattacharya said. For example, microfilming, once the standard way of preserving documents, has now been replaced by digital imaging.
Interestingly, Australia, which was rather late in declaring higher education and research as focus areas, was the first to begin digitizing academic material, in the early 1980s, Bhattacharya said. This was long before Google thought of posting books on the Internet.
While Indian academia has suffered from a lack of funding for digital preservation, wealthy businessmen have not stepped in, Bhattacharya said. India's heritage companies, a treasure trove of the country's rich history of business and labour movements, have done little to preserve their records.
It would be "unethical" to charge a fee for the use of academic content posted on the Internet, he said. "After all, only a handful of scholars use our preserved documents, and we would have to charge them a fortune in order to make the project self-sustainable."
Without a dependable revenue stream, projects like Bhattacharya's depend heavily on funding from reputed international institutions that support them because of their importance to scholars. The British Library's Endangered Archives Programme has awarded two rounds of grants to Bhattacharya's project: £10,007 in 2009 for the pilot survey and £39,060 in 2010.
Even if funding dries up, the enormous volume of documents already uploaded to the Internet will never lose its relevance, Bhattacharya says.
For now, however, Bhattacharya's hands are full. Ongoing projects at his institution include preserving manuscripts on the Bhakti movement collected from West Bengal's Nadia district, the birthplace of Chaitanya Mahaprabhu, a 15th-century Hindu monk and social reformer.
His team has also taken up a project to upload daily issues from 1870-1949 held in the archive of the Amrita Bazar Patrika, a nationalist newspaper that served as a counterweight to The Statesman in its heyday.
Mint has a strategic partnership with Digital Empowerment Foundation, which hosts the Manthan Awards.