Monday, September 17, 2007

A Technique and Low-cost Way for Global Access to Your Library's Digital Assets (Part 2)

In the June 2007 Library Insider newsletter, I wrote about how using Google Sitemaps can improve web crawling and search results for a library’s digital repository, www.bsu.edu/libraries/virtualpress/libinsider/libinsiderv5i6.pdf.
In this issue, I will describe how the Open Archives Initiative Protocol for Metadata Harvesting, OAI-PMH, can make digital content more available to search engine users. I will also discuss how to use your CONTENTdm OAI server with Google Sitemaps.

CONTENTdm has included OAI support since its version 3 or earlier. The Ball State University Libraries have been using this functionality for over a year, and we have seen it increase hits and usage of our digital collections. For those who are not familiar with the protocol, here is a brief explanation of OAI-PMH.

Simply stated, OAI-PMH is a protocol used to collect metadata in XML format from various systems with the intention to index the metadata in a centralized location.

There are two main players in this process: Data Providers and Service Providers. Data Providers are institutions with systems that hold shareable metadata or content. These institutions may already offer a way to search their own content, but even then, searching can only be done within the Data Provider’s own repository or interface. Service Providers are institutions that gather content or data from different Data Providers in one place and provide services, such as searching, on the collected data.
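To make this exchange concrete, here is a minimal sketch in Python of the Service Provider's side: it builds a request URL for an OAI-PMH verb and reads Dublin Core titles out of a ListRecords response. The base URL uses the placeholder yoursite.com, the sample response is hand-written for illustration and does not come from a real repository, and the function names build_request and extract_titles are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Placeholder endpoint (assumption); substitute your own OAI base URL.
BASE_URL = "http://yoursite.com/cgi-bin/oai.exe"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_request(verb, **params):
    """Return the URL a harvester would fetch for the given OAI-PMH verb."""
    query = {"verb": verb}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


def extract_titles(xml_text):
    """Return the first dc:title of each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.findall(".//oai:record", NS):
        title = record.find(".//dc:title", NS)
        if title is not None:
            titles.append(title.text)
    return titles


# Hand-written sample response (illustrative only), trimmed to what is used here.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:yoursite.com:mycollection/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample photograph</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""
```

Calling build_request("Identify") gives the URL a harvester would use to ask the repository to describe itself, while build_request("ListRecords", metadataPrefix="oai_dc") gives the URL that actually returns metadata records. A real harvester would fetch those URLs over HTTP and follow resumption tokens; both are left out of this sketch.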

If you have digital collections, you are already one of the many Data Providers, and you probably also have collections that are unique to your institution. The barrier to making these collections accessible globally is getting them known. Happily, OAI-PMH is one of the answers.

The first step is to register your institution as a Data Provider with the organization that started the whole idea, the Open Archives Initiative, www.openarchives.org/data/registerasprovider.html. If you are using CONTENTdm, then you can rest assured that it will pass the conformance test. Below are the general steps to follow to set up a CONTENTdm system as a Data Provider:
Locate oai.txt. It is usually located under %CONTENTdm_Installation_folder%\server\conf\
Change the field elements as instructed in the file:
· Enable OAI
· Name the repository
· Specify the email contact for the administrator
· Specify the alias of each collection you wish to make available through OAI, one line per collection, e.g. Collections/mycollection

The second step is to find as many Service Providers as possible who are willing to harvest your data, and find out how you can become a Data Provider for them. There is a list of Service Providers at www.openarchives.org/service/listproviders.html. This process usually does not cost anything. However, some Service Providers are only willing to harvest data in the particular subjects or disciplines that interest them.

Keep in mind that the central idea in following these steps is to make your collections searchable from multiple locations. In this respect, it is natural to think about Google when talking about web searching. So the next question to ask is how to get Google involved in harvesting your data. The good news is that Google supports OAI-PMH, www.google.com/support/webmasters/bin/answer.py?answer=34655.

However, the instructions on the website are not easy to understand, so let me provide some clarity.
In CONTENTdm, the file that acts as the OAI server is oai.exe. This file can usually be found under %CONTENTdm_Installation_folder%\server\docs\cgi-bin\.

From the web, the OAI server is accessed through http://yoursite.com/cgi-bin/oai.exe. If you are interested in using the Google Sitemaps service, keep in mind that Google requires sitemap files to be placed in the highest-level web directory, that is, http://yoursite.com/. This requires you to move oai.exe from the \server\docs\cgi-bin\ directory to the \server\docs\ directory.

An alternative to moving the file is to add a new site, http://yoursite.com/cgi-bin, to your sitemap account. By doing this, the OAI server will be located in the highest-level web directory of that site.

Unfortunately, even once you have successfully added your OAI server as a sitemap, you are only halfway done. Google stated that it will automatically add query parameters such as ?verb=Identify or ?verb=ListRecords to the server URL. However, based on our server log analysis, it added only the Identify parameter and never added the ListRecords parameter to its query during the first four months or longer of our implementation. This means our data was never actually collected by Google because, in order to collect data from an OAI Data Provider, the Identify parameter alone is not enough. A Service Provider needs to query the Data Provider using additional parameters, technically referred to as verbs. Having learned this, we modified our approach for using the OAI server. Instead of submitting just the OAI server (oai.exe) to Google Sitemaps, we provide the whole query string for each collection that is required for a Service Provider to harvest data from a Data Provider. The format of the query string is a long URL, shown at the bottom of this article.*
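As an illustration of building one full query string per collection, here is a minimal Python sketch. It uses only the arguments that the OAI-PMH specification defines for the ListRecords verb (verb, metadataPrefix, set) and may differ from the exact string we submitted. It assumes that each CONTENTdm collection is exposed as an OAI set named after its alias; yoursite.com and mycollection are placeholders, and the function name harvest_urls is hypothetical.

```python
from urllib.parse import urlencode


def harvest_urls(base_url, collection_aliases, metadata_prefix="oai_dc"):
    """Return one complete ListRecords URL per collection alias.

    Assumption: each collection alias doubles as the OAI set name.
    oai_dc is the metadata format every OAI-PMH repository must support.
    """
    urls = []
    for alias in collection_aliases:
        query = urlencode({
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "set": alias,
        })
        urls.append(base_url + "?" + query)
    return urls


# Placeholder values (assumptions); substitute your own server and aliases.
EXAMPLE_URLS = harvest_urls(
    "http://yoursite.com/cgi-bin/oai.exe",
    ["mycollection", "othercollection"],
)
```

Each URL in EXAMPLE_URLS can be pasted into a browser first to confirm that the server returns records for that collection before it is submitted to a Service Provider.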

After completing the above process, do not expect to see an immediate increase in your usage statistics. In our experience, it took two months before we could actually see results harvested by Service Providers, which we believe translates into increased usage of our Digital Media Repository.

I hope this technical information provides you with insight for using a low-cost way to achieve global access to your digital collections.

For more information, contact P. Budi Wibowo, Ball State University Libraries’ Head of Digital Libraries and Web Services, BWibowo@bsu.edu, (765) 285-8032.
