Metadata Librarian Experience

Monday, December 8, 2008

Dublin Core One-to-One Principle

In the Dublin Core metadata schema, the one-to-one principle states that each metadata description should describe exactly one resource. For instance, the description of a digital image of the Mona Lisa should not be treated as a description of the original painting. In practice, however, it is often difficult to draw such a clear line.

When we create metadata to describe a resource, such as a digital image or an analog object, we need to consider users' requirements. From the users' perspective, we want to provide the information they are looking for, so metadata creators should be able to identify the key information need. For example, when a metadata creator records the date for an image of the Mona Lisa digitized from the original painting, s/he should think about what users really want to know. In most cases, users are interested in the date of the original painting rather than the date of the image. If metadata creators give only the digitization date, the record is less likely to satisfy users' interest.

However, in the above example, if the creation date of the original painting is given in the metadata description instead of the digitization date, the record conflicts with the one-to-one principle. Therefore, we need to use our best judgment to create metadata that is meaningful for users, rather than follow the rules rigidly and miss the information users need.
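
One way to respect the principle and still surface the date users want is to keep two linked records. The sketch below is only an illustration: the identifiers and placeholder values are hypothetical, and the display_date helper is my own assumption, not part of Dublin Core. Only the element names (title, type, date, source) and the type values come from Dublin Core and the DCMI Type Vocabulary.

```python
# Two separate descriptions, one per resource (one-to-one principle).
# Identifiers and date values are hypothetical placeholders.
painting = {
    "identifier": "example:painting-1",
    "title": "Mona Lisa",
    "type": "PhysicalObject",
    "date": "<creation date of the painting>",
}
image = {
    "identifier": "example:image-1",
    "title": "Mona Lisa (digital image)",
    "type": "StillImage",
    "date": "<digitization date>",
    "source": "example:painting-1",  # points at the painting's record
}

records_by_id = {r["identifier"]: r for r in (painting, image)}


def display_date(record, records_by_id):
    """Return the date of the linked source if there is one, else the record's own date."""
    source_id = record.get("source")
    if source_id in records_by_id:
        return records_by_id[source_id]["date"]
    return record["date"]


print(display_date(image, records_by_id))
```

Each record still carries only its own date; the interface decides which one to show to users.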

Monday, November 24, 2008

RDA Constituency Review

RDA is up for review again. People who are interested in RDA can submit comments by February 2, 2009. RDA (Resource Description and Access) will be the general guideline for information professionals to describe electronic resources and provide access to online information for users; it also facilitates metadata quality control and the sharing of metadata between different communities and metadata schemes.

Monday, November 10, 2008

Generating MARC with MarcEdit

Harvesting metadata from available online resources has become a trend. Since OAI was adopted by most data providers, it has made it easier for libraries to share metadata. However, sometimes you may want to integrate a few websites into the library cataloging database; you can do this easily with MarcEdit.

MarcEdit can convert between MARC and XML metadata. It can perform the following transformations:

MARC → Dublin Core XML
MARC → MARCXML

Other conversions are possible, but the above transformations are the ones commonly used by librarians. Users can also edit those MARC records with MarcEdit and batch load them into their integrated library system. If users make use of macros, batch editing becomes much easier. People who are interested in this can look at the sample at Miller Library.
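
To give a feel for what a MARC to Dublin Core conversion does, here is a minimal sketch in Python. It is not MarcEdit's own stylesheet or code. The field mapping is a small subset that loosely follows the Library of Congress MARC to Dublin Core crosswalk, and the record structure, function names, and sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# (MARC tag, subfield code) -> Dublin Core element; illustrative subset only.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("650", "a"): "subject",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("856", "u"): "identifier",
}


def marc_to_dc(fields):
    """Map a list of (tag, {subfield: value}) tuples to (element, value) pairs."""
    pairs = []
    for tag, subfields in fields:
        for code, value in subfields.items():
            element = CROSSWALK.get((tag, code))
            if element is None:
                continue  # field not covered by this small crosswalk
            if element != "identifier":
                # Drop trailing ISBD punctuation such as " /" or ":".
                value = value.rstrip(" /:;,")
            pairs.append((element, value))
    return pairs


def dc_to_xml(pairs):
    """Serialize (element, value) pairs as simple Dublin Core XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for element, value in pairs:
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record for a website to be added to the catalog.
sample = [
    ("245", {"a": "Example website title /"}),
    ("100", {"a": "Lee, John"}),
    ("856", {"u": "http://example.org/"}),
]

print(dc_to_xml(marc_to_dc(sample)))
```

A real conversion handles many more fields, indicators, and repeated subfields, which is why a tool like MarcEdit is the practical choice.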

Monday, November 3, 2008

Item Mapper in DSpace

Recently, I got several calls asking how to use the item mapper in DSpace. The item mapper is used to reduce duplicates of the same record and to create an easy way to link an item record across multiple collections in DSpace. For instance, suppose a photograph by John Lee is held in the photograph collection but also appears in the collection of the Arts Department. We can then use the item mapper to map the same photograph record into the second collection, which avoids reproducing the item record there.

The item mapper is a convenient tool for users to manage records at the item level. However, when the same creator has multiple works in DSpace, and some of those works appear in various collections, the item mapper becomes problematic. Currently, users can only use the item mapper by searching on author name. As I mentioned, if the author has multiple publications in DSpace, how do users recognize the right publication to map without title information?

This extreme situation has received little attention from DSpace developers. At this point, if such a case happens, users need to label the item record that needs to be mapped into more than one collection. For example, users can add a second co-author to tag the publication, then search the collections by the co-author name to identify the record and map it into the appropriate collection. After the mapping is done, users need to go back to the record and delete the co-author from the item record. That is how we can temporarily solve the problem. Nevertheless, we still hope that DSpace developers will improve the item mapper with more combined search features.
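
The kind of combined search I have in mind can be sketched in a few lines of Python. This is not DSpace code and does not use any DSpace API: the item list, the titles, and the find_items and map_item functions are all hypothetical, and are meant only to show why an author-only search is ambiguous and how a title filter resolves it.

```python
# Hypothetical in-memory items; the titles are invented for illustration.
items = [
    {"id": 1, "author": "Lee, John", "title": "Harbor at Dawn",
     "collections": {"Photographs"}},
    {"id": 2, "author": "Lee, John", "title": "City Lights",
     "collections": {"Photographs"}},
]


def find_items(items, author, title_words=None):
    """Search by author, optionally narrowed by words from the title."""
    matches = [item for item in items if item["author"] == author]
    if title_words:
        needle = title_words.lower()
        matches = [item for item in matches if needle in item["title"].lower()]
    return matches


def map_item(item, collection):
    """Link the existing record into another collection instead of copying it."""
    item["collections"].add(collection)


# Author alone returns both records; adding the title picks out one.
print(len(find_items(items, "Lee, John")))
target = find_items(items, "Lee, John", "harbor")[0]
map_item(target, "Arts Department")
print(sorted(target["collections"]))
```

With a filter like this there would be no need to add and then delete a temporary co-author.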

Sunday, October 26, 2008

INMAGIC New Knowledge Management Tool

Recently, INMAGIC announced its new-generation knowledge management tool, which uses social knowledge networks to inspire innovative insight, share knowledge, retain organizational intelligence, and turn implicit knowledge into explicit knowledge.

It seems promising that companies have found an effective and efficient way to retain innate intelligence through communication within a company social network. The intention of this knowledgenet is to link the existing knowledge repository with different organizational groups to generate a sound solution for a particular business or technical problem. These groups could be R & D, marketing, sales, decision makers, production, strategic planning, and the legal department. INMAGIC hopes that the social knowledge network can play a comprehensive role in information organization, publishing, sharing, creation, and collaboration.

This is a new information model built upon Amazon.com. Facing such complicated and diverse types of content, I wonder what distinguishes its search engine from other knowledge tools. How will this tool make it easier for users to find the required information or link relevant information to the target problems? How does the tool encourage internal users to contribute more to the knowledgenet? That will be interesting to see.

Monday, October 20, 2008

Ranking Terms in a Thesaurus Database

As I discussed in the previous posting, a thesaurus is a very useful tool for users to search a database efficiently. Users look up the thesaurus to find the right concept for their search. Do users really need a ranking scale to indicate the relevancy of the term they are looking for?

If the ranking makes sense to users, it is worth doing. For instance, when users search for chemicals in databases such as STN and Dialog, they prefer to look up the term in the thesaurus first. By looking at the terms indexed by the database producer, users learn how to build their search strategy. The ranking scale used by these databases is the number of records linked to the term. A term could be ranked quite differently in one database than in another.

If the ranking is based on a word partially matching the indexing term, it would confuse users. For example, if different indexing terms receive the same rank because of partial word matching, users would conclude that these indexing terms are all equally relevant. That is not true. Some indexing terms are linked to more relevant records than others, so how would the thesaurus help users decide which term to use for the search?
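
The difference between the two approaches can be sketched in Python. The terms and record counts below are invented for illustration and do not come from STN, Dialog, or any real database. The rank_terms function is my own assumption about how a count-based ranking could work: partial matching is used only to find candidate terms, and the number of linked records decides their order.

```python
# Hypothetical thesaurus: indexing term -> number of linked records.
postings = {
    "Benzene": 1200,
    "Benzene derivatives": 300,
    "Nitrobenzene": 45,
}


def rank_terms(query, postings):
    """Return terms containing the query, ordered by linked-record count."""
    needle = query.lower()
    hits = [(term, count) for term, count in postings.items()
            if needle in term.lower()]
    # Partial matching alone would treat every hit as equal;
    # sorting by count tells users which term leads to more records.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


for term, count in rank_terms("benzene", postings):
    print(term, count)
```

All three terms match "benzene" as a partial word, but the counts make it clear that they are not equally useful as search terms.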

I would love to see a new systematic ranking system adopted in thesaurus databases.

Monday, October 13, 2008

WilsonWeb Thesaurus Database

WilsonWeb is a hybrid database covering the humanities, social science, education, business, and applied science and technology. The thesaurus database is a very useful tool for searchers. For instance, if you are not sure which term is indexed in the database, you can search the thesaurus database and get a list of all related subjects and related terms, with the number of linked records for each. This preliminary search gives you some hints for searching WilsonWeb with indexing terms.

However, because WilsonWeb covers subjects in different disciplines, the same concept can have different meanings in different domains. It would take a huge amount of work to create a hierarchical structure in its thesaurus database (or, as people might call it, a taxonomy) that would really narrow down the search terms. Deciding what kind of thesaurus to offer users is always a challenge for database producers, who tend to see cost and effectiveness as the key to solving this problem.
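
A tiny Python sketch shows how a hierarchy separates the meanings of one term. The taxonomy below is a made-up example, not WilsonWeb's actual thesaurus, and paths_for is a hypothetical helper.

```python
# Hypothetical two-level taxonomy: discipline -> narrower terms.
taxonomy = {
    "Economics": {"Depression": {}, "Inflation": {}},
    "Psychology": {"Depression": {}, "Anxiety": {}},
}


def paths_for(term, tree, prefix=()):
    """Return every hierarchical path that ends at the given term."""
    found = []
    for name, children in tree.items():
        path = prefix + (name,)
        if name == term:
            found.append(" > ".join(path))
        # Keep descending so deeper occurrences are found too.
        found.extend(paths_for(term, children, path))
    return found


for path in paths_for("Depression", taxonomy):
    print(path)
```

The searcher sees the term once under each discipline and can pick the branch that matches the intended meaning.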

Today, taxonomy has become a main information technology for making search engines more intelligent. Law firms, R & D organizations, and consulting companies have begun building their own taxonomies to enhance the searchability of web search engines, which saves searchers a great deal of time by returning more relevant results. The problem most people have today is that too much information exists: how can they find the information they need? Taxonomies can help companies organize information and make it easily searchable for users. The SLA website and Askus.com are good examples of web content that benefits from taxonomy.