Monday, December 8, 2008

Dublin Core One-to-One Principle

In the Dublin Core metadata schema, the one-to-one principle means that one metadata description should describe only one resource. For instance, the description of a digital image of the Mona Lisa cannot be treated as a description of the original painting. In practice, however, it is difficult to draw such a clear line.

When we create metadata to describe a resource, such as a digital image or an analog object, we need to consider users' requirements. We want to give users the information they are looking for, so metadata creators should be able to identify the key information need. For example, when a metadata creator records the date of an image of the Mona Lisa digitized from the original painting, s/he should think about what users really want to know. In most cases, users are interested in the date of the original painting rather than the date of the image. If the metadata creator gives only the digitization date, users' interest is less well served.

However, in the above example, if the creation date of the original painting is recorded in the metadata description instead of the digitization date, the record conflicts with the one-to-one principle. Therefore, we need to use our best judgment to create metadata that is meaningful for users, rather than follow rigid rules and miss the information users need.

Monday, November 24, 2008

RDA Constituency Review

RDA is up for review again. Those who are interested in RDA can submit comments by February 2, 2009. RDA (Resource Description and Access) will be the general guideline for information professionals to describe electronic resources and provide users with access to online information; it also facilitates metadata quality control and the sharing of metadata between different communities and metadata schemes.

Monday, November 10, 2008

Generating MARC with MarcEdit

Harvesting metadata from available online resources has become a trend. Since most data providers have adopted OAI, it has become easier for libraries to share metadata. Sometimes, however, you may want to integrate a few websites into the library cataloging database; you can do this easily with MarcEdit.

MarcEdit can convert between MARC and XML metadata, including the following transformations:
  • MARC → Dublin Core XML
  • MARC → MARCXML
Other conversions are possible, but the transformations above are the ones librarians commonly use. Users can also edit those MARC records with MarcEdit and batch load them into their integrated library system. Macros make batch editing much easier. Those who are interested can look at the sample at Miller Library.
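
To make the MARC → Dublin Core XML idea concrete, here is a minimal Python sketch. It is not how MarcEdit works internally; the crosswalk table, the tuple-based record format and the `record` wrapper element are simplifying assumptions of mine, and a real crosswalk covers many more fields and subfields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed, simplified crosswalk: (MARC tag, subfield code) -> Dublin Core element.
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "c"): "date",
    ("520", "a"): "description",
    ("650", "a"): "subject",
}

def marc_to_dc(record):
    """Convert a list of (tag, subfield, value) tuples into a Dublin Core XML string."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for tag, code, value in record:
        element = MARC_TO_DC.get((tag, code))
        if element is None:
            continue  # fields outside the crosswalk are dropped in this sketch
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value.strip()
    return ET.tostring(root, encoding="unicode")

# Made-up sample record; the 999 field stands in for a local note.
sample = [
    ("100", "a", "Lee, John"),
    ("245", "a", "Harbor at dusk"),
    ("650", "a", "Photography"),
    ("999", "a", "local note"),
]
dc_xml = marc_to_dc(sample)
print(dc_xml)
```

The local 999 field has no Dublin Core counterpart here and is silently dropped, which is the kind of loss to check for after any batch conversion.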

Monday, November 3, 2008

Item Mapper in DSpace

Recently, I received several calls asking how to use the item mapper in DSpace. The item mapper reduces duplicate records and provides an easy way to link one item record across multiple collections in DSpace. For instance, suppose a photograph by John Lee is held in the photograph collection but also belongs in the Arts Department collection. We can use the item mapper to map the same photograph record into the second collection, which avoids creating a duplicate item record there.

The item mapper is a convenient tool for managing records at the item level. However, when the same creator has multiple works in DSpace, and some of them appear in various collections, the item mapper becomes problematic. Currently, users can find items to map only by searching on the author name. If the author has multiple publications in DSpace, how do users recognize the right publication to map without title information?

DSpace developers seem to have given little consideration to this edge case. For now, if it happens, users need to label the item record that must be mapped into more than one collection. For example, users can add a temporary second co-author to tag the publication, then search by that co-author name to identify the record and map it into the appropriate collection. After the mapping is done, users need to go back to the record and delete the temporary co-author. That is how we can work around the problem for the time being. Nevertheless, we still hope that DSpace developers will improve the item mapper with more combined search features.
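
The kind of combined search I have in mind can be sketched in a few lines of Python. This is only an illustration of the idea, not DSpace's data model or API; the `Item` class, `find_items` and `map_item` are hypothetical names, and the sample records are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    item_id: int
    author: str
    title: str
    owning_collection: str
    mapped_collections: set = field(default_factory=set)

def find_items(items, author, title_keyword=None):
    """Search by author, optionally narrowed by a keyword in the title."""
    matches = [i for i in items if i.author == author]
    if title_keyword is not None:
        matches = [i for i in matches if title_keyword.lower() in i.title.lower()]
    return matches

def map_item(item, collection):
    """Link an existing item into another collection without copying the record."""
    if collection != item.owning_collection:
        item.mapped_collections.add(collection)

items = [
    Item(1, "Lee, John", "Harbor at Dusk", "Photographs"),
    Item(2, "Lee, John", "Winter Fields", "Photographs"),
]
by_author = find_items(items, "Lee, John")                      # ambiguous: two works
by_author_and_title = find_items(items, "Lee, John", "harbor")  # one work
map_item(by_author_and_title[0], "Arts Department")
```

With the title keyword the search returns a single record, so no temporary co-author is needed, and the item is linked into the second collection rather than duplicated.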

Sunday, October 26, 2008

INMAGIC New Knowledge Management Tool

Recently, INMAGIC announced a new-generation knowledge management tool. It uses social knowledge networks to inspire innovative insight, share knowledge, retain organizational intelligence, and turn implicit knowledge into explicit knowledge.

It seems promising that companies have found an effective and efficient way to retain in-house intelligence through communication within a company social network. The intention of this knowledgenet is to link the existing knowledge repository with different organizational groups to generate a sound solution for a particular business or technical problem. These groups could include R & D, marketing, sales, decision makers, production, strategic planning, and the legal department. INMAGIC hopes that the social knowledge network can play a comprehensive role in information organization, publishing, sharing, creation and collaboration.

This is a new information model built upon Amazon.com. Facing such complicated and diverse types of content, I wonder what distinguishes its search engine from other knowledge tools. How will this tool make it easier for users to find the required information or link relevant information to the target problems? How does the tool encourage internal users to contribute more to the knowledgenet? That will be interesting to see.

Monday, October 20, 2008

Ranking Terms in a Thesaurus Database

As I discussed in the previous posting, a thesaurus is a very useful tool for searching a database efficiently. Users look up the thesaurus to find the right concept for their search. Do they really need a ranking scale to indicate the relevancy of the term they are looking for?

If the ranking makes sense to users, it is worth doing. For instance, when users search for chemicals in databases such as STN and Dialog, they prefer to look up the term in the thesaurus first. By looking at the terms indexed by the database producer, users learn how to build their search strategy. The ranking scale used by these databases is the number of records linked to the term. A term may therefore be ranked quite differently in one database than in another.

If the ranking is based on a word partially matching the indexing term, it confuses users. For example, if different indexing terms receive the same rank because of partial word matching, users would conclude that these indexing terms are all equally relevant. That is not true. Some indexing terms are linked to more relevant records than others, so how would the thesaurus help users decide which term to use for the search?
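
The difference between the two approaches is easy to see in a small Python sketch. The terms and record counts below are invented for illustration and do not come from STN, Dialog or any other database; the function names are my own.

```python
# Hypothetical thesaurus: indexing term -> number of linked records (postings).
postings = {
    "Acetic acid": 1520,
    "Acetic anhydride": 310,
    "Peracetic acid": 45,
}

def rank_by_postings(query, thesaurus):
    """Return terms containing the query, ordered by linked-record count."""
    q = query.lower()
    hits = [(term, count) for term, count in thesaurus.items() if q in term.lower()]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

def rank_by_partial_match(query, thesaurus):
    """Return matching terms with the same flat score for any partial match."""
    q = query.lower()
    return [(term, 1) for term in thesaurus if q in term.lower()]

ranked = rank_by_postings("acetic", postings)
flat = rank_by_partial_match("acetic", postings)
```

Under partial matching, all three terms receive the same score, while the postings-based ranking shows at a glance which term leads to the most records.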

I would love to see a new, systematic ranking system adopted in thesaurus databases.

Monday, October 13, 2008

WilsonWeb Thesaurus Database

WilsonWeb is a hybrid database covering the humanities, social science, education, business, applied science and technology. Its thesaurus database is a very useful tool for searchers. For instance, if you are not sure which term is indexed in the database, you can search the thesaurus database and get a list of all related subjects and related terms with the number of linked records. This preliminary search gives you some hints for searching WilsonWeb with indexing terms.

However, because WilsonWeb covers subjects in different disciplines, the same concept can have different meanings in different domains. It would take a huge amount of work to create a hierarchical structure in its thesaurus database (perhaps it should be called a taxonomy), which would really narrow down the search terms. Deciding what kind of thesaurus to offer users is always a challenge for database producers, who tend to see cost and effectiveness as the key to solving this problem.

Today, taxonomy has become a main information technology for making search engines more intelligent. Law firms, R & D departments, and consulting companies have begun building their own taxonomies to enhance the searchability of web search engines, which saves searchers a great deal of time by returning more relevant results. The problem most people have today is that too much information exists: how can they find what they need? Taxonomy can help companies organize information and make it easily searchable for users. The SLA website and Askus.com are good examples of web content that benefits from taxonomy.

Monday, October 6, 2008

Metadata Quality Control

Metadata quality control is becoming more important than ever when we deal with batch imports. The problems include variant forms of author names, unconventional abbreviations, inappropriate data formats, duplicate records, spelling errors, incomplete punctuation, unrecognized characters, etc. To control the quality of metadata, we need a systematic way to identify and correct those errors quickly; we also need to strengthen metadata creation at the outset.
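
Two of these problems, variant author names and duplicate records, lend themselves to a simple automated check. The Python sketch below is only one possible approach under my own assumptions: the record layout is made up, and matching on surname plus first initial is deliberately coarse, so its output should be treated as candidates for human review rather than confirmed duplicates.

```python
import re
import unicodedata

def normalize_name(name):
    """Reduce an author name to a comparable key of the form 'surname, initial'."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    name = re.sub(r"[^\w\s,]", "", name).strip().lower()
    if not name:
        return ""
    if "," in name:
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        parts = name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return f"{surname}, {given[:1]}"

def find_duplicates(records):
    """Group record ids that share a normalized author and a normalized title."""
    groups = {}
    for rec in records:
        title_key = re.sub(r"\W+", " ", rec["title"]).strip().lower()
        key = (normalize_name(rec["author"]), title_key)
        groups.setdefault(key, []).append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Made-up batch with two forms of the same author name.
records = [
    {"id": 1, "author": "Lee, John", "title": "Harbor at Dusk"},
    {"id": 2, "author": "John Lee", "title": "Harbor at dusk."},
    {"id": 3, "author": "Lee, J.", "title": "Winter Fields"},
]
duplicates = find_duplicates(records)
```

Records 1 and 2 are flagged as a likely duplicate pair even though the name order, capitalization and punctuation differ.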

Recently, DSpace released an add-on for metadata quality control, which has a powerful mass-edit feature as well as duplicate detection and resolution algorithms. This is a very promising feature for metadata quality control; at the least, some systematic methods can be adopted to address the current dilemma at the time of submission.

Hopefully, functions that are available in a traditional integrated library system, such as authority control and controlled vocabularies, will become available in institutional repository software.

Sunday, September 28, 2008

Metadata Harvesting with MarcEdit

The last time I harvested DSpace metadata with MarcEdit, the process lasted only around 3 minutes. Yesterday's test went well, even though I harvested only a small collection.

It is important to ensure consistent metadata and access to those digital collections, especially during the senior seminar period. A simple and clear processing manual should be written to train staff or student assistants to create metadata, publish theses and harvest metadata with MarcEdit. I am also interested in harvesting metadata with other software. I hope to hear some good news from Claude in Montreal soon!

Tuesday, September 2, 2008

Back To the Library World

This has been a really busy and nice summer. I am glad to be back at work, even though the campus is overcrowded at the beginning of the semester.

The other exciting news is that my NITLE Technology Fellowship started in July. Because of scheduling, I missed the Technology Fellow Workshop on July 21-23 at Southwestern University in Georgetown, TX, but I was able to make it up in August via MIV (Multipoint Interactive Videoconferencing). I met a couple of NITLE Technology Fellows on MIV. In fact, MIV will be my main classroom in the future. I am really excited about what I will teach there ...

Monday, June 30, 2008

RDA Webcasts by Barbara Tillett

Barbara Tillett, chief of the Cataloging Policy and Support Office at the Library of Congress, talks about RDA (Resource Description and Access) in two new webcasts. She introduces the background of RDA development, gives an overview of the new rules, and discusses this next-generation cataloging code designed for the digital environment. Will RDA be published in early 2009?

Title 1: Resource Description and Access: Background / Overview
Title 2: Cataloging Principles and RDA: Resource Description and Access

Monday, June 16, 2008

NITLE DSpace User Community Meeting

The NITLE DSpace User Community Meeting was held on June 11-12 at the University of Puget Sound, Tacoma, WA. All the participating institutions sent their information professionals to the meeting.

It was a great opportunity for participants to share their best practices and raise their concerns. Topics such as institutional repositories, digital collections, metadata creation, DSpace Manakin, and marketing DSpace were discussed in depth. Most librarians feel that DSpace provides a great opportunity for institutions to build digital collections and campus-wide scholarly communication, but promoting it effectively on campus and creating value-added collections for research will be the key to success. NITLE agreed to continue providing such opportunities to facilitate best practices.

Tuesday, May 27, 2008

Harvesting Metadata with MarcEdit Harvester

Last week, I got a chance to harvest some Dublin Core records from DSpace, which is our institutional repository. After the process had run for 15 seconds, MarcEdit Harvester stopped. I wonder whether our DSpace OAI-PMH setup is compatible with MarcEdit Harvester. As far as I know, apart from Oregon State University Library, no institution has reported successfully harvesting metadata with MarcEdit.
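
For readers who have not looked at OAI-PMH itself, the Python sketch below shows roughly what a harvester does: it builds a ListRecords request, reads the Dublin Core records in the response, and picks up the resumption token that points to the next batch. The base URL and the sample response are made up, and no network call is made; this is not what MarcEdit Harvester runs internally, and it does not diagnose why my harvest stopped.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return (records, next_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "title": [t.text for t in rec.iterfind(".//dc:title", NS)],
            "creator": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    token = root.find(".//oai:resumptionToken", NS)
    next_token = token.text if token is not None and token.text else None
    return records, next_token

# Made-up response with one record and a token for the next batch.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample senior thesis</dc:title>
          <dc:creator>Lee, John</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.edu/oai/request")  # hypothetical endpoint
records, next_token = parse_list_records(SAMPLE)
next_url = list_records_url("https://repository.example.edu/oai/request",
                            resumption_token=next_token)
```

A harvester repeats the request with each new token until the response carries no token, so a harvest that stops early is worth checking against the last batch it received.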

It might need more customization to meet local needs. I hope to hear more from other institutions that use MarcEdit directly to harvest metadata. Some commercial XML harvester software is now available, and its editable configuration file makes it very promising.

Sunday, May 18, 2008

Interoperability and Metadata Quality

Interoperability between metadata schemas and institutional repositories is always a big concern, since people want to exchange metadata between repositories. Metadata creation is expensive. To reuse metadata, institutional repositories have started to harvest it from other repositories, which not only dramatically reduces the cost but also helps share knowledge globally.

However, interoperability is never easy to achieve, even though people have put effort into it. I would suggest controlling metadata quality. If metadata quality is guaranteed, our IT staff may need less time to clean up data and can harvest metadata smoothly. Although there is no national standard like AACR2 to govern metadata creation, metadata creators can make more use of controlled vocabularies by consulting FAST and database thesauri, and provide enough descriptive metadata for users to search.

Controlling metadata quality gives users accurate information to search and to define the information they need. For information professionals, it means fewer potential errors and discrepancies, which would otherwise be a huge obstacle when we try to reuse the metadata.

Wednesday, May 14, 2008

Harvesting Non-MARC Records with MarcEdit

I am still waiting for information from NITLE. Recently, MarcEdit added some new functions, such as a macro engine, which could be used to further improve the compliance of MARC records customized for local purposes. As soon as I get the message from them, I will post how we do the process.

In the meantime, I would like to pray for the victims of the earthquake in China, especially the children and teenagers directly affected.

Tuesday, May 13, 2008

Harvesting Non-MARC Records From Institutional Repository

This week, I spent more time on OAI-PMH, trying to figure out the best way to harvest the records we have deposited in DSpace. It seems that some libraries have tried XML Harvester software to harvest non-MARC records. This commercial software has mostly been integrated into the ILS, and it helps harvest non-MARC records and convert them to MARC records.

The free software MarcEdit can also harvest non-MARC records, but it has some problems at certain points. I am still trying to make it work at my library. Hopefully, I can post the process later.

Monday, May 5, 2008

Scholarly Publishing vs. Scholarly Communication

Since all kinds of open-source software are available to the public, open access to scholarly publications has become a hot topic in educational institutions. Sometimes people consider scholarly publishing and scholarly communication to be the same, or even equal. I would reserve judgment on this view rather than argue with it. To me they are different things, though they may be related to each other at some points.

People argue about this because open access is challenging traditional scholarly publishing, e.g. the university press. I would say that university press publications usually come from authors both inside and outside the institution. The presses do not rely only on campus publications; otherwise, they would not survive.

Today, open access makes online scholarly communication possible free of charge, and it will tremendously change the way scholars have historically communicated. Scholars have more options for sharing their research with the global community if they hold the copyright. They can easily reach people with the same interests at no cost. This scholarly communication will bring innovation to the scholarly community. On the other hand, we also need to establish a fair system to control the quality of scholarly publishing and make peer-reviewed publications available to more researchers at no cost.

Monday, April 28, 2008

Harmonization of Metadata Standards

Now that we have moved into the digital age, we are excited about direct access to great, high-quality resources. ETDs, digital collections and digital archives are all available to a variety of researchers. Resource creators employ different metadata standards to facilitate access and to display resources in the most appropriate and descriptive way. Most resources are now described with metadata schemas such as Dublin Core, MODS, METS, EAD and MARC. Because of these various metadata standards, data transformation is becoming a huge challenge for metadata librarians in the digital age.

When I read the PROLEARN report by the Network of Excellence in Professional Learning, I felt as if we were in the dark; it seems to me that the harmonization of metadata standards has never been realized globally. As we all know, the most promising way to harmonize metadata standards today is to build crosswalks among the various metadata schemas with XSLT. But according to the report, we are only at the starting point: we are not even sure whether what we have done is correct, and we still have a long way to go. Probably what we really need is a standard like MARC21, which could be used globally to describe the bibliographic information of a publication. Can we get that far? Until we do, I believe that creating crosswalks is still a feasible way to address metadata interoperability.
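
A crosswalk is at heart a lookup table from the elements of one schema to those of another. The Python sketch below uses a plain dictionary instead of XSLT to keep it short; the table is a simplified assumption of mine (a production crosswalk also handles subfields, indicators and qualifiers), and the `advisor` element is a hypothetical local addition.

```python
# Assumed, simplified crosswalk from unqualified Dublin Core elements to MARC tags.
DC_TO_MARC = {
    "title": "245",
    "creator": "720",
    "subject": "653",
    "description": "520",
    "publisher": "260",
    "date": "260",
}

def crosswalk(dc_record):
    """Map (element, value) pairs to (MARC tag, value); collect what cannot be mapped."""
    mapped, unmapped = [], []
    for element, value in dc_record:
        tag = DC_TO_MARC.get(element)
        if tag is None:
            unmapped.append((element, value))
        else:
            mapped.append((tag, value))
    return mapped, unmapped

# Made-up record; "advisor" is a hypothetical local element with no entry in the table.
dc_record = [
    ("title", "Sample senior thesis"),
    ("creator", "Lee, John"),
    ("advisor", "Smith, Jane"),
]
mapped, unmapped = crosswalk(dc_record)
```

Whatever ends up in the unmapped list is information the target schema cannot carry as-is, which is one reason harmonization by crosswalk alone remains incomplete.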

Monday, April 21, 2008

ETDs Collection on DSpace

Washington College started to create a digital collection of ETDs in 2007, and now all Senior Capstone Experiences are available on DSpace. But senior students still do not submit their theses directly to DSpace; instead, they have to submit theses to Blackboard or to the library by email. Then the Metadata Librarian has to create metadata and deposit the theses on DSpace one by one. I think this is time-consuming and very inefficient. Since the workflow for creating thesis metadata can be customized, I can design a new default workflow that lets senior students submit their theses directly to DSpace. This is why we chose DSpace as our digital repository in the first place.

Some faculty might question whether undergraduate students can deposit theses on their own. I would like to trust them. With a step-by-step guide and a carefully designed workflow, senior students will be able to do it correctly and efficiently. I should probably work on designing the new default workflow model first, and then teach them how to use it. I hope we can make this happen by 2009.

Monday, April 14, 2008

Miller Library Presented Digital Collection of ETDs at CALD Annual Spring Meeting 2008

On April 11, 2008, College Librarian Dr. Ruth Shoge presented the Washington College Digital Repository, DSpace, at the CALD (Congress of Academic Library Directors of Maryland) Annual Spring Meeting 2008.

The Congress of Academic Library Directors (CALD) of Maryland is an organization open to library directors in post-secondary institutions in Maryland. Dr. Shoge gave a presentation on “DSpace, a place for senior thesis.” This year, the CALD Annual Spring Meeting focused on digitization and Maryland library digital projects. Presenters from the University of Maryland, Towson University and Johns Hopkins University also shared their digital projects.

Dr. Shoge presented the rationale for choosing DSpace as our digital tool, and how the librarians made the collection of ETDs available to the college community. This is the first time Miller Library has officially publicized our digital collection to the academic library community.

Monday, April 7, 2008

Metadata In DSpace

DSpace uses the Dublin Core metadata schema, which includes 15 elements, plus some qualifiers that adapt it to the library implementation profile. As a data provider, DSpace supports OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) for metadata harvesting, which is crucial for the Open Archives Initiative.

DSpace can export metadata as a Dublin Core XML file, and the team is now working on migrating the export capability to the METS standard, which will make it easier to exchange digital library objects between repositories. Basically, DSpace can be customized to interoperate with other library or document systems for automatic deposit in DSpace. With this capability, I am trying to set up an export for transferring metadata between DSpace and our ILS. Then our library users will need only one search in the library catalog to locate information in our digital collection, which will definitely improve search effectiveness and efficiency.

Tuesday, April 1, 2008

Building A Digital Repository, Part IV Publicizing Digital Collection

A good product needs to be promoted effectively, especially a collection created by the library. We decided to publicize the digital collection campus-wide and beyond. First, we try to join consortium user group meetings to share our experience of what we have done. In addition, I have to convince faculty that this is a convenient, accurate and easily accessible resource, and students will get a first taste of what it looks like to have scholarly publications on DSpace.

I know we have a long way to go, but we are going to use the campus newspaper ELM, presentations and library instruction to publicize the digital collection. In the meantime, I will make the digital collection available through our ILS, so users will eventually use it no matter where they access the collection.

Once we have a digital collection on DSpace, we start to promote it. We use the campus newspaper ELM, presentations and library instruction to publicize DSpace. We want faculty and students to communicate actively on DSpace, and we want to help faculty use DSpace to manage their scholarly research.

Monday, March 24, 2008

Building A Digital Repository, Part III Metadata Creation

Our first project is the Senior Capstone Experience, so metadata creation starts with senior theses. First, we need to convert Word files into PDF, which is read-only. We request that all papers be submitted with a clear title, the author's full name, and an optional abstract and keywords, so that they are ready for metadata creation.

The default metadata scheme in DSpace is Dublin Core. We create and customize metadata according to the features of each paper. Some papers provide more bibliographic information, such as a table of contents, extra contributors (e.g. an advisor), or uncontrolled vocabulary for keywords. We try to bring this additional bibliographic information into our metadata and create more access points for users. At the same time, it means extra work for us, because we need to validate the subject headings supplied by authors. Interdisciplinary theses are linked to all related disciplines so that users can find them under any related department. Once metadata has been created for each record, users can search the collection.

Monday, March 17, 2008

Building A Digital Repository, Part II Access Control

When we talk about access control for a digital collection, some users might not be happy with it. Open access to scholarly publications is an ideal, but when you have to consider copyright issues for your institution, limited access control gives you the flexibility to make your digital collection available to authorized users.

The Senior Capstone Experience is the first collection we put up on DSpace. Most of the papers are unpublished, so we created the collection with different levels of privileges. The public can search and browse the collection and view bibliographic information, but only authorized Washington College users can download the actual documents.

For other digital collections, such as digital archives, we provide open access: the public can access them anytime, though they should use the collection with care. You can always control access at the subcommunity or collection level, which makes a pilot digital project easier during preliminary exploration.
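
The two policies described above can be written down as a small lookup table. The Python sketch below is just a way to picture the rules, not DSpace's authorization system; the group names, the `Action` enum and the `is_allowed` helper are my own hypothetical names.

```python
from enum import Enum

class Action(Enum):
    VIEW_METADATA = "view_metadata"
    DOWNLOAD = "download"

# Hypothetical policy table: collection -> action -> groups allowed to perform it.
POLICIES = {
    "Senior Capstone Experience": {
        Action.VIEW_METADATA: {"anonymous", "college"},
        Action.DOWNLOAD: {"college"},
    },
    "Digital Archives": {
        Action.VIEW_METADATA: {"anonymous", "college"},
        Action.DOWNLOAD: {"anonymous", "college"},
    },
}

def is_allowed(collection, action, group):
    """Return True if the group may perform the action on the collection."""
    return group in POLICIES.get(collection, {}).get(action, set())

public_can_browse = is_allowed("Senior Capstone Experience", Action.VIEW_METADATA, "anonymous")
public_can_download = is_allowed("Senior Capstone Experience", Action.DOWNLOAD, "anonymous")
archives_open = is_allowed("Digital Archives", Action.DOWNLOAD, "anonymous")
```

Keeping the rules per collection is what lets one repository hold both a restricted thesis collection and fully open archives, and anything not listed in the table is denied by default.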

Monday, March 10, 2008

Building A Digital Repository, Part I Information Structure

Creating an information architecture is the first thing you should consider before you start. A good information structure makes it easy for users to navigate and browse information on your site. Otherwise, the repository becomes disorganized and crowded, without organization or sorting, and eventually, as digital content grows rapidly, users find nothing because the structure fails them.

My best practices for our digital repository on DSpace are:

  • Decide what you want to deposit in your digital repository. You should have a collection policy to help you make decisions on acquiring original materials.
  • Create a hierarchical information structure at the community and subcommunity level by discipline, which makes it easier for users to browse the collection.
  • Arrange collections by subject or special topic, and give subcommunities more flexibility to create and manage collections.
Once you have decided how to handle the information structure, you can start to arrange your collection.
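
The community, subcommunity and collection hierarchy can be pictured as a simple tree. The Python sketch below is only a planning aid under my own assumptions, not DSpace's internal model; the class names and the sample departments are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Collection:
    name: str

@dataclass
class Community:
    name: str
    subcommunities: list = field(default_factory=list)
    collections: list = field(default_factory=list)

def browse_paths(community, prefix=""):
    """Yield a 'Community / Subcommunity / Collection' path for every collection."""
    here = f"{prefix}{community.name}"
    for coll in community.collections:
        yield f"{here} / {coll.name}"
    for sub in community.subcommunities:
        yield from browse_paths(sub, prefix=f"{here} / ")

# Made-up structure: one community per division, one subcommunity per department.
root = Community(
    "Humanities",
    subcommunities=[
        Community("English", collections=[Collection("Senior Capstone Experience")]),
        Community("History", collections=[Collection("Senior Capstone Experience"),
                                          Collection("College Archives")]),
    ],
)
paths = list(browse_paths(root))
```

Listing the browse paths on paper before creating anything in the repository makes it easier to spot a discipline that is missing or a collection that sits at the wrong level.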

Thursday, March 6, 2008

Building A Digital Repository

Building a digital repository seems like a big undertaking to most people, but you can always start with something small, concrete and under your control.

We are at a small liberal arts college, so naturally we want to use all resources effectively and provide better services and products for faculty and students. When we started to build the digital repository, we wanted not only to provide a virtual place for digital collections, but also to create a platform for our faculty and students to showcase their work and talents, share their views and promote their academic achievements. What we want to do is give each department the flexibility to build its own community and collections for its specific disciplinary purposes. Eventually, it will be an interactive virtual community that promotes research and scholarship, and ultimately students will have open access to electronic publications for their scholarly communication.

Wednesday, March 5, 2008

Electronic Publishing

Each year, every educational institution generates a certain number of publications, such as newsletters, periodicals and newspapers. Can we provide open access to electronic publications and further encourage students to start scholarly communication campus-wide? The key is that students can learn effectively and benefit from their early scholarly achievements. Is this not the objective of education?

This could be realized on DSpace. In the virtual community, students embark on a new journey of creating knowledge from what they learn. The benefits of publishing, managing and editing digital collections will motivate campus scholarship and original work.

Monday, March 3, 2008

Teaching and Learning with Multimedia

Today multimedia is heavily used in teaching and learning, bringing a new interactive experience to both faculty and students. Our digital repository, DSpace, can manage multimedia in the way you prefer. It gives you the flexibility to design the collection with different access privileges. You can have audio and video files in mp3 or mp4. Most conveniently, users can search or browse the collection by subject.

On the college digital repository, faculty and students can work with images, text, audio and video. The collections they build there will benefit teaching and learning through resource sharing. For example, art projects can easily be collected on DSpace as images; music and drama students can film their concertos and drama productions live, make them available on DSpace, and share their senior capstone experience with audiences all over the world. Our faculty can film their museum tours and deposit them on DSpace to share with the class. Because these collections are created by faculty and students, they will encourage more interactive teaching and learning, bring in more input and spark ideas. Campus-wide electronic publishing will promote open access to the resources that students and faculty create, will certainly encourage interactive teaching and learning, and will eventually have an impact on scholarly communication.


Thursday, February 28, 2008

Digital Collection

Archives usually record college history and heritage, which makes them a valuable source for research and teaching. Digitizing archives is not only a good way to promote college history and culture, but also an effective means of maximizing resource sharing globally. We had better preserve digital collections carefully if we want more people to share the knowledge. Preservation of digital collections is a big challenge today. DSpace can help you address these concerns at low cost.

DSpace can manage images in various file types, such as JPEG, MPEG and TIFF. Users can preview a thumbnail image when reading the metadata. Digitizing archives is good news for those who usually read archival documents in the reading room. Once the digital collections are available online, they can access the archives anywhere and anytime. Digital archives can give you a spectacular experience, establish a stable and valuable source, and help you trace new findings in the fields that interest you most.

Monday, February 25, 2008

Publishing on DSpace

Scholarship is one of the three pillars (teaching, scholarship and service) of the college community. Faculty can put their published and unpublished works on DSpace with control over usage rights. On the one hand, it is convenient for faculty to link required reading lists to their works and make collections searchable online, which is an effective way to publicize themselves globally; on the other hand, faculty can effectively present their publications in class as needed.

Once you set up this connection, it is well worth building an easily navigated information architecture on DSpace. Faculty can access the collection anytime to track research progress and interests, and other researchers and audiences can reach them to share thoughts anytime and anywhere. DSpace exposes your research projects worldwide and promotes your scholarly publishing more than ever.

Friday, February 22, 2008

Sharing Senior Capstone Experience on DSpace

In summer 2007, we ended the practice of collecting print senior theses; instead, senior students at Washington College submitted electronic copies to Miller Library. This is a big change for both students and faculty. However, the change also stimulates their creativity, and my passion to make the theses searchable online as soon as I can.

We have a great team at Miller Library. We first convert the various files to PDF. Then I create the information architecture on DSpace to hold all kinds of digital objects. Creating metadata for theses is a long process, but it is crucial for users to be able to search the collection online. It is a big challenge for me to validate the keywords assigned by the author of every single paper, especially without an auto-indexing tool, but the process itself is full of fun!

Now you can search the SCE collection by author, title, subject and keyword. Professors can show students actual theses on DSpace in class and share their analysis. DSpace is becoming a virtual community for students to conduct interactive learning and research. Isn't it amazing that technology is changing the world every second?

Sunday, February 17, 2008

Teaching and Learning on DSpace

The digital library has become a popular virtual community for most educational institutions. Faculty and students are starting to build additional connections with the digital library. Besides collecting digital objects, faculty and students can create a variety of content to support teaching and learning, share different thoughts, and promote interactive and effective learning. DSpace can help the library manage cultural heritage, intellectual property and archives for the institution. In the meantime, we need to find a systematic and efficient way to manage the metadata, which will help users search and browse the digital collection easily. DSpace is open source software that captures digital objects in any format (text, video, audio, and data) and distributes them over the web. It indexes your work, so users can search and retrieve your items. It preserves your digital work over the long term. DSpace is affordable and easily customized; it supports browsing and can be updated simultaneously. You can manage your different digital objects at the community and collection level, and you can control how to share and maintain your digital collection.

Teaching and learning on DSpace will give faculty and students greater flexibility and creativity in the community. People with a common learning objective share their ideas and work to communicate and build on their knowledge. DSpace is definitely a feasible and effective tool for helping both faculty and students meet their educational goals.