Vol. 39, No. 1, April 2010
Searching Smartly in the HistSciTechMed Database
By Stephen Weldon, Kim Rudolph, and Sam Spence
Although most of us are now able to find our way around online resources with at least a little facility, few of us are able to do this with the precision and skill that make us feel satisfied that we have done it well. The problem is compounded because every database and every search engine has different quirks and protocols. As the Isis bibliographer, I have come to realize that few people understand enough about this particular resource in its online form (the HistSciTechMed database, or HSTM) to take advantage of the many useful subject indexing features it offers.
This article, written with my two graduate assistants, Kim Rudolph and Sam Spence, explains how to get the most out of your searches when you use the HistSciTechMed database. Most of you will be familiar with this database, as it contains the digital version of the annual Isis Bibliography. As you also likely know, it contains the citation data from three other bibliographies as well. What we say here will be most relevant to finding information submitted by Isis, but some of the suggestions will be useful for doing global searches. (1)
Part I: The structure of Isis CB subject indexing
The citations in the Isis Bibliography are indexed in two ways. They are organized according to a scheme that places each work in one location according to a classification system originally developed by George Sarton and modified several times over the past one hundred years. Moreover, they are indexed with subject terms based on a thesaurus that is expanded as necessary to accommodate new subject matter. Anyone doing subject-based searching in the bibliography, whether in the print or in the electronic database, will benefit from knowing the basic construction of the classification and index terms and how to use them for effective searching.
Sarton's original classification system derived from his understanding of history of science as a discipline based fundamentally on time period (which sometimes included geographical or "ethnographical" aspects as well) and scientific discipline. He characterized these two classification modes as horizontal and vertical, a horizontal view being a broad multi-disciplinary effort to understand science in a single period or culture, and a vertical view being a narrower focus on a single scientific discipline as it developed across time. This horizontal and vertical access to the historical literature remains a consistent element throughout the classification systems of the Isis bibliographies from Sarton's time to today. Revisions that I have made to the structure of the bibliography since 2002 have rearranged significant parts of the system, but the period-plus-discipline framework remains its fundamental structural feature. (See figures 1 and 2.)
The creation of an indexing system that would supplement the classification framework was an innovation introduced by Magda Whitrow when she worked on the first cumulative bibliography spanning the period from 1913 to 1965. Her indexing system came directly from a faceted classification structure that she employed for placing citations with greater precision in this extremely large printed cumulation. By creating a much more detailed classification system, she was able to make fine distinctions in otherwise broad topical areas (so that instead of merely classifying a work as about "geology," for example, she could tag it as "mineralogy" or even "gem stones"). Whitrow advanced the system in another way as well. Not only did she increase the precision of existing horizontal and vertical classification forms, she also introduced a detailed vocabulary to describe wholly new features of the works: aspects of scientific organization, historical analysis, and bibliographical form, such as "privately sponsored," "freedom and secrecy," "methods of communication," and "archives; manuscripts."
John Neu, the University of Wisconsin librarian who edited the Isis CB for over thirty years, employed the Whitrow category system as the basis for his subject indexing in the HistSciTechMed database. The detailed terminology proved ideal for this, although there were some disadvantages. Because it was developed for classification, the Whitrow system concatenated many terms, which made it function less precisely as a database index. For example, when the single subject phrase "North America: United States; Canada" is used to tag citations, users have no way of separating those entries dealing with Canada from those on North America or the United States. To solve this problem, I have continued to use Whitrow's terminology, but have frequently broken the phrases into discrete units: "North America," "United States," and "Canada." In addition, I freely add new terminology to the thesaurus as needed, and I don't hesitate to use multiple chronological, geographical, and disciplinary terms for a single entry, something that was impossible as long as the Whitrow terminology was tied to a single-entry classification structure.
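For readers who think in code, the idea of breaking a concatenated phrase into discrete index terms can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the assumption that colons and semicolons are the separators is drawn from the single example above, not from any documented Isis rule. In practice the splitting is an editorial decision made phrase by phrase.

```python
import re


def split_heading(heading: str) -> list[str]:
    """Break a concatenated classification phrase into discrete index terms.

    Assumption: terms are separated by colons or semicolons, as in the
    example phrase from the article.
    """
    parts = re.split(r"[:;]", heading)
    # Drop surrounding whitespace and any empty fragments.
    return [p.strip() for p in parts if p.strip()]


terms = split_heading("North America: United States; Canada")
print(terms)  # ['North America', 'United States', 'Canada']
```

Once the terms are separate, a record tagged this way can be found by a search on "Canada" alone, which the concatenated phrase did not allow.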
Part II: Practical search techniques in the HistSciTechMed database
The indexing system of the HistSciTechMed database makes it possible to use both the standard annual classification scheme and the index terms in the thesaurus for discovery. All of the items published since 2000 now have subject terms from both the annual print-volume classification and the Whitrow-based thesaurus. (See figure 3.) This gives researchers a good deal of flexibility in subject searching, allowing both narrowly focused subject searches and broad category listings. By combining the two in ingenious ways, researchers can perform a variety of specialized searches.
The advantage of utilizing both of these features, the classification system and the thesaurus, is that scholars can find their way into the literature using search strategies that are designed to accommodate more types of research projects. The data continues to provide access according to horizontal and vertical categories of chronology and discipline, but many other kinds of searches are possible. By analogy one can think of these other ways of searching through the data as diagonal methods. The topic area of science and religion, for example, spans many disciplines and time periods, cutting through those two fundamental structures. The citations in the database added after 2002 are indexed with such diagonal strategies in mind.
Understanding the OCLC terminology
To understand the indexes in HistSciTechMed, let's first examine a typical record. Looking at figure 4, you can see how records utilize the subject tagging that we've been talking about. You'll notice that there are several types of subjects listed in this record, most of which can be searched separately (time, the main exception, cannot). OCLC has seven types of indexes accessible. They are subject, identifier, descriptor, geographic name, named corporation, named person, and time. (See figure 5.)
Of these indexes, we will focus on three: subject, identifier, and descriptor. The subject index is the metaindex, and it includes all of the other fields within it. You will want to use this field for general searching, when you do not need or desire to differentiate types of terms.
The identifier index is the one Isis data uses for the print classification structure. The identifiers include only Isis CB classification terms and none of the other thesaurus index terms. This means that the HistSciTechMed identifiers correlate closely, though not exactly, to the classification headings in most of the print bibliographies. The current classification system is shown in figure 2; this list differs slightly from those found on back covers of recent bibliographies.
The descriptor index is the one that incorporates the Isis subject thesaurus terms. By selecting either identifier or descriptor, you can perform the more advanced searching described here. The important distinction to remember is the one between the descriptor and the identifier fields.
Replicating the annual printed CB categories in the online database
Depending upon your research project, you may find it helpful to replicate the print categories of the CB. Let's say your area of interest is early modern chemistry and you find the "Chemistry-17th Century" section of the printed bibliography to be the most helpful to you. You can recreate this category through a search using the identifier field.
To perform a search of this sort, you must go to the advanced search window and enter terms as found on the list in figure 2. In our example, we enter "17th century" in the first box, and then "chemistry" in the second box. The dropdown box next to these terms should be marked as identifier. (See figure 6.) Perform the search, and you will find all Isis results for "Chemistry-17th Century."
Replicating the category search has advantages and disadvantages. A second example will show you some of the limitations of this kind of search. Let's say you are interested in East Asian medicine. In order to replicate this Isis category exactly, first find the exact category terms: "Asian cultures-Medical sciences, general works." Following the procedure laid out above, you would enter "Asian cultures" into the first box and "medical sciences, general works" into the second box, selecting identifier for both. Upon performing the search, you will find all Isis bibliography results for this category. In this case, however, it is a disappointingly small search result.
The main advantage of this form of searching is its familiarity to print users who find the print classification valuable for discovery in their particular field. The disadvantages are that searches of this kind currently omit records classified prior to 2000 (though this ought to change soon), and that these general category searches tend to be imprecise.
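The logic of a category search can be modeled with a small Python sketch. It does not use OCLC's interface or any real API; the Record class, the function name, and the three sample records are all made up for illustration. It shows only that entering two terms as identifiers amounts to requiring both classification terms on the same record.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Toy stand-in for a citation record (hypothetical structure)."""
    title: str
    identifiers: set[str] = field(default_factory=set)  # print classification terms
    descriptors: set[str] = field(default_factory=set)  # thesaurus index terms


def identifier_search(records, *terms):
    """Return records whose identifiers include every given term (Boolean AND)."""
    wanted = {t.lower() for t in terms}
    return [r for r in records if wanted <= {i.lower() for i in r.identifiers}]


# Invented sample data, not taken from the database.
records = [
    Record("Hypothetical study A", {"17th century", "Chemistry"}, {"alchemy"}),
    Record("Hypothetical study B", {"18th century", "Chemistry"}, {"pneumatic chemistry"}),
    Record("Hypothetical study C", {"17th century", "Physics"}, {"optics"}),
]

hits = identifier_search(records, "17th century", "chemistry")
print([r.title for r in hits])  # ['Hypothetical study A']
```

Because only the identifiers are consulted, a record that merely mentions the 17th century among its thesaurus terms would not be returned, which is what makes this search mirror the print category.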
Using the subject index to do more refined searching
An alternative means of searching by index terms will produce more specific results. Using the subject and descriptor indexes allows greater precision in standard horizontal (time-bounded) and vertical (discipline-bounded) searches and also makes various kinds of diagonal searching possible. Searching by subject is a far better and more customizable tool. Below, we illustrate how you can use and customize subject searches to find results that you might not have found otherwise.
Effective subject and descriptor searching is somewhat more complicated than identifier searching, because finding the correct search terms and learning how to combine them appropriately takes more work. There are three main ways to locate search terms: scanning the index of the print bibliography, exploratory searching, and using the Related Subjects button.
(1) Scanning the index of the print bibliography is still a good place to start if there are any index terms you find yourself using often. The Isis CB Web site also has a list of the subject terms, which can be downloaded (http://www.ou.edu/cas/hsci/isis/website/thesaurus/index.html). By looking at this list carefully one can discover the precise terminology that is used for classification. Sometimes there are patterns that might be helpful in searching. The thesaurus contains a number of parallel terms, for instance, that reflect similar topic areas in different disciplines. The term "science and war" parallels both "technology and war" and "medicine and war." Understanding the nature of this thesaurus can thus dramatically help with either more precise or more comprehensive discovery.
(2) An exploratory search using keywords is another method of finding relevant search terms. Here is an example where the most relevant subject terms may not be the ones that immediately come to mind. The first step in an exploratory search is doing a keyword search using the terms you think most appropriate. Let's assume that you are interested in finding material on science in Russia during the Cold War. If we limit our search only to Isis records after 2000, a quick keyword search of cold war and Russia yields only nine records. (See figure 7.) Judging by the number of results, you can immediately tell that this search is not getting all of the records that you want. It turns out that the term Cold War is not a commonly used index term. The most commonly shared terms are "20th century," "20th century, late," "Russia," and "Soviet Union." (2) (See figure 8.)
(3) Using the Related Subjects button is a third very useful method for identifying related terms. This button is on the top right hand side of the search results screen. (See figure 7.) When you are looking at your search result, this button will take you to a list with all of the index terms of all of the records in the found set, ranked according to the percentage of records in your found set tagged with that term. (See figure 9.) Using this screen may help you identify terms you did not expect to be associated with your search. (Discovering subjects through exploratory searches and the Related Subjects button will work with all four databases included in HistSciTechMed.)
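The ranking behind a list like the one the Related Subjects button produces can be approximated as follows. This Python sketch is our own reconstruction of the idea described above (terms ranked by the share of records in the found set that carry them), not OCLC's actual implementation, and the four sample records are invented.

```python
from collections import Counter


def related_subjects(found_set):
    """Rank index terms by the percentage of records in the found set tagged with them."""
    if not found_set:
        return []
    counts = Counter()
    for terms in found_set:
        counts.update(set(terms))  # count each term at most once per record
    total = len(found_set)
    return [(term, 100 * n / total) for term, n in counts.most_common()]


# Each inner list holds the index terms of one invented record.
found_set = [
    ["20th century, late", "Soviet Union", "Physics"],
    ["20th century, late", "Russia", "Soviet Union"],
    ["20th century", "Soviet Union"],
    ["20th century, late", "Russia"],
]

for term, pct in related_subjects(found_set):
    print(f"{pct:5.1f}%  {term}")
```

Terms near the top of such a list are the ones most worth trying in a follow-up subject search.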
After identifying the most relevant search terms, the actual search process is relatively simple. The only choice is whether to search by subject or by descriptor. The broadest type of search, and probably the one you'll want to do most often, is a subject search, because it includes all of the subject indexes. To do this, make sure the drop-down box next to each search field is set to subject (3) (see figure 10) and use the Boolean operators as desired. (OCLC has a guide to Boolean operators in its help section.) If you want to search a particular phrase such as "science and war," put quotation marks around the phrase; otherwise the search will return all records that have both science and war anywhere in one of the descriptor fields, not just those items indexed as "science and war." In figure 11 we have entered "20th century, late" in the first box and "Soviet or Russia" in the second box, both searched as subjects. This now produces a much longer list of records that deal with Cold War Russian science.
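The difference between a quoted phrase and loose words, and the way OR and AND combine, can be made concrete with a short Python sketch. The helper names are hypothetical and the matching rules are a simplified model of the behavior described here, not a reproduction of the OCLC search engine.

```python
def has_phrase(terms, phrase):
    """Exact match on a whole index term, as with a quoted phrase."""
    return phrase.lower() in {t.lower() for t in terms}


def has_words(terms, *words):
    """True if every word appears somewhere in the record's terms, in any order."""
    bag = set()
    for t in terms:
        bag.update(t.lower().replace(",", " ").split())
    return all(w.lower() in bag for w in words)


def cold_war_russia(terms):
    """("Russia" OR "Soviet Union") AND "20th century, late"."""
    place = has_phrase(terms, "Russia") or has_phrase(terms, "Soviet Union")
    return place and has_phrase(terms, "20th century, late")


# An invented record that is not about science and war at all.
record = ["science and religion", "technology and war"]
print(has_words(record, "science", "war"))    # True: both words occur somewhere
print(has_phrase(record, "science and war"))  # False: no such index term

print(cold_war_russia(["Soviet Union", "20th century, late"]))  # True
print(cold_war_russia(["Russia", "19th century"]))              # False
```

The first pair of results shows why the quotation marks matter: without them, a record can match on words scattered across unrelated terms.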
Although using the subject index is easy, there may be times when using one of the specific indexes, such as descriptor or geographic name phrase, will be advantageous. Let us assume that you want to find sources that deal with science and literature. The print category "Science and literature; science and art" includes literature, but neither a subject search nor an identifier search will work well here, because both will return far too many unwanted records. In this case, using the descriptor field will allow you to focus on "science and literature" alone. (See figure 12.)
The most complex type of search you can do will allow you to combine searches within the classification categories and thesaurus terms. By using the identifier index to do a category-based search for the time period we can isolate items with a primary focus on a time period. Let's take the example of medieval traditional medicine. If we run an identifier search using "medieval" and a descriptor (or subject) search for "medicine, traditional," we will find about 4 records. (See figure 13.) So, too, for a search of works specifically on eugenics in the 19th century. Using the index term "eugenics" and the classification category of "19th century" we can use the descriptor and identifier fields, respectively, to yield results that focus on this period. (See figure 14.)
A subject search of the period in each case would have returned more results, but these would have included works with a much broader chronological range. In the eugenics example, we wanted to exclude the mass of records dealing with 20th-century eugenics. This sort of combination is just not possible in the print version.
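Why the combined search is narrower than a subject search can be seen in a toy model. In this Python sketch the two records are invented, and treating the subject index as the union of the identifier and descriptor fields is a simplification of the statement that the subject index includes all of the other fields.

```python
# Invented records: work B is classified under the 20th century
# but also carries "19th century" as a thesaurus term.
records = [
    {"title": "Hypothetical work A",
     "identifier": {"19th century"},
     "descriptor": {"eugenics", "19th century"}},
    {"title": "Hypothetical work B",
     "identifier": {"20th century"},
     "descriptor": {"eugenics", "19th century", "20th century"}},
]


def subject_terms(rec):
    """Model the subject index as the union of the other fields (an assumption)."""
    return rec["identifier"] | rec["descriptor"]


# Subject search: both terms may come from any field.
broad = [r["title"] for r in records
         if {"eugenics", "19th century"} <= subject_terms(r)]

# Combined search: topic from the descriptor field, period from the identifier field.
narrow = [r["title"] for r in records
          if "eugenics" in r["descriptor"] and "19th century" in r["identifier"]]

print(broad)   # ['Hypothetical work A', 'Hypothetical work B']
print(narrow)  # ['Hypothetical work A']
```

Only the work whose primary classification is the 19th century survives the combined search, which is the effect we wanted in the eugenics example.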
Useful Hints and Tips for Working with your Search Results
The Limit button, which can be found in the top row of buttons on the results screen (see figure 7 or figure 11) may provide some useful refining tools for your searches. (See figure 15.)
- Limiting by Year. If you wish to set a date range on your results, select Limit and then "Limit by Year." (Figure 16.) This function allows you to enter a date range and also presents a useful list showing how many of your results were published in each year. With this list you can see trends in the rate of publication on your subject.
- Limiting by Document Type Phrase will limit your results to the medium of your choosing: Journal article, Book review, Chapter, Monograph, Serial.
- Limiting by Author will show how many of your results each author has written. This is a useful tool for finding authors who are publishing in your area of interest, and how frequently.
- Limiting by Subject Heading is especially useful. By clicking this option, you will see a list of subjects that occur frequently with the subject you searched.
You can search these related terms by selecting them and clicking Search at the top, and it will search for the additional terms plus the original term using the AND operator. Note that this function can also be reached by clicking Related Terms on the results page.
All of these Limit functions can be used simultaneously to produce a very specific, very refined result. Note that many of these functions can be set before a search is done using the Advanced Search feature. However, you may find you prefer working with the results rather than setting limits on the search before you obtain results. (The Subject Heading and Author features are not available before your search.)
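As a rough picture of what Limit by Year offers, the following Python sketch tallies results by publication year and then applies a date range. The years are made up and the function name is hypothetical; it models the idea only, not the OCLC screen.

```python
from collections import Counter

# Made-up publication years for a set of search results.
years = [2001, 2003, 2003, 2005, 2005, 2005, 2008]


def limit_by_year(years, start, end):
    """Keep results published within an inclusive year range."""
    return [y for y in years if start <= y <= end]


# Frequency list: how many results were published in each year.
by_year = Counter(years)
for year in sorted(by_year):
    print(year, by_year[year])

print(limit_by_year(years, 2003, 2005))  # [2003, 2003, 2005, 2005, 2005]
```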
One word of caution regarding author and title searches. If you are doing a simple author or title search, it is best to use the Basic Search screen (see figure 17), which is relatively self-explanatory; however, there are a few points to note. Of the three search fields (keyword, title, and author), do not use keyword if you are searching for a specific author. Neither the order of names nor punctuation matters in basic searches; all of these searches find all words irrespective of their location in a field.
Figures
Download the figures for the article (PDF)
Footnotes:
(1) One word of caution before we begin: The HistSciTechMed database has data formatted and submitted by different bibliographers, which means that certain types of searches will only work with some of the data. Even the format of the Isis data has changed over the years. This means that you will need to employ different search tactics to find data that includes citations from both the earlier submissions and the more recent ones, and for data submitted by other bibliographers. This article will focus mostly on extracting information from recent Isis data, from 2000 to the present. In future articles we hope to provide suggestions on integrating searches across the database.
(2) Notice that the terms "Russia" and "Soviet Union" appear both together and separately. Because of this, we want to find records that include either "Russia" or "Soviet Union" or both. The Boolean search operator "OR" performs this function. A subject search of "Russia" OR "Soviet Union" will return all the records that have at least one of those terms. This search returned 312 results, a much larger number than our previous result of 8. Remember, though, that we are interested particularly in Cold War science in Russia, so we need to limit our results to that time period. For this example, let's use "20th century, late" to find material on Russian science after WWII. If you look to the left side of the search boxes on the advanced search screen, you will see a drop-down box with the options "and," "or," and "not." Using these boxes we can refine the parameters of our search. We have already determined that we want to use "Russia" OR "Soviet Union"; now, to limit our search, we should add AND "20th century, late".
(3) Note that you should not use "subject phrase."
