

Database Searching: The Whys and Hows





The Whys


The ability to search databases has become an essential skill. If you can do e-mail and surf the web, you can also learn how to search databases.

Half-life of knowledge is less than 8 years. As a consequence, half of today's knowledge will be out-of-date in less than a decade. Also, half of what one will need to know 8 years from now is not currently available. Thus, the ability to access information becomes critical. Increasingly, it is a truism that it's not what you know, but what you are able to find out that is the critical factor.

Demise of card catalogs. A substantial number of library systems have retired and physically removed their card catalogs. Library patrons now have to use the library's computer-based online system to find materials.

Nature of substance abuse literature. The substance abuse field is distinct in many ways. It contrasts sharply with other more established biomedical or human service fields. For one, the literature is very dispersed. There is no handful of journals which, if browsed regularly, will assure that you are up-to-date. There are several reasons for this. An important one is that the substance abuse field is relatively new. A characteristic of the literature of 'newer' fields of inquiry is that it is dispersed over many journals. Over time, inevitably some of these journals disappear or consolidate. In the substance abuse field this "shake-down," which will result in a limited number of journals surviving over the long haul, has not yet occurred.

In addition, the field is one which crosses many disciplines. Therefore, the relevant literature is found in the journals of the basic biomedical sciences, the medical clinical sciences, and the social sciences (including sociology, psychology, anthropology, economics, and history), in law journals, in applied journals such as those devoted to evaluation studies, and in the specialty substance abuse journals. Even if one faithfully keeps up with the journals in any one field, when it comes to substance abuse, that inevitably means missing what is published in other literatures.

One way of appreciating this fact is to examine the sources of journal articles in the CORK bibliographic database of substance abuse information. Of approximately 52,000 items (covering a period from the late 1970s to the present), 48,500 represent the journal literature. These articles are drawn from over 1,450 different journals. The single most frequently occurring journal represents less than 6% of the database's journal article entries.

The substance abuse field is also unique in that it has a large proportion of what librarians term "fugitive literature." This includes foundation and government reports, the types of materials that are not indexed in the standard databases, which typically are restricted to journal literature.

Authors' publication preferences also account for a portion of the dispersion of the substance abuse literature. Authors frequently will opt for a non-substance abuse journal in order to address their work to the largest possible audience. As a consequence, much of the most significant work lies in journals outside of the substance abuse field.

The Hows

The following are some practical suggestions to gain skill in database searching.

Play around. To use an analogy, no one learned the multiplication tables on the basis of a hand-out to which they later referred. It was repetition, practice!

The process of "playing around" is essential in many respects. You learn the basic computer commands. You will get a feel for how databases are organized. You will stumble across approaches that work for you. You will catch on to the strategies for doing effective searches efficiently.

There's also something to be said for learning one particular database well before branching out to others. The skills acquired in becoming familiar with one will be transferable elsewhere.

One of the most important skills to learn is Boolean searching. This means including and excluding terms within the same search. If the computer system allows it, the easiest way to accomplish this involves a bit of "cheating," i.e., doing several different simple searches and then combining or selecting from among their results, rather than trying to get everything correctly into a single command.
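
The idea of combining several simple searches can be sketched in a few lines of Python. This is only an illustration, not how any particular library system works: the record numbers and the three result sets are invented, and each set stands for the records returned by one simple search.

```python
# Hypothetical result sets: each simple search returns a set of record numbers.
alcohol = {101, 102, 103, 104}   # records found by searching "alcohol"
adolescents = {103, 104, 105}    # records found by searching "adolescents"
treatment = {104, 106}           # records found by searching "treatment"

# AND: keep only records that appear in both result sets.
alcohol_and_adolescents = alcohol & adolescents

# OR: keep records that appear in either result set.
alcohol_or_treatment = alcohol | treatment

# NOT: keep records from the first set, excluding those in the second.
alcohol_not_treatment = alcohol - treatment

print(sorted(alcohol_and_adolescents))  # [103, 104]
print(sorted(alcohol_or_treatment))     # [101, 102, 103, 104, 106]
print(sorted(alcohol_not_treatment))    # [101, 102, 103]
```

AND narrows a search, OR broadens it, and NOT excludes. Many systems let you refer to earlier searches by number and combine them in much this way.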

Beyond trial and error. While a certain amount of "do-it-yourself" is essential, there is a point at which having a tutorial with someone is helpful. Many libraries offer mini-courses. Or, you can sit down and do a search with someone more skilled than you. It's a quick way to learn some of the basic approaches and short cuts.

Get a copy of any documentation. If there's a database that you anticipate using, get a copy of any documentation available. Some databases provide the option of actually downloading these background documents. Typically there will be two types.

One is the instructions explaining how a particular system works technically. For example, it describes the basic commands. The other kind of documentation is related to the content of a particular database. This kind of documentation will describe what is in the collection and what is deliberately excluded. Also, most databases have a list of subject terms that are used to index materials, referred to as the "Thesaurus." Another type of documentation outlines the indexing rationale and is termed "Scope Notes." It explains, for example, what a particular term means and, equally important, which term is used if there are a number of possible synonyms. Thus, it also denotes terms that are not used. One could potentially categorize "accidents" in any number of ways: as "trauma" or "emergency medicine," as well as "accident." Similarly, one might use "tobacco," or "nicotine," or "cigarettes," or "smoking." The Scope Notes indicate how these potential synonyms are being handled. Beyond briefly defining the terms used, they include, just as importantly, an entry "SEE: ____" for each term not being used, pointing to the term that is used for that concept. Ultimately, the key to doing an effective search is being able to think like the person who categorized the materials in the first place.
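
A rough sketch of how "SEE" entries behave, written in Python. Everything here is invented for illustration: the choice of "smoking" and "accidents" as the terms actually used, the wording of the notes, and the function name `preferred_term` are assumptions, not taken from any real thesaurus.

```python
# Hypothetical scope notes. A used term carries a short definition;
# an unused term carries a "see" pointer to the term used instead.
SCOPE_NOTES = {
    "smoking": {"used": True, "note": "Invented definition for illustration."},
    "tobacco": {"used": False, "see": "smoking"},
    "nicotine": {"used": False, "see": "smoking"},
    "cigarettes": {"used": False, "see": "smoking"},
    "accidents": {"used": True, "note": "Invented definition for illustration."},
    "trauma": {"used": False, "see": "accidents"},
}


def preferred_term(word):
    """Return the indexing term for a word, following its SEE pointer if any.

    Returns None when the word does not appear in the scope notes at all.
    """
    key = word.lower()
    entry = SCOPE_NOTES.get(key)
    if entry is None:
        return None
    return key if entry["used"] else entry["see"]


print(preferred_term("Nicotine"))  # smoking
print(preferred_term("trauma"))    # accidents
```

Looking a word up before searching tells you which subject term the indexer would have applied, which is the term to search on.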

Identify the database content. The nature of the technology sometimes tempts one to believe that the system must miraculously know about everything. It is natural to presume that if one doesn't find something, it must not exist.

Every database has things it includes and things it doesn't. There are also differences in the priority for entry of different kinds of materials and whether material is entered "cover-to-cover" or selectively. Thus, databases differ in how inclusive they are and how current they are in respect to different topics or journal sources.

Medline, the database produced by the National Library of Medicine and directed to the medical sciences, is instructive in this regard. The file is enormous; there are 3 million citations for just the past five years. However, it only includes a set number of journals that are defined as biomedical and medical clinical sciences. Therefore, there is a large literature from the social sciences that will not be found via a Medline search.

Also Medline handles various journals differently. Some are indexed cover to cover, meaning that everything in the journal is entered into the database. There are other journals that are indexed selectively, meaning only some of the articles are included. The substance abuse journals tend to fall into the latter category. There are also priorities for actually entering materials onto the database. Some things will be entered immediately at the point of publication, the same week. This includes things such as the Journal of the American Medical Association, or the New England Journal of Medicine, or Lancet, the 'premier' medical journals. For substance abuse journals there is frequently a lag-time of at least 9 to 12 months.

In the absence of the above kind of information one can't evaluate the results of a particular database search.

Determine the file structure. Databases differ in the kinds of information that they include. Typically there are different choices for displaying information. The "short form" may provide only the author and title; other display formats will provide additional information. For example, the "long format," the full citation, will often include an abstract as well as the indexing terms used. In addition, while not publicized, most databases have an even more complete, "technical" format, which includes absolutely all the information. Some of this is irrelevant to anyone other than the library staff, but it provides the most complete picture. When exploring any new database, print out a few citations in the longest format available. This will be useful later in deciding how to search for materials. Sometimes the "long display" format is over-kill and provides more information than you care to know. At other times it can be a treasure trove. For example, it may include the address of the first author, the number of references, special notes, and the subject terms, which provide a useful clue as to how things are indexed.
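
To make the difference between display formats concrete, here is a small Python sketch. The record, its field names, and the two format definitions are all made up; real databases choose their own fields and labels.

```python
# A hypothetical citation record; every value is invented.
RECORD = {
    "author": "Doe J",
    "title": "An invented example title",
    "journal": "Invented Journal",
    "year": 1999,
    "abstract": "Placeholder abstract text.",
    "subject_terms": "alcohol; adolescents",
    "first_author_address": "Placeholder address",
    "reference_count": 42,
}

# Each display format is simply a list of the fields it shows.
FORMATS = {
    "short": ("author", "title"),
    "long": ("author", "title", "journal", "year", "abstract",
             "subject_terms", "first_author_address", "reference_count"),
}


def display(record, fmt="short"):
    """Return the record as text, one 'field: value' line per displayed field."""
    return "\n".join(f"{field}: {record[field]}" for field in FORMATS[fmt])


print(display(RECORD, "short"))
print()
print(display(RECORD, "long"))
```

The long display is where fields such as `subject_terms` show up, and those are the ones that reveal how a record was indexed.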

Use the help screens. The help screens provide examples and help solve problems that may arise.

Free text versus subject searches. Most databases offer several avenues to find materials, for example, by author, by title, by subject. In addition, one can do "free text" searches, also known as "general" searches, which will locate a word or words in any part of the record. These words do not need to be adjacent to one another. There are pros and cons to each approach. Computers are so fast that they will always spit out something. If you are doing free text searching and guessed wrong, it's no big loss in terms of time. On the other hand, free text searching tends to be much less targeted. There can be what librarians would describe as "false hits." Since a general search looks for the designated terms in any field, including the address for example, there can sometimes be interesting surprises.

In doing free text searches, avoid the use of jargon. Select the terminology that is most likely to be used in the literature. In addition, try to select the most distinctive phrase to minimize false hits. The use of "adjacent" in a free text search is another way of targeting the search; it indicates that the phrase, and not the independent words, is of interest. Thus one searches for "health adj care adj reform" rather than "health care reform."
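
The difference between a plain free text search and an "adjacent" search can be sketched in Python. This is a simplified illustration under stated assumptions: the two records are invented, the function names are hypothetical, and a real system's "adj" operator may have additional rules.

```python
import re


def tokens(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def free_text_match(record, words):
    """True if every word occurs somewhere in the record, in any field."""
    found = set()
    for value in record.values():
        found.update(tokens(value))
    return all(word.lower() in found for word in words)


def adjacent_match(record, words):
    """True if the words occur side by side, in order, within one field."""
    target = [word.lower() for word in words]
    n = len(target)
    for value in record.values():
        toks = tokens(value)
        if any(toks[i:i + n] == target for i in range(len(toks) - n + 1)):
            return True
    return False


# Two invented records. The second one is a "false hit" for free text:
# the three words appear, but scattered across the title and the address.
record_a = {"title": "Prospects for health care reform",
            "address": "Department of Psychiatry"}
record_b = {"title": "Reform of school health programs",
            "address": "Center for Child Care, Health Sciences Campus"}

query = ["health", "care", "reform"]
print(free_text_match(record_a, query), adjacent_match(record_a, query))  # True True
print(free_text_match(record_b, query), adjacent_match(record_b, query))  # True False
```

The second record shows how the address field can produce a surprise in a free text search, and how requiring adjacency screens it out.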

Limit and cull your searches. Often the idea is not to locate everything but to find a group of materials that are absolutely on target. There are several ways of doing this. One technique for culling a large set of citations is to limit the search by date. Another option is adding an additional subject term as a means of better targeting what you find.

Finally, restricting a search to review articles is sometimes useful. A review article may well contain upwards of 100 citations. Just one or two review articles can be an excellent way to familiarize yourself with a new topic, while also providing an easy method for getting into the literature.
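
The culling techniques described above (limiting by date, adding a subject term, and restricting to review articles) amount to applying successive filters to a set of citations. A minimal Python sketch follows; the citations, field names, and function names are all hypothetical.

```python
# Invented citations standing in for a large result set.
citations = [
    {"title": "A", "year": 1985, "subjects": {"alcohol", "treatment"}, "review": False},
    {"title": "B", "year": 1996, "subjects": {"alcohol", "adolescents"}, "review": True},
    {"title": "C", "year": 1998, "subjects": {"alcohol", "treatment"}, "review": False},
    {"title": "D", "year": 1999, "subjects": {"alcohol", "treatment"}, "review": True},
]


def since(records, earliest_year):
    """Limit by date: keep records published in earliest_year or later."""
    return [r for r in records if r["year"] >= earliest_year]


def with_subject(records, term):
    """Add a subject term: keep records indexed under that term."""
    return [r for r in records if term in r["subjects"]]


def reviews_only(records):
    """Keep only records flagged as review articles."""
    return [r for r in records if r["review"]]


recent = since(citations, 1995)
targeted = with_subject(recent, "treatment")
reviews = reviews_only(targeted)

print([r["title"] for r in recent])    # ['B', 'C', 'D']
print([r["title"] for r in targeted])  # ['C', 'D']
print([r["title"] for r in reviews])   # ['D']
```

Each step shrinks the set, so you can stop as soon as the list is small enough to read through.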

Print out your search strategy. At the end of a session, print out a copy of all the searches conducted; these will generally contain the actual terms used. That way, if you ever want to go back and re-do a search, you will have an exact copy of what you did. Don't trust it to memory, even if you think it's obvious. Tomorrow, inevitably, it won't be!

When something is cumbersome, there's probably an easier way. There are all kinds of tricks. The reference librarian is the one to ask; for those in college settings, don't settle for a student at the help desk.

Example. One thing that often is useful is to sort your search. A sort by date (in descending order) means you see the most recent materials first. A sort by journal title is very useful if you have to find things in the library. That way all the articles from the same journal will be printed out next to one another, so you won't find yourself criss-crossing the stacks.
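
For readers who keep their search results in a file and want to sort them on their own, the two sorts just described look roughly like this in Python. The results list and its field names are invented for illustration.

```python
from operator import itemgetter

# Invented search results.
results = [
    {"journal": "Journal B", "year": 1997, "title": "First invented title"},
    {"journal": "Journal A", "year": 1999, "title": "Second invented title"},
    {"journal": "Journal B", "year": 1998, "title": "Third invented title"},
]

# Sort by date, descending: the most recent materials come first.
by_date = sorted(results, key=itemgetter("year"), reverse=True)

# Sort by journal title (then year): articles from the same journal
# end up next to one another in the printout.
by_journal = sorted(results, key=itemgetter("journal", "year"))

print([r["year"] for r in by_date])                    # [1999, 1998, 1997]
print([(r["journal"], r["year"]) for r in by_journal])
# [('Journal A', 1999), ('Journal B', 1997), ('Journal B', 1998)]
```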