
Ancestry Daily News
  Michael John Neill – 3/30/2004


Getting Wild

In the last few columns, we've talked about name variants, the reasons why they came about, and the frustrations they create for the genealogist. This week we look at some ways to search for variant spellings of names.

Searching for Soundex Equivalents
The Soundex is a system that assigns a code to each name based upon the initial letter of the name and the next three distinct consonant sounds in that name. Today many online databases allow researchers to search for their names using a Soundex option. Most sites that support a Soundex search do so on the last name, but not the first name.

Before I do any Soundex searches for a name, I refer to my list of variant spellings for the surname. I like to keep one sheet of variants (either on paper or in a word processing document) for each surname I am working on. For each spelling, I list the Soundex code (a letter followed by three digits). Most online databases do not require that I enter the Soundex code (or even know what it is); a Soundex search is usually performed by checking a box for Soundex. However, having the list of my variants and their codes tells me how many separate searches I will have to conduct even when the Soundex option is used.

For example:

NEIL has Soundex code N400
NEILL has Soundex code N400
NEAL has Soundex code N400
NIEL has Soundex code N400
NEALL has Soundex code N400
O'NEIL has Soundex code O540
ONEAL has Soundex code O540

A Soundex search for Neil will not catch O'Neil. Consequently, two separate searches will have to be conducted for these variants: one for any of the names that have a code of N400 (such as NEIL) and one for any of the names that have a code of O540 (such as O'NEIL). Notice that a different first letter in this case generated a different Soundex code.

Another example:

TRAUTVETTER has Soundex code T631
TROUTVETTER has Soundex code T631
TRAUTFETTER has Soundex code T631
TROUTFETTER has Soundex code T631
TRANTVETTER has Soundex code T653

Again, two separate searches are necessary. A Soundex search for Trautvetter will catch the first four spellings, but a separate search is required for the Trantvetter spelling, because the n in place of the u adds a consonant sound that changes the code.
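For readers who keep their variant lists on a computer, the codes above can be checked with a few lines of Python. What follows is a minimal sketch of the standard American Soundex rules, not the code any particular website runs; the function names soundex and group_by_soundex are made up for this illustration. It groups a list of spellings by code, so the number of groups is the number of separate Soundex searches needed.

```python
from collections import defaultdict

# Standard American Soundex letter groups. Vowels, H, W and Y get no digit.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code (letter plus three digits)."""
    letters = [c for c in name.upper() if c.isalpha()]  # drops the ' in O'NEIL
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for letter in letters[1:]:
        code = CODES.get(letter, "")
        if code and code != previous:
            digits.append(code)
        # H and W do not separate two letters with the same code; vowels do.
        if letter not in "HW":
            previous = code
    return (first + "".join(digits) + "000")[:4]


def group_by_soundex(variants):
    """Group spellings by code; each group needs its own Soundex search."""
    groups = defaultdict(list)
    for variant in variants:
        groups[soundex(variant)].append(variant)
    return dict(groups)


neil = ["NEIL", "NEILL", "NEAL", "NIEL", "NEALL", "O'NEIL", "ONEAL"]
traut = ["TRAUTVETTER", "TROUTVETTER", "TRAUTFETTER", "TROUTFETTER",
         "TRANTVETTER"]

for surname_list in (neil, traut):
    for code, names in group_by_soundex(surname_list).items():
        print(code, names)
```

Each list falls into two groups, which matches the two searches described for each surname.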

Soundex searches work well in some situations, particularly when the name comes from a language whose pronunciation is reasonably similar to English and the handwriting of the original record is easy to read. In other cases, a Soundex search may not be the most effective tool available to the researcher. Many search interfaces offer other options in addition to the Soundex that allow the researcher to overcome its limitations.

Wildcards
Wildcard searches allow the user to enter some, but not all, of the letters of the name. The search will then pull those results having the desired letters. Using wildcard searches effectively requires that the researcher have an understanding of how the wildcard search works and how the name being researched could be spelled.

Before our discussion of wildcards continues, it is important to note that the wildcard operator used can vary from one site to another and the number of known characters required before the search can be conducted varies among different sites.

Multicharacter Wildcards
Typically an asterisk (*) or a percentage symbol (%) is used for multicharacter wildcards. On a site that supports wildcard searches, a search for the last name of Nel* will result in matches such as:

Nelson
Nellson
Nelsen
Nelton
Nel

Any last name beginning with the three letters Nel will be returned. The wildcard operator * or % typically means that any number of characters can be put in the place of the operator, including none.
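The "any number of characters, including none" behavior can be imitated with Python's re module. This is only a sketch of the idea, assuming the site compares the whole surname and ignores upper and lower case; the function name multi_wildcard_to_regex and the sample surname list are invented for the illustration.

```python
import re


def multi_wildcard_to_regex(term, operators="*%"):
    """Treat * or % as 'any run of characters, including none'."""
    parts = [".*" if ch in operators else re.escape(ch) for ch in term]
    return re.compile("".join(parts), re.IGNORECASE)


# A made-up index of surnames, for illustration only.
surnames = ["Nelson", "Nellson", "Nelsen", "Nelton", "Nel", "Neill", "Snell"]

pattern = multi_wildcard_to_regex("Nel*")
matches = [name for name in surnames if pattern.fullmatch(name)]
print(matches)  # ['Nelson', 'Nellson', 'Nelsen', 'Nelton', 'Nel']
```

Neill and Snell are left out because the letters Nel must come first, while Nel itself is returned because the operator may stand for nothing at all.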

It may be possible to use this operator in places other than the name box. In the Social Security Death Index at RootsWeb, a zip code for last residence can be entered as 614*.

In this case, all the results will have 614 as the first three numbers of the zip code. This can be a way to broaden the search geographically, without entering any other locality information.

The United States Geological Survey Geographic Names Information System (USGS GNIS) also allows wildcard operators. A search was conducted for b%ville in the state of Illinois (by choosing Illinois from the dropdown menu of states and entering b%ville as the search term). At this site, the % serves the same function as * does at other sites (such as Ancestry and RootsWeb). The search for b%ville in Illinois at the USGS GNIS site resulted in several hits, including:

Bartonville
Belleville
Bondville
Blueville

Of course, it would have located Bubbaville, Illinois, if there had been such a place!

The multicharacter wildcards are great, but sometimes they return too many hits or are not the most effective search tool. This is particularly true when only a specific number of characters within a name can vary.

The last name Kile is a good example. By far the two most common variants of this name are Kile and Kyle. If the site allows me to conduct a search for K*le (or K%le), I will get hits such as:

Kastle
Kertle
Kile
Kyle
Keetle

Some are too far off the mark to be the name I need. In this case, the main variants differ by one letter. This is where the single-character wildcards are convenient to use.

Single Character Wildcards
With Kile, the variant that really matters is Kyle. If the site allows, searches for this name are best conducted using a single-character wildcard, which is frequently either an underscore (_) or a question mark (?). My search for Kile is best conducted as K_le or K?le, depending upon which operator the site allows. This way the variants obtained are:

Kile
Kyle
Kale
Kole

Results other than Kyle and Kile will be returned, but at least the number of matches has been reduced. Of course, if Kile is spelled as Coil the name will not be located using this method.
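The difference between the two kinds of operator is easy to see side by side. The sketch below assumes that * and % stand for any run of characters and that _ and ? stand for exactly one character; a real site will normally accept only one operator of each kind. The names wildcard_to_regex and search, and the sample index, are hypothetical.

```python
import re


def wildcard_to_regex(term, multi="*%", single="_?"):
    """Build a regex: multi operators match any run, single match one character."""
    parts = []
    for ch in term:
        if ch in multi:
            parts.append(".*")
        elif ch in single:
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def search(term, names):
    """Return the names that match the whole search term."""
    pattern = wildcard_to_regex(term)
    return [name for name in names if pattern.fullmatch(name)]


# A made-up index of surnames, for illustration only.
index = ["Kastle", "Kertle", "Kile", "Kyle", "Keetle", "Kale", "Kole", "Coil"]

print(search("K*le", index))  # seven names, everything except Coil
print(search("K_le", index))  # ['Kile', 'Kyle', 'Kale', 'Kole']
```

Neither search finds Coil, since both terms require the name to begin with K.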

Combining Wildcards
Depending upon the name (and the capabilities of the site), wildcards may be combined in one search term. The name Augusta can be spelled in many ways, including:

August
Augusta
Auguste
Aguste
Agusta

Considering only these variants, a wildcard search could be constructed for this name as A*gust*.

The * (or %) typically does not have to be replaced by anything, so this search should catch the names with or without a u after the initial A, those that end in an a or an e, and those that have no letter after the final t. Before conducting wildcard searches, I find it helpful to write down as many of the name variants as possible and determine which letters they all have in common. Those letters should be included in my search. Where there are differences, a wildcard operator should be placed.
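Once the variants are written down, a candidate search term can be tested against the whole list before it is tried on a real site. The sketch below uses Python's fnmatch module, which understands * and ? on its own, so % and _ are first translated to those two operators. The function names to_glob and missed_variants are invented for the illustration, and the matching rules are an assumption about how a typical site behaves.

```python
from fnmatch import fnmatchcase


def to_glob(term):
    """Translate % to * and _ to ? so that fnmatch can read the term."""
    return term.replace("%", "*").replace("_", "?")


def missed_variants(term, variants):
    """Return the spellings on the list that the search term would NOT catch."""
    glob = to_glob(term).lower()
    # Note: fnmatch also gives square brackets a special meaning,
    # which does not matter for ordinary names.
    return [v for v in variants if not fnmatchcase(v.lower(), glob)]


variants = ["August", "Augusta", "Auguste", "Aguste", "Agusta"]

print(missed_variants("A*gust*", variants))  # []
print(missed_variants("Aug*", variants))     # ['Aguste', 'Agusta']
print(missed_variants("A_gust*", variants))  # ['Aguste', 'Agusta']
```

The last line shows why the multicharacter operator is the right choice after the A: a single-character operator insists on exactly one letter there, so the spellings without the u are missed.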

Check Yourself
After performing a wildcard or Soundex search, look at your results. Are you getting the names you thought you would? Are you getting names you did not expect, or no names at all? If so, perhaps the search is not working in the way you thought it should. If I get really confused regarding the way a site uses wildcard operators, I generally read the Frequently Asked Questions section, the help guide, or whatever looks like it will provide more information on the search interface.

Failing that, I conduct these searches in the surname box:

Smi*
Smit_
Smit%
Smit?

If the database is an English-language database, there will almost always be a few Smith entries, so the searches that return Smith show which operators the site understands.
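The same test can be written out as a small routine. Everything here is hypothetical: probe_wildcards expects a search function that takes a surname term and returns a list of surnames, and fake_site_search is a stand-in that pretends to be a site understanding only * and ?. No real website's interface is being described.

```python
import re


def probe_wildcards(search, expected="Smith"):
    """Run the four test searches and note which ones return the expected name."""
    probes = ["Smi*", "Smit_", "Smit%", "Smit?"]
    return {term: expected in search(term) for term in probes}


def fake_site_search(term):
    """Stand-in for a site that supports * and ? but treats _ and % literally."""
    names = ["Smith", "Smithe", "Smyth", "Jones"]
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in term
    )
    return [name for name in names if re.fullmatch(regex, name, re.IGNORECASE)]


for term, found in probe_wildcards(fake_site_search).items():
    print(term, "works" if found else "no Smith returned")
```

On this pretend site, Smi* and Smit? find Smith while Smit_ and Smit% do not, which is exactly the kind of answer the four searches are meant to give.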

How Many Characters?
Some sites, like Ancestry.com and RootsWeb.com, do not allow search terms to begin with a wildcard operator. At these sites, three characters must be entered before a wildcard operator can be used. Some sites provide good explanations of their search interfaces and how wildcards should be used. Some do not. Experimentation is a great way to learn. Just remember that beginning a search term with a wildcard operator may significantly slow down your search. This is why many sites do not allow searches for *orn, or any other search beginning with a wildcard.
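A rule such as "three characters before a wildcard" can be checked before a search term is ever typed into a site. The sketch below is one reading of that rule, with the required number of characters left as a setting since it varies from site to site; the names literal_prefix_length and is_allowed are made up for the illustration.

```python
WILDCARDS = set("*%_?")


def literal_prefix_length(term):
    """Count the ordinary characters that come before the first wildcard."""
    count = 0
    for ch in term:
        if ch in WILDCARDS:
            break
        count += 1
    return count


def is_allowed(term, min_prefix=3):
    """Apply the rule only to terms that actually contain a wildcard."""
    if not any(ch in WILDCARDS for ch in term):
        return True
    return literal_prefix_length(term) >= min_prefix


for term in ["Nel*", "Smit?", "*orn", "K_le", "Kile"]:
    print(term, "allowed" if is_allowed(term) else "not allowed")
```

Under this reading, *orn is refused, and so is K_le, because only one letter comes before the operator. On a site with such a rule, a name like Kile may have to be searched one spelling at a time.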

In general:

- Read the help guide
- Determine what wildcards can be used
- Write down all your variants
- Determine what your variants have in common
- Look at your results to see if you are getting what you think you should

Experiment. You never know what a site will do until you try!

Michael John Neill is the Course I Coordinator at the Genealogical Institute of Mid America (GIMA) held annually in Springfield, Illinois, and is also on the faculty of Carl Sandburg College in Galesburg, Illinois. Michael is the Web columnist for the FGS FORUM and is on the editorial board of the Illinois State Genealogical Society Quarterly. He conducts seminars and lectures on a wide variety of genealogical and computer topics and contributes to several genealogical publications, including Ancestry Magazine and Genealogical Computing. You can e-mail him at mjnrootdig@myfamily.com or visit his website at http://www.rootdig.com, but he regrets that he is unable to assist with personal research.

Copyright 2004, MyFamily.com.
