Soundex, Part I
Editor's Note: This article is the first in a two-part series. Read Part II.
A rose by any other name may smell just as sweet, but it's going to be difficult to find in an index. The same is true with the names of various family members. Roses and other flowers have guides to assist in determining names and their proper spelling. It's not so with ancestors.
The Soundex is one indexing procedure that attempts to work around name variations. Like any index, it will have limitations and drawbacks. Since some passenger lists and many federal population census records since 1880 have Soundex indexes, a working knowledge of the system is necessary for the family historian. Some online databases even offer a Soundex search option. So being clueless about Soundex is not a good idea.
The gentle researcher asks, "But isn't Soundex where you have to code numbers?"
Yes. A name like Nowell has a Soundex code of N400.
The gentle researcher responds, "So why should I bother? There are Web sites that will do it for me. I don't need to bother with learning the minutiae of this algorithm. It just clogs my brain."
Yes, there are sites that will determine the Soundex code for a name. These sites can save time and reduce errors in code creation. While it's no longer necessary to code each surname by hand, knowing how surnames are coded will make you a better user of Soundexes (just like knowing how to add, subtract, multiply, and divide still comes in handy in this calculator age). There's always the chance the Soundex conversion program or Web site you are using could contain glitches. And the original Soundexers did not use computer algorithms to determine the Soundex code; they did it by hand and hence were not always consistent. A type-it-in-and-get-the-code box is great, but it will not troubleshoot for you when things do not work.
Soundex codes are based upon sounds. Names having similar pronunciations are ideally assigned the same code, a series of four characters. The first character is the surname's initial letter. The remaining three characters are numbers based upon the other letters of the surname. The three numbers are assigned according to the following table:
1= B, F, P, V
2= C, S, K, G, J, Q, X, Z
3= D, T
4= L
5= M, N
6= R
Saying the letters in each group aloud makes it a little easier to see (er, hear) how Soundex puts similar-sounding letters together. Vowels and the letters H, Y, and W are ignored. Letters of the name are converted until three numbers have been assigned. Double letters are coded as if they appeared singly.
Names that do not generate enough numbers for a Soundex code are padded with zeroes to complete the code. My surname Neill is coded as N400. The "N" comes from the first letter. The E and I are not coded. The LL is only coded once as "4" (because the Ls are adjacent to each other). Two zeroes then fill out the remaining positions.
Additionally, any two adjacent letters that have the same number assignment are coded only once, even if the letters are different. For example, the surname Brodt codes as B630. The "B" is for the initial letter of the surname. The R would result in the "6," and the DT would only be coded one time, as both letters appear in the listing for "3." If you think about how the name is spoken, the DT at the end usually results in just one sound coming from your mouth, hence one number for the D and the T.
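For readers who like to see the rules written down as a procedure, here is a minimal Python sketch of the coding steps described so far. It is an illustration under stated assumptions, not the code behind any particular Soundex Web site. The function name `soundex` and the `GROUPS` table are my own labels. The digit groups are the standard Soundex ones (the examples above confirm 4 for L and 6 for R). The sketch also assumes that the initial letter's own number counts when checking for an adjacent repeat, and that an ignored letter (a vowel, H, W, or Y) sitting between two letters of the same group keeps them from being merged. The article does not spell out either point.

```python
# Letter groups for each Soundex digit (standard Soundex table).
GROUPS = {
    "1": "BFPV",
    "2": "CSKGJQXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
# Reverse lookup: letter -> digit. Vowels, H, W, and Y are absent on purpose.
CODE = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(surname: str) -> str:
    """Return a four-character code using the rules described in this article."""
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        raise ValueError("surname must contain at least one letter")

    digits = []
    # Assumption: the initial letter's digit takes part in the
    # "adjacent letters with the same number" check.
    previous = CODE.get(letters[0])
    for ch in letters[1:]:
        digit = CODE.get(ch)  # None for vowels and H, W, Y
        if digit is not None and digit != previous:
            digits.append(digit)
        # Assumption: an ignored letter breaks adjacency, so it resets
        # the comparison for the next coded letter.
        previous = digit
        if len(digits) == 3:
            break

    # Pad with zeroes when fewer than three numbers were assigned.
    return letters[0] + "".join(digits).ljust(3, "0")


for name in ("Nowell", "Neill", "Brodt"):
    print(name, soundex(name))  # Nowell N400, Neill N400, Brodt B630
```

Working a few names by hand and comparing them with the output is a quick way to check that you have understood the rules.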
What types of spelling variants does Soundex catch?
All have T631 as their Soundex code.
If they sound alike, do they always have the same Soundex? Not necessarily.
Think about Neigh and Nay. Neigh would have Soundex code N200. Nay would have Soundex code N000 (because there are no letters to code after the N). In this case, the silent G in Neigh gets a code number.
How does this actually create an index? It is different from other indexes genealogists use. For microfilmed Soundexes, the name and other identifying information are usually put on a card with the Soundex code at the top of the card. The organization of the cards can be confusing at first.
Most other print or microfilmed indexes are similar to a phone book. Individuals are indexed first by last name and then by first name. The phone book is so ubiquitous that the organization seems obvious. The Soundex cards are placed in an index differently. The cards are sorted initially not by last name, but by Soundex number. Ideally, all cards with the same Soundex code are grouped together. Then the cards are sorted by the individual's first name.
If I'm looking for Samuel Neill in a Soundex, I go to the N400 portion of the index. The cards within this section may be arranged as:
It looks a little bit confusing since the surnames are not in strict alphabetical order. They are not supposed to be. The surnames all code out to N400, and the cards are sorted by first name (not surname).
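The card order can also be sketched in a few lines of Python. This is only an illustration of the sorting idea: the `Card` type, the `soundex_order` function, and every card other than Samuel Neill are hypothetical, made up here to show surnames interleaving within one code.

```python
from typing import NamedTuple


class Card(NamedTuple):
    code: str        # Soundex code written at the top of the card
    surname: str
    given_name: str


# Hypothetical cards; only Samuel Neill comes from the text above.
cards = [
    Card("N400", "Nowell", "Thomas"),
    Card("N400", "Neill", "Samuel"),
    Card("B630", "Brodt", "Mary"),
    Card("N400", "Nowell", "Abraham"),
    Card("N400", "Neill", "Charles"),
]


def soundex_order(cards: list[Card]) -> list[Card]:
    """Sort by Soundex code first, then by given name (surname is not a sort key)."""
    return sorted(cards, key=lambda card: (card.code, card.given_name))


for card in soundex_order(cards):
    print(card.code, card.given_name, card.surname)
```

Within N400 the given names run Abraham, Charles, Samuel, Thomas, so the surnames alternate between Nowell and Neill instead of appearing in alphabetical blocks.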
Does it all work perfectly? In a word, no. When a Soundex helps you find someone, it's wonderful, but it doesn't always work. Next week's article will discuss some of the drawbacks, difficulties, and limitations of the Soundex. And believe me, they're out there.
Online Soundex Converters
Michael John Neill is the Course I Coordinator at the Genealogical Institute of Mid America (GIMA), held annually in Springfield, Illinois, and is also on the faculty of Carl Sandburg College in Galesburg, Illinois. Michael is the Web columnist for the FGS FORUM and is on the editorial board of the Illinois State Genealogical Society Quarterly. He conducts seminars and lectures on a wide variety of genealogical and computer topics and contributes to several genealogical publications, including Ancestry Magazine and Genealogical Computing.
© Copyright 2000, MyFamily.com.
Beyond the Index-Michael John Neill
Michael John Neill's articles from the Ancestry Daily News