LDS Church archives to become the Google Books of genealogical records

David March 21st, 2008

Last night I attended the Utah Java Users Group (UJUG) meeting and heard a presentation by senior developers and leadership from the LDS Church FamilySearch Digital Pipeline teams. I believe it is well worth mentioning.

To give you a little background…

If you haven’t had a chance to see how the Google Books indexing project is coming along, take a look. They are taking scanners into university libraries across the US and scanning and indexing the full text of the books there. Not all of this is searchable online because of copyright issues, but a huge number of books are nevertheless available now, either because the copyright has expired or because the publisher has granted Google the rights to make them available.

The LDS Church has been scanning historical documents onto microfilm and microfiche since the 1930s and storing them at the Granite Mountain Vault for safekeeping. Now they are digitizing these scanned records, as well as other records as they become available.

FamilySearch Indexing

With the newest scanning technology, they anticipate being able to completely scan all documents in the vault in 8 to 10 years. With terabytes of digitized images of censuses, birth / marriage / death certificates, and other records, the next step is to index this data. The technology for automatic indexing of handwritten documents is still not ready for production use but when you have an army of 130,000+ volunteers, you can utilize the strengths of technology to present the necessary information quickly and use the strengths of individuals to identify handwritten text. Doing so, they have been able to index up to 500,000 names per day. This includes double entry (two separate extractors) and arbitration if the data doesn’t match perfectly.
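To make the double entry and arbitration step concrete, here is a minimal Python sketch of how two independent transcriptions of the same record might be compared. This is not FamilySearch's code: the `reconcile` function, the `IndexedField` class, the field names, and the sample values are all hypothetical, and the exact-match rule is only my reading of "doesn't match perfectly."

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IndexedField:
    """Outcome of comparing two transcriptions of one field (hypothetical type)."""
    value: Optional[str]      # agreed value, or None while the field is disputed
    needs_arbitration: bool   # True when the two extractors disagree


def reconcile(entry_a: Dict[str, str], entry_b: Dict[str, str]) -> Dict[str, IndexedField]:
    """Compare two independent transcriptions field by field.

    Assumption: a field is accepted only when both extractors typed exactly
    the same text; anything else is flagged for a third person to arbitrate.
    """
    result = {}
    for field in sorted(entry_a.keys() | entry_b.keys()):
        a = entry_a.get(field, "")
        b = entry_b.get(field, "")
        if a == b:
            result[field] = IndexedField(value=a, needs_arbitration=False)
        else:
            result[field] = IndexedField(value=None, needs_arbitration=True)
    return result


# Two volunteers index the same (made-up) record and disagree on the surname.
first = {"surname": "Larsen", "given_name": "Anna", "birth_year": "1871"}
second = {"surname": "Larson", "given_name": "Anna", "birth_year": "1871"}

merged = reconcile(first, second)
disputed = [name for name, field in merged.items() if field.needs_arbitration]
print(disputed)  # ['surname']
```

Only the disputed surname would go to an arbitrator; the fields both volunteers agree on are accepted as they stand, which is what keeps the human effort focused on the hard-to-read handwriting.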

The Granite Mountain Vault isn’t the only source of records being processed by this program. Several US states have donated their records to the Family History Department so that it can scan and preserve them. The LDS Church is in negotiations with other public entities to extend the set of records that will be available. If you want to participate in this program, go to http://familysearchindexing.org. Don’t worry, they won’t run out of work for you to do any time soon!

FamilySearch Record Search

Although this isn’t open for public beta yet (summer 2008), all of the records that these volunteers are indexing are already available at http://search.labs.familysearch.org. They have developed Rich Internet Applications (RIA) utilizing a RESTful Web Services framework running on Java and open source technologies. They are building a highly scalable, parallel architecture to handle 100 requests per second (it currently handles 80 per second). The presenter, Rob Edwards, said that a third-party development company that supports eBay Japan (whose architecture handles 3 requests per second) walked away from early negotiations because they didn’t believe that their technology would handle so many more transactions.

They didn’t say this, but in a competitive work environment I’m sure that many companies are trying to recruit developers from these teams because they have been able to solve problems top companies are currently facing. Did I say that they are hiring?