Now that you have an Archive set up, and you have established an overall organizational system, it’s time to tackle those individual files. This step in the process is the most time-consuming, especially as you’re getting started, but once you get used to the process, it will be much faster!
For each file that will enter your archive, you’ll need to do four things:
- Decide if it’s something that you need to save.
- Decide if you need to resave the file in a different file format.
- Rename the file into a standard format, and add other information to the file if needed.
- Move the file into its appropriate place in your new Archive.
Decide whether to archive it
Take a look at your lists that you made in the first step. Pick an easy group of files to tackle first. Perhaps you’ll choose a bunch of files that you just downloaded from Ancestry, or you’ll want to go through a group of photos that were recently scanned. First, put these files together on your computer so that you can easily look through them and sort them. (For example, if they are on a flash drive or other media, move them onto your computer. If the files are online, download them to your computer.)
Briefly browse through your files to see if there’s anything that does not need to be archived. Be selective, because you do not need to save everything. Do you accidentally have two copies of the exact same death certificate? Did you save the same newspaper clipping twice? Do you have two scans of the same photo? Do you have a series of photos that are very similar? Do you have several digital photos of the same group, but someone is blinking in one? It’s okay to delete extra copies of files, or delete blurry or poor quality photos if you have a better one. Do you have a lot of records that have a particular surname, but you aren’t sure how they connect to your family tree? You may decide to keep these on your computer, but you don’t necessarily need to add them to your archive until you can connect them to your family.
As you’re going through all your files, you may find that there are large groups of files that are duplicated, because you had previously backed them up. You’ll only need to save one copy of each file in your archive, so any exact duplicates can be deleted. If they are the same document or photo, but are different file formats, you will need to decide whether to save both formats, or just save the best quality file. For photos or images, this is usually the copy with the highest resolution.
If you end up having lots of duplicates to search through, there are software programs that can help you sort through these and help you identify which copies should be kept. (Just be sure that the software gives you the decision-making power, so that you can ensure that the program is not unintentionally deleting files that should be kept!)
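If you’re comfortable running a small script, here is a minimal Python sketch of the “find exact duplicates” idea. It is only an illustration, not a replacement for a dedicated program: the function names are made up for this example, and it assumes that “exact duplicate” means the file contents are byte-for-byte identical (which it checks by comparing SHA-256 checksums). It only reports groups of duplicates and never deletes anything, so the decision stays with you.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(folder: Path) -> list[list[Path]]:
    """Group files under `folder` whose contents are identical.

    Nothing is deleted; the caller decides which copy to keep.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only checksums shared by two or more files are duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Note that this will not catch the same photo saved in two different formats or at two different resolutions; those still need a human eye.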
Check the file format
For any files that you’re going to save, you’ll need to check that the file format is archive-ready, meaning a file format that is likely to be functional in the future. Some file formats are more likely to be future-proof than others. If a file is not in an archive-ready format, you’ll need to resave or reformat it into a better format. Archivists call the process of moving from an obsolete format to a better, archival format “migration.” It’s worth saying that the vast majority of your files will already be in one of these better formats, and you won’t have to resave them. However, you’ll likely find a few files that need to be changed into a better format before you put them into your archive. This mini-step is geared toward those files.
How do you know what’s a good format to use? What kinds of file formats are more likely to be “future-proof”? For your digital archive, it’s best to use file formats that are high quality, lossless (which means that the file won’t lose any quality or data each time that it is saved), and non-proprietary (which means that they don’t rely on one specific company’s software or hardware to open). Software can become obsolete when companies are sold, go out of business, or decide that they aren’t going to support it anymore. For example, AppleWorks and WordStar were word processing programs that are no longer used, and if you have files in these formats, you may not be able to open them easily anymore. Some old scanners used to save files in uncommon file formats, so double-check any old scanned photos. Likewise, Kodak cameras used to use the .FPX file format for digital images, but this file format is no longer used. Even long-standing programs like Microsoft Word may not be backward-compatible, meaning that your Microsoft Word program today may not be able to easily open a Microsoft Word file from 1995 (or if it can open it, some of the original formatting, graphics, or other features may be lost). If the file format names a particular software program, it is likely to be proprietary. If a file format is commonly used today, and can be opened in a variety of software programs, it’s much more likely to be supported in the future, too.
Good archival, stable, and non-proprietary (“future-proof”) file formats for your files include:
- .GED for family trees – A GEDCOM is the standard file format for genealogy databases. It does not rely on particular software programs to open. So, make a backup of your family tree from wherever you keep it (online or in a software program) as a GEDCOM and put this file into your archive. Look in the help section of your family tree program to learn how to export your family tree in this format. (Learn more about what a GEDCOM file is and how it works in this article.)
- .PDF or .DOC or .DOCX or .TXT for text-based files – Although .pdf appears to be a proprietary file format owned by Adobe, it is actually an open, standard file format, and is widely supported. You can often create .PDFs from a webpage or document from the “Print” dialog box. The file format .docx is also an open, published standard, so it’s supported by a variety of programs, and is likely to be supported in the future. (The older .doc format is specific to Microsoft Word, but it is still widely supported.) The .txt file format is a simple text format that does not allow for many formatting options, but is great for basic text-only files, and it can be opened by many different programs. Archived emails are sometimes saved as .txt files. Any of these file formats for text-based files should be accessible in the future.
- .TIF (and .JPG) for images – I prefer the .tif (or .tiff) file format for images because it can be saved uncompressed and lossless, and is archival quality. I always save or scan my images in a little higher quality than necessary to make sure that it’s the best quality possible. I want to be able to zoom in closely with any photo and still have a fairly clear image. For any images that I’m going to share online, I also save a .jpg (or .jpeg) copy for easy sharing. The .jpg files will be smaller, but are compressed files, and a tiny bit of quality can be lost every time the image is changed and saved. Both .tif and .jpg file formats are very likely to be supported and usable in the future. If I have two copies of an image (one TIF and one JPEG), I always name them the same thing (with different extensions of course) so that they stay together in my folders.
- .WAV or .AIFF for audio files – Both .wav and .aiff are lossless file formats that allow for uncompressed files, and are high quality. When you are saving or exporting an audio file, your audio file bit depth should be 24 bits per sample or higher, and the sample rate should be 44.1 kHz or higher whenever possible. .MP3 is another common file format, and is highly accessible, but it is lossy and usually sacrifices quality for file size, so I don’t recommend it for an archival audio file format.
- .AVI or .MOV or .MP4 for video files – Video files take up huge amounts of space if they are uncompressed, and can be about 1 GB per minute. If you have the space, choose a file type that can hold uncompressed video, like .avi, which is a standard and widely used file format that most video applications can open. You may also choose a compressed file format with the highest quality possible such as .mov (if you have a Mac) or .mp4 (if you’ll be using a PC or other computer). Both of these file formats are commonly used and are industry standards, and although they are slightly lossy, they are a high quality choice. (For more information about archival video formats, read this article.)
(Note: If you have any files that end with .zip, this simply means that a file or files have been compressed into one package, and these ZIP files should be opened or uncompressed before you archive them. DNA raw data is often downloaded in a ZIP file; these files can be saved in both zipped and unzipped format. Also, this list is not intended to be a be-all and end-all list. You may have other needs and preferences when it comes to file formats for your archive. Also, as technology progresses, recommended file formats may change. To learn more technical information about recommended digital archiving formats, see this statement from the Library of Congress.)
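The audio guideline in the list above (24 bits per sample or higher, 44.1 kHz or higher) is easy to check with a few lines of Python. This is a rough sketch that uses Python’s built-in `wave` module; the function name is made up for this example. One caveat: the `wave` module only reads uncompressed PCM .wav files and may reject some WAV variants, so an error here does not necessarily mean that the file is bad.

```python
import wave


def wav_meets_guideline(path: str, min_bits: int = 24, min_rate_hz: int = 44_100) -> bool:
    """Return True if a PCM .wav file meets the bit-depth and sample-rate guideline."""
    with wave.open(path, "rb") as wav:
        bits_per_sample = wav.getsampwidth() * 8  # getsampwidth() is in bytes
        sample_rate_hz = wav.getframerate()
    return bits_per_sample >= min_bits and sample_rate_hz >= min_rate_hz
```

A file that returns False is not ruined; it just means the recording was saved at a lower quality than the guideline, and converting it upward will not bring back detail that was never recorded.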
To figure out what the file format of your file is, look at the file in your Finder (for Mac) or File Explorer (for Windows, formerly “My Computer”). If the file format is not immediately visible at the end of the file name or on the screen, you can right-click on the file and choose “Properties” (on a PC) or “Get Info” (on a Mac). This screen will provide you with more information about the file, and should give you the file format. Once you know what the file is, look at the list above for recommended file formats for each kind of file. If your file is in a file format that is not on the above list, consider saving both the original file and a copy of the file in a better archival format. If you do need to convert files between file formats, find specific instructions for doing this by searching online with search terms like “converting ____ files to ____ files on Mac” (or PC if you have a PC). For some files, you may be able to open the file on your computer, and then go to File → Save As or File → Export to select a different file format. Other file formats are a little more complicated, and you’ll find instructions online. Or, go to your local library and ask a librarian or technology support person if they would be able to recommend any resources or programs.
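If you have a big folder to check, looking at files one at a time gets tedious. Here is a minimal Python sketch that lists every file whose extension is not on the list above. The names `ARCHIVE_READY` and `needs_review` are made up for this example, and the set of extensions is simply my list from this post, so adjust it to match your own preferences.

```python
from pathlib import Path

# Extensions taken from the list of recommended formats in this post.
ARCHIVE_READY = {
    ".ged", ".pdf", ".doc", ".docx", ".txt",
    ".tif", ".tiff", ".jpg", ".jpeg",
    ".wav", ".aiff", ".avi", ".mov", ".mp4",
}


def needs_review(folder: Path) -> list[Path]:
    """List files under `folder` whose extension is not in ARCHIVE_READY."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() not in ARCHIVE_READY
    )
```

The script only looks at the extension at the end of the file name, so it is a quick first pass and not a guarantee that each file opens correctly.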
You can always save two copies of a file: one “working copy” in a proprietary format like Photoshop, and one “archive-ready copy” that will be more easily accessible in the future, especially once you’ve created your final product. You’ll want to catch any software-dependent and proprietary file formats and save them in an archival version before that software becomes obsolete, and before you stop using that software. For example, you may have a bunch of photos that you’re working with in Photoshop, and you have those working files saved. Once your editing is done, you should save the photo as a .JPEG or .TIFF. And, if you ever decide to end your subscription to Photoshop, you’ll want to open and export all the images that you’ve been working on before your subscription ends.
If the file is already in one of the above file formats, take a quick look at when the file was last modified or opened. If it’s been 5-10 years since you’ve opened that file, double-check that it’s still a functional file by opening it on your computer. (If you have lots of similar files from the same time period, you might only check one or two to make sure that the set is still accessible.)
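To find the files that are due for a quick “does it still open?” check, a short Python sketch like the one below can list everything that has not been modified in a given number of years. The function name is made up for this example. It assumes that the “last modified” date is the one you care about, because the “last opened” date is not recorded reliably on every computer.

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def not_modified_in(folder: Path, years: float = 5, now: Optional[float] = None) -> list[Path]:
    """List files under `folder` last modified more than `years` years ago."""
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

The list only tells you which files to look at; you still need to open one or two from each batch to confirm that they work.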
Again, the vast majority of your files are already going to be in one of these archive-ready file formats, and you don’t need to do anything to them. Don’t get stuck on this mini-step!
Rename the files and add metadata
Now, we’ve looked at the file, decided to keep it, and decided that it’s in a good archival file format. Before we move it into our Archive, we need to add some information to the file so that we know what it is. I recommend renaming all your files in a standard file naming format. If you are consistent, you’ll easily be able to sort and find your files when you need them, and you can identify what each file is at a glance. Once you’ve decided how to name your files, be sure to write down your standard format in your Family Archive Guide.
For your Family Files, I recommend using a naming system that includes at a minimum the name of the person in the record, the year of the record, and what the record is. Other information in the file name could include the full date, where you got the record, or the location of the record or image. You can use underscores (_) or dashes (-) to separate different sections of the filename. For example, all of my files follow this same pattern: LastName_FirstName_YearInRecord_WhatItIs_WhereItCameFrom.FileFormat
I organize everything by last name, then first name, so that all records for each person will be together when they are in their folder. Then I put the year that the record or photo was originally created. Then in a few words, I describe what the record is (Census, Death Certificate, or short description of article or photo), and then I briefly describe where I originally got this image or record. (Is it from Ancestry or another database? Did I scan this from my own photo collection? Is it from a county courthouse?) Some people prefer to write the whole date, if known, to help them sort things (such as Kaiser_Andrew_1923-10-20_obit_DekDailyChron.pdf for a newspaper article published on Oct. 20, 1923). If you have several photos that are very similar or are from the same group, you can name them the same, and add a number as part of the name to differentiate them. (See the first two examples of Andrew Kaiser’s Garden Photo.) If the file or record relates to multiple people (for example, a census record), I usually list the primary person or head of household in the file name. You may choose to do something slightly different.
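If you like, you can let a small script assemble names in this pattern so that they always come out the same way. The Python sketch below is one possible approach, not the only one: the function name `archive_filename` is made up for this example, and it assumes that you want spaces and punctuation removed from each section so that underscores only ever appear between sections.

```python
import re


def archive_filename(last: str, first: str, year: str, what: str, source: str, ext: str) -> str:
    """Build LastName_FirstName_YearInRecord_WhatItIs_WhereItCameFrom.FileFormat."""
    def clean(part: str) -> str:
        # Keep letters, digits, and dashes; drop spaces, punctuation,
        # and stray underscores.
        return re.sub(r"[^\w-]|_", "", part)

    parts = [clean(p) for p in (last, first, year, what, source)]
    return "_".join(parts) + "." + ext.lower().lstrip(".")


# The obituary example from the paragraph above:
print(archive_filename("Kaiser", "Andrew", "1923-10-20", "obit", "DekDailyChron", "pdf"))
# prints Kaiser_Andrew_1923-10-20_obit_DekDailyChron.pdf
```

A description with a space in it, like “Death Certificate,” comes out as “DeathCertificate” in the file name.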
Your Locality Files could use a similar pattern, adapted for location names instead of people names. For example, many of my locality files use this pattern: SpecificLocation_GeneralLocation_Year_WhatItIs_WhereItCameFrom.FileFormat
Even though your files will be contained in family or locality folders that will also help identify what the file is, and who it belongs to, your file names will also help you identify the file when it’s not in its proper folder. And, it will help you search for things. For example, I have ancestors who lived in both Kingston, IL and Kingston, NY. I have some maps and photos relating to both places. Because I put “Kingston_DeKalbCoIL_” and “Kingston_UlsterCoNY_” in the file names, I can differentiate between these two locations at a glance, and I immediately know which folders they belong in.
You may choose a slightly different naming pattern. As always, use something that works for you! The key is to be consistent. Any files that are going to be entering your archive should get a new or updated name.
You should be able to rename your files without opening them. On most computers, you can right-click on the file and select the option to “Rename.” To save yourself some time, you can rename multiple files at once by selecting all the files that you’d like to rename, right-clicking, and selecting “Rename.” If your computer is slightly different, you should be able to find more specific instructions online.
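For a large batch of similar photos, renaming can also be scripted. The Python sketch below works in two steps: first it builds a plan of old and new names that you can look over, and only then does it rename anything. The function names are made up for this example, and putting a two-digit number at the end of the name is my assumption; put the number wherever your own naming pattern calls for it.

```python
from pathlib import Path


def plan_numbered_renames(files: list[Path], base_name: str) -> list[tuple[Path, Path]]:
    """Pair each file with a numbered new name such as base_name_01.jpg."""
    plan = []
    for number, old in enumerate(sorted(files), start=1):
        new = old.with_name(f"{base_name}_{number:02d}{old.suffix.lower()}")
        plan.append((old, new))
    return plan


def apply_renames(plan: list[tuple[Path, Path]]) -> None:
    """Carry out a rename plan, refusing to overwrite an existing file."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new} already exists; nothing was overwritten")
        old.rename(new)
```

Print the plan and read through it before calling `apply_renames`, and try it on a copy of the folder first until you trust it.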
If there’s still more information that you’d like to include with your file, such as a caption, more complete source information, or other location information, you may be able to add it by right-clicking on the file and selecting “Properties” (if you’re on a PC) or “Get Info” (if you’re on a Mac). Information about a file (which is also called metadata) will travel with the file if it’s embedded into the file itself. Captions that are added to photos in a software program, for example, may not travel with the actual photo, and may not be saved when your photo is archived. (Learn more about adding metadata to your digital photos and files here.) I highly recommend adding this additional metadata to your important photos, audio files, and videos to provide context, name everyone in the photo, and provide as much additional information about it as possible.
Move it into your archive and repeat!
Once your file is in an archive-ready format and is renamed, it’s finally time to move it into your Digital Archive! Drag the file into the appropriate folder in your archive, using your Family Archive Guide as a reference if you haven’t yet memorized your organizational structure. Now it’s in your archive, and safe and sound! Time to tackle another group of files on your list!
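Dragging and dropping works fine, but if you are moving files with a script, it is worth building in two safety checks. The Python sketch below (the function name is made up for this example) refuses to move a file into a folder that does not exist yet, which catches typos in folder names, and refuses to overwrite a file that is already in the archive.

```python
import shutil
from pathlib import Path


def move_into_archive(file: Path, archive_folder: Path) -> Path:
    """Move `file` into an existing `archive_folder` without overwriting anything."""
    if not archive_folder.is_dir():
        raise NotADirectoryError(f"{archive_folder} is not a folder in the archive")
    destination = archive_folder / file.name
    if destination.exists():
        raise FileExistsError(f"{destination} is already in the archive")
    shutil.move(str(file), str(destination))
    return destination
```

If the second check stops you, compare the two files by hand before deciding which one belongs in the archive.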
Once you’ve sorted through and archived one group of your files, move on to the next folder, flash drive, or floppy disk on your list. Do all the easily accessible media first. Eventually, you may encounter old media that you can no longer access (like those floppy disks!) or obsolete file formats. Try to determine whether the files on the old media were already backed up somewhere else, so that you can archive them from that other copy. If you don’t know what’s on that old media, or you don’t think the files were ever copied to another storage device, you may need to figure out a way to open those files again. If you have an older computer that still works, you may be able to access that media or those files from the older computer. Some libraries or archives have adapters that can read old media, or you may find a local computer business that can fix old hard drives and open old media. If you can’t find any local solutions, you can search online for “USB External Disc Drive” and “CD drive” or “floppy disc drive” to find a drive that you can purchase and use with your computer. If you come across any files that have strange file formats, and you aren’t sure what they are or where they came from, you can search online for the origins of that file format and how to open it. (Websites like fileinfo.com should be a big help for you.) Again, you may find assistance from a library or technology expert if you get stuck.
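When a file has a strange or missing extension, its first few bytes can sometimes tell you what it really is, because many formats begin with a fixed “signature.” The Python sketch below checks for a handful of well-known signatures. It is far from complete, the function and list names are made up for this example, and a result of “unknown” only means that the file did not match this short list.

```python
from pathlib import Path

# A few well-known file signatures, matched against the start of the file.
SIGNATURES = [
    (b"%PDF", "PDF document"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"PK\x03\x04", "ZIP archive (also used by .docx)"),
    (b"GIF8", "GIF image"),
]


def guess_format(path: Path) -> str:
    """Guess a file's format from its first bytes, ignoring its extension."""
    with path.open("rb") as f:
        header = f.read(16)
    for signature, label in SIGNATURES:
        if header.startswith(signature):
            return label
    return "unknown"
```

If the guess and the extension disagree, work on a copy of the file when you try renaming or opening it, so that the original stays untouched.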
Keep going, and don’t give up, and eventually you’ll make your way through your whole list, and you’ll get all your files safely into your Digital Archive. Be patient, and just keep working on it!
But wait! You aren’t done yet! You’ve created your Digital Archive, but there’s one last thing you need to do… Protect it! Stay tuned for the last blog post in this series, coming soon!
Emails, Websites, Audio, Video and more all have unique archiving challenges and processes. Learn more about archiving specific kinds of files from these resources:
- Association for Library Collections & Technical Services: http://www.ala.org/alcts/preservationweek/howto/digital-preservation-tips
- Library of Congress: http://www.digitalpreservation.gov/personalarchiving/index.html