Wednesday, January 21, 2015

Easy email archiving for Gmail/Google Apps

I've always seen the value of email archiving, but have never really known how to start.  At my institution, we use Google Apps for Education, which means we essentially use Gmail for our email.  In the past, this has been problematic for email archiving.  But this morning, I came across a great Google script that saves emails as .pdf files right into Drive.

The website says:
All you to do is apply the label “PDF” to any email thread in Gmail and the message, along with all the included file attachments, will get saved to your Drive. Unlike the previous options that can only work against individual message, this one can save a batch of messages automatically. Just apply the label “PDF” and a copy of those message would show up in your Drive in few minutes. [link]

After a morning of tinkering, I'm happy to say that it works!  I made an "Archived Emails" folder and within a few hours, all 140 emails were in .pdf form that showed the to/from/date headers and included attachments.

I'm happy to share my experiences here in case anyone else can benefit from them.  Better yet, if you've got some dev skills, maybe you can even improve upon it!  The source code is here:

A few things to note:

1.  Time for them to show up in Google Drive was more than "a few minutes."  

It may have been because I had so many messages, but it took closer to 3 hours for all 140 messages to show up in Google Drive.  Still, since it was automatic, I didn't need to babysit the process, so ultimately the wait didn't matter in my opinion.

2.  Attached files are downloaded as separate files and then linked from within the email.  

For example, take this email:

The script saved the email message as a .pdf file and the attachment as another .pdf.  The email message .pdf had the subject line as its file name, and the attachment .pdf kept the attachment's original file name.

The downside of this is that the message and its attachment don't stay together if you sort by file name / title.  But I kept my Drive folder sorted by date uploaded, which kept them next to each other.  The upside is that when you open the .pdf of the Gmail message, there is a link under attachments that takes you straight to the saved attachment .pdf.

3.  Inline images / embedded images don't seem to download.

Instead, I just get an empty email.  I've emailed the developer to see if there is a way to change this, but no word yet.  If you think you have a lot of emails with embedded images (I do), I would recommend flagging those emails individually and saving them manually as .pdf via "Print" and "Save as PDF."

Sad face.
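
If you'd rather not eyeball every message, here is a minimal sketch of how you might flag the ones with embedded images.  It is in Python, not part of the script from the website, and it assumes you have the raw source of a message saved somewhere (for example, downloaded from Gmail's "Show original" view).  The function name `has_inline_images` is made up for illustration; it only uses Python's standard `email` package.

```python
from email import message_from_bytes, policy


def has_inline_images(raw: bytes) -> bool:
    """Return True if the raw message contains an image part that looks embedded.

    Assumption: an "embedded" image is an image/* part that is either marked
    Content-Disposition: inline or carries a Content-ID header (the usual way
    HTML bodies reference images via cid: links).
    """
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        if part.get_content_disposition() == "inline" or part["Content-ID"]:
            return True
    return False
```

Anything this flags would be a candidate for the manual "Print" and "Save as PDF" route; treat it as a rough filter, since mail clients embed images in more than one way.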

4. Customize the scripted Google Spreadsheet.

From Digital Inspiration's website, you can copy a Google spreadsheet with the script already in it.  Go down to the bottom and choose "Click here to copy the Google Sheet in your Google Drive."  Once that is in your Drive, you can customize it to be what you want.  I kept PDF as the label, but I did change the title of the spreadsheet ("Archive Gmail" is what I used) and changed the Drive folder name to "Archived Emails."  I also moved the spreadsheet into the Archived Emails folder to keep it all together.

The step-by-step instructions on the website are pretty self-explanatory.  You click "Save PDF," authorize the script to run (I ended up doing this a couple of times for some reason), and then click "Run program."  It does it all automatically.  You can close the spreadsheet and never open it again, if you want.  It will keep running, as far as I know, until you go under "Save PDF" and choose "Uninstall."

Tip: Make sure you are signed into only the account that you want to use.  If you're like me, you have 5 different Google accounts and take full advantage of multiple-account sign-in.  I ran into a few hiccups when I was signed in with multiple accounts, so I started over in an "incognito" or private browser window to be logged into just the single account.

5. What happens to the PDF label?

After the emails are saved as PDFs, the script automatically removes the PDF label within Gmail.  This way, you can track how many are still pending conversion.  Again, while it claims to take minutes, for me it varied.  For a few of the messages, it happened immediately.  For the rest, it was spread out over a 3 hour window.
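
One way to picture this is that the label acts as a to-do queue.  The sketch below is a toy model in Python, not the script's actual code.  Everything in it (`Thread`, `process_pending`, `batch_size`) is hypothetical, and the idea that the script works through the queue in small batches on a timer is only my guess at why the conversions were spread out.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """Stand-in for a Gmail thread: just a subject and a set of labels."""
    subject: str
    labels: set = field(default_factory=set)


def process_pending(threads, label="PDF", batch_size=5):
    """Convert up to batch_size labeled threads and remove the label from each.

    Returns the file names that would have been saved (no real PDF is made).
    """
    saved = []
    for thread in threads:
        if len(saved) >= batch_size:
            break  # leave the rest for the next run
        if label in thread.labels:
            saved.append(f"{thread.subject}.pdf")  # subject line becomes the file name
            thread.labels.discard(label)           # label removed once saved
    return saved


def pending_count(threads, label="PDF"):
    """How many threads still carry the label, i.e. are waiting for conversion."""
    return sum(1 for t in threads if label in t.labels)
```

In this model, checking how many threads still have the label is exactly the "how many are still pending" check you can do by clicking the PDF label in Gmail.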

6. Multiple instances!

So after I tinkered with this for a while, I decided to be bold.  I tried a few different things: nested folders in Google Drive and multiple scripts looking at different labels.  I'm happy to say they all worked!  Now I have two archived email folders: one for reference interactions (it was really just a test; I'm not actually saving these) and one for the campus-wide fac/staff announcements.

What's next?

Well, I eventually plan to download the entire folder (it will download as a .zip), and when we get a digital preservation system up and running, I'll add these files there for bit-level preservation and access.  Until then, I am just doing the Two Copies/Two Places thing: one copy on Google's Drive and email servers and one copy on our network's share drive.  Better than nothing!

Also, I will need to work out a collection development policy for email archiving. It's hard to really filter out what is worth keeping in a medium that is used so heavily.
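
For the Two Copies/Two Places approach, it can help to confirm that the two copies really are identical.  Here is a minimal Python sketch of one way to do that by comparing SHA-256 checksums.  It assumes both copies are available as ordinary folders on your computer (for example, the unzipped Drive download and the share drive folder); the function names are made up for illustration and this is not a substitute for a real preservation system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum one file, reading it in chunks so large PDFs don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(folder: Path) -> dict:
    """Map each file's path (relative to the folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def compare_copies(copy_a: Path, copy_b: Path) -> dict:
    """Report files missing from either copy and files whose contents differ."""
    a, b = manifest(copy_a), manifest(copy_b)
    return {
        "missing_from_b": sorted(a.keys() - b.keys()),
        "missing_from_a": sorted(b.keys() - a.keys()),
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

You would call `compare_copies(Path("..."), Path("..."))` with the two folder locations; three empty lists mean the copies match file for file.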

*Disclaimer:  I fully acknowledge that saving emails like this as PDF is the equivalent of "printing and putting it in a box."  There are many components of email archiving that I haven't addressed.  However, for experimental purposes, this is what I played around with and can make work as a stopgap.  :)
