Technology Solutions for Everyday Folks

Exporting Legacy/Archival Emails with Google Apps

At work, we've had a number of folks retire over the last 18 months, and a number of those folks are holders of some significant institutional knowledge and memories. Recently I had the opportunity to work with one of these individuals as they prepared to both hand off information to others, which led me to recall several conversations I've had with my Libraries/archives friends about preserving the "right" subsets of digital messages such as email.

The Basic Issue

We're Google Apps folks, and while there are some substantially cool self-service tools available for folks to export their data, there's a distinct lack of ability to do this on a more nuanced basis. The major tool is kind of an all-or-nothing bit, and especially for these circumstances folks don't want (or need) to export their entire email history...just to parse through the export and pick out the valid messages.

What we do have en masse, however, are folks who have adequately labeled their messages over time, which provides a basis for the ask I received.

Is there a way I can export or save all the emails I have labeled "whatever" to PDF so I can share them with colleagues?

The Solution

Having done several magical things with the Google API, I knew there would be an easy way to manage this with a simple Apps Script. I could cobble something together to generate these files and drop them in a Google Shared Drive as necessary, which makes the process of sharing with others trivial. No need to overcomplicate things...

Thanks to the help of the general Internet, after a mere couple minutes of effort such a script exists:

function exportEmailToPDF() {
  // Script Variables to Set
  var drivePath = DriveApp.getFolderById("LongStringOfTheDestinationFolderIdentifier");
  var plaintextPath = DriveApp.getFolderById("LongStringOfTheIntermediateFolderIdentifier");
  var label = GmailApp.getUserLabelByName("YourLabelNameGoesHere");
  // End of Script Variables to Set
  
  var sheet = SpreadsheetApp.getActiveSpreadsheet();
  //Logger.log(label.getName()); // Write to log the name of the label (can help identify label typos)
  var threads = label.getThreads();
  for (var i=0; i<threads.length; i++)
  {
    var messages = threads[i].getMessages();
    for (var j=0; j<messages.length; j++)
    {
      // Grab the message 
      var html = messages[j].getBody();
      var subject = messages[j].getSubject();
      var date = Utilities.formatDate(messages[j].getDate(), 'America/Chicago', 'MM-dd-yyyy');
      //Logger.log(subject+'-'+date); // Write to log the subject and date of the message (troubleshooting)
      // Write the intermediary message to file
      var textFile = plaintextPath.createFile(label.getName()+'-'+subject+'-'+date, html,"text/html");
      // Convert to PDF and save
      var pdfBlob = textFile.getAs(MimeType.PDF);
      var pdf = drivePath.createFile(pdfBlob);
      var pdfLink = pdf.getUrl();
      // Write the basics to our parent spreadsheet
      sheet.appendRow([date,subject,pdfLink])
      // Remove the intermediary file
      plaintextPath.removeFile(textFile);
    }
  }
}

With three variables (two being the Drive folder IDs, more about that in a second), the first run of this will prompt the user for proper OAuth permissions. Assuming the user approves (this script is run within the user's context), it will generate PDFs of the matching messages and drop them in the appropriate Drive folder. Voila!

Some Nuance

It'd be totally rad to have this work in one simple action, but as it turns out there are two steps in creating a PDF in this case:

  1. Export the raw data/format to a flat file; and
  2. Convert the flat file to PDF.

This is where the "two" Drive folder IDs come to be A Thing. Depending on the circumstances, it may be wise or necessary to have the temporary files live in their own distinct location from the destination folder. At the end of the day the intermediate files are cleaned up, but YMMV. Set these folder IDs to two distinct locations...or use the same one. Whatever works for the use case.

That's All!

In "production," I had folks use this from the Script of a Google Sheet, because most of those folks were familiar enough with Google Sheets to be able to walk them through the process. There isn't a reason this couldn't be spun out as its own Apps Script proper by adding some details about the source/logging sheet, though.

At the end of the day, though, this is a low-effort solution to a problem, and I've created a Gist of the above if you're interested in trying it out!