Google Takeout notes

I finally got my Google Takeout archives downloaded, the most important of my cloud backups. I tend to prefer Google for my services so a lot of stuff is there dating back 20 years. Google Takeout is an excellent product. What’s in the dump?

I chose to download Takeout’s default: everything but Access Log Activity. It came to about 67GiB of zip files, 72GiB uncompressed.

The top level of the dump includes a nice HTML explanation of what’s in it, with per-product documentation for the types of files you’ll find there and what formats they might be in.

Here’s info on some of the meaningful data I found that I was expecting to see.

  • 9GiB: Mail. The biggest and most precious of all my Google data. All stored in a single .mbox file, which is awkward but not an unreasonable choice. The file is not date sorted. The volume per year is confusing: 2012 and 2020 are big at 13,000+ messages each, but in 2019 I only had 6000. Maybe it’s spam related. There are also a few extra data files for things like filters, blocked users, etc.
  • 35GiB: Photos. I’m a heavy user of Google Photos. Contains both original and edited images, plus metadata in a simple JSON format. I’m a little confused by the directory structure: many of the files are in folders like “Photos from 2020” but some are in per-album directories, and I think there are duplicates.
  • 23GiB: Drive. If you asked me I’d tell you I didn’t use Google Drive. I have no idea how this got so big. The useful stuff there is copies of my Google Docs; spreadsheets mostly. The big stuff is a bunch of photos I then imported to Google Photos; I could probably delete them from Drive. It’s a very random and poorly organized collection of stuff.
  • 2.5GiB: YouTube. All the content I’ve created (videos, comments). But also detailed watch and search histories going back 11 years.
  • 0.3GiB: Groups. MBOX format archives of Google Groups I’m an admin for.
  • 0.1GiB: Contacts. VCF format contact lists.
  • 0.1GiB: Calendar. ICS format, single file.
  • 0.3GiB: Location History. JSON files tracking my movements, used for Google Timeline.
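
Since the Mail .mbox file is not date sorted, the per-year message volume mentioned above has to be tallied by reading every message. Here is a minimal sketch of one way to do that with Python’s standard library. The function name and the example path are my own placeholders, not anything Takeout defines, and I’m assuming each message carries an ordinary Date header. Messages with a missing or unparseable date are counted under None.

```python
import mailbox
from collections import Counter
from email.utils import parsedate_to_datetime


def messages_per_year(mbox_path):
    """Count messages per calendar year, based on each message's Date header.

    mbox_path is a hypothetical path to the Takeout mail file.
    Messages whose Date header is missing or unparseable are counted
    under the key None.
    """
    counts = Counter()
    # Iterating a mailbox.mbox yields one message object at a time.
    for msg in mailbox.mbox(mbox_path):
        raw_date = msg["Date"]
        year = None
        if raw_date is not None:
            try:
                year = parsedate_to_datetime(str(raw_date)).year
            except (TypeError, ValueError, IndexError):
                year = None
        counts[year] += 1
    return counts


# Example use (the path is a placeholder):
#   counts = messages_per_year("Takeout/Mail/mail.mbox")
#   for year, n in sorted(counts.items(), key=lambda kv: (kv[0] is None, kv[0] or 0)):
#       print(year, n)
```

This walks the whole file once, so on a 9GiB archive expect it to take a while. Sorting is only needed for display, since the counts themselves don’t depend on message order.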

And some of the less interesting or accidental stuff.

  • 0.3GiB: My Activity, Google Pay. The biggest surprise to me; Google records meticulous details on when I use specific products, and there’s an online version of the product here. Goes back at least 10 years and includes Android apps, details of what Google Map views I’ve looked at, credit card transactions, YouTube video views, every search query for two years. It’s all stored in a generic format that seems to apply across Google products. Also the dump is in an absolutely terrible HTML format with like 4KB of styled HTML per record. Example of a record for an Android app launch:
  • 0.1GiB: Maps. Data spread out over several directories. Bookmarked places, some KMZ files for custom maps I made.
  • 0.4GiB: Location History. Google’s recorded where my phone has been since I first installed Google Maps. They have a nice history browser for this, and I also built my own visualizer product for the data. I really like having it but I think most people would find it surprising and creepy that Google keeps this.
  • 0.7GiB: Google Play Games Services, Google Play Store, Android Device Configuration Service, Recorder. Stuff related to my Android phone, including a record of every version of every app I’ve installed and some saved game state.
  • 0.8GiB: Nest, Google Home. Stuff about my thermostat, including 2 years of detailed temperature readings, etc. from my house.
  • 0.6GiB: Blogger. I forgot I ever had a Blogspot blog but Google didn’t. Also records of my comments on a lot of other blogs.
  • 0.6GiB: Voice. I have a Google Voice number I basically never use. But it gets spam voicemails, for which sound files and text transcripts have been saved for 7+ years.
  • 0.4GiB: Google Account, Profile. A record of a year of explicit logins (as opposed to passive authentication). Love the inactive account emails I apparently wrote a few years ago: “What a horrible thing, but apparently I’m no longer able to access my Google account which means I’m likely dead or incapacitated.”
  • 0.1GiB: Chrome. I don’t use Chrome much so this is very small. Among other things it contains a history of visited URLs going back 3 months.
  • 0.1GiB: Hangouts. An archive of some GChat messages from 2017?

I’m pretty sanguine about all this data. I want Google to be keeping a lot of data for me and I trust them to be careful caretakers of it. Some of it is incredibly useful; I was really excited when I learned Google Maps had my location history, for instance. Google does a reasonable job letting you control just what they can track and I really appreciate being able to download the data.

The My Activity stuff is the one thing that made me nervous. Partly the awful format is coloring my impression. (There are several projects on GitHub that parse and analyze it.) But also they’re storing a lot of sensitive data in a generic way that takes no account of the particular app. I don’t really care that they have a record of my credit card transactions, but I am a bit nervous about this second record of my Google Maps views or the exact times and places that I launched Grindr. I believe Google keeps this data in this format for security audits.