How To Download Your Twitter Archive Using curl

Without fail, for the past several months, every time I've tried to download my Twitter archive from my web browser, I've not been able to do so. The download gets to about 50 megabytes, downloading at something like 50 kilobytes per second, before unceremoniously failing:

Snippet of my Firefox downloads panel showing a zip file of my Twitter archive. Underneath the download name is the status "Failed", along with a button on the right to retry.

I couldn't tell you why downloading from my browser fails. Maybe it's a Firefox bug. Maybe it's a Twitter bug1. Maybe my hardware is old. It could be a combination. Regardless of the reason, I want to download my Twitter archive, and I'm being prevented from doing so.

Each time I remember to create an up-to-date archive, I try the direct browser link, completely forgetting that it's not going to work. And each time, I have to relearn the process I went through to successfully download my archive. So this is my attempt to document it for future-me and others.

All subsequent text assumes that you have requested your Twitter archive, and that Twitter has notified you that your archive is ready. I suggest trying to download no later than 4 days after Twitter notified you, to give you a few days to experiment before Twitter deletes the archive.

After watching the download fail a couple of times2, I copied the download link to my clipboard and pasted it into wget and curl in frustration:

Screenshot of failed attempts to download my Twitter archive directly using wget and curl. wget returns a "401 Unauthorized" error, while curl doesn't emit any output or error information. But I assure you, neither command succeeded.

I knew in my gut that this would not and could not possibly work. Yet I did it anyway. I am not the only one, either. Simple solutions are tempting when one is frustrated.

The reason the download fails is that Twitter wants some extra data in the HTTP headers3. wget and curl do not (and cannot) provide these headers without your help. You can use your web browser to find the correct data.
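As a sketch of the shape of the problem (the URL, cookie names, and values below are placeholders, not my real ones), the difference between a bare request and one carrying the browser's session data looks something like this:

```bash
# A bare request fails: wget reported "401 Unauthorized", and curl
# exited silently without writing anything.
curl 'https://example.com/twitter-archive.zip' --output archive.zip

# The same request, carrying the cookies a logged-in browser session
# would send. Cookie names/values here are placeholders, and more
# headers than this may be needed (see later in the post).
curl 'https://example.com/twitter-archive.zip' \
  -H 'Cookie: auth_token=REDACTED; ct0=REDACTED' \
  --output archive.zip
```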

What Does Work: Your Web Browser's Network Monitor And curl

Preparing The Browser And Monitor

Both Firefox and Chromium-based browsers provide a tool to get the required data: the Network Monitor. This post focuses on Firefox because that's what I use, but the process for Chromium should be similar.

To prepare Firefox and the Monitor, I open a new Firefox window and set the address to about:blank4. The Monitor can be accessed by right-clicking the about:blank page and selecting Inspect from the pop-up menu:

Screenshot of a newly-opened about:blank page, showing a context menu after I right clicked. My mouse cursor is hovering over the Inspect option of the context menu.

Clicking Inspect opens the Firefox Developer Tools. The Network Monitor tab in Developer Tools should look something like below:

Picture of the Firefox Developer Tools window with the Network Monitor tab open. This picture was taken right after I selected `Inspect` on the `about:blank` page. The Monitor tab is 'empty', and is asking me to reload the page to start collecting data.
By default, Developer Tools opens up to the Inspector tab. You need to switch over to the Network tab to open the Network Monitor.

Collecting Headers

The web browser and Monitor are prepared. We will get the data curl needs to download our Twitter archive by first using our newly opened browser window to start downloading the archive.

You can get the direct download link to your archive by right-clicking the "Download archive" button on the "Download an archive of your data" page, like so:

Snippet of the Twitter webpage to download your archive. A "Download archive" button is on the right side. I have right-clicked this button to open a context menu. My mouse cursor is hovering over Copy Link in this context menu.

Then, paste the link into the address bar of the fresh about:blank window we just opened:

The same about:blank page, except the direct download link to my Twitter archive has been pasted into the address bar. I have removed part of the download link because I'm unsure whether it's sensitive.

Once you press enter in the address bar of that about:blank page, the Network Monitor will collect data about the HTTP(S) request(s) the browser makes behind the scenes.

As a side effect of the HTTP(S) request(s), your browser should open a Save dialog. You can ignore it. The important headers that curl needs should appear as a row in the Network Monitor tab.

Picture of the Firefox Developer Tools window, with the Network Monitor tab open. The Monitor shows exactly one network request, with my Twitter archive link. I have right-clicked the request row to open a context menu and have selected 'Copy Value' and 'Copy as cURL (POSIX)' with my cursor. Other context menu options include 'Copy URL', 'Copy as cURL (Windows)', 'Copy as PowerShell', and 'Copy as Fetch'.
As of this writing (2024-09-18), accessing the direct archive download link from the about:blank page should result in a lone request captured by the Monitor. The request should be for a file of the form twitter-{{date}}-{{hash}}.zip; that's your archive!

Right-clicking this request gives you several options to "fill in" the missing required data so you can download your archive using curl or other tools. You may want to experiment with all the options5.

In my case, I have chosen Copy as cURL (POSIX), which copies a complete curl invocation, including the headers missing from the initial wget and curl attempts, to your clipboard.
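For reference, the copied command should look roughly like the sketch below; every value here is a placeholder. The real command carries your session cookies, so treat it like a password:

```bash
# Sketch of a "Copy as cURL (POSIX)" result; all values are placeholders.
curl 'https://example.com/twitter-archive.zip' \
  --compressed \
  -H 'User-Agent: Mozilla/5.0 (...)' \
  -H 'Accept: text/html,application/xhtml+xml,...' \
  -H 'Accept-Language: en-US,en;q=0.5' \
  -H 'Cookie: auth_token=REDACTED; ct0=REDACTED'
```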

After I paste the clipboard curl command into my terminal with an explicit output file, the download begins at a reasonable speed. Most importantly, the download doesn't fail halfway through6:

Terminal snippet showing my Twitter archive successfully downloading using `curl`. A progress bar is underneath my curl invocation showing a download in progress. I have removed arguments to `curl` that might be sensitive, mainly the cookies.
I provide an extra `--output` argument to `curl` to specify where to write the archive data, e.g. `curl https://example.com/example.zip --output example.zip`. Without `--output`, `curl` will refuse to write your archive to the terminal, because ZIP files are binary data. The `--output` argument is not part of the clipboard data from the Monitor!
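In other words, the full command is the clipboard invocation with `--output` tacked onto the end, something like this (placeholders again, and the filename just follows the twitter-{{date}}-{{hash}}.zip form from earlier):

```bash
# Clipboard invocation from the Monitor, plus an explicit output file.
curl 'https://example.com/twitter-archive.zip' \
  --compressed \
  -H 'Cookie: auth_token=REDACTED; ct0=REDACTED' \
  --output twitter-DATE-HASH.zip
```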

The net effect of going to the developer console and creating a curl invocation is that Twitter cannot tell the difference between your web browser and curl downloading your archive. All is well in my archiving world for now7.

Failure Modes

Sometimes curl may fail to download the archive, even with the relevant headers provided by the browser.

Terminal snippet showing my Twitter archive failing to download using `curl`. Instead of showing a progress bar indicating a download has begun, the command has immediately exited without any output. I have removed arguments to `curl` that might be sensitive, mainly the cookies.
Despite the lack of error output, this curl invocation has failed to download an archive.

My forgetting the `--output` parameter is unrelated to the error above (although that is indeed another error :)).

In my experience, the time window in which Twitter allows the download link to work is short. If you see the above failure, you need to redo the verification step and get back to the download link page, like here:

Snippet of the Twitter webpage to download your archive. The top of the snippet shows a back arrow, and the title "Download an archive of your data". A "Download archive" button is on the right side.

Getting back to this page should re-verify you, and should refresh the time window during which the download link is valid.

However, don't click the download link this time! The curl invocation you copied from the browser inspector earlier should still be valid, even after you get back to the direct download link page. Instead, run curl again with the params added by the Monitor and an --output param, and the download should succeed8.

Happy Downloading! Make Copies!

It's not clear to me how long Twitter will last. For various reasons, I want to have an archive of my data in case the site goes under, especially considering that present-me and future-me often need to be reminded of the context of past-me9.

I back up my archive irregularly; maybe I should do something about that. But when I do attempt a backup, I am persistent: I never know whether it will be the last backup I can make before my backups become the only access to (that portion of) past-me that I have left. Even if the web browser direct download works for you, let this post be a reminder to make an archive of your data if you so wish.

In the meantime, I'll work on following my own advice.

Footnotes

1 I feel like there's been a lot more of these Twitter bugs since late 2022. Couldn't tell you why, it's a mystery...

2 I also got locked out of accessing the download link at all due to repeated failures for 24 hours. Go me.

3 At the very least, Twitter wants cookie headers. But other headers may be required, and I didn't experiment to figure out which ones are mandatory.
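If you want to narrow down the mandatory set yourself, one approach is to re-run the copied command with `-v` and delete one `-H` flag at a time; a sketch, with placeholder values:

```bash
# -v prints the request and response headers, which shows whether
# Twitter rejected a stripped-down request (e.g. with a 401).
curl -v 'https://example.com/twitter-archive.zip' \
  -H 'Cookie: auth_token=REDACTED; ct0=REDACTED' \
  --output archive.zip
```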

4 It's not clear to me when Monitor starts collecting data after the Developer Tools are open. To make looking at Monitor's data easier on my eyes, I prefer starting from a blank page.

This step may not be required, but I prefer to not deviate from what I know works.

5 In my case, although I run Firefox on Windows, I ultimately wanted to store my Twitter archive on an NAS running Linux.

Using curl on Linux, I can skip the intermediate download to a Windows machine. I'm sure curl on PowerShell works fine too.

6 Even if it did fail, I vaguely recall being able to use curl to resume the download from where I left off.
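curl's `--continue-at` flag handles this, assuming the server supports range requests; a sketch (placeholder URL, and you'd keep all the headers from the Monitor):

```bash
# '-C -' tells curl to look at the partially-downloaded file and
# resume the transfer from where it left off.
curl -C - 'https://example.com/twitter-archive.zip' \
  -H 'Cookie: auth_token=REDACTED; ct0=REDACTED' \
  --output archive.zip
```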

7 All is never well in my archiving world. I'll be organizing data until I'm six feet under, I'm afraid...

8 At least, I hope it succeeds. If you try the procedure in this post and it fails, I'm interested to know. Contact me in all the usual places!

9 In fact, it's not clear to me that the method described here will continue to work for future-me (today is 2024-09-18). For instance, Google seems to have introduced elaborate code to stop youtube-dl from working, and youtube-dl already has to emulate a decent chunk of a browser due to JavaScript.

I'll have to deal with that when the time comes.

Last Updated: 2024-09-20