curl
Without fail, for the past several months, every time I've tried to download my Twitter archive from my web browser, I've not been able to do so. The download gets to about 50 or so megabytes, downloading at something like 50 kilobytes per second, before unceremoniously failing:
I couldn't tell you why downloading from my browser fails. Maybe it's a Firefox bug. Maybe it's a Twitter bug1. Maybe my hardware is old. It could be a combination. Regardless of the reason, I want to download my Twitter archive, and I'm being prevented from doing so.
Each time I remember to create an up-to-date archive, I try the direct browser link, completely forgetting that it's not going to work. And each time, I have to relearn the process I went through to successfully download my archive. So this is my attempt to document it for future-me and others.
All subsequent text assumes that you have requested your Twitter archive, and that Twitter has notified you that your archive is ready. I suggest trying to download no later than 4 days after Twitter notified you, to give you a few days to experiment before Twitter deletes the archive.
wget Or curl
After watching the download fail a couple of times2, I copied the download link to my clipboard and pasted it into wget and curl in frustration:
I knew in my gut that this would not work and could not possibly work. Yet I did it anyway. I am not the only one, either. Simple solutions are tempting when one is frustrated.
The reason the download fails is that Twitter wants some extra data in the HTTP headers3. wget and curl do not (and cannot) provide these headers without your help. You can use your web browser to find the correct data.
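To illustrate the difference, here is a sketch of a bare invocation next to one carrying the browser's session data. The URL and cookie values below are placeholders, not real ones; the real values come from your browser, as described in the next section.

```shell
# A bare invocation like this fails: Twitter sees a request with no session.
#   curl 'https://twitter.com/i/api/1.1/your-archive-url-here' --output twitter-archive.zip

# Passing along the headers the browser would send (at minimum, cookies)
# makes the request look like it came from your logged-in browser session.
curl 'https://twitter.com/i/api/1.1/your-archive-url-here' \
  -H 'Cookie: auth_token=PLACEHOLDER; ct0=PLACEHOLDER' \
  --output twitter-archive.zip
```

Which headers beyond cookies are strictly required is something I didn't pin down; copying everything the browser sends, as shown later, sidesteps the question.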
curl
Both Firefox and Chromium-based browsers provide a tool to get the required data: the Network Monitor. This post focuses on Firefox because that's what I use, but the process for Chromium should be similar.
To prepare Firefox and the Monitor, I open a new Firefox window and set the address to about:blank4. The Monitor can be accessed by right-clicking the about:blank page and selecting Inspect from the pop-up:
Clicking Inspect opens the Firefox Developer Tools. The Network Monitor tab in Developer Tools should look something like below:
The web browser and Monitor are prepared. We will get the data we need to make a curl download of our Twitter archive succeed by first using our newly-opened browser window to (start) download(ing) our Twitter archive.
You can get the direct download link to your archive by right-clicking the "Download archive" button on the "Download an archive of your data" page, like so:
Then, paste the link into the address bar of the fresh about:blank browser window we just opened:
Once you press enter in the address bar of that about:blank page, the Network Monitor will collect data about the HTTP(S) request(s) the browser is making behind the scenes.
As a side effect of the HTTP(S) request(s), your browser should open a Save dialog. You can ignore it. The important headers that curl needs should appear as a row in the Network Monitor tab.
If you right-click this row, you will get several options to copy the headers missing from your initial wget or curl command. In my case, I chose Copy as cURL (POSIX), which copies a curl command-line invocation with the relevant arguments/headers to your clipboard.
After I paste the clipboard curl command into my terminal with an explicit output file, the download begins at a reasonable speed. Most importantly, the download doesn't fail halfway through6:
The net effect of going to the developer console and creating a curl invocation is that Twitter cannot tell the difference between your web browser and curl downloading your archive. All is well in my archiving world for now7.
Sometimes curl may fail to download the archive, even with the relevant headers provided by the browser.
My experience is that the time window during which Twitter allows the download link to work is short. If you see the above image, you need to redo the verification required to download your archive and get back to the download link page, like here:
Getting back to this page should re-verify you, and should refresh the time window that the download link is valid.
However, don't click the download link this time! The curl invocation you copied from the browser inspector earlier should still be valid, even after you get back to the direct download link page. Instead, run curl again with the params added by Monitor and an --output param, and the download should succeed8.
It's not clear to me how long Twitter will last. For various reasons, I want to have an archive of my data in case the site goes under, especially considering that present-me and future-me often need to be reminded of the context of past-me9.
I back up my archive irregularly; maybe I should do something about that. But when I do attempt a backup, I am persistent. I don't know if it'll be the last backup I can make before my backups are the only access to (that portion of) past-me that I have left. Even if the web browser direct download works for you, let this post be a reminder to make an archive of your data if you so wish.
In the meantime, I'll work on following my own advice.
1 I feel like there's been a lot more of these Twitter bugs since late 2022. Couldn't tell you why, it's a mystery...
2 I also got locked out of accessing the download link at all due to repeated failures for 24 hours. Go me.
3 At the very least, Twitter wants cookie headers. But other headers may be required, and I didn't experiment to figure out the mandatory ones.
4 It's not clear to me when Monitor starts collecting data after the Developer Tools are open. To make looking at Monitor's data easier on my eyes, I prefer starting from a blank page. This step may not be required, but I prefer not to deviate from what I know works.
5 In my case, although I run Firefox on Windows, I ultimately wanted to store my Twitter archive on an NAS running Linux. Using curl on Linux, I can skip the intermediate download to a Windows machine. I'm sure curl on PowerShell works fine too.
6 Even if it did fail, I vaguely recall being able to use curl to resume the download from where I left off.
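If that memory is right, resuming should just be a matter of re-running the copied command with curl's continue flag added (the URL and cookie values here are placeholders, and this assumes Twitter's servers honor Range requests):

```shell
# -C - tells curl to inspect the partially-downloaded output file and ask
# the server to continue from that byte offset instead of starting over.
curl 'https://twitter.com/i/api/1.1/your-archive-url-here' \
  -H 'Cookie: auth_token=PLACEHOLDER; ct0=PLACEHOLDER' \
  -C - \
  --output twitter-archive.zip
```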
7 All is never well in my archiving world. I'll be organizing data until I'm six feet under, I'm afraid...
8 At least, I hope it succeeds. If you try the procedure in this post and it fails, I'm interested to know. Contact me in all the usual places!
9 In fact, it's not clear to me that the method described will continue to work for future-me (today is 2024-09-18). For instance, Google seems to have introduced elaborate code to stop youtube-dl from working, and that already has to emulate a decent chunk of a browser due to Javascript. I'll have to deal with that when the time comes.
Last Updated: 2024-09-20