An Archive of Your Own
Keeping a copy of Wikipedia on your local network with Kiwix
hey can i factcheck something on your copy of wikipedia real quick
what do you mean, my copy
well the live site is overrun by chuds, i want to check the pre-2025 version you archived
i didn’t do that
what about all those urgent toots you boosted about downloading a copy
i was encouraging other people to do it, i don’t have free terabytes just lying around
it was only 19 gigs!
why didn’t you do it, then?
i thought everybody else would!
so did i!
So I was setting myself up a little local intranet1 a couple of months ago—calibre on http://ebooks.local, media on jellyfin.local, etc, etc—and the power went a little bit to my head, culminating in installing an entire local copy of (English-language) Wikipedia. I don’t regret it.
I had about a thousand words here about the Interesting Times we live in,2 the way this is making many people worry for the few remaining public goods on the still American-administered internet, and the ways this can and can’t be resisted. I still have those thoughts, but even after a month of sitting on the first draft of this post they’re not particularly coherent and (while I’m happy to help you cross-check things should our horrible future eventuate) the fact remains that I did it because I could.
There are a few options for a Wikipedia mirror that I can think of. You could scrape the site itself, but aside from this being inconvenient for you it’s also just really rude. Wikimedia does offer database and source dumps for its wikis, which is a much more reasonable prospect; however, setting up my own real Wikipedia (or developing a custom way to render that data) sounds like a lot of work. Instead, you can take advantage of mahi already done by other projects over the last couple of decades.
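For the curious, the dump route starts at dumps.wikimedia.org; a sketch of grabbing the English Wikipedia article dump (you’d still need MediaWiki or your own renderer on top of it) looks like this:

```
# current-revision article text for English Wikipedia, as compressed XML
# (the standard dump URL pattern; it's tens of GB and not directly browsable)
wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
```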
One of the systems you’ll see mentioned is XOWA, which unfortunately does not seem to be updated anymore.3 The better-known alternative is Kiwix,4 which was originally intended to provide on-device rendering of Wikipedia in developing countries with poor internet, but which has expanded to include many other sites and includes a server—allowing what I want to do, which is install it on one machine and read from others.
I had already installed the desktop version to play around with, and used it to download a zim file of the top 100 articles. This worked perfectly, but the file was more than 300MB, which is a bad ratio considering that there are nearly 7 million of the things these days! That implies a total size circa 20TB, which I don’t have in any form. So how do we get down to 19GB, per the quote at the top of this post?
Basically, those top 100 are longer than average and lavishly illustrated, including videos in some cases. Ditching all media gets that file down to just 15 megs. The full Wikipedia, including thumbnail-sized images, can be had for just 110GB, and this is what I went for.5
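The zim files themselves are listed on Kiwix’s download server; exact filenames change with each build, so treat the ones below as illustrative rather than gospel:

```
# browse https://download.kiwix.org/zim/wikipedia/ to see the current builds
# full English Wikipedia without images (the 'nopic' flavour); filename and date are illustrative
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_nopic_2024-01.zim
# full English Wikipedia with thumbnail images (the roughly 110GB build mentioned above)
wget https://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2024-01.zim
```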
The CLI programs you’ll want for the server are in the kiwix-tools repo (I just installed the version in the Debian repositories). kiwix-serve, aside from letting you pick the port etc, has an option to let you include your own main page to replace the default, which I’m not a fan of. After setting up a service6 and the mDNS and all the other stuff that I’m deliberately trying not to get into7 I get this pretty neat setup at wiki.local:


As you can see I have a number of other local website copies, including Project Gutenberg8 and Bulbapedia;9 you can also see that I’ve marked them with their date, with the Wikipedia archive having already been a year old when I installed it—this isn’t a deliberate choice on my part, they just don’t update it that often.
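In case it helps, the library-building step is just kiwix-manage pointed at whichever zim files you’ve downloaded; the paths and filenames below are illustrative, and the kiwix-serve flags are unpacked in footnote 6:

```
# register each downloaded zim file in a single library.xml
kiwix-manage /path/to/kiwix/library.xml add /path/to/kiwix/wikipedia_en_all_nopic.zim
kiwix-manage /path/to/kiwix/library.xml add /path/to/kiwix/gutenberg_en_all.zim

# serve the whole library (see footnote 6 for what each flag does)
kiwix-serve --library -p 8097 -z -c /path/to/kiwix/index.html -t 2 -L 6 /path/to/kiwix/library.xml
```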
Now, pros and cons before I waffle on again. The two most important aspects to me are whether it’s a good citizen, playing nice with other things on the same server, and whether I actually use it. The answer is yes to both: it uses negligible CPU and RAM on the second-hand (and now decade-old) mini-PC that I use for a home server, and even two months later it has replaced a lot of my use of the real websites it mirrors. I don’t usually miss the larger pictures (if I need to track them down I can), and the site is fairly snappy and overall a great experience for going down rabbit-holes in particular.
There are cons, but they’re not deal-breakers to me. For some reason article titles are not rendered at the top of pages,10 meaning that if you come to them via a redirect or the random article button it may take you a second to figure out where you are. Cross-wiki links, particularly from Wiktionary to Wikipedia, don’t connect internally and instead point to their original URLs. Article titles in italics are slightly garbled in search results, though not so badly that you can’t read them.
Search is where the biggest issue lies, in that it’s not as finely tuned as we’ve come to expect over the last few decades. For example, I wanted to know what the airport code YVR corresponded to, and the top five results for that search were:
- YVR Sustainability11
- YVR–Airport station
- Templeton station
- Vancouver International Airport
- YVR Skylynx

The answer is Vancouver International Airport, but despite “YVR” being a redirect to that page it’s not the first thing on the list. Instead this search seems entirely generated from the full text, which is actually extremely impressive given how fast it is. There is a secondary search system that works on article titles and which does place the redirect we wanted first, but I can’t figure out how to use it outside of the CLI and the drop-down on the search bar. Ultimately I think it’s fine, and I’m happy to retrain myself out of always expecting the first item given by a search to be exactly what I was looking for.
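If you want to poke at the search index from a terminal, kiwix-tools also ships a kiwix-search binary that runs the same sort of full-text query directly against a zim file; the filename below is illustrative:

```
# full-text search against one zim file; prints matching articles
kiwix-search /path/to/kiwix/wikipedia_en_all_nopic.zim "YVR"
```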
An additional piece of jank comes from our mDNS .local domains, in that not every device can access them. For example some older Android 12 devices I have—along with, more concerningly, my flatmate’s Windows 11 desktop—don’t work with that system, and instead need to be pointed to the IP address/port directly. Otherwise they work fine, and I should note that text-based browsers like w3m and lynx handle it quite well, not least because there is less preamble before the start of the articles proper12. Old graphical systems may have issues with the images due to their use of the modern webp format to conserve space, but don’t let that stop you.
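If one of your devices refuses to resolve the .local names, the workaround is the boring one: look the address up from a machine where mDNS does work, then use the IP and port directly or add a hosts-file entry on the stubborn device. The address below is made up:

```
# on a machine with working mDNS (avahi-utils)
avahi-resolve -n wiki.local
# then on the awkward device, browse straight to http://192.168.1.20:8097,
# or add an entry to its hosts file:
# 192.168.1.20   wiki.local
```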

Should you do this? If you’re the kind of person who reads this site, then probably—I expect it’s something you could set up in an afternoon and then forget about. An alternative though is to find an old Android tablet or phone, either from an e-recycler or your own cupboard of obsolete electronics, and install the Kiwix app on that with the files stored on a cheap microSD card, much like this person has:
I have Kiwix and an offline copy of wikipedia on my phone, and my local Gen Z was so surprised and excited by the concept that I thought we should have a dedicated thing for it. Cheap e-reader plus 128GB SD card plus no wifi.
This would be more useful to have in an extended powercut (or in the post-apocalypse, if you’re worried about that) than my desktop server solution. It’s up to you though.
What you shouldn’t do though is abandon the public internet to the chuds. An old copy of Wikipedia will still be useful, at least for some purposes, for many years, but we deserve the up-to-date version that—by some miracle—has survived more than two decades. You don’t need me to tell you to make your own website or tip your instance admin, but maybe we all also need to volunteer to edit some articles from time to time.
Without going into details, I adapted this post more-or-less wholesale to allow the same physical server to be pointed to by different mDNS .local domains, except for the subdomain aspect.↩︎
We’re not even done with the last set! I call BS.↩︎
Assuming the software itself still works, you should be able to construct an up-to-date copy of the files needed from the aforementioned database dumps, but that’s a lot of work that we shan’t be bothering with.↩︎
One thing that bugs me about Kiwix is that I don’t know why it’s called that, save as a play on ‘Wiki’. I’m inherently suspicious of anyone using a kiwi (the bird, not the fruit; that’s a whole other can of worms) for a mascot without a clear connection to Aotearoa New Zealand—in this case the project appears to originally be French.↩︎
I can’t actually find a 19GB copy of the Kiwix ‘nopic’ file; the current version is 50GB which seems way too large. You can get a download of approximately that size as compressed XML but then you’d need to do some processing, and you can also get much smaller files that only include the introductory sections of every article or only the articles in a particular category. But I had space for the big one and I suspect so do you.↩︎
The ExecStart on my systemd service looks something like /usr/bin/kiwix-serve --library -p 8097 -z -c /path/to/kiwix/index.html -t 2 -L 6 /path/to/kiwix/library.xml, where --library means it points to an xml library file created with kiwix-manage rather than a list of zim files, -p 8097 is the port, -z shortens URLs to remove the date part of the filename, -c /path/to/kiwix/index.html is the handmade front page, -t 2 limits the threads used by the program, -L 6 limits the number of connections per IP, and /path/to/kiwix/library.xml is the aforementioned library.↩︎
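For completeness, a full unit file around that ExecStart might look roughly like the following; the unit name, Description, and User lines are my own placeholders:

```
# /etc/systemd/system/kiwix-serve.service (hypothetical name)
[Unit]
Description=Kiwix server for local wiki copies
After=network.target

[Service]
ExecStart=/usr/bin/kiwix-serve --library -p 8097 -z -c /path/to/kiwix/index.html -t 2 -L 6 /path/to/kiwix/library.xml
Restart=on-failure
# run as an unprivileged user of your choosing
User=kiwix

[Install]
WantedBy=multi-user.target
```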
About 85GB, if you’re wondering. The rest are much smaller.↩︎
Yes I have a direct link to the type chart, I’m not remembering that thing.↩︎
This differs between wikis, with many others not having this limitation. Perhaps an over-zealous attempt to get the file sizes down?↩︎
As it happens the first item, “YVR Sustainability”, has been merged with and redirected to Vancouver International Airport since this copy was created.↩︎
If you’ve previously tried to browse Wikipedia—or many other websites—this way you know what I mean.↩︎