Originally published on.

With Nginx, Kiwix, or MediaWiki/XOWA + Docker

Did you know that Wikipedia just runs a mostly-traditional LAMP stack on ~350 servers (as of 2019)?

A summary of how to set up a full mirror using three different approaches. The pretty HTML version is here and the source for this guide is on GitHub.

Unfortunately, Wikipedia attracts lots of hate from people and nation-states who object to certain articles or want to hide information from the public eye. Wikipedia's infrastructure (2 racks in the USA, 1 in Holland, and 1 in Singapore, plus CDNs) can't always stand up to large DDoS attacks, but thankfully they provide regular database dumps and static HTML archives to the public, and have permissive licensing that allows for rehosting with modification (even for profit!).

Growing up in China behind the GFC (Great Firewall of China), I often experienced Wikipedia unavailability, and in light of the recent DDoS I decided to write a guide to help demystify the process of running a mirror. I'm also a big advocate for free access to information, and I'm the maintainer of a major internet archiving project called ArchiveBox (a self-hosted internet archiver powered by headless Chromium).

The aim of this guide is to encourage people to use these publicly available dumps to host Wikipedia mirrors, so that malicious actors don't succeed in limiting public access to one of the world's best sources of information.

A full English clone in 3 steps.

1. Download the Kiwix-Serve static binary.
2. Download a compressed Wikipedia dump (79GB, images included!).

Wikipedia.org itself is powered by a PHP backend called MediaWiki, using MariaDB for data storage, Varnish and Memcached for request and query caching, and ElasticSearch for full-text search. Production also runs a number of extra plugins and modules on top of MediaWiki.

There are several ways to host your own mirror of Wikipedia (with varying complexity):

- Run a caching proxy in front of Wikipedia.org (disk used on-demand for cache, low CPU use).
- Serve the static HTML ZIM archive with Kiwix (10~80GB for compressed archive, low CPU use).
- Run a full MediaWiki server (hardest to set up, ~600GB for XML & database, high CPU use).

Don't expect it to look perfect on the first try

Setting up a Wikipedia mirror involves a complex dance between software, data, and devops, so beginners are encouraged to start with the static HTML archive or the proxy before attempting to run a full MediaWiki server. Users should expect their mirrors to be able to serve articles with images and search, but should not expect them to look exactly like Wikipedia.org on the first try, or the second.

Each method in this guide has its pros and cons. A caching proxy is the most lightweight option, but if the upstream servers go down and a request comes in for a page that has not been seen before and cached, it will 404, so it's not a fully redundant mirror. The static ZIM mirror is lightweight to download and host (and requests are easy to cache), and it has full-text search, but it has no interactivity, talk page history, or Wikipedia-style category pages (though they are coming soon).
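The caching-proxy caveat described in this guide can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Nginx or any real proxy is implemented: `CachingProxy`, `UpstreamDown`, and `make_fake_upstream` are hypothetical names invented for the example, the cache is a plain in-memory dictionary, and the 404 status simply follows the wording used in this guide (a real proxy may return a different error code depending on its configuration).

```python
from typing import Callable, Dict, Optional, Tuple


class UpstreamDown(Exception):
    """Raised by a fetcher when the upstream server cannot be reached."""


class CachingProxy:
    """Toy caching proxy: serve from cache, otherwise ask upstream."""

    def __init__(self, fetch_upstream: Callable[[str], str]) -> None:
        self.fetch_upstream = fetch_upstream
        self.cache: Dict[str, str] = {}

    def get(self, path: str) -> Tuple[int, Optional[str]]:
        # Pages seen before are served from the cache, even during an outage.
        if path in self.cache:
            return 200, self.cache[path]
        try:
            body = self.fetch_upstream(path)
        except UpstreamDown:
            # Never-seen page and no upstream: nothing to serve.
            return 404, None
        self.cache[path] = body
        return 200, body


def make_fake_upstream(state: Dict[str, bool]) -> Callable[[str], str]:
    """Build a stand-in for the upstream site that can be switched off."""

    def fetch(path: str) -> str:
        if not state["up"]:
            raise UpstreamDown(path)
        return f"<html>article for {path}</html>"

    return fetch


state = {"up": True}
proxy = CachingProxy(make_fake_upstream(state))

first = proxy.get("/wiki/Cat")   # fetched from upstream, then cached
state["up"] = False              # simulate an upstream outage
cached = proxy.get("/wiki/Cat")  # still served from the cache
missed = proxy.get("/wiki/Dog")  # never requested before the outage

print(first[0], cached[0], missed[0])  # prints: 200 200 404
```

The takeaway matches the trade-off in the text: a proxy only protects the pages somebody already requested while the upstream was reachable, whereas a ZIM archive or a full MediaWiki import holds every article locally.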