The MEXEC package: Magic Exec for MOSIX + MPMAKE: Mosix parallel make
---------------------------------------------------------------------

Introduction:
=============
Mexec is an emulation of the "execve" system-call (and related calls)
that "magically" assigns the new process to a suitable, currently available
MOSIX node.
The emulation was designed to be accurate enough for the needs of "GMAKE",
although it can also be used by many other applications.

This package assumes that the work is carried out in a directory that is
visible to all participating nodes: this can be accomplished
by NFS, GFS, CODA or any similar file-system.

Installation:
=============
1) run "make install" (as Super-User)
	This must be run on a MOSIX system - or else the following
	MOSIX #include files will not be found:
		/usr/include/linux/mosctl.h
		/usr/include/mos/defs.h

   Results:
	/usr/bin/mexecvp
	/usr/sbin/mexecd
	/usr/lib/libmexec.a
	an empty directory: /var/mexec/sent/

   Those files/directories need to be copied to all participating nodes.

2) spawn the daemons:
	In the testing phase, you may wish to spawn the daemons manually by
	running "/usr/sbin/mexecd" (as super-user) on each participating node.
	Later on, you may wish to make the daemon start automatically by
	copying the executable start-up script "mexecd" from this distribution
	to /etc/rc.d/init.d/mexecd and linking it to the appropriate run-levels
	(possibly using the run-level editor). The script works on RedHat and
	may need to be modified for other Linux distributions.

	In the testing phase, if you wish to see the jobs allocated to each
	server, run "make serverdebug" before "make install".
	(Do not run "make debug": compiling "mexecve" with debugging may
	 cause "gmake" to fail.)

	To prevent "/usr/sbin/mexecd" from accepting unauthorized requests
	from untrusted nodes, or even from the Internet, it should be
	called with pairs of arguments: the first of each pair specifies an
	IP address and the second an IP mask selecting the bits of the
	caller's address that must match that address.
	For example:
		mexecd 101.102.103.1 255.255.255.0 104.105.106.1 255.255.0.0

	will accept calls from any computer with an IP address of the form
	101.102.103.xx or 104.105.xx.yy.
	It will also always accept calls from the local host.

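The address/mask pairs given to mexecd can be read as follows: a caller is accepted when its address, ANDed with the mask, equals the listed address ANDed with the same mask. The sketch below implements that reading in C; it is not taken from the mexecd sources, and the names acl_entry and client_allowed are illustrative. Treating the whole 127.x.x.x loopback range as "the local host" is an assumption of the sketch.

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>   /* inet_pton, ntohl */
#include <assert.h>
#include <netinet/in.h>  /* struct in_addr */
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>  /* AF_INET */

/* One "address mask" pair as given on the mexecd command line. */
struct acl_entry {
    const char *addr;   /* e.g. "101.102.103.1" */
    const char *mask;   /* e.g. "255.255.255.0" */
};

/* Parse dotted-quad text into a host-order 32-bit value.
   Returns 0 on success, -1 if the text is not a valid IPv4 address. */
static int parse_ipv4(const char *text, uint32_t *out)
{
    struct in_addr a;
    if (inet_pton(AF_INET, text, &a) != 1)
        return -1;
    *out = ntohl(a.s_addr);
    return 0;
}

/* Return 1 if the caller may be served, 0 otherwise.
   Assumption: any 127.x.x.x address counts as "the local host". */
static int client_allowed(const char *client,
                          const struct acl_entry *acl, size_t n)
{
    uint32_t c, a, m;

    assert(client != NULL);
    if (parse_ipv4(client, &c) != 0)
        return 0;
    if ((c >> 24) == 127)
        return 1;
    for (size_t i = 0; i < n; i++) {
        if (parse_ipv4(acl[i].addr, &a) != 0 ||
            parse_ipv4(acl[i].mask, &m) != 0)
            continue;                  /* ignore a malformed pair */
        if ((c & m) == (a & m))        /* every masked bit must agree */
            return 1;
    }
    return 0;
}
```

With the example arguments above, 101.102.103.77 and 104.105.9.200 are accepted while 101.102.104.1 is rejected.
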
Running your own programs:
==========================
1) Recompile your program, replacing some or all occurrences of
   "exec" by "mexec", so "execl" becomes "mexecl", "execv" becomes "mexecv",
   "execve" becomes "mexecve" and "execvp" becomes "mexecvp",
   and link it with "-lmexec" (/usr/lib/libmexec.a).

2) Change directory to a directory visible to all participating nodes.
   Use 'pwd' to verify that all nodes see this directory in the same place
   (AMD can sometimes cause a directory to be seen in 2 places).

3) Set the environment variable NODES to the list of participating MOSIX nodes,
   for example:
	setenv NODES 1-3,5,8-11
   (this setting can later be placed in your ".login" file:
    listing extra nodes that are in fact down, or not serving,
    is not fatal and only causes a tiny delay)

4) If you wish to run a node-dependent environment-filtering shell-script
   before each call to "mexec", set the environment variable ENVSETTER
   accordingly, for example:
	setenv ENVSETTER /usr/local/bin/efilter
   (each participating node should then have an executable shell-script named
   "/usr/local/bin/efilter")

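As an illustration of step 1, here is a minimal sketch of a program that starts a job with fork() and an exec call, where the exec is routed through "mexecvp" only when built with the hypothetical macro USE_MEXEC. The prototype declared for "mexecvp" is an assumption (that it takes the same arguments as execvp, as the renaming in step 1 suggests); EXEC_VP, USE_MEXEC and run_job are illustrative names, not part of the package.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifdef USE_MEXEC
/* Assumed prototype: mexecvp is taken to mirror execvp(3).
   Build with -DUSE_MEXEC and link with -lmexec. */
int mexecvp(const char *file, char *const argv[]);
#define EXEC_VP mexecvp
#else
#define EXEC_VP execvp   /* plain execvp, so the sketch runs anywhere */
#endif

/* Run argv[0] with its arguments in a child process and wait for it.
   Returns the child's exit status (127 if the exec failed), or -1 if
   fork/wait failed or the child was killed by a signal. */
static int run_job(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        EXEC_VP(argv[0], argv);   /* returns only on failure */
        perror(argv[0]);
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Built normally, run_job uses the plain execvp; built with, for example, "cc -DUSE_MEXEC prog.c -lmexec" on a MOSIX node, the same call would go through mexec instead.
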
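The NODES syntax in step 3 (single node numbers and ranges separated by commas) can be expanded as in the following sketch. How mexec itself parses NODES is not documented here, so the accepted grammar (inclusive ranges, no spaces, no empty items) and the function name parse_nodes are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Expand a NODES-style list such as "1-3,5,8-11" into out[] (at most max
   entries).  Ranges are assumed to be inclusive.  Returns the number of
   nodes stored, or -1 on a syntax error or if out[] is too small. */
static int parse_nodes(const char *spec, int *out, int max)
{
    const char *p = spec;
    int n = 0;

    assert(spec != NULL && out != NULL);
    while (*p != '\0') {
        char *end;
        long lo, hi;

        if (!isdigit((unsigned char)*p))
            return -1;
        lo = hi = strtol(p, &end, 10);
        if (*end == '-') {                     /* a range "lo-hi" */
            p = end + 1;
            if (!isdigit((unsigned char)*p))
                return -1;
            hi = strtol(p, &end, 10);
            if (hi < lo)
                return -1;
        }
        for (long v = lo; v <= hi; v++) {
            if (n >= max)
                return -1;
            out[n++] = (int)v;
        }
        if (*end == ',') {                     /* another item follows */
            end++;
            if (*end == '\0')
                return -1;                     /* trailing comma */
        } else if (*end != '\0') {
            return -1;                         /* unexpected character */
        }
        p = end;
    }
    return n;
}
```

For "1-3,5,8-11" this yields the eight nodes 1, 2, 3, 5, 8, 9, 10 and 11. A caller would typically pass getenv("NODES") after checking that it is not NULL.
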
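Since every node that runs a job needs the ENVSETTER script named in step 4, a quick per-node check such as the following sketch may help during setup. It only verifies that the path is a regular file the current user may execute; envsetter_ok is an illustrative name, not part of the package.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return 1 if ENVSETTER is unset or empty (nothing to run), or if it names
   a regular file this user may execute on this node; return 0 otherwise. */
static int envsetter_ok(void)
{
    struct stat st;
    const char *path = getenv("ENVSETTER");

    if (path == NULL || *path == '\0')
        return 1;
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return 0;
    return access(path, X_OK) == 0;
}
```
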
Preparing and running MPMAKE
============================

1) Prepare a "make":
	Obtain the "gmake" sources (we used GMAKE version 3.77).
	Add the following 2 lines to gmake's "Makefile":

	CFLAGS += -Dexecve=mexecve -Dexecvp=mexecxvp
		(place after the initial declaration of CFLAGS)
	LIBS += -lmexec
		(place after the initial declaration of LIBS)
	Then run "make" to build the new make.

	(Note that in CFLAGS we use "mexecxvp" rather than "mexecvp":
	 this is because "gmake" uses "vfork", which keeps the parent suspended
	 until the child performs a "real" exec.)

	Install the new "make" where you want to run it.

2) Copy the sources that you wish to compile to a directory seen by
   all participating nodes at the same place and with the same permissions
   (this could be accomplished by NFS, GFS, CODA, etc.).
   Use 'pwd' to verify that all nodes see the directory in the same place
   (AMD can sometimes cause a directory to be seen in 2 places).
   It was found that unless the NFS-server's computer and disk are very fast,
   the NFS-server can become a bottleneck; this was alleviated by placing
   the sources on an MFS (memory-file-system) on the NFS-server
   (GFS has no such problem, of course).

3) Make sure that the clocks of all participating nodes are synchronized,
   since "make" decides what to rebuild by comparing file modification times.

4) Change directory to the sources area.

5) Set the environment variable NODES to the list of participating MOSIX nodes,
   for example:
	setenv NODES 1-3,5,8-11
   (this setting can later be placed in your ".login" file:
    listing extra nodes that are in fact down, or not serving,
    is not fatal and only causes a tiny delay)

6) If you wish to run a node-dependent environment-filtering shell-script
   before each branch of the "make", set the environment variable ENVSETTER
   accordingly, for example:
	setenv ENVSETTER /usr/local/bin/efilter

7) If you either placed the parallel make in the standard path, or you are
   satisfied with only one level of parallelization, simply run:
	{make} -j{number-of-parallel-branches}

   Suppose, however, that you placed the new "make" in /bin/p_make because
   you wish to retain the original standard "make", and that your Makefile
   calls for several levels of parallelization that you wish to use.
   You should then make sure
   that "/bin/p_make" is copied to all participating nodes and run:
	/bin/p_make -j{nn} MAKE=/bin/p_make
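
The remark in step 1 about "vfork" can be illustrated with a minimal sketch of the spawning pattern a vfork-based make relies on (it is not gmake's actual code). After vfork() the parent does not resume until the child has either called a real exec function or _exit(), so a substitute for execvp that did its work without really exec'ing would keep the parent blocked and serialize the branches; that is the reason for the "mexecxvp" detour. The names spawn_branch and wait_branch are illustrative, and _GNU_SOURCE is assumed because vfork() is no longer part of POSIX.1-2008.

```c
#define _GNU_SOURCE          /* expose vfork() on glibc/musl */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start argv[0] the way a vfork-based make starts a branch.  The parent
   does not return from vfork() until the child has replaced its image with
   execvp() or called _exit(); the child must do nothing else, because it is
   still borrowing the parent's memory. */
static pid_t spawn_branch(char *const argv[])
{
    pid_t pid = vfork();
    if (pid == 0) {
        execvp(argv[0], argv);   /* a real exec releases the parent */
        _exit(127);              /* exec failed: leave without touching memory */
    }
    return pid;                  /* -1 if vfork() itself failed */
}

/* Wait for one branch and return its exit status, or -1. */
static int wait_branch(pid_t pid)
{
    int status;
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```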

Advanced multilevel scheduling:
-------------------------------
If some branches of your program use "mexec" recursively on their assigned
nodes (such as the call to "/bin/p_make" in the example above), each of them
will see its own node's copy of the "/var/mexec/sent" directory rather than
a shared one, which can make performance less than optimal.
Performance can be improved further by having all of them use the same
directory, placed on a common file-system. To use this option, however,
you must first make sure that the clock of that common file-system's server
is exactly identical to the clocks of all participating nodes.
To specify a common directory instead of "/var/mexec/sent", run:
"make clean ; make install SENTDIR={directory_name}".

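Before pointing SENTDIR at a common file-system, the clock requirement above can be checked roughly from each node. The following is a minimal sketch, not part of the package: it writes a scratch file in the shared directory and compares the file's modification time, which on NFS is normally stamped by the server, with the local clock read just before and just after. The name clock_skew and the one-second resolution are assumptions of the sketch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Estimate (file-server clock - local clock) in whole seconds by writing a
   scratch file in dir and comparing its mtime with the local time taken
   just before and just after.  A result of 0 means "within the 1 s
   resolution of this check".  Returns 0 on success, -1 on error. */
static int clock_skew(const char *dir, long *skew)
{
    char path[4096];
    struct stat st;
    time_t before, after;
    int fd, ok;

    if (snprintf(path, sizeof path, "%s/.skewXXXXXX", dir) >= (int)sizeof path)
        return -1;
    before = time(NULL);
    fd = mkstemp(path);                   /* create a unique scratch file */
    if (fd < 0)
        return -1;
    ok = write(fd, "x", 1) == 1 && fstat(fd, &st) == 0;
    after = time(NULL);
    close(fd);
    unlink(path);                         /* remove the scratch file */
    if (!ok)
        return -1;

    if (st.st_mtime < before)
        *skew = (long)(st.st_mtime - before);   /* server clock is behind */
    else if (st.st_mtime > after)
        *skew = (long)(st.st_mtime - after);    /* server clock is ahead */
    else
        *skew = 0;
    return 0;
}
```

Running it on every participating node against the intended SENTDIR should report 0 everywhere if the clocks agree to within a second.
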
CAVEATS:
--------
If you compile this package on a Linux distribution that does not support
"sighold/sigrelse" (RedHat 5.1 does not, RedHat 6.0 does), then there is
a very slight possibility that some branch of the "make" will continue to
run to completion even after you interrupt the whole "make" from the keyboard.
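
Where "sighold/sigrelse" are missing, holding and later releasing a single signal can generally be expressed with the POSIX "sigprocmask" call (in a multi-threaded program, pthread_sigmask would be used instead). The sketch below shows that substitution; it is not taken from the MEXEC sources, whether the package could use it is an assumption, and hold_signal, release_signal and signal_is_held are illustrative names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Add sig to the process signal mask, as sighold(sig) does.
   Returns 0 on success, -1 on error. */
static int hold_signal(int sig)
{
    sigset_t set;
    if (sigemptyset(&set) != 0 || sigaddset(&set, sig) != 0)
        return -1;
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Remove sig from the mask, as sigrelse(sig) does; a signal that arrived
   while it was held becomes deliverable at this point. */
static int release_signal(int sig)
{
    sigset_t set;
    if (sigemptyset(&set) != 0 || sigaddset(&set, sig) != 0)
        return -1;
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Return 1 if sig is currently held (blocked), 0 if not, -1 on error. */
static int signal_is_held(int sig)
{
    sigset_t cur;
    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)   /* query only */
        return -1;
    return sigismember(&cur, sig);
}
```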

The detour via "mexecxvp", rather than using "mexecvp" directly, is needed only
because GMAKE relies on the properties of "vfork", which requires a *real* exec
before the parent can continue and fork another branch. This makes "mexecxvp"
not 100% compliant with "execvp" when the executable cannot be run; but since
the only executable used by "make" is "/bin/sh", which is always present, there
is no detriment to "make". However, if your "make" does not use "vfork", you
can bypass "mexecxvp" and use instead:
	CFLAGS += -Dexecve=mexecve -Dexecvp=mexecvp

The ENVSETTER script should not produce any output or errors, since anything
it writes would become part of the program's output, which some Makefiles do
not tolerate. If the script needs to produce diagnostics, they should be sent
to /dev/console or a similar device.

There is no authentication mechanism on the server to verify the real UID/GID
of its client: a malicious user could edit and re-compile "mexec.c" so that it
tells the "mexecd" daemon that it has a UID other than the user's real one
(even that of the Super-User). Hence, all users on the allowed calling
nodes must be trusted.

A simple test program, "test", is included in the distribution to verify that
the daemons are working properly
(run "make test", followed by "./test any_program args").
