We have collected/mirrored and put up for anonymous FTP several GB of tech reports, papers, etc. about distributed systems in general, with emphasis on load balancing, scheduling, mapping, and checkpointing.
Slightly outdated information about our research in load
balancing and process migration (P-Beam), as well as the list of
published papers, can be found on our load-balancing
project home page.
From: shap@cobra.cis.upenn.edu (Jonathan Shapiro)
Newsgroups: comp.os.research
Subject: Re: Checkpointing
Date: 16 Dec 1995 20:15:02 GMT
Organization: University of Pennsylvania
Message-ID: <4av9c6$mi9@darkstar.UCSC.EDU>
References: <4anocr$3dl@darkstar.ucsc.edu> <4aq11q$dmo@darkstar.UCSC.EDU>
You might also want to check out the KeyKOS home page:
http://www.cis.upenn.edu/~KeyKOS
They have an extremely lightweight global checkpoint mechanism.
A paper describing it, along with various other papers on the
system, can be found via that home page.
From: pstephan+@RUBIX.MC.CS.CMU.EDU (Peter Stephan)
Newsgroups: comp.parallel.pvm
Subject: ANNOUNCE: Release of Dome (version 1.0)
Date: 23 May 1996 17:08:58 GMT
Organization: Carnegie Mellon University
Message-ID: <4o263a$foj@cantaloupe.srv.cs.cmu.edu>
-------------------------------------------------------------------
Announcing the release of
Dome
version 1.0
(Distributed object migration environment)
-------------------------------------------------------------------
Overview
--------
Dome, the Distributed object migration environment, provides a C++
library of distributed objects for parallel programming. These
objects perform dynamic load balancing and support fault tolerance.
Programmers using Dome can, with modest effort, write parallel
programs that are automatically distributed over a heterogeneous
network, dynamically load balanced as the program runs, and able to
survive compute node and network failures. Thus, Dome provides a
means for writing simple and efficient distributed programs.
The focus of the Dome system is to support parallel programming over
networks of workstations. Dome's load balancing and fault tolerance
play an integral role in producing efficient and survivable parallel
programs in such an environment. Dome uses a single program multiple
data (SPMD) model to perform the parallelization of programs which
use the Dome library, and Dome uses PVM to provide its underlying
process control and message passing.
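The announcement does not describe how Dome decides where data goes, so the following Python sketch shows only one common way an SPMD runtime could rebalance a distributed object between phases: measure each node's throughput during the last phase and reassign elements in proportion to it. The `rebalance` function and its inputs are hypothetical and are not part of the Dome library.

```python
def rebalance(counts, elapsed):
    """Return new per-node element counts proportional to observed speed.

    Hypothetical helper, not Dome's API.
    counts[i]:  elements node i processed in the last phase
    elapsed[i]: seconds node i needed for that phase (assumed > 0)
    """
    total = sum(counts)
    # Throughput of each node, in elements per second.
    rates = [c / t for c, t in zip(counts, elapsed)]
    rate_sum = sum(rates)
    # Ideal (fractional) share of the elements for each node.
    ideal = [total * r / rate_sum for r in rates]
    # Round down, then hand the leftover elements to the nodes with the
    # largest fractional remainders so the total stays unchanged.
    new = [int(x) for x in ideal]
    leftover = total - sum(new)
    order = sorted(range(len(ideal)), key=lambda i: ideal[i] - new[i], reverse=True)
    for i in order[:leftover]:
        new[i] += 1
    return new


# A node that took twice as long as its peers gets half as many elements.
print(rebalance([100, 100, 100], [1.0, 2.0, 1.0]))
```

Flooring the ideal shares and distributing the remainder keeps every element assigned exactly once; a real runtime would also have to move the data itself between nodes, for example with PVM messages.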
The Dome system is available in a package via anonymous ftp. The
package includes the Dome source code, makefiles, related build
scripts, documentation, and example programs. To obtain the Dome
package, log in via anonymous FTP to ftp.cs.cmu.edu. The directory
project/dome contains the file dome1.0.tar.Z and a README file.
The dome1.0.tar.Z file contains the Dome system in compressed tar
format.
More information on the Dome project is available at
http://www.cs.cmu.edu/~Dome
The authors of Dome can be contacted at dome-help@cs.cmu.edu.
-------------------------------------------------------------------
* Dome version 1.0: Distributed object migration environment *
* Carnegie Mellon University *
* Authors: J. Arabe, A. Beguelin, B. Lowekamp, E. Seligman, *
* S. Simon, M. Starkey, P. Stephan, and K. Walker *
* (C) 1996 All Rights Reserved *
-------------------------------------------------------------------
One of the papers is:
James S. Plank, Micah Beck, Gerry Kingsley, and Kai Li.
Libckpt: Transparent Checkpointing under Unix.
In Usenix Conference Proceedings, New Orleans, January 1995.
"Checkpointing is a simple technique for rollback recovery: the state
of an executing program is periodically saved to a disk file from
which it can be recovered after a failure. While recent research has
developed a collection of powerful techniques for minimizing the
overhead of writing checkpoint files, checkpointing remains
unavailable to most application developers. In this paper we describe
libckpt, a portable checkpointing tool for Unix that implements all
applicable performance optimizations which are reported in the
literature. While libckpt can be used in a mode which is almost
totally transparent to the programmer, it also supports the
incorporation of user directives into the creation of checkpoints.
This 'user-directed' checkpointing is an innovation which is unique
to our work."
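The abstract describes checkpointing as periodically saving a program's state to a disk file and recovering from it after a failure, with user directives influencing what goes into each checkpoint. The Python sketch below illustrates that idea at the application level only; it is not libckpt, which checkpoints Unix programs transparently at the process level. The `Checkpointer` class, its `interval` and `exclude` parameters, and the `run` driver are hypothetical names chosen for illustration, and the `exclude` set merely stands in for a user directive that leaves recomputable data out of the checkpoint.

```python
import os
import pickle
import tempfile


class Checkpointer:
    """Hypothetical application-level checkpointer (not libckpt's interface)."""

    def __init__(self, path, interval, exclude=()):
        self.path = path
        self.interval = interval      # checkpoint every `interval` steps
        self.exclude = set(exclude)   # "user directive": keys not worth saving

    def save(self, state):
        # Leave out entries the user marked as recomputable.
        kept = {k: v for k, v in state.items() if k not in self.exclude}
        # Write to a temporary file, then rename it over the old checkpoint,
        # so a crash in the middle of a write keeps the previous one intact.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(kept, f)
        os.replace(tmp, self.path)

    def restore(self):
        # Return the last saved state, or None if no checkpoint exists yet.
        if not os.path.exists(self.path):
            return None
        with open(self.path, "rb") as f:
            return pickle.load(f)

    def maybe_save(self, step, state):
        if step % self.interval == 0:
            self.save(state)


def run(ckpt, n_steps, fail_at=None):
    """Sum 0..n_steps-1, resuming from the last checkpoint if there is one."""
    state = ckpt.restore() or {"step": 0, "total": 0}
    state.setdefault("scratch", [])
    while state["step"] < n_steps:
        if state["step"] == fail_at:
            raise RuntimeError("simulated crash")
        state["total"] += state["step"]
        state["scratch"] = [state["total"]] * 3  # recomputable, excluded
        state["step"] += 1
        ckpt.maybe_save(state["step"], state)
    return state["total"]
```

For example, with `interval=5` a crash at step 7 leaves a checkpoint taken after step 5, and a second call to `run` resumes from there and still returns `sum(range(10))`. The work done after the last checkpoint is simply redone on restart, which is safe here only because the loop is deterministic.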
Resource Management
Also available as Technical Report TR94-1468, Department of Computer Science, Cornell University, USA, November 1994.
Newsgroups: comp.parallel
From: itf@mcs.anl.gov (Ian Foster)
Subject: Mirror Sites for DESIGNING & BUILDING PARALLEL PROGRAMS
Message-ID: <81687650327264@dalek.mcs.anl.gov>
Organization: Math and Computer Science, Argonne National Laboratory
Date: Mon, 20 Nov 1995 14:08:23 GMT
Many of you have seen the text, "Designing and Building
Parallel Programs", available both from Addison-Wesley and
(thanks to A-W's enlightened publishing policies) on the Web.
I'm glad to announce that the online version is now also
available at two mirror sites:
http://www.cs.rdg.ac.uk/dbpp/
http://www.qpsf.edu.au/mirrors/dbpp/
Thanks a lot to Jonathan Chin and Paul Pritchard for making these
available.
Additional mirror sites will probably be added in the future;
these will be listed at: http://www.mcs.anl.gov/dbpp/mirror_sites.html
Happy reading!
Ian Foster.
Designing and Building Parallel Programs