Running Big Jobs Politely

A "big job" is any compute-bound job that requires more than one minute of CPU time to run. To run such jobs without drastically slowing down the entire system and inconveniencing other users, follow the procedures below.

By default, most programs run at nice value 0 (a higher nice value means a lower scheduling priority). For big jobs, however, the following minimum nice values are suggested:

                CPU time                Minimum Nice Value
             1 -  5 minutes                      4
             5 - 15 minutes                      8
            15 - 30 minutes                     11
            30 - 45 minutes                     14
            over 45 minutes                     17
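
The table above can be expressed as a small lookup, sketched here in Bourne-shell syntax. The function name `suggest_nice' is hypothetical, and assigning the boundary values (5, 15, 30 and 45 minutes) to the lower row is an assumption, since the table lists each boundary in two rows.

```shell
# suggest_nice MINUTES
# Print the minimum nice value the table suggests for a job expected to
# use MINUTES of CPU time (whole minutes only).
# Hypothetical helper; boundary values fall into the lower row (assumption).
suggest_nice() {
    minutes=$1
    if   [ "$minutes" -lt 1 ];  then echo 0    # not a "big job"
    elif [ "$minutes" -le 5 ];  then echo 4
    elif [ "$minutes" -le 15 ]; then echo 8
    elif [ "$minutes" -le 30 ]; then echo 11
    elif [ "$minutes" -le 45 ]; then echo 14
    else                             echo 17
    fi
}
```

For instance, `suggest_nice 20' selects the 15 - 30 minute row.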

For example, suppose the program is called `a.out'. To run it in the background with nice value `nv', enter:

        nice +nv a.out &
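
The `+nv' form above is the syntax of the C-shell's built-in nice. In a Bourne-type shell (sh, ksh, bash), the standalone nice utility takes the increment with `-n' instead. The following is a minimal sketch of that form; the stand-in command and the output file name are assumptions made only so the example can run without a real `a.out'.

```shell
nv=4                                        # nice value for a 1 - 5 minute job
job_out=${TMPDIR:-/tmp}/nice_demo.$$.out    # hypothetical output file

# Stand-in for a.out: a short command that just reports completion.
# It runs in the background at nice increment $nv.
nice -n "$nv" sh -c 'echo job finished' > "$job_out" 2>&1 &

wait                # block until the background job exits
cat "$job_out"
```

With a real program, the command line would simply be `nice -n 4 ./a.out &'.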

If you are using the C-shell, you can limit the CPU usage of your jobs with the following built-in command:

        limit cputime [max. usage in seconds]

Thus 'limit cputime 100' allows your job to use at most 100 seconds of CPU time. The disadvantage is that your job may be killed when it is almost done. See 'man csh' for more information.
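
For users of a Bourne-type shell, a rough counterpart is `ulimit -t', sketched below. This assumes a shell such as bash or dash that supports the `-t' option (POSIX itself only requires `-f'), and the wrapper name `run_limited' is hypothetical. Setting the limit inside a subshell keeps it from applying to the rest of the login session.

```shell
# run_limited SECONDS COMMAND [ARGS...]
# Run COMMAND with a CPU-time limit of SECONDS. The parentheses create a
# subshell, so the limit applies to this job only.
# Hypothetical helper; assumes the shell's ulimit supports -t.
run_limited() {
    seconds=$1
    shift
    ( ulimit -t "$seconds" && "$@" )
}

# Demonstration: the child shell prints the CPU-time limit it inherited.
run_limited 100 sh -c 'ulimit -t'
```

A real job would be started as `run_limited 100 ./a.out'.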

Some people run 2-hour CPU-time jobs only to discover afterwards that the program didn't do what they wanted. Avoid this. Debug your program using small test cases until you're sure you've got it right. Only then should you run the big monster.

If you have a very long job, try to have the program write out intermediate results at regular intervals, so that after a crash or an abort you can restart it from the last saved point rather than from the beginning.
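
The idea can be sketched with a toy job that sums the integers from 1 to 1000 and saves its state every 100 steps. Everything here (the file name, the interval, the job itself) is a hypothetical illustration, not a prescribed method; a real program would save whatever state it needs to resume.

```shell
checkpoint=${TMPDIR:-/tmp}/bigjob.$$.ckpt   # hypothetical checkpoint file
total=1000

# run_job: sum 1..total, resuming from the checkpoint file if one exists.
# The checkpoint holds two numbers: the next index to process and the
# running sum so far.
run_job() {
    i=1
    sum=0
    if [ -f "$checkpoint" ]; then
        read -r i sum < "$checkpoint"       # resume from the saved state
    fi
    while [ "$i" -le "$total" ]; do
        sum=$((sum + i))
        i=$((i + 1))
        # Every 100 steps, write the state to a temporary file and then
        # rename it, so an abort in mid-write cannot damage the checkpoint.
        if [ $((i % 100)) -eq 0 ]; then
            echo "$i $sum" > "$checkpoint.tmp"
            mv "$checkpoint.tmp" "$checkpoint"
        fi
    done
    echo "$sum"
}
```

If the job is killed part-way through, calling `run_job' again picks up from the last saved step and produces the same final result as an uninterrupted run.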

By default, output to files is buffered. That is, the output is saved up and written out in big blocks for efficiency. If you have a long program with a relatively small amount of output, you should flush the output explicitly so that results reach the file as they are produced instead of sitting in the buffer. You can do this in f77 with the subroutine "flush(3f)". The call is

        call flush(lunit)

where lunit is the logical unit number. It will cause any output buffered for logical unit lunit to be written out immediately.

Another way to run jobs politely is to use the command `at' to run your job at some particular time during the night. See `man at'.
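
As a rough sketch, `at' reads the commands to run from its standard input, so a niced job can be queued by piping a command line to it. The helper name `night_job' and the 02:00 start time are assumptions for illustration; the helper only builds the command line, and queuing it requires that `at' is enabled on your system.

```shell
# night_job NV COMMAND [ARGS...]
# Print the command line to hand to at: COMMAND run at nice increment NV.
# Hypothetical helper. The -n form is used because at runs its jobs
# with a Bourne-type shell rather than the C-shell.
night_job() {
    nv=$1
    shift
    echo "nice -n $nv $*"
}

# To queue the job for 2 a.m. (not executed here):
#   night_job 17 ./a.out | at 02:00
```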