References:
- Creating a mini-cluster with HTCondor: https://www.youtube.com/watch?v=mt5dtDrOt2g
- Submitting your first condor job: http://research.cs.wisc.edu/htcondor/tutorials/fermi-2005/submit_first.html
To install condor:
    sudo apt install htcondor
As part of this installation process, you'll be run through the condor configuration screens. On these screens:
- Manage initial HTCondor configuration automatically? Yes
- Perform a "Personal HTCondor installation"? No (since we're setting up a cluster)
  - Note: If you just want to set up a single-node installation, say Yes here. This binds condor to the local loopback address (the 127 prefix), and you will not be asked any of the questions below.
- Role of this machine in the HTCondor pool: select the appropriate roles
- File system domain label: <leave blank>
- User directory domain label: <leave blank> (unless you know what you're doing)
- Address of the central manager: <your central manager>
- Machines with write access to this host: <comma-separated list>
Next, edit /etc/condor/condor_config.local and add the following lines (the file is blank to start):
    CONDOR_ADMIN = <username>@<hostname>
    CONDOR_HOST = <hostname>
    ALLOW_WRITE = <comma-separated list>
PS: To inspect a particular variable's value and where it is coming from, you can use
    condor_config_val -v <VARIABLE>
At this point, you can run condor by starting the service using
    sudo service condor start
However, you would usually rather set it up to autostart. You can do that with
    sudo update-rc.d condor enable
All of the above steps are done on all hosts. The CONDOR_HOST variable decides which host is your central manager (the master of the pool).
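Since the same three settings go on every host, it can help to generate them once and review them before copying. The following is only a minimal sketch: the variable names (CM_HOST, ADMIN_USER, WRITE_HOSTS, DRAFT_FILE) and the example.org hostnames are placeholders made up for illustration, and the script writes to a draft file in the current directory rather than touching /etc/condor.

```shell
#!/bin/sh
# Sketch: write the three settings into a draft file for review.
# All names and hostnames below are hypothetical placeholders.
CM_HOST="${CM_HOST:-master.example.org}"        # central manager
ADMIN_USER="${ADMIN_USER:-condor}"              # user part of CONDOR_ADMIN
WRITE_HOSTS="${WRITE_HOSTS:-master.example.org, node1.example.org}"
DRAFT_FILE="${DRAFT_FILE:-./condor_config.local.draft}"

# Unquoted EOF so the shell fills in the variables above.
cat > "$DRAFT_FILE" <<EOF
CONDOR_ADMIN = ${ADMIN_USER}@${CM_HOST}
CONDOR_HOST = ${CM_HOST}
ALLOW_WRITE = ${WRITE_HOSTS}
EOF

# Show the result so it can be checked before it is copied anywhere.
cat "$DRAFT_FILE"
```

Once the draft looks right, copy it to /etc/condor/condor_config.local on each host (with sudo) and confirm the value took effect with condor_config_val -v CONDOR_HOST.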
Now that you have a running condor pool, issue
    condor_status
to see the status of the pool.
To run a sample job, let us first create one. We'll use reference 2 listed at the top, and steal the sample C code from there:
    #include <stdio.h>
    #include <stdlib.h>  /* atoi */
    #include <unistd.h>  /* sleep */

    int main(int argc, char **argv)
    {
        int sleep_time;
        int input;
        int failure;

        if (argc != 3) {
            printf("Usage: simple <sleep-time> <integer> \n");
            failure = 1;
        } else {
            sleep_time = atoi(argv[1]);
            input      = atoi(argv[2]);

            printf("Thinking really hard for %d seconds...\n", sleep_time);
            sleep(sleep_time);
            printf("We calculated: %d\n", input * 2);
            failure = 0;
        }
        return failure;
    }
Save it as simple.c. It can then be compiled by:
    gcc -o simple simple.c
Create a submit file (named submit here) with contents:
    Universe   = vanilla
    Executable = simple
    Arguments  = 4 10
    Log        = simple.log
    Output     = simple.$(Process).out
    Error      = simple.$(Process).error
    Queue

    Arguments = 4 11
    Queue

    Arguments = 4 12
    Queue
Submit job using the command:
    condor_submit submit
And check the status of the job using the command:
    condor_q
If the job gets held, release it by using the command:
    condor_release -all
My condor install kept putting some of my jobs into the held state, so I ran
    watch -n 10 condor_release -a
to keep releasing jobs every 10 seconds. I haven't figured out the right fix for this one yet.
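The submit file used above repeats an Arguments/Queue pair by hand for each input. As a rough sketch only, a small shell script could write an equivalent file for any list of inputs. SUBMIT_FILE and INPUTS are names made up for this example, not HTCondor settings, and the sleep time is assumed to stay at 4 seconds as in the original.

```shell
#!/bin/sh
# Sketch: write a submit file that queues one job per input value.
# SUBMIT_FILE and INPUTS are illustrative names, not HTCondor settings.
SUBMIT_FILE="${SUBMIT_FILE:-submit}"
INPUTS="${INPUTS:-10 11 12}"

# Quoted 'EOF' keeps $(Process) literal, so HTCondor (not the shell) expands it.
cat > "$SUBMIT_FILE" <<'EOF'
Universe   = vanilla
Executable = simple
Log        = simple.log
Output     = simple.$(Process).out
Error      = simple.$(Process).error
EOF

# One Arguments/Queue pair per input; the sleep time stays at 4 seconds.
for n in $INPUTS; do
    printf 'Arguments = 4 %s\nQueue\n' "$n" >> "$SUBMIT_FILE"
done

# Print the generated file for a quick visual check.
cat "$SUBMIT_FILE"
```

Run it in the directory that holds the compiled simple binary, then submit the result with condor_submit submit as before. Setting INPUTS to a different space-separated list changes how many jobs are queued.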