What is a lockfile
You may have experienced this before: you create a cron job to change some data every few minutes or hours, and one day the job takes longer than it usually does, so cron spawns another instance before the first one has finished.
This can result in data corruption or in the deletion of data that should not have been deleted, depending on what the cron job is set up to do.
To prevent bad things from happening, a good rule of thumb is to always use a lockfile.
A lockfile is a small file that takes up virtually no space, so little that you won't care (the actual size depends on your filesystem). Depending on how the lockfile is managed, it sometimes contains a PID, sometimes a timestamp, and sometimes it is just empty.
How lockfiles work
There are multiple ways to implement a lockfile. I'll explain the basics here, in two different ways.
Empty file
The first and simplest way is to make your script or program check whether a file exists at the very beginning. Let's say the filename is /var/lock/myscript.lock
If the lockfile exists, the script just exits, since the existence of the lockfile suggests that the script is already running. If the lockfile does not exist, the script creates it and continues with what it has to do.
When the script is done doing its job, the lockfile has to be deleted before the script exits.
That's basically it: the lockfile is just a file indicating that the script is already running. However, this method with just an empty file has one big problem, which can also be an advantage.
If the script fails and exits before it gets to delete the lockfile, it will never run again until you go in and delete the lockfile manually. You will have the same problem if your server reboots or crashes while the script is running.
That is not necessarily bad and can be useful in some cases. Sometimes your script makes changes that cannot safely be restarted if they are interrupted before they finish. In that case this type of lockfile is a must, because the script will not start again on its own until you delete the lockfile manually to let it.
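If you would rather not clean up by hand after a failed run, one possible variation is to let the shell remove the lockfile whenever the script ends. The following is a minimal sketch, not part of the method described above: it assumes a POSIX shell, uses a hypothetical lock path under /tmp so it can be tried without root, and wraps the work in a made-up helper called run_job. Note that it deliberately gives up the "stay locked after a failure" behaviour.

```shell
# Hypothetical lock path for this demo; the article uses /var/lock/myscript.lock
lf="${TMPDIR:-/tmp}/myscript-trap-demo.lock"

# run_job CMD [ARGS...]: run CMD under the lock (run_job is a made-up helper name)
run_job() {
    (
        # Lockfile present: assume another copy is running and give up
        [ -f "$lf" ] && exit 1
        touch "$lf"
        # From here on, remove the lockfile however this subshell ends
        trap 'rm -f "$lf"' EXIT
        "$@"
        exit $?
    )
}

run_job echo "doing the work"
```

The trap covers normal exits and failing commands. A kill -9 or a server crash still leaves the lockfile behind, so the manual cleanup described above can still be needed.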
Lockfile with PID
Let’s say your script name is myscript.sh and the lockfile is located at /var/lock/myscript.lock
If the lockfile exists, your script reads it to see if it has any content. If it finds data in the file, the script assumes it is a PID (Process ID: every process gets an ID, which is just a number, normally handed out in increasing order starting from 1) and checks whether a process with that ID is running. If it is, the script exits.
If no process with the PID from the lockfile is found, or the lockfile does not exist at all, the script creates the lockfile with the PID of the currently running script as its content. Nothing else, just the PID.
This way, you do not have to delete the lockfile when done. In case of a script or system crash the lockfile will still be there, but it does not matter: the process with the PID from the file is no longer running, so the next run will not find it during the check at the beginning, and will therefore write its own PID into the lockfile and continue with its job.
I have used this method in multiple scripts. It does have some downsides, for example an unrelated process can be spawned with the same ID (PIDs are reused once the maximum is reached), but I have never run into a problem like this myself.
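One way to reduce the risk from PID reuse is to check not only that the PID exists, but also that it still belongs to your script. Here is a minimal sketch, assuming Linux with /proc mounted. The helper name pid_runs_script and the demo lock path are made up for illustration, and matching on the command line is a heuristic, not a guarantee.

```shell
# pid_runs_script PID NAME (hypothetical helper):
# succeed only if PID is alive and its command line contains NAME
pid_runs_script() {
    pid=$1
    name=$2
    [ -n "$pid" ] || return 1
    [ -r "/proc/$pid/cmdline" ] || return 1
    # Arguments in cmdline are separated by NUL bytes; turn them into spaces
    tr '\0' ' ' < "/proc/$pid/cmdline" | grep -qF -- "$name"
}

# Hypothetical lock path for this demo; the article uses /var/lock/myscript.lock
lf="${TMPDIR:-/tmp}/myscript-pid-demo.lock"
lastPID=$(cat "$lf" 2>/dev/null)

if pid_runs_script "$lastPID" "myscript.sh"; then
    echo "myscript.sh is still running as PID $lastPID"
else
    echo "no live myscript.sh found, safe to take the lock"
fi
```

With this check, a recycled PID that now belongs to some unrelated program no longer blocks the script from running.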
Lockfiles the “hard” way
I call it the hard way because it requires you to add some code to your script. It is not really hard, just not as easy as the easy solution further down in this post, but it helps people who are new to lockfiles understand how they work.
Adding the following code on top of a bash script will:
- Create the lockfile if it does not already exist
- Read the data from the lockfile
- Check if a process with the PID matching the data from the lockfile is running
- If no process with the PID is running, then write the current PID to it
- However if a process with the PID from the lockfile is running, then just exit the script
Here is the code for a bash script with comments:
# Variable to hold the location of the lockfile
lf=/var/lock/myscript.lock

# Create an empty lockfile if none exists
touch "$lf"

# Read the content of the lockfile into a variable
read -r lastPID < "$lf"

# If lastPID is not empty and a process with that PID exists, exit the script
[ -n "$lastPID" ] && [ -d "/proc/$lastPID" ] && exit

# Write the PID of the currently running script to the lockfile
echo $$ > "$lf"

# Your code goes here and does its job from this point on.
# No further code related to the lockfile is needed
And here is the code to use just an empty lockfile:
# Variable to hold the location of the lockfile
lf=/var/lock/myscript.lock

# Check if the lockfile exists, exit if it does
[ -f "$lf" ] && exit

# Create the lockfile
touch "$lf"

# Your script does its job here

# At the very end of the script, or wherever else it exits, delete the lockfile
rm "$lf"
Lockfiles the easy way
Above I showed you how a lockfile works, and the “hard” way to manage one.
Now let’s look into the easy way. This method requires a tiny program that is in the official repositories.
The program is called “flock”. It is part of the util-linux package, so on most systems it is already installed.
If it is missing, run the following command to install it:
Debian:
apt-get install util-linux
RHEL/CentOS:
yum install util-linux
Once installed, it’s really easy to use with the following syntax:
/usr/bin/flock -n /path/to/lockfile.lock /path/to/myscript.sh
The -n option makes flock exit immediately if the lock is already held, that is, if the script is already running. Without -n, flock waits until the first process is done and then runs the script.
That’s it, flock handles the rest for you. Just run the script with flock in front of it every time, for example in your crontab entry, and you will be safe from the script accidentally running multiple times in parallel.
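flock can also be used from inside the script itself, so nobody has to remember the wrapper command. Here is a minimal sketch, assuming the flock from util-linux. The lock path under /tmp is a hypothetical demo path, file descriptor 9 is an arbitrary choice, and the got_lock variable is only there to make the outcome visible.

```shell
# Hypothetical lock path for this demo; the article uses /var/lock/myscript.lock
lf="${TMPDIR:-/tmp}/myscript-flock-demo.lock"

# Open the lockfile on file descriptor 9 and keep it open while the script runs
exec 9>"$lf"

# Try to take an exclusive lock without waiting (-n)
if flock -n 9; then
    got_lock=yes
    echo "lock acquired, doing the work"
    # Your code goes here
else
    got_lock=no
    echo "already running"
    # A real script would exit here
fi
```

The lock is tied to the open file descriptor, so it is released when the script ends for any reason, including a crash. The lockfile itself can stay where it is.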
Process ID numbers start at 1 and increase, but they have an upper limit depending on the system and configuration. You need to take this into account and add a test to see if the process that has the ID is the same as the script that created it.
That way if another process is using the same PID your script won’t prematurely terminate.
You may want to say something about atomic writes and race conditions.
I.e., in your second example, if two copies of the script start at the same time, they both see that there is no file, both touch it, and both think they have the lock. A simple workaround is to use mkdir instead of touch and create a directory. That way one of the scripts gets an error that the directory already exists.
Agreed that something like flock avoids this.
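The mkdir workaround could look roughly like this. It is a sketch only: the lock directory path and the helper names acquire_lock and release_lock are made up for illustration. It relies on mkdir either creating the directory or failing in one step, so two scripts can never both succeed.

```shell
# Hypothetical lock directory for this demo
lockdir="${TMPDIR:-/tmp}/myscript-demo.lock.d"

# Succeeds only for the caller that actually creates the directory
acquire_lock() {
    mkdir "$lockdir" 2>/dev/null
}

# Remove the lock directory again when the work is done
release_lock() {
    rmdir "$lockdir"
}

if acquire_lock; then
    echo "got the lock"
    # Your code goes here
    release_lock
else
    echo "already running"
fi
```

Like the empty lockfile, a directory left behind by a crashed script has to be removed by hand before the script will run again.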