Greetings everyone,

I was using goredo on an NFS mount and noticed that my program would sometimes fail with the following error:

    run.go:234: interrupted system call /gpfs01/.../folders/.redo/1.zip.lock

After some digging, I think I found the cause. unix.FcntlFlock with F_SETLKW blocks until the lock is granted, and over NFS that wait can take a long time. If a signal arrives during the wait, the call fails with EINTR (see `man 2 fcntl`, ERRORS section [1]). There is apparently an automatic restart mechanism (SA_RESTART) [2], but it is also not fully reliable. So I thought it would be better to handle EINTR explicitly by extending the error check from

    if errors.Is(err, unix.EDEADLK) {

to

    if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {

This seems to resolve the "interrupted system call" error above. Unfortunately, I cannot reliably reproduce the error. However, the fix is reasonably simple, so I was hoping it could be incorporated into goredo proper. I have attached the small diff. Here it is inline as well, in case the change is not clear from the description:

diff --git a/run.go b/run.go
index 506fd35..5423b49 100644
--- a/run.go
+++ b/run.go
@@ -227,7 +227,7 @@ func runScript(tgt *Tgt, errs chan error, forced, traced bool) error {
 	tracef(CLock, "LOCK_EX: %s", fdLock.Name())
 LockAgain:
 	if err = unix.FcntlFlock(fdLock.Fd(), unix.F_SETLKW, &flock); err != nil {
-		if errors.Is(err, unix.EDEADLK) {
+		if errors.Is(err, unix.EDEADLK) || errors.Is(err, unix.EINTR) {
 			time.Sleep(10 * time.Millisecond)
 			goto LockAgain
 		}

Cheers and happy belated new year
Nik

[1]: https://www.man7.org/linux/man-pages/man2/fcntl.2.html#ERRORS
[2]: https://unix.stackexchange.com/questions/509375/what-is-interrupted-system-call