Short:        prevents buggy ethernet from locking up
Author:       megacz@usa.com & AmiTCP/IP Group
Uploader:     megacz usa com
Type:         comm/tcp
Version:      0.5
Replaces:     comm/tcp/myinterfacesuck.lha
Requires:     see notes
Architecture: generic; m68k-amigaos

myinterfacesuck-0.5
--------------------

--- SHORT:

This proggy is mainly for use with buggy network (ethernet) drivers and/or
broken hardware. It transparently reinitializes the network interface after
the interface has locked up.

--- HOW:

When my 3c589D NIC locks up (usually during long uploads at full speed), the
TCP kernel doesn't notice at all. It gives no errors; it simply looks as if
there were no hosts around, so connections time out. I still don't know what
causes such weird behaviour, but I have read somewhere that the NIC I have is
somewhat problematic and that even Linux people have similar problems with it.

My proggy keeps such "events" from lasting long. At a user-defined interval it
sends an ICMP message of about 4 KB (small packets like 64 bytes won't use up
free requests very fast) to the limited broadcast address 255.255.255.255,
which routers do not forward. This makes the device requests melt away: if
the device is locked up, it cannot free them, so once all free requests have
been allocated the kernel sets the error to 55 (No buffer space available),
and this is when reinitialization comes in. First it brings down the symbolic
interface, then it takes the device driver offline and puts it back online
(this works like a reset and cures the hardware), and as a last step it brings
the interface up and running again. All active transfers should resume shortly
without any reconnections (this depends on how fast the requests melt, in
other words whether it all happens before the timeout).

If you know how to detect interface lockups without sending anything over the
network, then please contact me asap!
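The probe described above is an ICMP echo request (type 8, code 0) with a large payload. As a hedged illustration only, not the author's code, the C sketch below assembles such a packet in a buffer and fills in the standard Internet checksum from RFC 1071. `PROBE_LEN`, `inet_checksum()` and `build_echo_probe()` are hypothetical names; the 4000-byte size comes from the 0.5 NEWS entry. Actually sending the packet would also need a raw ICMP socket (`socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)`) and, for 255.255.255.255, probably the `SO_BROADCAST` socket option; that part is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical probe size: 4000 bytes as in 0.5 (0.4 used 4096). */
#define PROBE_LEN 4000

/* RFC 1071 Internet checksum: one's complement of the one's complement sum
   of all 16-bit big-endian words in buf. */
static uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)((buf[0] << 8) | buf[1]);
        buf += 2;
        len -= 2;
    }
    if (len == 1)                 /* odd trailing byte, padded with zero */
        sum += (uint32_t)(buf[0] << 8);
    while (sum >> 16)             /* fold carries back into the low 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: fill buf (at least PROBE_LEN bytes) with an ICMP echo
   request: 8-byte header followed by a filler payload. Returns bytes used. */
static size_t build_echo_probe(uint8_t *buf, uint16_t id, uint16_t seq)
{
    uint16_t csum;
    memset(buf, 0, PROBE_LEN);    /* also zeroes the checksum field */
    buf[0] = 8;                   /* type 8 = echo request */
    buf[1] = 0;                   /* code 0 */
    buf[4] = (uint8_t)(id >> 8);  buf[5] = (uint8_t)id;
    buf[6] = (uint8_t)(seq >> 8); buf[7] = (uint8_t)seq;
    for (size_t i = 8; i < PROBE_LEN; i++)
        buf[i] = (uint8_t)i;      /* arbitrary payload pattern */
    csum = inet_checksum(buf, PROBE_LEN);
    buf[2] = (uint8_t)(csum >> 8);  /* store checksum big-endian */
    buf[3] = (uint8_t)csum;
    return PROBE_LEN;
}
```

Because the checksum field is zero while the sum is computed, running `inet_checksum()` over the finished packet yields 0, which is a quick way to sanity-check it.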
--- NEWS:

[27-Feb-2008] 0.5
-------------------
Reduced the packet to 4000 bytes, because some Windows firewalls treat larger
values as something abnormal and attempt to log that traffic. Changes in the
Ctrl-C handler.

[14-Jan-2008] 0.4
-------------------
Did more probes, and it turned out that the best results are achieved with
4096 bytes (I tried to reduce it, but this is the absolute minimum). Some
other small changes. Removed 0.1 completely.

[14-Jan-2008] 0.3
-------------------
Lowered the packet length to 1480 bytes, removed bloated code from the ICMP
part and fixed some minor issues. Removed the executable of 0.1.

[12-Jan-2008] 0.2
-------------------
This version adds some nice controls through ARexx; 0.1 is still in
'srcsrcsrc/'. I had a choice between 'getenv()' and ARexx messaging, so I
picked the second method. If you open a shell and type:

> rx "address MIFS1;pause"

or

> rx "address MIFS1;'PAUSE'"

then it will stop sending anything over the socket, while

> rx "address MIFS1;type_here_whatever_you_want"

will enable the broadcast of the ICMP echos again; the command may be
anything except 'PAUSE'. If you switch between gateways on different
interfaces from time to time, then this will surely be very helpful.

--- NOTES:

Requires 68000+ (no FPU), OS 2.04+, ~24 KiB of free memory, ARexx,
AmiTCP 30b2+. Not tested with Miami (might not work)!!!

Uses 'syslog()' (TCP/IP log) to report when it has detected the selected
error. Use level 0 to 5 to see the message in the AmiTCP log file/console,
6 or 7 if you want to treat it as debug, or 8 for no syslog. Interestingly or
not, info and debug do not work in AmiTCP 4.6...

If it turns out that this program can't help with your current settings, then
increase WRITEREQ (amitcp:db/interfaces) of your interface by 2 to 4
additional requests.
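The ARexx control rule from the 0.2 entry is simple: 'PAUSE' stops the probes and any other command resumes them. Below is a minimal C sketch of just that rule, with hypothetical names (`probing_paused`, `is_pause_command()`, `handle_arexx_command()`). The Amiga-specific side (opening the 'MIFS1' port, fetching each RexxMsg, reading the command string from its `rm_Args[0]` and replying) is omitted. ARexx uppercases an unquoted symbol such as `pause`, so both shell forms above should arrive as `PAUSE`; the sketch ignores case anyway, which is an assumption, not documented behaviour of the program.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical state flag: true while sending probes is suspended. */
static bool probing_paused = false;

/* Returns true if cmd is exactly "PAUSE", ignoring case. */
static bool is_pause_command(const char *cmd)
{
    const char *word = "PAUSE";
    while (*cmd && *word) {
        if (toupper((unsigned char)*cmd) != *word)
            return false;
        cmd++;
        word++;
    }
    return *cmd == '\0' && *word == '\0';   /* both strings fully consumed */
}

/* Hypothetical handler for one command string received on the port:
   "PAUSE" stops the probes, anything else resumes them. */
static void handle_arexx_command(const char *cmd)
{
    probing_paused = is_pause_command(cmd);
}
```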
I found that if all requests are already occupied when a lockup occurs, this
program will simply wait until someone frees one. You would be right to think
that this should give an error (it's the TCP/IP kernel's design, or maybe a
fault...).

If you want to fix something and recompile, then you must have SAS/C 6.5x,
plus 'netinclude' and 'netlib' from AmiTCP 30b2 extracted to the dirs in
'srcsrcsrc/'. Then simply 'cd srcsrcsrc' and type 'compile'.

Installation is pretty easy: put this thing after your 'startnet', or in
'startnet', with the arguments that match your interface and device. You may
also like to 'changetaskpri -2' so it won't get the CPU too often, but keep in
mind that all the hard work is done by the kernel, so you won't gain much.
Anyway, this program is not very CPU intensive: I made some tests without any
interval, measured the MIPS of my CPU with 'bogomips', and it was still
acceptable. Also remember about non-network-related CPU time eaters (like
PC-Task); they should be pushed to -2 or even lower to let this program
function.

Yes, I have considered checking whether there are any active transfers, so it
would only probe then, but this seems to be too much hassle, and what could
possibly be gained by it?

There are more programs in this package, see 'srcsrcsrc/source/'. Note,
however, that these programs probably won't be updated and may contain some
nasty bugs (especially 'mifsallocreq' - treat it as one big bug ;).
--- ERROR CODES:

*  0 - ENOERROR           Undefined error: 0
   1 - EPERM              Operation not permitted
   2 - ENOENT             No such file or directory
   3 - ESRCH              No such process
   4 - EINTR              Interrupted system call
   5 - EIO                Input/output error
   6 - ENXIO              Device not configured
   7 - E2BIG              Argument list too long
   8 - ENOEXEC            Exec format error
   9 - EBADF              Bad file descriptor
  10 - ECHILD             No child processes
  11 - EDEADLK            Resource deadlock avoided
  12 - ENOMEM             Cannot allocate memory
  13 - EACCES             Permission denied
  14 - EFAULT             Bad address
  15 - ENOTBLK            Block device required
  16 - EBUSY              Device busy
  17 - EEXIST             File exists
  18 - EXDEV              Cross-device link
  19 - ENODEV             Operation not supported by device
  20 - ENOTDIR            Not a directory
  21 - EISDIR             Is a directory
  22 - EINVAL             Invalid argument
  23 - ENFILE             Too many open files in system
  24 - EMFILE             Too many open files
  25 - ENOTTY             Inappropriate ioctl for device
  26 - ETXTBSY            Text file busy
  27 - EFBIG              File too large
  28 - ENOSPC             No space left on device
  29 - ESPIPE             Illegal seek
  30 - EROFS              Read-only file system
  31 - EMLINK             Too many links
  32 - EPIPE              Broken pipe
  33 - EDOM               Numerical argument out of domain
  34 - ERANGE             Result too large
  35 - EAGAIN,EWOULDBLOCK Resource temporarily unavailable
  36 - EINPROGRESS        Operation now in progress
  37 - EALREADY           Operation already in progress
  38 - ENOTSOCK           Socket operation on non-socket
  39 - EDESTADDRREQ       Destination address required
  40 - EMSGSIZE           Message too long
  41 - EPROTOTYPE         Protocol wrong type for socket
  42 - ENOPROTOOPT        Protocol not available
  43 - EPROTONOSUPPORT    Protocol not supported
  44 - ESOCKTNOSUPPORT    Socket type not supported
  45 - EOPNOTSUPP         Operation not supported
  46 - EPFNOSUPPORT       Protocol family not supported
  47 - EAFNOSUPPORT       Address family not supported by protocol family
  48 - EADDRINUSE         Address already in use
  49 - EADDRNOTAVAIL      Can't assign requested address
* 50 - ENETDOWN           Network is down
  51 - ENETUNREACH        Network is unreachable
  52 - ENETRESET          Network dropped connection on reset
  53 - ECONNABORTED       Software caused connection abort
  54 - ECONNRESET         Connection reset by peer
* 55 - ENOBUFS            No buffer space available
  56 - EISCONN            Socket is already connected
  57 - ENOTCONN           Socket is not connected
  58 - ESHUTDOWN          Can't send after socket shutdown
  59 - ETOOMANYREFS       Too many references: can't splice
  60 - ETIMEDOUT          Connection timed out
  61 - ECONNREFUSED       Connection refused
  62 - ELOOP              Too many levels of symbolic links
  63 - ENAMETOOLONG       File name too long
  64 - EHOSTDOWN          Host is down
  65 - EHOSTUNREACH       No route to host
  66 - ENOTEMPTY          Directory not empty
  67 - EPROCLIM           Too many processes
  68 - EUSERS             Too many users
  69 - EDQUOT             Disc quota exceeded
  70 - ESTALE             Stale NFS file handle
  71 - EREMOTE            Too many levels of remote in path
  72 - EBADRPC            RPC struct is bad
  73 - ERPCMISMATCH       RPC version wrong
  74 - EPROGUNAVAIL       RPC prog. not avail
  75 - EPROGMISMATCH      Program version wrong
  76 - EPROCUNAVAIL       Bad procedure for program
  77 - ENOLCK             No locks available
  78 - ENOSYS             Function not implemented
  79 - EFTYPE             Inappropriate file type or format

 * - perhaps the only useful codes

--- USAGE:

myinterfacesuck (5 required arguments, in the order below) [ival]

  1. your main network interface: eth0, eth1, ...
  2. device used by the interface, e.g. 3c589.device
  3. device unit, usually 0
  4. error code at which reinitialization should occur, 0 = reinit at any
     error
  5. syslog level: 0 - emerg, 1 - alert, 2 - crit, 3 - err, 4 - warn,
     5 - notice, 6 - info, 7 - debug, 8 - no log
  [ival] - time to wait between checks, 1-60 seconds; the default of 3 is a
     reasonable value and gives good results

--- EXAMPLE/PROBE:

Shell-A
"
27.[local]RamDisk:> ifconfig eth0 down
27.[local]RamDisk:>
"

Shell-B
"
3.[local]STUFF:myinterfacesuck-0.1> myinterfacesuck eth0 3c589.device 0 50 8 1
/// ARexx port 'MIFS1' has been attached, 'PAUSE' or 'WHATEVER' controls.
/// monitoring interface 'eth0' for error 50, check every 1 sec(s) ...
/// error 50 detected, transparent reinitialization in progress ...
/// interface should be up and running again.
"

--- RECOMMENDED USAGE FOR 55:

(We want to check as often as possible, but we don't care if something is
transferring at full speed and loading the CPU, so that's why the priority
is -2.)

changetaskpri -2
run >nil: myinterfacesuck eth0 3c589.device 0 55 5
changetaskpri 0

--- megacz@usa.com
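The USAGE argument order described earlier, together with the documented limits (log level 0-8, ival 1-60 with a default of 3, error code 0 meaning "reinit at any error"), can be captured in a small parser. This is a hedged C sketch, not the program's actual code: `struct mifs_args`, `parse_long()`, `parse_args()`, `should_reinit()` and `logging_enabled()` are hypothetical names, and the real program may parse its arguments differently.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical container for the positional arguments. */
struct mifs_args {
    const char *interface;  /* e.g. "eth0" */
    const char *device;     /* e.g. "3c589.device" */
    long unit;              /* usually 0 */
    long trigger;           /* error code that triggers reinit, 0 = any */
    long loglevel;          /* 0..7 syslog priority, 8 = no log */
    long ival;              /* seconds between checks, 1..60, default 3 */
};

/* Parse a non-negative decimal number; returns false on junk or overflow. */
static bool parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 0)
        return false;
    *out = v;
    return true;
}

/* Accepts five required arguments plus an optional ival (argv[0] = name). */
static bool parse_args(int argc, char **argv, struct mifs_args *a)
{
    if (argc != 6 && argc != 7)
        return false;
    a->interface = argv[1];
    a->device = argv[2];
    if (!parse_long(argv[3], &a->unit) ||
        !parse_long(argv[4], &a->trigger) ||
        !parse_long(argv[5], &a->loglevel) || a->loglevel > 8)
        return false;
    a->ival = 3;                              /* documented default */
    if (argc == 7 &&
        (!parse_long(argv[6], &a->ival) || a->ival < 1 || a->ival > 60))
        return false;
    return true;
}

/* Should a probe that failed with error code err trigger reinitialization? */
static bool should_reinit(long trigger, int err)
{
    if (err == 0)
        return false;                         /* probe went out fine */
    return trigger == 0 || err == trigger;    /* 0 = reinit at any error */
}

/* Level 8 disables logging; 0..7 would be used as syslog priorities. */
static bool logging_enabled(long loglevel)
{
    return loglevel < 8;
}
```

With the recommended command line above (`eth0 3c589.device 0 55 5`), the sketch would end up with `trigger == 55`, `loglevel == 5` and the default `ival == 3`.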