I have an SE VPN server deployed in Google Cloud with more than 1500 clients connected to it. After a short time the server slows down in processing disconnect events and is then killed by the OOM killer. The symptoms are that the number of sockets in the CLOSE_WAIT state grows, and memory usage grows along with it. Looking at packet dumps, I discovered that the server fails to send its FIN reply packets in time. In other words, the SE VPN server closes connections (sockets) too late, and it slows down over time.
I'm using the v4.22-9634-beta SE VPN build. Please find the deployment scheme below.
I was able to enable the debug trace system and noticed that, although the server has already logged LS_CONNECTION_END_1 ("Connection ... has been terminated."), the corresponding TCP socket is only closed a further 15 seconds later. That delay also doesn't explain why it takes so long (minutes) to respond to a connection close request.
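For anyone wanting to track the same symptom, here is a minimal sketch that counts TCP sockets per state, assuming a Linux server where `/proc/net/tcp` is readable. The function name `count_tcp_states` and the sample rows are made up for illustration and are not taken from the actual deployment; the state codes are the ones the Linux kernel uses in the `st` column.

```python
from collections import Counter

# Hex codes used in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

def count_tcp_states(proc_net_tcp_text):
    """Return a Counter of TCP state names from /proc/net/tcp-style text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts

# Hypothetical sample: one listener on port 443, two CLOSE_WAIT, one ESTABLISHED.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:01BB 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001
   1: 0200000A:01BB 6400000A:C350 08 00000000:00000000 00:00000000 00000000     0        0 1002
   2: 0200000A:01BB 6500000A:C351 08 00000000:00000000 00:00000000 00000000     0        0 1003
   3: 0200000A:01BB 6600000A:C352 01 00000000:00000000 00:00000000 00000000     0        0 1004
"""

if __name__ == "__main__":
    for state, n in sorted(count_tcp_states(SAMPLE).items()):
        print(state, n)
```

On a live server the same function could be fed `open("/proc/net/tcp").read()` at regular intervals, and the CLOSE_WAIT count logged next to the process memory usage to see whether the two really grow together. IPv6 sockets are listed separately in `/proc/net/tcp6`.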
Another observation I made: when the VPN server goes into that 'failure mode', some VPN clients start to initiate connections to the server every 15 seconds. Such clients are few in number and are different every time. The client log doesn't contain any record of a VPN session/connection being (re-)established. All clients are configured to have 1 additional TCP connection. The primary connection is always in place, so it looks like the client was attempting to re-establish the additional TCP connection. On such a client I observed 4 sockets in FIN_WAIT2 (waiting for the server to close the connection) every time. Each such socket had been present for 1 minute (which looks like a system timeout).
There was also a group of clients (different every time) that were completely disconnected and failed to reconnect. Moreover, in this server 'failure mode' it is impossible to connect to the server even with the admin tool.
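The figure of 4 sockets is consistent with simple arithmetic on the two numbers above: a new attempt every 15 seconds and each socket lingering for 60 seconds gives 60 / 15 = 4 sockets at any moment. A minimal sketch of that reasoning follows, under the simplifying assumption (not confirmed by the logs) that each attempt leaves exactly one socket that lingers from the moment of the attempt; the function name `lingering_sockets` is hypothetical.

```python
def lingering_sockets(attempt_interval_s, linger_s, observe_at_s):
    """Count sockets still present at observe_at_s.

    Assumes one attempt at t = 0, attempt_interval_s, 2 * attempt_interval_s, ...
    and that each attempt leaves one socket that disappears after linger_s seconds.
    """
    count = 0
    t = 0
    while t <= observe_at_s:
        if t + linger_s > observe_at_s:  # this socket has not timed out yet
            count += 1
        t += attempt_interval_s
    return count

if __name__ == "__main__":
    # Steady state with the values from the post: 15 s interval, 60 s linger.
    print(lingering_sockets(15, 60, 100))
```

If the observed count on a client differed noticeably from this ratio, that would suggest the attempts are not strictly periodic or that the sockets are not all timing out after the same period.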
Thanks in advance.
If you think this is a bug, posting to GitHub will give you a good response.