Cloudflare "Error 525 SSL handshake failed" on Hetzner server

JohnRoe Member
edited February 3 in Help

Hi. I have had this weird issue since early January.

I am hosting a domain with multiple subdomains on a Hetzner server. Sometimes I get "Error 525 SSL handshake failed", and usually a reload makes the error go away.

I was using the same server, the same domain, and the same Nginx configuration before this. I had to reinstall the server due to a problem, and since then I have been getting this 525 error intermittently.

What I have tried:

  • Upgrading and downgrading Nginx
  • Enabling debug logging in Nginx; nothing gets logged when this error shows up
  • Deleting the origin cert in Cloudflare and regenerating it
  • Using a Let's Encrypt cert
  • Rebooting the server

There are a few other things, but I cannot recall them.

I cannot think of any differences between before and after the reinstall. After the reinstall I have fewer files because I cleaned up, and I have IPv6 enabled. I had IPv6 disabled before and left it enabled after reinstalling. I have tried making Nginx listen only on IPv4, but it still happens.

Also, I can access my server fine without the Cloudflare proxy. This issue only happens sometimes when I turn on the proxy.

Does anyone have an idea how to debug this? I have been patient for so long. I contacted Cloudflare, but they suggested that I use the FLEXIBLE SSL mode instead of FULL, which I am using now. I have no problem trying that, but I want to pinpoint the cause first.

Any thoughts?

Thanks in advance!

Edit:

  • Hetzner auction server, 6TB Disk, 32GB RAM

Sorry for my bad English

Comments

  • isunbejo Member
    edited February 3

    Increase the error log level in nginx,
    increase the sysctl session timeout value, enable AES on the CPU, and tune SSL in nginx:

    listen 0.0.0.0:443 rcvbuf=64000 sndbuf=128000 backlog=20000 ssl http2;
     ssl_session_cache     shared:TLSSL:30m;
    
  • @isunbejo said:
    Increase the error log level in nginx,
    increase the sysctl session timeout value, enable AES on the CPU, and tune SSL in nginx:

    listen 0.0.0.0:443 rcvbuf=64000 sndbuf=128000 backlog=20000 ssl http2;
     ssl_session_cache     shared:TLSSL:30m;
    

    I have enabled debug logging in nginx, which I believe is the highest level. Nothing gets logged when I get the 525 error.

    I have applied rcvbuf=64000 sndbuf=128000 backlog=20000 ssl http2 and will report back later.

    Also, I already have ssl_session_cache configured.

  • @isunbejo Nope, still happening.

  • @JohnRoe said:
    @isunbejo Nope, still happening.

    sysctl -a |grep tcp_keepalive

    http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html

  • Unfortunately no. These are the results:

    1. I am the owner
    2. I have a valid certificate. I also tried a Let's Encrypt cert
    3. Nginx is listening on ports 80 and 443, on both IPv4 and IPv6
    4. I have no idea what this is, even after reading many explanations, but I assume it is properly configured since I can load multiple domains with different certificates
    5. Tried
    6. This seems to be the issue I am having, but I don't know where to start troubleshooting
    7. I raised the Nginx error log level and nothing was logged when the error occurred
    8. When pausing, the website loads fine. Still, it is hard to confirm since this error appears randomly. I also have a subdomain running with SSL without the Cloudflare proxy, and it always loads fine.

  • I am currently using the intermediate config from https://ssl-config.mozilla.org/

  • webdev Member
    edited February 3

    Try https://www.ssllabs.com/ssltest/

    Try moving the SSL-related config out of the server block

  • @isunbejo said:

    @JohnRoe said:
    @isunbejo Nope, still happening.

    sysctl -a |grep tcp_keepalive

    http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html

    This is the output

    net.ipv4.tcp_keepalive_intvl = 75
    net.ipv4.tcp_keepalive_probes = 9
    net.ipv4.tcp_keepalive_time = 7200
    sysctl: reading key "net.ipv6.conf.all.stable_secret"
    sysctl: reading key "net.ipv6.conf.default.stable_secret"
    sysctl: reading key "net.ipv6.conf.enp2s0.stable_secret"
    sysctl: reading key "net.ipv6.conf.lo.stable_secret"
    

  • @webdev said:
    Try moving the SSL-related config out of the server block

    Should I move it and then test on SSL Labs, or test on SSL Labs before and after?

    I tested before the change and got an A for all:
    https://i.imgur.com/2la2lK1.png

  • Click into the result and check the Handshake Simulation section. If you got an A there, then it's a browser issue; maybe some old browser?

  • @webdev said:
    Click into the result and check the Handshake Simulation section. If you got an A there, then it's a browser issue; maybe some old browser?

    It is not just me. My users, my Jellyfin, and even Cloudflare support staff can reproduce the error. Yes, I have contacted Cloudflare directly; they say they can reproduce the error as well, and we are still communicating through a support ticket. But since I am on the free plan, it is quite slow.

    When I get the 525 error, nothing is logged in the Nginx error_log. A few browser refreshes solve the issue temporarily in the browser. In things that cannot be refreshed, like a downloader or Android apps such as Transdroid, you need to wait until the error goes away by itself.
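One way to narrow this down is to repeat the TLS handshake directly against the origin, bypassing the Cloudflare proxy, and log each failure with a timestamp so it can be compared with the times the 525 errors appear. A minimal sketch, assuming Python 3; the function names are made up for this example, and certificate verification is switched off on purpose because a Cloudflare origin certificate is not publicly trusted:

```python
import socket
import ssl
import time


def try_handshake(ip, server_name, port=443, timeout=5.0):
    """Attempt one TLS handshake and return (ok, detail)."""
    ctx = ssl.create_default_context()
    # Diagnostic only: skip certificate checks so that only the
    # handshake itself is tested, not the trust chain.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=server_name) as tls:
                return True, tls.version()
    except OSError as exc:
        # ssl.SSLError, timeouts and refused connections are all OSError.
        return False, f"{type(exc).__name__}: {exc}"


def summarize(results):
    """Count failures in a list of (ok, detail) tuples."""
    errors = [detail for ok, detail in results if not ok]
    return {"total": len(results), "failed": len(errors), "errors": errors}


def probe(ip, server_name, attempts=100, pause=1.0):
    """Run repeated handshakes, printing a timestamp for each failure."""
    results = []
    for _ in range(attempts):
        ok, detail = try_handshake(ip, server_name)
        if not ok:
            print(time.strftime("%H:%M:%S"), detail)
        results.append((ok, detail))
        time.sleep(pause)
    return summarize(results)
```

It could be run as `probe("203.0.113.10", "example.com")`, where both values are placeholders for the origin IP and the proxied host name. If failures show up here at the same times as the 525 errors while Nginx logs nothing, that would suggest the connection is being dropped before it reaches Nginx rather than a problem in the TLS configuration.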

  • upme88 Member

    @JohnRoe I am suffering from the same issue as you. Did you find a fix?

    Also, can you confirm your NIC model? Mine is an Intel i219-LM.

  • umi Member

    Use tcpdump to see what's going on at the packet level, and try another web server (Apache, LiteSpeed, h2o) to check whether it is affected too.

  • HyperK9 Member

    Is the full cert chain in your config?
    Turn off Cloudflare and try this: https://www.ssllabs.com/ssltest/

  • comXyz Member

    I had the same problem before.
    I switched the SSL setting in Cloudflare from Full to Flexible, then switched back to Full after 5 minutes. It worked.

  • upme88 Member

    @comXyz What's your NIC model?

  • comXyz Member

    @upme88 said:
    @comXyz What's your NIC model?

    Too many different models :-?

  • upme88 Member

    @comXyz said:

    @upme88 said:
    @comXyz What's your NIC model?

    Too many different models :-?

    I switched to a different NIC but still get the same issue sometimes when the network is under load. :(

    Do you still face the same issue sometimes?

  • comXyz Member

    I don't think it has anything to do with the NIC @upme88

  • lighter Member
    edited July 25

    @upme88 said:

    @comXyz said:

    @upme88 said:
    @comXyz What's your NIC model?

    Too many different models :-?

    I switched to a different NIC but still get the same issue sometimes when the network is under load. :(

    Do you still face the same issue sometimes?

    I even had a similar problem in the cloud (Nuremberg / Falkenstein). :(

    And I'm sure this issue is not related to Cloudflare.

  • appcomq Member

    I've encountered the same issue; in my case it came from the NIC driver.

    Just upgrade the kernel to 4.17+

    https://community.hetzner.com/tutorials/installing-the-r8168-driver?title=Installation_des_r8168-Treibers/en

    Thanked by 1vimalware
  • Cloudflare SSL only supports one level of subdomain. If you have sub.sub.domain.com, Cloudflare SSL won't work. You need to grey-cloud it (turn the proxy off for that record).

  • lighter Member
    edited July 26

    @yokowasis @appcomq @upme88
    I believe it has nothing to do with Cloudflare at all (or the kernel; I have 4.19).

    This problem only occurs, and only some of the time, when the network load is high (~10 MB/s and ~1k PPS). I use rsync to transfer files.

    I did a lot of testing on this.
    The problem can be reproduced in both Nuremberg and Falkenstein (CPX31/CX41, AMD/Intel).

    I think there is a fault in their firewall configuration. It looks like they "nullroute" (I don't know what else to call it) my IP during that period.

    • The SSH connection gets disconnected
    • Sometimes this error appears while reconnecting over SSH: "ssh_exchange_identification: read: Connection reset by peer"
    • "Cloudflare Error 525 SSL handshake failed" on my website. Sometimes refreshing the page helps, but the problem comes back again and again
    • Certain addresses are unreachable
      E.g. (nmap results):
      office.com - ports 80 and 443 are unreachable
      github.com - ports 80 and 443 are unreachable
      google.com - ports 80 and 443 are reachable
      youtube.com - ports 80 and 443 are reachable
      Cloudflare's IP - port 80 is reachable, but port 443 is sometimes reachable, sometimes not.
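The nmap checks listed above can be approximated with a plain TCP connect test, which is convenient for logging reachability repeatedly while the problem is active. A rough sketch, assuming Python 3 on the affected server; the function names are made up for this example, and a connect test only shows whether the port accepts a connection, not why it fails:

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_hosts(hosts, ports=(80, 443)):
    """Map each host to {port: reachable} for the given ports."""
    return {h: {p: port_open(h, p) for p in ports} for h in hosts}
```

For the hosts in this post that would be `check_hosts(["office.com", "github.com", "google.com", "youtube.com"])`, run once during a normal period and once while the errors are occurring.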

    The "nullroute" usually lasts for 10 to 30 minutes.
    During this period I created a new cloud instance in the same data center, and everything worked fine on it.

    @Hetzner_OL :'(

  • appcomq Member

    @lighter

    It may be caused by the DDoS firewall.

    I got this answer from Hetzner technical support by mail:


    Hello,

    It is not possible to turn off the DDoS protection.
    However the DDoS protection is not always active. It is only active when we detect an attack.

    You can check that yourself with a mtr to your server. When a mitigation is active, there will be a hop called "ddos-mitigation"
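A small helper can scan saved mtr output for such a hop. This is only a sketch: it assumes the output was captured in report mode (for example with `mtr --report-wide <server>`, so that long host names are not cut off) and simply searches the text for the "ddos-mitigation" label mentioned above. The function name is made up for this example.

```python
def mitigation_hops(mtr_report, marker="ddos-mitigation"):
    """Return the lines of an mtr report whose text contains marker."""
    return [
        line.strip()
        for line in mtr_report.splitlines()
        if marker in line.lower()
    ]
```

Passing it the text of a saved report returns the matching hop lines, or an empty list when no mitigation hop is on the path at that moment. Capturing a report during a normal period and another while the 525 errors occur gives a direct comparison.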


    Thanked by 1lighter
  • appcomq Member

    @lighter said: I believe it has nothing to do with Cloudflare at all. (or kernel, I have 4.19)

    One exception: when I upgraded to the 5.4 kernel, I didn't see any errors.

  • lighter Member

    @appcomq said:
    @lighter

    It may be caused by DDoS firewall.

    Thanks @appcomq . I will check it when it happens again.

  • lighter Member

    @appcomq said:

    @lighter said: I believe it has nothing to do with Cloudflare at all. (or kernel, I have 4.19)

    One exception: when I upgraded to the 5.4 kernel, I didn't see any errors.

    But I prefer to use a stable kernel version. 4.19 is the latest kernel in Debian 10.4.

  • alvaro Member

    I have already faced this problem, and it has nothing to do with SSL.

    Cloudflare can send a lot of traffic to your server from just one IP, so packets may be rejected by your server's firewall (depending on the settings) or by your provider's firewall (DDoS protection).

    The solution is to whitelist Cloudflare IPs for both firewalls (I'm not sure if Hetzner will do that).

    https://www.cloudflare.com/ips/
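To build such a whitelist, or to check whether a rejected source address really belongs to Cloudflare, the published ranges can be parsed with Python's `ipaddress` module. A minimal sketch, assuming the list has been saved as text with one CIDR range per line; the function names are made up for this example:

```python
import ipaddress


def parse_ranges(text):
    """Parse one CIDR range per line, skipping blank lines."""
    return [
        ipaddress.ip_network(line.strip())
        for line in text.splitlines()
        if line.strip()
    ]


def is_listed(ip, networks):
    """Return True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # Only compare against networks of the same IP version.
    return any(addr in net for net in networks if net.version == addr.version)
```

For example, `is_listed("173.245.48.1", parse_ranges("173.245.48.0/20\n103.21.244.0/22"))` returns `True`. The two ranges here are only an illustrative excerpt and may be out of date; the full, current list should be taken from the page linked above. The same parsed networks can then be turned into allow rules for whichever firewall is in use.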

  • lighter Member
    edited July 26

    You are right @appcomq. I reproduced it again and saw the ddos-mitigation hop (ddos-mitigation.dc1.nbg1.hetzner.com) when I checked with mtr. I rebuilt the server with Ubuntu 20.04 (kernel 5.4), but it didn't help.

    I just use rsync to transfer files (10 MB/s) to this server without any other network traffic, so I can't understand why the DDoS mitigation was activated. @Hetzner_OL

    I'm tired. I have to find another provider. o:)

  • Hetzner_OL Member, Provider, Top Provider

    @appcomq and @lighter -- I recommend that you respond to your last ticket to our support team about this issue. Perhaps our technician can give you an idea of what might be triggering the DDoS mitigation. Please share as much information as you can with them so they can help you.
    Unfortunately, I am not a technician and do not have access to our Networking team's systems. --Katie

    Thanked by 1lighter

    We (Katie and Helena) will do our best to answer your Hetzner questions and pass on your feedback. Hetzner Online's not liable for any corny jokes that we make. (https://www.hetzner.com)
