BuyShared lu-shared02 down, anyone knows why?

jetchirag Member
edited April 2017 in Outages

Hi,
BuyShared has been down for almost 2 hours. I opened a ticket and haven't received a response. Does anyone here know why, or have any other details?

Comments

  • alexnjh Member
    edited April 2017

    For Las Vegas I received this email, maybe related?

    Hello,

    Earlier today we had a failure on our Brocade SuperX, our switching/layer 2 fabric in our Las Vegas facility. Our Vegas based technical staff were able to quickly (given the 8 car pile up on the freeway) get into the facility and address the failed piece of equipment.

    We're actively speaking with vendors to replace our entire Brocade deployment in Las Vegas as we've had 3 failures in 2017 alone (2 dead optics and a failing blade in our MLX core router) and 4 - 5 failures in the last 14 months. The MLX failure, while not a complete outage, caused internal routing issues where many users couldn't reach the traffic within the same network or it would be spotty.

    We're currently favoring a Cisco 6880-X for our core and Cisco 4948s for switching. We will then be able to run redundant fiber runs to each rack instead of having a single central 'mega switch' like we have with the SuperX right now.

    We apologize for this all. If you wish to request an SLA please log a ticket with billing and we'll get that sorted for you.

  • @masterqqq said:
    For Las Vegas I received this email, maybe related?

    Hello,

    Earlier today we had a failure on our Brocade SuperX, our switching/layer 2 fabric in our Las Vegas facility. Our Vegas based technical staff were able to quickly (given the 8 car pile up on the freeway) get into the facility and address the failed piece of equipment.

    We're actively speaking with vendors to replace our entire Brocade deployment in Las Vegas as we've had 3 failures in 2017 alone (2 dead optics and a failing blade in our MLX core router) and 4 - 5 failures in the last 14 months. The MLX failure, while not a complete outage, caused internal routing issues where many users couldn't reach the traffic within the same network or it would be spotty.

    We're currently favoring a Cisco 6880-X for our core and Cisco 4948s for switching. We will then be able to run redundant fiber runs to each rack instead of having a single central 'mega switch' like we have with the SuperX right now.

    We apologize for this all. If you wish to request an SLA please log a ticket with billing and we'll get that sorted for you.

    All my services are in Luxembourg; thanks for the quick follow-up. Is Luxembourg affected too?

  • budi1413 Member
    edited April 2017

    @Francisco will be using Cisco in the end. Suits the name. :p

    Thanked by 2jetchirag Francisco
  • I've got a reseller on lu-shared02. It's been down for the past two hours.

    My VPS in Lux isn't down though. Opened a ticket around 12 (currently 12:46) with no reply yet.

    Once it's back I'm grabbing the data and moving it elsewhere. Will probably keep my VPS with them for now though.

  • Francisco Top Host, Host Rep, Veteran

    Hello,

    Sorry, we don't have a night time person right now.

    LU's fine, it wasn't affected by anything like that.

    Could you PM me your IP for your site or the ticket ID?

    Francisco

  • @Francisco said:
    Hello,

    Sorry, we don't have a night time person right now.

    LU's fine, it wasn't affected by anything like that.

    Could you PM me your IP for your site or the ticket ID?

    Francisco

    PM'd you

    I can't even get onto https://lu-shared02.cpanelplatform.com/ so it seems server-wide.

    cPanel/WHM/Mail are all down; pings still work though.

  • PM'd, and yes @lukehebb, it's server-wide, as the main hostname isn't loading either.

  • seems to be shared 2 only in LUX though ... I'm on LU-Shared01 and everything is running fine here :)

  • Francisco PM'd me (and responded to ticket) - should be back soon. Seemed to softlock and not trip any alerts that would normally wake him up

  • That SuperX has been EOL since 2010 and unsupported since 2016, so issues are expected.
    http://www.brocade.com/en/support/product-end-of-life.html

  • jetchirag Member
    edited April 2017

    @lukehebb said:
    Francisco PM'd me (and responded to ticket) - should be back soon. Seemed to softlock and not trip any alerts that would normally wake him up

    Do they set up physical alerts like this one?

    Edit: Sorry bad joke ^

  • Francisco Top Host, Host Rep, Veteran
    edited April 2017

    @jetchirag said:

    @lukehebb said:
    Francisco PM'd me (and responded to ticket) - should be back soon. Seemed to softlock and not trip any alerts that would normally wake him up

    Do they set up physical alerts like this one?

    Edit: Sorry bad joke ^

    Ahaha.

    I'll be 100% honest/transparent here.

    I have different alert sounds for BuyShared etc., but because of the constant LiteSpeed crashes over the past 8 months I slept right through it. They fixed the crashes just last week with the 5.1.14 release, but it still didn't wake me.

    It is/was a soft panic, just waiting on the IPMI to reboot and it should be golden.

    Francisco

  • This is where the good ole' warchest comes in handy.

    Post network pron after everything is resolved.

  • Francisco Top Host, Host Rep, Veteran

    @budi1413 said:
    @Francisco will be using Cisco in the end. Suits the name. :p

    I think the 6880 won't be a viable option in the end. I really liked the unit but it doesn't tick all the boxes I want. The MX240 is the likely go-to option, just waiting to see what comes back from the vendor quotes I requested. Want to see it go in place within 60 days or so.

    Anyway, I'm actively telling the IPMI to reboot but it's being a bit of a jerk about it, so I've got that spamming the reboot for me. Should be roses shortly.

    Francisco

  • Francisco Top Host, Host Rep, Veteran

    @vimalware said:
    This is where the good ole' warchest comes in handy.

    Post network pron after everything is resolved.

    For the router? Sure, we'll snag some pictures of that. Our network guy lives in Vegas now so I may see if he wants to go to the range and we take the SuperX along.

    The MLX we'll sell or keep for parts, since we just bought new blades for it the other week when we had the inter-LAN connectivity issue.

    Francisco

  • jetchirag Member
    edited April 2017

    If you could be a bit more honest, what's better? LU 02 or LU 01?
    Because my one reseller on LU01 looks more stable.

  • Francisco Top Host, Host Rep, Veteran

    With that being said, things are golden again. I'll go change the sound for the buyshared alerts when I wake back up.

    Sorry about that.

    Francisco

    Thanked by 1Foul
  • It did come back, but now MySQL is dead?

  • Francisco Top Host, Host Rep, Veteran

    @jetchirag said:
    If you could be a bit more honest, what's better? LU 02 or LU 01?
    Because my one reseller on LU01 looks more stable.

    We've had kernel panics on both.

    LU-Shared01 was a lot more stable webserver-wise though, since the LiteSpeed graceful restart bug seemed to affect our CentOS 7 nodes a lot more than our CentOS 6 ones. We're talking CentOS 6 almost never had it happen, and we were working on rebasing all the CentOS 7 nodes to 6 just to get that monkey off our back.

    With 5.1.14 ( http://www.litespeedtech.com/products/litespeed-web-server/release-log ) they addressed this though, and we've not had any alerts for any of the nodes' webservers since.

    We actively have 4 boxes on CentOS 7 and we'd have 3 - 4 notifications a day of small outages lasting 1 - 3 minutes from LiteSpeed restarting. They all stopped doing that at the same time with the mass update to 5.1.14.

    This is a good thing since I was actively looking into alternatives like Engintron, which is an NGINX reverse proxy in front of Apache. It works well, but LiteSpeed's PHP is a lot faster than what Apache has.

    Francisco

    Thanked by 1Foul
  • Francisco Top Host, Host Rep, Veteran
    edited April 2017

    @lukehebb said:
    It did come back, but now MySQL is dead?

    Should be fine again, I gave it a kick to be safe and I'll stay up a bit longer to make sure it doesn't flip out. We've seen some strange things like this:

    root@lu-shared02 [/usr/sbin]# ps -ef | grep mysql
    mysql 134961 1 1 14:33 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe
    mysql 135093 1 1 14:33 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
    mysql 135921 134961 53 14:33 ? 00:00:02 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/lib/mysql/lu-shared02.cpanelplatform.com.err --open-files-limit=50000 --pid-file=/var/lib/mysql/lu-shared02.cpanelplatform.com.pid
    mysql 136004 135093 9 14:33 ? 00:00:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/lib/mysql/lu-shared02.cpanelplatform.com.err --open-files-limit=50000 --pid-file=/var/lib/mysql/lu-shared02.cpanelplatform.com.pid
    root 136366 17821 0 14:33 pts/0 00:00:00 grep --color=auto mysql

    Where 2 copies of MySQL are run and we end up having file locking issues as they both try to access InnoDB. It doesn't cause any corruption but it sure makes a mess of things. I've read through the init scripts to see where it's making it do that, but haven't spotted anything fishy yet.

    Anyway, I always just kill the basedir=/usr one and its sub mysqld process and it's good for months.

    Francisco

    Thanked by 2jetchirag lukehebb
This discussion has been closed.
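
For anyone who wants to catch the double-MySQL situation from the last post before it causes lock contention, here is a rough sketch of a check. It is not Francisco's tooling: the function names (`parse_ps_ef`, `runs`, `mysql_supervisors`) are made up for illustration, and the sample lines are abbreviated from the `ps -ef` output quoted above. It assumes the usual `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD) and only reports what it finds; it does not kill anything.

```python
import os
from dataclasses import dataclass

# Abbreviated from the ps output quoted in the thread (long mysqld flags trimmed).
SAMPLE = """
mysql 134961 1 1 14:33 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe
mysql 135093 1 1 14:33 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
mysql 135921 134961 53 14:33 ? 00:00:02 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql
mysql 136004 135093 9 14:33 ? 00:00:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql
root 136366 17821 0 14:33 pts/0 00:00:00 grep --color=auto mysql
"""


@dataclass
class Proc:
    user: str
    pid: int
    ppid: int
    cmd: str


def parse_ps_ef(text):
    """Parse `ps -ef` style lines (UID PID PPID C STIME TTY TIME CMD)."""
    procs = []
    for line in text.splitlines():
        parts = line.split(None, 7)
        # Skip blank lines, the header row, and anything malformed.
        if len(parts) < 8 or not parts[1].isdigit() or not parts[2].isdigit():
            continue
        procs.append(Proc(parts[0], int(parts[1]), int(parts[2]), parts[7]))
    return procs


def runs(cmd, name):
    """True if `name` is the program being run.

    mysqld_safe is a shell script, so it shows up as the second token
    after /bin/sh; checking only the first two tokens avoids matching
    things like `grep mysqld_safe`.
    """
    return any(os.path.basename(tok) == name for tok in cmd.split()[:2])


def mysql_supervisors(procs):
    """Map each mysqld_safe PID to the PIDs of the mysqld children it started."""
    supervisors = {p.pid: [] for p in procs if runs(p.cmd, "mysqld_safe")}
    for p in procs:
        if p.ppid in supervisors and runs(p.cmd, "mysqld"):
            supervisors[p.ppid].append(p.pid)
    return supervisors


if __name__ == "__main__":
    found = mysql_supervisors(parse_ps_ef(SAMPLE))
    if len(found) > 1:
        print("More than one mysqld_safe is running:")
    for safe_pid, children in found.items():
        print(f"  mysqld_safe {safe_pid} -> mysqld {children}")
```

On a live box the same functions could be fed the text of `ps -ef` instead of `SAMPLE`. Which of the two supervisors to stop is a judgment call the script leaves to the admin.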