Unstable HP ProLiant server, advice needed

So I am running an HP ProLiant server as my home server and workstation, running Windows Server 2019 and a few VMs. After a reboot it performs as it should, but after around 48 hours my remote desktop gets sluggish and laggy. Cinebench still shows no performance drop.

I used to have 384 GB of RAM in this thing, but recently sold 256 GB of it. Right now the specs are:

-2x Intel Xeon E5-2690 v2
-128 GB DDR3 ECC 1666 memory
-2x SSDs, no RAID on them, disks are under 500 hours old

Any advice on where I should start? I know you will scream OH NO WINDOOOOWS, and I know... But because I need a workstation, it works fine. Would setting it up on VMware and running a Windows VM as the workstation change the stability of the server overall? Where to start?

Comments

  • DP Administrator, The Domain Guy

    Have you checked what’s happening/running in the background when it starts to get sluggish and laggy?

  • @thedp said:
    Have you checked what’s happening/running in the background when it starts to get sluggish and laggy?

    I have: CPU usage 0-3%, no disk I/O, RAM under 10%.

  • DP Administrator, The Domain Guy

    @Barnesanger said:

    @thedp said:
    Have you checked what’s happening/running in the background when it starts to get sluggish and laggy?

    I have: CPU usage 0-3%, no disk I/O, RAM under 10%.

    Weird.

    How are you connected to it?

  • If it's connected to the public internet then it's most likely RDP brute force.
    Try changing the RDP port to something else!
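
Before changing ports, it may be worth confirming whether RDP is reachable from outside at all. The snippet below is only a minimal sketch, assuming you can run Python from a machine outside your LAN; `port_is_open` is a hypothetical helper name and the address in the usage comment is a placeholder.

```python
import socket


def port_is_open(host: str, port: int = 3389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Usage (203.0.113.10 is a placeholder, replace it with your public IP):
# print(port_is_open("203.0.113.10"))
```

A `False` result from outside only means that no TCP connection could be made to that port. It says nothing about other ports or services.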

  • team_traitor Member
    edited October 2020

    Our admin (from my previous job), who manages Windows servers, reboots the machine daily (a scheduled reboot when no one is accessing it). He has this problem with all the memory being consumed.

  • I am connected to the machine through local 10 Gbit Ethernet. The server's port 3389 is closed in my router so

  • Tired of this, it crashed again tonight while I was sleeping. I will move everything over to VMware, run my Windows as a VM and see if it improves. Could be a faulty memory module or something too, we will see.

  • JSun Member
    edited October 2020

    @Barnesanger said:
    I am connected to the machine through local 10 Gbit Ethernet. The server's port 3389 is closed in my router so

    I can't replicate this on an older ML350G6 - 64GB, Dual CPU 24 threads, SSDs x 2 480GB, Bluetooth dongle, 3D Audio card, Nvidia silent card, multi-screen Win 10 Pro, ILO2 working, Gigabit twin NICs etc. I dual boot with Ubuntu 20.04 as well - no issues - never powered off for more than a year.

  • @JSun said:

    @Barnesanger said:
    I am connected to the machine through local 10 Gbit Ethernet. The server's port 3389 is closed in my router so

    I can't replicate this on an older ML350G6 - 64GB, Dual CPU 24 threads, SSDs x 2 480GB, Bluetooth dongle, 3D Audio card, Nvidia silent card 1GB, multi-screen Win 10 Pro, ILO2 working, Gigabit twin NICs etc. I dual boot with Ubuntu 20.04 as well - no issues - never powered off for more than a year.

    Just noticed that only one of my CPUs shows up right now. Will investigate later today, gotta work first...
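
A quick way to confirm from inside the OS whether a socket has gone missing is to compare the logical CPU count with what the hardware should provide. This is a rough sketch, assuming two E5-2690 v2 CPUs (10 cores / 20 threads each) with Hyper-Threading enabled; the function names are made up for illustration, and `os.cpu_count()` only shows what the OS reports to Python.

```python
import os
from typing import Optional

# Assumption: 2 sockets of Xeon E5-2690 v2 (10 cores / 20 threads each)
# with Hyper-Threading enabled, so 40 logical CPUs are expected.
EXPECTED_LOGICAL_CPUS = 2 * 20


def missing_logical_cpus(expected: int, detected: Optional[int]) -> int:
    """Return how many logical CPUs are missing compared to `expected`."""
    if detected is None:
        raise RuntimeError("the OS did not report a CPU count")
    return max(expected - detected, 0)


def report() -> str:
    """Describe whether all expected logical CPUs are visible to the OS."""
    detected = os.cpu_count()
    missing = missing_logical_cpus(EXPECTED_LOGICAL_CPUS, detected)
    if missing:
        return f"{detected} logical CPUs visible, {missing} missing"
    return f"all {detected} logical CPUs visible"


# Usage: print(report())
```

If exactly half of the expected count is missing, that would be consistent with one socket not being detected. A BIOS/iLO check is still needed to confirm it.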

  • The host still gets sluggish over time. After 48 hours remote desktop is useless. Any advice?

  • Are you able to run a basic diagnostic tool?

    If you can boot from USB or CD-ROM, why not check your RAM with memtest86+ first? I would let it run in a loop maybe 2 or 3 times.

    If that's a pass, check the CPU with a simple stability test, maybe prime95 or similar. If it can run for 24 hours plus with no errors, it's probably not that.

    Another thing that can cause problems is an iffy PSU.

    Not an exhaustive list of things to check, but stability is really important.

  • My guess is you are running an HP ProLiant DL360 G8.
    With a DL380 you'd probably have a RAID controller.
    If you do run the drives behind a RAID controller, you may want to test the I/O performance.
    Consumer-grade drives don't get TRIM behind the HPE RAID controllers of this generation,
    meaning the performance will degrade over time (how much depends a bit on the SSD's model/health).

    The G8 series doesn't support Server 2019, at least not officially.
    Now that doesn't mean it shouldn't run just fine, but it's something to keep in mind.

    A thing I ran into with the DL320 G8 and Server 2019 was incorrect power readings, causing the CPU to be stuck at the lowest performance state supported by Intel SpeedStep (EIST).
    A fix for this was to set the power profile to "OS control".
    Not sure if it applies here, but certainly worth a try.

    The second thing you may want to do is install the latest firmware and drivers.
    You can do both at the same time by running the Service Pack for ProLiant (SPP) in the OS (don't boot from the ISO, just mount it in Windows and run the .bat).

    On a last note, the second CPU not being seen is a bit iffy. You can easily run a diagnostic from within Intelligent Provisioning (F10 during boot, Insight Diagnostics).
    You may want to update Intelligent Provisioning before that.

    Hope this helps you on your way a bit :)
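
To see whether the machine really does get slower over time (for example if the CPU ends up stuck in a low power state as described above), one option is to time the same small workload at intervals and log the result. This is just a sketch under assumptions: the function names, the CSV file name and the 2x threshold are made up for illustration, and a single-threaded Python loop is only a rough proxy for clock speed.

```python
import csv
import time
from datetime import datetime


def time_fixed_workload(iterations: int = 2_000_000) -> float:
    """Time a fixed CPU-bound loop and return the elapsed seconds."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    return time.perf_counter() - start


def is_slowdown(baseline: float, sample: float, factor: float = 2.0) -> bool:
    """Return True if `sample` took more than `factor` times the baseline."""
    return sample > baseline * factor


def log_samples(path: str, count: int, interval_s: float) -> None:
    """Append `count` timestamped timings to a CSV file, one per interval."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for _ in range(count):
            elapsed = time_fixed_workload()
            writer.writerow([datetime.now().isoformat(timespec="seconds"),
                             f"{elapsed:.4f}"])
            f.flush()  # keep the log usable even if the host hangs later
            time.sleep(interval_s)


# Usage: one sample every 10 minutes for roughly 48 hours
# (cpu_timing.csv is a hypothetical file name):
# log_samples("cpu_timing.csv", count=288, interval_s=600)
```

Comparing the first few rows after a reboot with the rows from around the time things get sluggish should show whether raw CPU speed is dropping or whether the slowness comes from somewhere else.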

  • I have been testing the RAM for 24 hours now. It went fine, no errors. I have also been testing I/O for some time and cannot see any slowdown there. The CPUs have also been running at 100% for 12 hours just to test, and that seems fine too.

    The issue happens in VMware as well. I tried running it off a USB drive key, a USB SSD, and the RAID controller with an SSD dedicated to VMware. It still happens after about 24 hours.

    I just pulled ALL RAM modules and borrowed a few 16 GB sticks from a friend, so now the server is running with 64 GB of RAM under Windows Server 2019 again. Will test this for a while and see.

    The next step, I guess, is to replace the whole thing. I am also thinking of trying to run only one CPU for a while, and of swapping the PSU (I only run one at the moment).

    Out of ideas. Time will tell.

  • HostSlick Member, Patron Provider
    edited November 2020

    A customer's HP Gen8 server (BL460c Gen8) recently became slow and hung at random times due to a failing RAID controller (P220i). No errors were shown at first either. iLO 4 showed the controller as "good".

    It sometimes already hung when logging in over SSH and typing in the user root.

    Maybe you can try replacing the controller as well.

  • If only the RDP session is slow and background sessions inside the VM work fine, change the VM network driver or VM network type. I have already encountered this on VMware.

  • @keliledemne said:
    If only the RDP session is slow and background sessions inside the VM work fine, change the VM network driver or VM network type. I have already encountered this on VMware.

    Everything gets slow, but not totally slow, if that makes sense.
    Going to try bypassing the controller and see if that makes a difference. There is a SATA port on the motherboard, so I'll try plugging my SSD into that. Just need to find somewhere to pull power for it...

  • Just found the issue. On one of the CPUs I'm missing the plastic tray the CPU is supposed to sit in, which probably results in too little mounting pressure. Testing more now to see if that fixes the issue.
