Hard disk failure or corrupted software/file-system or what else?

sidahmed Member
edited October 2011 in General

I know this is supposed to be asked somewhere else, but I feel it belongs here, and I believe we have many expert members, so sorry for any inconvenience.


Short story:

  • I can't log in using PuTTY.
  • Sometimes I can log in using WinSCP, sometimes I can't.
  • I can SSH to the server from other servers on the LAN.
  • When I SSH in from other servers, I can't change my password, and ls and top don't work (passwd reports an authentication failure).

Long story:
I was running some experiments on a cluster of four dedicated servers. These servers are only accessible from a master node (the only server connected to the internet). The home directory is shared. I usually keep two programs connected at the same time: WinSCP, because I need to upload files frequently, and PuTTY, to run my experiments. So WinSCP is always connected to the master, and I use PuTTY to SSH to the cluster after logging in to the master.

Yesterday, while I was logged in to the cluster, I noticed that WinSCP couldn't connect to the master server (authentication failed). I tried to SSH to the master again, and authentication failed there too. Since I was already logged in to the cluster, I tried to SSH to the master from there, and it SUCCEEDED! I thought I had been hacked, so I tried to change my password, but I couldn't.
After some 10-15 minutes WinSCP connected again. Still no luck with PuTTY.
top and ls stopped working as well.

Again, I am sorry for writing this much, but I really don't know what information could be important.

Unfortunately I can't access the cluster any more, and I can't ask the system administrator for any information or give advice of any kind. All I know is that they tried to reboot the master server, but the server hung.

I would like to learn: what do you think most likely happened?

Your thoughts are much appreciated.

Comments

  • JustinBoshoff Member
    edited October 2011

    Hi sidahmed

    It sounds like a storage subsystem failure (disk or controller).
    When you say it hangs on boot, does it hang at POST (BIOS)?

  • Thanks JustinBoshoff. Unfortunately, I don't know at which stage it hangs.

    With the kind of failure you think it may be, would it still be possible to copy some files using WinSCP, as I did?

  • JustinBoshoff Member
    edited October 2011

    Hi sidahmed

    It is very hard to tell what is wrong remotely with such limited info.
    Were you involved when the box was built?
    Do you know how the disks were configured?

    It could be that the controller is dying or that the RAID set has gone faulty.
    Sometimes a failing controller reads intermittently: it works, then fails, then works, then fails again.
    A RAID 5 array might act like that if, say, one disk dropped out and maintenance did not pick it up, and then a second disk started to fail as the load got very high, with the remaining disks working twice as hard to compensate for the lost disk. (This normally looks like a single disk with bad sectors: the box tries to access data, the disk goes clank clank, the box locks up for a couple of seconds, and then it responds with the data or an error.)
    If that happens, some data will be accessible and some will not, because the RAID set has some data on disks that are still intact and some that must be rebuilt from the parity (redundancy) stripe, but as the second disk starts failing, the entire set starts to go.
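
To illustrate the RAID 5 point, here is a minimal Python sketch of XOR parity on a single stripe. The 4-disk layout, the block contents, and the helper name `xor_blocks` are assumptions made up for the example; nothing is known about how the disks in this cluster were actually configured. It shows why one lost disk can be rebuilt from the survivors plus parity, while losing a second disk leaves only a mix of the two missing blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-disk RAID 5 set: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One disk lost (the one holding data[1]): XOR of the surviving data blocks
# and the parity block rebuilds the missing block exactly.
rebuilt = xor_blocks([data[0], data[2], parity])

# Two disks lost (data[1] and data[2]): the same XOR now yields only
# data[1] XOR data[2], so neither block can be recovered on its own.
two_lost = xor_blocks([data[0], parity])
```

Every read that touches the dropped disk needs this rebuild, which pulls in all the remaining disks, and that is roughly why a degraded set works much harder and why a second marginal disk tends to show itself under that load.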

    Sorry I'm rambling on and on.

    It sounds like you have a colo'd (colocated) setup and the remote hands have knocked off for the day?

    You need to get someone in front of the box to get more info.

  • sidahmed Member
    edited October 2011

    No, I don't know how the disks were configured, but I think it is a single disk. And I think what you said is very close to what happened.

    Yes, it is a closed setup. Anyhow, the admin is looking at it, but I can't ask him.

    Thanks JustinBoshoff, this is what I wanted: someone to tell me the most likely scenarios.

  • Cool bud, no problem. Glad to be of service!
    Please rethink the single-disk setup!
    A second disk is not that expensive and will save you a lot of trouble.
