Trying to restore an old static website: it will not accept Norwegian characters like ÆØÅ

myhken Member
edited March 2017 in General

I have an old website, almost 20 years old, that I used to share with my friends: lots of fun stuff, pictures, etc. from when we were young.
I'm trying to restore it and upload the files to my current web server. The site was built with Microsoft FrontPage 6.0 and consists of static .html pages.

On my home computer, all the pages display correctly. It's a Norwegian site, so the pages contain a lot of æ, ø and å characters. I first tried uploading the files with FileZilla over FTP, but as soon as the pages are on the server, every æ, ø and å is changed to �.
So my next try was to zip the site, upload the archive and then extract it directly on the server.
But every æ, ø and å was changed to � then as well.

If I edit the pages, I can replace each � with æ, ø or å, save the page, and then it shows correctly. But of course it would be a pain in the ass to replace them all manually; there are lots of pages.

Why does my server convert all Norwegian characters like æ ø å to �?
Is there any way I can stop it doing that?
And why does it work when I edit the pages, put in æ ø å and save them? The characters stay saved, so my server can clearly handle æ ø å and show them when you access the pages.

Comments

  • joepie91 Member, Patron Provider
    edited March 2017

    The problem is likely that your browser is trying to display it in the wrong encoding. You should either have the HTTPd send a Content-Type header that specifies the encoding, or use a meta tag with a charset attribute.

    Note that if it's 20 years old, it might be using a Norwegian codepage rather than UTF-8. You'll have to figure out the right encoding (or, preferably, convert it to UTF-8).

    EDIT: To be clear, this is not a server issue. The server just sends bytes and lets the browser worry about turning them into text.
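    A minimal Python sketch of that point. The sample word "blåbær" is only an illustration, not taken from myhken's pages: the same Windows-1252 bytes come out as � when decoded as UTF-8 and as the right letters when decoded as Windows-1252.

```python
# Bytes as a Windows-1252 editor (e.g. FrontPage) would store "blåbær"
raw = "blåbær".encode("cp1252")

# Told the page is UTF-8, the decoder cannot use 0xE5 or 0xE6 here, so each
# becomes U+FFFD, the replacement character that browsers draw as "�"
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the file was really written in restores the text
as_cp1252 = raw.decode("cp1252")

print(raw)        # b'bl\xe5b\xe6r'
print(as_utf8)    # bl�b�r
print(as_cp1252)  # blåbær
```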

  • angstrom Moderator

    As @joepie91 said, it's most likely a client-side encoding issue. My guess is that the text encoding of the HTML files is CP-1252 (the legacy Windows encoding for Western European languages), which is a superset of ISO 8859-1 (the latter a.k.a. latin1).

  • joepie91 Member, Patron Provider

    @angstrom said:
    As @joepie91 said, it's most likely a client-side encoding issue. My guess is that the text encoding of the HTML files is CP-1252 (the legacy Windows encoding for Western European languages), which is a superset of ISO 8859-1 (the latter a.k.a. latin1).

    Hmm. Does 1252 include all Norwegian characters? I thought you needed a separate codepage for that.

  • myhken Member

    @joepie91: OK. I use the Filemanager in Virtualmin to edit the pages, and it also converts every æ ø å to �. But I just tried downloading a page with FileZilla, and you are correct: the æ ø å characters are not changed in the files on the server. They only appear changed when I edit the pages in the Virtualmin Filemanager and when the pages are shown in the browser.

    But it's so strange that in Filemanager I can change � to æ ø å, save the file, open it again, and then it shows æ ø å correctly, and the file also displays correctly in the browser.

    Here is the file in the Virtualmin Filemanager

    Here is the file on my computer

    Here I have edited part of the file in Filemanager and saved it. The changed part stays changed.

    And here is the changed part in a browser: the title was not changed and still contains �, but the changed part shows correctly.

  • angstrom Moderator

    @joepie91 said:

    @angstrom said:
    As @joepie91 said, it's most likely a client-side encoding issue. My guess is that the text encoding of the HTML files is CP-1252 (the legacy Windows encoding for Western European languages), which is a superset of ISO 8859-1 (the latter a.k.a. latin1).

    Hmm. Does 1252 include all Norwegian characters? I thought you needed a separate codepage for that.

    Yes, pretty sure. In fact, I think that ISO 8859-1 (latin1) also includes all Norwegian characters. I guessed CP-1252 because of the MS software used to produce the files.
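    As a quick check of angstrom's claim, assuming Python's built-in `cp1252` and `latin-1` codecs mirror the two encodings: the Norwegian letters map to identical single bytes in both, and the two only disagree in the 0x80-0x9F range, where Windows-1252 has printable characters such as curly quotes.

```python
norwegian = "æøåÆØÅ"

# Both legacy encodings store each Norwegian letter as the same single byte
cp1252_bytes = norwegian.encode("cp1252")
latin1_bytes = norwegian.encode("latin-1")
print(cp1252_bytes == latin1_bytes)  # True
print(cp1252_bytes.hex(" "))         # e6 f8 e5 c6 d8 c5

# They differ only in 0x80-0x9F: Windows-1252 puts printable characters there
# (curly quotes, the euro sign), while ISO 8859-1 has invisible C1 control codes
smart_quote = b"\x93"
print(smart_quote.decode("cp1252"))         # “
print(repr(smart_quote.decode("latin-1")))  # '\x93'
```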

  • myhken Member

    angstrom said: My guess is that the text encoding of the HTML files is CP-1252

    Yes, you are correct. The encoding is set to charset=windows-1252. Can I just change the charset to another one, and what would the correct value be?

  • angstrom Moderator

    @myhken said:

    angstrom said: My guess is that the text encoding of the HTML files is CP-1252

    Yes, you are correct. The encoding is set to charset=windows-1252. Can I just change the charset to another one, and what would the correct value be?

    Well, if the text encoding is 1252, then you don't want to change that unless you change the text encoding first.

    What puzzles me is why a modern browser wouldn't understand the encoding, given that statement. Which browser(s) have you tried?

  • myhken Member

    angstrom said: Which browser(s) have you tried?

    I have tried Firefox 52, Chrome and Edge. Same issue in all of them.
    But if I edit the files in Virtualmin, replacing every � with æ ø å, then they show correctly in all browsers. If I don't edit the files, they only show �.
    So it's a super strange issue.

  • angstrom Moderator

    @myhken: It's still unclear to me why you say that the text encoding works locally but not at a distance.

    As an experiment, you could try: charset=ISO-8859-1

  • angstrom Moderator

    @myhken said:

    angstrom said: Which browser(s) have you tried?

    I have tried Firefox 52, Chrome and Edge. Same issue in all of them.
    But if I edit the files in Virtualmin, replacing every � with æ ø å, then they show correctly in all browsers. If I don't edit the files, they only show �.
    So it's a super strange issue.

    Well, modern browsers assume the text encoding UTF-8 by default. Your files are encoded as Windows-1252, which is incompatible with UTF-8.

    For some reason, modern browsers assume that your files are encoded as UTF-8, despite the charset statement. This is what puzzles me.

    When you edit the files in virtualmin, the files are saved as UTF-8, so this is why modern browsers interpret them correctly.
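    A sketch of why re-saving fixes a page, assuming (as angstrom describes) that the Virtualmin editor writes UTF-8: each of æ ø å becomes two bytes in UTF-8 but is a single byte in the original Windows-1252 file, so the re-saved bytes are different even though the text looks the same.

```python
letters = "æøå"

# What a UTF-8 editor writes when the page is re-saved: two bytes per letter
utf8 = letters.encode("utf-8")
# What FrontPage wrote originally in Windows-1252: one byte per letter
cp1252 = letters.encode("cp1252")

print(utf8.hex(" "))    # c3 a6 c3 b8 c3 a5
print(cp1252.hex(" "))  # e6 f8 e5

# Once re-saved, the bytes are valid UTF-8 and decode cleanly
print(utf8.decode("utf-8"))  # æøå
```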

  • myhken Member
    edited March 2017

    @angstrom

    Yes, it's very strange. See here: the first picture is the index.htm file opened locally, and the next image is the same file opened on my server.

  • angstrom Moderator

    @myhken: If you send me a link to this file in PM, I can take a look.

  • myhken Member

    @angstrom PM sent. Thank you for your time.

  • angstrom Moderator

    @myhken said:
    @angstrom PM sent. Thank you for your time.

    Received. Am looking at the file ...

  • myhken Member

    After some help from @angstrom, we found out that it has to be a server issue.
    I'm running CentOS 6.8 with Virtualmin on my servers, and the issue happens on all of them. We confirmed with md5sum that the file does not change between my local computer and my web server when it's uploaded.
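    The md5sum check can also be sketched in Python. The function names below are hypothetical, and matching digests only show that the bytes are identical on both machines, not which encoding those bytes are in.

```python
import hashlib
from pathlib import Path


def md5_hex(data: bytes) -> str:
    """MD5 digest of an in-memory byte string, as hex (same format as md5sum)."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """MD5 digest of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(local: Path, downloaded: Path) -> bool:
    """True if the local copy and the copy fetched back from the server match."""
    return md5_of_file(local) == md5_of_file(downloaded)
```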

    But the same file worked fine on his server.

    I then tried uploading the site to a Plesk server I have, and there it works fine, showing æ ø å correctly.

    I have no idea why it won't work on my CentOS 6.8 servers with Virtualmin.

  • Darwin

    You can use iconv on Linux to convert from 1252 to utf-8.

    Notepad++ on Windows can convert these files to utf-8.
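    For the bulk-conversion route, here is a hedged Python sketch that does per file roughly what `iconv -f WINDOWS-1252 -t UTF-8` does, and also rewrites the page's own `charset=windows-1252` declaration so it no longer contradicts the new bytes. The function names, the `.htm`/`.html` filter, and the "skip files that already decode as UTF-8" rule are assumptions; run it on a copy of the site first.

```python
from __future__ import annotations

import re
from pathlib import Path

# The declaration FrontPage writes, matched case-insensitively
CHARSET_RE = re.compile(r"charset=windows-1252", re.IGNORECASE)


def cp1252_to_utf8(raw: bytes) -> bytes | None:
    """Return UTF-8 bytes for a Windows-1252 page, or None if it is already UTF-8."""
    try:
        raw.decode("utf-8")
        return None  # already valid UTF-8 (pure-ASCII pages land here too)
    except UnicodeDecodeError:
        pass
    # Raises UnicodeDecodeError on the few bytes Windows-1252 leaves undefined
    text = raw.decode("cp1252")
    # Keep the page's own declaration in line with its new encoding
    text = CHARSET_RE.sub("charset=utf-8", text)
    return text.encode("utf-8")


def convert_tree(root: Path, suffixes: tuple[str, ...] = (".htm", ".html")) -> list[Path]:
    """Rewrite matching files under root in place; return the paths that changed."""
    changed = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            new = cp1252_to_utf8(path.read_bytes())
            if new is not None:
                path.write_bytes(new)
                changed.append(path)
    return changed
```

    Reading and writing raw bytes keeps FrontPage's CRLF line endings untouched. The UTF-8 check is a heuristic: a Windows-1252 file could in principle also be valid UTF-8, though that is unlikely for Norwegian text.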

  • angstrom Moderator

    Darwin said: You can use iconv on Linux to convert from 1252 to utf-8.

    Notepad++ on Windows can convert these files to utf-8.

    True, but since @myhken has quite a few files and because they are displayed correctly when viewed locally, the puzzle was why they aren't displayed correctly when they are on his server.

    Descriptively, what seems to happen is that his web server adds a Content-Type header with charset=UTF-8 when serving the HTML files, which overrides (for the browser) the statement charset=windows-1252 in the HTML files.
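    A simplified model of the HTML standard's encoding-sniffing order shows why the header wins: a byte-order mark first, then the charset from the HTTP Content-Type header, and only then a meta charset found near the top of the page. The function below is a hypothetical sketch, not how any particular browser is written; the real meta prescan is more involved, and the final fallback here is an assumption (browsers pick it by locale).

```python
from __future__ import annotations

import re
from email.message import Message

# Loose match for charset=... inside a <meta> tag (simplified, not the spec's prescan)
META_CHARSET = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def effective_charset(content_type: str | None, body: bytes) -> str:
    # 1. A UTF-8 byte-order mark wins over everything else
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # 2. A charset in the HTTP Content-Type header beats any <meta> tag
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # returned in lower case
        if charset:
            return charset
    # 3. Otherwise look for a <meta> charset in the first 1024 bytes
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Assumed fallback; real browsers choose one based on locale
    return "windows-1252"
```

    On a live server, `curl -I` on one of the pages shows which Content-Type header is actually being sent.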

  • WSS Member
    edited March 2017

    Just setting a different code page isn't the brightest idea. Anything beyond 7-bit ASCII (code points above 127) should be written as character entities, like &auml;
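    WSS's suggestion can be automated with Python's standard `xmlcharrefreplace` error handler, which turns every non-ASCII character into a numeric character reference; `&#229;` displays as å no matter which charset the browser assumes. The helper name is hypothetical, the input must first be decoded with the file's real encoding (Windows-1252 here), and named entities such as `&aring;`, `&aelig;` and `&oslash;` would work just as well.

```python
def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # "xmlcharrefreplace" is a built-in codec error handler: each character the
    # ASCII codec cannot encode is written as &#NNN; using its Unicode code point
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_html("Blåbær og Øl"))  # Bl&#229;b&#230;r og &#216;l
```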

  • myhken Member

    The issue has to be with Virtualmin, since my Plesk server is also running CentOS.
    The files work 100% on Plesk, but not on Virtualmin. I haven't done anything to the files on Plesk; I just uploaded my .zip file and extracted the files on the server.

  • WSS Member

    @myhken said:
    The issue has to be with Virtualmin, since my Plesk server is also running CentOS.
    The files work 100% on Plesk, but not on Virtualmin. I haven't done anything to the files on Plesk; I just uploaded my .zip file and extracted the files on the server.

    Did you check the character codes for whatever font is being used? Since it's being fed an actual 8-bit character and it thinks it's UTF-8, you'll get whatever the font+charset has. This isn't uncommon; it's why it's so easy to spot people who use Word to edit files (the strange quotes). If changing the character to the representation above (&auml;) fixes it, you've directly identified the issue.

  • angstrom Moderator

    @WSS said:
    Just setting a different code page isn't the brightest idea. Anything beyond 7-bit ASCII (code points above 127) should be written as character entities, like &auml;

    Well, I agree that this would be the safest strategy.

    At the same time, for anyone whose language has accented characters, it's not necessarily the most convenient strategy.

    The various text encodings are part of the HTML standard, so there's every reason to make use of them if one finds them convenient.

    (@myhken originally used MS FrontPage to produce the HTML files, but this is a detail.)

  • William

    I offer another possibility:

    Your FTP client borks the files on upload, as they are text.

    Upload the files in a zip or via the file manager/panel, then try again.

  • WSS Member

    @William said:
    Upload file in zip or via manager/panel, then try again.

    He already tried that.

  • angstrom Moderator

    @WSS, @William: Although I'll let @myhken speak for himself, you're a bit late to the rescue operation. :-)

    The text encoding of his HTML files is indeed CP-1252, and there was no corruption on upload. I put his files on my server, we checked MD5 sums, etc. The problem concerns how his (one) web server serves the HTML files; this is the conclusion.

  • WSS Member

    Have him remove AddDefaultCharset from the httpd config. That still doesn't mean that his HTML isn't wrong in 2017.

  • myhken Member

    WSS said: Have him remove AddDefaultCharset from the httpd config.

    It's not in my httpd.conf file.

    WSS said: That still doesn't mean that his HTML isn't wrong in 2017.

    Still, does that explain why the same files work fine on a CentOS 6.8 server with Plesk, but not on a CentOS 6.8 server with Virtualmin?
    And it's not just one server: I have tried it on all three of my CentOS 6.8 servers with Virtualmin. Same issue on every Virtualmin server.

    William said: Upload file in zip or via manager/panel, then try again.

    Like @angstrom said, I have tried that: uploading the files/folders with FileZilla, uploading the site as a .zip file and extracting it from the terminal with unzip xxx.zip, and uploading both the .zip file and the individual files with the Virtualmin Filemanager.
    All three ways fail on the Virtualmin servers.
    But uploading and extracting the files on my Plesk server works 100%.

  • WSS Member

    @myhken said:
    Still, does that explain why the same files work fine on a CentOS 6.8 server with Plesk, but not on a CentOS 6.8 server with Virtualmin?

    You're completely correct, because Virtualmin and Plesk are exactly the same product and do everything in exactly the same way.

    Try this in your .htaccess:
    IndexOptions +Charset=WINDOWS-1252

    Obviously you'll need to support AuthConfig for that to work, or set it as DefaultCharset for your whole subdomain.

  • angstrom Moderator

    @WSS said:

    @myhken said:
    Still, does that explain why the same files work fine on a CentOS 6.8 server with Plesk, but not on a CentOS 6.8 server with Virtualmin?

    You're completely correct, because Virtualmin and Plesk are exactly the same product and do everything in exactly the same way.

    Try this in your .htaccess:
    IndexOptions +Charset=WINDOWS-1252

    Obviously you'll need to support AuthConfig for that to work, or set it as DefaultCharset for your whole subdomain.

    @WSS: You're right that a .htaccess file (added by Virtualmin?) may be the culprit, but I would first suggest looking at what it contains (if there is such a file): it may be sufficient to remove any UTF-8 stipulation.

  • myhken Member

    WSS said: Try this in your .htaccess

    I have checked my server now, and none of my static domains have .htaccess files, but my WordPress domains do.
    But if such a file had something to do with this, why does the file show as it should if I edit it in Filemanager, add æ ø å, and then save it?

    I have also tried just copying all the text of the index.htm file from Notepad into my Filemanager and saving the file, and then it shows the correct language.

    The file content is 100% the same as the file I uploaded, but it has been saved from Filemanager.

    But there are a lot of pages on this site, so it would take some time to do that for each page.

    But I do not want to waste anybody else's time on this strange issue. The site works fine on my Plesk server, so I'll just host it there.
    I don't have any more sites from 1999 that I want to put online again :D :D
