Seriously, debian... >50MB RAM to build locale files?
raindog308
Administrator, Veteran
# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 01:19 ?        00:00:01 init [2]
root      1602     1  0 01:19 ?        00:00:00 /usr/sbin/sshd
root      3371  1602  0 02:09 ?        00:00:00 sshd: root@pts/0
root      3373  3371  0 02:09 pts/0    00:00:00 -bash
root      4011  3373  0 02:14 pts/0    00:00:00 ps -ef
# free -m
             total       used       free     shared    buffers     cached
Mem:            64          9         54          0          0          0
-/+ buffers/cache:          9         54
Swap:            0          0          0
# locale-gen
Generating locales (this might take a while)...
  en_US.UTF-8...memory exhausted
done
Generation complete.
#
Debian 6 x86 on a SecureDragon 32MB/64MB LEB.
Just trying to eliminate those annoying perl messages...
Update: ISO-8859-1 works. Not sure why UTF-8 requires so much more memory to generate.
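A minimal sketch of that workaround, assuming a standard Debian 6 layout (/etc/locale.gen may already contain entries you want to keep, so back it up first):

```shell
# Generate only the single-byte locale instead of the whole set:
echo 'en_US.ISO-8859-1 ISO-8859-1' > /etc/locale.gen
locale-gen

# Or sidestep locale generation entirely and silence perl's
# "Setting locale failed" warnings with the built-in C locale:
export LC_ALL=C
```

Setting LC_ALL=C needs no generated locale data at all, which makes it the cheapest fix on a 64MB box.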
Comments
UTF-16 and UTF-8 offer 16-bit encoded characters, which means 65,536 possible characters. Obviously this needs more memory to store, because suddenly a page with the same number of characters takes up twice the memory.
Yeah, I got that but still...memory exhaustion just to rebuild the locale files?
Well, I guess I'll play with 8-bit for now :-)
Indeed, locales are hell.
Maybe an ultra dumb question, but: what are locales used for, apart from local applications (like GNOME)?
Imagine how much it takes to generate a Chinese GB18030 locale, 18030 characters. :P
I suppose that's not true, because UTF-8 does not use a fixed number of bits; it ranges from 1 to 4 bytes per character.
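You can see the variable width directly by counting the bytes of single characters in any UTF-8 terminal:

```shell
# Each printf emits one character; wc -c counts its bytes in UTF-8.
printf 'A'  | wc -c   # ASCII letter: 1 byte
printf 'é'  | wc -c   # Latin-1 range: 2 bytes
printf '€'  | wc -c   # euro sign: 3 bytes
printf '😀' | wc -c   # emoji outside the BMP: 4 bytes
```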
From : http://stackoverflow.com/questions/464426/why-is-my-query-taking-twice-as-long-when-i-change-to-the-field-to-utf8
-- the latin1_swedish_ci character set is a single octet encoding system, meaning that every character encoded with this system takes up exactly one byte. Contrast this with the utf8_general_ci character set, where each character consists of from one to four octets per character, meaning one to four bytes are necessary to represent each character.
This has the obvious disadvantage that utf8 characters take up more space, more memory, and most importantly, more CPU time to identify. And the most obvious advantage is that utf8 can encode any Unicode character.
Since this question is marked 'query-optimization', you need to ask yourself if you really need to represent the more 'exotic' characters, or if the ones representable in single-octet systems (such as the plain ASCII table) are enough for your needs, since by its nature utf8 will eat more CPU/memory. --
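The single-octet vs. multi-octet point is easy to check on the command line; this sketch assumes iconv (shipped with glibc) is installed:

```shell
# 'é' takes two bytes in UTF-8...
printf 'é' | wc -c                                 # 2
# ...but only one byte once converted to latin1 (ISO-8859-1):
printf 'é' | iconv -f UTF-8 -t ISO-8859-1 | wc -c  # 1
# Characters outside latin1 simply cannot be converted:
printf '€' | iconv -f UTF-8 -t ISO-8859-1 || echo 'not representable'
```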