To my horror I've just found out that chr doesn't work with Unicode, although it does something. The man page is anything but clear:
Returns the character represented by that NUMBER in the character set. For example, "chr(65)" is "A" in either ASCII or Unicode, and chr(0x263a) is a Unicode smiley face.
Indeed I can print a smiley using
perl -e 'print chr(0x263a)'
but things like chr(0x00C0) do not work. I see that my Perl v5.10.1 is a bit ancient, but when I paste various strange letters into the source code, everything's fine.
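For reference, here is a minimal sketch of the printing case. It assumes the symptom comes from the output side rather than from chr itself: without an encoding layer on STDOUT, a code point below 0x100 is written as one raw byte, while a larger one is written as UTF-8. The variable names are only illustrative, and a UTF-8 terminal is assumed.

```perl
use strict;
use warnings;

# Encode every character as UTF-8 on output. Without this layer Perl writes
# code points below 0x100 as single raw bytes (0xC0 alone is not valid
# UTF-8), while higher code points are emitted as UTF-8 together with a
# "Wide character" warning under -w.
binmode(STDOUT, ':encoding(UTF-8)');

my $a_grave = chr(0x00C0);   # U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
my $smiley  = chr(0x263a);   # U+263A WHITE SMILING FACE

print "$a_grave $smiley\n";
```

With the layer in place, both characters should show up correctly on a UTF-8 terminal.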
I've tried funny things like use utf8 and use encoding 'utf8'. I haven't tried use v5.12 and use feature 'unicode_strings', as they don't work with my version. I was fooling around with Encode::decode, only to find out that I need no decoding, as I have no byte array to decode. I've read much more documentation than ever before and found quite a few interesting things, but nothing helpful. It looks like a sort of the Unicode Bug, but there's no usable solution given. Moreover, I don't care about the whole string semantics; all I need is a trivial function.
So how can I convert a number into a string consisting of the single character corresponding to it, so that, for example, real_chr(0xC0) eq 'À' holds?
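To make the request concrete, here is one possible sketch of such a function. It is not a confirmed solution: it assumes the encoding pragma is not loaded, and real_chr is just the hypothetical name used above, not a built-in. The idea is to call chr and then use utf8::upgrade to force the internal UTF-8 representation, so that Unicode rules apply to the result.

```perl
use strict;
use warnings;
use utf8;   # the 'À' literal below is read as the character U+00C0

# real_chr is the hypothetical name from the question, not a built-in.
sub real_chr {
    my ($code_point) = @_;
    my $char = chr($code_point);
    utf8::upgrade($char);   # switch to the internal UTF-8 form; same character
    return $char;
}

binmode(STDOUT, ':encoding(UTF-8)');
print real_chr(0xC0) eq 'À' ? "eq" : "ne", "\n";
print real_chr(0xC0) =~ /\w/ ? "match" : "no_match", "\n";
```

The file has to be saved as UTF-8 for the literal to be read correctly. pack('U', $code_point) is a common alternative that also yields a character string.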
The first answer I got explains pretty much everything about IO, but I still don't understand why
#!/usr/bin/perl -w
use strict;
use utf8;
use encoding 'utf8';
print chr(0x00C0) eq 'À' ? 'eq1' : 'ne1', " - ", chr(0x263a) eq '☺' ? 'eq1' : 'ne1', "\n";
print 'À' =~ /\w/ ? "match1" : "no_match1", " - ", chr(0x00C0) =~ /\w/ ? "match2" : "no_match2", "\n";
prints
ne1 - eq1
match1 - no_match2
It means that the manually entered 'À' differs from chr(0x00C0). Moreover, the former is a word constituent character (correct!) while the latter is not (but should be!).
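One way to see how the two strings differ is to look at their code point and at Perl's internal UTF-8 flag. The sketch below is only diagnostic, and describe is a made-up helper name. It deliberately leaves out use encoding 'utf8', on the assumption (not verified here) that this pragma, deprecated since Perl 5.18, is what makes the eq comparison fail.

```perl
use strict;
use warnings;
use utf8;   # source literals are decoded as UTF-8

# Describe a one-character string: its code point and whether Perl
# stores it in the internal UTF-8 form (hypothetical helper).
sub describe {
    my ($char) = @_;
    return sprintf 'U+%04X utf8_flag=%d',
        ord($char), utf8::is_utf8($char) ? 1 : 0;
}

print describe('À'), "\n";           # literal read from UTF-8 source
print describe(chr(0x00C0)), "\n";   # result of chr
```

Without the pragma, both lines should report U+00C0 and differ only in the flag. That would fit the Unicode Bug: the same character gets \w semantics that depend on its internal representation.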