Quantcast
Channel: Recent Questions - Stack Overflow
Viewing all articles
Browse latest Browse all 16030

Perl ord and chr working with unicode

$
0
0

To my horror I've just found out that chr doesn't work with Unicode, although it does something. The man page is all but clear

Returns the character represented by that NUMBER in the character set. For example, chr(65)" is "A" in either ASCII or Unicode, and chr(0x263a) is a Unicode smiley face.

Indeed I can print a smiley using

perl -e 'print chr(0x263a)'

but things like chr(0x00C0) do not work. I see that my perl v5.10.1 is a bit ancient, but when I paste various strange letters in the source code, everything's fine.

I've tried funny things like use utf8 and use encoding 'utf8', I haven't tried funny things like use v5.12 and use feature 'unicode_strings' as they don't work with my version, I was fooling around with Encode::decode to find out that I need no decoding as I have no byte array to decode. I've read much more documentation than ever before, and found quite a few interesting things but nothing helpful. It looks like a sort of the Unicode Bug but there's no usable solution given. Moreover I don't care about the whole string semantics, all I need is a trivial function.

So how can I convert a number into a string consisting of the single character corresponding with it, so that for example real_chr(0xC0) eq 'À' holds?


The first answer I've got explains quite everything about IO, but I still don't understand why

#!/usr/bin/perl -wuse strict;use utf8;use encoding 'utf8';print chr(0x00C0) eq 'À' ? 'eq1' : 'ne1', " - ", chr(0x263a) eq '☺' ? 'eq1' : 'ne1', "\n";print 'À' =~ /\w/ ? "match1" : "no_match1", " - ", chr(0x00C0) =~ /\w/ ? "match2" : "no_match2", "\n";

prints

ne1 - eq1match1 - no_match2

It means that the manually entered 'À' differs from chr(0x00C0). Moreover, the former is a word constituent character (correct!) while the latter is not (but should be!).


Viewing all articles
Browse latest Browse all 16030

Latest Images

Trending Articles



Latest Images

<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>