Python Programming Language
How to print this character u'\u20ac' to DOS terminal
Who can explain the following issue?

>>> print u'Δ'
>>> print u'€'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character u'\x80' in position 0: illegal multibyte sequence

Or, if I just use the Unicode escapes:

>>> print u'\u0394'
>>> print u'\u20ac'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character u'\u20ac' in position 0: illegal multibyte sequence

My terminal is cmd.exe under Windows XP. What is the difference between the two characters? What can I do if I want to print u'\u20ac'?
wrote:
> My terminal is cmd.exe under Windows XP.
> What is the difference between the two characters? What can I do if I
> want to print u'\u20ac'?
The problem is that your terminal uses (some form of) the GBK encoding; see http://zh.wikipedia.org/wiki/GBK for details on GBK.

It seems that GBK (or, rather, code page 936) supports the delta character, but not the euro sign.

To change that, you can use "chcp" in your terminal window. For example, if you do "chcp 850", you should be able to display the euro sign (but will simultaneously lose the ability to display the letter delta, and the Chinese characters).

I don't know whether the terminal supports a UTF-8 code page; you can try setting the terminal's code page to 65001 (which should be UTF-8).

Regards,
Martin
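A quick way to see which code pages can represent a given character is simply to try encoding it. The sketch below is written for Python 3 rather than the Python 2.5 used in this thread, and the helper name can_encode is made up for illustration; it only tests Python's own codecs, which may not match the Windows code pages exactly.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


DELTA = "\u0394"  # GREEK CAPITAL LETTER DELTA
EURO = "\u20ac"   # EURO SIGN

# Try a few candidate encodings for the terminal.
for encoding in ("gbk", "cp850", "cp1252", "utf-8"):
    print(encoding,
          "delta:", can_encode(DELTA, encoding),
          "euro:", can_encode(EURO, encoding))
```

If a codec reports False for a character, printing that character to a terminal using that encoding can be expected to raise the same kind of UnicodeEncodeError shown above.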
On May 30, 1:23, "Martin v. Löwis" <mar@v.loewis.de> wrote:
> To change that, you can use "chcp" in your terminal window.
> For example, if you do "chcp 850", you should be able to
> display the euro sign [...]
> I don't know whether the terminal supports an UTF-8 code
> page; you can try setting the terminal's code page to
> 65001 (which should be UTF-8).
Thanks, but it still does not seem to work.

----------------------------------------------------
C:\WINDOWS>chcp 850
Active code page: 850

C:\WINDOWS>python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'\u20ac'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python25\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u20ac' in position 0: character maps to <undefined>

C:\WINDOWS>chcp 65001
Active code page: 65001

C:\WINDOWS>python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'\u20ac'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: cp65001
-----------------------------------------------

I found that the 'mbcs' encoding of u'\u20ac' is 0x80, and I can print that byte directly:

>>> print '\x80'

But the string containing u'\u20ac' comes from a remote host. Is there any way to convert it to the local 'mbcs' encoding?
On May 30, 3:05 pm, <kelvin.@gmail.com> wrote:
> But the string containing u'\u20ac' comes from a remote host. Is
> there any way to convert it to the local 'mbcs' encoding?
Did you forget to call unicode(string) before sending it?
wrote:
> But the string containing u'\u20ac' comes from a remote host. Is
> there any way to convert it to the local 'mbcs' encoding?
remote_string = u'\u20ac'
try:
    local_string = remote_string.encode('mbcs')
except (UnicodeError, LookupError):
    # no mbcs equivalent available (or no mbcs codec on this platform)
    print "encoding error"
else:
    # local_string is now an 8-bit string
    print "result:", local_string
    # if console is not mbcs, you should see incorrect result
    assert local_string == '\x80'

Mbcs is Windows-only, so I couldn't test this.

If your application handles text, it may be easier to just leave everything in Unicode and encode to UTF-8 for storage?

Regards,
Tijs
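The suggestion to keep text as Unicode and encode only at the storage boundary could look roughly like this. It is a minimal sketch in Python 3 (the thread itself uses Python 2.5), and the names to_storage and from_storage are hypothetical helpers, not part of any library.

```python
def to_storage(text):
    """Encode a Unicode string to UTF-8 bytes just before storing it."""
    return text.encode("utf-8")


def from_storage(data):
    """Decode UTF-8 bytes back to a Unicode string right after loading."""
    return data.decode("utf-8")


price = "\u20ac 10"          # stays Unicode everywhere inside the program
stored = to_storage(price)   # bytes only at the boundary
print(stored)                # b'\xe2\x82\xac 10'
assert from_storage(stored) == price
```

UTF-8 can represent every Unicode character, so this round trip does not depend on the local code page.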
On May 30, 9:03, Tijs <tijs_n@artsoftonline.com> wrote:
> remote_string = u'\u20ac'
> try:
>     local_string = remote_string.encode('mbcs')
> [...]
> Mbcs is windows-only so I couldn't test this.
Yes, it works, thank you.

But I suspect this approach will not work on Linux. Maybe I should write some additional code to support both Windows and Linux.
wrote:
> Yes, it works, thank you.
> But I suspect this approach will not work on Linux. Maybe I should write some
> additional code to support both Windows and Linux.
Depends on what you want to do. Printing to a DOS terminal is hard in Linux :-)

If you write server code, best to keep all text in Unicode.

--
Regards,
Tijs
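For code that has to print on both Windows and Linux, one possible approach is to encode for whatever encoding the terminal reports and substitute any character it cannot represent, instead of letting print raise. This is a minimal Python 3 sketch (the thread uses Python 2.5); printable_for and safe_print are made-up names, and falling back to UTF-8 when sys.stdout.encoding is unset is an assumption.

```python
import sys


def printable_for(text, encoding, errors="replace"):
    """Return `text` with characters that `encoding` lacks substituted."""
    # Encoding with errors="replace" turns unencodable characters into "?";
    # decoding again gives a string that is safe to print in that encoding.
    return text.encode(encoding, errors=errors).decode(encoding)


def safe_print(text):
    """Print `text` without raising UnicodeEncodeError on limited terminals."""
    encoding = sys.stdout.encoding or "utf-8"  # assumed fallback
    print(printable_for(text, encoding))


safe_print("price: \u20ac")
```

On a terminal whose encoding has the euro sign the text is printed unchanged; on one that lacks it, such as the 'gbk' case above, the sign is replaced by a question mark. Passing errors="backslashreplace" instead keeps the code point visible as an escape.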