
Unicode to HTML entities


I was looking for a function to transform a unicode string into
htmlentities. Not only the usual html escaping thing but all
characters.

As I didn't find one, I wrote my own:

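# -*- coding: utf-8 -*-
from htmlentitydefs import codepoint2name

def unicode2htmlentities(u):
    """Replace every non-ASCII character in u with an HTML entity."""
    htmlentities = []

    for c in u:
        if ord(c) < 128:
            # ASCII characters pass through unchanged
            htmlentities.append(c)
        elif ord(c) in codepoint2name:
            # Named entity, e.g. &atilde;
            htmlentities.append('&%s;' % codepoint2name[ord(c)])
        else:
            # No named entity exists: fall back to a numeric reference
            htmlentities.append('&#%d;' % ord(c))

    return ''.join(htmlentities)

print unicode2htmlentities(u'São Paulo')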

Is there a function like that in one of Python's built-in modules? If not,
is there a better way to do it?

Regards, Clodoaldo Pinto Neto
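
The loop above produces named entities such as &atilde;. The following is a minimal sketch of the same idea for Python 3, where `htmlentitydefs` was renamed to `html.entities`. It hooks into `str.encode` through a custom codec error handler. The handler name `htmlentityreplace` and the helper `to_named_entities` are made up for this illustration, and characters without a named entity fall back to numeric references.

```python
import codecs
from html.entities import codepoint2name


def _named_entity_replace(exc):
    """Codec error handler: swap unencodable characters for HTML entities."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        if name:
            parts.append('&%s;' % name)      # named entity, e.g. &atilde;
        else:
            parts.append('&#%d;' % ord(ch))  # numeric fallback
    # Return the replacement text and the position to resume encoding at.
    return ''.join(parts), exc.end


# 'htmlentityreplace' is a hypothetical name chosen for this sketch.
codecs.register_error('htmlentityreplace', _named_entity_replace)


def to_named_entities(text):
    """Return text as an ASCII-only str with entities for everything else."""
    return text.encode('ascii', 'htmlentityreplace').decode('ascii')


print(to_named_entities('São Paulo'))
```

Note that this does not escape `&`, `<` or `>` in the input, so it only covers the non-ASCII part of the problem.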

"Clodoaldo" <clodoaldo.pi@gmail.com> wrote in message

news:1180453921.357081.89500@n15g2000prd.googlegroups.com...

>I was looking for a function to transform a unicode string into
>htmlentities.
>>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')

'S&#227;o Paulo'
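
For readers on Python 3, here is a small sketch of the same call. The one difference to be aware of is that `str.encode` returns `bytes` there, so the result is decoded back to `str`. The helper name `to_char_refs` is an assumption made for this example, not something from the thread.

```python
def to_char_refs(text):
    """Replace non-ASCII characters with numeric character references."""
    # encode() yields ASCII-only bytes; decode() turns them back into str.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


print(to_char_refs('São Paulo'))
```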
On May 29, 12:57 pm, "Richard Brodie" <R.Bro@rl.ac.uk> wrote:

> "Clodoaldo" <clodoaldo.pi@gmail.com> wrote in message

> news:1180453921.357081.89500@n15g2000prd.googlegroups.com...

> >I was looking for a function to transform a unicode string into
> >htmlentities.
> >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')

> 'S&#227;o Paulo'

That was a fast answer. I would never have found that myself.

Thanks, Clodoaldo

Clodoaldo <clodoaldo.pi@gmail.com> wrote:
> On May 29, 12:57 pm, "Richard Brodie" <R.Bro@rl.ac.uk> wrote:
>> "Clodoaldo" <clodoaldo.pi@gmail.com> wrote in message

>> news:1180453921.357081.89500@n15g2000prd.googlegroups.com...

>> >I was looking for a function to transform a unicode string into
>> >htmlentities.
>> >>> u'São Paulo'.encode('ascii', 'xmlcharrefreplace')

>> 'S&#227;o Paulo'

> That was a fast answer. I would never have found that myself.

You might actually want:

>>> import cgi
>>> cgi.escape(u'São Paulo & Espírito Santo').encode('ascii', 'xmlcharrefreplace')

'S&#227;o Paulo &amp; Esp&#237;rito Santo'

as you have to be sure to escape any ampersands in your unicode
string before doing the encode.
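
The order matters because the character references themselves contain ampersands. A sketch of the escape-then-encode sequence for Python 3 follows, assuming `html.escape` as the stand-in for `cgi.escape` (which was removed in Python 3.8). The function name `escape_to_ascii_html` is hypothetical.

```python
import html


def escape_to_ascii_html(text):
    """Escape &, < and > first, then replace non-ASCII characters."""
    # quote=False mirrors the default behaviour of the old cgi.escape.
    escaped = html.escape(text, quote=False)
    return escaped.encode('ascii', 'xmlcharrefreplace').decode('ascii')


print(escape_to_ascii_html('São Paulo & Espírito Santo'))
```

Escaping after the encode would turn `&#227;` into `&amp;#227;`, which a browser would display literally.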

On 29 May 2007, at 17.52, Clodoaldo wrote:

In many cases, the need to use HTML/XHTML entities can be avoided by
generating UTF-8 encoded pages.
------------------------------------------------------
"Home is not where you are born, but where your heart finds peace" -
Tommy Nordgren, "The dying old crone"
tommy.nordg@comhem.se
On May 30, 8:53 am, Tommy Nordgren <tommy.nordg@comhem.se> wrote:

Sure. All my pages are UTF-8 encoded. The case I'm dealing with is an
email link whose subject has non-ASCII characters, as in:

<a href=mailto:exam@sample.com?subject=Dúvidas>Mail to</a>

Somehow, when the user clicks on the link, the subject reaches the email
client with the non-ASCII characters turned into garbage.

And before someone points out that I should not expose email addresses,
the email is only linked with the consent of the owner and the source
is obfuscated to make it harder for a robot to harvest it.

Regards, Clodoaldo
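
One thing that may be worth trying for the mailto case is percent-encoding the subject as UTF-8 inside the URL, since the value is part of a URI before it is part of the HTML. The sketch below assumes Python 3 and uses a hypothetical helper `mailto_link` with an example subject. Whether a given email client decodes it correctly is not something this thread confirms.

```python
import html
from urllib.parse import quote


def mailto_link(address, subject, label):
    """Build an <a> tag whose mailto subject is percent-encoded UTF-8."""
    # quote() encodes the str as UTF-8 and percent-escapes each byte;
    # safe='' also escapes '/' so the whole subject is encoded.
    href = 'mailto:%s?subject=%s' % (address, quote(subject, safe=''))
    # Escape the attribute value and the label for use inside HTML.
    return '<a href="%s">%s</a>' % (html.escape(href, quote=True),
                                    html.escape(label))


print(mailto_link('exam@sample.com', 'São Paulo', 'Mail to'))
```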

On May 30, 4:25 am, Duncan Booth <duncan.bo@invalid.invalid> wrote:

I will do it. Thanks.

Regards, Clodoaldo.
