UNICODE input for CGI using C
Dear All,

I'm trying to accept a multi-lingual string (UNICODE) in a form and am trying to parse it. What I am getting is %XX (which is a single byte, not 2 bytes). So, is the data getting lost? If it is not getting lost, what format is it in?

Thanks in advance,
Punit.
In article <1180444998.814728.246@a26g2000pre.googlegroups.com>, <puneet.p.s @gmail.com> wrote:

> I'm trying to accept a multi-lingual string (UNICODE) in a
>form and am trying to parse it. What i am getting is %XX (which is a
>single byte, not 2 bytes). So, is the data getting lost? What format
>is it, if it is not getting lost.

You should be getting 2 or more successive %XXs.

HTML form data sent using GET is part of the URL. Non-ASCII characters are represented in UTF-8, then each byte of the UTF-8 sequence is encoded in hex as %XX. See

http://www.ietf.org/rfc/rfc3986.txt
http://www.ietf.org/rfc/rfc2279.txt

For POST data, I can't find up-to-date documentation. The very old http://www.w3.org/TR/html4/interact/forms.html describes the application/x-www-form-urlencoded MIME type, but it does not mention non-ASCII characters. I think you'll find that it uses the same method as GET, but it's possible that it might use the encoding specified by the HTTP charset declaration rather than UTF-8. You'll need to ask about that somewhere other than comp.lang.c.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters in some alphabets" - X3.4, 1963.
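To make the byte-by-byte encoding concrete, here is a minimal sketch of a decoder in C. It assumes the browser really did send UTF-8, which, as noted above, is not guaranteed for POST. The names form_decode, hex_value and utf8_count are hypothetical helpers invented for this illustration, not part of any CGI library. Each %XX becomes one byte, a '+' becomes a space (the application/x-www-form-urlencoded convention), and a malformed escape is copied through unchanged.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode form-urlencoded text from src into dst (hypothetical helper).
 * dst must have room for strlen(src) + 1 bytes, since decoding never
 * makes the text longer. Returns the number of bytes written, not
 * counting the terminating NUL. Note that a decoded %00 would embed a
 * NUL byte in dst; a real program should decide how to treat that.
 */
size_t form_decode(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (*src == '%') {
            int hi = hex_value(src[1]);
            /* Only look at src[2] once src[1] is known not to be NUL. */
            int lo = (hi >= 0) ? hex_value(src[2]) : -1;
            if (lo >= 0) {
                dst[n++] = (char)(unsigned char)(hi * 16 + lo);
                src += 3;
                continue;
            }
        }
        /* Ordinary character, '+', or a malformed escape. */
        dst[n++] = (*src == '+') ? ' ' : *src;
        src++;
    }
    dst[n] = '\0';
    return n;
}

/*
 * Count characters (code points) in a UTF-8 string by counting every
 * byte that is not a continuation byte of the form 10xxxxxx.
 * Assumes the input is valid UTF-8.
 */
size_t utf8_count(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For example, the Devanagari letter U+0928 arrives as %E0%A4%A8. form_decode turns that into the three bytes E0 A4 A8, and utf8_count reports them as a single character, so nothing has been lost: the character is spread over several %XX groups rather than packed into one 2-byte unit.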