URL encoding blues + solution


Kees de Kooter
23-02-2009, 11:13
I have been struggling with (URL) encoding issues for quite some time. The customer needed literal strings in REST-style URLs. The default encoding on their platform (Windoze) worked well until they wanted to add words in Eastern European languages, like the s with a funny hat (š).

On my Linux box everything worked well after setting the encoding to UTF-8 in the most obvious places. On the customer's target platform, however, the characters came out garbled.

After ruling out both IIS and the ISAPI redirector, I finally solved it by adding a servlet Filter containing the following line:

servletRequest.setCharacterEncoding("UTF-8");


The REST-style URLs are now interpreted properly.

However, the app also contains old-fashioned URLs, and one problem remained: encoded strings in request parameters (i.e. after the "?"). The funny thing is that POST requests work fine, while GETs come out garbled.

This time Tomcat is to blame. Atlassian to the rescue:
http://confluence.atlassian.com/display/DOC/Configuring+Tomcat%27s+URI+encoding

Apparently Tomcat decodes URIs as ISO-8859-1 by default. I added URIEncoding="UTF-8" to the HTTP and AJP connectors, and now it works fine.
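To make the symptom concrete, here is a small Java sketch. It assumes the page makes the browser percent-encode "š" as UTF-8 (the bytes C5 A1) and uses java.net.URLDecoder to mimic what a connector does with each charset setting; it is an illustration, not Tomcat's actual decoding code. The class name and the sample word are made up for the example.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodingDemo {
    // Decodes a percent-encoded query value with the given charset,
    // roughly what a connector does with its configured URI encoding.
    static String decodeAs(String rawValue, Charset charset) {
        return URLDecoder.decode(rawValue, charset);
    }

    public static void main(String[] args) {
        // "škoda": the letter U+0161 (s with caron) is the two bytes C5 A1 in UTF-8.
        String rawValue = "%C5%A1koda";
        String expected = "\u0161koda"; // "škoda" written with a Unicode escape

        // ISO-8859-1 turns each byte into its own character:
        // C5 -> U+00C5 (A with ring), A1 -> U+00A1 (inverted exclamation mark).
        String latin1 = decodeAs(rawValue, StandardCharsets.ISO_8859_1);
        // UTF-8 reads C5 A1 as one two-byte sequence and yields U+0161.
        String utf8 = decodeAs(rawValue, StandardCharsets.UTF_8);

        // Compare instead of printing the strings, so the console encoding does not matter.
        System.out.println("ISO-8859-1: " + latin1.length() + " chars, correct = " + latin1.equals(expected));
        System.out.println("UTF-8:      " + utf8.length() + " chars, correct = " + utf8.equals(expected));
    }
}
```

Running it should report 6 characters and `correct = false` for ISO-8859-1, and 5 characters and `correct = true` for UTF-8: the same bytes, read with the wrong charset, turn one letter into two.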

admini
25-02-2009, 11:48
The GET parameters (after the ?) are part of the HTTP request line, i.e. the header part of the message. For POSTs the parameters are part of the HTTP body. The two have different defaults: headers use ISO-8859-1 (it used to be ASCII). So, in my opinion, Tomcat uses the proper default.

Kees de Kooter
25-02-2009, 15:16
Thanks for the clarification. I agree that Tomcat uses the proper default.

The trouble begins if you use a different encoding in the body. And apparently the header encoding is outside the scope of the servlet spec. Something to keep in mind when you are trying to write portable apps.

jingming
03-03-2009, 04:13
I think the Tomcat developers must have had a good reason to decode URLs received from a browser as ISO-8859-1 by default.

Generally, I don't add URIEncoding="UTF-8" to the conf/server.xml file. Instead, I use JavaScript to encode the parameters (after the ?) in the URLs as UTF-8, and then decode the parameters in the servlet.
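One way to read the "decode in the servlet" step is sketched below in Java. It assumes the browser sends UTF-8 percent-encoding (as JavaScript's encodeURIComponent does) and that the container has already decoded the value as ISO-8859-1, which is the behaviour described above; the helper name is hypothetical, and this is not necessarily the exact technique used in the post (a variant encodes the value twice in JavaScript and runs URLDecoder in the servlet).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class RedecodeParameter {
    // Hypothetical helper: repairs a value that the container decoded as
    // ISO-8859-1 although the browser sent UTF-8 bytes.
    static String reinterpretLatin1AsUtf8(String misdecoded) {
        // ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
        // so this recovers exactly the bytes that were on the wire.
        byte[] wireBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wireBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // encodeURIComponent("škoda") in JavaScript produces "%C5%A1koda".
        String sentByBrowser = "%C5%A1koda";
        // Simulate a connector left at its ISO-8859-1 default.
        String fromGetParameter = URLDecoder.decode(sentByBrowser, StandardCharsets.ISO_8859_1);

        String repaired = reinterpretLatin1AsUtf8(fromGetParameter);
        System.out.println("repaired correctly: " + repaired.equals("\u0161koda"));
    }
}
```

The catch is that the servlet now silently depends on the connector setting: if someone later adds URIEncoding="UTF-8", the value already arrives correct, and this helper would mangle it (š cannot be encoded in ISO-8859-1, so it becomes "?").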

I haven't figured out how to deal with a URL path that contains characters outside ISO-8859-1. It seems that URL paths and URL parameters have to be handled differently.

PS: As I understand it, URL = path + "?" + parameters
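One concrete difference between the two parts is their escaping rules. The Java sketch below uses a made-up example.com URL and assumes UTF-8 percent-encoding in both the path and the query; it shows that "+" means a space only in form-encoded query strings, so the same decoder should not be reused for both parts. It does not show how Tomcat itself splits or decodes the URL.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PathVersusQuery {
    // Path segments use plain percent-encoding: '+' is a literal plus sign.
    // URI.getPath() decodes the %XX escapes as UTF-8.
    static String decodedPath(URI uri) {
        return uri.getPath();
    }

    // The query of an HTML form GET is application/x-www-form-urlencoded,
    // where '+' stands for a space; URLDecoder implements that format.
    static String decodedQuery(URI uri) {
        return URLDecoder.decode(uri.getRawQuery(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical URL: the same text appears in the path and in the query.
        URI uri = URI.create("http://example.com/words/%C5%A1koda+auto?q=%C5%A1koda+auto");

        // Pitfall: running the path through URLDecoder also turns '+' into a space.
        String wrongPath = URLDecoder.decode(uri.getRawPath(), StandardCharsets.UTF_8);

        System.out.println(decodedPath(uri).equals("/words/\u0161koda+auto"));  // true
        System.out.println(decodedQuery(uri).equals("q=\u0161koda auto"));      // true
        System.out.println(wrongPath.equals("/words/\u0161koda auto"));         // true
    }
}
```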

Kees de Kooter
03-03-2009, 08:55
I do not like putting configuration my app depends on outside the app. It limits portability, and, following Murphy's law, someone someday will forget to set it.

But so far the only solution I have found is the one I gave: URIEncoding on the connector.