How to Handle URLs with the JavaScript encodeURI() Function

While working on the UI side, we should not rely on the user's input. Consider a URL, for instance. Earlier, we had a few methods to handle a URL string. One of them is the escape method, but it is deprecated now. In this article, I will introduce a few methods to handle URLs and show how to use them properly.

encodeURI and encodeURIComponent

There are two JavaScript methods to escape a string for use in a URL. What are the differences between them?

encodeURI does not escape the following characters:

  • A-Z a-z 0-9 ; , / ? : @ & = + $ # - _ . ! ~ * ' ( )

encodeURIComponent does not escape the following characters:

  • A-Z a-z 0-9 - _ . ! ~ * ' ( )

encodeURI('AZaz09');      // 'AZaz09'
encodeURI(';,/?:@&=+$');  // ';,/?:@&=+$'
encodeURI("-_.!~*'()");   // "-_.!~*'()"
encodeURI(" ");           // "%20"

encodeURIComponent('AZaz09');      // 'AZaz09'
encodeURIComponent(';,/?:@&=+$');  // "%3B%2C%2F%3F%3A%40%26%3D%2B%24"
encodeURIComponent("-_.!~*'()");   // "-_.!~*'()"
encodeURIComponent(" ");           // "%20"

Only encodeURIComponent escapes reserved characters such as ; , / ? : @ & = + $. Note that we should not apply encodeURIComponent to an entire URL, especially one used for a GET or POST request, because it would also escape the special characters / : & = + that give the URL its structure.

encodeURI('http://your.ip.address/items?id=123');          // 'http://your.ip.address/items?id=123'
encodeURIComponent('http://your.ip.address/items?id=123'); // 'http%3A%2F%2Fyour.ip.address%2Fitems%3Fid%3D123'

// A good use of encodeURIComponent: encode only the query parameter value.
// `address` holds the raw user input.
var url = 'http://your.ip.address/places?address=' + encodeURIComponent(address);
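When a URL carries several query parameters, the same rule applies to every key and value. The helper below is a minimal sketch of that idea; the name buildUrl and the sample parameters are made up for illustration and are not part of any library.

```javascript
// Hypothetical helper: encodes every query key and value with
// encodeURIComponent while leaving the base URL untouched.
function buildUrl(base, params) {
    var pairs = Object.keys(params).map(function (key) {
        return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
    });
    return pairs.length ? base + '?' + pairs.join('&') : base;
}

var placesUrl = buildUrl('http://your.ip.address/places', {
    address: '1 Main St & 2nd Ave',
    tag: 'a/b'
});
// 'http://your.ip.address/places?address=1%20Main%20St%20%26%202nd%20Ave&tag=a%2Fb'
```

Because the '&' and '/' inside the values are escaped, they cannot be mistaken for parameter separators or path segments.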

Double encoding issue

What if a user types a URL that is already encoded? Encoding it again corrupts the URL, and the user will not reach the resource they intended. See the example below:

var space = " ";
var encodedOnce = encodeURI(space);  // '%20'
var encodedTwice = encodeURI(encodedOnce); // '%2520'
// encodeURI() changes '%' to '%25'
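The effect compounds with every additional pass, which is easy to miss when a value travels through several layers that each encode it. A small sketch of this, where the helper name encodeTimes is hypothetical:

```javascript
// Hypothetical helper: applies encodeURI to a value `times` times in a row.
function encodeTimes(value, times) {
    var result = value;
    for (var i = 0; i < times; i++) {
        result = encodeURI(result);
    }
    return result;
}

encodeTimes('a b', 1); // 'a%20b'
encodeTimes('a b', 2); // 'a%2520b'
encodeTimes('a b', 3); // 'a%252520b'
```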

To handle this, we have two solutions.

First Solution: decode and encode a URL

The encodeURI and decodeURI methods work as a pair. My first approach is to decode a URL and then encode the returned value, as below:

// First solution
encodeURI(decodeURI(url));

For a URL that is either not encoded or encoded only once, decodeURI behaves idempotently: calling it repeatedly does not change the result. If a user enters a URL that is not encoded, decodeURI returns the input unchanged, so decodeURI(encoded) and decodeURI(notEncoded) yield the same result. Applying encodeURI to that result then works as expected.

// Idempotence of decodeURI()

var url = "http://myhome.com/items?name=123";
// The three lines below return the same URL value.
decodeURI(url);
decodeURI(decodeURI(url));
decodeURI(decodeURI(decodeURI(url)));

var encodedURL = "http://google.com/items?name=1%202%203";
// The three lines below return "http://google.com/items?name=1 2 3"
decodeURI(encodedURL);
decodeURI(decodeURI(encodedURL));
decodeURI(decodeURI(decodeURI(encodedURL)));
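One caveat is worth checking before relying on this property: it holds for input that is plain or encoded once. A value that was encoded twice loses one layer of encoding per call, and decodeURI never decodes the escape sequences of reserved characters such as '/'. The sketch below only illustrates these two edge cases; the variable names are made up for the example.

```javascript
// A twice-encoded space needs two calls to get back to the raw space.
var twice = '%2520';
var afterOne = decodeURI(twice);    // '%20'
var afterTwo = decodeURI(afterOne); // ' '

// decodeURI leaves escape sequences of reserved characters alone,
// while decodeURIComponent decodes them.
var slashKept = decodeURI('a%2Fb');             // 'a%2Fb'
var slashDecoded = decodeURIComponent('a%2Fb'); // 'a/b'
```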

This solution works in most cases. However, when a URL contains a '%' that is not followed by two hexadecimal digits, as in the example below, decodeURI throws 'Uncaught URIError: URI malformed'. To handle that, I enhanced the solution with a try-finally block. Even if decodeURI throws an exception, the URL still gets encoded in the finally block, because returning from finally discards the pending exception.

var url = "http://myhome.com/it%ems";
decodeURI(url);  // Uncaught URIError: URI malformed.

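// The first solution is enhanced with a try-finally block.
function sanitizeUrl(url) {
    try {
        // On success, url becomes the decoded value.
        // On failure, url keeps its original value.
        url = decodeURI(url);
    } finally {
        // Returning here also suppresses any exception thrown above.
        return encodeURI(url);
    }
}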
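The same behaviour can also be written with an explicit catch, which some readers may find easier to follow than returning from finally. This is only a sketch of a variant that should behave the same for string input; the name sanitizeUrlCatch is made up here.

```javascript
// Hypothetical variant of sanitizeUrl that uses try-catch.
function sanitizeUrlCatch(url) {
    var decoded;
    try {
        decoded = decodeURI(url);
    } catch (e) {
        // Only a malformed escape sequence is expected here; rethrow anything else.
        if (!(e instanceof URIError)) {
            throw e;
        }
        decoded = url; // fall back to the raw input
    }
    return encodeURI(decoded);
}

sanitizeUrlCatch('http://myhome.com/items?name=1 2 3');     // 'http://myhome.com/items?name=1%202%203'
sanitizeUrlCatch('http://myhome.com/items?name=1%202%203'); // 'http://myhome.com/items?name=1%202%203'
sanitizeUrlCatch('http://myhome.com/it%ems');               // 'http://myhome.com/it%25ems'
```

In the last call the lone '%' is escaped to '%25', so the result is a valid URL that points at the literal path 'it%ems'.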

Second solution: use an HTML img tag

The sanitizeUrl function is not very elegant because of the try-finally block. I also do not like spending too much time re-inventing something that already exists. Thus, I decided to use an HTML img tag. After creating an img element, assign the URL to its src attribute. The browser then takes care of any additional decoding or encoding the URL requires, and reading src back returns the normalized URL. It is simple enough to use.

function sanitizeUrl2(url) {
    var img = document.createElement('img');
    img.src = url;
    return img.src;
}
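In environments that provide the standard URL constructor (modern browsers and Node.js, but not IE), a similar normalization is available without creating a DOM element. The sketch below is an alternative I am assuming fits the same purpose, not something the approach above depends on; the name sanitizeUrlWithUrlApi is hypothetical. Unlike the img tag, the constructor needs an absolute URL and throws a TypeError otherwise, so the sketch returns null in that case.

```javascript
// Hypothetical alternative to the img tag, using the standard URL constructor.
function sanitizeUrlWithUrlApi(url) {
    try {
        return new URL(url).href;
    } catch (e) {
        return null; // not a valid absolute URL
    }
}

sanitizeUrlWithUrlApi('http://myhome.com/my items');   // 'http://myhome.com/my%20items'
sanitizeUrlWithUrlApi('http://myhome.com/my%20items'); // 'http://myhome.com/my%20items'
sanitizeUrlWithUrlApi('not a url');                    // null
```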

When I tested sanitizeUrl2 on IE, it threw an exception if the URL had a single '%' between letters, such as 'http://myhome.com/it%ems'. The final version of the URL handling method therefore combines the two solutions. The first solution with a try-finally block is used only for IE, and the second solution with an HTML img tag is used for other browsers. Below is the finalized source code.

// document.documentMode is only defined in Internet Explorer.
var isIE = !!document.documentMode;

function sanitizeUrl3(url) {
    var tmpUrl, img;
    if(isIE) {  // Only for IE
        tmpUrl = url;
        try {
            tmpUrl = decodeURI(tmpUrl);
        } finally {
            return encodeURI(tmpUrl);
        }
    }
    // For other browsers
    img = document.createElement('img');
    img.src = url;
    return img.src;
}

For identifying the browser, I have added a link in the References section, which can help you choose between the two solutions.

Conclusion

In this article, we looked at the JavaScript URL encoding and decoding methods and the differences among them. To use these methods in real-world applications, we should also consider the double encoding issue.

I think this can also be treated as a product management question. If the product never allows users to type '%' in a URL field, the team could add a tooltip to the field stating that restriction and simply encode the URL entered there. However, I wanted to solve the double encoding issue without that assumption. Creating an HTML img tag almost solves it, but IE does not behave as well as the other browsers. Thus, the two solutions work together to support multiple browsers such as IE, Chrome, and Firefox. I could not test this on Edge, but it should be fine because at least one of the two solutions should work there.

References
