Something you’ll come across alot when dealing with web applications in Base64. What is Base64? Base64 is an encoding scheme used to convert binary data to ASCII text allowing binary data to be transmitted over channels that don’t handle binary data well. What’s great news for bug hunters is that a lot of applications trust Base64-encoded input, giving you more opportunities for discovering vulnerabilities.

How does Base64 work?

Base64 converts binary data (or, really, any data) to a set of ASCII characters (you know, A-B-C-D… and the like). Note that Base64 is not an encryption so encoding and decoding is trivial. You don’t need to know anything secret to reveal what Base64-encoded data contains nor to endcode your own data that the application will be able to decode.

The way Base64 works is that it turns every 6 bits of data into one byte (8 bits). That means that abc will be converted to a four byte string, in this case YWJj.

Recognizing Base64

Base64 is relatively easy to recognize:

Because of its limited character set, Base64 is often easy to recognize because it only uses the characters ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ with one special exception mentioned below.
When encoding input that does not have 24 bits (3 bytes) at the end, Base64 pads the data with = signs. This might look like this:
```
base64("foo") (3 bytes) => "Zm9v"
base64("foo1") (4 bytes) => "Zm9vMQ=="
base64("foo12") (5 bytes) => "Zm9vMTI="
base64("foo123") (6 bytes) => "Zm9vMTIz"
```
Something worth mentioning is that there are some implementations of Base64 that don’t pad the output. For instance, JSON Web Tokens (JWTs) are Base64-encoded but stripped of the padding. Vicki Li has a good post covering many known security issues with how people use JWTs.
If you’re not sure, decode anything you find that looks vaguely like Base64! Note that because Base64 can encode binary data, you might be unsure of what to do with that output. It’s worth investigating if the data decodes cleanly though.

On OSX, decoding can be done with (copy the string to your clipboard) and pbpaste | base64 -D | xxd (xxd will give you hex output in case the data is binary). You can also decode the data in Burp on the Encoder tab.

Where you can find Base64-encoded data

When dealing with applications, almost anywhere! Some examples include:

Embedded content like data: urls. Here’s a write-up using data URLs to get XSS
HTTP basic authentication headers
In a large variety of cookies, especially JWTs
Various headers, query variables, and POST bodies

What you can do when you find a Base64-encoded string

Of course the thing you should do first is decode it! This will give you an idea of what the application is expecting to receive.

Chances are, except for data: urls and a few other cases, if an application sends you some Base64-encoded data it expects it back in future requests, especially in the case of cookies.

Some things are try:

Sending non-encoded data or malformed Base64-encoding to the application. For instance, data without padding or characters such as ! which are not part of the Base64 alphabet.
Sending Base64-encoded garbage to the application. For instance, if the server is expecting Base64-encoded JSON, send it encoded garbage or malformed JSON (hint: fuzz this!).
Manipulating the original data, encoding it, and sending it to the application. For instance, if you have a Base64-encoded JSON object, what attributes can you modify or add to the object that might reveal a vulnerability?
Try more specific attacks based on what the encoded data contains. For instance, if you find what looks like a Base64 string but it’s in three sections separated by dashes (and potentially unpadded), you’re probably looking at a JWT! See Vicki Li’s post discussing many of the known security issues with JWTs.

All of these examples involve sending the server data but what happens if you have a pile of Base64 strings and need to sift through it all? Well, aside from decoding it, you may be able to recognize certain strings by looking at it since same bits are encoded the same!

For example, a Base64-encoded string starting with ey decodes to {" making it very likely that this encoded string will contain a JSON object.

Because of this same property, you can also “reverse search” Base64 for interesting data. As an example, user encodes to dXNlcg (minus padding) so you can search large amounts of Base64 for that text.

A special mention for bypasses

Another great use of Base64 is to bypass filtering mechanisms. Since Base64’s character set only contains A-Za-z0-9+/ and sometimes =, it can get through a lot of filters due to the fact that + and / aren’t filtered in many sanitization routines. In the case of = being filtered, you can remove it by padding your input.

Great examples of these bypasses can be found on OWASP’s XSS Filter Evasion Cheat Sheet!

Summary

I hope this will help you find and exploit more bugs in applications that use Base64 as an encoding mechanism.

Do you have any examples of where you’ve found bugs involving Base64-encoded data? Let me know @subfnSecurity on twitter!