The char Data Type

The char type is used to store single characters (letters, digits, symbols, etc…).


Remember, when values of variables are stored in the computer's memory, they must ultimately be stored in terms of only 1's and 0's. Recall, integer-based values are stored with the "Two's Complement" method, and floating-point decimal-based values are stored with IEEE 754 Notation. Similarly, we need a way to represent characters in terms of 1's and 0's.

Encoding refers to how something (like a char) is converted to its binary representation in the computer's memory.

A character set is a collection of characters along with an encoding scheme to convert them to their binary representations.

There are two popular character sets used frequently in programming languages:

  • ASCII
    • Originally required 7 bits of memory to store, but modern extensions of ASCII require 1 full byte (8 bits) of storage space
    • Has a base set of 128 characters, including all of the English letters (upper and lower case), digits 0-9, some common symbols, and several unprintable characters like tab, return, backspace, etc… The extended versions of ASCII add several more printable characters.
  • Unicode
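    • Was originally designed as a 16-bit character set, so a character requires 2 bytes (16 bits) of storage space; this is the size of a Java char
    • The 16-bit range covers 65,536 values, including characters and symbols from many different languages. Unicode has since grown past this range, and Java represents the additional characters with a pair of chars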
    • ASCII is a subset of Unicode. That is to say, if the letter 'A', for example, is encoded to a decimal value of 65 in ASCII, then the Unicode version of 'A' will also be encoded to the same value.
    • Java uses Unicode
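
The overlap between ASCII and Unicode can be checked directly in Java. The following is a minimal sketch; the class name CharCodes and the helper methods codeOf and isAscii are made up for this illustration.

```java
public class CharCodes {
    // Returns the numeric code that Java stores for the given char.
    static int codeOf(char c) {
        return (int) c;
    }

    // True when the char falls in the base ASCII range (0 to 127).
    static boolean isAscii(char c) {
        return c < 128;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));     // 65, the same value ASCII assigns to 'A'
        System.out.println(isAscii('A'));    // true
        System.out.println(isAscii('φ'));    // false: this letter lies outside ASCII
        System.out.println(Character.SIZE);  // 16, the number of bits in a char
    }
}
```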

Working with the char Data Type

Character literals are expressed with single quotes, as shown below:

char c1 = 'A';
char c2 = '4';

Chars can also be specified by their encoded values:

char c1 = 65; //c1 = 'A'
char c2 = 52; //c2 = '4'

Character literals (especially those for characters not on most keyboards) can be specified in the following way as well:

char c1 = '\u0041'; // c1 = 'A' (note: 65 = 41 (base 16))
char c2 = '\u0034'; // c2 = '4' (note: 52 = 34 (base 16))
char c3 = '\u0060'; // c3 = '`' 
char c4 = '\u00A9'; // c4 = '©'
char c5 = '\u03C6'; // c5 = 'φ'

In general, literals expressed in this way take the form '\u####', where #### is a
hexadecimal (base 16) number corresponding to the Unicode value for the character.

You can use Unicode characters within Strings as well. For example,

System.out.println("\u0041\u0034"); will print "A4" to the console.
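
To see how a character maps to the four hexadecimal digits of its escape, the conversion can be done in code. This is only a sketch: the class UnicodeEscape and the methods toEscape and fromHex are hypothetical names, not part of the Java library.

```java
public class UnicodeEscape {
    // Builds the text of the escape for a char: a backslash, the letter u,
    // and the char's code written as four uppercase hex digits.
    static String toEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    // Converts hex digits such as "0041" back into the char with that code.
    // Assumes the argument is a valid hex number no larger than FFFF.
    static char fromHex(String hexDigits) {
        return (char) Integer.parseInt(hexDigits, 16);
    }

    public static void main(String[] args) {
        System.out.println(toEscape('A'));    // code 65 is 41 in base 16
        System.out.println(toEscape('©'));    // code 169 is A9 in base 16
        System.out.println(fromHex("0034"));  // prints 4
    }
}
```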

Escape Characters

Some characters are hard to put into a string. Suppose we wanted to print

Bob said "That's Great!"

to the console. The following would produce an error:

System.out.println("Bob said "That's Great!"");    // ERROR!!

since the compiler will think the string ended when it sees the second quotation mark.

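We could try the Unicode value for the quotation mark instead, but this fails as well. The compiler translates Unicode escapes such as \u0022 before it looks for string literals, so it again sees extra quotation marks:

System.out.println("Bob said \u0022That's Great!\u0022");   // ERROR!!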

Java gives us special escape sequences for characters like this. Some
important ones are shown in the table below:

Description        Escape Sequence    Unicode
Backspace          \b                 \u0008
Tab                \t                 \u0009
Linefeed           \n                 \u000A
Carriage return    \r                 \u000D
Backslash          \\                 \u005C
Single Quote       \'                 \u0027
Double Quote       \"                 \u0022

So, to print out the comment about what Bob said, we can write:

System.out.println("Bob said \"That's Great!\"");

This compiles, and it is easier to read than the Unicode version.
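
A few of the other escape sequences from the table can be tried in the same way. The sketch below is illustrative only; the class EscapeDemo and its quote method are invented names.

```java
public class EscapeDemo {
    // Wraps the given text in double quotes, using the escape sequence
    // for a double quote on each side.
    static String quote(String text) {
        return "\"" + text + "\"";
    }

    public static void main(String[] args) {
        System.out.println("Bob said " + quote("That's Great!"));
        System.out.println("Name\tAge");      // a tab separates the two words
        System.out.println("C:\\temp");       // prints a single backslash
        System.out.println("one\ntwo");       // prints on two lines
        System.out.println("a\tb".length());  // 3: an escape sequence is one char
    }
}
```

Note that each escape sequence takes two characters to type but is stored as a single char.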

Other things you can do with the char data type...

chars can be used with other numeric values (sometimes requiring a cast) and with numeric operators:

char c1 = 97;            // c1 = 'a'
char c2 = (char) 97.25;  // c2 = 'a'
int n = 'A';             // n = 65
int m = '2' + '3';       // '2' = 50 and '3' = 51, so m = 101
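
Because chars behave like numbers, subtracting '0' from a digit character gives the digit's numeric value, and adding an offset to a letter gives a later letter. A minimal sketch follows; CharMath, digitValue, and shift are hypothetical names chosen for this example.

```java
public class CharMath {
    // Converts a digit character such as '7' to the number 7.
    // Assumes the argument is one of '0' through '9'.
    static int digitValue(char digit) {
        return digit - '0';
    }

    // Returns the character a given number of positions after c.
    // The cast is needed because char + int produces an int.
    static char shift(char c, int offset) {
        return (char) (c + offset);
    }

    public static void main(String[] args) {
        System.out.println(digitValue('7'));                    // 7
        System.out.println(digitValue('2') + digitValue('3'));  // 5
        System.out.println(shift('a', 1));                      // b
        System.out.println(shift('A', 32));                     // a
    }
}
```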

The "+" operator can be used to concatenate a char with a String:

char c = 'A';
String s = "BCD";
String s2 = c + s;
System.out.println(s2);   //prints "ABCD"
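
One thing to watch for: "+" only concatenates when at least one operand is a String. Between two chars it adds their numeric values. The sketch below shows the difference; ConcatDemo and joinChars are invented names.

```java
public class ConcatDemo {
    // Joins two chars into a String. Starting with the empty String
    // forces every later "+" to be a concatenation.
    static String joinChars(char a, char b) {
        return "" + a + b;
    }

    public static void main(String[] args) {
        System.out.println('A' + 'B');        // 131, since 65 + 66 is added as ints
        System.out.println('A' + 'B' + "C");  // 131C, the chars are added first
        System.out.println(joinChars('A', 'B'));  // AB
    }
}
```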

The method charAt(int pos) can be used to get the char at a given position in a String (positions start at 0):

String s = "HELLO WORLD";
char c1 = s.charAt(0);
char c2 = s.charAt(6);
System.out.println(c1 + " is at position 0, while " + c2 + " is at position 6");
//prints "H is at position 0, while W is at position 6"
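
charAt is often combined with a loop to examine every character of a String. As a rough sketch (the class CharAtDemo and the method count are hypothetical), the following counts how many times a character appears:

```java
public class CharAtDemo {
    // Counts how many times target appears in text by checking
    // each position from 0 up to text.length() - 1.
    static int count(String text, char target) {
        int total = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count("HELLO WORLD", 'L'));  // 3
        System.out.println(count("HELLO WORLD", 'Z'));  // 0
    }
}
```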

The increment and decrement operators can be used to get the next or preceding Unicode character:

char c = 'B';
System.out.println(++c);  //c becomes 'C'; prints the letter 'C'
System.out.println(--c);  //c goes back to 'B'; prints the letter 'B'
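
Since ++ moves a char to the next Unicode character, a for loop can step through a run of consecutive characters. The sketch below assumes the last character is below the maximum char value, otherwise the loop would never end; Alphabet and range are made-up names.

```java
public class Alphabet {
    // Builds a String containing every character from first to last, inclusive.
    // Assumes last is less than the largest possible char value.
    static String range(char first, char last) {
        StringBuilder result = new StringBuilder();
        for (char c = first; c <= last; c++) {
            result.append(c);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(range('A', 'E'));           // ABCDE
        System.out.println(range('a', 'z').length());  // 26
    }
}
```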