How to get and set default Character encoding or Charset in Java? Example

The default character encoding, or charset, in Java is the encoding the JVM uses to convert between bytes and characters whenever you don't specify one explicitly, e.g. in String.getBytes() or new InputStreamReader(in). The JVM derives it from the system property "file.encoding" at start-up. Before JDK 18, if you don't pass -Dfile.encoding, the property is initialized from the platform's locale settings, and if the named charset is not supported, Java falls back to "UTF-8". Since JDK 18, the default is UTF-8 on every platform. Either way, you can read the result with Charset.defaultCharset().

The most important point to remember is that Java caches the default character encoding at start-up. Core classes such as InputStreamReader, which need a character encoding after the JVM has started, use the cached value rather than re-reading the system property "file.encoding".

So if you change the system property "file.encoding" programmatically, you won't see the desired effect. That's why you should always pass an explicit character encoding in your application code, and if the default really has to change, set it when you start the JVM.

In this Java tutorial, we will see a couple of ways to set the default character encoding or charset in Java and how to retrieve its value inside a Java program.

Default Character encoding or Charset in Java

This article is a continuation of my posts on Java String, such as Why String is immutable in Java and How the SubString method works in Java. If you haven't read them, you may find them interesting.

What is character encoding in Java

For those who are not very familiar with character encoding or charsets in Java, here is a layman's introduction. All data in a computer is represented as bytes, and Strings are essentially sequences of characters. To convert bytes into characters, the JVM needs to know which combination of bytes represents which character, and this is what a character encoding tells the JVM. Since there are many languages in the world other than English, such as Hindi, Mandarin, and Japanese, and so many characters, the same combination of bytes can represent different characters in different character encodings. That's why using the correct character encoding is a must when converting bytes into a String in Java.
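To make this concrete, here is a minimal sketch that decodes the same three bytes with two different charsets. The class name SameBytesDifferentCharsets and the helper decode are made up for this illustration; only String, Charset, and StandardCharsets come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SameBytesDifferentCharsets {

    // Hypothetical helper: decodes the given bytes with the given charset.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The euro sign (U+20AC) takes three bytes in UTF-8: E2 82 AC.
        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8);
        System.out.println("number of bytes: " + euro.length);

        // Decoding with the charset that produced the bytes restores the character.
        String right = decode(euro, StandardCharsets.UTF_8);
        System.out.println("UTF-8 gives one char: " + (right.length() == 1));

        // ISO-8859-1 maps every byte to exactly one character,
        // so the same three bytes turn into three unrelated characters.
        String wrong = decode(euro, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 gives chars: " + wrong.length());
    }
}
```

The example prints lengths rather than the decoded text itself, because what the console shows for non-ASCII characters depends on the console's own encoding.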

How to get default character encoding in Java?

There are multiple ways to get the default character encoding in Java, such as using the system property “file.encoding” or the java.nio.charset.Charset class. You can choose whichever suits your need. Let’s see them in detail.

1) "file.encoding" system property

The easiest way to get the default character encoding in Java is to call System.getProperty("file.encoding"). This returns the default character encoding if the JVM was started with the -Dfile.encoding property, or if the program has not called System.setProperty("file.encoding", encoding). In the latter case, it just gives the new value of that system property, while various core classes keep using the encoding that was cached when the JVM started.

2) java.nio.charset.Charset

java.nio.charset.Charset provides a convenient static method, Charset.defaultCharset(), which returns the default character encoding in Java. Check the example of getting the default character encoding using Charset in the code section.

3) InputStreamReader.getEncoding()

This is a kind of shortcut where you use the InputStreamReader constructor that takes no charset argument and then check which character encoding it has picked by calling reader.getEncoding(). See the code example of how to get the default character encoding using the InputStreamReader.getEncoding() method in the code section.
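One detail worth knowing when comparing these three approaches: InputStreamReader.getEncoding() may report a historical name such as "UTF8", while Charset reports the canonical name "UTF-8". The sketch below, with the hypothetical class name EncodingNameExample and made-up helper methods, passes an explicit charset so the result does not depend on your platform, and then normalizes the reported name through Charset.forName().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingNameExample {

    // Hypothetical helper: returns the name an InputStreamReader reports
    // for the charset it was constructed with.
    public static String readerEncodingName(Charset charset) {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]), charset)) {
            return reader.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: maps any supported name or alias to the canonical charset name.
    public static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    public static void main(String[] args) {
        String reported = readerEncodingName(StandardCharsets.UTF_8);
        System.out.println("reported by reader: " + reported);
        System.out.println("canonical name: " + canonicalName(reported));
    }
}
```

If you need to compare the values returned by the different approaches, comparing canonical names (or Charset objects) is safer than comparing the raw strings.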

How to set default character encoding in Java?

Just as there are different ways of getting the default character encoding or charset in Java, there is more than one way to set it. Here are two of them:

1. Using the System property "file.encoding"

Provide the file.encoding system property when the JVM starts, e.g. java -Dfile.encoding="UTF-8" HelloWorld.

2. Using the Environment variable "JAVA_TOOL_OPTIONS"

If you don't control how the JVM starts, for example because it is launched by a script that doesn't accept system properties, you can set the environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding="UTF-16" or any other character encoding, and it will be picked up by any JVM that starts on your machine. The JVM will also print "Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF16" on the console to indicate that it has picked up JAVA_TOOL_OPTIONS. Here is an example of setting the default character encoding using JAVA_TOOL_OPTIONS:

test@system:~/java java HelloWorld

þÿExecuting HelloWorld

Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF16

You can also check my post 10 JVM Options developer should know for more on JVM Options

Code Example to Get and Set Default Character Encoding in Java

Here is a code example of getting and setting the default character encoding in Java:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class CharacterEncodingExample {

    public static void main(String[] args) {
        // Read the default character encoding in three different ways
        String defaultCharacterEncoding = System.getProperty("file.encoding");
        System.out.println("defaultCharacterEncoding by property: " + defaultCharacterEncoding);
        System.out.println("defaultCharacterEncoding by code: " + getDefaultCharEncoding());
        System.out.println("defaultCharacterEncoding by charSet: " + Charset.defaultCharset());

        // Change the system property after the JVM has started
        System.setProperty("file.encoding", "UTF-16");

        // Read the default character encoding again in the same three ways
        System.out.println("defaultCharacterEncoding by property after updating file.encoding : " + System.getProperty("file.encoding"));
        System.out.println("defaultCharacterEncoding by code after updating file.encoding : " + getDefaultCharEncoding());
        System.out.println("defaultCharacterEncoding by java.nio.Charset after updating file.encoding : " + Charset.defaultCharset());
    }

    // Returns the encoding chosen by an InputStreamReader created without an explicit charset
    public static String getDefaultCharEncoding() {
        byte[] bArray = {'w'};
        InputStream is = new ByteArrayInputStream(bArray);
        InputStreamReader reader = new InputStreamReader(is);
        return reader.getEncoding();
    }
}

Output:

defaultCharacterEncoding by property: UTF-8
defaultCharacterEncoding by code: UTF8
defaultCharacterEncoding by charSet: UTF-8
defaultCharacterEncoding by property after updating file.encoding : UTF-16
defaultCharacterEncoding by code after updating file.encoding : UTF8
defaultCharacterEncoding by java.nio.Charset after updating file.encoding : UTF-8

Important points to note:

1) The JVM caches the default character encoding once it starts, and the same cached value is used by InputStreamReader constructors that take no charset and by other core Java classes. So calling System.setProperty("file.encoding", "UTF-16") at runtime may not have the desired effect.

2) Always pass your own character encoding explicitly if you can. That is a more accurate and precise way of converting bytes to Strings, because the result no longer depends on the machine's default.
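As a small illustration of the second point, here is a sketch that passes the charset explicitly everywhere instead of relying on the default. The class name ExplicitCharsetExample and its helper methods are hypothetical, and UTF-8 is only an assumption for the example; use whatever encoding your data actually has.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetExample {

    // String -> bytes with an explicit charset instead of the platform default.
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // bytes -> String with the same explicit charset.
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Reads the first line from the bytes through a reader with an explicit charset.
    public static String readFirstLine(byte[] bytes) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00E9 \u20AC"; // contains two non-ASCII characters
        byte[] bytes = encode(original);
        System.out.println("round trip via String: " + decode(bytes).equals(original));
        System.out.println("round trip via reader: " + readFirstLine(bytes).equals(original));
    }
}
```

Because no call here falls back to the default charset, the program should behave the same regardless of the value of file.encoding.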

That’s all on how to get the default character encoding in Java and how to set it. This becomes more important when you are writing an international application that supports multiple languages. I have indeed come across character encoding issues while writing reports in Japanese (kanji), which I plan to share in another post, but a good knowledge of character encodings like UTF-8, UTF-16, or ISO-8859-5, and of how Java supports character encoding in String, will certainly help.

Other Java tutorials you may like

Why multiple inheritance is not supported in Java

How to write Thread-Safe Code in Java

How to Convert String to Double in Java

Difference between Comparator and Comparable in Java

How to override the equals method in Java

Why main is declared static in Java

10 best practices to follow while writing code comments

9 comments :

Sonya said...: Character Encoding so far looked little difficult to me but after reading this article I at least got to know that what is character encoding in Java and where does it get used and what issues it can cause if bytes encoded in one character set decoded on another charset. Thanks a lot; January 27, 2012 at 12:05 AM
Anonymous said...: Note that explicitly trying to set the "file.encoding" system property on the command line or via environment variables is not supported; this value is not respected by all the JVM's APIs. See the evaluation comments on bugs.sun.com Bug ID: 4163515 for details. http://bugs.sun.com/view_bug.do?bug_id=4163515; January 31, 2012 at 12:57 AM
Javin @ substring in java said...: @Anonymous thanks for pointing it out. So do you see any alternative except providing character encoding explicitly on constructors ?; February 3, 2012 at 10:43 PM
Anonymous said...: Hello...

I need to write to files with their filenames may include the euro (€) character. I can do it in my own pc with ubuntu 10.04 and java 1.6.0.26 where by default java uses UTF-8. But when I execute the code in the server (where java defaults to ASCII) the filename have a ? character.
I use /usr/local/jdk1.6.0_10/bin/java -classpath . -Dfile.encoding=UTF-8 TestEuro.

Can you help me with this? Thank you very much !!!

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class TestEuro {
public static void main( String[] args ) throws Exception {

System.out.println("file.encoding: " + System.getProperty("file.encoding"));
String path = "/srv/fws/java/indexer/" ;
String s1 = "test_€_encoding.txt" ;
File f = new File(path + s1);
OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(f));
osw.write( "test file\n" ) ;
osw.write( "€" + "\n") ;
osw.write( "test file" ) ;
osw.flush();
osw.close();
System.out.println("s1: " + s1);
}
}; March 1, 2012 at 1:27 AM
Anonymous said...: I learned about character encoding in Java the hard way. We had a Java program which reads an XML file and also calls String.getBytes() to convert the XML String to a byte array, and this call is subject to character encoding. By default it uses the system's character encoding, i.e. the value returned by System.getProperty("file.encoding"). Because of this, our program worked fine for one input in one environment but failed in another. It took a lot of time to find out where the issue was. Ultimately the fix was to run the Java program with a specified character encoding, e.g. -Dfile.encoding=UTF-16, which makes sure that the application always uses the correct character encoding and does not behave differently on different machines.; December 12, 2012 at 10:55 PM
Midorinohito said...: I've got a number of files in an unknown encoding format. Does anyone here know of a tool that would display the results of multiple encoding assumptions converted to one common output format (such as UTF-8)? The tool would take an input string, then return an array (or display) multiple result strings, each with a different base assumption about the initial encoding? For example: convert string foo= "ç›£è¦–å¯¾è±¡ã�®åœ°åŸŸã‚¯ãƒ©ã‚¹ã�®ä¸€è¦§" into UTF-8, assuming that foo is each of (UTF-8, EUC-JP, Shift-JIS, etc.).; October 11, 2013 at 10:59 AM
Anonymous said...: Character encoding bugs are very difficult to solve. First of all, if you don't explicitly specify a character encoding to methods like String.getBytes() or new String(byte[]), they will use the platform's default encoding, which could be different on different servers and operating systems. The default encoding may not even be sufficient to display all the characters your application is expecting, e.g. your default encoding might be able to handle European characters but not East Asian characters.; September 3, 2014 at 9:18 PM
Unknown said...: PS: Default Character Encoding can be overwritten in your process as below.
InputStreamReader reader = new InputStreamReader(is, "UTF-8");; November 25, 2015 at 8:19 AM
rajeev said...: You saved my day!!! Thank you very much!; June 9, 2016 at 7:27 AM

Wednesday, July 14, 2021
