Friday, 22 June 2012

Read and Write UTF-8 File in Java

We can use the java.io.InputStreamReader class to read a file in a user-specified encoding and the java.io.OutputStreamWriter class to write a file in a user-specified encoding.

public class InputStreamReader extends Reader


An InputStreamReader is a bridge from byte streams to character streams: it reads bytes and decodes them into characters using a specified charset. The charset may be specified by name or given explicitly as a Charset object; otherwise the platform's default charset is used. The constructors connect a character reader to an underlying input stream:

public InputStreamReader(InputStream in)
public InputStreamReader(InputStream in, String decoding) throws UnsupportedEncodingException


The first constructor uses the platform's default charset. The second uses the charset named by its second argument and throws UnsupportedEncodingException if that name is not supported. For example, to attach an InputStreamReader to System.in with UTF-8 decoding:

InputStreamReader in = new InputStreamReader(System.in,"UTF-8");

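For best efficiency, consider wrapping the InputStreamReader in a BufferedReader, since each read() call on a bare InputStreamReader may cause bytes to be read from the underlying stream. For example: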

BufferedReader in = new BufferedReader(new InputStreamReader(System.in,"UTF-8"));
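To see the decoding step in isolation, here is a minimal sketch that reads UTF-8 bytes from an in-memory stream instead of System.in. The class name Utf8ReadSketch and the helper readFirstLine are made up for illustration, and the sketch assumes Java 7 or later for StandardCharsets and try-with-resources.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
class Utf8ReadSketch {

    // Reads the first line from a byte stream, decoding the bytes as UTF-8.
    static String readFirstLine(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // In UTF-8 the character U+00E9 (e with acute accent) is the two bytes 0xC3 0xA9.
        byte[] utf8Bytes = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, '\n'};
        String line = readFirstLine(new ByteArrayInputStream(utf8Bytes));
        // Five content bytes decode to four characters.
        System.out.println(line.length()); // prints 4
    }
}
```

Passing a Charset constant instead of the name "UTF-8" avoids the checked UnsupportedEncodingException, which is one reason it can be the more convenient form when the charset is known at compile time.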

public class OutputStreamWriter extends Writer


An OutputStreamWriter is a bridge from character streams to byte streams: characters written to it are encoded into bytes using a specified charset. The charset may be specified by name or given explicitly as a Charset object; otherwise the platform's default charset is used. Its constructors connect a character writer to an underlying output stream:

public OutputStreamWriter(OutputStream out)
public OutputStreamWriter(OutputStream out, String encoding) throws UnsupportedEncodingException



The first constructor writes text to the stream using the platform's default encoding. The second uses the encoding named by its second argument. For example, this code attaches an OutputStreamWriter to System.out with the UTF-8 encoding:

OutputStreamWriter osw = new OutputStreamWriter(System.out, "UTF-8");

For best efficiency, consider wrapping the OutputStreamWriter in a BufferedWriter so as to avoid frequent converter invocations. For example:


Writer out = new BufferedWriter(new OutputStreamWriter(System.out,"UTF-8"));
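The encoding direction can be checked the same way. The following is a minimal sketch that writes into an in-memory byte stream rather than System.out, so the resulting bytes can be inspected. The class name Utf8WriteSketch and the helper encode are hypothetical, and Java 7 or later is assumed.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
class Utf8WriteSketch {

    // Encodes the given text as UTF-8 and returns the resulting bytes.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(bytes, StandardCharsets.UTF_8))) {
            out.write(text);
        } // closing the writer flushes buffered characters into the byte stream
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Four characters; the last one (U+00E9) needs two bytes in UTF-8.
        byte[] encoded = encode("caf\u00e9");
        System.out.println(encoded.length); // prints 5
    }
}
```

Note that the bytes only reach the underlying stream once the BufferedWriter is flushed or closed, so forgetting to close it is a common cause of truncated output.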



Example: Copy a file in UTF-8 format


The following code shows how to copy a text file line by line using an explicit encoding, UTF-8:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class CopyFileUTF8Format {

 /**
  * Copies a text file line by line, reading and writing it as UTF-8.
  *
  * @param args
  *            file names passed as command-line arguments
  *            args[0]: source file
  *            args[1]: destination file
  */
 public static void main(String[] args) {

  if (args.length < 2) {
   System.err.println("Usage: java CopyFileUTF8Format <source> <destination>");
   return;
  }

  // try-with-resources (Java 7+) closes both streams, even if an exception
  // is thrown, and avoids calling close() on a reader that was never opened
  try (BufferedReader br = new BufferedReader(
     new InputStreamReader(new FileInputStream(args[0]), "UTF-8"));
    BufferedWriter bw = new BufferedWriter(
     new OutputStreamWriter(new FileOutputStream(args[1]), "UTF-8"))) {

   String cline;
   // readLine() strips the line terminator, so newLine() writes one back
   while ((cline = br.readLine()) != null) {
    bw.write(cline);
    bw.newLine();
   }

  } catch (IOException e) {
   e.printStackTrace();
  }
 }

}
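One consequence of copying line by line is that the output may not be character-for-character identical to the input: readLine() discards the original line terminators, and newLine() writes the platform's line separator after every line, including the last. If an exact copy matters, a chunked character copy is one alternative. The sketch below is an illustration under that assumption, with a hypothetical class name CharCopySketch; it uses in-memory streams, but the same copy method would work with the InputStreamReader and OutputStreamWriter pair from the example above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical class name, used only for this illustration.
class CharCopySketch {

    // Copies all characters from in to out in fixed-size chunks,
    // leaving line terminators exactly as they appear in the input.
    static void copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192];
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter target = new StringWriter();
        // Windows-style terminator and no trailing newline: both are preserved.
        copy(new StringReader("one\r\ntwo"), target);
        System.out.println(target.toString().equals("one\r\ntwo")); // prints true
    }
}
```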
