Friday, 07 August 2009
I ran into a problem today while writing an output file: the file kept starting with the prefix FFFE when viewed in a hexadecimal editor.
This prefix was in my source file and attempts to remove it with String.Replace were fruitless.
The FFFE prefix was only visible in a file and not within the Visual Studio debugging environment.

So I started investigating.

A Unicode text file can start with a byte order mark (BOM), a prefix which indicates the byte order of the encoded text. For UTF-16, the first two bytes of the file are:
FF FE = Little Endian
FE FF = Big Endian
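
A quick way to see these prefixes from code is to ask each .NET encoding for its preamble. This is only a small sketch; the class name BomPreambles and the helper PreambleHex are made up for illustration and are not part of my original code.

```csharp
using System;
using System.Text;

public static class BomPreambles
{
    // Hypothetical helper: returns an encoding's preamble (BOM) bytes as hex, e.g. "FF-FE".
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        Console.WriteLine(PreambleHex(Encoding.Unicode));          // FF-FE (UTF-16 little endian)
        Console.WriteLine(PreambleHex(Encoding.BigEndianUnicode)); // FE-FF (UTF-16 big endian)
        Console.WriteLine(PreambleHex(Encoding.UTF8));             // EF-BB-BF (UTF-8)

        // An encoding constructed with byteOrderMark = false has an empty preamble.
        Console.WriteLine(new UnicodeEncoding(false, false).GetPreamble().Length); // 0
    }
}
```

Note that UTF-8 has its own three-byte prefix, so a BOM is not specific to UTF-16.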

So I attempted to get rid of these bytes by setting the UTF8 encoding param when instantiating my StreamWriter.
No luck.
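
One possible reason, assuming the static Encoding.UTF8 instance is what gets passed in, is that this encoding writes a BOM of its own (EF BB BF). The sketch below compares it with a UTF8Encoding constructed with the BOM switched off. The class name Utf8BomDemo and the Encode helper are hypothetical, and an in-memory stream stands in for the real output file.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8BomDemo
{
    // Hypothetical helper: writes text through a StreamWriter and returns the raw bytes produced.
    public static byte[] Encode(string text, Encoding encoding)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            using (StreamWriter writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        // Encoding.UTF8 emits a three-byte BOM before the text.
        Console.WriteLine(BitConverter.ToString(Encode("A", Encoding.UTF8)));           // EF-BB-BF-41

        // UTF8Encoding(false) emits no BOM.
        Console.WriteLine(BitConverter.ToString(Encode("A", new UTF8Encoding(false)))); // 41
    }
}
```

This only helps if the output is meant to be UTF-8; the output in this post is UTF-16.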

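The solution was to instead define a custom UnicodeEncoding with the byte order mark disabled.
The UnicodeEncoding constructor takes two booleans: the first selects big endian byte order (false means little endian) and the second controls whether a BOM is written. You then pass this custom UnicodeEncoding as the encoding parameter of StreamWriter.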

Here is the code.

using System.IO;
using System.Text;

// open our input file
StreamReader readerEDI = new StreamReader(@"input.txt");

// set up a custom unicode encoding:
// first argument false = little endian, second argument false = do not write a bom
UnicodeEncoding unicode = new UnicodeEncoding(false, false);

// output file stream (FileMode.CreateNew throws an IOException if the file already exists)
Stream filestream = new FileStream(@"output", FileMode.CreateNew);

// instantiate new streamwriter, apply our custom unicode encoding
StreamWriter writerEDI = new StreamWriter(filestream, unicode);

// write to file
writerEDI.Write(readerEDI.ReadToEnd());

// clean up
readerEDI.Close();
writerEDI.Close();
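
To confirm the result without opening a hex editor, the first few bytes of the output file can be checked from code. This is a rough sketch rather than part of the solution above: BomCheck, DetectBom and WriteText are hypothetical names, a temporary file stands in for the real output, and only the UTF-8 and UTF-16 prefixes are checked (a UTF-32 little endian file also starts with FF FE and would be reported as UTF-16 LE here).

```csharp
using System;
using System.IO;
using System.Text;

public static class BomCheck
{
    // Hypothetical helper: reports which BOM, if any, the file at 'path' starts with.
    public static string DetectBom(string path)
    {
        byte[] head = new byte[3];
        int read;
        using (FileStream stream = File.OpenRead(path))
        {
            read = stream.Read(head, 0, head.Length);
        }

        if (read >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF) return "UTF-8";
        if (read >= 2 && head[0] == 0xFF && head[1] == 0xFE) return "UTF-16 LE";
        if (read >= 2 && head[0] == 0xFE && head[1] == 0xFF) return "UTF-16 BE";
        return "none";
    }

    // Hypothetical helper: overwrites 'path' with 'text' using the given encoding.
    public static void WriteText(string path, string text, Encoding encoding)
    {
        using (StreamWriter writer = new StreamWriter(path, false, encoding))
        {
            writer.Write(text);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // The default UTF-16 encoding writes FF FE first.
            WriteText(path, "EDI", Encoding.Unicode);
            Console.WriteLine(DetectBom(path)); // UTF-16 LE

            // The custom encoding from this post writes no prefix.
            WriteText(path, "EDI", new UnicodeEncoding(false, false));
            Console.WriteLine(DetectBom(path)); // none
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```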
