Monday, January 5, 2009
In this article I will explain how to decompress web responses that are compressed with GZip or Deflate compression.

I battled with this problem for five hours, mainly because most of the GZip methods out there on the net are flawed when applied to certain scenarios.
This led me to try nearly every decompression method I could find that might work on GZipped HTTP responses.

Firstly, a quick overview of HTTP compression (feel free to skip this section).
A web server will use HTTP compression if both your browser and the server support it.
Your web client sends an Accept-Encoding header telling the server which encoding types it supports. The web server responds with a Content-Encoding header telling your browser which encoding type it is using.
So if your browser sends an Accept-Encoding of gzip, a web server that supports GZip page compression will respond with content that is GZip compressed.
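
To make the exchange a little more concrete, here is a rough sketch of the choice the server makes. EncodingNegotiation and Choose are hypothetical names made up for illustration (they are not part of the .NET Framework), and the sketch deliberately ignores the ";q=" weights that a real server would take into account.

```csharp
using System;
using System.Collections.Generic;

public static class EncodingNegotiation
{
    // Hypothetical helper: returns the first encoding from the client's
    // Accept-Encoding header that the server also supports, or null when
    // the response should be sent uncompressed.
    // Simplification: ";q=" weights are stripped and otherwise ignored.
    public static string Choose(string acceptEncoding, ICollection<string> serverSupports)
    {
        if (string.IsNullOrEmpty(acceptEncoding))
        {
            return null;
        }

        foreach (string part in acceptEncoding.Split(','))
        {
            // "gzip;q=0.8" -> "gzip"
            string token = part.Split(';')[0].Trim().ToLowerInvariant();
            if (token.Length > 0 && serverSupports.Contains(token))
            {
                return token;
            }
        }

        return null;
    }
}
```

With a client header of "gzip, deflate" and a server that only supports gzip, Choose returns "gzip", which is the value the server would then send back in Content-Encoding.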

Okay, so we understand the idea behind a web server responding with compressed HTML or data. So let's move on to a C# example.
What we will do is emulate a web browser using an instance of System.Net.WebClient, set it to advertise GZip support, and then decompress the response from the web server.
Fun!

            // create web client
            // set headers: User-Agent & Accept-Encoding

            System.Net.WebClient wc = new System.Net.WebClient();
            wc.Headers["User-Agent"] = "Mozilla/4.0"; // You must specify User-Agent type
            wc.Headers["Accept-Encoding"] = "gzip, deflate"; // here we specify that our client supports both GZip and Deflate encoding types



Once we have our client set up, we can go ahead and send our request to the server.


            // request page from web server

            System.IO.StreamReader webReader = new System.IO.StreamReader(wc.OpenRead("http://www.know24.net/blog/"));



We now need to check which encoding type the web-server chose so that we can handle the web-response correctly.


            string data; // will be used to store our uncompressed page content

            // get response header; it is null when the server sent the body uncompressed
            string sResponseHeader = (wc.ResponseHeaders["Content-Encoding"] ?? string.Empty).ToLowerInvariant();

            if (sResponseHeader.Contains("gzip"))
            {
                byte[] b = DecompressGzip(webReader.BaseStream);
                data = wc.Encoding.GetString(b);
            }
            else if (sResponseHeader.Contains("deflate"))
            {
                byte[] b = DecompressDeflate(webReader.BaseStream);
                data = wc.Encoding.GetString(b);
            }
            else
            {
                // uncompressed, standard response (also the fallback for any
                // Content-Encoding value we did not ask for, so data is never left empty)
                data = webReader.ReadToEnd();
            }
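
If you would rather keep this decision in one place, the same branching can be folded into a single helper. This is only a sketch under my own assumptions: ResponseDecoder and Decode are hypothetical names, and instead of going through a byte array it reads the text straight from the decompression stream with a StreamReader, which is a variation on the approach in this article rather than the same thing.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ResponseDecoder
{
    // Hypothetical helper: decodes a response body to text, choosing a
    // decompressor from the Content-Encoding header value (may be null).
    public static string Decode(Stream body, string contentEncoding, Encoding charset)
    {
        string encoding = (contentEncoding ?? string.Empty).ToLowerInvariant();
        Stream source = body;

        if (encoding.Contains("gzip"))
        {
            source = new GZipStream(body, CompressionMode.Decompress);
        }
        else if (encoding.Contains("deflate"))
        {
            source = new DeflateStream(body, CompressionMode.Decompress);
        }
        // any other value is read as-is, i.e. treated as uncompressed

        // Disposing the reader also disposes the wrapped stream(s).
        using (StreamReader reader = new StreamReader(source, charset))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the client from earlier, a call would look something like ResponseDecoder.Decode(wc.OpenRead(url), wc.ResponseHeaders["Content-Encoding"], wc.Encoding), where url is whatever page you are requesting.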



Above you will see I call the DecompressGzip method, which decompresses the GZipped response and returns an array of bytes. I then convert the bytes to a string using the CORRECT character encoding.
What you must do is detect the character encoding of the response and apply it when converting your bytes to a string. Failure to do so will mean death... ok no, seriously, it means possibly wasting time fixing garbled text in the future.
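
One way to detect the character encoding is to look at the charset parameter of the Content-Type response header. The helper below is a minimal sketch of that idea: CharsetHelper and FromContentType are hypothetical names, and it assumes the server actually declares a charset in the header. Some pages only declare it in an HTML meta tag, which this does not handle.

```csharp
using System;
using System.Text;

public static class CharsetHelper
{
    // Hypothetical helper: extracts the charset from a Content-Type value
    // such as "text/html; charset=utf-8". Returns the fallback when the
    // header is missing, has no charset, or names an unknown encoding.
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType))
        {
            return fallback;
        }

        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                string name = p.Substring("charset=".Length).Trim().Trim('"', '\'');
                try
                {
                    return Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    // unknown or empty charset name
                    return fallback;
                }
            }
        }

        return fallback;
    }
}
```

In the example above you could then call CharsetHelper.FromContentType(wc.ResponseHeaders["Content-Type"], wc.Encoding) and use the result's GetString method on the decompressed bytes.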

Now for the long-awaited method, DecompressGzip.


        // requires: using System.IO; using System.IO.Compression;
        private static byte[] DecompressGzip(Stream streamInput)
        {
            using (MemoryStream streamOutput = new MemoryStream())
            {
                byte[] readBuffer = new byte[4096];

                // read decompressed bytes from the gzip stream and write them to the output stream
                // (corrupt input throws here instead of silently returning partial data)
                using (GZipStream streamGZip = new GZipStream(streamInput, CompressionMode.Decompress))
                {
                    int i;
                    while ((i = streamGZip.Read(readBuffer, 0, readBuffer.Length)) != 0)
                    {
                        streamOutput.Write(readBuffer, 0, i);
                    }
                }

                // copy the uncompressed data from the output stream into a byte array
                return streamOutput.ToArray();
            }
        }



This method wraps the web response's base stream in a GZipStream, which outputs the uncompressed data, and we write that data to a MemoryStream. From the memory stream we copy the data into a byte array and return this to the caller.
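
If you are on .NET 4 or later, Stream.CopyTo can replace the manual read/write loop. The sketch below shows that variant together with a small Compress method that stands in for the web server, so you can check the round trip without touching the network. GzipRoundTrip and its method names are my own, purely for illustration.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipRoundTrip
{
    // Stand-in for the server: gzip-compresses a byte array.
    public static byte[] Compress(byte[] plain)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(plain, 0, plain.Length);
            } // disposing the GZipStream flushes the remaining compressed data

            // ToArray still works after the MemoryStream has been closed
            return output.ToArray();
        }
    }

    // Same job as DecompressGzip, with Stream.CopyTo doing the loop.
    public static byte[] Decompress(Stream input)
    {
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Compressing some bytes with Compress, wrapping the result in a MemoryStream, and passing it to Decompress should give you back the bytes you started with.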

I've left out the DecompressDeflate method, as you can simply copy DecompressGzip and replace GZipStream with DeflateStream, which will then handle that encoding type.
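
For completeness, here is roughly what that renamed method could look like. The DeflateHelper wrapper class is a hypothetical name and only there to make the snippet self-contained. One caveat worth knowing: DeflateStream reads raw deflate data, while the HTTP specification describes the "deflate" coding as deflate data inside a zlib wrapper, so responses from some servers may need different handling (newer versions of .NET include a ZLibStream class for that format).

```csharp
using System.IO;
using System.IO.Compression;

public static class DeflateHelper
{
    // Mirrors DecompressGzip, with DeflateStream in place of GZipStream.
    // Assumption: the input is raw deflate data without a zlib header.
    public static byte[] DecompressDeflate(Stream streamInput)
    {
        using (MemoryStream streamOutput = new MemoryStream())
        {
            byte[] readBuffer = new byte[4096];

            using (DeflateStream streamDeflate = new DeflateStream(streamInput, CompressionMode.Decompress))
            {
                int i;
                while ((i = streamDeflate.Read(readBuffer, 0, readBuffer.Length)) != 0)
                {
                    streamOutput.Write(readBuffer, 0, i);
                }
            }

            return streamOutput.ToArray();
        }
    }
}
```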

That’s that! If you get stuck or can offer any suggestions to improve the above code, feel free to leave comments.