Raw Stream Has Data, Deflate Returns Zero Bytes

https://stackoverflow.com/questions/5047859

15-11-2019

Question

I'm reading data (an adCenter report, as it happens), which is supposed to be zipped. Reading the contents with an ordinary stream, I get a couple thousand bytes of gibberish, so this seems reasonable. So I feed the stream to DeflateStream.

First, it reports "Block length does not match with its complement." A brief search suggests that there is a two-byte prefix, and indeed if I call ReadByte() twice before opening DeflateStream, the exception goes away.
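The two-byte prefix that search results describe is the zlib header from RFC 1950. Its first byte, CMF, has compression method 8 (deflate) in its low nibble and a window size of at most 7 in its high nibble. Its second byte, FLG, is chosen so that CMF * 256 + FLG is a multiple of 31. Skipping those two bytes before DeflateStream is only correct when the data really is zlib-wrapped. A ZIP archive starts with "PK" (0x50 0x4B), and that fails the check. The sketch below is a C# top-level program and assumes .NET 6 or later, because it uses ZLibStream to produce genuine zlib test data. IsZlibHeader and InflateZlib are hypothetical helper names, and the TSV text is made up.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: true if two bytes form a valid zlib (RFC 1950) header.
static bool IsZlibHeader(byte cmf, byte flg) =>
    (cmf & 0x0F) == 8                 // compression method 8 = deflate
    && (cmf >> 4) <= 7                // window size of at most 32 KB
    && ((cmf << 8) | flg) % 31 == 0;  // header check bits

// Hypothetical helper: validates and skips the 2-byte zlib header, then inflates the
// raw deflate data with DeflateStream. The 4-byte Adler-32 trailer is left unread.
static byte[] InflateZlib(Stream input)
{
    int cmf = input.ReadByte(), flg = input.ReadByte();
    if (cmf < 0 || flg < 0 || !IsZlibHeader((byte)cmf, (byte)flg))
        throw new InvalidDataException("Not a zlib stream; skipping two bytes would be wrong.");
    if ((flg & 0x20) != 0)
        throw new NotSupportedException("Preset dictionaries (FDICT) are not handled.");
    using var deflate = new DeflateStream(input, CompressionMode.Decompress);
    using var output = new MemoryStream();
    deflate.CopyTo(output);
    return output.ToArray();
}

// Round trip: create real zlib data with ZLibStream (.NET 6+), then inflate it by hand.
byte[] original = Encoding.ASCII.GetBytes("AccountName\tClicks\nContoso\t42\n");
var zlibData = new MemoryStream();
using (var z = new ZLibStream(zlibData, CompressionLevel.Optimal, leaveOpen: true))
    z.Write(original, 0, original.Length);
zlibData.Position = 0;

byte[] roundTripped = InflateZlib(zlibData);
Console.WriteLine(Encoding.ASCII.GetString(roundTripped));
```

If IsZlibHeader rejects the first two bytes of a download, DeflateStream is the wrong tool for that data. Dropping the prefix only hides the mismatch.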

However, DeflateStream now returns nothing at all. I've spent most of the afternoon chasing leads on this, with no luck. Help me, StackOverflow, you're my only hope! Can anyone tell me what I'm missing?

Here's the code. Naturally I only enabled one of the two commented blocks at a time when testing.

_results = new List<string[]>();
using (Stream compressed = response.GetResponseStream())
  {
  // Skip the zlib prefix, which conflicts with the deflate specification
  compressed.ReadByte();  compressed.ReadByte();

  // Reports reading 3,000-odd bytes, followed by random characters
  /*byte[]  buffer    = new byte[4096];
  int     bytesRead = compressed.Read(buffer, 0, 4096);
  Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
  string  content   = Encoding.ASCII.GetString(buffer, 0, bytesRead);
  Console.WriteLine(content);*/

  using (DeflateStream decompressed = new DeflateStream(compressed, CompressionMode.Decompress))
    {
    // Reports reading 0 bytes, and no output
    /*byte[]  buffer    = new byte[4096];
    int     bytesRead = decompressed.Read(buffer, 0, 4096);
    Console.WriteLine("Read {0} bytes.", bytesRead.ToString("#,##0"));
    string  content   = Encoding.ASCII.GetString(buffer, 0, bytesRead);
    Console.WriteLine(content);*/

    using (StreamReader reader = new StreamReader(decompressed))
      while (reader.EndOfStream == false)
        _results.Add(reader.ReadLine().Split('\t'));
    }
  }

As you can probably guess from the last line, the unzipped content should be tab-delimited text (TDT).

Just for fun, I tried decompressing with GZipStream, but it reports that the magic number is not correct. Microsoft's documentation just says: "The downloaded report is compressed by using zip compression. You must unzip the report before you can use its contents."
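GZipStream's complaint about the magic number is a useful clue. gzip data must begin with 0x1F 0x8B. A ZIP archive begins with the local file header signature "PK\x03\x04" (0x50 0x4B 0x03 0x04). A zlib stream begins with a two-byte header whose value is a multiple of 31. Checking the first few bytes therefore tells you which decoder applies. SniffFormat below is a hypothetical helper written as a C# top-level program, not part of any library.

```csharp
using System;

// Hypothetical helper: classifies a buffer by its leading signature bytes.
static string SniffFormat(byte[] b)
{
    if (b.Length >= 4 && b[0] == 0x50 && b[1] == 0x4B && b[2] == 0x03 && b[3] == 0x04)
        return "zip";    // ZIP local file header: needs a ZIP reader, not DeflateStream
    if (b.Length >= 2 && b[0] == 0x1F && b[1] == 0x8B)
        return "gzip";   // gzip header: GZipStream applies
    if (b.Length >= 2 && (b[0] & 0x0F) == 8 && (b[0] >> 4) <= 7 && ((b[0] << 8) | b[1]) % 31 == 0)
        return "zlib";   // zlib header: skip 2 bytes, then DeflateStream
    return "unknown";    // raw deflate, some other format, or not compressed at all
}

Console.WriteLine(SniffFormat(new byte[] { 0x50, 0x4B, 0x03, 0x04 }));  // zip
Console.WriteLine(SniffFormat(new byte[] { 0x1F, 0x8B, 0x08, 0x00 }));  // gzip
Console.WriteLine(SniffFormat(new byte[] { 0x78, 0x9C }));              // zlib
```

Raw deflate has no signature at all, so "unknown" is the best a header check can report for it.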

Here's the code that finally worked. I had to save the content out to a file and read it back in. This does not seem reasonable, but for the small quantities of data I'm working with, it's acceptable. I'll take it!

WebRequest  request = WebRequest.Create(reportURL);

_results = new List<string[]>();
using (WebResponse response   = request.GetResponse())
using (Stream      compressed = response.GetResponseStream())
  {
  // Save the content to a temporary location
  string  zipFilePath = @"\\Server\Folder\adCenter\Temp.zip";
  using (FileStream file = File.Create(zipFilePath))
    compressed.CopyTo(file);

  // Get the only file from the temporary zip (ZipFile/ZipEntry appear to be DotNetZip's Ionic.Zip types)
  using (ZipFile zipFile = ZipFile.Read(zipFilePath))
    {
    if (zipFile.Entries.Count != 1)  throw new ApplicationException("Found " + zipFile.Entries.Count.ToString("#,##0") + " entries in the report; expected 1.");
    ZipEntry  report = zipFile[0];

    // Extract the data
    using (MemoryStream decompressed = new MemoryStream())
      {
      report.Extract(decompressed);
      decompressed.Position = 0;  // Extract() leaves the position at the end, so rewind before reading
      using (StreamReader reader = new StreamReader(decompressed))
        while (reader.EndOfStream == false)
          _results.Add(reader.ReadLine().Split('\t'));
      }
    }
  }

Solution

You will find that DeflateStream is very limited in what it will decompress: it handles only raw deflate data (RFC 1951), not the container formats built around it. In fact, if you are expecting entire files, it will be of no use at all. There are hundreds of (mostly small) variations of ZIP files, and DeflateStream will get along with only two or three of them.
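To see why a whole ZIP file defeats DeflateStream, note that each entry's raw deflate data sits behind a local file header. That header is 30 fixed bytes, followed by the file name and an extra field, and further headers come after the data. The sketch below strips only the first entry's header by hand and hands the rest to DeflateStream. It is for illustration, not a replacement for a ZIP library. It assumes an unencrypted first entry that uses method 8 (deflate). InflateFirstZipEntry is a hypothetical name, the archive is made up and built with System.IO.Compression.ZipArchive, and the code is a C# top-level program that assumes .NET Core 2.1 or later for BinaryPrimitives.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, for illustration only: inflates the FIRST entry of a ZIP archive by
// skipping its local file header and passing the raw deflate data to DeflateStream.
// Assumes the entry is unencrypted and uses method 8 (deflate).
static byte[] InflateFirstZipEntry(byte[] zip)
{
    ReadOnlySpan<byte> header = zip;
    if (zip.Length < 30 || BinaryPrimitives.ReadUInt32LittleEndian(header) != 0x04034B50)  // "PK\x03\x04"
        throw new InvalidDataException("No ZIP local file header at offset 0.");
    ushort flags  = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(6));
    ushort method = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(8));
    if ((flags & 1) != 0 || method != 8)
        throw new NotSupportedException("Only unencrypted, deflate-compressed entries are handled.");
    int nameLength  = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(26));
    int extraLength = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(28));

    // Compressed data follows the 30-byte fixed header, the file name and the extra field.
    int dataStart = 30 + nameLength + extraLength;
    using var deflate = new DeflateStream(new MemoryStream(zip, dataStart, zip.Length - dataStart),
                                          CompressionMode.Decompress);
    using var output = new MemoryStream();
    deflate.CopyTo(output);  // deflate data marks its own end, so the ZIP records after it are not decoded
    return output.ToArray();
}

// Usage: a made-up one-entry archive built with ZipArchive, then inflated by hand.
var zipStream = new MemoryStream();
using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: true))
using (var entryWriter = new StreamWriter(archive.CreateEntry("report.tsv").Open()))
    entryWriter.Write("AccountName\tClicks\nContoso\t42\n");

byte[] inflated = InflateFirstZipEntry(zipStream.ToArray());
Console.WriteLine(Encoding.UTF8.GetString(inflated));
```

Because the sketch never reads the size fields, it also tolerates entries whose sizes are deferred to a data descriptor. It still ignores every other ZIP feature, which is the answer's point.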

The best approach is likely to use a dedicated library for reading ZIP files/streams, such as DotNetZip or SharpZipLib (somewhat unmaintained).
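Since .NET Framework 4.5, which shipped after this question was asked, the framework itself includes System.IO.Compression.ZipArchive. It can read the report straight from the response without a third-party library or a temporary file. The sketch below is a C# top-level program that assumes .NET 6 or later for the syntax. ReadTsvReport is a hypothetical name, and an in-memory archive with made-up content stands in for response.GetResponseStream().

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: reads a single-entry ZIP of tab-delimited text from any stream
// (for example an HTTP response stream) without writing a temporary file.
static List<string[]> ReadTsvReport(Stream compressed)
{
    // Buffer the download first: a ZIP reader seeks to the central directory at the end
    // of the archive, and network streams are not seekable.
    var buffer = new MemoryStream();
    compressed.CopyTo(buffer);
    buffer.Position = 0;

    using var archive = new ZipArchive(buffer, ZipArchiveMode.Read);
    if (archive.Entries.Count != 1)
        throw new InvalidDataException($"Found {archive.Entries.Count} entries in the report; expected 1.");

    var results = new List<string[]>();
    using var reader = new StreamReader(archive.Entries[0].Open());
    string? line;
    while ((line = reader.ReadLine()) != null)
        results.Add(line.Split('\t'));
    return results;
}

// Usage: build a made-up one-entry report in memory as a stand-in for the response stream.
var demoZip = new MemoryStream();
using (var demoArchive = new ZipArchive(demoZip, ZipArchiveMode.Create, leaveOpen: true))
using (var demoWriter = new StreamWriter(demoArchive.CreateEntry("report.tsv").Open()))
    demoWriter.Write("AccountName\tClicks\nContoso\t42\n");
demoZip.Position = 0;

List<string[]> rows = ReadTsvReport(demoZip);
Console.WriteLine($"{rows.Count} rows; first header: {rows[0][0]}");
```

Copying into a MemoryStream keeps the whole report in memory. That suits the small reports described in the question, but a file-backed buffer would be the safer choice for large downloads.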

OTHER TIPS

You could write the stream to a file and try my tool Precomp on it. If you use it like this:

precomp -c- -v [name of input file]

any ZIP/gZip stream(s) inside the file will be detected and some verbose information will be reported (position and length of the stream). Additionally, if they can be decompressed and recompressed bit-to-bit identical, the output file will contain the decompressed stream(s).

Precomp detects ZIP/gZip (and some other) streams anywhere in the file, so you won't have to worry about header bytes or garbage at the beginning of the file.
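The idea of finding compressed streams anywhere in a file, rather than only at offset 0, can be illustrated with a simple signature scan. The sketch below is not Precomp's algorithm. It only looks for the ZIP local file header ("PK\x03\x04") and the gzip header (0x1F 0x8B with method byte 8), and every hit is just a candidate that still needs a trial decompression to confirm. FindCandidateStreams and the sample bytes are hypothetical, and the code is a C# top-level program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lists offsets where a ZIP local file header or a gzip header begins.
// A match only means the signature bytes are present there, not that a valid stream follows.
static List<(int Offset, string Kind)> FindCandidateStreams(byte[] data)
{
    var hits = new List<(int Offset, string Kind)>();
    for (int i = 0; i + 1 < data.Length; i++)
    {
        if (i + 3 < data.Length && data[i] == 0x50 && data[i + 1] == 0x4B && data[i + 2] == 0x03 && data[i + 3] == 0x04)
            hits.Add((i, "zip"));
        else if (i + 2 < data.Length && data[i] == 0x1F && data[i + 1] == 0x8B && data[i + 2] == 0x08)
            hits.Add((i, "gzip"));
    }
    return hits;
}

// Made-up buffer: two junk bytes, a ZIP header at offset 2, a gzip header at offset 8.
byte[] sample = { 0x00, 0x11, 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x1F, 0x8B, 0x08, 0x00 };
foreach (var (offset, kind) in FindCandidateStreams(sample))
    Console.WriteLine($"{kind} header candidate at offset {offset}");
```

Raw deflate streams have no signature to scan for. Finding them requires attempting decompression at many offsets, which is why a brute-force mode is slow and prone to false positives.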

If it doesn't detect a stream this way, try adding -slow, which detects deflate streams even if they don't have a ZIP/gZip header. If that fails, you can try -brute, which detects even deflate streams that lack the two-byte header, but this will be extremely slow and can produce false positives.

After that, you'll know whether there is a (valid) deflate stream in the file, and if so, the additional information should help you decompress other reports correctly using zlib decompression routines or similar.

Licensed under: CC-BY-SA with attribution
