Channel: sharpcompress Discussions Rss Feed

New Post: Problem with Unknown Header (and possible fix)

I thought I'd give the library a real workout - a 1.8GB .ZIPX file which is > 3GB when uncompressed and contains 100+ nested .ZIP files.

I hit a problem with a single 'unknown' header. By adding a try/catch, I could recursively scan the whole .ZIPX file and get nearly the same counts as with the DotNetZip code I am already using. (The difference was 3 files from a nested .ZIP entry.)

Here is the Unit Test so you can see what I am doing:
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

using Framework.Streams;

using NUnit.Framework;

using SharpCompress.Archive;

namespace Framework.UnitTest
{
    [TestFixture]
    public class SharpCompressTests
    {
        [Test, Ignore]
        public void ParseBigZipFile()
        {
            const string Filename = @"F:\B.Zip";

            ProcessZipFile(Filename);

            Debug.WriteLine("FileCount={0:n0}", FileCount);
            Debug.WriteLine("DirectoryCount={0:n0}", DirectoryCount);
            Debug.WriteLine("ZipFileCount={0:n0}", ZipFileCount);
        }

        static int FileCount;
        static int DirectoryCount;
        static int ZipFileCount;

        static Stream ExtractToStream(IArchiveEntry entry)
        {
            var result = TemporaryStream.Create(entry.Size);

            entry.WriteTo(result);

            result.Position = 0;

            return result;
        }

        static void ProcessZipFile(string zipFilename, Stream zipStream = null)
        {
            using (var archive = zipStream == null ? ArchiveFactory.Open(zipFilename) : ArchiveFactory.Open(zipStream))
            {
                foreach (var entry in archive.Entries.OrderBy(e => e.FilePath))
                {
                    if (entry.IsDirectory)
                    {
                        DirectoryCount++;
                        continue;
                    }

                    var extension = Path.GetExtension(entry.FilePath);

                    if (string.Compare(extension, ".zip", StringComparison.OrdinalIgnoreCase) == 0)
                    {
                        ZipFileCount++;

                        using (var zipEntryStream = ExtractToStream(entry))
                        {
                            ProcessZipFile(Path.Combine(zipFilename, entry.FilePath), zipEntryStream);
                        }
                    }
                    else
                    {
                        FileCount++;
                    }
                }
            }
        }
    }
}
The problem is a 0x06064b50 signature in my .ZIPX file.
SharpCompress doesn't recognize it and so throws the Unknown Header exception.
However, I found that if I just ignore this header (by adding a switch case for that signature and creating a null header), then all the remaining entries are read correctly and I get exactly the same results as DotNetZip, i.e. I can now see those 3 missing files.
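
To confirm which of these signatures an archive actually contains, a scan along the following lines could be used. This is only a minimal sketch: `SignatureScanner` and `FindOffsets` are hypothetical names, not part of SharpCompress, and because it matches raw bytes it can also report false positives inside compressed data.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical diagnostic helper, not part of SharpCompress.
public static class SignatureScanner
{
    // Returns every offset at which the 4-byte signature occurs.
    // ZIP stores signatures little-endian, so 0x06064b50 appears
    // on disk as the bytes 50 4B 06 06 ("PK" followed by 06 06).
    public static List<long> FindOffsets(Stream stream, uint signature)
    {
        var offsets = new List<long>();
        uint window = 0;   // the last four bytes read, as a little-endian uint
        long position = 0; // number of bytes consumed so far
        int b;

        while ((b = stream.ReadByte()) != -1)
        {
            // Shift the oldest byte out; the newest becomes the most significant.
            window = (window >> 8) | ((uint)b << 24);
            position++;

            if (position >= 4 && window == signature)
            {
                offsets.Add(position - 4);
            }
        }

        return offsets;
    }
}
```

For a large file, wrapping the `FileStream` in a `BufferedStream` keeps the byte-at-a-time reads cheap, e.g. `SignatureScanner.FindOffsets(new BufferedStream(File.OpenRead(path)), 0x06064b50)`.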

I had a look around in the code and on the internet and found the following:-

There is some ambiguity over that signature:
This page http://www.sxlist.com/TECHREF/language/delphi/swag/ARCHIVES0022.html written by Phil Katz says 0x06064650 is END OF CENTRAL DIR STRUCTURE (and I found this on a couple of other pages)

However, this page http://fileanalysis.net/zip/ says it is "zip64 end of central directory record"

This contradicts the constant in your code:
private const uint ZIP64_END_OF_CENTRAL_DIRECTORY = 0x07064b50;
That value is specifically searched for, and it throws the same Unknown Header exception.

The latter webpage says 0x07064b50 is actually the "zip64 end of central directory locator", and their example shows it directly in front of the 0x06064b50 entry.

So here is my interpretation:-
1) My sample ZIPX file has the 0x0606 header but not the 0x0706 header. It was created using WinZip Pro, BTW. Therefore I am guessing that the 0x0706 Locator is either optional or can be omitted when the next header is a zip64 End of Central Directory record.
2) I think your naming of 0x0706 is incorrect: it should be Zip64EndOfCentralDirectory__Locator__Signature, not Record. This is confirmed by the DotNetZip source code.
3) The Zip64xxx headers can be ignored until zip64 support is added.
4) Ignoring them is preferable to throwing an unknown header exception, because otherwise the entire containing zip may be unreadable. In the sample code above, you can see I'm using Linq to order the Entries by their FilePath. Until I skipped these zip64 headers, I could not use this code because the exception blocked access to all of the entries.
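
The naming discussed in the points above can be summarised as a small lookup. This is a sketch only: `ZipSignatureNames` is a hypothetical helper rather than SharpCompress code, and the names follow the PKWARE APPNOTE as far as I know it.

```csharp
using System.Collections.Generic;

// Hypothetical lookup table, not part of SharpCompress.
public static class ZipSignatureNames
{
    private static readonly Dictionary<uint, string> Names = new Dictionary<uint, string>
    {
        { 0x04034b50, "local file header" },
        { 0x02014b50, "central directory file header" },
        { 0x06054b50, "end of central directory record" },
        { 0x06064b50, "zip64 end of central directory record" },
        { 0x07064b50, "zip64 end of central directory locator" },
    };

    // Returns a readable name for a header signature, or "unknown".
    public static string Describe(uint signature)
    {
        string name;
        return Names.TryGetValue(signature, out name) ? name : "unknown";
    }
}
```

Logging `ZipSignatureNames.Describe(signature)` just before the Unknown Header exception is thrown would make it obvious which record type was hit.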

The only changes I needed were these case statements (which replace the existing case ZIP64_END_OF_CENTRAL_DIRECTORY:) and an IgnoreHeader class:
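                // Both zip64 end-of-central-directory signatures are skipped rather than parsed.
                case 0x07064b50:
                case 0x06064b50:
                    {
                        // No matching ZipHeaderType exists, so -1 marks the header as ignored.
                        var entry = new IgnoreHeader((ZipHeaderType) (-1));
                        entry.Read(reader);
                        return entry;
                    }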

    internal class IgnoreHeader: ZipHeader
    {
        public IgnoreHeader(ZipHeaderType type): base(type)
        {}

        internal override void Read(BinaryReader reader)
        {
            // Intentionally empty: the header contents are not parsed.
        }

        internal override void Write(BinaryWriter writer)
        {
            throw new System.NotImplementedException();
        }
    }
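
The empty Read above leaves the reader positioned just after the signature. If it turns out to be necessary to consume the record bodies as well, something like the following might work. It is a sketch under assumptions: `Zip64Skipper` is a hypothetical helper, not SharpCompress code, and the layouts are taken from the PKWARE APPNOTE (the zip64 record carries an 8-byte size of the data that follows it; the locator has a fixed 16 bytes after its signature).

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of SharpCompress.
public static class Zip64Skipper
{
    public const uint Zip64EndOfCentralDirectoryRecord = 0x06064b50;
    public const uint Zip64EndOfCentralDirectoryLocator = 0x07064b50;

    // Call after the 4-byte signature has been read.
    // Returns true if the signature was a zip64 header and its body was consumed.
    public static bool TrySkipBody(uint signature, BinaryReader reader)
    {
        switch (signature)
        {
            case Zip64EndOfCentralDirectoryRecord:
                // The 8-byte size field counts the bytes that follow it.
                ulong size = reader.ReadUInt64();
                Skip(reader, (long)size);
                return true;

            case Zip64EndOfCentralDirectoryLocator:
                // Disk number (4) + record offset (8) + total disks (4).
                Skip(reader, 16);
                return true;

            default:
                return false;
        }
    }

    // Reads and discards count bytes, so it also works on non-seekable streams.
    private static void Skip(BinaryReader reader, long count)
    {
        var buffer = new byte[4096];
        while (count > 0)
        {
            int read = reader.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read == 0)
            {
                throw new EndOfStreamException();
            }
            count -= read;
        }
    }
}
```

Reading and discarding, rather than seeking, is a deliberate choice here so the same code would work when the archive is read from a forward-only stream.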
