Channel: sharpcompress Discussions Rss Feed
Viewing all 239 articles

New Post: ZipX files

adamhathcock wrote:
2) Yes. The idea with streaming is that everything is done on the fly in a forward-only stream of bytes. You ought to be able to chain the output of a current entry on a ZipReader to another ZipReader and have it work.
Now that I've read a bit more about the structure of ZIP files, I am under the impression that the directory of entries is stored at the end of the stream.
How can the chaining you describe work with a forward-only stream of bytes?

cheers
Simon

New Post: ZipX files

simmotech wrote:
Now that I've read a bit more about the structure of ZIP files, I am under the impression that the directory of entries is stored at the end of the stream.
How can the chaining you describe work with a forward-only stream of bytes?
Zip does have a central directory at the end. But each file also has a local header marking its start, plus an optional signature after its data to mark where it ends and record its size. When I write a zip file with the streaming write, it uses this trailing end-of-file header.

For reading, I just look for the local file header signature to find the start of a file. The directory is always there at the end with extra info, but it isn't necessarily required to read the files in a streaming manner.
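
The streaming read described above can be sketched by parsing the fixed part of a local file header directly (layout per APPNOTE.TXT section 4.3.7). The class and method names here are hypothetical, not SharpCompress internals:

```csharp
using System.IO;

static class LocalHeader
{
    // ZIP local file header signature, per APPNOTE.TXT section 4.3.7.
    const uint LocalFileHeaderSignature = 0x04034b50;

    // Reads the fixed 30-byte part of a local file header from a
    // forward-only stream and skips the name and extra field. Returns the
    // compressed size, or -1 when bit 3 of the flags is set and the size
    // is deferred to a trailing data descriptor.
    public static long ReadCompressedSize(Stream stream)
    {
        var reader = new BinaryReader(stream); // ZIP is little-endian, as is BinaryReader
        if (reader.ReadUInt32() != LocalFileHeaderSignature)
            throw new InvalidDataException("Not positioned at a local file header.");

        reader.ReadUInt16();                  // version needed to extract
        ushort flags = reader.ReadUInt16();   // general purpose bit flags
        reader.ReadUInt16();                  // compression method
        reader.ReadUInt32();                  // last mod time + date
        reader.ReadUInt32();                  // CRC-32
        uint compressedSize = reader.ReadUInt32();
        reader.ReadUInt32();                  // uncompressed size
        ushort nameLength = reader.ReadUInt16();
        ushort extraLength = reader.ReadUInt16();
        reader.ReadBytes(nameLength + extraLength); // skip name + extra field

        bool hasDataDescriptor = (flags & 0x0008) != 0;
        return hasDataDescriptor ? -1 : compressedSize;
    }
}
```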

New Post: Problem with Unknown Header (and possible fix)

I think you're on to the fix. Looks like I misnamed and forgot to check for certain headers. The Zip64 headers should just be ignored.

I use this for the zip specification: http://www.pkware.com/documents/casestudies/APPNOTE.TXT — it has notes for both Zip64 and regular Zip.

If you could submit a patch, that would be great. I may just give you source access :)
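
Ignoring Zip64 headers is straightforward because ZIP extra-field records are self-describing — each is an (id, size, data) triple per APPNOTE.TXT section 4.5 — so an unknown record can be stepped over by its declared size. A hypothetical sketch (this helper is mine, not the library's):

```csharp
using System;

static class ExtraField
{
    // Walks a local-header extra field: a sequence of (id, size, data)
    // records per APPNOTE.TXT section 4.5. Records with unrecognized ids,
    // including Zip64's 0x0001 when it isn't needed, can simply be skipped
    // by their declared payload size.
    public static bool ContainsId(byte[] extra, ushort wantedId)
    {
        int pos = 0;
        while (pos + 4 <= extra.Length)
        {
            ushort id = BitConverter.ToUInt16(extra, pos);
            ushort size = BitConverter.ToUInt16(extra, pos + 2);
            if (id == wantedId) return true;
            pos += 4 + size; // step over this record's header and payload
        }
        return false;
    }
}
```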

New Post: ZipX files

OK, I see how a forward-only stream just uses the local headers rather than the trailing full directory.

I had a look at the AbstractReader.Skip() code, which appears to open the compressed stream and read through it to get past the data and on to the next local header.

Why does it do this when the current local header holds the compressed size in bytes?
I know the stream is not seekable, but surely it just needs a way to read and discard that number of bytes from the 'current' stream rather than letting the decompression stream do it.
This approach would have two big benefits:
1) It should speed up skipped entries immensely.
2) It would allow entries with unsupported compression types to be ignored (I have a Zip64-compressed file nested in the ZIPX file somewhere which is causing the whole thing to bomb).
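
The raw skip suggested above boils down to reading and discarding exactly the compressed-size bytes from the underlying stream instead of decompressing them. A minimal sketch, with an illustrative helper name:

```csharp
using System;
using System.IO;

static class StreamSkip
{
    // Discards exactly 'count' bytes from a non-seekable, forward-only
    // stream by reading into a scratch buffer, instead of running the data
    // through a decompressor just to advance past the entry.
    public static void Skip(Stream stream, long count)
    {
        var buffer = new byte[81920];
        while (count > 0)
        {
            int read = stream.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read == 0)
                throw new EndOfStreamException("Stream ended before skip completed.");
            count -= read;
        }
    }
}
```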

Cheers
Simon

New Post: ZipX files

Skip could be optimized for Zip as you said. I just haven't done it.

Not all formats (at least if I'm remembering correctly) can necessarily do that, so I only have the generic unoptimized solution.

New Post: ZipX files

Not as easy as it looks!

At the point where I wanted to try a real skip, I have this level of stream nesting:

VolumeStream
RewindableStream
NonDisposingStream
DeflateStream
ZLibBaseStream
ReadOnlySubStream
NonDisposingStream
FileStream

New Post: ZipX files

Might have to expose a raw stream property or method that sits next to the entry stream and won't try to decompress anything.

I may have been paranoid and put too many NonDisposingStreams in places :)

New Post: ZipX files

adamhathcock wrote:
I may have been paranoid and put too many NonDisposingStreams in places :)
ZipHeaderFactory might be one such place. It tries to reuse an existing RewindableStream, but if that is already wrapped in a NonDisposingStream then another RewindableStream ends up covering the whole lot.
I still can't find a good place to manually skip the source stream. Very frustrating!

New Post: ZipX files

I managed to work out a proof-of-concept hack to manually skip the compressed bytes.
The hack doesn't allow multi-level nesting of zips (I don't know how to reuse the existing forward-only stream for that... yet), so I changed my archive-based tests to match.

Using forward-only streams is ~30% quicker than exporting/parsing nested zip files: 12.63s compared to 17.96s for archive (and 24.15s for a DotNetZip-based archive).
This is scanning my large .zipx test file and counting 145,903 files; 16,015 directories; 97 zip files.

This looks very promising!

I love optimizing code but I only need read access and only really for zip files (though support for other formats would be a major bonus).

I have three other major projects on the go at the moment, and this code is a bit too large and complicated for me to review and make suggestions on, though I am happy to help with the forward-only-streaming zip side of things where I can.
I appreciate you are also busy, so I won't post any more; I'll wait until you contact me if I can be of help.

New Post: ZipX files

I managed to get the multi-level nesting of .Zip files to work.
One problem I had was a file that was marked as Encrypted for some reason and was throwing an exception because no password was supplied.
1) I doubt it should have been marked as encrypted since it was an .rc file of just 47 bytes
2) I don't think an entry should throw an exception just because either the encryption method is not supported or no password was supplied; that should be deferred until the user actually tries to read it. And maybe have a flag on the header that says whether the entry is accessible or not.

I also found some simple optimization tweaks. For example, RewindableStream doesn't need to create a new MemoryStream on each reset; SetLength(0) will do.
Also, since Seek is now important, it can reuse a byte[] field rather than create one each time.
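
The two tweaks above can be sketched like this; the class and member names are illustrative, not the actual RewindableStream internals:

```csharp
using System.IO;

// Sketch of the suggested allocation tweaks: clear and reuse one
// MemoryStream instead of allocating a fresh one on every reset, and keep
// a scratch byte[] as a field instead of allocating per call.
class ReusableBuffer
{
    private readonly MemoryStream buffer = new MemoryStream();
    private byte[] scratch = new byte[4096];

    public void Reset()
    {
        buffer.SetLength(0);   // empties the stream but keeps its capacity
        buffer.Position = 0;
    }

    public byte[] Scratch(int minSize)
    {
        if (scratch.Length < minSize)
            scratch = new byte[minSize]; // grow only when needed
        return scratch;                  // reused across calls
    }
}
```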

152,240 files in 16,972 folders over 103 zip files nested 3 levels deep now takes 15.1s for archive or 12.3s for forward-only streaming.

New Post: ZipX files

Sounds like you figured out some good stuff :)

What is the easiest way for me to get your changes?

I have been thinking about moving this project to github as it is much easier to be collaborative with git.

New Post: ZipX files

Well my changes were just a proof of concept, not releasable code.
For example, I just commented out the first section in ZipHeaderFactory.LoadHeader so it would skip throwing an exception for an unsupported encrypted file.

I can zip up my current version so you can diff it against the original and see what changed (I use Beyond Compare for exactly that) and decide what, if anything, you want to transfer across.

If you move across to GitHub, then I'm happy to help where I can, but work commitments mean I can't be a full-time contributor.

Cheers
Simon

PS My email address is simon.hewitt@simmotech.co.uk if you prefer to email directly rather than in this discussion forum.

New Post: WinRT / Metro support

Hi,

I built the Windows Phone 7 project and included the DLL in my Windows Store app, but I get an API error from the Windows App Certification Kit (WACK):
API System.String.Format(System.String,System.Object,System.Object) in MSCORLIB, PUBLICKEYTOKEN=7CEC85D7BEA7798E is not supported for this application type. SharpCompress.WP7.dll calls this API.
Can somebody share a sample using SharpCompress in a WinRT app which passes certification? Or could you please give some more advice on using SharpCompress in WinRT?

Thx,

Théau

New Post: WinRT / Metro support

In Version 0.9, I changed the project layout and added a WindowsStore assembly. I'm also playing around with a WinMD project.

New Post: RarReader Entry CRC issue

I seem to be having an issue with the entry CRC property. Before calling reader.WriteEntryToDirectory, reader.Entry.Crc is incorrect. After calling reader.WriteEntryToDirectory, it is correct. Is this a bug?

reader.Entry.Crc <- wrong
execute reader.WriteEntryToDirectory
reader.Entry.Crc <- correct

New Post: RarReader Entry CRC issue

The CRC is only loaded as the file is written. It's expected behavior at the moment.

New Post: EntryStream Position and Length Exception

Hi,

For my application, I don't need to write the files into an output directory/file (as I saw in all your examples); I just need to read the current file into a string/XML and process it.

Because I couldn't find any example of using EntryStream or WriteEntryTo(Stream writableStream) method in the code examples or unit tests, I decided to make the following test.
using (Stream stream = File.OpenRead(file))
{
    using (var reader = RarReader.Open(stream))
    {
        while (reader.MoveToNextEntry())
        {
            if (!reader.Entry.IsDirectory && Path.GetExtension(reader.Entry.FilePath).Equals(".xml"))
            {
                using (var entryStream = reader.OpenEntryStream())
                {
                    byte[] bytes = new byte[entryStream.Length];
                    entryStream.Position = 0;
                    entryStream.Read(bytes, 0, (int)entryStream.Length);
                    string result = Encoding.ASCII.GetString(bytes);
                }
            }
        }
    }
}
This code throws two exceptions, "The method or operation is not implemented.", at entryStream.Position and entryStream.Length.

Am I doing something wrong here? If so, can you please, provide a similar example?

Thanks!
Diana

New Post: EntryStream Position and Length Exception

Rather than doing that with a byte array, create a MemoryStream and use CopyTo from the EntryStream. Then get the byte array from the memory stream.

For reading that text, you can wrap the EntryStream in a StreamReader and call ReadToEnd on the reader.

EntryStreams are forward-only. They don't know their length.
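
Applied to any forward-only stream, such as the one OpenEntryStream returns, that advice looks like this; the helper names are mine, not part of the library:

```csharp
using System.IO;
using System.Text;

static class EntryReading
{
    // Buffers a forward-only stream (one with no usable Length or Position)
    // into a MemoryStream, then hands back the accumulated bytes.
    public static byte[] ReadAllBytes(Stream forwardOnly)
    {
        using (var memory = new MemoryStream())
        {
            forwardOnly.CopyTo(memory); // needs only Read, never Length/Position
            return memory.ToArray();
        }
    }

    // Reads the stream as text in one pass via StreamReader.ReadToEnd.
    public static string ReadAllText(Stream forwardOnly)
    {
        using (var reader = new StreamReader(forwardOnly, Encoding.ASCII))
            return reader.ReadToEnd();
    }
}
```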

New Post: EntryStream Position and Lenght Exception

$
0
0
Yes, it worked just fine with MemoryStream.

Thanks!
