Channel: sharpcompress Discussions Rss Feed

New Post: FileHeader FileName encoding for RAR files


Hi,

 

I'm trying to extract files from a RAR archive whose entry names contain multi-byte characters, but I cannot seem to get the correct results.

Here's an example of one of those entries: 00 ßalou.jpg

After the header is read, the filename is set to "00 �alou.jpg\0��", which is incorrect. On inspection, I noticed that the Unicode flag is not set, so the DecodeDefault method is called, which boils down to Encoding.UTF8.GetString(). This should be right, because the RAR specification states that UTF-8 encoding is used for multi-byte strings.
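To illustrate the symptom, here is a minimal sketch of how Encoding.UTF8.GetString() behaves when the name bytes are not valid UTF-8. The sample bytes are an assumption: they spell "00 ßalou.jpg" with ß stored as the single byte 0xDF (its ISO-8859-1 value), and the bytes actually stored in the archive may well differ. The class and method names are hypothetical and are not part of SharpCompress.

```csharp
using System;
using System.Text;

public static class Utf8ReplacementDemo
{
    // ASSUMPTION: "00 ßalou.jpg" with ß stored as the single byte 0xDF.
    // The real archive may use a different single-byte code page.
    public static readonly byte[] SampleBytes =
    {
        0x30, 0x30, 0x20, 0xDF, 0x61, 0x6C, 0x6F, 0x75, 0x2E, 0x6A, 0x70, 0x67
    };

    // Decodes with the default UTF-8 encoding, which substitutes U+FFFD
    // for any byte sequence that is not valid UTF-8.
    public static string DecodeAsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes, 0, bytes.Length);
    }

    // True if the decoded string contains the replacement character U+FFFD.
    public static bool ContainsReplacementChar(string text)
    {
        return text.IndexOf('\uFFFD') >= 0;
    }
}
```

With these sample bytes, 0xDF looks like the start of a two-byte UTF-8 sequence but is followed by an ASCII letter, so the decoder substitutes U+FFFD for it. If the real name bytes show the same pattern, the name was probably not stored as UTF-8 in the first place.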

I was able to truncate the final part of the string by ignoring the last few bytes after the zero-byte like so:

// Count the bytes that come before the first zero byte.
int length = 0;
while (length < fileNameBytes.Length && fileNameBytes[length] != 0)
{
  length++;
}

That does fix the garbage at the end of the string, but I have no idea if that is the correct solution. The � in the middle of the string is still present. I've tried to use the FileNameDecoder class (which is supposed to work for Unicode strings in RAR files), and that doesn't work (fully):

FileNameDecoder.Decode(fileNameBytes, length) // returns "Ìß"

Notice how it does decode the "ß" character but none of the rest of the string. Does anyone have some more info about this topic? I would love to know how to fix this.
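For anyone who wants to experiment, here is a rough sketch that combines the zero-byte truncation with a strict UTF-8 decode and a fallback. It rests on assumptions: the fallback maps each byte to the character with the same code point (ISO-8859-1), which is only a guess at what the archive contains, and the class and method names are hypothetical rather than part of SharpCompress.

```csharp
using System;
using System.Text;

public static class FileNameBytes
{
    // Number of bytes before the first zero byte, or the full length
    // if the array contains no zero byte.
    public static int LengthBeforeZero(byte[] bytes)
    {
        int index = Array.IndexOf(bytes, (byte)0);
        return index < 0 ? bytes.Length : index;
    }

    // Tries strict UTF-8 first. If the bytes are not valid UTF-8, falls back
    // to a byte-to-char mapping (ISO-8859-1).
    // ASSUMPTION: the real code page of the archive is unknown; ISO-8859-1
    // is used here purely for illustration.
    public static string Decode(byte[] bytes)
    {
        int length = LengthBeforeZero(bytes);

        // throwOnInvalidBytes: true makes invalid input raise an exception
        // instead of silently producing U+FFFD.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(bytes, 0, length);
        }
        catch (ArgumentException)
        {
            // DecoderFallbackException derives from ArgumentException.
            var chars = new char[length];
            for (int i = 0; i < length; i++)
            {
                chars[i] = (char)bytes[i];
            }
            return new string(chars);
        }
    }
}
```

Calling FileNameBytes.Decode(fileNameBytes) would then drop everything from the first zero byte onward and never return U+FFFD. Whether the fallback yields the right characters depends entirely on the encoding the archive really uses.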

Also, I have tried creating an entirely new RAR archive with the same entries using WinRAR, to make sure I am not using a faulty RAR archive.

I'm using the Dec 11 build of 0.8.2 (the WP7 version) in a WinRT application.

 

 

 

 

