CGPDF - Saving images with FlateDecode Filter

https://stackoverflow.com/questions/10401799

04-06-2021

Question

I'm writing a PDF parser for work. We're using Core Graphics to read in all of the data with callbacks and then writing it out with libHaru, because our client needs "real" annotations written out and CG can't do that.

I've gotten to the point where I'm extracting images (and saving them to a file to make sure I'm doing it right before I start drawing them), and I've run into an issue. I'm getting all of the Image XObjects out of the Resources dictionary and then attempting to save them with this code:

NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDir = [paths objectAtIndex:0];                

NSData *imageFileData = (NSData *)CGPDFStreamCopyData(objectStream, CGPDFDataFormatRaw);

NSString *fileName = [NSString stringWithFormat:@"%@/%s.png", documentsDir, name];
[imageFileData writeToFile:fileName atomically:YES];

Here objectStream is the XObject stream obtained with CGPDFDictionaryGetStream. This works fine when the Filter is "DCTDecode", but whenever the Filter is "FlateDecode", the saved image is corrupt and won't open.

I read in this post that CGPDFStreamCopyData can decode text with FlateDecode (see the comments at the very bottom of the post), but there are only three data formats in CGPDFDataFormat, and none of them work.

I believe I'm also having issues with text that is encoded with FlateDecode. Does anyone have any suggestions on how to go about decoding this? Surely CGPDF has something that handles this, since the filter appears in almost every PDF I've tried to open (though I haven't been able to locate it).

Edit: I read in a few places that I could decompress it using zlib, so I tried this code that I was able to find about how to do that:

            NSData* uncompressedImageData;
            if ([imageFileData length] == 0) 
                uncompressedImageData = imageFileData;
            else
            {                
                z_stream strm;
                strm.zalloc = Z_NULL; 
                strm.zfree = Z_NULL; 
                strm.opaque = Z_NULL; 
                strm.total_out = 0; 
                strm.next_in=(Bytef*)[imageFileData bytes]; 
                strm.avail_in = [imageFileData length];

                // Compression Levels: // Z_NO_COMPRESSION // Z_BEST_SPEED // Z_BEST_COMPRESSION // Z_DEFAULT_COMPRESSION
                if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK) 
                    uncompressedImageData = nil;

                NSMutableData *compressed = [NSMutableData dataWithLength:16384]; // 16K chunks for expansion
                do 
                {
                    if (strm.total_out >= [compressed length]) 
                        [compressed increaseLengthBy: 16384];

                    strm.next_out = [compressed mutableBytes] + strm.total_out; strm.avail_out = [compressed length] - strm.total_out;
                    deflate(&strm, Z_FINISH);
                } 
                while (strm.avail_out == 0);

                deflateEnd(&strm);
                [compressed setLength: strm.total_out]; 

                uncompressedImageData = [NSData dataWithData: compressed]; 
            }

            if(uncompressedImageData != nil)
                [uncompressedImageData writeToFile:fileName atomically:YES];

The code didn't throw any exceptions when I ran it, but the resulting images were still unreadable.

Solution

Your use of CGPDFStreamCopyData suggests a misunderstanding: the second argument is not where you request a format. It is an out-parameter that the function fills in with the format of the data it returns. A typical use would be:

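CGPDFDataFormat format;
// Pass the address of 'format'; the function reports the format it found.
CFDataRef data = CGPDFStreamCopyData(objectStream, &format);
if (data != NULL) {
    if (format == CGPDFDataFormatRaw) {
        // handle raw (already decompressed) data...
    } else if (format == CGPDFDataFormatJPEGEncoded) {
        // handle JPEG data...
    } else if (format == CGPDFDataFormatJPEG2000) {
        // handle JPEG 2000 data...
    }
    CFRelease(data); // "Copy" function: the caller owns the returned data
}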

PNG images are not supported at all by the PDF standard, so you'll never get a valid PNG file out of an image data stream, whatever extension you give the file. The options are JPEG, JPEG 2000, and raw images (see the spec for details on those).

Quartz handles zlib (FlateDecode) compression transparently, so the data you receive is already decompressed and you never have to run it through zlib yourself.
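
To make "raw" a little more concrete: in the PDF image model, sample data is packed row by row, and each row starts on a byte boundary. The following is only a minimal sketch in plain C, not part of Core Graphics. The helper name expected_raw_length is hypothetical, and it assumes you have already read Width, Height, and BitsPerComponent from the image dictionary and worked out the number of color components from its ColorSpace (for example 3 for DeviceRGB, 1 for DeviceGray). It can serve as a sanity check on the length of the data you get back for a FlateDecode image:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a Core Graphics API): expected size in bytes of
 * decoded image samples, assuming rows are packed and each row is padded
 * up to a whole number of bytes.
 */
static size_t expected_raw_length(size_t width, size_t height,
                                  size_t bits_per_component,
                                  size_t components_per_pixel)
{
    /* Total bits needed for one row of samples. */
    size_t bits_per_row = width * bits_per_component * components_per_pixel;

    /* Round up to the next whole byte. */
    size_t bytes_per_row = (bits_per_row + 7) / 8;

    return bytes_per_row * height;
}
```

For a 100 x 50 image with 8 bits per component and 3 components this gives 300 bytes per row and 15000 bytes in total. If the data length matches, the bytes are most likely plain samples that need to be wrapped in an image container (or handed to your PDF writer as sample data) rather than written to disk as-is.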

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow