Question

How can I check if a file uploaded by a user is a real jpg file in Python (Google App Engine)?

This is what I have so far:

The script receives the image via an HTML form POST and processes it with the following code:

...
incomming_image = self.request.get("img")
image = db.Blob(incomming_image)
...

I found mimetypes.guess_type, but it does not work for me.


Solution

If you need more than a look at the extension, one way is to read the JPEG header and check that it contains valid data. The format is:

Start Marker  | JFIF Marker | Header Length | Identifier
0xff, 0xd8    | 0xff, 0xe0  |    2-bytes    | "JFIF\0"

so a quick recogniser would be:

def is_jpg(filename):
    # Read just enough bytes to cover both markers, the length and the identifier
    with open(filename, 'rb') as f:
        data = f.read(11)
    if data[:4] != b'\xff\xd8\xff\xe0': return False
    if data[6:] != b'JFIF\0': return False
    return True
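
Since the question has the upload in memory rather than on disk, the same header check can be sketched directly against the raw bytes. This is only an illustration: the name is_jfif_bytes is made up here, and it assumes the value returned by self.request.get("img") is a byte string.

```python
def is_jfif_bytes(data):
    """Return True if `data` (raw bytes) starts with a JFIF-style JPEG header.

    Hypothetical helper, not part of any library.
    """
    # Start marker (ff d8) followed by the JFIF/APP0 marker (ff e0)
    if data[:4] != b'\xff\xd8\xff\xe0':
        return False
    # Bytes 4-5 hold the header length; bytes 6-10 hold the identifier
    return data[6:11] == b'JFIF\0'
```

In the handler this would be called as is_jfif_bytes(incomming_image) before the data is wrapped in db.Blob.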

However, this won't catch bad data in the body. For a more robust check, you could try loading the file with PIL, e.g.:

from PIL import Image

def is_jpg(filename):
    try:
        i = Image.open(filename)
        return i.format == 'JPEG'
    except IOError:
        return False

OTHER TIPS

There is no need to install and use the PIL library for this; the imghdr standard module is made exactly for this sort of usage. (Note that imghdr was deprecated in Python 3.11 and removed in Python 3.13.)

See http://docs.python.org/library/imghdr.html

import imghdr

# what() returns a string such as 'jpeg', or None if the type is not recognised
image_type = imghdr.what(filename)
if not image_type:
    print("error")
else:
    print(image_type)

Since you already have the image data in memory, you can probably pass it as the second argument, in which case the file name is ignored:

image_type = imghdr.what(filename, incomming_image)

Actually, this works for me in Pylons (although I have not finished everything). In the Mako template:

${h.form(h.url_for(action="save_image"), multipart=True)}
Upload file: ${h.file("upload_file")} <br />
${h.submit("Submit", "Submit")}
${h.end_form()}

In the upload controller:

def save_image(self):
    upload_file = request.POST["upload_file"]
    image_type = imghdr.what(upload_file.filename, upload_file.value)
    if not image_type:
        return "error"
    else:
        return image_type

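A more general solution is to use the Python binding to libmagic, the library behind the Unix "file" command. For this, install the package python-magic. The example below uses the interface of the bindings distributed with libmagic; other packages with the same name may expose a different API. Example: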

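import magic

ms = magic.open(magic.MAGIC_NONE)
ms.load()

# Identify a file on disk by its path
file_type = ms.file("/path/to/some/file")
print(file_type)

# Identify data already in memory; read in binary mode so the bytes are not altered
with open("/path/to/some/file", "rb") as f:
    data = f.read(4096)

file_type = ms.buffer(data)
print(file_type)

ms.close()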

Use PIL. If it can open the file, it's an image.

From the tutorial...

>>> from PIL import Image
>>> im = Image.open("lena.ppm")
>>> print(im.format, im.size, im.mode)

The fourth byte of a JPEG file seems to vary beyond just 0xe0 (files carrying Exif data, for example, start with ff d8 ff e1). Matching only the first three bytes is a good enough heuristic signature to identify whether the file is a JPEG. Here is a modified proposal:

def is_jpg(filename):
    with open("uploads/" + filename, 'rb') as f:
        data = f.read(11)
    if data[:3] == b"\xff\xd8\xff":
        return True
    elif data[6:] == b'JFIF\0':
        return True
    else:
        return False
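
For data that is already in memory, the relaxed three-byte signature could be sketched as below. The function name looks_like_jpeg and the require_eoi flag are made up for this illustration; the optional end-of-image check (ff d9) is an extra assumption beyond the answer above, and it is off by default because some files may carry trailing bytes after that marker.

```python
def looks_like_jpeg(data, require_eoi=False):
    """Heuristic JPEG check on raw bytes (hypothetical helper)."""
    # Start marker (ff d8) plus the 0xff that begins the next marker
    if data[:3] != b'\xff\xd8\xff':
        return False
    # Optionally require the end-of-image marker (ff d9) at the very end
    if require_eoi and data[-2:] != b'\xff\xd9':
        return False
    return True
```

This accepts both JFIF (ff e0) and Exif (ff e1) files, but like any header check it says nothing about whether the image body is intact.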
Licensed under: CC-BY-SA with attribution