Python: encryption as means to prevent data tampering

https://stackoverflow.com/questions/1178789

19-09-2019

Question

Many of my company's clients use our data acquisition software on a research basis. Due to the nature of research in general, some of the clients ask that data be encrypted to prevent tampering -- there could be serious ramifications if their data was shown to be falsified.

Some of our binary software encrypts output files with a password stored in the source, that looks like random characters. At the software level, we are able to open up encrypted files for read-only operations. If someone really wanted to find out the password so that they could alter data, it would be possible, but it would be a lot of work.

I'm looking into using Python for rapid development of another piece of software. To duplicate the functionality of encryption to defeat/discourage data tampering, the best idea I've come up with so far is to just use ctypes with a DLL for file reading/writing operations, so that the method of encryption and decryption is "sufficiently" obfuscated.

We are well aware that an "uncrackable" method is unattainable, but at the same time I'm obviously not comfortable with just having the encryption/decryption approaches sitting there in plain text in the Python source code. A "very strong discouragement of data tampering" would be good enough, I think.

What would be the best approach to attain a happy medium of encryption or other proof of data integrity using Python? I saw another post talking about generating a "tamper proof signature", but if a signature was generated in pure Python then it would be trivial to generate a signature for any arbitrary data. We might be able to phone home to prove data integrity, but that seems like a major inconvenience for everyone involved.

Solution

As a general principle, you don't want to use encryption to protect against tampering; you want to use a digital signature instead. Encryption gives you confidentiality, but you are after integrity.

Compute a hash value over your data and either store the hash value in a place where you know it cannot be tampered with or digitally sign it.
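As a minimal sketch of the hashing step, using only the standard library: the function name `file_sha256` and the choice of SHA-256 are assumptions for illustration, not something the question or answer specifies.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so large acquisition files
    do not have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest on its own proves nothing if it is stored next to the data, because anyone who edits the file can recompute it. It only helps once it is kept somewhere the user cannot modify, or is signed.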

In your case, it seems you want to ensure that only your software could have generated the files. As you say, there is no really secure way to do this when your users have access to the software, since they can take it apart and find any secret keys you include. Given that constraint, I think your idea of using a DLL is about as good as you can do.
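To make the embedded-secret case concrete, here is a hedged sketch using a keyed hash (HMAC) from the standard library. HMAC is a technique swapped in for illustration; `SECRET_KEY`, `make_tag` and `is_untampered` are hypothetical names. It has exactly the weakness described above: anyone who extracts the key from the program can produce valid tags for altered data.

```python
import hashlib
import hmac

# Hypothetical embedded key. Whether it lives in Python source or in a DLL,
# a determined user can recover it and then forge tags.
SECRET_KEY = b"replace-with-random-bytes"


def make_tag(data: bytes, key: bytes = SECRET_KEY) -> str:
    """Return a hex HMAC-SHA256 tag to store alongside the data."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_untampered(data: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(data, key), tag)
```

Unlike a plain hash, the tag cannot be recomputed without the key, so this discourages casual edits while staying pure Python.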

OTHER TIPS

If you are embedding passwords somewhere, you are already hosed. You can't guarantee anything.

However, you could use public key/private key cryptography (that is, a digital signature) to make sure the data hasn't been tampered with.

The way it works is this:

1. Generate a public key / private key pair.
2. Keep the private key secure and distribute the public key.
3. Hash the data, then sign the hash with the private key.
4. Use the public key to verify the signature on the hash.

This effectively renders the data read-only outside your company, and provides your program a simple way to verify that the data hasn't been modified without distributing passwords.
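The steps above can be sketched roughly as follows. This assumes the third-party `cryptography` package is installed and uses Ed25519 keys, neither of which the answer specifies; `sign_data` and `verify_data` are hypothetical helper names.

```python
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def sign_data(private_key, data: bytes) -> bytes:
    """Hash the data, then sign the hash with the private key."""
    digest = hashlib.sha256(data).digest()
    return private_key.sign(digest)


def verify_data(public_key, data: bytes, signature: bytes) -> bool:
    """Recompute the hash and check the signature with the public key."""
    digest = hashlib.sha256(data).digest()
    try:
        public_key.verify(signature, digest)
        return True
    except InvalidSignature:
        return False


# Assumption: the key pair is generated once; the private key stays with
# whoever signs, and only the public key ships with the verifying program.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

signature = sign_data(private_key, b"sample,1.23\n")
```

Note that this only works if the signing happens somewhere the private key is safe. If the acquisition program itself has to sign on the client's machine, the private key is embedded again and the earlier caveat applies.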

Here's another issue. Presumably, your data acquisition software is collecting data from some external source (like some sort of measuring device), then doing whatever processing is necessary on the raw data and storing the results. Regardless of what method you use in your program, another possible attack vector would be to feed bad data to the program; the program itself has no way of knowing that it is being fed made-up data rather than data that came from the measuring device. But this might not be fixable.

Another possible attack vector (and probably the one you are concerned about) is tampering with the data on the computer after it has been stored. Here's an idea to mitigate that risk: set up a separate server (this could either be something your company would run, or more likely something the client would set up) with a password-protected web service that allows a user to add (but not remove) data records. Then have your program, when it collects data, send it to the server (using the password/connection string which is stored in the program). Have your program write the data to the local machine only if it receives confirmation that the data has been successfully stored on the server.

Now suppose an attacker tries to tamper with the data on the client. If he can reverse engineer the program, then he can of course still send modified data to the server for storage, just as the program did. But the server will still have the original data, so the tampering will be detectable: the server will end up with both the original and the modified data, and the client won't be able to erase the original records. (The client program of course does not need to know how to erase records on the server.)
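A small in-memory sketch of that idea, with the web service replaced by a plain Python class so it can run anywhere. Everything here is hypothetical: the names `AppendOnlyStore`, `save_measurement` and `looks_tampered`, and the assumption that each measurement carries a record ID.

```python
class AppendOnlyStore:
    """Stand-in for the server: records can be added but never removed."""

    def __init__(self):
        self._log = []  # (record_id, data) pairs in arrival order

    def add(self, record_id: str, data: bytes) -> bool:
        self._log.append((record_id, bytes(data)))
        return True  # confirmation returned to the client

    def versions(self, record_id: str) -> list:
        """Every payload ever submitted under this record ID."""
        return [data for rid, data in self._log if rid == record_id]


def save_measurement(store, local_files: dict, record_id: str, data: bytes) -> bool:
    """Write locally only after the store confirms it kept a copy."""
    if store.add(record_id, data):
        local_files[record_id] = data
        return True
    return False


def looks_tampered(store, record_id: str, local_data: bytes) -> bool:
    """Flag a record if the store holds conflicting versions of it,
    or if the local copy was never submitted at all."""
    versions = store.versions(record_id)
    return len(set(versions)) > 1 or local_data not in versions
```

For example, after `save_measurement(store, files, "run-1", b"1.23")`, an attacker who edits the local copy to `b"9.99"` is flagged because that payload was never submitted; if he also uploads the forged payload, the store holds two versions of `run-1` and the record is still flagged.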

Licensed under: CC-BY-SA with attribution
