Suggestion: A way to reduce overhead and prevent duplicate uploads for media
* if a match is found, cancel the upload and serve the match instead.
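
For illustration, here is a minimal sketch of the match-and-serve step. It assumes uploads are keyed by a SHA-256 digest of their bytes; the `MediaStore` class and its in-memory dictionary are hypothetical stand-ins, not anything Discord is known to use.

```python
import hashlib


class MediaStore:
    """Toy in-memory store keyed by content hash (hypothetical, for illustration)."""

    def __init__(self):
        self._by_hash = {}  # hex digest -> stored bytes

    def upload(self, data: bytes) -> tuple[str, bool]:
        """Store data unless an identical digest exists. Returns (digest, was_duplicate)."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_hash:
            # A match was found: skip storing and let the caller serve the match.
            return digest, True
        self._by_hash[digest] = data
        return digest, False

    def serve(self, digest: str) -> bytes:
        """Return the stored bytes for a digest."""
        return self._by_hash[digest]


store = MediaStore()
first = store.upload(b"cat picture bytes")
second = store.upload(b"cat picture bytes")  # same bytes, so nothing new is stored
```

A real service would hash the file before or during transfer so the upload itself can be skipped; this sketch only shows the lookup.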
Hashes of different files can sometimes coincide, so this method isn't foolproof.
@lolcatw That's why you salt your hashes and use a strong hashing algorithm. Salting not only makes collisions, which are already astronomically unlikely even with weak hash algorithms like MD5 and CRC32, even more astronomically unlikely, but also prevents hashes from being looked up, e.g. from lists of common passwords. It's secure enough to be used to store passwords in user databases, so I'm sure it'll work for files, too.
We're talking about an event that is less likely than the same person winning the lottery several times in a row. If you're gonna let that slim chance of failure stop you from using an algorithm, then you will get nothing done. Hardware faults, like drive failures that will prevent a program from writing data to a file, are far more likely, yet programmers deal with those errors just fine.
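
To put rough numbers on "how likely", here is a back-of-the-envelope sketch using the standard birthday-bound approximation. It assumes hash outputs are uniformly random, so it only covers accidental collisions, not deliberately crafted ones (practical collision attacks on MD5 are well documented). The function name and the file counts are made up for illustration.

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_files * (num_files - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# Hypothetical scale: one billion stored files.
for name, bits in [("CRC32", 32), ("MD5", 128), ("SHA-256", 256)]:
    p = collision_probability(1_000_000_000, bits)
    print(f"{name:8s} {p:.3e}")
```

Under these assumptions a 32-bit checksum like CRC32 reaches about a 50% chance of at least one collision at roughly 77,000 files, while a 128-bit hash stays on the order of 10^-21 even at a billion files. So the answer depends heavily on which algorithm is meant.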
bradenbest
You were talking about MD5 and CRC32, then right after that you were talking about hashing passwords. Were you implying it was a good idea to use these algorithms for storing passwords? Because that's incredibly insecure. Furthermore, hash collisions aren't astronomically unlikely at Discord's scale, especially with MD5. While hashing is really useful for filtering duplicates, it's important to make sure you're not referring to a different file that happens to have the same checksum, so an additional check should be added to confirm the two files really are identical. Lastly, if the user posts a file that was first posted a really long time ago, odds are the hard drives on Discord's end will have to seek back and forth a lot more to dig up old files, making load times a lot slower. If the drives used aren't good at random reads, the app will become sluggish, especially if the same logic is applied to messages.
What I think Discord should do is adopt duplication detection for messages that are less than 2 weeks old to prevent such a problem.
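
Roughly, the two safeguards above (a byte-for-byte check after a checksum match, and a two-week cutoff) could look like the sketch below. `RecentMediaIndex`, its in-memory storage, and the choice of SHA-256 are all assumptions for illustration, not a description of how Discord stores media.

```python
import hashlib
from datetime import datetime, timedelta, timezone

DEDUP_WINDOW = timedelta(days=14)  # only files newer than this count as duplicates


class RecentMediaIndex:
    """Hypothetical index: digest -> list of (stored bytes, upload time)."""

    def __init__(self):
        self._entries = {}

    def add(self, data: bytes, now: datetime) -> str:
        """Record an upload and return its digest."""
        digest = hashlib.sha256(data).hexdigest()
        self._entries.setdefault(digest, []).append((data, now))
        return digest

    def find_duplicate(self, data: bytes, now: datetime):
        """Return the digest of a verified, recent duplicate, or None."""
        digest = hashlib.sha256(data).hexdigest()
        for stored, uploaded_at in self._entries.get(digest, []):
            if now - uploaded_at >= DEDUP_WINDOW:
                continue  # too old: skip it rather than dig up cold storage
            if stored == data:
                return digest  # checksum matched and the bytes really are identical
        return None


index = RecentMediaIndex()
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
digest = index.add(b"meme.png bytes", t0)
recent_hit = index.find_duplicate(b"meme.png bytes", t0 + timedelta(days=1))
stale_hit = index.find_duplicate(b"meme.png bytes", t0 + timedelta(days=15))
```

The byte comparison only runs after a digest match, so the extra cost is paid only for likely duplicates.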
lolcatw
> You were talking about MD5 and CRC32, then right after that you were talking about hashing passwords. Were you implying it was a good idea to use these algorithms for storing passwords?
Not at all. MD5 and CRC32 are extremely common algorithms for hashing files, especially on image boards and in console emulators. If I were discussing password systems exclusively, I'd be bringing up secure hash algorithms instead, like SHA-512. Though if you ask me, I don't think passwords are secure enough; we should be using RSA public keys for authentication instead. It would be way more secure and remove the human element altogether. Google tried to implement something like this, but alas, people like their convenience more than they value their security.
> What I think Discord should do is adopt duplication detection for messages that are less than 2 weeks old to prevent such a problem.
Yeah, that sounds good. Some other solutions would be to apply it per-user, so two different users can upload the "same" file, or to add a setting that prompts the user when a file matches, shows a preview of the matched media, and lets the user choose to use the match or upload the file anyway. If they upload anyway, the filename would be changed to something random and the file would be re-hashed using its filename as a salt so that it gets its own unique hash.
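
A rough sketch of both ideas, assuming SHA-256 and treating the user ID or the random filename as the salt. The helper names (`content_key`, `per_user_key`, `upload_anyway`) are hypothetical. Strictly speaking, the salted hash is distinct with overwhelming probability rather than guaranteed.

```python
import hashlib
import secrets


def content_key(data: bytes, salt: bytes = b"") -> str:
    """SHA-256 over a length-prefixed salt followed by the file bytes."""
    h = hashlib.sha256()
    # The length prefix keeps the salt/data boundary unambiguous.
    h.update(len(salt).to_bytes(4, "big"))
    h.update(salt)
    h.update(data)
    return h.hexdigest()


def per_user_key(user_id: str, data: bytes) -> str:
    """Per-user scope: the same bytes from different users get different keys."""
    return content_key(data, salt=user_id.encode("utf-8"))


def upload_anyway(data: bytes) -> tuple[str, str]:
    """User declined the match: pick a random filename and salt the hash with it."""
    filename = secrets.token_hex(16)
    return filename, content_key(data, salt=filename.encode("ascii"))


alice = per_user_key("alice", b"same file")
bob = per_user_key("bob", b"same file")
name_a, key_a = upload_anyway(b"same file")
name_b, key_b = upload_anyway(b"same file")
```

Here `alice` and `bob` differ even though the bytes are identical, and each forced upload gets its own filename and key.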