Research | Practice

2013-07-07

md5sum vs. md5deep vs. openssl md5: MD5 calculation speed test

The following experiment was conducted to determine which of md5sum, md5deep and openssl md5 calculates MD5 hashes the fastest.

Methodology:
Test 1: A directory of test files consisting of disk images and extracted Windows Registry files will be scanned with the command 'find -type f | xargs -n 1 md5program', where md5program is replaced by each program in turn. Find lists the files, and xargs with -n 1 passes each line of that list to the md5 program, starting one process per file. The time for the entire process to finish will be tracked with the time command.
Test 1.1: After all programs have finished, the computer will be restarted, and the order in which the programs are run will be changed. For example, if md5sum was the first program in the first round, it will be the second program in the second round. This is an attempt to account for any caching that takes place in the operating system.

The result will be 9 timed runs for calculating the md5 sum of all files in a directory.
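
The Test 1 pipeline can be sketched as a small shell function. This is only an illustration of the command described above, not the exact script used for the measurements: the function name hash_each and the directory ./testfiles are hypothetical, and -print0 / -0 are an addition (supported by GNU and BSD find and xargs) so that file names containing spaces survive the pipe.

```shell
# hash_each DIR PROGRAM [ARGS...]
# Lists every regular file under DIR and runs PROGRAM once per file,
# mirroring 'find -type f | xargs -n 1 md5program'.
hash_each() {
    dir=$1
    shift
    # -print0 / -0 are not in the original command; they keep file
    # names with whitespace intact.
    find "$dir" -type f -print0 | xargs -0 -n 1 "$@"
}

# One timed run per program (bash's 'time' keyword can time a function):
# time hash_each ./testfiles md5sum
# time hash_each ./testfiles md5deep
# time hash_each ./testfiles openssl md5
```

Because -n 1 starts a new process for every file, the measured time includes process start-up cost as well as hashing.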

Test 2: Big files - In test 1, multiple files of different sizes will be continuously fed to the hashing program. In test 2, a single 10 GB disk image will be given to each program and timed.

Test machine:
Intel(R) Core(TM) i7-3537U CPU @ 2.00GHz (4 logical cores)
8 GB RAM
Timing cached reads: 8892.40 MB/sec
Timing buffered disk reads: 403.98 MB/sec

Results:
               run 1        run 2        run 3
md5sum
  real         3m46.457s    3m46.117s    3m59.198s
  user         0m42.183s    0m39.254s    0m42.235s
  sys          3m3.595s     3m6.300s     3m16.096s
md5deep
  real         3m57.595s    3m57.666s    4m2.255s
  user         0m43.743s    0m43.567s    0m44.551s
  sys          3m17.552s    3m17.192s    3m22.881s
openssl md5
  real         3m48.130s    3m43.142s    3m50.619s
  user         0m37.878s    0m37.462s    0m38.558s
  sys          3m9.436s     3m5.112s     3m11.136s

Overall processor time (user + sys):
Run 1: md5sum 3m45.77s; md5deep 4m01.29s; openssl 3m47.32s.
Run 2: md5sum 3m45.55s; md5deep 4m00.76s; openssl 3m42.57s.
Run 3: md5sum 3m58.34s; md5deep 4m07.43s; openssl 3m49.70s.

Average (rounded to the nearest second):
md5sum: 3m50s
md5deep: 4m03s
openssl: 3m47s
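
The processor-time figures above agree, to within a hundredth of a second, with adding the user and sys values of each run. A minimal sketch of that arithmetic follows; the helper names to_seconds, cpu_time and average are hypothetical, and awk is assumed to be available.

```shell
# to_seconds VALUE: convert a 'time'-style value such as 3m46.457s to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# cpu_time USER SYS: processor time of one run (user + sys), in seconds.
cpu_time() {
    awk -v u="$(to_seconds "$1")" -v s="$(to_seconds "$2")" \
        'BEGIN { printf "%.3f\n", u + s }'
}

# average N1 N2 ...: arithmetic mean of the given numbers of seconds.
average() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}

# md5sum, run 1: user 0m42.183s, sys 3m3.595s
cpu_time 0m42.183s 3m3.595s   # prints 225.778, i.e. about 3m45.78s
```

Feeding the three per-run totals of one program into average gives the averages listed above.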

Test 1 Conclusions
While the 'real' time may be of interest to investigators, we are more interested in the time that the program spent using the processor. Here, 'user' is the time the program spent in user mode, and 'sys' is the time it spent in kernel mode. The basic idea is that we want the program that takes the least processor time to do the same amount of work.

Note: run 3 was slower than runs 1 and 2 for all three programs, so it appears that another process, perhaps an OS update, was running at the time and affected all of them.

Based on the average of the three runs, it appears that openssl is slightly faster, followed by md5sum and then md5deep.

It should be noted, however, that md5deep is not really being used in the way it was designed to be used. For example, md5sum does not include a recursive mode, whereas md5deep does. If we run md5deep in its recursive mode, instead of using find and xargs, then the time is actually slightly better than the others:

md5deep recursive - restricted to 1 thread
real 3m43.963s
user 0m41.327s
sys 3m2.103s

md5deep: 3m43s

So, in conclusion, if you are feeding a list of files into an md5 hash program, openssl appears to be a slightly better choice than md5sum or md5deep. However, if you can choose how the file list is ingested, md5deep is probably a better choice because of its speed and available features.
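
For comparison with the find and xargs pipeline, a recursive md5deep run might look like the sketch below. The post does not show the exact command that was used, so this is an assumption: it relies on md5deep 4.x, where -r enables recursive mode and -j sets the number of hashing threads, and the function name hash_recursive and the directory ./testfiles are hypothetical.

```shell
# hash_recursive DIR: let md5deep walk DIR itself instead of using
# find and xargs. Assumes md5deep 4.x (-r: recurse, -j1: one hashing thread).
hash_recursive() {
    md5deep -r -j1 "$1"
}

# Timed run (bash):
# time hash_recursive ./testfiles
```

Here a single md5deep process hashes every file, so the cost of starting one process per file disappears.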

Test 2: Testing the time for each program to hash a 10 GB disk image. This test will not use find and xargs - it will use the program directly. md5deep will be restricted to 1 thread.

         md5sum       md5deep      openssl md5
real     1m44.558s    1m49.491s    1m43.475s
user     0m19.369s    0m20.421s    0m18.233s
sys      1m24.877s    1m30.534s    1m24.889s

md5sum: 1m44s
md5deep: 1m51s
openssl: 1m43s

For a single large file, it appears that openssl is also the fastest, followed by md5sum and then md5deep.

Please note: all of these tests are, at best, quite shallow. Proper tests, with a controlled environment and many more runs, should be conducted.


Bonus observation: With -n 1, xargs passes 1 line at a time to the hashing program and spawns 1 process per line. Each process seems to switch between processors, even while a single hash is being calculated.
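
One way to watch this switching on Linux is to sample the CPU that a hashing process last ran on. The sketch below is an assumption-laden illustration, not something used in the tests above: it is Linux-only, reads the processor field (field 39) of /proc/PID/stat, and the function names last_cpu and sample_cpu are hypothetical.

```shell
# last_cpu PID: print the number of the CPU that PID last ran on.
# Assumption: Linux, where field 39 of /proc/PID/stat holds that number.
last_cpu() {
    # Strip "pid (comm) " first, because comm may contain spaces;
    # field 39 of the full line is then field 37 of the remainder.
    sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $37 }'
}

# sample_cpu PID [COUNT]: print the CPU number COUNT times (default 5),
# one second apart, stopping early if the process exits.
sample_cpu() {
    n=${2:-5}
    while [ "$n" -gt 0 ] && [ -r "/proc/$1/stat" ]; do
        last_cpu "$1"
        n=$((n - 1))
        sleep 1
    done
}

# Example: hash a large image in the background and watch where it runs.
# md5sum disk.img & sample_cpu $!
```

If the printed number changes between samples, the scheduler has moved the process. On systems with util-linux, running the program under 'taskset -c 0' pins it to a single CPU, which would allow a comparison.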

