Snip:
import os
import md5
for root,dirs,files in os.walk('/home/itorres/Media/Video/test'):
for file in files:
fp = os.path.join(root,file)
fp5 = md5.new(open(fp).read()).hexdigest()
print "%s\n\t%s" % (fp,fp5,)
Boom! a quick inventory of my media with md5 checksum so I can search for duplicates based on it. For simplicity’s sake I made a database and metadata free version, only the directory walk, file read, md5 hash and print it.
Now, how does that look in php?
<?php
$dh = opendir('/home/itorres/Media/Video/test');
while (false !== ($file = readdir($dh))) {
$fp = '/home/itorres/Media/Video/test' . '/' . $file;
if(is_file($fp))
echo "$fp\n\t" . md5_file($fp);
}
closedir($dh);
And how does it compare speed-wise? Notice that the php version does not walk the directory.
$ echo "Python" ; time python test.py ; echo "PHP" ; time php -f test.php
Python
/home/itorres/Media/Video/test/Homo Futurus.avi
4cfee62066a1fbebc957f9b2cc8275ff
/home/itorres/Media/Video/test/El mayor error de Einstein.avi
180ce7eeee86ae6bc5eabbfa9e577dce
real 0m5.299s
user 0m3.060s
sys 0m1.272s
PHP
/home/itorres/Media/Video/test/Homo Futurus.avi
4cfee62066a1fbebc957f9b2cc8275ff
/home/itorres/Media/Video/test/El mayor error de Einstein.avi
180ce7eeee86ae6bc5eabbfa9e577dce
real 0m9.289s
user 0m6.264s
sys 0m0.596s