Discussion:
[KPhotoAlbum] (Re)building the EXIF database
Robert Krawitz
2016-01-18 23:08:44 UTC
Permalink
At least at first blush, this looks very inefficient:

void Exif::Database::insert( const DB::FileName& filename, Exiv2::ExifData data )
{
if ( !isUsable() )
return;

QStringList formalList;
Database::ElementList elms = elements();
for( const DatabaseElement *e : elms )
{
formalList.append( e->queryString() );
}

QSqlQuery query( QString::fromLatin1( "INSERT into exif values (?, %1) " ).arg( formalList.join( QString::fromLatin1( ", " ) ) ), m_db );
query.bindValue( 0, filename.absolute() );
int i = 1;
for( const DatabaseElement *e : elms )
{
e->bindValues( &query, i, data );
}

if ( !query.exec() )
showError( query );
}

For each filename, it has to reconstruct the INSERT (which is a fixed
string), bind the values to it, and then exec the query. I'm not
worried about the CPU consumption (which is negligible; kpa's barely
showing up on top, consuming maybe 2-5% of the CPU) as with the I/O,
which looks to be horrendously inefficient. The relevant output I'm
getting from iostat is:

avg-cpu: %user %nice %system %iowait %steal %idle
1.23 0.00 0.40 7.87 0.00 90.50

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdb 93.60 1526.40 511.20 7632 2556

Notice how much writing it's doing, how poor the throughput is, and
how hard it's hitting the disk transactions. No doubt I could do a
little better by storing my EXIF database on a separate disk (or an
SSD, but my collection is much too big for that to be affordable), but
that's not what kpa's set up for. The database is presumably forcing
synchronous writes for each image's data insertion.

I'd think that using execBatch() to batch up the inserts (maybe 100
images at a time) would be a lot faster. Thoughts, anyone?
--
Robert Krawitz <***@alum.mit.edu>

*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton
Johannes Zarl-Zierl
2016-01-18 23:42:23 UTC
Permalink
Hi Robert,
Post by Robert Krawitz
For each filename, it has to reconstruct the INSERT (which is a fixed
string), bind the values to it, and then exec the query.
[...]
I'd think that using execBatch() to batch up the inserts (maybe 100
images at a time) would be a lot faster. Thoughts, anyone?
Even without doing any profiling myself your analysis seems to be on the
point. I'll try and see how much speedup a better solution will actually bring
us.

Thanks,
Johannes
Robert Krawitz
2016-01-19 00:23:20 UTC
Permalink
Post by Johannes Zarl-Zierl
Hi Robert,
Post by Robert Krawitz
For each filename, it has to reconstruct the INSERT (which is a fixed
string), bind the values to it, and then exec the query.
[...]
I'd think that using execBatch() to batch up the inserts (maybe 100
images at a time) would be a lot faster. Thoughts, anyone?
Even without doing any profiling myself your analysis seems to be on the point. I'll try and see how much speedup a better solution will actually bring us.
92 IOPS works out to about 10.8 ms/IOPS. The drive in question is a
5400 RPM 2 TB drive, so the average rotational latency is about 5.5
ms; 5.3 ms seek latency sounds about right. So that, in combination
with seeing something in the range of 2-5% CPU utilization (on an
older processor -- i7-920XM), suggests very strongly that this is
indeed completely I/O-bound.

The numbers suggest that I/O and CPU would balance somewhere around
20-50 images per batch, although reading the metadata from the image
files will result in an irreducible number of I/O ops too. If we
wanted to get really fancy, we might be able to use a scout thread to
preread the first 64K or whatnot of each image, but that would be for
a later stage.

I suspect batching this up in 100-200 images would work well. I think
we can afford to bias in favor of performance over safety; if kpa
crashes, simply start the operation over.
--
Robert Krawitz <***@alum.mit.edu>

*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton
Loading...