[KPhotoAlbum] More performance work

Discussion:

Robert Krawitz

2018-05-11 02:47:13 UTC

I've finally managed to completely eliminate the slowdown associated
with autostacking images at load time, as well as eliminating and/or
hiding a lot of I/O latency. This involves four things:

1) Full transaction semantics around EXIF database insertions: start a
transaction at the start of a batch load, commit it or roll it back
as appropriate when the load finishes/is aborted. I originally did
this to eliminate some I/O in the case of very big loads (big
enough to blow out your physical RAM, so the insertion would need a
second pass over the files to grab the EXIF data). However, it
also simplifies things somewhat and may allow us to build
thumbnails concurrently with loading images, which would reduce I/O
(for the same reason) and possibly allow us to get better
parallelism.

2) Use a scout thread to read the images. The scout thread simply
loops over all of the images being loaded and opens the files,
reads them in full, and closes them. If they're already in RAM the
scout thread runs very quickly; if they're not, it runs ahead of
the reader and pulls the files in so that the loader doesn't have
to wait for physical I/O to complete. On a very fast machine with
very slow I/O (NFS in particular) it probably won't be able to keep
ahead of the loader, but there's not a lot we can do. I found that
it significantly improves performance and CPU utilization because
with conventional drives on a reasonably fast processor (quad core
2.8->3.7 GHz) it now gets 100% CPU utilization.

On a slow system, if the scout thread gets too far ahead, it could
actually result in memory contention and extra I/O. If that were
to prove a real world problem we could look into what to do about
it, but it would make the code rather more complex (with
synchronization needed between the scount and the reader).

3) Reverse cache the MD5 checksums. Currently kpa only keeps an
MD5->filename mapping, so when it does the stack check based on
filename it has to recompute the MD5 checksum of the stackee to
ensure that it's not a duplicate. That means lots of I/O,
particularly in my case where I'm usually stacking a new smaller
edited file on top of the full size file. Caching the reverse
mapping eliminates that I/O (and the not entirely free MD5
calculation, although on spinning rust the I/O is the bad part).

4) Strengthening the image info cache so that there's a filename to
ImageInfo hash static to XMLDB/Database.cpp rather than just one
function within it. That's not quite doing what I expect it to do,
so I'm working on it.

I could probably separate out some of these pieces, but they all touch
LoadExtraFile. Right now the code is a mess, with a lot of debugging
code still left, the indentation all over the place, and stuff like
that, but I could generate a patch if people want to review it even in
a rather crude state.

I know I've been a bit single minded about the load performance, but
that's really the very first thing people see when they look at kpa or
any similar program. I know that when I've played with Digikam the
image load is so slow that it makes even experimenting painful.
Loading 3000 images in something under 3 minutes is fast enough that
even quite large image collections can be tested out without too much
discomfort. I'd honestly like to be able to load such small images (1
MB) even faster than the 16-18 images/second I'm getting, but that
will require a closer look at what's going on.

--
Robert Krawitz <***@alum.mit.edu>

*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton

Robert Krawitz

2018-05-11 03:40:42 UTC

Permalink

Post by Robert Krawitz
I know I've been a bit single minded about the load performance, but
that's really the very first thing people see when they look at kpa or
any similar program. I know that when I've played with Digikam the
image load is so slow that it makes even experimenting painful.
Loading 3000 images in something under 3 minutes is fast enough that
even quite large image collections can be tested out without too much
discomfort. I'd honestly like to be able to load such small images (1
MB) even faster than the 16-18 images/second I'm getting, but that
will require a closer look at what's going on.

Thought something was wrong there. It was actually off by about an
order of magnitude or so; when I batched up the inserts it took more
like 15 seconds to load the 3000 images (about 200/second). The
mistake was inserting them one by one into the image info list, which
requires merging them (to preserve sorted order). Batching them up,
as is currently the case with images not to be stacked, is a lot
cheaper.

It should also make throwing away an aborted load easier, although the
logic's probably still not quite right (it's not inserting them into
the list, but it is inserting them into the hash). The two ways
around that would be to throw away the hash and rebuild it when needed
on abort, or to keep a pending hash as well as pending unsorted list.

In any event, batching it up still results in loading these images at
about 200 MB/sec; if they weren't all in RAM, I'd be I/O limited by my
spinning rust. That's what we want when large SSD's become cheap.

Robert Krawitz

2018-05-12 00:08:16 UTC

Permalink

OK, as promised here's my followon patch. This has a lot of changes,
described in the comment, but the upshot is that it's now just as fast
to load images with duplicate detection and autostacking turned on as
it is without them.

As a side benefit (or bug fix, really), automatic stacking on load now
works (it hasn't worked in quite a while if all of the images in the
stack are in the loaded set, such as if you're shooting RAW+JPEG).

The image scout thread could be separated out if desired. It touches
a lot of the same area of code (NewImageFinder::LoadExtraFiles) as the
rest of the changes, but it doesn't actually affect the rest of it
structurally. The rest of the changes pretty much need to all be done
together, or most of the benefit is lost.