Discussion:
[KPhotoAlbum] More thumbnail investigations
Robert Krawitz
7 years ago
I was able to get the thumbnail build time for ~10000 images down
further, but it took somewhat drastic measures.

There are two ways to load JPEG files using libjpeg: from a FILE *
(jpeg_stdio_src) and, as of libjpeg8, from a memory buffer
(jpeg_mem_src). Loading the file into
memory myself and then feeding the memory buffer to libjpeg improved
the load time with 3 threads from 20 to 18 minutes (which is
significant, since nothing else I had tried, including increasing the
stdio buffer size, made any difference). It also decreased the I/O
rate to around 60 operations/sec. Increasing the max threads from 3 to 8 got the time down to 16
minutes, with slightly higher I/O rates. I'm using 20 MB as the upper
limit to load that way, just for experimentation.
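
The pattern described above -- slurp the whole file when it is small
enough, then decode from the in-memory buffer -- can be sketched
roughly like this. Python stands in for KPhotoAlbum's actual C++/Qt
code, the hand-off to libjpeg's jpeg_mem_src is only indicated in a
comment, and the 20 MB cutoff is the experimental value mentioned
above:

```python
import os

MAX_SLURP_BYTES = 20 * 1024 * 1024  # experimental upper limit from the text

def load_for_decode(path):
    """Return the file contents as one in-memory buffer when the file
    is small enough; otherwise return None so the caller falls back to
    the FILE*-style (jpeg_stdio_src) route."""
    if os.path.getsize(path) <= MAX_SLURP_BYTES:
        with open(path, "rb") as f:
            # One large sequential read; the decoder would then be fed
            # this buffer (jpeg_mem_src-style) instead of a FILE *.
            return f.read()
    return None
```

The point of the threshold is simply to bound per-file memory: large
files keep going through the streaming path.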

Even with such a large number of threads, it's using very little CPU
time -- mostly about 8-13% (less than one hyperthread). iostat
indicates that it's spending between 1/2 and 2/3 of the time in I/O
wait.

Running it on the SSD, I got well in excess of 400 MB/sec, with rather
modest IOPS in the range of 500/sec, indicating average I/O size on
the order of 1 MB. That's pretty close to saturating the SATA SSD,
which is rated in the range of 500 MB/sec (and is far better than I
can get with any single-threaded program). That lends further
credence to my hypothesis that it's I/O limited on more typical
image storage. However, the iostat numbers I'm getting from the
spinning disk don't look saturated; that disk should be able to
sustain about 100-120 MB/sec and 120 I/Os per second, or thereabouts.
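
The average-request-size estimate above falls straight out of the
arithmetic (throughput divided by request rate):

```python
# Observed SSD numbers from the run above.
throughput_bytes = 400 * 1024 * 1024   # ~400 MB/sec
iops = 500                             # ~500 requests/sec

avg_io_bytes = throughput_bytes / iops
print(avg_io_bytes / (1024 * 1024))    # -> 0.8 (MB per request)
```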

To do better, I'd likely either need to use a scout thread or increase
the number of threads still further. Due to the file buffers, that
would likely increase memory consumption, although at least on my
system (which has plentiful memory by typical standards) that's not
likely to cause a major problem. Introducing a scout thread into this
code would not be particularly easy.
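
Raising the thread count as suggested amounts to a bounded worker
pool, which can be sketched like this. Python again stands in for the
real worker code; the function names and the default thread count are
illustrative, not KPhotoAlbum's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def slurp(path):
    """Read one file completely into memory."""
    with open(path, "rb") as f:
        return path, f.read()

def load_all(paths, max_threads=8):
    # More in-flight reads keep a slow device busy, but each worker
    # holds one whole-file buffer, so peak memory use grows with the
    # thread count -- exactly the trade-off described above.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return dict(pool.map(slurp, paths))
```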

The best solution would be to generate thumbnails upon image load for
images up to a certain size. That would combine nicely with the MD5
code, which can also profit from having the entire file (since the
underlying crypto code in Qt only does 16K I/O ops). We could always
postpone the thumbnail generation for really big files (and files that
need load methods other than JPEG or thumbnail extraction from RAW) to
the end.
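
The MD5 point can be seen directly: hashing the whole in-memory
buffer in one call yields the same digest as streaming it through in
16K chunks, just without the extra read operations. Python's hashlib
stands in here for Qt's QCryptographicHash:

```python
import hashlib

def md5_whole(buf):
    """Hash the entire in-memory buffer in one call."""
    return hashlib.md5(buf).hexdigest()

def md5_chunked(buf, chunk=16 * 1024):
    """Mimic a streaming hasher that consumes 16K at a time."""
    h = hashlib.md5()
    for i in range(0, len(buf), chunk):
        h.update(buf[i:i + chunk])
    return h.hexdigest()
```

Both functions produce identical digests for any input; the only
difference is how many I/O operations it takes to get the bytes in
front of the hasher.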

This work may not be entirely trivial, but it could have a pretty big
payoff when loading files.
--
Robert Krawitz <***@alum.mit.edu>

*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton
Johannes Zarl-Zierl
7 years ago
Hi Robert,

Thanks for your detailed analysis (not just this one, but you often dig into things quite
thoroughly and provide useful information)!
Post by Robert Krawitz
The best solution would be to generate thumbnails upon image load for
images up to a certain size. That would combine nicely with the MD5
code, which can also profit from having the entire file (since the
underlying crypto code in Qt only does 16K I/O ops). We could always
postpone the thumbnail generation for really big files (and files that
need load methods other than JPEG or thumbnail extraction from RAW) to
the end.
My gut feeling is that we do read image files too many times (at least 3 times for exif,
thumbnails, md5) and without optimizing for cache-friendliness.
Post by Robert Krawitz
This work may not be entirely trivial, but it could have a pretty big
payoff when loading files.
I've shied away from tackling this issue because of the complexity of the code it touches.

Johannes
Robert Krawitz
7 years ago
Post by Johannes Zarl-Zierl
Hi Robert,
Do you know how the improvements interact with high latency media
(i.e. network mounts)?
No, and I have no convenient way to test it. I suppose I could set up
an NFS server somewhere to try it.

I think it will in large part depend on the performance
characteristics of the network medium and the operational behavior of
the filesystem. Network filesystems aren't always especially high
latency per se. A local 1Gb or 10Gb network can have latency in the
sub-1ms region and will have ample bandwidth, but in addition to
having two I/O paths to consider (client-server and server-disk),
there's also the problem of maintaining consistency. Local filesystems
don't have a problem with maintaining consistency between the storage
medium and the state of the memory, because they know that nothing
else is touching the data. In the case of network filesystems, the
client may or may not have any assurance that the state of the
filesystem hasn't changed behind its back. This is particularly
problematic in the case of stateless filesystems.

My changes will likely reduce the amount of bulk data transfer (since
the normal state of affairs is that the image files won't change
behind the user's back), but there will still be a lot more network
traffic because the client will have to check more often with the
server whether the data is still valid.

What I haven't done -- and what would really help network filesystem
users -- is to have the scout thread slurp the files in, and
then use those in-memory buffers for all succeeding operations
(calculating MD5 checksums, reading EXIF data, and building
thumbnails). That will be a much more complex proposition.
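
A scout thread of the kind described could look roughly like this:
one producer slurps each file into a bounded queue, and the consumer
runs all per-file work (MD5 here; EXIF and thumbnailing would use the
same buffer) without touching the filesystem again. Every name in
this sketch is illustrative, not KPhotoAlbum's actual code:

```python
import hashlib
import queue
import threading

def scout(paths, q):
    # Producer: read each file exactly once, hand the buffer downstream.
    for path in paths:
        with open(path, "rb") as f:
            q.put((path, f.read()))
    q.put(None)  # sentinel: no more files

def consume(q):
    results = {}
    while True:
        item = q.get()
        if item is None:
            break
        path, buf = item
        # All per-file work reuses the one buffer -- no second read.
        results[path] = hashlib.md5(buf).hexdigest()
    return results

def run(paths, depth=4):
    q = queue.Queue(maxsize=depth)  # bounded: caps buffers held in flight
    t = threading.Thread(target=scout, args=(paths, q))
    t.start()
    results = consume(q)
    t.join()
    return results
```

The bounded queue is what keeps memory consumption under control: the
scout can run at most `depth` files ahead of the consumers, which
matters most on high-latency media where the read and the processing
overlap.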

The bottom line is that this will have to be tested by more users.
...
--
Robert Krawitz <***@alum.mit.edu>
