Discussion:
[KPhotoAlbum] More thumbnail investigations
Robert Krawitz
7 years ago
I was able to get the thumbnail build time for ~10000 images down
further, but it took somewhat drastic measures.

There are two ways to load JPEG files using libjpeg: from a FILE *
(jpeg_stdio_src) and, as of libjpeg8, from a memory buffer
(jpeg_mem_src). Loading the file into
memory myself and then feeding the memory buffer to libjpeg improved
the load time with 3 threads from 20 to 18 minutes (which is
significant, since nothing else I had tried, including increasing the
stdio buffer size, made any difference). It also decreased the I/O
rate to around 60 operations/sec. Increasing the max threads from 3 to 8 got the time down to 16
minutes, with slightly higher I/O rates. I'm using 20 MB as the upper
limit to load that way, just for experimentation.
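
The pattern described above -- slurp the whole file when it is small
enough, then decode from the in-memory buffer -- can be sketched
roughly like this. Python stands in for KPhotoAlbum's actual C++/Qt
code, the hand-off to libjpeg's jpeg_mem_src is only indicated in a
comment, and the 20 MB cutoff is the experimental value mentioned
above:

```python
import os

MAX_SLURP_BYTES = 20 * 1024 * 1024  # experimental upper limit from the text

def load_for_decode(path):
    """Return the file contents as one in-memory buffer when the file
    is small enough; otherwise return None so the caller falls back to
    the FILE*-style (jpeg_stdio_src) route."""
    if os.path.getsize(path) <= MAX_SLURP_BYTES:
        with open(path, "rb") as f:
            # One large sequential read; the decoder would then be fed
            # this buffer (jpeg_mem_src-style) instead of a FILE *.
            return f.read()
    return None
```

The point of the threshold is simply to bound per-file memory: large
files keep going through the streaming path.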

Even with such a large number of threads, it's using very little CPU
time -- mostly about 8-13% (less than one hyperthread). iostat
indicates that it's spending between 1/2 and 2/3 of the time in I/O
wait.

Running it on the SSD, I got well in excess of 400 MB/sec, with rather
modest IOPS in the range of 500/sec, indicating average I/O size on
the order of 1 MB. That's pretty close to saturating the SATA SSD,
which is rated in the range of 500 MB/sec (and is far better than I
can get with any single-threaded program). That lends further
credence to my hypothesis that it's I/O limited on more typical
image storage. However, the iostat numbers I'm getting from the
spinning disk don't look saturated; that disk should be able to
sustain about 100-120 MB/sec and 120 I/Os per second, or thereabouts.
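
The average-request-size estimate above falls straight out of the
arithmetic (throughput divided by request rate):

```python
# Observed SSD numbers from the run above.
throughput_bytes = 400 * 1024 * 1024   # ~400 MB/sec
iops = 500                             # ~500 requests/sec

avg_io_bytes = throughput_bytes / iops
print(avg_io_bytes / (1024 * 1024))    # -> 0.8 (MB per request)
```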

To do better, I'd likely either need to use a scout thread or increase
the number of threads still further. Due to the file buffers, that
would likely increase memory consumption, although at least on my
system (which has plentiful memory by typical standards) that's not
likely to cause a major problem. Introducing a scout thread into this
code would not be particularly easy.
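
Raising the thread count as suggested amounts to a bounded worker
pool, which can be sketched like this. Python again stands in for the
real worker code; the function names and the default thread count are
illustrative, not KPhotoAlbum's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def slurp(path):
    """Read one file completely into memory."""
    with open(path, "rb") as f:
        return path, f.read()

def load_all(paths, max_threads=8):
    # More in-flight reads keep a slow device busy, but each worker
    # holds one whole-file buffer, so peak memory use grows with the
    # thread count -- exactly the trade-off described above.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return dict(pool.map(slurp, paths))
```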

The best solution would be to generate thumbnails upon image load for
images up to a certain size. That would combine nicely with the MD5
code, which can also profit from having the entire file (since the
underlying crypto code in Qt only does 16K I/O ops). We could always
postpone the thumbnail generation for really big files (and files that
need load methods other than JPEG or thumbnail extraction from RAW) to
the end.
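
The MD5 point can be seen directly: hashing the whole in-memory
buffer in one call yields the same digest as streaming it through in
16K chunks, just without the extra read operations. Python's hashlib
stands in here for Qt's QCryptographicHash:

```python
import hashlib

def md5_whole(buf):
    """Hash the entire in-memory buffer in one call."""
    return hashlib.md5(buf).hexdigest()

def md5_chunked(buf, chunk=16 * 1024):
    """Mimic a streaming hasher that consumes 16K at a time."""
    h = hashlib.md5()
    for i in range(0, len(buf), chunk):
        h.update(buf[i:i + chunk])
    return h.hexdigest()
```

Both functions produce identical digests for any input; the only
difference is how many I/O operations it takes to get the bytes in
front of the hasher.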

This work may not be entirely trivial, but it could have a pretty big
payoff when loading files.
--
Robert Krawitz <***@alum.mit.edu>

*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton
Johannes Zarl-Zierl
7 years ago
Hi Robert,

Thanks for your detailed analysis (not just this one, but you often dig into things quite
thoroughly and provide useful information)!
Post by Robert Krawitz
The best solution would be to generate thumbnails upon image load for
images up to a certain size. That would combine nicely with the MD5
code, which can also profit from having the entire file (since the
underlying crypto code in Qt only does 16K I/O ops). We could always
postpone the thumbnail generation for really big files (and files that
need load methods other than JPEG or thumbnail extraction from RAW) to
the end.
My gut feeling is that we do read image files too many times (at least 3 times for exif,
thumbnails, md5) and without optimizing for cache-friendliness.
Post by Robert Krawitz
This work may not be entirely trivial, but it could have a pretty big
payoff when loading files.
I've shied away from tackling this issue because of the complexity of the code it touches.

Johannes
Robert Krawitz
7 years ago
Post by Johannes Zarl-Zierl
Hi Robert,
Do you know how the improvements interact with high latency media
(i.e. network mounts)?
No, and I have no convenient way to test it. I suppose I could set up
an NFS server somewhere to try it.

I think it will in large part depend on the performance
characteristics of the network medium and the operational behavior of
the filesystem. Network filesystems aren't always especially high
latency per se. A local 1Gb or 10Gb network can have latency in the
sub-1ms region and will have ample bandwidth, but in addition to
having two I/O paths to consider (client-server and server-disk),
there's also the problem of maintaining consistency. Local filesystems
don't have a problem with maintaining consistency between the storage
medium and the state of the memory, because they know that nothing
else is touching the data. In the case of network filesystems, the
client may or may not have any assurance that the state of the
filesystem hasn't changed behind its back. This is particularly
problematic in the case of stateless filesystems.

My changes will likely reduce the amount of bulk data transfer (since
the normal state of affairs is that the image files won't change
behind the user's back), but there will still be a lot more network
traffic because the client will have to check more often with the
server whether the data is still valid.

What I haven't done -- and what would really help network filesystem
users -- is to have the scout thread slurp the files in, and
then use those in-memory buffers for all succeeding operations
(calculating MD5 checksums, reading EXIF data, and building
thumbnails). That will be a much more complex proposition.
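
A scout thread of the kind described could look roughly like this:
one producer slurps each file into a bounded queue, and the consumer
runs all per-file work (MD5 here; EXIF and thumbnailing would use the
same buffer) without touching the filesystem again. Every name in
this sketch is illustrative, not KPhotoAlbum's actual code:

```python
import hashlib
import queue
import threading

def scout(paths, q):
    # Producer: read each file exactly once, hand the buffer downstream.
    for path in paths:
        with open(path, "rb") as f:
            q.put((path, f.read()))
    q.put(None)  # sentinel: no more files

def consume(q):
    results = {}
    while True:
        item = q.get()
        if item is None:
            break
        path, buf = item
        # All per-file work reuses the one buffer -- no second read.
        results[path] = hashlib.md5(buf).hexdigest()
    return results

def run(paths, depth=4):
    q = queue.Queue(maxsize=depth)  # bounded: caps buffers held in flight
    t = threading.Thread(target=scout, args=(paths, q))
    t.start()
    results = consume(q)
    t.join()
    return results
```

The bounded queue is what keeps memory consumption under control: the
scout can run at most `depth` files ahead of the consumers, which
matters most on high-latency media where the read and the processing
overlap.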

The bottom line is that this will have to be tested by more users.
...
--
Robert Krawitz <***@alum.mit.edu>
