Discussion:
[KPhotoAlbum] More on thumbnails
Tobias Leupold
2018-05-18 04:58:33 UTC
Permalink
Nice work :-) Don't forget to create a branch and push your changes from time
to time!
1) It misuses timers. Timers can only be started and stopped in the
same thread. Even putting them all in the same function is not
good enough, if that function can be called from different threads;
with the multi-threaded nature of thumbnail building, calling
ThumbnailCache::save() directly means it can be called from
multiple threads.
The practical impact is that the 1-second timer for thumbnail
saving accomplished nothing, because the timer was seldom getting
started (or stopped).
The solution is to never call save() directly; always call it
indirectly through emitting a signal.
2) The thumbnail cache itself is saved very inefficiently; it's
written in full to a temporary file, which is then copied (yes,
copied, not renamed) to the real file. Even with a big database,
such as my own (270,303 images) that's probably not killing me with
I/O, but it's a lot of work to be doing every 100 images.
I'm changing it to do an incremental save where possible. That
technically violates the intent of the file format, which stores
the most recently written file and offset as well as the image
count, but those can be reconstituted on cache reload easily
enough.
Also, in order to do that, I had to add an additional list of
unsaved files (auto-save list). And there's no good reason for the
data structure to be a QMap; a QHash is faster, and ordering
doesn't matter in the cache.
3) The save() routine wasn't protected by a mutex; with multiple
threads processing thumbnails, I'm not sure how we didn't get a lot
of data corruption in the index cache.
I'm working on all of these problems.
Robert Krawitz
2018-05-18 11:52:27 UTC
Permalink
Post by Tobias Leupold
Nice work :-) Don't forget to create a branch and push your changes
from time to time!
What are the testing etc. expectations?
Post by Tobias Leupold
1) It misuses timers. Timers can only be started and stopped in the
same thread. Even putting them all in the same function is not
good enough, if that function can be called from different threads;
with the multi-threaded nature of thumbnail building, calling
ThumbnailCache::save() directly means it can be called from
multiple threads.
The practical impact is that the 1-second timer for thumbnail
saving accomplished nothing, because the timer was seldom getting
started (or stopped).
The solution is to never call save() directly; always call it
indirectly through emitting a signal.
2) The thumbnail cache itself is saved very inefficiently; it's
written in full to a temporary file, which is then copied (yes,
copied, not renamed) to the real file. Even with a big database,
such as my own (270,303 images) that's probably not killing me with
I/O, but it's a lot of work to be doing every 100 images.
I'm changing it to do an incremental save where possible. That
technically violates the intent of the file format, which stores
the most recently written file and offset as well as the image
count, but those can be reconstituted on cache reload easily
enough.
Also, in order to do that, I had to add an additional list of
unsaved files (auto-save list). And there's no good reason for the
data structure to be a QMap; a QHash is faster, and ordering
doesn't matter in the cache.
3) The save() routine wasn't protected by a mutex; with multiple
threads processing thumbnails, I'm not sure how we didn't get a lot
of data corruption in the index cache.
I'm working on all of these problems.
--
Robert Krawitz <***@alum.mit.edu>

*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton
Johannes Zarl-Zierl
2018-05-18 21:29:52 UTC
Permalink
Hi Robert,
Post by Robert Krawitz
Post by Tobias Leupold
Nice work :-) Don't forget to create a branch and push your changes
from time to time!
What are the testing etc. expectations?
Unfortunately, we have no automated testing in place, so I can't expect anything from you in
that regard.

If your code touches threads, I'll need to think about it until I'm fairly certain I understand it.
Then I'll test it and merge to master for further testing. If you can improve upon that sad
state of things and make it easier for me to trust the code, I'd be very grateful...

Cheers,
Johannes
Kerry Sainsbury
2018-05-18 07:33:35 UTC
Permalink
I have to say that I'm really enjoying the commentary. Thanks for the
taking the time to not only fix stuff, but also to write it up!

Perhaps I need to get out more.
1) It misuses timers. Timers can only be started and stopped in the
same thread. Even putting them all in the same function is not
good enough, if that function can be called from different threads;
with the multi-threaded nature of thumbnail building, calling
ThumbnailCache::save() directly means it can be called from
multiple threads.
The practical impact is that the 1-second timer for thumbnail
saving accomplished nothing, because the timer was seldom getting
started (or stopped).
The solution is to never call save() directly; always call it
indirectly through emitting a signal.
2) The thumbnail cache itself is saved very inefficiently; it's
written in full to a temporary file, which is then copied (yes,
copied, not renamed) to the real file. Even with a big database,
such as my own (270,303 images) that's probably not killing me with
I/O, but it's a lot of work to be doing every 100 images.
I'm changing it to do an incremental save where possible. That
technically violates the intent of the file format, which stores
the most recently written file and offset as well as the image
count, but those can be reconstituted on cache reload easily
enough.
Also, in order to do that, I had to add an additional list of
unsaved files (auto-save list). And there's no good reason for the
data structure to be a QMap; a QHash is faster, and ordering
doesn't matter in the cache.
3) The save() routine wasn't protected by a mutex; with multiple
threads processing thumbnails, I'm not sure how we didn't get a lot
of data corruption in the index cache.
I'm working on all of these problems.
--
*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net
"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton
_______________________________________________
KPhotoAlbum mailing list
https://mail.kdab.com/mailman/listinfo/kphotoalbum
Loading...