Discussion:
[KPhotoAlbum] Startup performance: the final frontier
Robert Krawitz
2017-05-31 02:22:11 UTC
The last big performance problem I have with kpa is startup. It takes
about 18 seconds to start up, with the 220K photos in my database.

I intend to kcachegrind this one of these days. It's a somewhat
painful experience, given how long it takes to start that way. At
least from the splash screen, it looks like maybe 80% of the time is
spent in loading the database, the remainder in creating the main
window (whatever that really means).

Saving is also problematic, and autosave is very annoying, but not as
bad as startup.

I know there was a move afoot a while back to have an SQL database
rather than an XML file (and XML's pretty unwieldy for this), but
IIRC nobody ever got it working and eventually it was pulled out.
--
Robert Krawitz <***@alum.mit.edu>

*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton
Robert Krawitz
2017-06-01 00:32:22 UTC
So, a little research shows me kpa isn't parsing the config file in
the most efficient way:
http://3gfp.com/wp/2014/07/three-ways-to-parse-xml-in-qt/

It's using QDomDocument, which reads the entire file in. That makes
it easy to extract things the way we want, but is less efficient than
using QXmlStreamReader, which is more of a tokenizer that feeds us
elements.
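The difference between the two parsing styles can be sketched in Python, whose standard library happens to offer both. `ElementTree.parse` builds the whole tree in memory before you can query it (the QDomDocument approach), while `ElementTree.iterparse` hands back each element as soon as the parser finishes it (closer in spirit to QXmlStreamReader, though not the same API). This is a swapped-in illustration, not KPhotoAlbum code, and the tiny sample file, including its `KPhotoAlbum`, `images`, `image`, and `file` names, is a simplified, hypothetical stand-in for the real index file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified index file; the real format has many
# more elements and attributes.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<KPhotoAlbum>
  <images>
    <image file="2017/a.jpg"/>
    <image file="2017/b.jpg"/>
    <image file="2017/c.jpg"/>
  </images>
</KPhotoAlbum>
"""

def files_dom(data: bytes) -> list[str]:
    """DOM style (cf. QDomDocument): parse everything, then walk the tree."""
    root = ET.parse(io.BytesIO(data)).getroot()
    return [img.get("file") for img in root.iter("image")]

def files_stream(data: bytes) -> list[str]:
    """Streaming style (cf. QXmlStreamReader): handle each element as it is
    completed instead of querying a finished tree."""
    files = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "image":
            files.append(elem.get("file"))
            elem.clear()  # release the element's attributes/children once used
    return files

print(files_dom(SAMPLE) == files_stream(SAMPLE))  # True
```

Both functions return the same list; the streaming version simply never needs the complete document in memory before it can start working.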

It's possible to read the config.xml file quickly; I did so with
xmllint, which took 2 or 3 seconds. It's something that may be worth
investigating.
Robert Krawitz
2017-06-01 00:35:11 UTC
Never mind :-)

I was looking at code that has been #ifdef'ed out, and I see it's
already using QXmlStreamReader.
Andreas Schleth
2017-06-01 20:36:38 UTC
Post by Robert Krawitz
The last big performance problem I have with kpa is
...

Hi Robert,

I have been following your exchanges with Johannes about the performance
problems of KPA for a while and would like to add my two cents (or is it
five?).

First, changing the structure of the index.xml at this point would be a
really bad move from my point of view. Why?

I still use a canned version of KPA 4.1.whatever with an ancient Suse 11.2
in a virtual machine if I have to time-shift some images from a foreign
source. From there I work directly on my original database via a
host-only network. In that old KPA version (KDE4) the kipi-plugin for
time shifting still works. In the current version it is out of business
because of some design decisions by the digiKam community. AFAIK Martin
Füssel is looking into this, but this might take a while.

The charming thing about KPA at the moment is that my database
operations with this ancient KPA version on index.xml files (maintained
normally with the most recent git master) do not corrupt the database.
Thus, this is a feature (backward compatibility) that would be lovely
to keep for the time being.
[[I probably only understood some 10% of what you wrote about, but
seeing mentions of xml structure and parsing made me sit up and listen ...]]

Second, as you are looking at performance ... did you perchance ever
look at thumbnail generation? My database of some 30000 images resides
on an NFS share and rebuilding thumbnails still takes close to 30 minutes,
with KPA not being very responsive in the meantime, even tending to crash
in the (longish) interval until the progress bar first shows up - this
is after Johannes did some improvements to the process already. So,
this would be a point where you could harvest many minutes and not
seconds of performance gain :-) [[Also the files in the .thumbnails
folder always have the wrong (for my setup) permissions (rw-r--r-- while
everything else is rw-rw-r--) - but this might be a problem existing
between keyboard and (my) chair]]

Anyway, it is great to see an MIT engineer hacking away at KPA.

Best regards, Andreas
Robert Krawitz
2017-06-02 03:51:05 UTC
Post by Andreas Schleth
Post by Robert Krawitz
The last big performance problem I have with kpa is
...
Hi Robert,
I have been following your exchanges with Johannes about the performance problems of KPA for a while and would like to add my two cents (or is it five?).
First, changing the structure of the index.xml at this point would be a really bad move from my point of view. Why?
I still use a canned version of KPA 4.1.whatever with an ancient Suse 11.2 in a virtual machine if I have to time-shift some images from a foreign source. From there I work directly on my original database via a host-only network. In that old KPA version (KDE4) the kipi-plugin for time shifting still works. In the current version it is out of business because of some design decisions by the digiKam community. AFAIK Martin Füssel is looking into this, but this might take a while.
If you're using essentially a KPA appliance, none of this should
matter -- you're going to be using a fixed version of KPA. There's no
suggestion that KPA would not be able to recognize an older data file.

I wasn't actually suggesting any particular changes in the XML file
format; I wanted to look at startup time overall. It looks like the
time spent reading the XML file is only a bit more than half of the
total time (about 10 seconds out of 18 total to start up). I tried
replacing the code that actually reads the images with code that does
just enough parsing to get through the file; that got the read time
down to 3 seconds, so that doesn't look all that fruitful.

I'm not a big fan of using XML for this purpose, period. It would be a
much better fit for a relational database, IMO. The schema is pretty
straightforward.
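For illustration only, here is roughly what such a schema could look like, sketched with Python's built-in `sqlite3` module. The table and column names (`image`, `tag`, `image_tag`) are hypothetical and cover only images and their category tags; they are not taken from KPhotoAlbum's code or from the earlier SQL backend.

```python
import sqlite3

# Hypothetical schema: one row per image, one row per (category, tag) pair,
# and a join table linking images to tags.
SCHEMA = """
CREATE TABLE image     (id INTEGER PRIMARY KEY, file TEXT UNIQUE NOT NULL,
                        md5 TEXT, taken TEXT);
CREATE TABLE tag       (id INTEGER PRIMARY KEY, category TEXT NOT NULL,
                        name TEXT NOT NULL, UNIQUE (category, name));
CREATE TABLE image_tag (image_id INTEGER REFERENCES image(id),
                        tag_id   INTEGER REFERENCES tag(id),
                        PRIMARY KEY (image_id, tag_id));
"""

def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def tag_image(db: sqlite3.Connection, file: str, category: str, name: str) -> None:
    """Create the image and tag rows if needed, then link them."""
    db.execute("INSERT OR IGNORE INTO image(file) VALUES (?)", (file,))
    db.execute("INSERT OR IGNORE INTO tag(category, name) VALUES (?, ?)",
               (category, name))
    db.execute("""INSERT OR IGNORE INTO image_tag
                  SELECT i.id, t.id FROM image i, tag t
                  WHERE i.file = ? AND t.category = ? AND t.name = ?""",
               (file, category, name))

def files_with_tag(db: sqlite3.Connection, category: str, name: str) -> list[str]:
    rows = db.execute("""SELECT i.file FROM image i
                         JOIN image_tag it ON it.image_id = i.id
                         JOIN tag t ON t.id = it.tag_id
                         WHERE t.category = ? AND t.name = ?
                         ORDER BY i.file""", (category, name))
    return [row[0] for row in rows]

db = open_db()
tag_image(db, "2017/a.jpg", "People", "Alice")
tag_image(db, "2017/b.jpg", "Places", "Paris")
tag_image(db, "2017/c.jpg", "People", "Alice")
print(files_with_tag(db, "People", "Alice"))  # ['2017/a.jpg', '2017/c.jpg']
```

The attraction over a flat XML file is that lookups like this go through indexes, and a change touches a few rows instead of requiring the whole file to be rewritten.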
Post by Andreas Schleth
The charming thing about KPA at the moment is that my database operations with this ancient KPA version on index.xml files (maintained normally with the most recent git master) do not corrupt the database.
Thus, this is a feature (backward compatibility) that would be lovely to keep for the time being.
[[I probably only understood some 10% of what you wrote about, but seeing mentions of xml structure and parsing made me sit up and listen ...]]
Again, I'm not proposing breaking back compatibility. Don't confuse
back compatibility with forward compatibility. Your old version of
kpa might not be able to read a new index.xml file (and certainly
wouldn't be able to handle the EXIF database, if for no other reason
than you probably don't have sqlite3 around), so there may not be
forward compatibility.
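The direction of the guarantee can be made explicit with a tiny sketch. It assumes each data file records the format version that wrote it and each reader knows the newest version it understands; the function name and the version numbers are hypothetical, not KPhotoAlbum's actual logic.

```python
def can_read(file_version: int, newest_understood: int) -> bool:
    """Backward compatibility: a newer reader opens files written by older
    versions. Forward compatibility (an older reader opening a newer file)
    cannot be promised, because the old reader does not know what a later
    format added."""
    return file_version <= newest_understood

# Hypothetical versions.
print(can_read(file_version=3, newest_understood=7))  # True: new reader, old file
print(can_read(file_version=7, newest_understood=3))  # False: old reader, new file
```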
Post by Andreas Schleth
Second, as you are looking at performance ... did you perchance ever look at thumbnail generation? My database of some 30000 images resides on an NFS share and rebuilding thumbnails still takes close to 30 minutes, with KPA not being very responsive in the meantime, even tending to crash in the (longish) interval until the progress bar first shows up - this is after Johannes did some improvements to the process already. So, this would be a point where you could harvest many minutes and not seconds of performance gain :-) [[Also the files in the .thumbnails folder always have the wrong (for my setup) permissions (rw-r--r-- while everything else is rw-rw-r--) - but this might be a problem existing between keyboard and (my) chair]]
It's very hard for me to evaluate the numbers without more
information. How big are your images, how fast is your network (both
bandwidth and latency), what's behind the NFS server (flash or
spinning rust of some variety), what's your CPU utilization while
you're building the thumbnails, how long does it take using local
disk? I don't remember exactly how long it takes me to build
thumbnails, but offhand that doesn't sound very slow to me. That's
about 17 images/second you're thumbnailing; considering that each
image requires a variety of NFS operations, some of which hit the
disk, you may not be able to do a lot better. I did look at the code,
and the thumbnails are batched up and only written every 100 images,
so the code's avoiding small writes (which are very bad over NFS).
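The arithmetic behind the 17 images/second figure, and the batching idea, can be sketched as follows. The batch size of 100 comes from the description above; the class name and its in-memory buffering are hypothetical and only meant to show why grouping writes helps on a high-latency file system, not how KPhotoAlbum implements it.

```python
import io

# 30000 images in roughly 30 minutes:
rate = 30_000 / (30 * 60)
print(round(rate, 1))  # 16.7 images per second

class BatchedThumbnailWriter:
    """Hypothetical writer: collect encoded thumbnails in memory and append
    them to the cache file in batches, so the file system sees a few large
    writes instead of thousands of small ones."""

    BATCH = 100

    def __init__(self, stream):
        self.stream = stream   # any binary file-like object
        self.pending = []      # thumbnails not yet written
        self.writes = 0        # number of write() calls issued so far

    def add(self, data: bytes) -> None:
        self.pending.append(data)
        if len(self.pending) >= self.BATCH:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.stream.write(b"".join(self.pending))
            self.writes += 1
            self.pending.clear()

buf = io.BytesIO()
writer = BatchedThumbnailWriter(buf)
for _ in range(250):
    writer.add(b"x" * 10)
writer.flush()  # write the final partial batch
print(writer.writes, len(buf.getvalue()))  # 3 2500
```

250 thumbnails produce three writes (100, 100, and a final 50) rather than 250.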

Also, why do you frequently rebuild your thumbnails? That's usually
something you do once and forget, unless you want to keep changing
your thumbnail size, which is going to be inefficient whatever you do.

The thumbnail storage is optimized for reading (viewing), and in my
experience, optimized very well indeed -- you can jump scroll and the
thumbnails keep up, even if you jump around all over the place. I
remember that there was a lot of work done on it. They're stored as
fixed size (uncompressed) bitmaps in a hashed file, so reading a
thumbnail is one seek and one read.
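A minimal sketch of why fixed-size slots make a lookup cost one seek and one read: the position of a thumbnail is pure arithmetic. It assumes a hypothetical layout (tiny 4x4 RGB thumbnails, slots assigned in insertion order, and an in-memory dict serving as the hash index); KPhotoAlbum's actual cache format is not reproduced here.

```python
import io

THUMB_W, THUMB_H, BYTES_PER_PIXEL = 4, 4, 3        # tiny, for illustration
SLOT_SIZE = THUMB_W * THUMB_H * BYTES_PER_PIXEL    # 48 bytes per thumbnail

class FixedSlotThumbnailFile:
    """Hypothetical cache: every thumbnail occupies exactly SLOT_SIZE bytes."""

    def __init__(self, f):
        self.f = f
        self.index = {}   # image file name -> slot number (a hash table)

    def store(self, name: str, pixels: bytes) -> None:
        if len(pixels) != SLOT_SIZE:
            raise ValueError("thumbnail must be exactly SLOT_SIZE bytes")
        # Reuse the existing slot for a known name, else take the next one.
        slot = self.index.setdefault(name, len(self.index))
        self.f.seek(slot * SLOT_SIZE)
        self.f.write(pixels)

    def load(self, name: str) -> bytes:
        self.f.seek(self.index[name] * SLOT_SIZE)   # one seek ...
        return self.f.read(SLOT_SIZE)               # ... and one read

cache = FixedSlotThumbnailFile(io.BytesIO())
cache.store("a.jpg", bytes([1]) * SLOT_SIZE)
cache.store("b.jpg", bytes([2]) * SLOT_SIZE)
print(cache.load("b.jpg")[:3])  # b'\x02\x02\x02'
```

The trade-off is space: uncompressed fixed-size slots are larger than compressed thumbnails, in exchange for lookups that never need to parse or decompress anything first.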

NFS works best for streaming I/O. That's true for most high-latency
protocols. It is not good for small random I/O.
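A crude cost model makes the streaming vs. small-I/O point concrete: each request pays a network round trip, and the payload pays bandwidth. The round-trip time, bandwidth, and request sizes below are hypothetical inputs, not measurements of anyone's setup, and the model ignores client caching, read-ahead, pipelining, and server disk time.

```python
def transfer_time(total_bytes: int, request_size: int,
                  rtt_s: float, bytes_per_s: float) -> float:
    """Seconds to move total_bytes using sequential requests of request_size
    bytes: one round trip per request plus the raw transfer time."""
    requests = -(-total_bytes // request_size)   # ceiling division
    return requests * rtt_s + total_bytes / bytes_per_s

MiB = 1024 * 1024
# Hypothetical link: 1 ms round trip, 100 MB/s.
streaming = transfer_time(100 * MiB, 1 * MiB, 0.001, 100e6)
small_ios = transfer_time(100 * MiB, 4 * 1024, 0.001, 100e6)
print(f"{streaming:.2f} s vs {small_ios:.2f} s")  # 1.15 s vs 26.65 s
```

Same data, same link: the difference comes entirely from the number of round trips.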
Robert Krawitz
2017-06-02 12:06:34 UTC
Post by Robert Krawitz
Also, why do you frequently rebuild your thumbnails? That's usually
something you do once and forget, unless you want to keep changing
your thumbnail size, which is going to be inefficient whatever you do.
I suspect some optimizations could be made in that corner. I see
* Thumbnail rebuilding processes kicking in some time when I
don't quite understand why (on old pictures/videos which
already should have been processed a long time ago)
Full rebuild, or just a few? If video thumbnail processing fails, I
see it trying to regenerate those thumbnails.
* No good error handling if the thumbnail file cannot be generated
  because the file system credentials have run out. It kind of seems
  to assume that all I/O operations complete fast and with success.
That should be fixed; there are other reasons thumbnail file creation
can fail.
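A minimal sketch of the kind of error handling being asked for, in Python for brevity: every step of writing a thumbnail is checked, failure is reported to the caller instead of being assumed away, a partially written file never replaces a good one, and nothing is created when there is no data. The function name and the write-to-temporary-then-rename strategy are assumptions for illustration, not KPhotoAlbum's code. Rename semantics on network file systems such as AFS or NFS can also be weaker than on a local disk.

```python
import os
import tempfile

def write_thumbnail(path: str, data: bytes) -> bool:
    """Hypothetical helper. Returns True on success, False if there was
    nothing to write or any step failed (expired credentials, full disk,
    missing directory, ...). The caller should log or surface False."""
    if not data:
        return False                 # nothing to store: create no file at all
    directory = os.path.dirname(path) or "."
    tmp = None
    try:
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())     # surface deferred write errors here
        os.replace(tmp, path)        # rename within the same directory
        return True
    except OSError:
        if tmp is not None:
            try:
                os.unlink(tmp)       # don't leave a half-written temp file
            except OSError:
                pass
        return False
```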
* I have found empty videothumbnail files (an empty file does not
  make sense; if there is no info, then you don't need to create the
  file at all)
* The 9000 video thumbnail files (of 900 videos) are all in one directory
I'd personally prefer just one thumbnail per file, stored in the
thumbnail cache file just like any other (at least as an option).
Thumbnails are killing AFS I/O performance so badly that I made a link
back to the local laptop HD. Which of course means that thumbnails can
not be shared between KPAs on more than one computer.
Then I would like to have a setting "read thumb from image/video file
thumb if available".
That's going to clobber your performance a whole lot worse -- you'll
be doing endless remote I/O calls on separate files, requiring extra
protocol exchange.
Btw, it looks to me like kphotoalbum traverses all directories and
stat()s all files/images at every startup, even if "search for new
images at startup" is off. Is that necessary?
I just checked with strace on the current git master and saw no
evidence of that.