Updates to ZIM files that failed to be added to the library upon their
appearance in a monitored directory are no longer handled immediately.
Instead they are debounced: processing is deferred until the file has
been observed to be stable for some time (1 second as of this commit).
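The debouncing idea can be illustrated with a minimal sketch. It is not
the code of this commit: ChangeDebouncer and its methods are hypothetical
names, the standard library is used instead of whatever timer facility
the application relies on, and the current time is passed in explicitly
so that the logic is easy to test.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: tracks the last observed change of each file and
// reports a file only after it has been quiet for the whole period.
class ChangeDebouncer
{
public:
    explicit ChangeDebouncer(std::chrono::milliseconds quietPeriod)
        : m_quietPeriod(quietPeriod)
    {}

    // Record a change of fileName at time `now`; this restarts the
    // quiet period for that file.
    void noteChange(const std::string& fileName, Clock::time_point now)
    {
        m_lastChange[fileName] = now;
    }

    // Return (and forget) the files that have not changed during the
    // last quiet period.
    std::vector<std::string> takeStableFiles(Clock::time_point now)
    {
        std::vector<std::string> stable;
        for (auto it = m_lastChange.begin(); it != m_lastChange.end(); ) {
            if (now - it->second >= m_quietPeriod) {
                stable.push_back(it->first);
                it = m_lastChange.erase(it);
            } else {
                ++it;
            }
        }
        return stable;
    }

private:
    std::chrono::milliseconds m_quietPeriod;
    std::map<std::string, Clock::time_point> m_lastChange;
};
```

With a quiet period of 1 second, a file that keeps receiving data is
never reported by takeStableFiles(); it is reported once, about 1 second
after the last write.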
When a change to a monitored directory is detected, any bad ZIM files
in it (that could not be added to the library during the previous
update) are ignored unless their modification timestamp has changed.
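The rule above can be sketched as follows. This is only an illustration
under stated assumptions: badZimsWorthRetrying() and MTime are
hypothetical names, and modification times are modelled as plain
integers rather than the timestamp type used by the real code.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Modification time as a plain integer (assumption made for the sketch).
using MTime = std::int64_t;

// Hypothetical helper. mtimeAtLastFailure maps each bad ZIM file to the
// modification time it had when adding it to the library failed;
// currentMTimes holds the modification times seen in the directory now.
// Returns the bad files whose modification time has changed.
std::set<std::string> badZimsWorthRetrying(
    const std::map<std::string, MTime>& mtimeAtLastFailure,
    const std::map<std::string, MTime>& currentMTimes)
{
    std::set<std::string> result;
    for (const auto& entry : mtimeAtLastFailure) {
        const auto it = currentMTimes.find(entry.first);
        // Skip files that disappeared or were not modified since the
        // failed attempt.
        if (it != currentMTimes.end() && it->second != entry.second)
            result.insert(entry.first);
    }
    return result;
}
```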
A ZIM file that failed to be added to the library isn't just
discarded by directory monitoring but is added to the set of known
ZIM files with a proper status. This makes it possible to process
updates to such files more efficiently (coming next).
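One possible shape of such a set is sketched below. The names
ZimFileStatus, KnownZimFiles, recordAttempt() and isKnownBad() are
hypothetical and chosen for illustration only; the actual data structure
may differ.

```cpp
#include <map>
#include <string>

// Hypothetical status values; the real code may name or split them
// differently.
enum class ZimFileStatus { AddedToLibrary, FailedToAdd };

// Known ZIM files of one monitored directory, keyed by filename.
using KnownZimFiles = std::map<std::string, ZimFileStatus>;

// Remember the outcome of an attempt instead of discarding failures.
void recordAttempt(KnownZimFiles& known, const std::string& fileName, bool added)
{
    known[fileName] = added ? ZimFileStatus::AddedToLibrary
                            : ZimFileStatus::FailedToAdd;
}

// True if fileName is known and the last attempt to add it failed.
bool isKnownBad(const KnownZimFiles& known, const std::string& fileName)
{
    const auto it = known.find(fileName);
    return it != known.end() && it->second == ZimFileStatus::FailedToAdd;
}
```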
- ContentManager::m_knownZimFiles[dirPath] now stores filenames (rather
than full paths) of known ZIM files in the directory at dirPath.
- Library::getLibraryZimsFromDir() returns a set of filenames (rather
than paths).
Note that the change of the semantics of ContentManager::m_knownZimFiles
has been carried out via the change in the value of the second argument
of ContentManager::setMonitorDirZims():
1. In KiwixApp::setupDirectoryMonitoring() the latter is fed with the
output of (the now changed) Library::getLibraryZimsFromDir()
2. In ContentManager::updateLibraryFromDir() all variables representing
a set of files now contain filenames only (note that
ContentManager::handleNewZimFiles() returns just a subset of its second
parameter), and therefore the resulting set consists of filenames too.
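To illustrate the difference between the two representations, here is a
small sketch that turns a set of full paths into the set of filenames
located directly in a given directory. fileNamesInDir() is a hypothetical
helper and not the implementation of Library::getLibraryZimsFromDir(); it
assumes '/' as the path separator and a dirPath without a trailing
separator.

```cpp
#include <set>
#include <string>

// Hypothetical helper: keep only the files that are direct children of
// dirPath and strip the directory part from their paths.
std::set<std::string> fileNamesInDir(const std::set<std::string>& fullPaths,
                                     const std::string& dirPath)
{
    const std::string prefix = dirPath + "/";
    std::set<std::string> names;
    for (const auto& path : fullPaths) {
        const bool startsWithDir = path.size() > prefix.size()
            && path.compare(0, prefix.size(), prefix) == 0;
        // A further separator after the prefix means the file lives in
        // a subdirectory and is not a direct child.
        if (startsWithDir
            && path.find('/', prefix.size()) == std::string::npos)
            names.insert(path.substr(prefix.size()));
    }
    return names;
}
```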
Before this change, once a ZIM file was detected in a monitored
directory it could slip into the known ZIM file list even if it was
a partially downloaded ZIM file that failed to be added to the library.
This happened when another valid/healthy ZIM file was added to the
monitored directory or a known ZIM file was removed from it (the bad
ZIM file was included in the update triggered by the other ZIM file).
Now ZIM files that failed to be added to the library are NOT inserted
in the list of known ZIM files and thus will be reevaluated for every
change in the monitored directory. As a result, a file being downloaded
by an external tool will eventually be successfully added to the library
(since updates to the file should trigger an update to the directory due
to the changed modification time of the file).
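The new behaviour can be sketched as follows. This is a simplified
illustration, not the actual ContentManager code: updateKnownZims() and
the tryAdd callback (returning true if the file was added to the library)
are hypothetical.

```cpp
#include <functional>
#include <set>
#include <string>

using StringSet = std::set<std::string>;

// Hypothetical helper: compute the new set of known ZIM files for a
// directory. Files already known are kept without a new attempt; other
// files become known only if tryAdd() succeeds.
StringSet updateKnownZims(const StringSet& known,
                          const StringSet& filesInDir,
                          const std::function<bool(const std::string&)>& tryAdd)
{
    StringSet updated;
    for (const auto& fileName : filesInDir) {
        if (known.count(fileName) != 0 || tryAdd(fileName))
            updated.insert(fileName);
        // On failure the file stays unknown, so it is tried again at
        // the next change of the directory.
    }
    return updated;
}
```

In this sketch a partially downloaded file is passed to tryAdd() on every
directory change until it is complete, which is exactly the source of the
extra work described next.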
This, however, may result in excessive CPU usage in the following
scenario. Suppose that there are a number of partially downloaded files.
Only one of them is being downloaded, while the others are paused. Every
time the next chunk of data is written to that active file, directory
monitoring will try to add *all of* those partial ZIM files.
There are two optimizations against such waste of CPU cycles:
1. Only try to add those files that had their modification
time changed since the previous attempt.
2. Debounce the updates (react to updates, for example, at most once per
second).
Those optimizations will come next.