

#1 carlb 02-23-2011, 07:32 PM

I've been attempting to import the 1900+ Wikipedia articles from the "kiwix-0.5.iso" CD as an .epub but am encountering a few bugs along the way.

Kiwix is a project intended to create an offline reader for Wikipedia. Its content is distributed in *.ZIM, a non-standard compressed file format which can be read with the open-source libzim, according to online documentation. The "kiwix-0.5.iso" archive, however, is an old version which contains an /html/ directory tree with plain, uncompressed web pages for this small selection of Wikipedia texts.

(Update: There is a utility "zimdump" provided as part of the source code package for Zimlib; this has been used successfully to convert Kiwix *.ZIM archives into CD or DVD-sized piles of individual *.html and *.png/*.jpg files. Once you have zimlib installed from source, go to zimlib/src/tools and type 'make' to build the optional command-line utilities which you will need. At that point, 'zimdump -D destination_directory -f first_article_name input_filename.ZIM' should dump everything back to the original format, articles in destination_directory/A/* and images in destination_directory/I/*. The articles will need to be renamed to add the '.html' suffix, to replace any blank spaces in the name with _ underscores and to fix any URL-encoded accented/Unicode characters before importing this mess into Sigil.)

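For anyone who wants to script that renaming pass, here's a minimal sketch. The destination_directory/A path just follows the zimdump layout described above; treat the details as illustrative rather than a tested recipe.

[code]
#!/usr/bin/env python3
# Rough sketch: rename zimdump output so Sigil will accept it - add the
# '.html' suffix, swap blank spaces for underscores and decode %xx escapes.
import os
import urllib.parse

ARTICLE_DIR = "destination_directory/A"  # articles, per the zimdump layout above

for name in os.listdir(ARTICLE_DIR):
    fixed = urllib.parse.unquote(name)   # undo URL-encoded accented/Unicode characters
    fixed = fixed.replace(" ", "_")      # no blank spaces left in the filename
    if not fixed.endswith(".html"):
        fixed += ".html"                 # articles are dumped without a suffix
    if fixed != name:
        os.rename(os.path.join(ARTICLE_DIR, name),
                  os.path.join(ARTICLE_DIR, fixed))
[/code]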

I'd tried renaming the files (which have names like /html/art/a/w/9.html) to something meaningful and then importing them into a Sigil *.epub document. I have noticed one bug in Sigil: if there are two or more images which have the same base filename but a different path, the auto-rename which Sigil attempts to use to resolve this conflict tends to be sporadic at best. This leaves many missing images in the resulting *.epub files. Subsequent attempts based on the Kiwix-style *.ZIM archives appear to be more successful, as these abandon the oddball three-level, one-character base name file structure of the old 0.5 version of this collection; this approach was used for the schools encyclopaedia described in a subsequent post to this thread.

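If anyone hits the same missing-image problem, one possible workaround is to give every image a unique base filename yourself, before Sigil's auto-rename ever has to fire. This is only a sketch under the dump layout above (destination_directory/I, with a made-up "dup" prefix), and any renamed image would still need its references in the *.html files updated to match.

[code]
#!/usr/bin/env python3
# Sketch of a pre-import pass: make image base filenames unique so Sigil's
# flaky auto-rename never triggers. References in the *.html files would
# need the same substitution applied afterwards.
import os

IMAGE_DIR = "destination_directory/I"  # images, per the zimdump layout above

seen = {}  # base filename -> number of copies encountered so far
for root, _dirs, files in os.walk(IMAGE_DIR):
    for name in files:
        count = seen.get(name, 0)
        seen[name] = count + 1
        if count:  # duplicate base name: prefix a counter to disambiguate
            os.rename(os.path.join(root, name),
                      os.path.join(root, f"dup{count}_{name}"))
[/code]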

I also find that there seems to be a practical limit (likely no more than 500 typical encyclopaedia articles) for what can be contained in a single *.epub file without creating problems. The table of contents generation in Sigil is also problematic, insofar as it insists on taking every HTML heading (h1, h2, h3, h4, h5) from within the individual articles and creating a multi-megabyte table of contents which is unusable to the reader due to its sheer size.

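Splitting a dump along those lines is easy enough to script. Here's a rough sketch that deals the renamed articles into alphabetical volumes of at most 500 apiece; the volume_NN directory names are made up for illustration.

[code]
#!/usr/bin/env python3
# Sketch: deal the renamed articles into alphabetical volumes of at most
# 500 files each, one directory per volume, each destined for its own *.epub.
import os
import shutil

ARTICLE_DIR = "destination_directory/A"  # illustrative, per the layout above
VOLUME_LIMIT = 500                       # the practical ceiling noted above

articles = sorted(os.listdir(ARTICLE_DIR), key=str.lower)
for i, name in enumerate(articles):
    vol_dir = f"volume_{i // VOLUME_LIMIT + 1:02d}"
    os.makedirs(vol_dir, exist_ok=True)
    shutil.copy(os.path.join(ARTICLE_DIR, name), os.path.join(vol_dir, name))
[/code]
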
I've split this project into four separate *.epub files (like the alphabetical volumes of a printed encyclopaedia) and removed all but the first-level article names from the table of contents, and the result is almost usable. The handling of large tables (such as the main "Version 0.5" content overview page which appears as the first chapter of these generated *.epubs) appears to be breaking badly on the Kobo wi-fi. Open the encyclopaedia to the first chapter and, instead of using the menus to skip directly to another chapter using the table of contents, just try paging through Chapter 1 (the huge table listing what's in this selection). At some point (usually on the first page turn) the Kobo will decide that it's taking too long to make sense of such a huge, unwieldy HTML table and reboot itself. This would appear to be a firmware bug, as the text is entirely readable on PC-based tools such as the document viewer in Calibre.

#4 carlb 03-04-2011, 12:07 PM

I've posted another set of encyclopaedia *.epubs, this one with 5500 Wikipedia articles which had been distributed on CD as an encyclopaedia for schools in October 2008. As most images (in whatever size they appeared on the original Wikipedia pages) are retained in this collection, the size of each individual volume appears to be about fifty megabytes, effectively requiring a CD's worth of space to store the full fifteen-volume set (186 MB of text, 430 MB of images).
