JWPL is a language-independent (usable with any Wikipedia language edition), database-driven, high-performance Wikipedia API that provides structured access to information such as redirects, categories, articles, and link structure. It includes:

- a MediaWiki markup parser, which can be used to further analyze the contents of a Wikipedia page or applied standalone to other text;
- TimeMachine, which reconstructs a snapshot of Wikipedia as of a specific date, or multiple snapshots over a time span;
- RevisionMachine, which offers efficient access to the history of articles using a dedicated storage format that reduces the required storage space by 98%. This enables random access to the whole revision history without requiring several terabytes of storage for a single Wikipedia dump.
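A common way to achieve storage savings of this kind is to store only the differences between consecutive revisions instead of every full text. The Java sketch below illustrates that general idea only; it is not RevisionMachine's actual storage format or API. The class name `DiffStore`, the single-block prefix/suffix diff, and the `interval` parameter (a full text is kept every *k* revisions so that any revision can be rebuilt without replaying the entire history) are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of diff-based revision storage (not RevisionMachine's format).
public class DiffStore {

    // One stored edit: keep the first `prefix` and last `suffix` characters of the
    // previous revision and insert `middle` between them.
    static final class Diff {
        final int prefix;
        final int suffix;
        final String middle;

        Diff(int prefix, int suffix, String middle) {
            this.prefix = prefix;
            this.suffix = suffix;
            this.middle = middle;
        }

        String applyTo(String previous) {
            return previous.substring(0, prefix)
                    + middle
                    + previous.substring(previous.length() - suffix);
        }
    }

    // Single-block diff from a to b: common prefix plus common suffix.
    static Diff diff(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int p = 0;
        while (p < max && a.charAt(p) == b.charAt(p)) {
            p++;
        }
        int s = 0;
        // The suffix must not overlap the matched prefix in either string.
        while (s < max - p
                && a.charAt(a.length() - 1 - s) == b.charAt(b.length() - 1 - s)) {
            s++;
        }
        return new Diff(p, s, b.substring(p, b.length() - s));
    }

    private final int interval;                               // full text every `interval` revisions (assumption)
    private final List<String> fullTexts = new ArrayList<>(); // non-null only at checkpoints
    private final List<Diff> diffs = new ArrayList<>();       // non-null only between checkpoints
    private String latest;                                    // most recently added revision

    public DiffStore(int interval) {
        if (interval < 1) throw new IllegalArgumentException("interval must be >= 1");
        this.interval = interval;
    }

    public void add(String text) {
        if (fullTexts.size() % interval == 0) {
            fullTexts.add(text);   // checkpoint: store the full revision
            diffs.add(null);
        } else {
            fullTexts.add(null);   // otherwise store only the change
            diffs.add(diff(latest, text));
        }
        latest = text;
    }

    // Random access: rebuild revision `index` starting from the nearest preceding checkpoint.
    public String get(int index) {
        int start = index - index % interval;
        String text = fullTexts.get(start);
        for (int i = start + 1; i <= index; i++) {
            text = diffs.get(i).applyTo(text);
        }
        return text;
    }

    // Number of characters actually stored (full texts plus diff payloads).
    public long storedChars() {
        long total = 0;
        for (int i = 0; i < fullTexts.size(); i++) {
            String full = fullTexts.get(i);
            total += full != null ? full.length() : diffs.get(i).middle.length();
        }
        return total;
    }

    public static void main(String[] args) {
        DiffStore store = new DiffStore(10);
        String base = "Wikipedia is a free online encyclopedia.";
        store.add(base);
        store.add(base + " It is edited collaboratively.");
        store.add(base + " It is edited collaboratively by volunteers.");
        System.out.println(store.get(2));
        System.out.println("stored chars: " + store.storedChars());
    }
}
```

Periodic full texts trade a little extra storage for bounded reconstruction time; a production format would use a finer-grained diff than one changed block per revision.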
| Field | Value |
|---|---|
| Tags | Wikipedia, API, revisions, edit, history, JWPL, converter, parser, MediaWiki |
| Licenses | LGPL |
| Operating systems | Platform independent |
| Implementation | Java 1.6+, Hibernate |
Recent releases


Release Notes: This release fixes a bug in the API that prevented fetching inlink IDs. It also makes several improvements to Hibernate session handling.
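Inlinks are the reverse of outlinks: page B is an inlink of page A if B links to A. The library-free Java sketch below shows that relationship by inverting an in-memory map from page IDs to outlink IDs. It is a hypothetical illustration: `LinkInverter`, its methods, and the page IDs are made up, and it does not reflect how JWPL stores or queries link data in its database.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: derive inlink IDs from outlink IDs.
public class LinkInverter {

    // Inverts "page ID -> outlink IDs" into "page ID -> inlink IDs".
    static Map<Integer, Set<Integer>> invert(Map<Integer, Set<Integer>> outlinks) {
        Map<Integer, Set<Integer>> inlinks = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : outlinks.entrySet()) {
            for (Integer target : e.getValue()) {
                // The source page becomes an inlink of every page it links to.
                inlinks.computeIfAbsent(target, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return inlinks;
    }

    // Inlink IDs of one page; an empty set if no page links to it.
    static Set<Integer> inlinkIds(Map<Integer, Set<Integer>> inlinks, int pageId) {
        return inlinks.getOrDefault(pageId, Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> outlinks = new HashMap<>();
        outlinks.put(1, new TreeSet<>(Arrays.asList(2, 3)));
        outlinks.put(2, new TreeSet<>(Arrays.asList(3)));
        outlinks.put(3, new TreeSet<>(Arrays.asList(1)));

        Map<Integer, Set<Integer>> inlinks = invert(outlinks);
        System.out.println("Inlinks of page 3: " + inlinkIds(inlinks, 3)); // [1, 2]
        System.out.println("Inlinks of page 4: " + inlinkIds(inlinks, 4)); // []
    }
}
```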


Release Notes: JWPL Core now depends on Hibernate 4.0.0-final. The PageIterator can now iterate over a predefined list of pages. All components of the RevisionMachine can now produce data files in addition to SQL dumps. A severe error in the DiffTool that caused exceptions when creating a new revision dump has been fixed.
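Iterating over a predefined list of pages efficiently usually means loading pages in batches rather than issuing one query per page. The sketch below is hypothetical and does not use JWPL's actual PageIterator API: `BufferedPageIterator`, its `bufferSize` parameter, and the `loader` function (standing in for a database lookup) are illustrative names, and the loader is assumed to return exactly one page per requested ID, in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical sketch: iterate a predefined list of page IDs, loading pages in batches.
public class BufferedPageIterator<T> implements Iterator<T> {

    private final List<Integer> ids;                       // predefined page IDs, in order
    private final int bufferSize;                          // pages loaded per batch
    private final Function<List<Integer>, List<T>> loader; // assumed: one page per ID, same order
    private List<T> buffer = new ArrayList<>();
    private int nextId = 0;    // index in ids of the next ID to load
    private int bufferPos = 0; // index in buffer of the next page to return

    public BufferedPageIterator(List<Integer> ids, int bufferSize,
                                Function<List<Integer>, List<T>> loader) {
        if (bufferSize < 1) throw new IllegalArgumentException("bufferSize must be >= 1");
        this.ids = ids;
        this.bufferSize = bufferSize;
        this.loader = loader;
    }

    @Override
    public boolean hasNext() {
        return bufferPos < buffer.size() || nextId < ids.size();
    }

    @Override
    public T next() {
        if (bufferPos == buffer.size()) {
            if (nextId >= ids.size()) throw new NoSuchElementException();
            // Buffer exhausted: load the next batch of up to bufferSize pages.
            int end = Math.min(nextId + bufferSize, ids.size());
            buffer = loader.apply(ids.subList(nextId, end));
            nextId = end;
            bufferPos = 0;
        }
        return buffer.get(bufferPos++);
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(12, 7, 42, 3, 99);
        BufferedPageIterator<String> it = new BufferedPageIterator<>(ids, 2, batch -> {
            System.out.println("loading " + batch);
            List<String> pages = new ArrayList<>();
            for (int id : batch) pages.add("Page#" + id);
            return pages;
        });
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

With five IDs and a buffer size of 2, the loader is called three times; a larger buffer means fewer round trips at the cost of more memory per batch.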