JWPL

JWPL is a language-independent, database-driven, high-performance Wikipedia API that provides structured access to information such as redirects, categories, articles, and link structure. It includes three further components: a MediaWiki markup parser, which can be used to analyze the contents of a Wikipedia page or as a standalone tool on other text; TimeMachine, which reconstructs a snapshot of Wikipedia from a specific date, or multiple snapshots from a time span; and RevisionMachine, which offers efficient access to the history of articles through a dedicated storage format that reduces storage space by 98%. This format enables random access to the whole revision history without requiring several terabytes of storage for a single Wikipedia dump.
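
The listing does not describe how the RevisionMachine storage format works internally. As a rough illustration of the general idea of keeping revision histories small while still allowing random access, the following sketch stores a full copy of the text only every few revisions and a character-level delta otherwise. Everything in it (the class name RevisionStore, the snapshot_every parameter, the delta layout) is hypothetical and is not JWPL's API or its actual format; it is written in Python only to stay short and self-contained.

```python
import difflib


class RevisionStore:
    """Toy revision store (hypothetical, not JWPL's format):
    a full snapshot every `snapshot_every` revisions, deltas in between."""

    def __init__(self, snapshot_every=4):
        self.snapshot_every = snapshot_every
        self.records = []   # each entry is ("full", text) or ("delta", ops)
        self._last = None   # text of the most recently appended revision

    @staticmethod
    def _make_delta(old, new):
        # Record only what is needed to rebuild `new` from `old`:
        # ranges to copy from the old text, plus literal new text.
        ops = []
        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                ops.append(("copy", i1, i2))
            elif tag in ("replace", "insert"):
                ops.append(("add", new[j1:j2]))
            # "delete" needs no entry: the deleted range is simply not copied
        return ops

    @staticmethod
    def _apply_delta(old, ops):
        parts = []
        for op in ops:
            if op[0] == "copy":
                parts.append(old[op[1]:op[2]])
            else:
                parts.append(op[1])
        return "".join(parts)

    def append(self, text):
        """Store a new revision and return its index."""
        index = len(self.records)
        if index % self.snapshot_every == 0:
            self.records.append(("full", text))
        else:
            self.records.append(("delta", self._make_delta(self._last, text)))
        self._last = text
        return index

    def get(self, index):
        """Rebuild revision `index` from the nearest earlier snapshot."""
        start = index - (index % self.snapshot_every)
        text = self.records[start][1]
        for i in range(start + 1, index + 1):
            text = self._apply_delta(text, self.records[i][1])
        return text
```

In this sketch, reading any revision replays at most snapshot_every - 1 deltas, so a lookup never has to walk the whole history. A smaller snapshot_every gives faster reads at the cost of more stored text. The space actually saved depends entirely on the data, and the sketch makes no claim about matching the 98% figure quoted above.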

Recent releases

  •  21 Feb 2012 01:11

Release Notes: This release fixes a bug in the API that prevented fetching inlink IDs. Several improvements to Hibernate session handling have been made.

  •  09 Feb 2012 22:38

Release Notes: JWPL Core now depends on Hibernate 4.0.0-final. The PageIterator can now iterate over a predefined list of pages. All components of the RevisionMachine can now produce data files in addition to SQL dumps. A severe error in the DiffTool that caused exceptions when creating a new revision dump has been fixed.
