gauss at zqex.dk
Tue Aug 15 00:05:09 CEST 2006
Pleech.pl works by downloading HTML pages and extracting information
from them by parsing the HTML. As you can imagine, this is a major pain,
with silly and fragile regexes.
However, all the same information is available in XML format! So I
think all the existing code should be scrapped and replaced by
an XML-based EPO interface and XML-formatted patent files.
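
To illustrate why XML input would be less fragile than regexes over HTML, here is a minimal sketch of reading one patent record with a real XML parser. It is written in Python purely for illustration (the existing scripts are Perl), and the sample document, its element names and its attribute names are all invented; the actual EPO schema will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record: element and attribute names are invented
# for illustration and are NOT the real EPO schema.
SAMPLE = """\
<patent application="EP00000000">
  <title lang="en">Example title</title>
  <claims>
    <claim num="1">First claim text.</claim>
    <claim num="2">Second claim text.</claim>
  </claims>
</patent>
"""

def parse_patent(xml_text):
    """Return application number, title and claims from one XML record."""
    root = ET.fromstring(xml_text)
    return {
        "application": root.get("application"),
        "title": root.findtext("title", default=""),
        # iter() walks the tree, so nesting changes do not break extraction
        "claims": [claim.text or "" for claim in root.iter("claim")],
    }

record = parse_patent(SAMPLE)
```

The parser handles whitespace, attribute order and nesting itself, which are exactly the things that tend to break hand-written regexes.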
This is a SOAP interface that provides XML-formatted bibliography,
claims, description, and more.
Provides the XML formatted epoline data for patent with application
EBD files in XML format, which contain lists of new applications,
applications that have been granted, etc. This is required for automagically
updating the patent states in gauss.
(download_ebdxml.pl fetches these files and stores them; see
http://gauss.ffii.org:8080/~zqex/ebd_xml/. No processing is done yet.)
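
The missing processing step could look roughly like the sketch below, which turns one EBD file into a mapping from application number to new state. This is only a Python illustration under assumptions: the sample file layout, the section names, and the state labels "filed" and "granted" are all hypothetical, and the real EBD format and the gauss state names will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical EBD file: the layout and names are invented for
# illustration and do NOT reflect the real EBD XML format.
SAMPLE_EBD = """\
<ebd date="2006-08-09">
  <new-applications>
    <application number="EP00000001"/>
    <application number="EP00000002"/>
  </new-applications>
  <granted>
    <application number="EP00000003"/>
  </granted>
</ebd>
"""

# Hypothetical mapping from EBD section name to a patent state label.
SECTION_TO_STATE = {"new-applications": "filed", "granted": "granted"}

def state_updates(xml_text):
    """Map each application number in an EBD file to its new state."""
    root = ET.fromstring(xml_text)
    updates = {}
    for section in root:
        state = SECTION_TO_STATE.get(section.tag)
        if state is None:
            continue  # skip sections this sketch does not know about
        for app in section.iter("application"):
            updates[app.get("number")] = state
    return updates

updates = state_updates(SAMPLE_EBD)
```

The resulting dictionary could then be applied to the patent database in one pass per downloaded file.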