XML expert??
Carsten Svaneborg
gauss at zqex.dk
Tue Aug 15 00:05:09 CEST 2006
Hi!
Pleech.pl works by downloading HTML pages and extracting information
from them by parsing the HTML. As you can imagine, this is a major pain,
relying on silly and fragile regexes.
However, all of the same information is available in XML format! So I
think all the existing code should be scrapped and replaced by
an XML-based EPO interface and XML-formatted patent files.
Data sources:
1) http://ops-i.espacenet.com/
This is a SOAP interface that provides XML-formatted bibliography,
claims, description and more.
2)
http://www.epoline.org/portal/PA_1_0_FS/jsp/application/xmldocument.jsp?RAPPNO=03795517.6
Provides the XML-formatted epoline data for the patent with application
number "03795517.6".
3) http://ebd2.epoline.org/jsp/ebdst36.jsp
http://ebd2.epoline.org/jsp/ebdabs.jsp
EBD files in XML format, which contain lists of new applications,
applications that have been granted, etc. These are required for
automagically updating the patent states in gauss.
(download_ebdxml.pl fetches these files and stores them, see
http://gauss.ffii.org:8080/~zqex/ebd_xml/ but no processing is done yet.)
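
To illustrate why the XML route should be less fragile than the regexes,
here is a minimal sketch for data source 2). It is in Python rather than
Perl purely for illustration and is not part of pleech.pl. The helper
names epoline_url and extract_fields, the sample document and its element
names (patent-document, application-number, status) are all made up; the
real epoline schema will differ and would have to be checked first.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL taken from data source 2) in the list above.
EPOLINE_XML_URL = (
    "http://www.epoline.org/portal/PA_1_0_FS/jsp/application/xmldocument.jsp"
)


def epoline_url(application_number):
    """Build the epoline XML URL for one application number (hypothetical helper)."""
    return EPOLINE_XML_URL + "?" + urlencode({"RAPPNO": application_number})


def extract_fields(xml_text, tags):
    """Collect the text of every element whose tag name is listed in `tags`.

    A real XML parser replaces the regexes: the result does not depend on
    whitespace, attribute order or line breaks in the input.
    """
    root = ET.fromstring(xml_text)
    found = {tag: [] for tag in tags}
    for element in root.iter():
        # Strip a leading "{namespace}" prefix if the document uses namespaces.
        name = element.tag.rsplit("}", 1)[-1]
        text = (element.text or "").strip()
        if name in found and text:
            found[name].append(text)
    return found


# Made-up sample document; the element names are assumptions, not the
# actual epoline schema.
SAMPLE = """<patent-document>
  <application-number>03795517.6</application-number>
  <status>Examination in progress</status>
</patent-document>"""

if __name__ == "__main__":
    print(epoline_url("03795517.6"))
    print(extract_fields(SAMPLE, ["application-number", "status"]))
```

In real use the document text would come from fetching epoline_url(...)
(for example with urllib.request.urlopen) instead of from SAMPLE, and the
tag list would follow whatever the epoline XML actually contains.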
--
Mvh. Carsten