XML expert??

Carsten Svaneborg gauss at zqex.dk
Tue Aug 15 00:05:09 CEST 2006


Hi!

Pleech.pl works by downloading HTML pages and extracting information from
them by parsing the HTML. As you can imagine, this is a major pain and
relies on silly, fragile regexes.

However, all the same information is available in XML format! So I
think all the existing code should be scrapped and replaced by an
XML-based EPO interface and XML-formatted patent files.
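To illustrate why XML is less fragile than regexes over HTML, here is a minimal sketch using Python's standard xml.etree.ElementTree. Fields are looked up by element name instead of by matching markup. The XML fragment and its element names (application-number, title, status) are hypothetical placeholders, not the real EPO/epoline schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment: the real EPO/epoline element names will differ.
SAMPLE = """<patent>
  <application-number>03795517.6</application-number>
  <title lang="en">Example title</title>
  <status>granted</status>
</patent>"""


def extract_fields(xml_text):
    """Return the fields we care about, located by element name
    rather than by regexes over HTML layout."""
    root = ET.fromstring(xml_text)
    title = root.find("title")
    return {
        "appno": root.findtext("application-number"),
        "title": title.text if title is not None else None,
        "title_lang": title.get("lang") if title is not None else None,
        "status": root.findtext("status"),
    }


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

If the page layout changes, this keeps working as long as the element names stay the same. A regex tied to HTML tags and attributes would break.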
 
Data sources:

1) http://ops-i.espacenet.com/ 

This is a SOAP interface that provides XML-formatted bibliographic data,
claims, descriptions and more.

2) http://www.epoline.org/portal/PA_1_0_FS/jsp/application/xmldocument.jsp?RAPPNO=03795517.6

Provides the XML-formatted epoline data for the patent with application
number "03795517.6".
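A minimal sketch of building that URL for an arbitrary application number, assuming RAPPNO is the only query parameter needed (as in the example URL above). The helper names epoline_xml_url and fetch_epoline_xml are hypothetical. The fetch function uses only urllib.request.urlopen and is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as quoted in the post; RAPPNO carries the application number.
EPOLINE_XML_BASE = ("http://www.epoline.org/portal/PA_1_0_FS/jsp/"
                    "application/xmldocument.jsp")


def epoline_xml_url(application_number):
    """Build the epoline XML URL for one application number (hypothetical helper)."""
    return EPOLINE_XML_BASE + "?" + urlencode({"RAPPNO": application_number})


def fetch_epoline_xml(application_number, timeout=30):
    """Download the raw XML bytes for one application (hypothetical helper)."""
    with urlopen(epoline_xml_url(application_number), timeout=timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    print(epoline_xml_url("03795517.6"))
```

The returned bytes could then be handed to an XML parser instead of a regex-based HTML scraper.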

3) http://ebd2.epoline.org/jsp/ebdst36.jsp 
http://ebd2.epoline.org/jsp/ebdabs.jsp

EBD files in XML format, containing lists of new applications,
applications that have been granted, etc. These are required for
automagically updating the patent states in gauss.

(download_ebdxml.pl fetches these files and stores them, see
http://gauss.ffii.org:8080/~zqex/ebd_xml/ ; no processing is done yet.)
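The missing processing step could look roughly like the sketch below. It reads an EBD-style XML list and merges the reported state of each application into a store, returning only what changed. The EBD schema is not described here, so the record/appno/event element names are hypothetical. The application number "04123456.7" is made up for illustration. A plain dict stands in for the gauss database.

```python
import xml.etree.ElementTree as ET

# Hypothetical EBD-style list: element names and the second number are made up.
SAMPLE_EBD = """<ebd>
  <record><appno>03795517.6</appno><event>granted</event></record>
  <record><appno>04123456.7</appno><event>new-application</event></record>
</ebd>"""


def states_from_ebd(xml_text):
    """Map each application number to the event reported for it."""
    root = ET.fromstring(xml_text)
    states = {}
    for rec in root.iter("record"):
        appno = rec.findtext("appno")
        event = rec.findtext("event")
        if appno and event:
            states[appno.strip()] = event.strip()
    return states


def update_states(db, xml_text):
    """Merge new states into db (a dict standing in for gauss);
    return only the entries that were added or changed."""
    changed = {}
    for appno, state in states_from_ebd(xml_text).items():
        if db.get(appno) != state:
            db[appno] = state
            changed[appno] = state
    return changed


if __name__ == "__main__":
    db = {"03795517.6": "granted"}
    print(update_states(db, SAMPLE_EBD))
```

Returning only the changed entries keeps it cheap to rerun on every file that download_ebdxml.pl stores.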

-- 
  Mvh. Carsten


