Some info on findpat (us class 705 mining)
bjont97 at student.vxu.se
Mon Apr 18 21:27:11 CEST 2005
I'll try to explain how you could use the findpat program
I wrote to find matches between US and EPO patents.
You can run it with these options:
-e Only fetch a list of patents from the USPTO without
searching EPO. Saves the result to a file.
-f Which subclass to start from, as in class/subclass, i.e. the 1 in 705/1.
-t At which subclass to stop fetching data.
Note that this only works if you are fetching data
from the USPTO and searching the EPO at the same time.
Otherwise, if you are piping data to it, it searches
the EPO for everything you feed it.
-s What class to fetch patents from, usually 705, which is
-u <filename> Read US patent data from a file that you have
previously generated with this program. That is, it won't
fetch anything from the USPTO.
-p Read US patent data from the stdin, otherwise just like the
-v The program will be a bit more verbose about what it is doing.
All the fetched US patents are saved to a file called something like
"US_Patents_705_2_11.txt", which contains data from 705/2 to 705/11.
US patents found in the EPO's database in the search against the EPO
are saved to a file called something like "USPAT_in_EPO_705_1_1.txt".
I don't know if this is really useful, but it was easy and cheap to do.
The mapping between US and EPO patents is saved to a file called
something like "US_EPO_map_705_1_1.txt", with the same file naming
scheme as before.
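To make the naming scheme concrete, here is a small Python sketch that builds the three file names for a given class and subclass range. It is only an illustration and not part of findpat; the function name and dictionary keys are made up, and the "US_Patents_" prefix is taken from the example run further down.

```python
def output_names(patent_class, first, last):
    """Return the file names for one run, following the
    <prefix>_<class>_<first subclass>_<last subclass>.txt pattern."""
    suffix = f"{patent_class}_{first}_{last}.txt"
    return {
        # hypothetical keys, chosen only for this sketch
        "us_patents": f"US_Patents_{suffix}",      # list fetched from the USPTO
        "found_in_epo": f"USPAT_in_EPO_{suffix}",  # US patents found at the EPO
        "us_epo_map": f"US_EPO_map_{suffix}",      # US to EPO mapping
    }


if __name__ == "__main__":
    for name in output_names(705, 1, 50).values():
        print(name)
```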
To fetch everything from the USPTO in the subclass range 705/1 to 705/50
without searching the EPO in the same step, you could do:
./findpat -e -f 1 -t 50
which would save the result to a file called "US_Patents_705_1_50.txt".
That file could later be split by subclass and piped to (or read by) the
program one piece at a time, or be piped/read all in one go, in order to
search the EPO for EP patents matching the US patents in the list:
./findpat -v -u US_Patents_705_1_50.txt
cat US_Patents_705_1_50.txt | ./findpat -v -p
or with several files
cat file1 file2 file3 ... | ./findpat -p
At the beginning of the program file there is a variable called $sleeptime
which is set to 1 by default. This is the number of seconds
to sleep between requests to the EPO.
Perhaps it should be set a bit higher to avoid overloading the server ...
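The idea behind $sleeptime can be sketched in a few lines of Python. This is not the code from findpat, just a minimal illustration of pausing between requests; fetch_all, fetch and the sleep parameter are hypothetical names, and the real program may well place its pause differently.

```python
import time


def fetch_all(items, fetch, sleeptime=1.0, sleep=time.sleep):
    """Call fetch(item) for every item, pausing sleeptime seconds
    between consecutive requests (no pause before the first one)."""
    results = []
    for i, item in enumerate(items):
        if i > 0:
            sleep(sleeptime)  # be polite to the remote server
        results.append(fetch(item))
    return results


if __name__ == "__main__":
    # Stand-in for a real request: just echo the patent number.
    print(fetch_all(["US1", "US2"], lambda number: number, sleeptime=0.1))
```

Passing the sleep function in as a parameter is only there so the pause can be replaced when trying the sketch out.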
I also set another variable called $maxfetch to 201 at the beginning
of the program file.
This number is the maximum number of derived patents that the
program will try to retrieve per patent.
This makes it fetch up to 10 pages of 20 patents each, that is 20 * 10 + 1.
This avoids fetching too many pages of related patents.
When I tested the program I came across one US patent that had
over 1200 patents it was derived from (US2002194035), and because
the site only lists 20 patents per page it would take considerable time
to fetch them all. In this case their server actually got overloaded
and slowed to a crawl.
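The effect of the $maxfetch limit can be worked through with a short Python sketch. It assumes that result pages start at positions 1, 21, 41, ... and that a page is only requested while its start position is below the limit; that is my reading of the 20 * 10 + 1 figure, not necessarily how findpat does it, and page_starts is a made-up name.

```python
def page_starts(total, maxfetch=201, per_page=20):
    """Start positions (1-based) of the result pages that would be
    requested for a patent with `total` related patents."""
    limit = min(total + 1, maxfetch)
    return list(range(1, limit, per_page))


if __name__ == "__main__":
    capped = page_starts(1200)                    # with the default limit
    uncapped = page_starts(1200, maxfetch=10**9)  # effectively no limit
    print(len(capped), len(uncapped))
```

Under these assumptions a patent with 1200 related patents needs 60 page requests without the limit and 10 with it.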
Hope this will be of some use in this patent mess.
And a big thanks to all of you who work so hard trying to stop
Best regards, Bosse