[Gauss-parl] Pres from epip 2005 (+ comments)
Carsten Svaneborg
zqex at mpipks-dresden.mpg.de
Mon Mar 14 02:44:28 CET 2005
On Sunday 13 March 2005 05:01, Roland Orre wrote:
> You may also be surprised about the front page describing CRM114 :)
> but the CRM114 concept is actually not so bad, as it is using higher
> order features, that is, scanning windows of different widths which
> can catch features like: "software", "software process", "software
> process and", "software process and method" etc.
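To make the scanning-window idea concrete, here is a minimal sketch of extracting word windows of different widths. It is only an illustration of the feature idea in the quote above, not how CRM114 is actually implemented; the function name `window_features` and the `max_width` parameter are my own assumptions.

```python
def window_features(text, max_width=4):
    """Return every run of 1..max_width consecutive words in text."""
    words = text.lower().split()
    features = []
    for width in range(1, max_width + 1):
        # Slide a window of this width over the word list.
        for start in range(len(words) - width + 1):
            features.append(" ".join(words[start:start + width]))
    return features


features = window_features("software process and method")
```

With this, both "software" and "software process" end up as separate features, so a classifier can weight the longer, more specific phrase differently from the single word.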
One of the project proposals is a way to identify keywords for
patents, such that these can be used to define related patents.
I was wondering precisely how to handle the problem of e.g.
"wireless networking" vs. "wireless".
Maybe the keywords could be tied into some hierarchical structure
(networking, networking/browsing, networking/wireless, networking/lan etc.).
That would provide an entirely new way of browsing and searching
patents, one that closely matches the way a software developer
would think.
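As a rough sketch of what I mean by a hierarchical keyword structure: each patent is tagged with slash-separated keyword paths, and a lookup on a parent keyword also returns the patents tagged with its children. The patent ids, the `build_index` name and the path format are all hypothetical, only meant to illustrate the idea.

```python
from collections import defaultdict


def build_index(patent_keywords):
    """Map each keyword path, and every ancestor of it, to patent ids."""
    index = defaultdict(set)
    for patent_id, paths in patent_keywords.items():
        for path in paths:
            parts = path.split("/")
            # Register the patent under "networking", "networking/wireless", ...
            for depth in range(1, len(parts) + 1):
                index["/".join(parts[:depth])].add(patent_id)
    return index


# Hypothetical patent ids and keyword assignments.
patents = {
    "P1": ["networking/wireless"],
    "P2": ["networking/lan"],
    "P3": ["networking/browsing", "networking/wireless"],
}
index = build_index(patents)
```

Browsing "networking" then lists all three patents, while "networking/wireless" narrows it down to P1 and P3.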
> As I understand the actual implementation is not yet using a
> Bayesian classifier, but as I find it very hard to understand
> how this can be done without a Bayesian classifier
The majority of the patents have been found by searching
for software-related keywords. So some refining is in all
likelihood required.
> (which I'm also a kind of expert in, have worked with
> Bayesian classification methods the last 10 years,
> my thesis is about classification as well) I presented
> the only solution that I could really understand.
Great! I would be interested in reading it. I have read various
reviews on Bayesian statistics and E. T. Jaynes's book, but mostly
focusing on hypothesis testing and data analysis, not classification.
> I have several suggestions about extensions of the project,
> where I can get involved with the methods I'm currently
> working on, e.g.
> * similarity measures between different patents
> * clustering of patents
This is precisely the goal of the project proposal.
> (both relating to recent project I've been working on, can
> provide papers, three of my latest papers are highly relevant
> for this:
I would be interested in reading them.
(Only the most relevant ones; I already have stacks of unread articles.)
> I also consider it natural to extend the outcome of the
> classification with multiple classes, and we should actually
> copy the whole EPO data base to be able to run this on all
> patents, and later also to the USPTO I think.
I'm not sure about this, since I expect the non-swpat patents found
by matching keywords to be clustered in "patent space". So
the spam for the Bayesian classifier would be "close" to the ham.
Secondly, there is a practical constraint: the DB is currently 7 GB.
> (my speciality is fast data mining of huge data sets, so this
> would be a perfect challenge for me).
Great!!
> One of my future ideas is to transform patent descriptions
> to lambda calculus, which of course is quite a hard problem,
> but then we would be able to compare the actual functionality
> of a patent description
How would you map patent claims into functions?
> (relating to the Dilbert strip on
> my "ThankYou" page) and to a comment relating to skidbladnir
> http://www.gnu.org/brave-gnu-world//issue-49.en.html
> that is, when they had investigated 40000 patents they found
> 40 different solutions to technical problems. I think we
> could be able to do something similar for software patents.
Sounds interesting.
--
Mvh. Carsten