Monday, 21 September 2015

Beta, but can it be better? PatentsView open for trial

Among the many fascinating data- and information-related items picked up by the ever-vigilant Sabrina I. Pacifici for her beSpacific website ("Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002") is an item posted last Friday under the heading "USPTO PatentsView Beta". Drawn from the PatentsView "About" link, with a few of our comments interspersed in square brackets, the text reads thus:
Open data can be full of little surprises
PatentsView is a prototype patent data visualization and analysis platform intended to increase the value, utility, and transparency of US patent data. The initiative is supported by the Office of Chief Economist in the US Patent & Trademark Office (USPTO), with additional support from the US Department of Agriculture (USDA) [it's notable that this initiative to increase transparency comes from what is currently one of the world's most transparent IP offices; it appears to be consonant with the demands, which it antedates, made by the Data Transparency Coalition, also recorded by beSpacific and discussed by Aistemos here].
The PatentsView initiative was established in 2012 and is a collaboration between USPTO, USDA, the Center for the Science of Science and Innovation Policy, the University of California at Berkeley, Twin Arch Technologies, and Periscopic. The PatentsView platform is built on a newly developed database that longitudinally links inventors, their organizations, locations, and overall patenting activity. The platform uses data derived from USPTO bulk data files. These data are provided for research purposes and do not constitute the official USPTO record [to the extent that the visualisation reflects bulk data and not the status or content of individual patents, this will presumably not be a matter of concern].
The data visualization tool, query tool, and flexible API enable a broad spectrum of users to examine the dynamics of inventor patenting activity over time and space [well, some space: the data is drawn from US records alone]. They also permit users to explore patent technologies, assignees, citation patterns and co-inventor networks. For researchers in particular, PatentsView is intended to encourage the study and understanding of the intellectual property (IP) and innovation system; to serve as a fundamental function of the government in creating “public good” platforms in these data; and to eliminate the wasteful and redundant cleaning, converting and matching of these data by many individual researchers, thus freeing up researcher time to do what they do best—study IP, innovation, and technological change. 
This initiative is directly responsive to the President’s open government agenda. In the Memorandum on Transparency and Open Government, issued on January 21, 2009, the President instructed the Director of the Office of Management and Budget (OMB) to issue an Open Government Directive. This directive described the steps that executive agencies should take towards the goal of creating a more open government. Those steps are: 
(1) publish government information online; 
(2) improve the quality of government information; 
(3) create and institutionalize a culture of open government; and 
(4) create an enabling policy framework for open government. 
In addition, USDA’s Agricultural Research Services agency has successfully piloted a study to demonstrate the feasibility of using PatentsView data to automatically describe the patenting activity of USDA-supported researchers. USDA administrative data from the National Institute of Food and Agriculture, the Agricultural Research Service, and the US Forest Service have been linked with PatentsView data and can be visualized in the prototype web tool. 
The current PatentsView platform is a prototype and the team welcomes feedback [by email to] on data discrepancies.
Looking further, the PatentsView website has some significant confessions to make. It states, among other things:
Assignee Disambiguation 
The PatentsView data generation process does not fully disambiguate the names of assignees. A preliminary disambiguation of the records corrects for minor misspellings by applying the Jaro-Winkler string similarity algorithm to each pair of raw assignee records. In other words, records that are within a certain bound of similarity are considered the same and are linked together. ...
A corresponding note addresses inventor disambiguation.
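To make the assignee note a little more concrete, here is a minimal Python sketch of the general technique it describes: score each pair of raw names with Jaro-Winkler similarity and link those that score above a bound. This is an illustration only, not PatentsView's actual code. The 0.9 threshold, the case-folding step, the transitive linking of matches and the function names `jaro`, `jaro_winkler` and `link_similar` are all assumptions made for the example; the site does not state the bound it uses.

```python
from itertools import combinations


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1.0 - base)


def link_similar(names, threshold=0.9):
    """Group names whose pairwise similarity meets the (assumed) threshold.

    Links are transitive: if A matches B and B matches C, all three are grouped.
    """
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    keys = [name.casefold() for name in names]  # assumed normalisation step
    for i, j in combinations(range(len(names)), 2):
        if jaro_winkler(keys[i], keys[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical assignee names, invented for illustration.
    print(link_similar(["Acme Corporation", "Globex Inc", "Acme Corporaton"]))
```

The sketch also shows where the risk lies: any fixed bound will sometimes link two genuinely different owners with similar names, and sometimes fail to link two spellings of the same owner.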

Patent owners with the same or slightly different names, and the problems they may cause, are part of the reason why the ORoPO Foundation launched, just three months ago, a voluntary register of publicly accessible patent ownership records that have been verified as accurate. A low level of error in patent records is unlikely to cause major problems where large amounts of data are aggregated, analysed and visualised, but it can cause serious financial loss or market misdirection in individual cases.
