|
Sustainability
and Governance of Web and GRID Resources in Functional Genomics
9-11
May
2005
Sankt
Augustin, Germany
Organisers:
Paul
van der Vet, Dept. of Computer Science, University of
Twente, Enschede, Netherlands
Theo Huibers, Dept. of Computer Science, University of Twente,
Enschede, Netherlands & KPMG Business Advisory Services,
Amstelveen, Netherlands
Pierre-Alain Binz, GeneBio, Geneva, Switzerland & Swiss
Institute of Bioinformatics, Geneva, Switzerland
Martin Hofmann, Fraunhofer Institute SCAI, Sankt Augustin,
Germany
Report
Introduction
Life
scientists and bioinformaticians increasingly rely on web-based
resources. The number of such resources, as well as the amount
of available content grows continuously. The need for efficient
interoperability is becoming important. Moreover, as data
propagate easily through such media, their quality and pertinence
need to be monitored closely. In the context of the ESF Programme
Integrated Approaches to Functional Genomics, two workshops
have been organised to discuss the use of these resources,
in particular for data integration. , In both, but particularly
during the latter (Geneva, October 2003), it turned out that
the sustainability and governance of web-based resources have
become urgent issues. It costs money to set up and maintain
a resource. Because biology, like any other field of scientific
enquiry, is very dynamic, maintenance is labour-intensive.
Users of the resource will generally want to be assured of
its quality; in other words, a scheme of quality assurance
has to be in place. Governance, or who is responsible for
what, has to be clear for the community. A business model
addresses these issues in their mutual dependence. Therefore,
every resource comes with a business model, no matter whether
their creators are aware of it or not.
To
address the issues involved, a follow-up workshop was held
at the Fraunhofer Institute at Schloss Birlinghoven, Sankt
Augustin, Germany, May 2005, hosted by Martin Hofmann. The
participants included the organisers of the former workshop,
researchers involved in offering web services and/or using
them, representatives of the publishing industry, a representative
of a national funding body, and a representative of an industrial
private enterprise. We chose to deviate from the standard
workshop format of having talks followed by discussions and
adopted a way of working that had all participants actively
involved in exploring the issues. The outcome, unanimously
agreed upon by the participants, can be summarised in one
sentence: the risks involved in the current situation are
so large that guidelines and actions are urgent.
In
the invitation, the issue of business models was underlined.
Briefly (actually, more a caricature), a business model explains
how the mission of an institute is accomplished. It tells
how incomes and expenditures are matched, it identifies risks
and offers strategies to deal with them. Making a profit is
not an essential ingredient of a business model. Every group,
institute or company has a way to balance incomes and expenditures
and to address risks, but normally only commercial firms make
this model explicit. We advocate that web resource providers
in bioinformatics also make their business models explicit,
as a first step towards ameliorating the current high-risk
situation.
Business
models for resources are required to address at least the
following issues: quality assurance, accessibility over particular
time frames (which may range from a few years to, perhaps,
decades), pricing, financing, risks, and control. These issues
are addressed explicitly or implicitly by any institute that
operates a resource as a vital concern: academic institutes
as well as businesses. Since businesses have been operating
with business models for a long time, academic institutes
might learn from their experiences. Ongoing discussions on
using GRID technology for eSciences underline the need for
new, adequate business models for distributed knowledge resources.
The workshop aimed to identify possible business models to
further the ideal of a European information infrastructure
for the life sciences.
The
main risk for most academic resource providers is lack of
funding. Their resources are created in the course of funded
projects, but the very idea of project funding is at odds
with sustainability because a project by definition has a
limited lifetime while the resource is expected to last longer
than that. Since no resource can be maintained without costs,
continued availability requires a source of income. This is
a difficult issue because the major funding bodies tend to
give priority to funding the generation of experimental data
rather than funding the structured storage of data and information
in public databases, even if a portion of the budget can be
allocated to the construction of such resources. Dissemination
and maintenance of the information generated in the course
of a functional genomics project become a problem when the
project stops. Quite a number of projects have already ceased
or are approaching the end of their funding period. Data,
sometimes valuable data, may be lost because there is no follow-up
grant or other source of income to safeguard the continued
availability and maintentance of the resource. In this way,
EU-funded research in functional genomics faces destruction
of capital on an unprecedented scale.
Scientific
Content
Session
1: Exploration of the Field
The
first session started with a welcome by Martin Hofmann and
a short introduction to the workshop and the way of working
by Paul van der Vet.
On
this opening day, we wanted to get a better grip on the subject
of resources in functional genomics and their aspects. The
variety of biological information available over the Web proved
to be too large to admit of an adequate systematisation within
an afternoon: sequences, SNPs, structures, interactions, pathways,
metadata, ontologies, images, literature, and more. As pointed
out by Amos Bairoch and others, from a costing point of view,
there are roughly two types of data resources: repositories
and curated databases. Their cost structures are entirely
different. In particular, the costs for curated databases
are huge compared with those of a repository, and consist
largely of personnel costs. Of course, this human activity
is what makes these resources so valuable.
Session
2: Business Models
The
second day was devoted to business models. There were four
introductions by speakers who all in one way or another are
stakeholders.
Martin Hofmann (SCAI, Fraunhofer Institute, Sankt Augustin)
discussed resources from the creator perspective. He provided
an example of a combined wet lab/in silico experimental setup
that generates data believed to be of interest to others.
He also drew attention to the growing importance of clinical
data. The more complex a biological phenomenon is, the more
likely it is that one can find it in natural-language texts,
because more complex phenomena need lots of context.
Henning Hermjakob (EBI, Cambridge, UK) highlighted the way
EBI financed the often excellent resources they offer. He
was aware of the dangers of project funding for infrastructure
and cited the example of the resource BIND that had to change
its mode of operation drastically as a result of lack of money.
Quite apart from the direct disinvestment, there is indirect
disinvestment which tends to be overlooked: the curators had
to be laid off. Training a curator takes roughly a year or
more, so that laying off trained curators to do other work
constitutes a large source of disinvestment.
Geoffrey Adams (Elsevier Science) presented an overview of
web resource sustainability from a business perspective, systematising
the various components that make up a resource and discussing
their financial aspects. He also warned scientists not to
become addicted to funding, because funding may disappear,
for example, when EU research priorities change.
Finally, Bernd Hägele (Swiss Federal State Secretariat
for Education and Research) provided an instructive but unfortunately
rare example of co-operation between a funding body and resource
maintainers. The Swiss government funds the Swiss Institute
for Bioinformatics (SIB) not only because of the perceived
quality of services and research it offers, but also because
of its scientific importance and because it contributes to
the visibility of Switzerland as a scientific country. The
situation can be considered stable over the medium term.
In
the second part of this session, participants divided into
groups. Each group was asked to design a business model for
a resource of their own choosing, as long as it dealt with
content relevant for functional genomics researchers. We asked,
more specifically, to identify the services and products offered,
the customers of such products/services, and the stakeholders.
In addition, we asked to identify the main cost drivers, what
customers would be prepared to pay, and reasonable revenue
models. Each group reported their thoughts, proposals and
findings to the reunited participantsat the end of this session.
One of the more striking aspects was that that almost all
groups involved in setting up and maintaining data resources
are also heavy users of such resources. This shows that there
is a tight network of resources, and the loss of one of the
nodes may well bring extra costs for the other nodes.To the
surprise of quite a few participants, the circle of stakeholders
proved to be quite large.
Session
3: Drawing Lessons and Identifying Possible Actions
The
last session was a round table discussion with all participants
about actions that could and should be undertaken. As we already
said above, all participants agreed that the current situation
is simply too risky. The costs of resource providers are high
and are spent not only on maintaining and improving the resources
but also on searching for funds. Continuation of the situation
is unlikely in the majority of cases, as witnessed by the
case of BIND. A number of factors that contribute to this
instability were discussed.
First, as already mentioned, most resources are set up on
project money, but the duration of a project is typically
less than that of a resource.
Second, to fund infrastructure is not commonplace. From the
funder perspective, funding resources would mean at least
a partial deviation from the current practice of project funding.
This has given rise to the practice of funding infrastructure
in disguise, namely as if it were a new project. There are
exceptions: the current surge of interest in and, hence, funding
of GRID technology may be interpreted as funding infrastructure.
This is not entirely true, however: from the funder perspective
these projects are probably regarded as seed money to get
the field going, and once it is mature, GRID funding as such
will cease.
Third, unlike practitioners in some other fields, researchers
in the life sciences expect IT/Web resources to be for free.
This is an impediment to a structure in which users of resources
pay. The US National Institutes of Health however tends to
favour user payment.
Fourth, resources grow and mature. Most resources are initially
built because of prospective interest. Either creation and
initial maintenance of the resource is not funded at all or
it is part of a funded project that uses the resource as means
of communication between the project partners. When the resource
proves sufficiently interesting to third parties, a growth
stage sets in. The technical infrastructure is consolidated
and access to the resource is improved. Data are added and
a quality assessment procedure is put in place. When third
parties continue to be interested in the resource, the third
stage, maturity, sets in. Maturity may be accompanied by the
incorporation of the resource in the service portfolio of
an institute or company, either existing or created specifically
around the resource. To repeat what we said earlier, no resource
can survive without income. (If income is not visible, this
can only mean that it is hidden by incomplete or inaccurate
accounting.) Therefore, each of the three steps we have identified
may end with continuation or not. Ideally, quality and scope
are the factors that determine survival but in reality, of
course, other factors play a role that in some circumstances
may be more important than quality and scope.
Considering
these four major issues and the current EU policies and funding
mechanisms, the model that has users pay was judged an alternative
to the current situation that merits serious consideration.
Users' costs can be covered by including a sum for use of
the resource on the budget of funding proposals. The EU and
other funding bodies may promote this. This would also make
it possible to have resources maintained by commercial publishers
(Elsevier, Wiley) or semi-commercial publishers (such as learned
societies), who have far more experience in cost-effective
web resource offering than academic groups.
However, if users are prepared to pay at all, they will normally
be prepared to pay for mature resources only. It thus turns
out that a resource is at its most vulnerable in the growth
stage. This is the stage in which funding bodies play a decisive
role. They might design guidelines that take issues such as
viability, scientific quality, scope of the resource, size
of the intended audience, and other factors into consideration.
A discussion about these matters would be helped enormously
if there were some kind of business model that also outlines
the long-term perspectives of the resource.
For
any alternative to the current situation one may envisage,
it is urgent that explicit business models are drawn up by
resource providers. It would be nice if a platform were created
for resource providers to share experiences and help each
other with business models. As we have seen, resource providers
themselves are heavy users of resources. Sustainability thus
certainly constitutes a shared interest. Business models can
serve as concrete anchors for a discussion between stakeholders
about the future of the resources.
|