nlm2sval2.pl
A program to convert any Word Sense Disambiguation data file in NLM format (PMID version) to a file in SENSEVAL2 format.
The National Library of Medicine (NLM) provides a test collection for Word Sense Disambiguation (WSD). The data files in this collection are plain text files with a specific layout. This program converts these NLM data files (the Basic Reviewed Results from the PMID version of the test collection) into the SENSEVAL2 format, which is XML-based.
In the NLM data format, the context of the ambiguity is provided in two ways:
1. The sentence in the citation that contains the ambiguity
2. The entire citation containing the ambiguity (i.e. including the above sentence)
This program provides two options, so the user can create SENSEVAL2 data files with either only the sentence or the entire citation (abstract) as the context. These two modes are referred to as sentence mode and abstract mode in the rest of this documentation.
The PMID version of the NLM WSD data has the following format (note that the data has been re-formatted to suit the POD formatting and hence the offsets will not match if the text is simply copied from here and pasted in a text file):
1|9337195.ab.7|M2
The relation between birth weight and flow-mediated dilation was not affected by adjustment for childhood body build, parity, cardiovascular risk factors, social class, or ethnicity.
adjustment|adjustment|78|90|81|90|by adjustment|
PMID- 9337195
TI - Flow-mediated dilation in 9- to 11-year-old children: the influence of intrauterine and childhood factors.
AB - BACKGROUND: Early life factors, particularly size at birth, may influence later risk of cardiovascular disease, but a mechanism for this influence has not been established. We have examined the relation between birth weight and endothelial function (a key event in atherosclerosis) in a population-based study of children, taking into account classic cardiovascular risk factors in childhood. METHODS AND RESULTS: We studied 333 British children aged 9 to 11 years in whom information on birth weight, maternal factors, and risk factors (including blood pressure, lipid fractions, preload and postload glucose levels, smoking exposure, and socioeconomic status) was available. A noninvasive ultrasound technique was used to assess the ability of the brachial artery to dilate in response to increased blood flow (induced by forearm cuff occlusion and release), an endothelium-dependent response. Birth weight showed a significant, graded, positive association with flow- mediated dilation (0.027 mm/kg; 95% CI, 0.003 to 0.051 mm/kg; P=.02). Childhood cardiovascular risk factors (blood pressure, total and LDL cholesterol, and salivary cotinine level) showed no relation with flow-mediated dilation, but HDL cholesterol level was inversely related (-0.067 mm/mmol; 95% CI, -0.021 to -0.113 mm/mmol; P=.005). The relation between birth weight and flow-mediated dilation was not affected by adjustment for childhood body build, parity, cardiovascular risk factors, social class, or ethnicity. CONCLUSIONS: Low birth weight is associated with impaired endothelial function in childhood, a key early event in atherogenesis. Growth in utero may be associated with long- term changes in vascular function that are manifest by the first decade of life and that may influence the long-term risk of cardiovascular disease.
adjustment|adjustment|1521|1533|1524|1533|by adjustment|
...
...
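The two kinds of pipe-delimited lines in the record above can be read field by field. The following is a minimal Python sketch, not the Perl code of nlm2sval2.pl. All field names (`number`, `instance_id`, `sense`, `item`, `alias`, `local_start` and so on) are assumptions inferred from this one example: in the sentence, offsets 78 to 90 cover "by adjustment" and offsets 81 to 90 cover "adjustment", counting from 0 with inclusive end positions. Which of the first two fields is the item and which is the alias cannot be told from this record, since both are "adjustment".

```python
from typing import NamedTuple


class Header(NamedTuple):
    number: int        # running instance number (assumed meaning)
    instance_id: str   # e.g. "9337195.ab.7"
    sense: str         # e.g. "M2"


class Ambiguity(NamedTuple):
    item: str          # ambiguous term (assumed field order)
    alias: str         # alias of the term (assumed field order)
    local_start: int   # first character of the local phrase
    local_end: int     # last character of the local phrase (inclusive)
    head_start: int    # first character of the ambiguous word
    head_end: int      # last character of the ambiguous word (inclusive)
    local_text: str    # the local phrase itself, e.g. "by adjustment"


def parse_header(line: str) -> Header:
    """Split a line such as '1|9337195.ab.7|M2'."""
    number, instance_id, sense = line.strip().split("|")
    return Header(int(number), instance_id, sense)


def parse_ambiguity(line: str) -> Ambiguity:
    """Split a line such as 'adjustment|adjustment|78|90|81|90|by adjustment|'."""
    # The trailing "|" produces an empty eighth field, which is ignored.
    fields = line.strip().split("|")[:7]
    item, alias, ls, le, hs, he, local_text = fields
    return Ambiguity(item, alias, int(ls), int(le), int(hs), int(he), local_text)
```

For the record shown, `parse_header` yields the instance id and sense that later appear in the `<instance>` and `<answer>` elements, and `parse_ambiguity` yields the offsets used to place the `<local>` and `<head>` tags.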
In abstract mode (which is the default mode), the program will convert this to:
<corpus lang='en'>
<lexelt item="adjustment">
<instance id="9337195.ab.7" pmid="9337195" alias="adjustment">
<answer instance="9337195.ab.7" senseid="M2"/>
<context>
<title>Flow-mediated dilation in 9- to 11-year-old
children: the influence of intrauterine and childhood
factors. </title> BACKGROUND: Early life factors,
particularly size at birth, may influence later risk
of cardiovascular disease, but a mechanism for this
influence has not been established. We have examined
the relation between birth weight and endothelial
function (a key event in atherosclerosis) in a
population-based study of children, taking into
account classic cardiovascular risk factors in
childhood. METHODS AND RESULTS: We studied 333
British children aged 9 to 11 years in whom
information on birth weight, maternal factors, and
risk factors (including blood pressure, lipid
fractions, preload and postload glucose levels,
smoking exposure, and socioeconomic status) was
available. A noninvasive ultrasound technique was
used to assess the ability of the brachial artery
to dilate in response to increased blood flow
(induced by forearm cuff occlusion and release),
an endothelium-dependent response. Birth weight
showed a significant, graded, positive association
with flow-mediated dilation (0.027 mm/kg; 95% CI,
0.003 to 0.051 mm/kg; P=.02). Childhood
cardiovascular risk factors (blood pressure, total
and LDL cholesterol, and salivary cotinine level)
showed no relation with flow-mediated dilation, but
HDL cholesterol level was inversely related (-0.067
mm/mmol; 95% CI, -0.021 to -0.113 mm/mmol; P=.005).
The relation between birth weight and flow-mediated
dilation was not affected <local>by <head>adjustment
</head></local> for childhood body build, parity,
cardiovascular risk factors, social class, or
ethnicity. CONCLUSIONS: Low birth weight is associated
with impaired endothelial function in childhood, a key
early event in atherogenesis. Growth in utero may be
associated with long-term changes in vascular function
that are manifest by the first decade of life and that
may influence the long-term risk of cardiovascular
disease.
</context>
</instance>
...
...
</lexelt>
</corpus>
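The abstract-mode context combines the `TI` field (wrapped in `<title>`) with the `AB` field of the citation. The sketch below illustrates that assembly in Python under several assumptions: it is not the program's Perl implementation, the function names are hypothetical, the citation is assumed to arrive as MEDLINE-style lines where continuation lines start with whitespace, and a word hyphenated at a line end is rejoined without a space (which matches "flow-mediated" and "long-term" in the output above). XML escaping via `xml.sax.saxutils.escape` is an addition of this sketch, and placing the `<local>` and `<head>` tags is left out.

```python
import re
from xml.sax.saxutils import escape

# A field line starts with a 2-4 letter tag, optional padding, then "- ".
TAG = re.compile(r"^([A-Z]{2,4})\s*-\s?(.*)$")


def parse_citation(lines):
    """Collect MEDLINE-style fields (PMID, TI, AB, ...) into a dict."""
    fields = {}
    current = None
    for raw in lines:
        match = TAG.match(raw)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current is not None and raw.strip():
            # Continuation line: rejoin a word that was hyphenated
            # at the end of the previous line without adding a space.
            sep = "" if fields[current].endswith("-") else " "
            fields[current] += sep + raw.strip()
    return fields


def abstract_context(fields):
    """Build '<title>TI</title> AB' as in the abstract-mode example."""
    title = escape(fields.get("TI", ""))
    abstract = escape(fields.get("AB", ""))
    return "<title>{}</title> {}".format(title, abstract)
```

The title keeps "9- to 11-year-old" intact because the hyphen there is followed by a space within a line, not by a line break.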
In sentence mode, the same record is converted to:
<corpus lang='en'>
<lexelt item="adjustment">
<instance id="9337195.ab.7" pmid="9337195" alias="adjustment">
<answer instance="9337195.ab.7" senseid="M2"/>
<context>
The relation between birth weight and flow-mediated
dilation was not affected <local>by <head>adjustment
</head></local> for childhood body build, parity,
cardiovascular risk factors, social class, or ethnicity.
</context>
</instance>
...
...
</lexelt>
</corpus>
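The sentence-mode instance can be reproduced from the sentence and the four offsets of the example record (78, 90, 81, 90). The Python sketch below is an illustration only and is not how nlm2sval2.pl is implemented. It assumes 0-based offsets with inclusive ends, derives the `pmid` attribute from the part of the instance id before the first dot, and escapes text with `xml.sax.saxutils`. The function names are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def tag_sentence(sentence, local_start, local_end, head_start, head_end):
    """Wrap the local phrase in <local> and the ambiguous word in <head>.

    Offsets are 0-based and the end positions are inclusive.
    """
    before = sentence[:local_start]
    local_pre = sentence[local_start:head_start]
    head = sentence[head_start:head_end + 1]
    local_post = sentence[head_end + 1:local_end + 1]
    after = sentence[local_end + 1:]
    return "{}<local>{}<head>{}</head>{}</local>{}".format(
        escape(before), escape(local_pre), escape(head),
        escape(local_post), escape(after))


def sentence_instance(instance_id, sense, alias, sentence, offsets):
    """Return one <instance> element with the sentence as its context."""
    pmid = instance_id.split(".")[0]  # assumption: id is "<pmid>.ab.<n>"
    return "\n".join([
        "<instance id={} pmid={} alias={}>".format(
            quoteattr(instance_id), quoteattr(pmid), quoteattr(alias)),
        "<answer instance={} senseid={}/>".format(
            quoteattr(instance_id), quoteattr(sense)),
        "<context>",
        tag_sentence(sentence, *offsets),
        "</context>",
        "</instance>",
    ])
```

Unlike the example output, the sketch does not wrap the context text across lines.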
See 'nlm2sval2.pl --help' for a list of options and their usage.
Mahesh Joshi <joshi031@d.umn.edu>
Ted Pedersen <tpederse@d.umn.edu>
NLM WSD Test Collection : http://wsd.nlm.nih.gov/
Senseval-2 : http://www.sle.sharp.co.uk/senseval2/
nlm2sval2driver.pl
Copyright (C) 2005, Ted Pedersen and Mahesh Joshi
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.