NAME

nlm2sval2.pl


SYNOPSIS

Converts a Word Sense Disambiguation data file in NLM format (PMID version) to a file in SENSEVAL2 format.


DESCRIPTION

The National Library of Medicine (NLM) maintains a test collection for Word Sense Disambiguation. The data files in this collection are plain text files with a specific layout. This program converts these NLM data files (the Basic Reviewed Results from the PMID version of the test collection) into the SENSEVAL2 format, which is an XML format.

In the NLM data format, the context of the ambiguity is provided in 2 ways:

1. The actual sentence in the citation that contains the ambiguity

2. The entire citation containing the ambiguity (i.e. containing the above sentence)

The program can create SENSEVAL2 data files that use either only the sentence or the entire citation (abstract) as the context. These two modes are called sentence mode and abstract mode in the rest of this documentation.

The PMID version of the NLM WSD data has the following format (note that the data has been re-wrapped here to suit the POD formatting, so the offsets will not match if the text is copied from here and pasted into a text file):

 1|9337195.ab.7|M2
 The relation between birth weight and flow-mediated dilation
 was not affected by adjustment for childhood body build, 
 parity, cardiovascular risk factors, social class, or 
 ethnicity.
 adjustment|adjustment|78|90|81|90|by adjustment|
 PMID- 9337195
 TI  - Flow-mediated dilation in 9- to 11-year-old children: 
 the influence of intrauterine and childhood factors.  
 AB  - BACKGROUND: Early life factors, particularly size at 
 birth, may influence later risk of cardiovascular disease, 
 but a mechanism for this  influence has not been established.
 We have examined the relation between birth weight and 
 endothelial function (a key event in atherosclerosis) in 
 a population-based study of children, taking into account
 classic cardiovascular risk factors in childhood. METHODS
 AND RESULTS: We studied 333 British children aged 9 to 11
 years in whom information on birth weight, maternal factors,
 and risk factors (including blood pressure, lipid fractions,
 preload and postload glucose levels, smoking exposure, and
 socioeconomic status) was available. A noninvasive 
 ultrasound technique was used to assess the ability of the
 brachial artery to dilate in response to increased blood 
 flow (induced by forearm cuff occlusion and release), an 
 endothelium-dependent response. Birth weight showed a 
 significant, graded, positive association with flow-
 mediated dilation (0.027 mm/kg; 95% CI, 0.003 to 0.051 
 mm/kg; P=.02). Childhood cardiovascular risk factors (blood
 pressure, total and LDL cholesterol, and salivary cotinine
 level) showed no relation with flow-mediated dilation, but
 HDL cholesterol level was inversely related (-0.067 mm/mmol;
 95% CI, -0.021 to -0.113 mm/mmol; P=.005). The relation 
 between birth weight and flow-mediated dilation was not 
 affected by adjustment for childhood body build, parity,
 cardiovascular risk factors, social class, or ethnicity. 
 CONCLUSIONS: Low birth weight is associated with impaired
 endothelial function in childhood, a key early event in 
 atherogenesis. Growth in utero may be associated with long-
 term changes in vascular function that are manifest by the
 first decade of life and that may influence the long-term
 risk of cardiovascular disease.
 adjustment|adjustment|1521|1533|1524|1533|by adjustment|
 ...
 ...
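The two pipe-delimited lines in the sample above can be split into fields with a few lines of Perl. This is only a minimal sketch, not the code used by nlm2sval2.pl: the subroutine names and hash keys are hypothetical, and the meaning given to each field (instance number, instance id, sense, local span offsets, head offsets, local text) is inferred from the sample rather than taken from the NLM documentation.

```perl
use strict;
use warnings;

# Split an instance header line such as "1|9337195.ab.7|M2".
# The key names are illustrative labels inferred from the sample.
sub parse_header {
    my ($line) = @_;
    chomp $line;
    my ($number, $instance_id, $sense) = split /\|/, $line;
    return { number => $number, id => $instance_id, sense => $sense };
}

# Split an ambiguity line such as
# "adjustment|adjustment|78|90|81|90|by adjustment|".
# Assumed layout: word, alias, local start/end, head start/end, local text.
sub parse_ambiguity {
    my ($line) = @_;
    chomp $line;
    my ($word, $alias, $local_start, $local_end,
        $head_start, $head_end, $local_text) = split /\|/, $line;
    return {
        word        => $word,
        alias       => $alias,
        local_start => $local_start,
        local_end   => $local_end,
        head_start  => $head_start,
        head_end    => $head_end,
        local_text  => $local_text,
    };
}

my $header = parse_header("1|9337195.ab.7|M2");
my $amb    = parse_ambiguity("adjustment|adjustment|78|90|81|90|by adjustment|");
print "$header->{id} $header->{sense} $amb->{head_start}-$amb->{head_end}\n";
```

Perl's split drops trailing empty fields, so the pipe at the end of the ambiguity line does not produce an extra element.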

In abstract mode (the default), the program converts the sample record above to:

 <corpus lang='en'>
   <lexelt item="adjustment">
     <instance id="9337195.ab.7" pmid="9337195" alias="adjustment">
       <answer instance="9337195.ab.7" senseid="M2"/>
       <context>
         <title>Flow-mediated dilation in 9- to 11-year-old 
         children: the influence of intrauterine and childhood 
         factors.  </title> BACKGROUND: Early life factors, 
         particularly size at birth, may influence later risk 
         of cardiovascular disease, but a mechanism for this
         influence has not been established. We have examined 
         the relation between birth weight and endothelial 
         function (a key event in atherosclerosis) in a 
         population-based study of children, taking into 
         account classic cardiovascular risk factors in 
         childhood. METHODS AND RESULTS: We studied 333 
         British children aged 9 to 11 years in whom 
         information on birth weight, maternal factors, and 
         risk factors (including blood pressure, lipid 
         fractions, preload and postload glucose levels, 
         smoking exposure, and socioeconomic status) was 
         available. A noninvasive ultrasound technique was 
         used to assess the ability of the brachial artery 
         to dilate in response to increased blood flow 
         (induced by forearm cuff occlusion and release), 
         an endothelium-dependent response. Birth weight 
         showed a significant, graded, positive association 
         with flow-mediated dilation (0.027 mm/kg; 95% CI, 
         0.003 to 0.051 mm/kg; P=.02). Childhood 
         cardiovascular risk factors (blood pressure, total 
         and LDL cholesterol, and salivary cotinine level) 
         showed no relation with flow-mediated dilation, but 
         HDL cholesterol level was inversely related (-0.067 
         mm/mmol; 95% CI, -0.021 to -0.113 mm/mmol; P=.005). 
         The relation between birth weight and flow-mediated 
         dilation was not affected <local>by <head>adjustment
         </head></local> for childhood body build, parity, 
         cardiovascular risk factors, social class, or 
         ethnicity. CONCLUSIONS: Low birth weight is associated 
         with impaired endothelial function in childhood, a key 
         early event in atherogenesis. Growth in utero may be 
         associated with long-term changes in vascular function 
         that are manifest by the first decade of life and that 
         may influence the long-term risk of cardiovascular 
         disease.
       </context>
     </instance>
     ...
     ...
   </lexelt>
 </corpus>
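One way to assemble the abstract-mode context is to collect the TI and AB fields of the citation and wrap the title in a title element, as in the output above. The sketch below is an assumption about how this could be done, not the program's actual implementation: parse_citation is a hypothetical helper, and it assumes that every field starts with a tag such as "TI  - " and that untagged lines continue the previous field.

```perl
use strict;
use warnings;

# Collect MEDLINE-style fields ("TI  - ...", "AB  - ...") from citation lines.
# Lines without a field tag are treated as continuations of the previous field.
sub parse_citation {
    my (@lines) = @_;
    my (%field, $current);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
        if ($line =~ /^([A-Z]{2,4})\s*-\s+(.*)$/) {
            $current = $1;
            $field{$current} = $2;
        }
        elsif (defined $current && length $line) {
            $field{$current} .= " $line";
        }
    }
    return \%field;
}

# Only the first lines of the sample citation are used here.
my $citation = parse_citation(
    "PMID- 9337195",
    "TI  - Flow-mediated dilation in 9- to 11-year-old children:",
    "the influence of intrauterine and childhood factors.",
    "AB  - BACKGROUND: Early life factors, particularly size at",
    "birth, may influence later risk of cardiovascular disease,",
);

# Abstract-mode context: the title in <title>, followed by the abstract text.
my $context = "<title>$citation->{TI}</title> $citation->{AB}";
print "$context\n";
```

A continuation line that happens to begin with something shaped like a field tag would be misread as a new field, so a real converter would need a stricter test than this one.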

In sentence mode, the same record is converted to:

 <corpus lang='en'>
   <lexelt item="adjustment">
     <instance id="9337195.ab.7" pmid="9337195" alias="adjustment">
       <answer instance="9337195.ab.7" senseid="M2"/>
       <context>
         The relation between birth weight and flow-mediated 
         dilation was not affected <local>by <head>adjustment
         </head></local> for childhood body build, parity, 
         cardiovascular risk factors, social class, or ethnicity.
       </context>
     </instance>
     ...
     ...
   </lexelt>
 </corpus>
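The local and head tags in the context can be placed from the offsets on the ambiguity line. The following is a minimal sketch under stated assumptions, not the program's own code: tag_context is a hypothetical name, the offsets are assumed to be 0-based and inclusive, and the head span is assumed to lie inside the local span. With the sample sentence joined onto a single line, the offsets 78|90|81|90 fit this reading.

```perl
use strict;
use warnings;

# Wrap the local span in <local> and the head span in <head>.
# Offsets are assumed to be 0-based and inclusive.
sub tag_context {
    my ($text, $local_start, $local_end, $head_start, $head_end) = @_;
    my $before     = substr($text, 0, $local_start);
    my $local_pre  = substr($text, $local_start, $head_start - $local_start);
    my $head       = substr($text, $head_start, $head_end - $head_start + 1);
    my $local_post = substr($text, $head_end + 1, $local_end - $head_end);
    my $after      = substr($text, $local_end + 1);
    return "$before<local>$local_pre<head>$head</head>$local_post</local>$after";
}

# The sample sentence, joined onto one line with single spaces.
my $sentence = "The relation between birth weight and flow-mediated dilation "
             . "was not affected by adjustment for childhood body build, parity, "
             . "cardiovascular risk factors, social class, or ethnicity.";

print tag_context($sentence, 78, 90, 81, 90), "\n";
```

The sketch does not escape XML special characters such as & or < in the text; whether the real converter does so is not stated here.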


USAGE

See 'nlm2sval2.pl --help' for a list of options and their usage.


AUTHOR

Mahesh Joshi <joshi031@d.umn.edu>

Ted Pedersen <tpederse@d.umn.edu>


BUGS


SEE ALSO

NLM WSD Test Collection : http://wsd.nlm.nih.gov/

Senseval-2 : http://www.sle.sharp.co.uk/senseval2/

nlm2sval2driver.pl


COPYRIGHT

Copyright (C) 2005, Ted Pedersen and Mahesh Joshi

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.