NAME

create-vectors.pl


SYNOPSIS

This program creates the instance and sense vectors for the unsupervised-disambiguate wrapper program.


DESCRIPTION

This program uses programs from SenseClusters version 0.95 to create the vectors for the unsupervised word sense disambiguation approach used by CuiTools.


USAGE

perl create-vectors.pl [OPTIONS] DIRECTORY

Required Arguments:

DIRECTORY

Directory in which all the files generated by create-vectors.pl are stored.

--training FILE

File containing the training data.

The data is expected to be in plain text format, and the file name must have the following form:

    <target word>.trainingdata
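Because the target word is encoded in the training file name, it can be recovered from the path. The following is a minimal Perl sketch, not code from create-vectors.pl; the helper name target_from_training is hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: return the target word encoded in a training file name
# of the form <target word>.trainingdata, or undef if the name does not match.
sub target_from_training {
    my ($path) = @_;
    my $name = basename($path);
    return $name =~ /^(.+)\.trainingdata$/ ? $1 : undef;
}

my $tw = target_from_training('data/adjustment.trainingdata');
if (defined $tw) {
    print "target word: $tw\n";
} else {
    print "file name does not follow <target word>.trainingdata\n";
}
```

A name such as adjustment.txt yields undef, so a mismatch with the value given to --tw can be caught before any vectors are built.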

--senses FILE

File containing the possible concepts of the target word. The format required for the sense file is:

<TAG>|<TERM>|<Semantic Type>|<CUI>

For example, one of the possible concepts for the target word adjustment is:

Adjustment <1> (Individual Adjustment)|inbe, Individual Behavior|C0376209
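The example line above has three pipe-separated fields rather than the four listed in the format. The Perl sketch below therefore assumes only that the last field is the CUI and the second-to-last is the semantic type, and it keeps anything before them as the tag/term label. The helper parse_sense_line is hypothetical and is not taken from create-vectors.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: split one sense-file line on '|'. Assumes the last
# field is the CUI and the second-to-last is the semantic type; any fields
# before them are joined back together as the tag/term label.
sub parse_sense_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\|/, $line, -1;
    return if @fields < 3;
    my $cui     = pop @fields;
    my $semtype = pop @fields;
    # UMLS CUIs are the letter C followed by seven digits.
    return unless $cui =~ /^C\d{7}$/;
    return {
        label   => join('|', @fields),
        semtype => $semtype,
        cui     => $cui,
    };
}

my $sense = parse_sense_line(
    "Adjustment <1> (Individual Adjustment)|inbe, Individual Behavior|C0376209\n");
print "$sense->{cui}\t$sense->{semtype}\t$sense->{label}\n" if $sense;
```

Popping fields from the right keeps the parser usable whether or not the tag and term are stored as separate fields.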

--instances FILE

File containing the instances. The data is expected to be in Senseval-2 (sval2) format.
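In sval2 format, each instance is an <instance id="..."> element inside a <lexelt> element, and the target word in the context is marked with <head>...</head>. The Perl sketch below lists the instance ids and head words from such text. It is regex-based rather than a full XML parser, assumes one <head> tag per instance, and uses an invented sample; the helper sval2_instances is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: list [instance id, head word] pairs from sval2 text.
# Regex-based sketch, not a full XML parser; assumes each <instance> element
# contains exactly one <head>...</head> tag.
sub sval2_instances {
    my ($text) = @_;
    my @out;
    while ($text =~ m{<instance\s+id="([^"]+)"\s*>(.*?)</instance>}gs) {
        my ($id, $body) = ($1, $2);
        my ($head) = $body =~ m{<head>(.*?)</head>}s;
        push @out, [$id, $head];
    }
    return @out;
}

# Invented sample instance for illustration only.
my $sample = <<'SVAL2';
<corpus lang="english">
<lexelt item="adjustment">
<instance id="adjustment.1">
<context>
Patients showed better <head>adjustment</head> after therapy .
</context>
</instance>
</lexelt>
</corpus>
SVAL2

for my $pair (sval2_instances($sample)) {
    print "$pair->[0]\t$pair->[1]\n";
}
```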

--tw TARGET WORD

The target word being disambiguated.

Optional Arguments:

--stop STOPFILE

A file of Perl regexes that define the stop list of words to be excluded from the features.

STOPFILE can be specified in one of two modes:

AND mode: declared by placing '@stop.mode=AND' on the first line of STOPFILE. Word pairs in which both words are stop words are ignored.

OR mode: declared by placing '@stop.mode=OR' on the first line of STOPFILE. Word pairs in which either word is a stop word are ignored.

Both modes exclude stop words from unigram features.

The default is OR mode.
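The difference between the two modes can be sketched in Perl as follows. This is an illustration of the rules above, not the SenseClusters implementation. It assumes one Perl regex per line, optionally written between slashes (for example /^the$/), and an optional mode declaration on the first line. The helpers load_stoplist, is_stop, keep_pair and keep_unigram are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical loader: read stop-list lines, honouring an optional
# '@stop.mode=AND' or '@stop.mode=OR' first line (OR if absent).
# Assumes one Perl regex per line, optionally written as /.../.
sub load_stoplist {
    my (@lines) = @_;
    chomp @lines;
    my $mode = 'OR';
    if (@lines && $lines[0] =~ /^\@stop\.mode\s*=\s*(AND|OR)\s*$/) {
        $mode = $1;
        shift @lines;
    }
    my @regexes;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;
        $line =~ s{^\s*/(.*)/\s*$}{$1};    # strip optional slashes
        push @regexes, qr/$line/;
    }
    return ($mode, \@regexes);
}

# True if any stop regex matches the word.
sub is_stop {
    my ($word, $regexes) = @_;
    for my $re (@$regexes) { return 1 if $word =~ $re }
    return 0;
}

# AND mode drops a pair only if both words are stop words;
# OR mode drops it if either word is a stop word.
sub keep_pair {
    my ($w1, $w2, $mode, $regexes) = @_;
    my ($s1, $s2) = (is_stop($w1, $regexes), is_stop($w2, $regexes));
    return $mode eq 'AND' ? !($s1 && $s2) : !($s1 || $s2);
}

# Unigrams are filtered the same way in both modes.
sub keep_unigram { my ($w, $regexes) = @_; return !is_stop($w, $regexes) }

my ($mode, $stop) = load_stoplist('@stop.mode=AND', '/^the$/', '/^of$/');
print "$mode\n";
print keep_pair('the', 'patient', $mode, $stop) ? "keep\n" : "drop\n";
print keep_pair('of',  'the',     $mode, $stop) ? "keep\n" : "drop\n";
```

Under AND mode the pair "the patient" survives because only one word is a stop word, while "of the" is dropped; under OR mode both pairs would be dropped.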

--cuidef

Use the UMLS definition of the CUI of each possible concept as its context. This is the default.

--stdef

Use the definition of the semantic type of each possible concept's UMLS CUI as its context.

--pardef

Use the definition(s) of the parent(s) of the concept's UMLS CUI as its context.

--chddef

Use the definition(s) of the child(ren) of the concept's UMLS CUI as its context.

--help

Displays a quick summary of the program options.

--version

Displays the version information.
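The command line described above can be checked with Getopt::Long. The sketch below is hypothetical and is not taken from create-vectors.pl: it requires --training, --senses, --instances, --tw and a single DIRECTORY unless --help or --version is given, and it assumes that --cuidef applies whenever none of --stdef, --pardef or --chddef is set. The helper parse_args and the file names in the example call are illustrative.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical sketch: parse the documented options from a list of arguments.
sub parse_args {
    my @argv = @_;
    my %opt;
    GetOptionsFromArray(\@argv, \%opt,
        'training=s', 'senses=s', 'instances=s', 'tw=s', 'stop=s',
        'cuidef', 'stdef', 'pardef', 'chddef', 'help', 'version',
    ) or die "invalid options\n";
    # --help and --version need no other arguments.
    return (\%opt, @argv) if $opt{help} || $opt{version};
    for my $req (qw(training senses instances tw)) {
        die "--$req is required\n" unless defined $opt{$req};
    }
    die "exactly one DIRECTORY argument is required\n" unless @argv == 1;
    # Assumption: --cuidef (the documented default) applies when no other
    # definition option is given.
    $opt{cuidef} = 1 unless $opt{stdef} || $opt{pardef} || $opt{chddef};
    return (\%opt, @argv);
}

my ($opt, @rest) = parse_args(
    '--training',  'adjustment.trainingdata',
    '--senses',    'adjustment.senses',
    '--instances', 'adjustment.sval2',
    '--tw',        'adjustment',
    'adjustment-out',
);
print "directory: $rest[0]\n";
print "context: ", ($opt->{cuidef} ? 'cuidef' : 'other'), "\n";
```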


OUTPUT

The instance and sense vectors are written to the <target word>.vectors files in DIRECTORY.


PROGRAM REQUIREMENTS

SenseClusters version 0.95


AUTHOR

 Bridget T. McInnes, University of Minnesota, Twin Cities


COPYRIGHT

Copyright (c) 2007-2008,

 Bridget T. McInnes, University of Minnesota, Twin Cities
 bthomson at cs.umn.edu
 Ted Pedersen, University of Minnesota Duluth
 tpederse at d.umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.