create-vectors.pl
This program creates the instance and sense vectors for the unsupervised-disambiguate wrapper program.
It uses programs from SenseClusters version 0.95 to create the vectors for the unsupervised word sense disambiguation approach used by CuiTools.
perl create-vectors.pl [OPTIONS] DIRECTORY
Directory where all the files generated by create-vectors reside.
File containing the training data.
The data is expected to be in plain format, and the file name is expected to have the following format:
<target word>.trainingdata
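As a rough illustration of this naming convention, the following Python sketch recovers the target word from a training file name. The helper name target_word_from_training_file and the example path are hypothetical and are not part of create-vectors.pl.

```python
from pathlib import Path

# Suffix taken from the naming convention described above.
TRAINING_SUFFIX = ".trainingdata"


def target_word_from_training_file(path):
    """Return the target word from a '<target word>.trainingdata' file name.

    Hypothetical helper for illustration only.
    """
    name = Path(path).name
    # Reject names without the suffix, or with nothing before it.
    if not name.endswith(TRAINING_SUFFIX) or name == TRAINING_SUFFIX:
        raise ValueError(
            f"expected '<target word>{TRAINING_SUFFIX}', got {name!r}"
        )
    return name[: -len(TRAINING_SUFFIX)]


# The path below is an invented example.
print(target_word_from_training_file("data/adjustment.trainingdata"))
```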
File containing the possible concepts of the target word. The format required for the sense file is:
<TAG>|<TERM>|<Semantic Type>|<CUI>
For example, one of the possible concepts for the target word adjustment is:
Adjustment <1> (Individual Adjustment)|inbe, Individual Behavior|C0376209
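A minimal Python sketch of reading one line of the sense file is shown below. Note that the format lists four pipe-separated fields while the example above shows three, so this sketch assumes the tag may be absent and treats it as optional. The names Sense and parse_sense_line, and the tag value "M1" used in the comments, are hypothetical.

```python
from typing import NamedTuple, Optional


class Sense(NamedTuple):
    tag: Optional[str]  # None when the line carries no tag field
    term: str
    semantic_type: str
    cui: str


def parse_sense_line(line):
    """Split one sense-file line on '|' into its fields.

    Assumption: a line has either four fields (tag, term, semantic type,
    CUI) or three (term, semantic type, CUI), as in the example above.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("|")]
    if len(fields) == 4:
        tag, term, semantic_type, cui = fields
    elif len(fields) == 3:
        tag = None
        term, semantic_type, cui = fields
    else:
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}: {line!r}")
    return Sense(tag, term, semantic_type, cui)


example = "Adjustment <1> (Individual Adjustment)|inbe, Individual Behavior|C0376209"
sense = parse_sense_line(example)
print(sense.term)
print(sense.cui)
```

Splitting on the pipe character rather than on commas matters here, because the semantic type field itself contains a comma.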
File containing the instances. The data is expected to be in sval2 format.
The word that is being disambiguated.
A file of Perl regexes that define the stop list of words to be excluded from the features.
STOPFILE can be specified in one of two modes:
AND mode: declared by including '@stop.mode=AND' on the first line of the STOPFILE. Ignores word pairs in which both words are stop words.
OR mode: declared by including '@stop.mode=OR' on the first line of the STOPFILE. Ignores word pairs in which either word is a stop word.
Both modes exclude stop words from unigram features.
Default is OR mode.
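The difference between the two modes can be sketched in Python as follows. This is an illustration of the rules stated above, not the SenseClusters implementation. It assumes each pattern line is written as /regex/ and that the patterns are simple enough to behave the same under Python's re module as under Perl. The function names and the sample stop list are hypothetical.

```python
import re


def load_stoplist(lines):
    """Return (mode, compiled patterns) from the lines of a stop file."""
    mode = "OR"  # default when no '@stop.mode=' line is present
    patterns = []
    for index, raw in enumerate(lines):
        line = raw.strip()
        if not line:
            continue
        # The mode declaration is only recognized on the first line.
        if index == 0 and line.startswith("@stop.mode="):
            mode = line.split("=", 1)[1].strip().upper()
            if mode not in ("AND", "OR"):
                raise ValueError(f"unknown stop mode: {mode!r}")
            continue
        # Assumption: patterns are delimited by slashes, e.g. /\bthe\b/
        patterns.append(re.compile(line.strip("/")))
    return mode, patterns


def is_stop_word(word, patterns):
    """A word is a stop word if any pattern matches it."""
    return any(pattern.search(word) for pattern in patterns)


def ignore_pair(first, second, mode, patterns):
    """Apply the AND/OR rule to a word pair."""
    first_stop = is_stop_word(first, patterns)
    second_stop = is_stop_word(second, patterns)
    if mode == "AND":
        return first_stop and second_stop
    return first_stop or second_stop


# Invented sample stop list for illustration.
mode, patterns = load_stoplist(["@stop.mode=AND", r"/\bthe\b/", r"/\bof\b/"])
print(ignore_pair("the", "of", mode, patterns))    # both are stop words
print(ignore_pair("the", "dose", mode, patterns))  # only one is a stop word
```

Under either mode, a unigram feature is dropped whenever is_stop_word returns true for it.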
Use the UMLS CUI definition of the possible concepts as context. [Default]
Use the semantic type definition of the UMLS CUI of the possible concepts as context.
Use the parent definition(s) of the UMLS CUI to represent the context of the concept.
Use the child(ren) definition(s) of the UMLS CUI to represent the context of the concept.
Displays a quick summary of the program options.
Displays the version information.
The vectors are located in the <target word>.vectors files.
Bridget T. McInnes, University of Minnesota, Twin Cities
Copyright (c) 2007-2008,
Bridget T. McInnes, University of Minnesota, Twin Cities bthomson at cs.umn.edu
Ted Pedersen, University of Minnesota Duluth tpederse at d.umn.edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.