NAME

x-fold.pl


SYNOPSIS

This program randomly splits a file in .mm format into training and test files.


DESCRIPTION

This program randomly splits a file in our XML-like .mm format into X folds, writing one training file and one test file per fold, so that X-fold cross-validation can be performed.
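The idea behind the split can be sketched in a few lines of Python. This is only an illustration, not the code used by x-fold.pl: the function names split_into_folds and train_test_pairs are hypothetical, the instances are stand-in integers rather than parsed .mm instances, and dealing the shuffled instances round-robin is an assumed strategy.

```python
import random


def split_into_folds(instances, n_folds=10, seed=1):
    """Shuffle instances with a seeded RNG and deal them into n_folds lists."""
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    # Assumed strategy: round-robin dealing, so fold sizes differ by at most one.
    return [shuffled[i::n_folds] for i in range(n_folds)]


def train_test_pairs(folds):
    """Yield (train, test) per fold; the held-out fold is the test set."""
    for held_out, test in enumerate(folds):
        train = [x for i, fold in enumerate(folds) if i != held_out for x in fold]
        yield train, test


folds = split_into_folds(range(25), n_folds=5, seed=1)
for train, test in train_test_pairs(folds):
    print(len(train), len(test))  # prints "20 5" for each of the 5 folds
```

Every instance appears in exactly one test set and in the training sets of all the other folds, which is what makes the resulting files suitable for cross-validation.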


USAGE

x-fold.pl [OPTIONS] DESTINATION SOURCE


INPUT

Required Arguments:

DESTINATION (DIRECTORY)

The directory in which the training and test files are written. Training files are named <fold>.train and test files are named <fold>.test.

SOURCE (FILE)

The .mm formatted file to be split into training and test files for X-fold cross-validation.

Optional Arguments:

--fold NUMBER

The number of folds.

Default: --fold 10

--seed NUMBER

Seeds the random number generator used to split the data into folds, so that a split can be reproduced.

Default: --seed 1

--help

Displays the summary of command line options.

--version

Displays the version information.
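The purpose of --seed can be illustrated with a small Python sketch: seeding a random number generator with the same value reproduces the same shuffle. The helper shuffled_order is hypothetical, and Python's generator is not the one Perl uses, so the actual orderings produced by x-fold.pl will differ; only the principle carries over.

```python
import random


def shuffled_order(n_items, seed=1):
    """Return the indices 0..n_items-1 shuffled by an RNG seeded with seed."""
    rng = random.Random(seed)
    order = list(range(n_items))
    rng.shuffle(order)
    return order


# The same seed reproduces the same ordering, and so the same folds.
print(shuffled_order(20, seed=1) == shuffled_order(20, seed=1))  # True
```

Recording the seed alongside experimental results is what allows the same training and test files to be regenerated later.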


OUTPUT

This program outputs two files for each fold: i) <fold>.test and ii) <fold>.train. Each file is in .mm format and can be used by the disambiguate.pl program.
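The resulting directory layout can be sketched as follows. This is a hedged illustration rather than the script's implementation: write_folds is a hypothetical helper, each instance is treated as a single line of text (real .mm files have XML-like structure that this ignores), and numbering the folds from 1 is an assumption.

```python
import os
import tempfile


def write_folds(folds, destination):
    """Write <fold>.train and <fold>.test into destination for each fold."""
    os.makedirs(destination, exist_ok=True)
    written = []
    # Assumption: folds are numbered starting at 1.
    for number, test in enumerate(folds, start=1):
        train = [line
                 for i, fold in enumerate(folds, start=1) if i != number
                 for line in fold]
        for suffix, lines in (("train", train), ("test", test)):
            path = os.path.join(destination, f"{number}.{suffix}")
            with open(path, "w", encoding="utf-8") as handle:
                handle.writelines(line + "\n" for line in lines)
            written.append(path)
    return written


folds = [["a", "b"], ["c", "d"], ["e", "f"]]
with tempfile.TemporaryDirectory() as destination:
    write_folds(folds, destination)
    print(sorted(os.listdir(destination)))
    # ['1.test', '1.train', '2.test', '2.train', '3.test', '3.train']
```

With three folds the destination directory holds six files, and each <fold>.train file contains everything that is not in the matching <fold>.test file.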


PROGRAM REQUIREMENTS


AUTHOR

Bridget McInnes, University of Minnesota, Twin Cities


COPYRIGHT

Copyright (c) 2007-2008,

 Ted Pedersen, University of Minnesota, Duluth.
 tpederse at umn.edu
 Bridget McInnes, University of Minnesota, Twin Cities
 bthomson at cs.umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.