NAME

README.format.arff.pod


SYNPOSIS

This file briefly describes the ARFF format used by the WEKA datamining package.


THE .arff FORMAT

The .arff format is required by the WEKA data mining package to perform supervised learning. An example of the .arff format when running mm2arff.pl on the above .mm format example with the cui option (mm2arff.pl --cui ex.arff ex.mm) is as follows:

@RELATION art

@ATTRIBUTE Name NUMERIC

@ATTRIBUTE Magazine NUMERIC

@ATTRIBUTE Top NUMERIC

@ATTRIBUTE Collectors NUMERIC

@ATTRIBUTE Sense {art}

@DATA

1,1,1,1,art

The @RELATION tag indicates the attribute of the data set. In this case we are trying to identify the sense of the word art.

The @ATTRIBUTE <cui> NUMERIC indicates that there is an attribute which in this case is a specific cui and it takes a NUMERIC value. In our case 1 or 0 depending on if that cui exists in the same context as the target word. You can have multiple attributes. The attributes in the arff file are determined by the user defined options for this program.

The @ATTRIBUTE Sense tag is always there. This indicates the possible senses for all the instances in the data set. Since we were looking at only one instance, we have only one possible sense (``art''). Normally there would be multiple senses, like {art,art_gallery}. The senses are delimited by a comma.

The @DATA indicates that all the instances with their corresponding features and sense are below.

So ``1,1,1,1,art'' indicates for this instance the cui's ``Name'', ``Magazine'', ``Top'', and ``Collectors'' all exist surrounding the target word and the sense of this instance is ``art''.

For more information about this format see the WEKA documentation at:

http://www.cs.waikato.ac.nz/ml/weka/


AUTHOR

Bridget McInnes, University of Minnesota, Twin Cities


COPYRIGHT

Copyright (c) 2007-2011,

 Ted Pedersen, University of Minnesota, Duluth.
 tpederse at umn.edu
 Bridget McInnes, University of Minnesota, Twin Cities
 bthomson at cs.umn.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.