README.format.arff.pod
This file briefly describes the ARFF format used by the WEKA data mining package.
The .arff format is required by the WEKA data mining package to perform supervised learning. Running mm2arff.pl on the .mm format example with the cui option (mm2arff.pl --cui ex.arff ex.mm) produces the following .arff file:
@RELATION art
@ATTRIBUTE Name NUMERIC
@ATTRIBUTE Magazine NUMERIC
@ATTRIBUTE Top NUMERIC
@ATTRIBUTE Collectors NUMERIC
@ATTRIBUTE Sense {art}
@DATA
1,1,1,1,art
The @RELATION tag gives the name of the data set. In this case the name is art, because we are trying to identify the sense of the word art.
Each @ATTRIBUTE <cui> NUMERIC line declares one attribute, which in this case is a specific CUI, and states that it takes a NUMERIC value. In our case the value is 1 or 0, depending on whether that CUI occurs in the same context as the target word. A file can declare multiple attributes. Which attributes appear in the .arff file is determined by the user-defined options given to this program.
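The 1-or-0 encoding described above can be sketched in Python as follows. This is only an illustration of the idea, not the code that mm2arff.pl runs; the function name binary_features and the sample context are hypothetical.

```python
def binary_features(attributes, context):
    """Return 1 for each attribute found in the context, otherwise 0."""
    present = set(context)
    return [1 if name in present else 0 for name in attributes]


# Attribute names taken from the example .arff file.
attributes = ["Name", "Magazine", "Top", "Collectors"]

# Hypothetical context: the features found around one target word.
context = ["Top", "Name", "Painting"]

row = binary_features(attributes, context)
print(row)  # [1, 0, 1, 0]
```

Features that are in the context but not declared as attributes (here "Painting") are simply ignored, so every row has the same number of values as there are declared attributes.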
The @ATTRIBUTE Sense tag is always present. It lists the possible senses for all the instances in the data set. Since the example contains only one instance, there is only one possible sense ("art"). Normally there would be multiple senses, such as {art,art_gallery}. The senses are delimited by commas.
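As a rough sketch, the Sense line could be assembled from the senses seen across all instances like this. The function name sense_attribute is hypothetical, and sorting the senses is an assumption made here for reproducible output; the order mm2arff.pl actually uses may differ.

```python
def sense_attribute(senses):
    """Format the nominal Sense attribute from the senses seen in the data."""
    unique = sorted(set(senses))  # assumption: duplicates dropped, sorted order
    return "@ATTRIBUTE Sense {" + ",".join(unique) + "}"


print(sense_attribute(["art"]))
# @ATTRIBUTE Sense {art}
print(sense_attribute(["art_gallery", "art", "art"]))
# @ATTRIBUTE Sense {art,art_gallery}
```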
The @DATA tag indicates that all the instances, with their corresponding feature values and sense, follow below it.
So "1,1,1,1,art" indicates that for this instance the CUIs "Name", "Magazine", "Top", and "Collectors" all occur in the context surrounding the target word, and that the sense of this instance is "art".
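To make the mapping between the header and a data row concrete, here is a minimal Python sketch that reads the example file and pairs each value with its attribute name. It assumes the simple layout shown in this document (no comments, quoted names, or sparse rows) and is not a general ARFF parser; the function name read_instances is hypothetical.

```python
ARFF_TEXT = """\
@RELATION art
@ATTRIBUTE Name NUMERIC
@ATTRIBUTE Magazine NUMERIC
@ATTRIBUTE Top NUMERIC
@ATTRIBUTE Collectors NUMERIC
@ATTRIBUTE Sense {art}
@DATA
1,1,1,1,art
"""


def read_instances(text):
    """Yield one dict per data row, keyed by attribute name."""
    names = []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        upper = line.upper()
        if upper.startswith("@ATTRIBUTE"):
            # The second whitespace-separated field is the attribute name.
            names.append(line.split()[1])
        elif upper.startswith("@DATA"):
            in_data = True
        elif in_data:
            # Pair each comma-separated value with its attribute name.
            yield dict(zip(names, line.split(",")))


instances = list(read_instances(ARFF_TEXT))
print(instances[0])
# {'Name': '1', 'Magazine': '1', 'Top': '1', 'Collectors': '1', 'Sense': 'art'}
```

The values are left as strings here. The last entry is always the sense, because the Sense attribute is declared last in the header.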
For more information about this format see the WEKA documentation at:
http://www.cs.waikato.ac.nz/ml/weka/
Bridget McInnes, University of Minnesota, Twin Cities
Copyright (c) 2007-2011,
Ted Pedersen, University of Minnesota, Duluth. tpederse at umn.edu
Bridget McInnes, University of Minnesota, Twin Cities bthomson at cs.umn.edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.