

CuiTools is a freely available package of Perl programs for unsupervised and supervised word sense disambiguation (WSD) experiments. The name CuiTools comes from the Concept Unique Identifiers (CUIs) found in the Unified Medical Language System (UMLS). The package performs disambiguation using information extracted from the UMLS, such as CUIs, semantic types, and semantic relations, as well as general English features such as unigrams, bigrams, and part-of-speech information.

The package has also been used for classification in medical tasks beyond WSD, such as assigning ICD-9-CM codes to medical records, determining a patient's co-morbidities from their medical record, and identifying relations in biomedical text.

If you use CuiTools, please cite the following paper:


  • Download the current version (v0.29): SourceForge
  • Publications

  • Using PharmGKB to Train Text Mining Approaches for Identifying Potential Gene Targets for Pharmacogenomic Studies. Serguei Pakhomov, Bridget T. McInnes, Jatinder Lamba, Ying Liu, Genevieve B. Melton, Yogita Ghodke, Neha Bhise, Vishal Lamba and Angela K. Birnbaum. Journal of Biomedical Informatics. 2012 Oct; 45(5):862-9.
  • Exploiting MeSH Indexing in MEDLINE to Generate a Data Set for Word Sense Disambiguation. Antonio Jimeno-Yepes, Bridget T. McInnes and Alan R. Aronson. BMC Bioinformatics. 2011 Jun 2;12(1):223.
  • Using Second-order Vectors in a Knowledge-based Method for Acronym Disambiguation. Bridget T. McInnes, Ted Pedersen, Ying Liu, Serguei Pakhomov, and Genevieve B. Melton. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning (CoNLL 2011), June 23-24, 2011, pp. 145-153, Portland, Oregon.
  • Collocation Analysis for UMLS Knowledge-based Word Sense Disambiguation. Antonio Jimeno-Yepes, Bridget T. McInnes and Alan R. Aronson. BMC Bioinformatics. 2011, 12(Suppl 3):S4.
  • Supervised and Knowledge-based Methods for Disambiguating Terms in Biomedical Text using the UMLS and MetaMap. Bridget T. McInnes. Doctor of Philosophy Dissertation, Department of Computer Science, University of Minnesota, Twin Cities, September, 2009.
  • Using CuiTools to Identify Obesity and its Co-morbidities in Discharge Summaries. Bridget T. McInnes. In the Proceedings of the Second i2b2 Workshop on Challenges in Natural Language Processing for Clinical Data, Nov 7-8, 2008, Washington, DC.
  • An Unsupervised Vector Approach to Biomedical Term Disambiguation: Integrating UMLS and Medline. Bridget T. McInnes. In Proceedings of the Association for Computational Linguistics Student Research Workshop (ACL-SRW) 2008. (poster: pdf)
  • Using UMLS Concept Unique Identifiers (CUIs) for Word Sense Disambiguation in the Biomedical Domain. Bridget T. McInnes, Ted Pedersen, and John Carlis. In Proceedings of the Annual Symposium of the American Medical Informatics Association (AMIA), pages 533-37, Nov. 2007, Chicago, IL. (slides: pdf ppt)
  • Using Domain Specific Information for Word Sense Disambiguation. Bridget T. McInnes, Ted Pedersen and John Carlis. Grace Hopper Conference for Women in Computing, October 2007, Orlando, Florida.
  • National Library of Medicine Research Participation Report. Bridget T. McInnes. 2008.