Tuesday, October 11, 2011

A nice tutorial on POS tagging

Part-of-speech tagging from 97% to 100%: is it time for some linguistics?
http://dl.acm.org/citation.cfm?id=1964816

Tuesday, September 27, 2011

A pretty good introduction to HMMs

"A Revealing Introduction to Hidden Markov Models" http://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf

Tuesday, April 12, 2011

Biomedical text mining

Biomedical text mining (also known as BioNLP) refers to text mining applied to texts and literature of the biomedical and molecular biology domain. It is a rather recent research field on the edge of natural language processing, bioinformatics, medical informatics and computational linguistics.
...
Wikipedia

Thursday, March 31, 2011

Microsoft Web N-gram Services

Microsoft Web N-gram services are a cloud-based platform for language modeling research in the areas of web search, natural language processing, speech, and related areas. A collaboration between Microsoft Research and Bing, the services provide access to real-world web-scale data with regular updates.

The Web N-gram services provide you access to:

Content types: Document Body, Document Title, Anchor Texts, and Query
Model types: Smoothed Backoff N-gram models with N up to 5
Locale: Web documents indexed by Bing in the EN-US market
Access: Hosted Services by Microsoft with SOAP and REST interfaces. Python development kits are also available.
Web models: N-gram models based on a Web snapshot taken in June 2009 have been and will always be available. Additionally, with the support of NSF, models from two snapshots taken in April 2010 and October 2010 will be hosted on Windows Azure for at least 3 years. Further updates will be made based on community feedback.
Query models: N-gram models based on 9 months of Bing queries up to June 2009 will always be available. In addition, monthly updates to the query N-gram models will also be provided. The services will maintain up to 3 query N-gram models based on storage and usage patterns.

(from http://web-ngram.research.microsoft.com/)
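To make "backoff N-gram model" a bit more concrete, here is a minimal local sketch in Python. It does not call the Microsoft service and is not its smoothing method (the description above does not say which one is used). It uses the simple "stupid backoff" scheme instead, which returns scores rather than normalized probabilities. The function names, the toy corpus, and the 0.4 backoff factor are all assumptions made for illustration.

```python
from collections import Counter


def count_ngrams(tokens, max_n):
    """Count every n-gram of length 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(counts, total_tokens, context, word, alpha=0.4):
    """Score `word` following `context`, backing off to shorter contexts.

    Stupid backoff: use the relative frequency of the longest n-gram that
    was seen, multiplied by `alpha` once per context word dropped.
    The result is a score, not a normalized probability.
    """
    context = tuple(context)
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            return penalty * counts[ngram] / counts[context]
        context = context[1:]   # drop the oldest context word
        penalty *= alpha
    # No context left: fall back to the unigram relative frequency.
    return penalty * counts[(word,)] / total_tokens


# Toy corpus (hypothetical data, just to exercise the functions).
tokens = "the cat sat on the mat the cat ran".split()
counts = count_ngrams(tokens, max_n=3)
print(backoff_score(counts, len(tokens), ["the"], "cat"))
print(backoff_score(counts, len(tokens), ["on", "the"], "cat"))
```

The second call has never seen "on the cat", so it backs off to the bigram "the cat" and pays the 0.4 penalty once. A real web-scale model would use a properly smoothed estimator and go up to N = 5.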

Tuesday, March 8, 2011

ACL 2012

http://www.acl2012.org/

Welcome to ACL 2012!

For the first time, the annual meeting of the Association for Computational Linguistics (ACL) comes to Korea, a vibrant country with rich language and cultural heritage. ACL 2012 will be held in Jeju Island (濟州島) on July 8-14, 2012. In conjunction with ACL 2012, a series of research workshops and conferences including EMNLP and CoNLL will be co-located in Jeju.

Building on the success enjoyed by the past ACL conferences, ACL 2012 will continue to strive for a comprehensive conference program that covers a diverse field of computational linguistics and that recognizes the importance of both theoretical and empirical approaches to research problems.

In 2012, ACL marks its 50th year of scientific activities and community services. On behalf of the organizing committee, I invite all of you to Jeju to join the celebration!

Haizhou Li
General Chair

Is it just a waste of time? Word Sense Disambiguation for the skeptic

http://lml.bas.bg/ranlp2011/invited.php#navigli

Summary: Word Sense Disambiguation (WSD), the task of automatically associating meaning with words in context, is a long-standing problem in the field of computational linguistics. There can be no doubt the problem is a tough one. Researchers began to study the automatic association of meanings with words as long ago as the late 1940s. And they have been struggling to put their ideas into effective practice ever since. All too frequently their results have been disappointing not only in terms of disambiguation quality, but also when their WSD has been plugged into applications such as Information Retrieval and Machine Translation. Nevertheless, this pessimistic scenario has been progressively changing over the last decade, to the point that high disambiguation performance has been reported in recent work on the topic, indicating that WSD is more than alive. In this talk I will "challenge" the skeptic and analyze how and why WSD has achieved remarkable improvements in the last few years, and what promises it holds for the near future in terms of both in vitro performance and end-to-end applications.