Thursday, March 31, 2011

Microsoft Web N-gram Services

Microsoft Web N-gram services are a cloud-based platform for language modeling research in web search, natural language processing, speech, and related fields. A collaboration between Microsoft Research and Bing, the services provide access to real-world, web-scale data with regular updates.

The Web N-gram services provide access to:

Content types: Document Body, Document Title, Anchor Texts, and Query
Model types: Smoothed Backoff N-gram models with N up to 5
Locale: Web documents indexed by Bing in the EN-US market
Access: Services hosted by Microsoft, with SOAP and REST interfaces. Python development kits are also available.
Web models: N-gram models based on a Web snapshot taken in June 2009 have been, and will always be, available. Additionally, with the support of the NSF, models from two snapshots taken in April 2010 and October 2010 will be hosted on Windows Azure for at least 3 years. Further updates will be made based on community feedback.
Query models: N-gram models based on 9 months of Bing queries up to June 2009 will always be available. In addition, monthly updates to the query N-gram models will also be provided. The services will maintain up to 3 query N-gram models, depending on storage and usage patterns.

(from http://web-ngram.research.microsoft.com/)
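The description above names the model type only as "smoothed backoff N-gram models". As a rough illustration of what backoff means, here is a minimal local sketch assuming ARPA-style log10 probabilities and backoff weights. The function names, the toy values in LOGPROB and BACKOFF, and the out-of-vocabulary floor are all hypothetical; this does not call the Web N-gram service and is not its actual smoothing method.

```python
from typing import Dict, Tuple

Ngram = Tuple[str, ...]

# Hypothetical toy model: log10 probabilities and log10 backoff weights.
LOGPROB: Dict[Ngram, float] = {
    ("the",): -1.0,
    ("cat",): -3.0,
    ("sat",): -3.5,
    ("the", "cat"): -1.5,
    ("cat", "sat"): -1.2,
}
BACKOFF: Dict[Ngram, float] = {
    ("the",): -0.3,
    ("cat",): -0.2,
}


def backoff_log10_prob(words: Ngram, logprob: Dict[Ngram, float],
                       backoff: Dict[Ngram, float],
                       oov_log10_prob: float = -7.0) -> float:
    """log10 P(last word | preceding words), ARPA-style backoff (sketch)."""
    if words in logprob:
        # The full n-gram is stored: use its probability directly.
        return logprob[words]
    if len(words) == 1:
        # Unknown unigram: fall back to a hypothetical OOV floor.
        return oov_log10_prob
    # Otherwise add the history's backoff weight (log10(1) = 0 if absent)
    # and retry with a shorter context, dropping the oldest word.
    history = words[:-1]
    return backoff.get(history, 0.0) + backoff_log10_prob(
        words[1:], logprob, backoff, oov_log10_prob)


def sentence_log10_prob(sentence: str, logprob: Dict[Ngram, float],
                        backoff: Dict[Ngram, float], order: int = 5) -> float:
    """Sum per-word log10 probabilities using at most order-1 words of context."""
    tokens = sentence.split()
    total = 0.0
    for i in range(len(tokens)):
        start = max(0, i - order + 1)
        total += backoff_log10_prob(tuple(tokens[start:i + 1]), logprob, backoff)
    return total


if __name__ == "__main__":
    print(backoff_log10_prob(("the", "sat"), LOGPROB, BACKOFF))
    print(sentence_log10_prob("the cat sat", LOGPROB, BACKOFF))
```

Here ("the", "sat") is not stored, so its score is the backoff weight of ("the",) plus the unigram score of "sat". The default order of 5 mirrors the "N up to 5" limit mentioned above.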
