EmoText for opinion mining in long texts using different lexical lists

EmoText for opinion mining in long texts illustrates a domain-independent approach to opinion mining. For reliable results, texts to be classified should contain at least 200 words. The system evaluates lexical, stylometric, grammatical, and deictic features using different evaluation methods, and classifies texts with the SMO or NaiveBayes classifiers from the WEKA data mining toolkit. The approach was tested on the following English corpora: the Pang corpus, the Berardinelli movie review corpus, a corpus of spontaneous dialogues (the SAL corpus), and a corpus of product reviews. For more information about statistical affect sensing, see here.

The approach uses lexical datasets built from three word sources: the frequency list of the Berardinelli movie review corpus (15,168 words), the BNC frequency list, and Whissell's Dictionary of Affect in Language (8,742 words). From these sources we extract 5,056 words, 91,782 words, and 2,234 words, respectively, for our datasets.

In the form below, enter the text to classify, e.g. a movie review, and press the Classify button. You can take movie reviews from here.
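As a rough illustration of the lexical part of the approach described above, the following sketch counts how often words from a lexical list occur in a text and checks the 200-word minimum. It is not the EmoText implementation: the function names, the tokenizer, the relative-frequency weighting, and the toy lexicon are all assumptions made for the example, and the stylometric, grammatical, and deictic features as well as the WEKA classifiers are not reproduced.

```python
import re
from collections import Counter

# The page states that reliable results need at least 200 words.
MIN_WORDS = 200


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def is_long_enough(text, min_words=MIN_WORDS):
    """Check whether the text meets the minimum length for reliable results."""
    return len(tokenize(text)) >= min_words


def lexical_features(text, lexicon):
    """Return the relative frequency of each lexicon word that occurs in the text.

    Relative frequency is an assumed weighting; the original system
    evaluates features with several different methods.
    """
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(token for token in tokens if token in lexicon)
    return {word: n / len(tokens) for word, n in counts.items()}


# Hypothetical toy lexicon standing in for one of the real lexical lists.
TOY_LEXICON = {"good", "bad", "boring", "brilliant"}

review = "A brilliant film with a good cast, but the ending was bad."
features = lexical_features(review, TOY_LEXICON)
long_enough = is_long_enough(review)
```

In a full pipeline, a feature dictionary like this would be turned into a fixed-length vector and passed to a classifier such as SMO or NaiveBayes; that step is omitted here.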