Abstract:
In this study, we propose a text-analytic method that identifies groups of attributes, called thematic categories, that users perceive as important in evaluative judgements such as hotel ratings. The method extracts these categories from user reviews (here, of hotels) using Word2Vec, which creates a vector space representation of the words in a corpus based on their co-occurrences within a fixed-size window around each word. The words thus embedded in the vector space, called embeddings, are then clustered with the K-means algorithm to produce categories. We call these categories thematic because their words are related by theme (e.g. staff, friendly, complain) rather than syntactically or semantically. The extent to which these categories actually influence the associated judgement (e.g. hotel ratings) is tested through a regression, in which three of the five categories are found to be significant. We discuss the implications of this result for the method in the chosen domain of hotel reviews and ratings, and we evaluate the method as an artifact in the design science sense. We believe that the method satisfies several of the evaluation criteria that have been proposed, such as validity, efficiency, and utility, while its generalizability to other domains remains plausible but untested.
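The pipeline summarized above (embed words by their windowed co-occurrences, cluster the embeddings, then turn the clusters into predictors for a regression) can be illustrated with a minimal sketch. It is not the authors' implementation. Raw co-occurrence counts stand in for Word2Vec embeddings so that the example needs no external library. The toy reviews, the seed words used to initialize K-means, and the helper names (`cooccurrence_vectors`, `kmeans`, `category_counts`) are all hypothetical.

```python
# Toy stand-in for the pipeline: co-occurrence counts replace Word2Vec
# embeddings (an assumption made to keep the sketch dependency-free).

def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words within `window` positions of it."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][index[s[j]]] += 1.0
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, iterations=10):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def category_counts(tokens, word_label, k):
    """Count how many tokens of a review fall into each category."""
    counts = [0] * k
    for t in tokens:
        if t in word_label:
            counts[word_label[t]] += 1
    return counts


# Hypothetical tokenized reviews, invented for illustration only.
reviews = [
    ["staff", "friendly", "helpful"],
    ["staff", "helpful", "friendly"],
    ["room", "clean", "spacious"],
    ["room", "spacious", "clean"],
]

vocab, vectors = cooccurrence_vectors(reviews, window=2)
points = [vectors[w] for w in vocab]
seeds = [vectors["staff"], vectors["room"]]  # hypothetical, fixed for determinism
labels = kmeans(points, seeds)

word_label = dict(zip(vocab, labels))
clusters = {}
for word, label in word_label.items():
    clusters.setdefault(label, []).append(word)

# Per-review category counts could serve as predictors in the regression.
features = category_counts(["staff", "friendly", "room"], word_label, k=2)
```

In this sketch, the per-review counts in `features` are one possible form of predictor for a regression on ratings. The regression itself is omitted, and how the paper actually constructs its regressors is not stated in the abstract.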