The proceedings contain 4 papers. The topics discussed include: some thoughts on using annotated suffix trees for natural language processing; MIL: automatic metaphor identification by statistical learning; a peculiarity-based exploration of syntactical patterns: a computational study of stylistics; and mining comparative sentences from social medias.
ISBN (print): 9781467390064
Social media interactions have become increasingly important in today's world. A survey conducted in 2014 among adult Americans found that a majority of those surveyed use at least one social media site. Twitter, in particular, serves 310 million active users on a monthly basis, and thousands of tweets are published every second. The public nature of this data makes it a prime candidate for data mining. Twitter users publish 140-character messages and can geo-tag these tweets using a variety of methods: GPS coordinates, IP geolocation, and user-declared location. However, few users disclose their location: according to our empirical findings, only between 1% and 3% of users provide location data. In this article, we aim to aggregate information from different sources to estimate the location of any Twitter user. We use a hybrid approach that combines techniques from natural language processing and network theory. Tests have been conducted on two datasets by inferring the location of each individual user and then comparing it against the actual known location of users with geolocation information. The estimation error is the distance in kilometers between the estimated and the actual location. We also compare the relative average error per country, to account for differences in country size. Our results improve on those reported in other studies in the literature. Our approach is independent of the language used by the user, whereas most works in the literature handle just one language or a reduced set of languages. The article also showcases the evolution of our estimation approach and the impact that the modifications had on the results.
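The error metric described above (distance in kilometers between the estimated and the actual location) can be illustrated with a short sketch. The abstract does not say how the distance is computed, so the haversine great-circle formula, the mean Earth radius of 6371 km, and the names `haversine_km` and `mean_error_km` are assumptions made for illustration, not the authors' implementation.

```python
import math

# Assumed mean Earth radius in kilometers (not taken from the abstract).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_error_km(estimated, actual):
    """Average estimation error over paired lists of (lat, lon) tuples (hypothetical helper)."""
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [haversine_km(e[0], e[1], a[0], a[1]) for e, a in zip(estimated, actual)]
    return sum(errors) / len(errors)
```

A per-country relative error would additionally divide by some measure of country size; the abstract does not specify that normalization, so it is left out of this sketch.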
ISBN (digital): 9783319462240
ISBN (print): 9783319462233; 9783319834726
This book constitutes the refereed post-proceedings of the Second IFIP WG 12.7 International Workshop on Computational History and Data-Driven Humanities, held in Dublin, Ireland, in May 2016. The 7 full papers presented together with 2 invited talks and 4 lightning talks were carefully reviewed and selected from 14 submissions. The papers focus on the challenges and opportunities of data-driven humanities and cover topics at the interface between computer science, social science, humanities, and mathematics.