Saturday, August 26, 2017

Computing Similarity between Tweets in Text, Time and Space



Suggested by: Bernd Resch (bernd.resch@sbg.ac.at) and Nikolaus Augsten (nikolaus.augsten@sbg.ac.at)

Short description: Computing similarities between tweets is crucial for application areas such as disaster management, urban planning, and the fight against crime and terrorism. However, in contrast to most previous natural language processing (NLP) approaches, which focused purely on textual content, the approach addressed in this master’s thesis implicitly considers the temporal and spatial dimensions, which carry vital information. This thesis builds on existing research that developed an interdisciplinary method for emotion classification. That method combines linguistic, temporal, and spatial information into a single similarity metric and subsequently applies a graph-based semi-supervised learning approach to label all tweets with an emotion class. The main goal of this thesis is to improve the current algorithm (1) by increasing its efficiency through the development of a new tweet labelling algorithm, and (2) by validating the definition of the linguistic, spatial, and temporal similarity parameters.
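
As a rough illustration of what a single similarity metric over text, time, and space could look like, the following Java sketch combines three component scores with a weighted average. It is not the metric from the existing research: the class name TweetSimilarity, the choice of Jaccard overlap for text, the exponential decay for time and distance, and all weights and scale parameters are assumptions made for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: all names, weights, and decay scales are illustrative
// assumptions, not the similarity metric of the existing algorithm.
final class TweetSimilarity {

    /** Minimal tweet record: token set, timestamp in seconds, WGS84 degrees. */
    static final class Tweet {
        final Set<String> tokens;
        final long epochSeconds;
        final double lat;
        final double lon;

        Tweet(Set<String> tokens, long epochSeconds, double lat, double lon) {
            this.tokens = tokens;
            this.epochSeconds = epochSeconds;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Jaccard similarity of two token sets: |A ∩ B| / |A ∪ B|, in [0, 1]. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // convention: two empty sets are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    /** Temporal similarity: exponential decay of the absolute time difference. */
    static double temporal(long t1, long t2, double scaleSeconds) {
        return Math.exp(-Math.abs(t1 - t2) / scaleSeconds);
    }

    /** Great-circle distance in metres (haversine formula, spherical Earth). */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        final double earthRadius = 6_371_000.0;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * earthRadius * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** Spatial similarity: exponential decay of the great-circle distance. */
    static double spatial(Tweet a, Tweet b, double scaleMeters) {
        return Math.exp(-haversineMeters(a.lat, a.lon, b.lat, b.lon) / scaleMeters);
    }

    /** Weighted average of the three component similarities, in [0, 1]. */
    static double combined(Tweet a, Tweet b,
                           double wText, double wTime, double wSpace,
                           double scaleSeconds, double scaleMeters) {
        double weightSum = wText + wTime + wSpace;
        double score = wText * jaccard(a.tokens, b.tokens)
                + wTime * temporal(a.epochSeconds, b.epochSeconds, scaleSeconds)
                + wSpace * spatial(a, b, scaleMeters);
        return score / weightSum;
    }
}
```

For example, `TweetSimilarity.combined(a, b, 1.0, 1.0, 1.0, 3600.0, 1000.0)` would weight the three dimensions equally, with similarity decaying over roughly an hour and a kilometre. Whether such weights and scales are appropriate is the kind of parameter question the thesis is meant to validate.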

The master's thesis will be carried out together with the University of Salzburg's Computer Science Department.

Literature:


Resch, B., Summa, A., Zeile, P. and Strube, M. (2016) Citizen-centric Urban Planning through Extracting Emotion Information from Twitter in an Interdisciplinary Space-Time-Linguistics Algorithm. Urban Planning, 1(2), pp. 114-127.
Mann, W., Augsten, N. and Bouros, P. (2016) An Empirical Evaluation of Set Similarity Join Techniques. In: Proceedings of the VLDB Endowment (PVLDB).
Pak, A. and Paroubek, P. (2010) Twitter as a Corpus for Sentiment Analysis and Opinion Mining. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). Ed. by N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner and D. Tapias. Valletta, Malta: European Language Resources Association (ELRA), pp. 1320-1326.
Abney, S. (2008) Semisupervised Learning for Computational Linguistics. Ed. by D. Madigan, F. Murtagh and P. Smyth. London: Chapman & Hall/CRC.

Start date: ASAP

Prerequisites/qualifications: interest in interdisciplinary/applied research; preferably experience with algorithms, text mining, similarity computation, and/or tweet analysis. Programming skills (Java) are required.
