Big data analysis in humanities and economics with machine learning techniques and use of cloud computing technologies




Θεοδωρακόπουλος, Λεωνίδας




Sentiment analysis has been extensively investigated in recent years as a method for classifying human emotions toward specific events, products, services, and so on. It is considered a very important problem, especially for organizations and companies that want to know consumers’ views of their products and services. Combined with the evolution of social media, it has become an established and interesting domain of research. Through social media, people express their opinions and feelings, such as happiness or sadness, on a daily basis. The resulting volume of available data has made existing solutions inadequate, and the need for automated analysis methods is imperative. This thesis examines sentiment polarity analysis of Twitter data in a distributed environment, Apache Spark. More specifically, it proposes three classification algorithms for tweet-level sentiment analysis in Spark, which was chosen for its suitability for Big Data processing compared with its predecessors, MapReduce and Hadoop. The thesis also studies the effects of economic policy uncertainty on the stock return volatility of the largest banking institutions in Greece. The research period spans about thirteen years, from January 5, 2001, to June 30, 2014, and covers various market phases, including stock market crashes and booms (e.g. the financial crisis of 2007 and 2008 in the United States, and the European sovereign debt crisis). Volatility is constructed from intraday data using four different estimators. The estimated regressions were used to indicate the direct effects of economic policy uncertainty on the stock return volatility of the four large Greek banks.



Apache Spark, Big data, Supervised machine learning, Sentiment analysis