To execute the proposed project, we will first gather a raw data set of depressive and anxious tweets. Data collection consists of three main steps: articulating the problem, defining the data required, and collecting and combining data from different sources. We also plan to consult medical officers and other experts in psychiatry and to obtain local data sets if possible. The data set will then be preprocessed through formatting, cleaning, and sampling. Using the preprocessed data, we will research and apply a suitable Natural Language Processing (NLP) technique (e.g., neural networks or deep learning) to determine the structural role of each word in a sentence and categorize the words accordingly. We will also study and draw on prior research on NLP-based classification (e.g., From Sentiment Analysis to Emotion Recognition NLP [7]) to improve our system.
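As an illustration of the cleaning step, the following is a minimal sketch of how raw tweets might be normalized before classification. The cleaning rules (lowercasing, removing URLs, mentions, hashtag symbols, and punctuation, and dropping duplicates) and the function names `clean_tweet`, `tokenize`, and `deduplicate` are our own assumptions for this example; the final preprocessing rules will be decided during the project.

```python
import re

# Hypothetical cleaning patterns; the actual rules are still to be decided.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s']")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, '#' symbols, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)   # keep letters, whitespace, and apostrophes
    return WHITESPACE_RE.sub(" ", text).strip()


def tokenize(text):
    """Split a cleaned tweet into whitespace-separated tokens."""
    return clean_tweet(text).split()


def deduplicate(tweets):
    """Keep the first tweet for each distinct cleaned form (e.g. to drop reposts)."""
    seen = set()
    unique = []
    for tweet in tweets:
        key = clean_tweet(tweet)
        if key and key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique
```

For example, `tokenize("So tired... #NoSleep")` yields the tokens `so`, `tired`, and `nosleep`, which can then be passed to the classification stage.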
The depression measuring module will then be developed using machine learning. We plan to research and select the most suitable classifier for our project (e.g., an SVM classifier or a Naïve Bayes classifier).
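To make one of the candidate classifiers concrete, the sketch below implements a small multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing using only the Python standard library. It is not the final design of the module: the class name `NaiveBayesTextClassifier`, the labels `depressive` and `neutral`, and the four training sentences are invented purely for illustration and are not project data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # number of documents per class
        self.word_counts = defaultdict(Counter)   # per-class token frequencies
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum over tokens of log P(token | label)
        score = math.log(self.class_counts[label] / self.total_docs)
        denominator = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for token in tokens:
            if token not in self.vocabulary:
                continue  # ignore tokens never seen during training
            score += math.log((self.word_counts[label][token] + 1) / denominator)
        return score

    def predict(self, tokens):
        """Return the class with the highest log-probability score."""
        return max(self.class_counts, key=lambda label: self._log_score(tokens, label))


# Invented toy examples, for illustration only (not real tweets).
train_docs = [
    "i feel hopeless and empty".split(),
    "cannot sleep feel worthless".split(),
    "great day with friends".split(),
    "enjoyed lunch with friends today".split(),
]
train_labels = ["depressive", "depressive", "neutral", "neutral"]

model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(model.predict("feel empty and worthless".split()))
print(model.predict("lunch with friends".split()))
```

In practice, an established library implementation and a much larger labeled data set would be used; the sketch only shows how word frequencies per class drive the prediction.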
We plan to collect the test data set from individuals' tweets using the Twitter API or an alternative method, such as a portable framework for testing web applications (e.g., Selenium). We will then run the module on the newly collected test data, evaluate it, and use the test results for the betterment of people who suffer from depression, a major health problem.
The figure shows the interaction among these modules.