
Here are some partly academic, partly fun software projects and resources that I have built over the years.

Language Processing Resources

Citation Context Corpus
The citation context corpus consists of the full text of 852 papers that cite the 20 target papers in the citation sentiment corpus with the most objective citations. The corpus contains 1,034 paper-reference pairs and 203,803 sentences from the ACL Anthology Network (AAN) corpus.
Citation Sentiment Corpus
The corpus consists of 8,736 citation sentences, extracted from the ACL Anthology Network corpus and manually annotated with sentiment. A science-specific sentiment lexicon of 83 manually extracted polar phrases is also available.
LDA topic modelling in JavaScript
DependenSee
A small tool which generates a PNG of the dependency graph of a given sentence using the Stanford Parser.
Urdu Sentiment Lexicon
The Urdu Sentiment Lexicon is a list of 2,607 positive and 4,728 negative sentiment/opinion words for Urdu. Here's a simple JavaScript-based application that colours the sentiment words according to their polarity and sets the background colour of the whole text from its total polarity score.
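
The colouring idea described for the Urdu Sentiment Lexicon can be illustrated with a minimal Python sketch. This is not the code of the application itself: the function names, the +1/-1 scoring, the whitespace tokenisation, and the colour names are all assumptions, and the example words are English placeholders rather than entries from the lexicon.

```python
def score_text(text, positive, negative):
    """Label each whitespace-separated token and sum a +1/-1 polarity score.

    `positive` and `negative` are sets of words (hypothetical stand-ins
    for the two halves of a sentiment lexicon).
    """
    labelled = []
    total = 0
    for token in text.split():
        if token in positive:
            labelled.append((token, "positive"))
            total += 1
        elif token in negative:
            labelled.append((token, "negative"))
            total -= 1
        else:
            labelled.append((token, "neutral"))
    return labelled, total


def background_colour(total):
    """Map the total polarity score to a background colour (assumed palette)."""
    if total > 0:
        return "green"
    if total < 0:
        return "red"
    return "grey"


# Placeholder lexicon and text, for illustration only.
positive_words = {"good", "great"}
negative_words = {"bad"}
labels, total = score_text("good food bad service great view",
                           positive_words, negative_words)
print(total, background_colour(total))  # prints: 1 green
```

A real lexicon lookup for Urdu would likely need proper tokenisation and normalisation of spelling variants, which this sketch leaves out.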

Urdu Resources and Games

Urdu Thesaurus
Urdu Thesaurus is the first free online thesaurus for Urdu. Through its website and app, over forty thousand unique words and phrases and over twenty thousand sets of synonyms can be searched. It was launched in 2016 and has since reached thousands of users in over 150 countries.
Concordance of Iqbal's Urdu Poetry
A KWIC (Key Word in Context) concordance is a method used in corpus linguistics to display a word in its surrounding context to help researchers analyze its usage. I couldn't find a KWIC concordance tool tailored for poetry, so I built one. Click on the link above for an example from all Urdu works of Iqbal.
Char Harf چار حرف
Char Harf is a word-guessing game with the mechanics of the old code-breaking game Bulls and Cows. You have to guess a four-letter word (all puns intended), and the computer reports back how many of your letters are present in the word and how many are in the correct place.
Urdle اُردل
Urdle was a word-guessing game where you guess a four-letter word and the computer shows which letters are present in the word and which are in their correct place. It quickly became popular in online Urdu word-game circles, reaching thousands of users per month.
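
The Bulls and Cows feedback that both games build on can be sketched in a few lines of Python. This is an illustrative sketch, not the games' actual code: the function name `score_guess` is made up, the classic convention is assumed (a "bull" is a letter in the correct place, a "cow" is a letter that is present but misplaced), and each letter is assumed to be a single Unicode code point, which may not hold for Urdu text with diacritics.

```python
from collections import Counter


def score_guess(secret, guess):
    """Return (bulls, cows) for a guess against the secret word.

    bulls: letters in the correct position.
    cows:  letters shared by both words but in a different position.
    """
    if len(secret) != len(guess):
        raise ValueError("secret and guess must have the same length")
    bulls = sum(s == g for s, g in zip(secret, guess))
    # Multiset intersection counts every shared letter, bulls included.
    shared = sum((Counter(secret) & Counter(guess)).values())
    return bulls, shared - bulls


print(score_guess("lamp", "palm"))  # prints: (1, 3)
```

Char Harf reports only these two counts, whereas Urdle shows the result per letter; a per-letter version would return a list of marks instead of two totals.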