Here are some partly academic, partly fun software projects and resources that I have built over the years.
Language Processing Resources
Citation Context Corpus
The Citation Context Corpus consists of the full text of 852 papers that
cite the 20 target papers with the most objective citations in the Citation Sentiment Corpus.
It contains 1,034 paper-reference pairs and 203,803 sentences from
the ACL Anthology Network (AAN) corpus.
Citation Sentiment Corpus
The corpus consists of 8,736 citation sentences which have been manually
annotated with sentiment. These citation sentences have been extracted from the ACL Anthology
Network corpus. A science-specific sentiment lexicon of 83 manually extracted polar phrases
is also available.
LDA topic modelling in JavaScript
DependenSee
A small tool which generates a PNG of the dependency graph of a given sentence using the Stanford Parser.
Urdu Sentiment Lexicon
The Urdu Sentiment Lexicon is a list of 2,607 positive and 4,728 negative
sentiment/opinion words for Urdu. Here's a simple JavaScript-based application which changes the
colour of the sentiment words according to their polarity and calculates the background colour of
the whole text using the total polarity score of the text.
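The idea behind that application can be sketched in a few lines of Python. This is only an illustration, not the application's actual code: the whitespace tokenisation, the +1/-1 weights, the colour names and the two-word toy lexicon are all assumptions made for the example.

```python
def annotate(text, positive, negative):
    """Label each whitespace-separated token and sum a total polarity score.

    Assumption: every positive word counts +1 and every negative word -1.
    """
    labelled = []
    score = 0
    for token in text.split():
        if token in positive:
            labelled.append((token, "positive"))
            score += 1
        elif token in negative:
            labelled.append((token, "negative"))
            score -= 1
        else:
            labelled.append((token, "neutral"))
    return labelled, score


def background_colour(score):
    """Map the total score to a colour name (hypothetical colour choices)."""
    if score > 0:
        return "green"
    if score < 0:
        return "red"
    return "grey"


# Toy lexicon chosen for illustration only, not read from the lexicon file.
positive_words = {"اچھا"}  # "good"
negative_words = {"برا"}   # "bad"

labels, total = annotate("یہ اچھا ہے", positive_words, negative_words)
colour = background_colour(total)
```

With the real lexicon, the two sets would simply be loaded from the positive and negative word lists.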
Urdu Resources and Games
Urdu Thesaurus
Urdu Thesaurus is the first free online thesaurus for Urdu. Its
website and app let you search over forty thousand unique words and phrases and over twenty
thousand sets of synonyms. It was launched in 2016 and has since reached thousands of users in
over 150 countries.
Concordance of Iqbal's Urdu Poetry
KWIC (Key Word in Context) concordance is a method used in corpus
linguistics to display a word in its surrounding context to help researchers analyze its
usage. I couldn't find a KWIC concordance tool tailored for poetry, so I built one.
Click on the link above for an example from all Urdu works of
Iqbal.
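For readers unfamiliar with KWIC, here is a minimal Python sketch of the general technique. It is not the code behind the concordance linked above. It assumes that each verse line is its own context unit (one plausible way to adapt KWIC to poetry), that tokens are separated by whitespace, and that matching is exact; the function name `kwic` and the `width` parameter are made up for the example, and the sample lines are invented English text rather than verses by Iqbal.

```python
def kwic(lines, keyword, width=3):
    """Return (line_number, left_context, keyword, right_context) tuples.

    Context never crosses a line boundary, so each hit stays inside its verse.
    """
    hits = []
    for line_no, line in enumerate(lines, start=1):
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((line_no, left, token, right))
    return hits


sample = [
    "the river runs to the sea",
    "the sea returns the rain",
]

for line_no, left, word, right in kwic(sample, "sea", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{line_no:>3}  {left:>12}  [{word}]  {right}")
```

A real tool for Urdu would also need to decide how to treat diacritics and spelling variants when matching, which this sketch ignores.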
Char Harf چار حرف
Char Harf is a word-guessing game with the game mechanics of the old
code-breaking game Bulls and Cows. You
have to guess a four-letter word (all puns intended) and the computer reports back how many
of your letters are present in the word and how many letters are in the correct place.
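The feedback described above can be sketched as follows. This is a guess at the scoring rule based on the description, not the game's actual code: it assumes "present" counts every letter shared between the guess and the secret (including those already in the right place), that repeated letters are matched at most as often as they occur in both words, and that one Python character corresponds to one letter. The name `score_guess` is hypothetical.

```python
from collections import Counter


def score_guess(secret, guess):
    """Return (present, placed) for a guess against the secret word.

    placed:  letters that match the secret at the same position.
    present: letters shared by both words, regardless of position.
    """
    if len(secret) != len(guess):
        raise ValueError("guess and secret must have the same length")
    placed = sum(s == g for s, g in zip(secret, guess))
    # Multiset intersection keeps the smaller count of each shared letter.
    present = sum((Counter(secret) & Counter(guess)).values())
    return present, placed


print(score_guess("word", "draw"))  # three shared letters, none in place
print(score_guess("word", "wood"))  # three shared letters, all three in place
```

In classic Bulls and Cows terms, the "bulls" are `placed` and the "cows" are `present - placed`.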
Urdle
Urdle was a word-guessing game where you guess a four-letter
word and the computer shows which letters are present in the word and which are in their
correct place.
It quickly became popular in online Urdu word-game circles,
reaching thousands of users per month.