Twitter Auto-completer for Members of Congress

This project (Link to GitHub repo) is a tweet auto-completer for members of Congress. I used the Twitter API and the DocNow Hydrator to build a custom dataset of tweets, then used the Gensim library to train custom word2vec embeddings, and finally used a Keras LSTM model to auto-complete tweets. Link to this post on Medium.
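The post does not describe its preprocessing or how the trained LSTM is queried. The following is therefore only a hypothetical sketch of the two generic steps that any next-word auto-completer needs:

- framing tweets as (context words, next word) training pairs;
- greedily appending predicted words to a prefix.

To keep the example runnable without Keras, a simple bigram frequency table stands in for the LSTM. This is a swapped-in technique, not the author's model. In the real pipeline the tokens would also be mapped to their word2vec vectors before reaching the network. All function names here (`tokenize`, `make_training_pairs`, `build_bigram_predictor`, `complete`) are illustrative assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize(tweet):
    """Lowercase a tweet and split it into word-like tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9#@']+", tweet.lower())


def make_training_pairs(tweets, seq_len=3):
    """Slide a window over each tweet: `seq_len` context words -> the word that follows."""
    pairs = []
    for tweet in tweets:
        tokens = tokenize(tweet)
        for i in range(seq_len, len(tokens)):
            pairs.append((tokens[i - seq_len:i], tokens[i]))
    return pairs


def build_bigram_predictor(tweets):
    """Stand-in for a trained LSTM (hypothetical): predicts the most frequent
    next word given only the last word of the context."""
    follows = defaultdict(Counter)
    for tweet in tweets:
        tokens = tokenize(tweet)
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1

    def predict_next(context):
        counts = follows.get(context[-1]) if context else None
        if not counts:
            return None  # no known continuation for this word
        return counts.most_common(1)[0][0]

    return predict_next


def complete(prefix, predict_next, max_words=5, seq_len=3):
    """Greedy auto-completion: repeatedly append the predicted next word."""
    tokens = tokenize(prefix)
    for _ in range(max_words):
        nxt = predict_next(tokens[-seq_len:])
        if nxt is None:
            break
        tokens.append(nxt)
    return " ".join(tokens)


tweets = [
    "Protect our veterans today",
    "Protect our kids",
    "Protect our veterans now",
]
print(make_training_pairs(["a b c d e"], seq_len=3))
print(complete("protect", build_bigram_predictor(tweets), max_words=2))  # -> protect our veterans
```

Swapping the bigram table for a trained network only changes `predict_next`. The windowing and the greedy loop stay the same. Sampling from the predicted distribution instead of always taking the top word is a common alternative that makes completions less repetitive.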

I used subsets of the following datasets from George Washington University stored on Harvard Dataverse:

Word2Vec Embeddings

Sample output: the ten words closest to ‘simple’ in the learned embedding space.
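Nearest-neighbor lists like this are normally ranked by cosine similarity. In Gensim the equivalent query would be `model.wv.most_similar('simple', topn=10)`.

The plain-Python sketch below reproduces that ranking on made-up toy vectors, so the mechanics are visible without a trained model. The words and numbers are illustrative assumptions, not values from the project's embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=10):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy 3-dimensional vectors (hypothetical); real word2vec vectors are learned
# from the tweets and typically have 100+ dimensions.
toy_vectors = {
    "simple": [1.0, 0.9, 0.0],
    "easy": [0.9, 1.0, 0.1],
    "straightforward": [0.8, 0.7, 0.2],
    "budget": [0.0, 0.1, 1.0],
}
for other, score in most_similar("simple", toy_vectors, topn=2):
    print(f"{other}: {score:.3f}")
```

Cosine similarity ignores vector length and compares only direction. This is why it is the usual choice for word2vec, where vector norms partly reflect word frequency.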

Two-dimensional representation of the embedding space (dataset of ~50,000 tweets).

Two-dimensional representation of the embedding space (dataset of ~1.5 million tweets).
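The post does not say how these 2-D plots were produced. PCA and t-SNE are the usual choices for visualizing word2vec vectors, for example scikit-learn's `PCA(n_components=2)` or `TSNE(n_components=2)`.

As a hedged, dependency-free illustration, here is PCA using power iteration. It assumes PCA was the projection, which the source does not confirm. The function name `pca_2d` and the toy points are hypothetical.

```python
import math
import random


def pca_2d(vectors, iters=500, seed=0):
    """Project vectors onto their top two principal components (power iteration)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Covariance matrix (d x d) of the centered data.
    cov = [
        [sum(row[i] * row[j] for row in centered) / n for j in range(d)]
        for i in range(d)
    ]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        b = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            # Power iteration step: multiply by the covariance matrix...
            b = [sum(cov[i][j] * b[j] for j in range(d)) for i in range(d)]
            # ...remove overlap with components already found...
            for c in components:
                overlap = sum(x * y for x, y in zip(b, c))
                b = [x - overlap * y for x, y in zip(b, c)]
            # ...and renormalize to unit length.
            norm = math.sqrt(sum(x * x for x in b))
            if norm == 0:
                break  # no variance left in the remaining directions
            b = [x / norm for x in b]
        components.append(b)
    # Each point's 2-D coordinates are its projections onto the two components.
    return [
        [sum(row[j] * c[j] for j in range(d)) for c in components]
        for row in centered
    ]


# Toy 3-D points: most variance along x, some along y, none along z.
points = [[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
for p in pca_2d(points):
    print([round(x, 3) for x in p])
```

PCA preserves global linear structure. t-SNE instead emphasizes local neighborhoods, so clusters of related words tend to look tighter in t-SNE plots.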

Written on April 15, 2018