These are the resources I’ve used to understand Airflow and develop a deeper intuition for the data architecture frameworks associated with it.
From what I understand, the first prototype of a columnar database was introduced in a 2005 paper from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL): "C-Store: A Column-oriented DBMS" (Stonebraker et al.).
In this project I used a Generative Adversarial Network (GAN) architecture to generate new artistic images that capture the style of the Indian artists Raja Ravi Varma (1848-1906) and Sattiraju Lakshmi Narayana (Bapu) (1933-2014). (Link to Github Repo of Source Code). Link to this post on medium.
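For a flavor of the two-network setup, here is a minimal DCGAN-style generator/discriminator pair in PyTorch. The layer sizes (100-dim noise vector, 16x16 output images) are illustrative assumptions, not the configuration used in the project:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noise vector to a fake image via transposed convolutions."""
    def __init__(self, z_dim=100, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),  # -> 4x4
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),      # -> 8x8
            nn.ConvTranspose2d(64, channels, 4, 2, 1), nn.Tanh(),                     # -> 16x16
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores images as real or fake with strided convolutions."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),                      # -> 8x8
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),      # -> 4x4
            nn.Conv2d(128, 1, 4, 1, 0),                                               # -> 1x1 logit
        )

    def forward(self, x):
        return self.net(x).view(-1)

g, d = Generator(), Discriminator()
fake = g(torch.randn(2, 100, 1, 1))  # a batch of 2 fake images
scores = d(fake)                     # one realism logit per image
```

Training alternates between the two: the discriminator learns to separate real from generated images, while the generator learns to fool it.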
In this project I used a LeNet-5 convolutional neural network to carry out classification on the CIFAR-100 dataset. The trained CNN model was then used to carry out real-time object classification in a video stream. (Link to Github Repo of Source Code). I also included an implementation of a CNN model to carry out classification for the MNIST (handwritten digits) dataset. Link to this post on medium.
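A sketch of the classic LeNet-5 layer stack in PyTorch, adapted for CIFAR-100's 32x32 RGB inputs (3 input channels, 100 output classes); details like the activation choice are assumptions, not necessarily the project's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet5(nn.Module):
    """LeNet-5 layer stack, adapted to 3-channel 32x32 inputs and 100 classes."""
    def __init__(self, in_channels=3, num_classes=100):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 6, kernel_size=5)  # 32x32 -> 28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)           # 14x14 -> 10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> 6 x 14 x 14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> 16 x 5 x 5
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                          # class logits

logits = LeNet5()(torch.randn(4, 3, 32, 32))  # a batch of 4 fake CIFAR-sized images
```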
This project (Link to Github Repo) is a tweet auto-completer for members of Congress. I used the Twitter API and the DocNow Hydrator to create a custom dataset, then used the Gensim library to train a custom word2vec representation, and finally used a Keras LSTM model to auto-complete tweets. Link to this post on medium.
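The word2vec + LSTM pipeline is too heavy to reproduce here, but a bigram frequency model (named plainly as a stand-in, with a made-up toy corpus) shows the same greedy next-word autocomplete interface:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count word -> next-word frequencies over a list of tweets."""
    model = defaultdict(Counter)
    for tweet in corpus:
        tokens = tweet.lower().split()
        for a, b in zip(tokens, tokens[1:]):
            model[a][b] += 1
    return model

def autocomplete(model, prompt, max_words=5):
    """Greedily extend the prompt with the most frequent next word."""
    tokens = prompt.lower().split()
    for _ in range(max_words):
        followers = model.get(tokens[-1])
        if not followers:
            break  # no observed continuation
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)

# hypothetical toy corpus standing in for the scraped congressional tweets
corpus = ["we must pass this bill", "we must pass it", "pass this bill today"]
completion = autocomplete(train_bigram(corpus), "we must")
```

The LSTM replaces the frequency table with a learned model over word2vec embeddings, but the completion loop is the same.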
I trained my neural network on the MNIST dataset in this project. The MNIST dataset consists of images of handwritten digits (60,000 images in the training set and 10,000 images in the test set). (Link to Github Repo of Source Code). Link to this post on medium.
I implemented backpropagation and stochastic gradient descent in my neural network for this project. (Link to Github Repo of Source Code). I tested my implementation using AND, OR, NOT, and XOR networks.
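A minimal sketch of what such an implementation looks like: a 2-4-1 sigmoid network trained on XOR by backpropagation with per-example (stochastic) gradient descent. The hidden size, learning rate, and epoch count here are illustrative choices, not the project's:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# XOR training data
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]

# one hidden layer of 4 sigmoid units, weights drawn uniformly from [-1, 1]
H = 4
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [random.uniform(-1, 1) for _ in range(H)]
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = random.uniform(-1, 1)

def forward(x):
    h = [sigmoid(ws[0] * x[0] + ws[1] * x[1] + b) for ws, b in zip(w1, b1)]
    o = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    return h, o

def train_epoch(lr=0.5):
    """One pass of stochastic gradient descent over the four XOR examples."""
    global b2
    total = 0.0
    for x, y in zip(X, Y):
        h, o = forward(x)
        total += (o - y) ** 2
        # backpropagation: output delta first, then hidden deltas
        d_o = (o - y) * o * (1 - o)
        d_h = [d_o * w2[j] * h[j] * (1 - h[j]) for j in range(H)]
        # gradient-descent weight updates
        for j in range(H):
            w2[j] -= lr * d_o * h[j]
            w1[j][0] -= lr * d_h[j] * x[0]
            w1[j][1] -= lr * d_h[j] * x[1]
            b1[j] -= lr * d_h[j]
        b2 -= lr * d_o
    return total

losses = [train_epoch() for _ in range(5000)]
```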
I implemented and tested a feed-forward neural network using PyTorch in this project. (Link to Github Repo of Source Code). I tested my implementation using AND, OR, NOT, and XOR networks. Link to this post on medium.
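For flavor, a tiny PyTorch feed-forward network fit to XOR; the layer sizes, optimizer, and training schedule are illustrative assumptions rather than the project's settings:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# the four XOR input/target pairs
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# two-layer feed-forward network: 2 -> 8 -> 1
model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.BCELoss()

for _ in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()        # autograd replaces the hand-written backprop
    optimizer.step()

final_loss = loss.item()
preds = (model(X) > 0.5).float()  # threshold the sigmoid outputs
```

The contrast with the from-scratch version is the point: `loss.backward()` and the optimizer replace all of the manual delta computations.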
I implemented a Vanilla Perceptron model, an Average Perceptron model, and a Naive Bayes model from scratch in this project. (Link to Github Repo of Source Code)
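A sketch of the vanilla perceptron update rule (weights change only on misclassified points), shown here on the linearly separable AND function; the averaged-perceptron and Naive Bayes variants are omitted:

```python
def perceptron_train(X, y, epochs=10):
    """Vanilla perceptron: update weights only when an example is misclassified."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # yi is -1 or +1
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * activation <= 0:  # wrong side of (or on) the boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b

def predict(w, b, xi):
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1

# the AND function, which is linearly separable
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b = perceptron_train(X, y)
```

The averaged variant differs only in returning the running average of the weight vectors instead of the final one, which makes it less sensitive to the order of examples.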
I implemented three variants of a decision tree from scratch in this project. They are (a) a binary decision tree with no pruning using the ID3 algorithm, (b) a binary decision tree with a given maximum depth, and (c) a binary decision tree with post-pruning using reduced error pruning. (Link to Github Repo of Source Code). Link to this post on medium.
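ID3's split criterion is information gain, the reduction in label entropy from splitting on an attribute; a minimal sketch on a hypothetical one-attribute toy dataset:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting on one attribute -- ID3's split criterion."""
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in splits.values())
    return entropy(labels) - remainder

# hypothetical toy data: one attribute (outlook) predicting a yes/no label
rows = [("sunny",), ("sunny",), ("rainy",), ("rainy",)]
labels = ["no", "no", "yes", "yes"]
gain = information_gain(rows, labels, 0)  # a perfect split: gain of 1 bit
```

ID3 greedily picks the attribute with the highest gain at each node; the depth-limited and reduced-error-pruning variants only change when that recursion stops.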
I implemented the k-means and agglomerative clustering algorithms from scratch in this project. (Link to Github Repo of Source Code) The Python script in the repo uses the Yelp dataset (Yelp Dataset Link). I verified the correctness of my implementations against the scikit-learn implementations of these algorithms.
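A minimal sketch of the k-means (Lloyd's algorithm) iteration on a toy 2-D dataset, not the Yelp data:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from random data points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        for c, cluster in enumerate(clusters):
            if cluster:
                centroids[c] = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    return centroids, clusters

# two well-separated toy blobs
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, clusters = kmeans(points, k=2)
```

Agglomerative clustering goes the other way: every point starts as its own cluster and the closest pair is merged repeatedly.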
Fact/Opinion Classification using the Naive Bayes Classifier and the Iterative Hyperlink-Induced Topic Search Algorithm
In this project, I replicated the key results from the paper "A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles" (Pujari, Desai, Ganguly, and Goyal).
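The Hyperlink-Induced Topic Search (HITS) step can be sketched as alternating hub/authority updates over a link graph; the toy graph below is hypothetical, not data from the paper:

```python
def hits(graph, iterations=50):
    """Iterative HITS: alternate authority and hub updates, normalizing each round."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # a node's authority is the summed hub score of nodes linking TO it
        auth = {n: sum(hub[u] for u, targets in graph.items() if n in targets)
                for n in nodes}
        # a node's hub score is the summed authority of nodes it links OUT to
        hub = {n: sum(auth[v] for v in graph.get(n, ())) for n in nodes}
        # normalize so scores converge instead of growing without bound
        for scores in (auth, hub):
            norm = sum(s * s for s in scores.values()) ** 0.5 or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# hypothetical toy link graph: nodes a and b both point at node c
graph = {"a": ["c"], "b": ["c"]}
hub, auth = hits(graph)
```

In the paper's setting the "links" connect sentences rather than web pages, but the iteration is the same.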
I tinkered with the Galago toolkit from the Lemur Project as part of Dr. Chris Clifton’s course on Web Information Search and Management at Purdue University. It was great fun to develop an intuition for search engine indices.
Link to Todo Heroku Web App. See screenshots below.