Kieran Kavanaugh, David Nigenda, and I recently wrote a post for the AWS Machine Learning Blog about A/B testing ML models in production using Amazon SageMaker. I recommend reading the post, and also checking out our accompanying Jupyter notebook (A/B Testing with Amazon SageMaker).
The CernerWorks Enterprise System Management team is responsible for mining systems data from Cerner clients’ systems, providing visibility into the collected data for various teams within Cerner, and building monitoring solutions on top of that data. Our primary mission is to help increase the reliability and security of Cerner clients’ systems. About three years ago, our team was at a place where we had developed an effective telemetry framework for systems data collection. At the same time, we were seeing an exponential increase in use cases where we had to transform the collected systems data in various ways to support our visibility and monitoring efforts. We therefore felt a pressing need to introduce a dedicated ETL pipeline platform into our data architecture. Link to this post on towards data science.
These are the resources I’ve used to understand Airflow and develop a deeper intuition for the data architecture frameworks associated with it.
From what I understand, the first prototype for a columnar database was introduced in a 2005 paper from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL): “C-Store: A Column-oriented DBMS” (Stonebraker et al.).
In this project I used a Generative Adversarial Network (GAN) architecture to generate new artistic images that capture the style of the Indian artists Raja Ravi Varma (1848–1906) and Sattiraju Lakshmi Narayana, known as Bapu (1933–2014). (Link to Github Repo of Source Code). Link to this post on medium.
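The core of any GAN is two networks trained against each other: a generator that maps noise vectors to images, and a discriminator that scores images as real or generated. The sketch below shows one adversarial training step in PyTorch; the fully connected layer sizes, batch size, and latent dimension are illustrative assumptions, not the architecture used in the project.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 3 * 32 * 32  # assumed sizes, for illustration only

# generator: noise vector -> flat RGB image in [-1, 1]
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
# discriminator: flat image -> probability the image is real
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(8, img_dim) * 2 - 1  # stand-in for a batch of real paintings
z = torch.randn(8, latent_dim)

# discriminator step: real images labeled 1, generated images labeled 0
opt_d.zero_grad()
d_loss = (loss_fn(D(real), torch.ones(8, 1)) +
          loss_fn(D(G(z).detach()), torch.zeros(8, 1)))
d_loss.backward()
opt_d.step()

# generator step: try to fool the discriminator into labeling fakes as real
opt_g.zero_grad()
g_loss = loss_fn(D(G(z)), torch.ones(8, 1))
g_loss.backward()
opt_g.step()
```

In practice these two steps alternate over many batches, with the real batch drawn from the artists' paintings rather than random noise.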
In this project I used a LeNet-5 convolutional neural network to carry out classification for the CIFAR-100 dataset. The trained CNN model was then used to carry out real-time object classification in a video stream. (Link to Github Repo of Source Code). I also included an implementation of a CNN model to carry out classification for the MNIST (handwritten digits) dataset. Link to this post on medium.
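A LeNet-5-style network for CIFAR-100 can be sketched in PyTorch roughly as below; the layer sizes follow the classic LeNet-5 shape adapted to 3-channel 32x32 inputs and 100 classes, and are not necessarily identical to the project's code.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-style CNN adapted for 3-channel 32x32 CIFAR-100 images."""
    def __init__(self, num_classes=100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
logits = model(torch.randn(4, 3, 32, 32))  # batch of 4 dummy images
```

For real-time use, the same `model(...)` call is simply applied to each preprocessed video frame.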
I trained my neural network with the MNIST dataset in this project. The MNIST dataset consists of images of handwritten digits (60,000 images in the training set and 10,000 images in the test set). (Link to Github Repo of Source Code). Link to this post on medium.
I implemented backpropagation and stochastic gradient descent in my neural network for this project. (Link to Github Repo of Source Code). I tested my implementation using AND, OR, NOT, and XOR networks.
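A minimal from-scratch sketch of backpropagation with per-sample (stochastic) gradient-descent updates, shown on the XOR task; the network size, learning rate, and squared-error loss here are my own choices for illustration, not taken from the repo.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR truth table: not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2-4-1 network; weights drawn from a standard normal (assumed init)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

def forward(inputs):
    return sigmoid(sigmoid(inputs @ W1 + b1) @ W2 + b2)

mse_before = np.mean((forward(X) - y) ** 2)

lr = 1.0
for epoch in range(5000):
    for i in rng.permutation(4):              # stochastic: one sample at a time
        x, t = X[i:i + 1], y[i:i + 1]
        h = sigmoid(x @ W1 + b1)              # forward pass
        out = sigmoid(h @ W2 + b2)
        d_out = (out - t) * out * (1 - out)   # backprop of squared-error loss
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)  # SGD updates
        W1 -= lr * x.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

mse_after = np.mean((forward(X) - y) ** 2)
preds = (forward(X) > 0.5).astype(int)
```

The gradient of the loss flows backward through the output layer (`d_out`) into the hidden layer (`d_h`), and each weight matrix moves a small step against its gradient.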
I implemented a feed-forward neural network using PyTorch in this project. (Link to Github Repo of Source Code). I tested the implementation using AND, OR, NOT, and XOR networks. Link to this post on medium.
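In PyTorch, the same kind of logic-gate test fits in a few lines, since autograd handles backpropagation; the layer sizes, optimizer, and loss below are my assumptions, shown on XOR.

```python
import torch
import torch.nn as nn

# tiny feed-forward network for a two-input logic gate (XOR shown here)
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 8), nn.Tanh(),
                    nn.Linear(8, 1), nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=0.05)
loss_fn = nn.BCELoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(net(X), y)   # forward pass + loss
    loss.backward()             # autograd computes all gradients
    opt.step()                  # optimizer updates the weights

preds = (net(X) > 0.5).float()
```

Swapping in the AND, OR, or NOT truth tables for `X` and `y` exercises the same training loop on the other gates.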
I implemented a vanilla perceptron model, an averaged perceptron model, and a Naive Bayes model from scratch in this project. (Link to Github Repo of Source Code)
I implemented three variants of a decision tree from scratch in this project. They are (a) a binary decision tree with no pruning using the ID3 algorithm, (b) a binary decision tree with a given maximum depth, and (c) a binary decision tree with post-pruning using reduced error pruning. (Link to Github Repo of Source Code). Link to this post on medium.
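The ID3 algorithm chooses each split by information gain, i.e. the reduction in label entropy. A minimal sketch of that criterion is below; the helper names and the toy dataset are illustrative, not the repo's actual code.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting the rows on one feature index."""
    base = entropy(labels)
    n = len(rows)
    # group labels by the feature's value, then weight each subset's entropy
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(subset) / n * entropy(subset)
                    for subset in groups.values())
    return base - remainder

# toy data: feature 0 perfectly predicts the label, feature 1 is pure noise
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ['no', 'no', 'yes', 'yes']
```

Here splitting on feature 0 yields a gain of 1 bit while feature 1 yields 0, so ID3 would split on feature 0; the max-depth and reduced-error-pruning variants only change when this recursive splitting stops or gets undone.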
I implemented the k-means and agglomerative clustering algorithms from scratch in this project. (Link to Github Repo of Source Code) The python script in the repo uses the Yelp dataset (Yelp Dataset Link). I verified the correctness of the implementation using the SKLearn implementations of these algorithms.
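K-means (Lloyd's algorithm) alternates an assignment step and a centroid-update step until the centroids stop moving. The sketch below runs on small synthetic blobs rather than the Yelp data, and the random-data-point initialization is an assumption.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # init from data points
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # assignment step: each point joins its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid moves to the mean of its points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# two tight, well-separated blobs around (0, 0) and (5, 5)
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5, 0.1, (20, 2))])
labels, centroids = kmeans(X, 2)
```

Agglomerative clustering works in the opposite direction: every point starts as its own cluster and the two closest clusters are repeatedly merged.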
Fact/Opinion Classification using the Naive Bayes Classifier and the Iterative Hyperlink-Induced Topic Search Algorithm
In this project, I replicated the key results from the paper, “A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles” (Rajkumar Pujari, Swara Desai, Niloy Ganguly, and Pawan Goyal).
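The iterative HITS algorithm alternates two updates until convergence: a page's authority score is the sum of the hub scores pointing at it, and its hub score is the sum of the authority scores it points at, with normalization after each step. A minimal sketch on a toy link graph (the adjacency-list representation is my own choice):

```python
def hits(graph, n_iters=50):
    """Iterative HITS on a dict mapping each node to the nodes it links to."""
    nodes = list(graph)
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(n_iters):
        # authority update: sum of hub scores of pages linking to n
        auths = {n: sum(hubs[m] for m in nodes if n in graph[m]) for n in nodes}
        norm = sum(v * v for v in auths.values()) ** 0.5
        auths = {n: v / norm for n, v in auths.items()}
        # hub update: sum of authority scores of pages n links to
        hubs = {n: sum(auths[m] for m in graph[n]) for n in nodes}
        norm = sum(v * v for v in hubs.values()) ** 0.5
        hubs = {n: v / norm for n, v in hubs.items()}
    return hubs, auths

# toy web graph: a and b both link to c, so c should be the top authority
graph = {'a': ['c'], 'b': ['c'], 'c': []}
hubs, auths = hits(graph)
```

In the two-stage framework, scores like these are computed over a graph of candidate sentences rather than web pages, which is what makes the iteration reusable here.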
I tinkered with the Galago toolkit from the Lemur Project as part of Dr. Chris Clifton’s course on Web Information Search and Management at Purdue University. It was great fun to work on developing an intuition for search engine indices.