
Fun with Webcrawling with Azure

I’ve been messing around with Azure (and the free $200 of account credit for signing up) to run a web crawler on Seattle Craigslist automotive listings. The crawler continuously scrapes the Seattle “cars and trucks” section and logs every listing as a JSON record with title, location, price, vehicle attributes, etc. It can feed all the data into Cosmos DB, which can then be searched with Azure Search. Currently, however, I’m simply dumping the data into a JSON database and running my own Python scripts to manipulate and search it.

For example: show me all cars from 1995 to 1998 that are not trucks or SUVs, and that have a 6-cylinder engine if they are a European model or an 8-cylinder engine if they are an American model. This kind of filtering cannot be done on Autotrader.
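Here is a rough sketch of what that kind of query could look like in a Python script. The field names (year, body, cylinders, origin) and the sample records are made up for illustration and are not the crawler's real schema. I'm also assuming that cars that are neither European nor American get excluded.

```python
# Hypothetical listing records; the field names are assumptions for this sketch.
LISTINGS = [
    {"title": "1996 BMW 328i", "year": 1996, "body": "sedan", "cylinders": 6, "origin": "european"},
    {"title": "1997 Ford Mustang GT", "year": 1997, "body": "coupe", "cylinders": 8, "origin": "american"},
    {"title": "1998 Chevy Tahoe", "year": 1998, "body": "suv", "cylinders": 8, "origin": "american"},
    {"title": "1995 Audi A4", "year": 1995, "body": "sedan", "cylinders": 4, "origin": "european"},
    {"title": "2001 Mercedes E320", "year": 2001, "body": "sedan", "cylinders": 6, "origin": "european"},
]

# Cylinder count required for each origin (assumption: other origins never match).
REQUIRED_CYLINDERS = {"european": 6, "american": 8}


def matches(listing):
    """Return True if a listing satisfies the example query."""
    if not 1995 <= listing["year"] <= 1998:
        return False
    if listing["body"] in {"truck", "suv"}:
        return False
    required = REQUIRED_CYLINDERS.get(listing["origin"])
    return required is not None and listing["cylinders"] == required


def filter_listings(listings):
    """Keep only the listings that satisfy the example query."""
    return [listing for listing in listings if matches(listing)]


if __name__ == "__main__":
    for listing in filter_listings(LISTINGS):
        print(listing["title"])
```

The conditional rule (cylinders depend on origin) is the part a typical search form cannot express, and in a script it is just a dictionary lookup.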

I’m still developing it as a side hobby, but my plan is to include more websites in the crawler, for example automotive forums and classifieds, dealerships’ own certified pre-owned listings, etc. Then I’ll be able to filter and search based simply on location and vehicle attributes, without having to check 4-5 different listing sites.

Here is the real-time log of the crawler:


It uses Scrapy (pip install Scrapy). More to come. I may try to hook it up to a mobile iOS/Android front-end depending on time/costs/legal infringements.

Machine Learning Courses

I’ve started taking some online Machine Learning courses on Coursera and edX.

I’ve been interested in data science and large-scale AI for a long time, both for the potential career opportunities and as a way to keep learning and improve my technical abilities and viewpoint.

I’m currently taking the Coursera course Machine Learning Foundations: A Case Study Approach. It is taught by two UW professors and covers regression, clustering, and more, all in Python, using the GraphLab Create library for data analysis and filtering. Here is a plot from a sentiment analysis model that predicts customer sentiment from a large data set of Amazon reviews of baby products. It’s called an ROC curve, and it shows the true-positive rate against the false-positive rate of the predictions.

[Figure: ROC curve, True Positive Rate vs. False Positive Rate]
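For anyone curious how a curve like this gets built, here is a minimal sketch in plain Python. It is not the GraphLab Create code from the course, and the labels and scores are made-up toy values. The idea is to sweep a decision threshold over the predicted scores and record the false-positive rate and true-positive rate at each step.

```python
def roc_points(labels, scores):
    """Return (false_positive_rate, true_positive_rate) pairs.

    labels: 1 for a positive review, 0 for a negative review.
    scores: the model's predicted probability that a review is positive.
    Assumes at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Toy data for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

if __name__ == "__main__":
    for fpr, tpr in roc_points(toy_labels, toy_scores):
        print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A curve that hugs the top-left corner means the model reaches a high true-positive rate while keeping false positives low.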

Autodesk VR Game Design Project Update

Added an entry to my projects and experience page for my most recent position with Autodesk developing a VR game using their new game engine Stingray.

I served primarily as the group lead/integration engineer on this project, and it's been a lot of fun to apply many of the same methods from my PhD to games and entertainment applications. I've learned a lot about rendering and animation, and I'm still working on honing my 'digital artistic' skills, e.g. Maya and 3ds Max, lighting, AI, texturing, etc.

Thesis Defense

I defended my PhD thesis this week, and walked in the graduation ceremony. 

The cliff notes for my thesis are that I developed a new framework for topology and shape optimization of strongly coupled fluid-structure interaction problems. I presented 2D and 3D results using a framework that combines many state-of-the-art numerical methods in applied mathematics and computational mechanics, implemented using various computer science concepts.

The most notable detail is the ability of this framework to accurately and robustly design real-world, nonlinear, 3D multi-physics (fluid-structure) problems. As a demonstration, I optimize a bio-prosthetic aortic heart valve to reduce blood damage (a measure of shear) while constraining the pressure drop (which ensures that enough blood can still move through the valve).

In addition, this immersed framework allows me to rapidly simulate fully two-way coupled fluid-structure interaction of any image without having to build a body-fitted mesh for the contour, which can save months of pre-processing even for an experienced engineer. This work can also be combined with multiple interface coupling conditions to simulate two-phase flow, thermal convection, or a combination of multiple conditions.