Insight Week 3

written by Eric J. Ma on 2017-06-17

This week was a week of polishing our final products and getting them in shape for our demos. Pushing a product to final production really involves a lot of nitty-gritty tweaking. In this blog post, I'll detail some of what I had to work on.

My final product is a hybrid web dashboard + blog post. Behind the dashboard is a fairly complex set of computations, which currently are run through a Jupyter notebook. The front-end, therefore, only renders the predicted flu sequences returned from the Jupyter notebook. As part of my forecasts, I want to show the uncertainty surrounding the predictions, and how they're associated with individual forecasted sequences. This requires computing a convex hull surrounding a point cloud, and plotting it. I spent about 3-4 hours on Tuesday figuring out the code to make this part of the visualization, which I consider integral to communicating the project.
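For readers curious what "computing a convex hull surrounding a point cloud" can look like, here is a minimal pure-Python sketch using Andrew's monotone chain algorithm. It is not the code behind the dashboard; the function names and the toy point cloud are made up for illustration.

```python
# Sketch only: Andrew's monotone chain convex hull in pure Python.
# Not the Flu Forecaster implementation; the point cloud below is made up.

def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order, without repeats."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        # Pop while the last turn is not a strict counter-clockwise turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


cloud = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0.5)]
hull = convex_hull(cloud)
```

The returned vertices can then be handed to whatever polygon-drawing call the plotting library offers (in Bokeh, a `patch` glyph would be one option) to outline the cloud.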

Another important thing is the user experience (UX) when interacting with my hybrid blog post + dashboard. Unlike this blog or one written on Medium (the blog of choice for Insight), I have interactive elements in the post, which meant I had to hand-craft the HTML for the page. To plot the figures on the page, a set of backend functions runs before the page is rendered. These compute the necessary JS for interactive web plots. They have to run fast enough, otherwise Heroku will time out. Introducing code to plot the bounding boxes above slowed the loading time of the page beyond the 30-second limit Heroku imposes. As such, I had to carefully profile my code (mostly manually, with timing statements printed to console) to isolate the slow part, rewrite the implementation for speed, and re-deploy to Heroku. This took another good 3-4 hours, all to shave off dozens of seconds. The things we do with our lives!
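Manual profiling with timing statements can be as simple as the sketch below. The `timed` helper and the `slow_step` function are hypothetical stand-ins, not the actual Flu Forecaster code; the idea is just to wrap each suspect step and see which one eats the time.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=print):
    """Report how long the enclosed block took, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log(f"{label}: {elapsed:.3f} s")


def slow_step():
    """Hypothetical stand-in for an expensive plotting computation."""
    return sum(i * i for i in range(200_000))


# Collect the messages in a list here; the default log=print writes to console.
messages = []
with timed("slow_step", log=messages.append):
    result = slow_step()
```

For anything beyond a handful of steps, the standard library's `cProfile` module is probably the less tedious route.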

Throughout the week, a lot of other Fellows were getting their web demos set up. A lot of questions regarding Bokeh and Flask were flying around. Because of the discussion, I think I have a much better grasp of the programming model involved in making Bokeh work with Flask. Basically, a bunch of plotting computation is needed to get the JavaScript computed by Bokeh, and then through Jinja2 templating and HTML divs, we can put the final plot in the HTML canvas. A few more rounds of practice and I should be able to commit it to memory.
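A rough sketch of that programming model, assuming Flask and Bokeh are installed: `bokeh.embed.components` returns a script and a div for a plot, and a Jinja2 template drops both into the page. The inline template, the route, and the toy plot are all made up for illustration; a real app would more likely keep the template in its own file.

```python
from bokeh.embed import components
from bokeh.plotting import figure
from bokeh.resources import CDN
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical inline template, kept here so the sketch is self-contained.
PAGE = """
<!DOCTYPE html>
<html>
  <head>
    {{ resources | safe }}
    {{ script | safe }}
  </head>
  <body>
    {{ div | safe }}
  </body>
</html>
"""


def make_plot():
    """Build a toy plot; a real app would plot its own data here."""
    plot = figure(title="Toy line plot")
    plot.line([1, 2, 3, 4], [3, 1, 4, 1])
    return plot


@app.route("/")
def index():
    # Bokeh computes the JavaScript (script) and a placeholder element (div).
    script, div = components(make_plot())
    # Jinja2 places the BokehJS resources, the script, and the div in the page.
    return render_template_string(
        PAGE, resources=CDN.render(), script=script, div=div
    )
```

The `| safe` filter matters: without it Jinja2 would escape the generated markup and the plot would show up as raw text.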

The final part is in getting the presentation overall looking polished and understandable. This involves many tasks, from tweaking the text to making static figures and more. I have spent time with column layouts and configuring modals to get my page content looking overall fresh and yet also informative. Requires a lot of thought!

"Walk the Stage" 2017 Edition

written by Eric J. Ma on 2017-06-10

Finally put on a funny hat and an oversized robe, and topped it off with a yellow hood that only six of us received last Friday. Just six in the entire Institute this year - only six of us weirdos chose the SciDoc (ScD) degree! (I chose it because I like yellow over blue, and get to have a bit of fun confusing recruiters out there.) The day was too hot, and so I couldn't be bothered to dress up - who's going to see what I'm wearing underneath the robes anyways?!

Anyways, overall a good feeling to be done. A little bittersweet because I'm leaving a time where I had a ton of fun learning new things, especially in my final two years of grad school, though I also think it's nice to have a change of environment and to have a new set of problems to solve.

My hope is to continue being deeply engaged with the hard sciences, even outside of the academic ivory tower, just because it's a fun thing to do. Here's to hoping I can find a good match with a company out there.

Insight Week 2

written by Eric J. Ma on 2017-06-10

This week has been intense, mostly because I knew in advance that I'd be spending two days wearing a fancy hat, funky robe, and yellow sash. Because I was missing two days of Insight, I had to get my Minimum Viable Product (MVP) out by Wednesday - thankfully, I did!

If you've followed my blog post series on Insight (this is the 2nd post), my project is forecasting influenza sequences. This week, I hacked out my MVP and deployed it to Heroku as a hybrid HTML report + dashboard. I also picked up and incorporated a few new things along the way.

The first is the use of tooltips on my Bokeh plots. Bokeh is really powerful, and in some of the exploratory analyses, I desired having tooltips as a UI element to help a reader (who might need some introduction) understand the nature of the problem and the data involved.
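For anyone who hasn't used them, here is a minimal sketch of Bokeh tooltips, assuming a reasonably recent Bokeh install. The data and column names are invented; the `@name` syntax in each tooltip pulls the value from the matching column of the data source.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Made-up data; the column names are hypothetical.
source = ColumnDataSource(data={
    "x": [1, 2, 3],
    "y": [2.0, 3.5, 1.2],
    "label": ["sample A", "sample B", "sample C"],
})

plot = figure(title="Hover over a point", tools="pan,wheel_zoom,reset")
plot.scatter("x", "y", source=source, size=12)

# Each tooltip row is (display name, template referencing a source column).
hover = HoverTool(tooltips=[("label", "@label"), ("(x, y)", "(@x, @y)")])
plot.add_tools(hover)
```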

The second is further mastery of Bootstrap CSS & JS. Now, Insight's Program Directors have told us clearly that we're not gunning to become front-end designers (and the like). Keeping that in mind, I still think it's important to know at least one front-end framework well enough to produce pleasant-looking interactive tools or reports - knowing front-end elements helps us communicate with front-end designers on final data products.

For the MVP, I tried further experiments with the Grid layout and Modal JS. The key idea behind Grid layouts was easier to grasp - prioritize rows, then columns.

With the Modal, stepping back for a moment, my goal was to display the science behind the project. However, it gets really technical. My audience is probably going to fall into one of two personas: the "business person" who just wants to see the final result and doesn't really care about the techniques, and the "technical person" who wants to dig deeper. I chose to use the Modal effect to satisfy both. The scientific methods are described at a high level on the main page, and the Modal element is used to show further information, graphics, and the like.

The third was deployment to Heroku itself! David Baumgold first showed me how to use Heroku at PyCon 2016, but I couldn't wrap my head around it at first. I think I didn't understand how "deployment" worked. A year later, the stuff that DB taught me came to fruition, as I hacked on deploying a minimal Flask app to Heroku with my younger brother. That gave me enough of the Heroku-specific concepts to hack together the necessary requirements.txt and Procfile files to deploy Flu Forecaster to the web.

For next week, these are my plans:

Still not sure which of the above two approaches is the better one, so I'll be sure to give each a shot.

Insight Week 1

written by Eric J. Ma on 2017-06-02

Insight's Week 1 is done! Here are some of my thoughts so far.

Firstly, the Fellows at Insight are very fast at learning things. Everybody is either a PhD or MD, some have done post-doctoral work, and a few have even become professors, but everybody is interested in doing data stuff and picks up new things quickly. I think at the same time, we're also good at thinking strategically upon being given feedback; once an idea sounds infeasible, we pivot, or even switch, to new ideas.

Secondly, I see now the importance of developing a great data product. I think of a data product in terms of the input data, the transformation applied to the data, and the insight returned from the data. Think of it as a Python function:

def data_product(data):
    # `transformation` stands in for whatever model or analysis the product applies.
    insight = transformation(data)
    return insight

Most of the "data products" being developed are consumer-facing projects that a user can interact with, but a small number of them, mine included, are "dashboard-style" products that ingest continually updated data and return correspondingly updated insights. Both are good ideas.

Thirdly, I've become clear on the importance of first clearly defining the problem we want to solve, and then working backwards to define what we build, particularly for the minimum viable product (MVP). This way of thinking keeps us agile, and prevents us from being stuck in a rut.

Fourthly, other fellows know lots of good stuff that I've been able to learn about. For example, in deep learning, there were a few steps I wasn't sure about w.r.t. convolutional neural networks in autoencoders. One other fellow, a post-doc from UC Berkeley, gave me the master-class run-through on what happens at the vector/matrix level with convolutional neural networks.

Thus far, really nice. I've noticed we don't generally end up competing with one another, and the atmosphere is very collaborative. We're working with one another, talking with one another, building trust and the like. I'm looking forward to the coming weeks!

PyCon 2017 Highlights

written by Eric J. Ma on 2017-05-22

My last post, written after attending PyCon 2017, was about my thoughts on past PyCons. This post covers my highlights from PyCon 2017 itself.

(1) Serving as part of the organizing committee. I had the privilege of serving on the FinAid committee this year, and spent a large fraction of time in the staff room preparing to disburse FinAid cheques. I have very vivid memories of how slow the line was when I was receiving my cheques back in the day, and so I wanted to make sure FinAid recipients could receive their reimbursements as fast as possible, without wasting time in line (when they could instead be listening to talks).

(2) Teaching two tutorials. This year, I submitted two tutorial proposals, and both were accepted. In the three years that I've been teaching it, Network Analysis Made Simple has always been popular, and I think it's because it gives participants a different way of thinking about data, thus making it an intellectually stimulating topic. I also developed new material on Best Testing Practices for Data Science. This one, in retrospect, was much fresher, and thus in need of more battle-testing and polish compared to Network Analysis. I have some ideas, including modifications to the workshop format, narrowing the target audience and more, to make it more useful for future iterations.

(3) First talk at PyCon! I also gave a talk at PyCon on doing Bayesian Statistical Analysis with PyMC3! This was my first PyCon talk ever. It was so nice to have a tweet-commendation by PyMC3's creator Chris Fonnesbeck too:

It was also nice to get a tweet-commendation from Thomas Wiecki:

Beyond that, the attendees seemed to like the talk too on the Twitterverse!

It's very heartening to see how many people want to move into Bayes-land! The talk also happened to be the last in the session and last of the day, so I think many people were tired by that point and wanted to go to the final keynote. Thus, the only question came from my friend Hugo, with whom I also worked on a course at DataCamp, who asked about "how we might communicate these ideas to, say, a manager." My thoughts on that were to report not just a single number (e.g. the mean), but also the range, and communicate how the lower and upper bound of the range would affect bottom-line decisions, or open up new opportunities (though I probably could have expressed this sentiment better).
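As a small illustration of reporting a range alongside the mean, here is a sketch that summarizes a set of draws with a central 95% percentile interval. The draws are simulated with the standard library as a stand-in for posterior samples from a fitted model, and the `summarize` helper is hypothetical; a percentile interval is just one reasonable reading of "the range".

```python
import random
import statistics

random.seed(0)
# Stand-in for posterior draws from a fitted model; simulated here.
samples = [random.gauss(10.0, 2.0) for _ in range(5000)]


def summarize(draws, mass=0.95):
    """Return the mean and a central percentile interval covering `mass`."""
    ordered = sorted(draws)
    tail = (1.0 - mass) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return statistics.fmean(ordered), lower, upper


mean, lower, upper = summarize(samples)
print(f"mean = {mean:.1f}, 95% interval = [{lower:.1f}, {upper:.1f}]")
```

The conversation with a manager then becomes "what do we do if the true value is near the lower bound, and what if it is near the upper bound?" rather than a debate over one number.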

(4) Feeding Guido van Rossum. Python's BDFL, Guido van Rossum, wandered into the staff office asking to see whether the speaker ready room was open, because he was hungry and was looking for some snacks. We initially suggested the main conference hall, but later I ran out and called him back, because we had some English biscuits in the staff room, and we engaged in a short chat. That's when I had my star-awed moment! Was tempted to get a photo, but I figured he'd probably be fed up with people asking for photo ops, so I decided against it, hoping to be considerate of him. When he finished the biscuit, he said goodbye, and left the staff office. Amazing how everybody else just went about their own business while he was in the room; speaks to the lack of ego that PyCon celebrities have, and that sets a great example for the rest of the community!

Once I'm back in Boston, I'm definitely going to catch up on the rest of PyCon. I heard that there were a lot of good talks that I missed while staffing the conference as FinAid co-chair; I'll have to make sure that YouTube playlist is set up!