Data scientists need to write good APIs

written by Eric J. Ma on 2018-02-13

I had a breakthrough in my work today. This was not some scientific epiphany, but just breaking through a wall in my progress. Today's breakthrough was totally enabled by writing my class definitions in a way that made sense, and by writing class methods that enabled me to express my ideas in a literate fashion.

Logical class definitions and methods, refactored functions... these should be reflexive habits, but unfortunately, this isn't always the case in data science. We get so caught up in writing the code to make that one plot that we forget to refactor it out so that the block of code isn't brittle. And brittle code means that my future self will loathe my current self for not writing it robustly.

In other words, write good APIs.
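To make the idea concrete, here is a minimal sketch of what "expressing ideas in a literate fashion" through a class can look like. Every name here (`Experiment`, `normalized`, `summary`) is made up for illustration; none of it comes from my actual work code.

```python
# Hypothetical sketch: a class whose methods read like the analysis itself.
class Experiment:
    def __init__(self, measurements):
        self.measurements = list(measurements)

    def normalized(self):
        """Return measurements rescaled to the [0, 1] interval."""
        lo, hi = min(self.measurements), max(self.measurements)
        return [(m - lo) / (hi - lo) for m in self.measurements]

    def summary(self):
        """Return (mean, range) of the raw measurements."""
        mean = sum(self.measurements) / len(self.measurements)
        return mean, max(self.measurements) - min(self.measurements)


exp = Experiment([2.0, 4.0, 6.0])
assert exp.normalized() == [0.0, 0.5, 1.0]
assert exp.summary() == (4.0, 4.0)
```

The downstream analysis code then reads like sentences (`exp.normalized()`, `exp.summary()`) rather than tangles of intermediate variables.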

Did you enjoy this blog post? Let's discuss more!

Bayesian Inference & Testing Sets

written by Eric J. Ma on 2018-02-07

This topic recently came up again on the PyMC3 discourse. I had an opportunity to further clarify what I was thinking about when I first uttered the train/test split comment at PyData NYC.

After a little while, my thoughts have become clear enough to explain to a layperson, and I thought I'd reiterate them here. There are two distinct sources of uncertainty at play:

  1. Model specification uncertainty: Did we get the conditional relationships correct? Did we specify enough of the explanatory variables?
  2. Model parameter uncertainty: Given a model, can we quantify the uncertainty in the parameter values?

These are different uncertainties to deal with. We must be clear: when we are pretty sure about the model specification, Bayesian inference is about quantifying the uncertainty in the parameter values. Under this paradigm, using more data gives us narrower posterior distributions, and using less data gives us wider posterior distributions. If we split the data, we're just feeding fewer data points to the model; if we don't, we're feeding in more.
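The "more data means narrower posteriors" point can be checked with a toy conjugate model. This is purely illustrative (a Beta-Binomial coin-flip model, not any model from the discussion), but the shrinkage is exact and easy to compute by hand:

```python
import math

def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def posterior_sd(heads, tails, prior_a=1.0, prior_b=1.0):
    # With a Beta prior and Binomial likelihood, the posterior is
    # Beta(prior_a + heads, prior_b + tails) -- conjugacy, no sampling needed.
    return beta_sd(prior_a + heads, prior_b + tails)

small = posterior_sd(heads=6, tails=4)    # 10 observations
large = posterior_sd(heads=60, tails=40)  # 100 observations, same proportion
assert large < small  # more data -> narrower posterior
```

Holding out a test set just moves you from the second line towards the first: same model, wider posteriors.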


PyCon Program Committee Review

written by Eric J. Ma on 2018-02-06

This year, I participated in the PyCon 2018 program committee (ProgCom). Though it was my second time doing it, it was nonetheless an eye-opening experience: in contrast to last year, when I was writing my thesis and thus couldn't follow through on both rounds of review, this year I was able to. I'd like to write a bit about my thoughts on the process, with three goals:

  1. To document my experience reviewing this year's PyCon talks.
  2. To bring a little bit more transparency to the process. (Proposal authors, understandably, might feel like the process is opaque.)
  3. To indirectly encourage others to participate in the process by demystifying what goes on in the mind of a reviewer.

Before I go on, though, I do want to make a few points clear on what I will not be doing in this blog post.

  1. I will not be commenting on any particular proposal.
  2. I will not be mentioning specific reviewers' names and what they commented on specific proposals.
  3. I will not be offering tips for making your proposal succeed next year - that will depend on what next year’s program committee is looking for.
  4. I will not be describing the review process in detail, as this will be dealt with by an "official" blog post from the PSF.
  5. This blog post is definitely not an invitation to send me your current or next round's proposal to review - with a full-time job, I currently don't have the bandwidth for that.

Ready to read on? Let's go!

Stage 1: Scoring

The process for selecting talks is a two-stage process. In the first stage, we are rating the talks qualitatively (ordinally) on six criteria (for which I will defer to the official PSF blog post for details when it comes out). At this stage, we are blinded to a speaker's name and prior experience, as we want to review only the content on its merits. I'd like to give my thoughts on a few of the criteria below.

On "code of conduct", this criterion ensures that there's no disparaging of sub-communities. I generally ding'ed talks that carried any hints of negativity, as I'd like to see PyCon be a positive force in the community. A few proposals could easily be misconstrued as negative, even though they weren't in substance; we tried our best to communicate this concern to the proposal authors. That said, most talks are technical in nature, so this criterion was not really an issue.

On "completeness", it's really about seeing whether an author demonstrated the meticulousness to make the proposal easy for us to review. Timings were super important to me for gauging whether I thought the proposer had over- or under-proposed content. I also sought detail to evaluate the accuracy of the content and to connect the sub-points into the author's overall message. That additional detail made it easier for me to champion a talk in the later stages.

A few authors, while in communication with the ProgCom, flat-out refused to add details after I had messaged them requesting them, citing other conferences' practices. Unfortunately, each conference operates slightly differently, and that ongoing dialogue can indirectly influence reviewers' perception and therefore the proposal's score. We're all human, and if a proposer comes off as uncooperative, especially when we provide an ongoing opportunity for dialogue, it just makes it tougher to justify their elevated presence as a speaker at a conference that emphasizes cooperation and community.

On "coherence", this point somewhat overlaps with "completeness", in that the accuracy of content can help boost this criterion. A good talk would cover one sufficiently focused topic, in sufficient depth, for 30 minutes (or 45 minutes, if requested).

I think that what constitutes "sufficient" depends on our state of knowledge. For example, in data science, I would view it as insufficient to speak on "how to do a data analysis", as it is now quite clear that each analysis is quite different, and the generalizable principles are too vague to be of use to a listener. On the other hand, speaking about solving a particular (data) science problem can be illuminating, with solid take-aways for audience members.

Other, non-explicit criteria can affect the perception of a proposal. At this stage, we maintain an ongoing dialogue with proposal authors, up until the submission deadline. Thus, as mentioned above, an author's cooperativeness can affect our perception of the proposal, particularly on the "code of conduct" criterion - it affects whether we can trust the author to adhere to the code of conduct. Additionally, good English grammar can affect how readable the proposal is. Finally, there is something qualitatively different about a proposal by an author who is deeply passionate about their topic and believes they have something important to say, in contrast to an author who is merely trying to fulfill PyCon selection criteria. I want to hear from the former, not the latter.

Stage 2: Selection

At this stage, we looked at the composition of talks and proposed "buckets" of topics that the talks covered. This means that the "topics" are defined by what the community submits. In total, there were more than 60 groups of talks.

In this stage, we are basically looking for one talk to emerge from each group. The unfortunate reality is that there will be some groups that are small (3 talks), and some groups that are large (>15 talks), and many groups in between, meaning not every talk has an "equal" chance of making it through. This is entirely dependent on what the community has submitted, though, so there's no easy way to control for this.

At this stage, we are also debating what kind of conference we want. This is where it gets super interesting, and the idiosyncrasies of each ProgCom member show through. Here's a sampling of questions that went through my mind.

Do we want one in which mature talks are rehashed from the three other conferences it's already been at, or do we want new talks to come to prominence? It's not an easy decision - for some topics we favoured new ones, and for others we favoured mature talks.

Do we want experienced speakers, or do we want to encourage new ones to come up? This one is relatively easy - we hard-limited speakers to one talk each; some speakers proposed 5 or 6 talks, but we took only one, to enable other speakers to be present. This gives more room for newcomers to speak.

For me, there were some experienced speakers giving new talks, and I knew they'd be able to pull it off given their track record; I favoured them on both merits: experience and topical novelty. On the other hand, there were experienced speakers taking a single talk around the conference circuit - for these talks, if they were publicly available online, I didn't favour them, and communicated that to other members of the ProgCom team.

Amongst new speakers (and relatively unknown speakers) at PyCon, I was looking for slide decks to evaluate their message, and recordings of other talks that they had done. Knowing that there's a catch-22 problem (new speaker approximately means no recordings), I also tried looking in greater detail at the proposal for hints and clues that the speaker knew in great depth what they were speaking about, and were confident about delivering it. (Generally, good amounts of non-jargony detail highlight a speaker's capacity for mastery and communication skills simultaneously.)

My advocacy for new speakers and new talks, and against talks that had been given at other conferences - particularly those with a public recording available online - stems from my desire to see PyCon be complementary to other conferences. We're programmers, and I think Don't Repeat Yourself (DRY) is a good general principle to adhere to. Other ProgCom members were free to disagree with my advocacy, as are you, the reader.

In deciding on whether to include some evergreen topics at a beginners level or not, I looked to history to help me decide. For example, we hadn't had a beginner-friendly live talk on testing for the past few years, so I advocated in favour of that, even though one other program committee member disagreed and preferred to have more advanced testing talks in the program.

One topic that I was super torn by when voting was in the Bayesian Statistics category. Two talks, one extremely topical and beginner friendly, the other deeply technical but extremely useful in a variety of domains, both by speakers whom I've learned from in the past. I couldn't bring myself to pick one, so I voted for both and communicated this guidance to others on the team, and let them cast the deciding vote.

Finally, as a machine learner, I have been frustrated by new libraries that don't respect existing community idioms, however idiosyncratic those idioms are. One particular pet peeve is libraries that reinvent similar-yet-slightly-different APIs. There are a myriad of DataFrame APIs out there, yet I've only seen Dask do its best to explicitly implement the Pandas API. Likewise, there are a myriad of GPU tensor libraries out there, but I've only seen CuPy explicitly implement the NumPy API, which is idiomatic in the Python community. I thus strongly advocated for talks that described projects that explicitly adhered to and built on top of community idioms.

Gratitude towards the team

I'm not ProgCom's fearless leader - Jason Myers, whose name will be public anyway, is our fearless leader and spreadsheet maestro (and yes, I know I mentioned a second name here) - but I nonetheless feel a ton of gratitude towards the team. We worked asynchronously, distributed around the globe across a myriad of time zones. We gave each other incisive insight into topics, and educated each other on our respective areas of expertise. I learned about, and got excited about, new topics in Python. We debated and advocated for a PyCon that we'd all be proud to present back to the community.

Encouragement for speakers, accepted and turned down

As part of PyCon's program committee in 2018, I'm proud to congratulate speakers accepted to this year's PyCon talk lineup!

To those who were turned down (myself included), I would like to offer up a picture of the reality we faced: some talks were decided by a single vote; at other times, we had to decide between two proposals submitted independently that paralleled each other; yet at other times, we saw such a big cluster of talks that we knew we wanted to hear, yet could only pick a handful because we didn't have an Education Summit-like dedicated track to accommodate all of them. Tough choices left and right. Don't be discouraged, you have important things to say, and there are many awesome Python-related venues (SciPy & PyData) to present at.

For those whose data science talks were turned down, ping me on Twitter @ericmjl: I'd love to organize a Data Science Summit with you at PyCon 2019!

Addressing potential lingering questions

Does everybody on the ProgCom have the qualifications to review every talk proposal submitted? Definitely not. I have a knowledge bias towards the data science talks, and could handle some of the web talks, but I was completely unqualified to review talks on security and Python internals. Thus, for the data science talks, I offered my guidance to the rest of the ProgCom on what would be useful to speak about, but deferred to the expertise of others for talks I could not intelligently comment on. More than once I found myself re-voting based on other expert opinions.

Without fixed criteria on hand, how can talk proposers maximize their chances of getting a talk accepted? These "criteria" (if you want to call them that) develop organically over time. This is intentional, as PyCon is a community conference, not a topical conference. By not setting explicit topical criteria, we can solicit talks from the community that range from timely to evergreen, from specialized to broad, and beyond, allowing the community to speak for itself (pun, ahem, not intended).

This is not to say that PyCon couldn't become a topical conference, in which the ProgCom solicits proposals in particular pre-defined categories. If this changes, I'd love for next year's ProgCom to be explicit about the change, so that proposal authors have enough time to prepare for it.

Moreover, new topical areas can be proposed: if I am remembering history correctly, this is how the Education Summit came into being (though I'm happy to be corrected if I'm wrong). If anybody's up for it, I'd love for a topical "Data Science Summit" at PyCon to come to life as well! Let's propose it together next year through the Hatchery Program.

Back to the question, though: if you're thinking of this question, your proposal is probably not the one I would vote in favour of. I personally would like to hear from speakers who are deeply technical and can inject passion into a room through their technical talk, rather than from someone who was trying to tick checkboxes.

If you're on the ProgCom, does this mean you can't propose a talk? Of course you can propose a talk! :) Our previous fearless leader, Ned Jackson Lovely, wrote an open source app that hides our own talks from our own review, thus enabling us to remain impartial. For what it's worth, my own talk was not accepted by the ProgCom, but I have no hard feelings about it - it was placed in a category with (I think) 21 talks, making that category super competitive.


Refactor Notebook Code

written by Eric J. Ma on 2018-01-29

Jupyter notebooks that are filled with complex analyses can get unwieldy. Refactoring repeated code out into functions placed in modules should be standard practice, but from the sampling of Jupyter notebooks I've seen, I don't think this is standard practice.

When should code be refactored? As soon as we start copying/pasting it! Making sure I have self-contained functions ensures that lingering state in my notebook doesn't cause unexpected behaviour. (Side note: learning the "functional" programming mindset can be very useful here!)

But won't this slow down my pace? Isn't it faster to just copy and paste the code, and tweak what I need? Yes, but a small speed hit is going to be traded for a massive bump in rigour. Just today, I saw the effects of "lingering state" in my notebooks causing my plots to display different things before and after refactoring. It's not a good sign for any analysis if this happens.
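A minimal sketch of the habit I'm advocating, with made-up names (not the actual plot code from my notebook): the copy-pasted version depends on whatever global variables happen to linger, while the refactored function depends only on its arguments.

```python
# Before: each pasted copy of this block silently depends on whatever
# `scale` happens to be in the notebook's global namespace at run time.
scale = 2.0
values = [1.0, 2.0, 3.0]
scaled = [v * scale for v in values]

# After: a self-contained function, whose behaviour depends only on its
# arguments -- rerunning cells out of order can no longer change its output.
def scale_values(values, scale):
    """Return each value multiplied by scale."""
    return [v * scale for v in values]

assert scale_values([1.0, 2.0, 3.0], 2.0) == scaled
```

Once it lives in a function, the code can also move into a module and be imported by every notebook that needs it.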

In short, refactor your code.


PyMC3 docs + Weibull patches merged!

written by Eric J. Ma on 2018-01-18

I recently had a few PRs merged into the PyMC3 codebase. Really happy about it, and just like my previous bug fix, I thought I'd share a bit about how those PRs came about.

The first PR was an update to the docs on when to specify precision and when to specify standard deviation. They're related, so only one has to be specified, but I am sometimes sloppy when reading docs and didn't pick up on that. Thus, I added a few lines to make sure this was crystal clear to sloppy readers like me.
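For context, the relationship the docs clarify can be sketched in plain Python (no PyMC3 required; function names here are mine, for illustration): a Normal can be parameterized by either its standard deviation or its precision, because one determines the other.

```python
import math

def tau_from_sd(sd):
    """Precision (tau) corresponding to a standard deviation: tau = 1 / sd**2."""
    return 1.0 / sd ** 2

def sd_from_tau(tau):
    """Standard deviation corresponding to a precision: sd = 1 / sqrt(tau)."""
    return 1.0 / math.sqrt(tau)

assert tau_from_sd(2.0) == 0.25
assert sd_from_tau(0.25) == 2.0
```

Since either value pins down the distribution, specifying both would be redundant - hence the "specify only one" note in the docs.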

The next PR was an update to the Mixture model docs, in which I added an example of the new API for specifying components of mixture models. It previously wasn't clear how to do this, as there were no examples provided, so I put in a documentation PR specifying examples.

The final PR was a patch to the Weibull distribution. I wanted to play around with mixture Weibulls at work, but they wouldn't work because the Weibull distribution didn't have a mode specified. I checked Wikipedia and found that the Weibull mode is conditional on the values of its parameters, and thus put in a PR to implement it. Trying it out on some simulated/toy data, it worked! Thus, the devs allowed it to be merged.
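The parameter-dependence of the mode is the interesting bit. A sketch of the Wikipedia formula (this is my illustration, not the actual PyMC3 patch): for shape k and scale lam, the mode is lam * ((k - 1) / k)**(1 / k) when k > 1, and 0 when k <= 1, since the density then peaks at the origin.

```python
def weibull_mode(k, lam):
    """Mode of a Weibull distribution with shape k and scale lam."""
    if k <= 1:
        # Density is monotonically decreasing: the peak sits at zero.
        return 0.0
    return lam * ((k - 1) / k) ** (1 / k)

assert weibull_mode(0.5, 3.0) == 0.0             # k <= 1 branch
assert weibull_mode(2.0, 1.0) == 0.5 ** 0.5      # ((2-1)/2)**(1/2)
```

That conditional is exactly why the mode has to be computed from the parameter values rather than stored as a constant.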

A few lessons I've learned along the way:

(1) Docs are an awesome place to start. In fact, I made a few formatting mistakes in my first and second PRs that gave another contributor an opportunity to fix them! Nothing is too small to contribute. FWIW, my first contributions to open source software were documentation fixes for matplotlib, and that was a superb learning journey!

(2) Friendly maintainers are crucial. The PyMC dev team can basically be described as "generally super nice!" From the online and in-person interactions I've had with them, there's little in the way of egos; they're always learning and always generally helpful. If they weren't that way, I would very likely have had second thoughts about putting in a PR there.

(3) Open source lets me fix bugs I find. This lets me work at the pace that I need to, without having to wait for commercial vendors to provide update patches. If the patch that I find turns out to be useful for others, then the work I did can possibly save a ton of people's time as well. Win-win scenario!
