HACARUS Kaggle Challenge #1

Hello, I’m Ippei Usami, a Data Scientist at Hacarus.

In August of this year, Hacarus held its first Internal Kaggle Challenge.

To clarify, Kaggle is a Google-owned online platform where the data science community can challenge itself through competitions and share insights to build up collective know-how. When we talk about the Internal Kaggle Challenge, we’re talking about an internal, company-wide competition aimed at helping our data scientists cultivate skills and increase creativity.

Well, we’ve now finished our first Kaggle challenge, so let’s dive right into our findings:

 

The Challenge

Something to note here is our use of work hours: we allocated 20% of our work hours to this challenge. Some may wonder whether this allocation of time was truly worth it, since time is money and, for a startup company, our most precious resource. However, at Hacarus we strongly believe that investing time in this type of challenge is well worth the effort. Kaggle offers an excellent opportunity to be creative, think outside the box and obtain new skills, and to have fun together!

Our Kaggle challenge program is designed to be inclusive of all Hacarus engineers: each group consisted of both data scientists and application engineers. While this allowed for multiple perspectives on problem solving, it also meant that there were certain knowledge gaps. As a team, we needed to be mindful of differences in skill, background and experience, something I believe will help in our day-to-day work as well. Moreover, our data scientists are all Japanese and our application engineers are all Filipino, with the Japanese team in the Kyoto headquarters and the Filipino team in the Manila development center. We needed to overcome not only the differences in our knowledge and work experience, but also the hurdles of remote and cross-language communication. A compelling challenge!

Challenges aside, while this was the first ever attempt, we still managed to place within the top 40% of all submissions. For the next challenge we’ll work even harder!

Reflecting on our Challenges

Reflecting on our first Kaggle Challenge, I can only say one thing: we’ve still got a lot to learn!

As a concrete method for improvement, I think that learning from the techniques published by other, better performing teams would be helpful.

Kaggle-specific Methods and Tips

I had checked out Titanic, a Kaggle competition that is often considered the best introductory tutorial for new Kaggle users, but this was my first time actually engaging with a competition.

For that reason, even producing and submitting results of my own was difficult.

Since this was a Kernel-only competition, we could only take part by creating a Kaggle notebook that runs on the private data. Because notebooks have limited resources and execution time, we had to upload the prepared models and features as data sets in advance.
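As a rough illustration of that workflow, the sketch below saves the parameters of a stand-in model offline and then rebuilds the model from a read-only input folder, the way a notebook would read an uploaded data set. It is not the code we used in the challenge: the `ThresholdModel` class, the file name and the folder name are all hypothetical, and a real model would usually be stored in its library's own format. On Kaggle, attached data sets are typically mounted under `/kaggle/input/`.

```python
import json
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in for a model trained offline."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        # Label each value 1 if it exceeds the threshold, else 0.
        return [1 if v > self.threshold else 0 for v in values]


def save_model(model, dataset_dir, name="model.json"):
    """Run offline: write the model parameters into a folder to upload."""
    dataset_dir.mkdir(parents=True, exist_ok=True)
    path = dataset_dir / name
    path.write_text(json.dumps({"threshold": model.threshold}))
    return path


def load_model(input_dir, name="model.json"):
    """Run in the notebook: rebuild the model without retraining."""
    params = json.loads((input_dir / name).read_text())
    return ThresholdModel(params["threshold"])


if __name__ == "__main__":
    # A temporary folder stands in for the uploaded data set directory
    # (assumed to live under /kaggle/input/ in a real notebook).
    with tempfile.TemporaryDirectory() as tmp:
        dataset_dir = Path(tmp) / "prepared-models"
        save_model(ThresholdModel(0.5), dataset_dir)
        model = load_model(dataset_dir)
        print(model.predict([0.2, 0.7, 0.9]))
```

The point of the split is that the expensive step (training) happens outside the notebook, so the notebook only spends its limited execution time on loading and predicting.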

As for tips, it appeared that many of the groups that ranked above us were doing things such as pulling image data from similar past competitions and borrowing the pre-processing steps shared by other participants.

Clever reuse is important not just for competing, but also in everyday work.

Understanding Machine Learning Together

As mentioned above, our teams consist of both data scientists and application engineers, so one learning challenge was to get everyone to the same level of understanding.

Prior knowledge of the algorithms used in machine learning was probably our biggest hurdle. My team took the following steps to tackle it:

  1. Look at the data, discuss relevant features and decide on pre-processing
  2. Submit each and every notebook
  3. Hold another discussion on pre-processing and feature extraction methods to further improve accuracy
  4. Submit the final notebook

The other team took the same approach. In my team, we decided to have the data scientists prepare notebooks as a base of understanding for the application engineers, who then proceeded by copying and editing each notebook. The data set was also pre-processed by the data scientists and then uploaded to Kaggle for sharing. At each step, we had a video meeting to discuss how best to proceed and to explain algorithms, feature extraction approaches, etc. In addition, we used Slack to communicate and answer questions, and summarized the answers in Google Docs.

 

          Our internal communication

 

The difficult part of communicating was having to explain several techniques accurately, in a manner that can be easily understood, and in English at that! The upside is that this gave our data scientists a chance to re-examine their techniques and learn from the teaching process.

After the challenge, we had a reflection meeting where we gathered feedback on whether or not our process of learning as a team brought clarity to the application engineers.

In the meeting, we discussed the techniques we chose, why they were good, and why we picked each technique over the alternatives. Statements like “if we use k-means we can perform clustering” or “using xgboost allows us to classify” were among the topics. In this way, we better understood what we had failed to convey and explain during the challenge, and where to improve for the next one.
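For readers without a machine learning background, the k-means statement can be made concrete with a few lines of code. The sketch below is a minimal, standard-library-only version of the algorithm written for illustration; it is not what we ran in the challenge, the function name and sample points are made up, and in practice one would normally reach for a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of (x, y) tuples."""
    rng = random.Random(seed)
    # Start from k distinct points picked at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


if __name__ == "__main__":
    # Two clearly separated groups of made-up points.
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids = kmeans(points, k=2)
    print(labels)
    print(centroids)
```

Note that no labels are given to the algorithm: it groups points purely by distance. That is the difference from a classifier such as xgboost, which learns from examples that already carry the correct answer.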

Our application engineers’ understanding of the techniques used by data scientists has advanced; now it’s a matter of practical application, and of how we go further together with all this new knowledge.

Final Remarks

This was my review of our first Kaggle Challenge at Hacarus! Although this concludes our first Kaggle challenge, we will continue to challenge ourselves through internal competitions on Kaggle.

I am excited for the next one!