Data Science Hackathon Guide For Beginners

Keegan Fernandes
4 min read · Jan 18, 2023

I took part in my first-ever data science hackathon, organized by Christ University Lavasa. We didn't win, but what I learned may help you when you participate in your first hackathon. Our presentation and code can be found on GitHub.

The hackathon had three rounds, and we were supposed to work in teams of five. Our college's Data Science Association (DSA) organized it, with various professors and students helping out. Before the rounds began, an alumna gave us a rundown of what to expect from a data science project. She explained the project lifecycle and gave us valuable advice on the steps we should take with the project.

The First Round

The first round was an online quiz, and we were allowed to use the internet. The goal of the round was to test our ability to gather information. Team members were separated, and each had to take a different quiz. Only 15 teams went through to the next round. Our team passed the round, although we came last.

The Second Round

The second round was a coding round. There were three questions, all written by the organizers, so there was no use in searching for the answers online. Each team could make multiple submissions for a problem, but every additional submission lowered the score awarded for that problem. Our team had a strong coding background and came second in the round.

24-Hour Hackathon

The final round was a 24-hour hackathon in which we had to devise a solution for a problem statement. We were given a life expectancy CSV dataset and asked to build a model that predicts a person's life expectancy, along with a usable UI for it.

We began with exploratory data analysis (EDA) and looked for existing work on the dataset. We found various Kaggle notebooks on the topic, whose authors had noticed many discrepancies in the data, such as populations recorded as zero. We fixed these issues and used the combined insights to improve our data quality.

Once the EDA and cleaning were done, we moved on to model building. We listed all the algorithms we thought would suit the data and built a baseline with a linear regression model, which reached an accuracy of 88%. Alongside modelling, we engineered features to get the best result possible. Our best accuracy was 95%, using an XGB regression model, after which we built a Streamlit UI.
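To make the cleaning and baseline steps concrete, here is a minimal sketch rather than our actual hackathon code. It assumes pandas and scikit-learn are installed, uses synthetic data in place of the real CSV, and every column name in it is hypothetical. It also scores the model with R² on held-out data, which is only one way an "accuracy" figure for a regression model could be measured.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the life expectancy CSV; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "population": rng.integers(100_000, 50_000_000, size=n).astype(float),
    "schooling_years": rng.uniform(2, 16, size=n),
    "adult_mortality": rng.uniform(50, 400, size=n),
})
df["life_expectancy"] = (
    55 + 1.5 * df["schooling_years"] - 0.03 * df["adult_mortality"]
    + rng.normal(0, 1.5, size=n)
)
# Inject the kind of discrepancy described above: populations recorded as zero.
df.loc[df.sample(20, random_state=0).index, "population"] = 0.0


def clean_population(frame: pd.DataFrame) -> pd.DataFrame:
    """Treat zero populations as missing and fill them with the column median."""
    out = frame.copy()
    out["population"] = out["population"].replace(0.0, np.nan)
    out["population"] = out["population"].fillna(out["population"].median())
    return out


clean = clean_population(df)

# Hold out 20% of the rows so the baseline is scored on data it has not seen.
X = clean.drop(columns="life_expectancy")
y = clean["life_expectancy"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

baseline = LinearRegression().fit(X_train, y_train)
baseline_r2 = r2_score(y_test, baseline.predict(X_test))
print(f"Baseline R^2 on held-out data: {baseline_r2:.3f}")
```

Filling with the median is just one reasonable choice here; dropping the affected rows or looking up the real values would work too, depending on how many rows are affected.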

Results

Our team was completely exhausted after 24 hours without sleep, and we still had to present our model to two sets of judges. Though we gave it our best, we ended up losing the final round. The winning team achieved almost the same model accuracy as us; however, they also focused heavily on presentation and built a great UI that was user-friendly and appealing. They relied on libraries that let you test models quickly and efficiently, which frees up time for feature engineering and data cleaning, two essential tasks for getting good results. This shows the importance of presentation and of building your pipeline quickly.
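I don't know exactly which libraries the winning team used, so the following is only a sketch of the general idea, assuming scikit-learn: keep the candidate models in a dictionary and let cross-validation score them all in one loop. The data is synthetic, and scikit-learn's GradientBoostingRegressor is used here as a stand-in for a boosted-tree model, not as the XGB model from our project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for a cleaned dataset.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Candidate models to try; adding one more is a single extra line.
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

# Print the candidates from best to worst.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>20}: R^2 = {score:.3f}")
```

With a loop like this in place early, the remaining hours can go into cleaning, feature engineering, and the presentation instead of model boilerplate.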

Do's:

  • Use the internet. Google and Stack Overflow are your allies.
  • Start with an EDA of the data and begin cleaning any discrepancies within the data.
  • Learn to use a UI or dashboard framework to present your models, and think about the practical uses of the dashboards and models you make.
  • Document your code, the steps you took, and your reasoning behind each decision.
  • Improve your model using feature engineering. Even if it raises your score by only a few points, that may be the difference between first and last place.
  • Make sure your team has different areas of expertise. It's good if your team has someone good at data handling, statistical analysis, machine learning, presentation and deployment.
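As an illustration of the feature engineering point above, here is a small sketch assuming pandas and NumPy. The column names and values are made up for the example and are not taken from the hackathon dataset.

```python
import numpy as np
import pandas as pd

# Toy rows; the column names and values are hypothetical.
df = pd.DataFrame({
    "gdp_per_capita": [500.0, 4_000.0, 45_000.0],
    "health_spending": [25.0, 320.0, 4_950.0],
    "population": [2_000_000.0, 30_000_000.0, 8_000_000.0],
})


def add_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the frame with a few derived columns added."""
    out = frame.copy()
    # Log transforms compress heavily skewed, strictly positive columns.
    out["log_gdp_per_capita"] = np.log(out["gdp_per_capita"])
    out["log_population"] = np.log(out["population"])
    # A ratio can carry more signal than either raw column on its own.
    out["health_share_of_gdp"] = out["health_spending"] / out["gdp_per_capita"]
    return out


features = add_features(df)
print(features.round(3))
```

It is worth re-scoring the model after each new feature and keeping only the ones that actually improve the validation score.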

Don'ts:

  • Start your project from scratch. Use notebooks made by other users on similar data.
  • Waste hours writing model-building code by hand, especially for classical machine learning models. Use machine learning libraries for that instead.
  • Neglect the presentation. Make sure you put enough effort into your presentation as well.

Conclusion

This article is not meant to be a complete guide; instead, it is intended as a short note of advice on the problems you might face in your first hackathon.

My teammates:

Anujit Ghosh, Reena, Karan Punjabi

Keegan Fernandes

First-year MSc Data Science student. Writes data science and machine learning tutorials and about the impact these fields have on the world.