Please ignore the above image. Medium told me my article would be more attractive if it had an HD image, so I went to Google and searched for “HD Data Science Images”, and this one looked pretty aesthetic.
The feedback loop is a dangerous thing. Once you’re stimulated by the never-ending improvements, it’s very hard to stop. That’s what happened to me after I completed the Google Summer of Code 2018. I finished writing a Python package from scratch and published it to PyPI, so it could be installed with pip, with the help of my organisation, EvalAI. After GSoC was finished, I felt like the challenge was gone, because throughout GSoC I was pushed so hard that I was on the verge of a breakdown all the time. By then, I had been learning and doing Software Development for 2 years, and the question I had in my mind was “Is that it?”.
During August my interest in Data Science peaked because, unlike building software, working with data has more business implications and a faster feedback loop. Let’s just say it has something to do with a spiritual trip, Stoicism and a guy from Germany. We were having a conversation about the impact of AI and Data Science in a bar at Gokarna, and something about that conversation stuck with me. I reached home and started researching everything about it. Reading about statistical experiments and improving business metrics immediately sparked my curiosity. I was instantly hooked and I knew this was something I had to get into. I had some of the prerequisites for understanding Data Science: I had done the Andrew Ng course a year before, I knew UG-level mathematics and I did Software Engineering fairly decently. From September onwards, I set a goal for myself. That was, the next internship that I was going to do would be
- In Bangalore
- It would be paid
- Would be a product company
- The field would be Data Science
- And the most important, near a pub.
I started doing Udemy courses to get me in sync with the tools of Data Science. Simultaneously, I was working through the infamous book ISLR to understand algorithms and statistics. I took courses in Linear Algebra, Multivariate Calculus and Statistics (the courses I took are listed below).
Mathematics
- Mathematics for Machine Learning Specialization by Imperial College London: Covers Linear Algebra, Multivariate Calculus and PCA.
- Probability and Statistics in Data Science using Python by UC San Diego
Machine Learning
- Python for Data Science and Machine Learning Bootcamp: This is where I actually learned the practical parts of Data Science. It’s very rigorous and makes you familiar with the tools. You can take this course if you have basic mathematics covered. It would make you proficient enough for doing Kaggle.
By the time it was December I was fairly confident about using the tools but I needed a few capstone projects that I could use to show off my new skills with these tools. Simultaneously, I had to start my search for internships as THE. TIME. WAS. RUNNING. OUT.
I did three very cool Data Analytics projects end to end. One was to find the optimum time to Shitpost on Reddit and another was a price comparison of books between Amazon and Flipkart, both with the advice of my Reddit friend Chase. The third was the deployment of a Time Series model to predict the recession.
“The Harder I Work, the Luckier I Get” — Samuel Goldwyn
By the end of February, I had my portfolio looking decent with all the Data Science courses, a couple of projects and a good Software Developer internship. I was feeling confident about landing some gig for the summer. I was fully aware that I knew absolutely nothing in Data Science and had barely even scratched the surface, but I knew the best way to get an exponential learning curve would be to land an internship in the field.
As luck would have it, I didn’t land just one, I landed two Data Science internships. One came through my college, at VisualIQ, for which I am extremely grateful to my friends at the Placement Cell, and the other through cold texting people on LinkedIn. How I found the company is a funny story. I had templates of messages saved to send to people through LinkedIn and I might have messaged well over 100 people LMAO. But you need only one person to notice you and that’s what kept me going. I had an AngelList profile and I found cool start-ups through there, and then I spammed everyone from their company on LinkedIn. And one person did notice me and asked for my portfolio, which I gladly handed over because this was the first time someone ASKED for it. Things moved quickly from there. I had a couple of rounds of phone interviews and, although the pay at this start-up was about 5k less than at the other one, everyone told me to go for the start-up.
Choosing between these two was fairly hard. On one hand, the VisualIQ internship paid more and was based on Big Data, which I was very interested in. The other one was at an early-stage start-up called [Trell](https://trell.co/). I chose Trell mostly because I didn’t wanna cheat my check-list for the internship. And boy, did it make all the difference in the world.
Experience at Trell
So there I am, at the Bangalore Central Railway station. I knew zero people in the city. I decided, “well, fuck the semester exam” and got on a train to Bangalore two weeks before the semester exam. Looking back, that was a reckless decision that could’ve backfired, but thankfully I ended up passing all papers, albeit with shitty grades.
The people at Trell were kind enough to give me access to the Database a month in advance so I could get familiar with the schema. And boy was it overwhelming. I knew nothing about how to work with a huge Database and tens of tables. They had millions of users. Being the naive “Data Science Intern”, I downloaded the CSV of each table and merged them using Pandas ’cause I was scared of using SQL joins. And I did some stupid experiments with that database which turned out to be huge failures but an amazing learning opportunity, so not a failure in the end, eh?
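For the curious, here is a minimal sketch of the kind of join I was avoiding, run in the database instead of on exported CSVs. It uses Python's built-in `sqlite3` with an in-memory database; the tables, columns and rows (`users`, `posts`, `user_id` and so on) and the helper `posts_per_user` are all made up for illustration and are not Trell's real schema or data.

```python
import sqlite3

# Hypothetical two-table schema and toy rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'asha'), (2, 'ravi');
    INSERT INTO posts VALUES
        (10, 1, 'Gokarna trip'),
        (11, 1, 'Bangalore pubs'),
        (12, 2, 'Hampi');
""")


def posts_per_user(conn):
    """Count posts per user with a JOIN, letting the database do the merge."""
    query = """
        SELECT u.name, COUNT(p.post_id) AS n_posts
        FROM users AS u
        LEFT JOIN posts AS p ON p.user_id = u.user_id
        GROUP BY u.user_id, u.name
        ORDER BY u.name
    """
    return conn.execute(query).fetchall()


rows = posts_per_user(conn)
print(rows)
```

The point of doing it this way is that only the small aggregated result leaves the database, instead of every row of every table travelling through a CSV file first.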
Okay, now back to the Bangalore Railway Station. I knew zero people in the city, all my friends were back home studying for semesters and I had a backpack with clothes and a laptop, an address in my hand of someplace at Koramangala that I was going to crash at, and nothing else. I somehow reached the address and boy, those people were the most hospitable people I have ever stayed with. I was jumping from house to house the whole time and I made some amazing friends along the way. I didn’t even have to go searching for a PG. If it was not for the amazing and supportive people that I met on the way, the internship wouldn’t have been this amazing.
Work
Okay, so that was an emotional rollercoaster but we’re here for doing some Data Science. FOCUS! The number of things you learn inside a company subconsciously cannot be explained in this Medium article, or even in a whole 500-page book. It’s something you have to experience. You develop a kind of respect and attitude for the tools and architecture in front of you. You never piss it off. You maintain it and stay patient with it. That was my experience with doing Data Science there. It started with the Data Analyst teaching me to connect my Jupyter Notebook to SQL and another colleague helping me overcome my fear of SQL by pushing me to write a query with like 5 joins. BOOM. The first day just shattered my very shitty confirmation bias.
From that day on, my mind was racing from idea to idea, from doing statistical experiments on the population to test the recommender engine to building a model to predict churn, although I couldn’t complete all of them. And I still think the greatest thing I did at Trell was giving a presentation on Applications of Data Science for the tech team. I still did manage to build a few cool things.
I built a content rating model, which would rate the “virality” factor of a post based on the user’s history, time, etc. Then I built a recommender model. Put it in production. Normalized the tables, all by myself. Which was pretty cool. I might’ve drunk like four beers in celebration of that. In the end, I made a content tagger based on what was in the posts using image recognition. If I ever get the chance, I would put all of this on GitHub (with the consent of the company of course).
I got the chance to experiment with Airflow, Spark, Hadoop, Turi Create, TensorFlow, lots and lots of SQLLLLLLL, Docker, Flask, Production Systems, Designing Tables, Splitting Jupyter workflows, Statistical Experiments, T-testing, Clustering, PCA, AWS, GCP, Cloud Vision, Modelling and Validation, and that’s just off the top of my head. I’m just noting it down here so that I can update my LinkedIn later with this.
I’m extremely grateful to all the people that helped me along this journey and I’m still sure I know nothing in this field or the field of Software Engineering. But I’m pretty sure I know more than what I knew yesterday and that’s all that matters at the end of the day for me. And this journey is nowhere near the end, it has barely started hahaha… and my thirst for knowing and learning more is greater than ever since I now know there is a shit-tonne of stuff that I do not know.
In my opinion, if I don’t attempt to understand at least a small scratch of this vast human wisdom which is in my hands right now, I’ve failed as a human being. I am attempting and I don’t wanna let my ancestors down tbh.