Venturing into Coding and Data Science for Normal People


Assume you have already learned basic coding skills, e.g., by taking a course on Codecademy (I recommend this one: https://www.codecademy.com/learn/learn-python), you have a basic understanding of how to analyze data, e.g., from being an Excel geek or from introductory stats courses in college, and you have landed a project or internship with a focus on coding and data science. If you have not, don’t worry: the steps to get there are very well explained by CS Dojo (https://www.youtube.com/watch?v=mElVGah7Epg), and this blog is relevant for you anyway. So keep on reading!

 

The myth and the problem with the self-made software genius

The problem with many of the videos and stories you find online about how to get into coding is that they are usually written by or about highly gifted software geniuses who started coding early in life and, through playful exploration of code, became masters of their craft. And then the jobs with Facebook, Google, Microsoft and so on simply happened. Well, you might (a) not be a genius and (b) have already started a career (or life) and still want to get into coding and data science.

That’s pretty much the situation I found myself in. M.A. in Psychology, 3.5 years of change management consulting (people-driven, not numbers-driven), pretty much all of my life spent in the West, and now I find myself in a data scientist position for six months at a fintech and social enterprise in India. Wow. Four weeks into the job, I find imitating software geniuses somewhat frustrating and illusory, to say the least. However, I have found four different learning paths that are extremely effective when combined.

Learning path 1: Logical understanding

The last time I had to appreciate this perspective was when I had added a very simple line of code to categorize user data, along with a test checking that code. Everything looked clear-cut, simple and obvious. But the tests failed over and over again. So down the rabbit hole I went. I read through every function that was called in the testing file, and every sub-function, and looked up all the stuff I did not know on Stack Overflow. Guess what. I fixed the bug and actually found a mistake in the code that was unknowingly affecting other tests as well. My first mini-contribution to the company. Check.

So remember, code is nothing but logic. Every single operation and command can be understood. This means that seeking this understanding is sometimes simply necessary and that, more generally, building your knowledge of your programming language and IT infrastructure is useful. Try the following:

  • Make an effort to understand the logic in front of you - I know, Stack Overflow is tempting as procrastination, but sometimes only your grey cells will do
  • Write a learning journal where you keep your most important commands and insights
  • Ask for a top-down overview of the IT infrastructure from a colleague to better navigate it
  • Go as far as teaching yourself the most important commands with vocabulary cards (don’t go overboard with it) - for example while stuck in your daily commute

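To make the anecdote above concrete, here is a minimal sketch of what a one-line categorization and its test might look like. The function name `categorize_user`, the income field and the threshold are all made up for illustration; this is not the actual code from the project.

```python
def categorize_user(monthly_income, threshold=20000):
    """Label a user by income band (hypothetical rule and threshold)."""
    return "high" if monthly_income >= threshold else "low"


def test_categorize_user():
    # Check both sides of the boundary: mistakes right at a threshold
    # are a common reason for "obvious" code to fail its tests.
    assert categorize_user(19999) == "low"
    assert categorize_user(20000) == "high"
    assert categorize_user(0) == "low"


test_categorize_user()
print("all tests passed")
```

If a test like this fails even though the one-liner looks right, that is the signal to read through everything the test calls, one function at a time.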
Learning path 2: Trial-and-error

Typically I have to remind myself of this when starting a new data analysis. Oh, pristine interface. So clean. Not a single error message has defiled you yet. I will look up everything, write perfect code on the first attempt and never feel the pain of an error message again. Yeah, right. It’s like writing an essay by doing research, or thinking about it, or [fill in the blank]. It doesn’t move you forward.

You learn coding and data science by doing. Just like you learn writing or any other language. Just about every programmer will tell you this. So make sure you keep at it. Especially when management tasks, data analysis presentations, or statistical considerations keep you away from coding. Specifically, try the following:

  • Code at least 30 minutes every day. Set a time for it. And if you do not have something at hand to code in your current project (think twice, really), get a problem from https://projecteuler.net/ or kaggle.com
  • If a given piece of code does not work and you could not find the logic flaw, erase all the code and start from scratch
  • Make sure you have an environment set up where you can fail by trial-and-error without causing too much trouble, e.g., what happens if I run out of memory, and do I know who to talk to in order to reset things?
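A 30-minute practice session can be very small. As a sketch, here is one way to attempt the first Project Euler problem (the sum of all multiples of 3 or 5 below a limit) by trial and error: write the naive version, check it against the small example given in the problem statement, and only then run the real input. The function name is my own choice, not part of the problem.

```python
def sum_of_multiples(limit):
    """Sum all natural numbers below `limit` that are divisible by 3 or 5."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)


# Sanity check from the problem statement:
# below 10 the multiples are 3, 5, 6 and 9, which add up to 23.
assert sum_of_multiples(10) == 23

# Only once the small case passes, run the real input.
print(sum_of_multiples(1000))
```

If the assertion fails, you get your error message early and cheaply, which is the whole point of this path.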

Learning path 3: Focus on less, but focus

My most blissful moments are those when I know what I want, have found where to make the change, made the change, run the test, and am done. That usually means one of two things happened: either I just copy-pasted a line of code with minimal modifications, or I had the perseverance and willpower to do only what was necessary and not one keystroke more. Or both. :)

The point is that you can never understand the whole code of your server/infrastructure/project. And you don’t need to. The principles behind the Agile Manifesto call simplicity “the art of maximizing the amount of work not done” and deem it essential. Yet that requires that you know exactly what you want, that you shamelessly pursue it, and that you do not get distracted. Try the following:

  • Before you start, clarify your goal. It is easy to get lost “exploring the data.” Instead, go to your internal clients and draw up together the empty tables and graphs, including variable labels, that you will produce for them to make a decision. It’s amazing how much that forces you to think. And then you will know your goal.
  • Write down your steps in very simple terms, e.g., “Ask Tom which data set to access”, “Merge data sets”, “Copy code from the last analysis”, “Run analysis and check for errors”, “Fill in PowerPoint slide for Joe”. This keeps you on track like a railway.
  • Finally, make as few changes as possible and copy with pride.
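The first tip above, drawing up the empty table before touching the data, can even be done in code. The sketch below uses only the standard library; the row labels, the column names and the helpers `empty_table` and `missing_cells` are hypothetical placeholders for whatever you agree on with your internal client.

```python
def empty_table(row_labels, columns):
    """Build a table shell: every cell exists but holds no value yet."""
    return {row: {col: None for col in columns} for row in row_labels}


def missing_cells(table):
    """List the (row, column) pairs that still need to be filled in."""
    return [
        (row, col)
        for row, cells in table.items()
        for col, value in cells.items()
        if value is None
    ]


# Hypothetical shell, agreed on with the client before any analysis.
shell = empty_table(
    row_labels=["new users", "returning users"],
    columns=["count", "average loan size"],
)

shell["new users"]["count"] = 120  # made-up number, for illustration only
print(missing_cells(shell))  # the cells that are still open
```

Once `missing_cells` returns an empty list, you are done, and anything beyond that is work you can leave undone.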

Learning path 4: Bang your head against the wall, umm, screen - but not for too long

Yeah, and then there are those hours where you go into the logic and can’t get it, where rewriting the whole code in a different way still throws the same error, and where the goal is clear but you just won’t get there. The last time this happened to me was when I tried to set up PyCharm on Windows so I could read our server code. First week at the new job. Great feeling. (The non-digital analogy is coming into the office and not being able to open your own backpack. Literally.) Felt pretty shitty.

It’s okay. And it’s okay that it sucks. No one can avoid it. Just don’t stay with it for too long. If after thirty minutes you have not made any progress with any of the three perspectives above, try the following:

  • Do something completely different and return later
  • Ask somebody and don’t feel bad about it
  • Take a break or, even better, meditate for 5 minutes ;)

Conclusion

Learning by doing, i.e., trial-and-error, is only one way of learning and not always the most effective. Appreciate that sometimes it makes sense to go through all the logic of something, and that sometimes doing the least possible to get the job done, without worrying too much about how and why it works, is an equally valid way of solving a problem. Moreover, combining those three paths will give you more learning and allow you to contribute in your new position quickly. This matters particularly if you are not a genius yet.

Acknowledgements

I am extremely grateful to Igor, Noam, Jan-Matthis, all of Shubh Loans and the IDEX Team for their encouragement and support allowing me to venture into data science. It’s been a great ride so far!