Article originally posted on Data Science Central.
This post grew out of the inspiration I got from reading Rafael Knuth’s Learning Sabbatical. I read part 1 of his sabbatical too, and I felt compelled to write down my own experience and future plan.
Background: After graduating with a Bachelor of Science (general) from Panjab University, I wanted to study further but did not have enough money, and in India it is very difficult to get a job with only a general bachelor's degree in hand (77% or more of the Indian population is unemployed). I started doing odd jobs: first as a steward in a restaurant, then as a loans salesman, and then I got a license to become an insurance advisor. I had computer applications as a vocational subject during my degree, but they only taught us DOS and I hated it with a passion. The computers we used ran only Windows, which was extremely slow and useless for any work except using the calculator; they crashed frequently and almost no tools were available to do anything useful. Since that was my first stint with computers, I thought computers were stupid and good for nothing except creating websites and using a digital calculator. Later I worked as a call center executive, held a lot of other jobs I have forgotten, and lastly was a door-to-door salesman for water purifiers. This went on for several years.
Then one day I saw a movie titled Hackers on HBO and I was instantly hooked on the computer hardware and software shown in the movie. Things like RISC architecture, the Dragon Book, the PCI bus, the Pink Shirt Book and UNIX fascinated me. I had an old 600 MHz Celeron computer with a tiny fan, a 20 GB HDD and 256 MB of RAM. I could not get UNIX, but I got free CDs of Red Hat Linux (now known as Fedora Linux) with a book. I installed it and started learning and reading as much as I could. Linux was not that great at the time: I could not play any mp3 songs or videos, and if I wanted to play a VCD I had to learn to write shell commands using vcdimager, which was much harder for a person who had used only Windows, where one could simply copy and paste the VCD contents.
I had to live without listening to my favorite songs and watching videos, but 2 factors kept me hooked on Linux: the wealth of information on the internet and an extremely helpful community with a large user base. The internet and Linux were more or less bound together. The internet was extremely expensive in those days (back in 2005 in India), so I would go to a cyber cafe once a week, do my research, download articles and bring them home to read. Later I got a dial-up connection at home, and I would search for answers to my questions on Yahoo and Google (yes, Google was not that big a thing in India at the time) or post them on Linux Questions. One day, while writing code in Emacs, I pressed C-h g (Ctrl-h and then g) and came across The GNU Project. It hit me hard: I realized that Linux was just one part of a bigger Free and Libre software community that included Linux, GNU, USENET, the BSDs and more. I started posting questions on comp.os.linux.misc and comp.lang.c like a ritual, and later the GNU and BSD mailing lists, comp.lang.lisp and c++ became part of my everyday routine. I installed all flavors of Linux, used FreeBSD, OpenBSD and NetBSD as my personal OS for several weeks each, and even tried my hand at Linux From Scratch and built DragonFly BSD from source. I started blogging, and 3 years later one of my friends sent some of my blog post links to a person he knew at a startup. They called me for an interview, and that is how I started my professional software development career as a Trainee Systems Software Engineer in Hyderabad (India).
Software Development: Fast forward four and a half years and I had been promoted to Senior Software Engineer at another company in another city. It was a contract job with their client, MasterCard. At the startup I learned most of my C and my software development skills; at MasterCard I learned how to really work in a team, how to handle tough situations and how to work with people who have very different mindsets. The amazingly diverse culture there made it a great place for personal development too. I was just a guy from a village who, with some luck, got a software job but remained a village guy (my village did not have a municipal water supply until 2012) and could not understand city life and its complications. I did not know that in the IT industry I had to keep upgrading my skills. Experience is not enough. Learning hard C is not enough. Mastering Linux to solve problems is not enough. Solving problems with tools that make the solutions look like magic is not enough. Being able to learn anything is not enough. Learning English and having good communication skills is not enough. In the Indian IT industry, what was once the future becomes the distant past in the blink of an eye. After my contract was over, I had to go back to my village to handle some personal and family problems, and later I tried my hand at different businesses, all of which failed. While going through all these problems and failures I could not find anything as addictive as technology and the positive way it affects our lives; I was even having nightmares about coding and software. I felt it was my calling and that I should never leave this field. I decided to reboot my professional life.
Data Science: At the beginning of 2018 I observed that software development was not much in demand. Yes, you could still get a job, but it felt as if the field had reached its saturation point. Besides that, most software jobs in India are service based, unlike my first job, which was at a product-based startup. I started searching for facts again and noticed the 2 distinctions that redefined my career. First, the world has moved from creating software to using software. Second, software has become much more useful than it was when I started, and this shift from creating to using is adding much more value to the world. I was a technology guy from the beginning and was always interested in adding value to the world using software and technology, and this shift put my interests on steroids. I had the choice to learn C++ and master Object-Oriented Analysis and Design, OOP and design patterns. But learning all that would still not put any industrial projects on my resume, because I would need a job for that, so why not explore more options? There has to be a balance between interests and a good career. I explored further and came across several emerging fields: Blockchain development, Full-Stack development, AI and Data Science. I was most intrigued by Data Science and AI. By May this year, I had decided to pursue Data Science full time. But there was a huge gap when it came to skills.
Learning Sabbatical: How do you fill the skills gap? Well, the internet is booming in India and it is 100 times cheaper than it was when I started studying 13 years ago (thanks to Jio for bringing India into the internet age). I felt like I was starting again, just as I did back in 2005, but with one difference: I was much younger back then. Now I am in a midlife crisis. This world has a way of breaking you down, if you allow it. You can either get scared or forge ahead confidently with a positive mindset and learn everything you need to. With 4 years of experience writing C code and writing toy programs in dozens of other languages, Python seems as easy as making an omelette. To find out which skills I was lacking, there was a wealth of information out there. KDNuggets and Data Science Central are 2 honorable mentions that helped a lot, and following certain influential data scientists on Twitter is another major way of finding out what to learn and how. I created new profiles on LinkedIn and Twitter and started this new blog. My Learning Sabbatical was supposed to last 6 months, divided into 2 parts. In the first 3 months I would learn basic Math (my Math was pretty weak) and the next 3 I would spend on Data Analysis:
- Learned Algebra 1 and 2 from Khan Academy
- College Level Algebra from Arizona State University through edX.
- MIT Big Picture of Calculus from YouTube.
- Calculus Made Easy by Silvanus P. Thompson. Available for free from Project Gutenberg
- Calculus 1A: Differentiation from MIT at edX.
- Limits and Integral Calculus from Calculus 1 at Khan Academy.
This took 4 months full-time and I still have no grip on Statistics, Probability, Linear Algebra and Multi-Variable Calculus. Between May and now, both the industry and Data Science itself have been continuously changing when it comes to requirements and skills. Back in January there was no strict definition of a Data Scientist; now I see the industry coming close to defining one. Data Analysis is cleaning, wrangling and visualizing data and using it for some purpose (predictive analytics, for example). Data Science is a bigger thing which includes Data Analysis along with Machine Learning, Neural Networks, Deep Learning, Deep Neural Networks and so on (the last 3 skills define Machine Learning Engineering/Research). As a junior Data Scientist, 80% of your work will be data analysis. Non-AI startups and small and medium-sized companies will put you mostly on Data Analysis work, while large data-driven companies (Google, Amazon, Facebook) will want you to work like a full-fledged data scientist. So to impress them Data Analysis will not be enough, I guess; they will want you to understand and master both Data Analysis and Machine Learning Engineering before you appear in an interview with them. This would require another 9 months, for which I don’t have enough money. Rafael Knuth raised this point very well in his article and I advise you to take note of it. So I had to create a Plan B (which, again, was pointed out by Rafael): instead of going after my heart (Data Science) I decided to be practical and go after Data Analysis. I have already learned enough Math not to get scared by the integral symbols appearing in Statistics and Probability books. Now when I see one, I say to myself “I know it” 🙂 . I already know a bit of Python, and programming has become a part of my DNA. So this is how Plan B sounds:
- Descriptive Statistics at Udacity
- Inferential Statistics at Udacity
- Python for Data Analysis – Wes McKinney
- Do a Project
- Do more projects
- Apply for jobs as Data Analyst
Point #3 will require a lot of reading and coding through the official manuals and reference documentation of NumPy, Pandas and Sci-Kit. One great skill I learned while programming is the ability to read and understand official reference manuals. You just go to the Pandas reference documentation, for example, and learn directly from something like this, and you don’t need to spend time reading entry-level books and then proceeding to advanced books to understand Pandas. You read the reference and you see how the creator of the Pandas library writes code (terse, efficient and not readable to beginners). For mentoring, there is the large community: you can post on Stack Exchange, Stack Overflow or the project mailing lists. You should post when you get stuck, and you should still post even if you have a good solution, to improve it further. You have no idea how much you can learn and what kind of person you will become by doing that. The Open-Source community is an amazing place to hang out.
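To make the "cleaning, wrangling and summarizing" part of Data Analysis a little more concrete, here is a minimal sketch of what a first exercise might look like. It uses only the Python standard library so it runs anywhere; with Pandas the same steps would likely be shorter. The water-usage readings, the city labels and the `clean` and `summarize` helper names are all made up for illustration and are not real data or anyone's official method.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up daily water usage readings (litres); not real data.
RAW = """city,litres
Chandigarh,135
Chandigarh,
Hyderabad,150
Hyderabad,170
Chandigarh,145
Hyderabad,n/a
"""


def clean(rows):
    """Cleaning step: keep only rows whose 'litres' field parses as a number."""
    for row in rows:
        try:
            yield row["city"], float(row["litres"])
        except (TypeError, ValueError):
            continue  # drop missing or malformed readings


def summarize(pairs):
    """Wrangling step: group readings by city, then compute descriptive statistics."""
    groups = defaultdict(list)
    for city, litres in pairs:
        groups[city].append(litres)
    return {
        city: {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
        for city, values in groups.items()
    }


summary = summarize(clean(csv.DictReader(io.StringIO(RAW))))
for city, stats in summary.items():
    print(city, stats)
```

The two bad rows (the empty reading and the `n/a` one) are silently dropped here; in a real project it would probably be better to count and report them, since how much data was thrown away is part of the analysis.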
Regarding choosing a project, all of the articles I have read (maybe around 1000 by now) and the advice of some top, experienced data scientists say the same thing: I must choose something which interests me, and better still something close to my heart. I like doing something to take care of Mother Nature, to build green cities, to make efficient use of electricity and water inside a city and/or to add more value to healthcare. Maybe I can start with some open datasets on Earth, weather and healthcare. This way I can keep up with my deadline. It may get extended by 1 month, but somehow I have to manage.
I have one last thing to say: Data Science Central is a place geared towards a somewhat mature audience. Yes, we do get many articles for beginners, but it is much more oriented towards solid practical solutions, towards solving industrial, real-life problems. As a beginner I never got interested in Data Science Central. I was actually scared by the highly technical articles of Vincent Granville, and I have to admit his articles used to hurt my head. Now, after a little bit of Data Science experience, I think I must have done something good in life to come across Data Science Central.
Regarding how much Mathematics and Statistics you need for Data Science, you should read these 3 articles: Data Science is different from Statistics, Rebel Statistics, Fake Data Science
Originally appeared on my blog.