Data science: From start to scale
The need to focus on data is real today: the amount of data that startups collect has grown exponentially over the past few years. However, one big question plagues many founders: at what stage of the journey should they start investing in a data team, especially since it is a costly affair?
Stellaris Venture Partners recently hosted a discussion led by experts who have handled data very effectively in their various roles and who shared insights from their experience.
The panel discussion was led by:
Gaurav Aggarwal – Ex-Head of data science, OLA
Akash Saxena – CTO, Hotstar
Jayesh Sidhwani – Director of Engineering, Hotstar
Deepak Singh – Growth, Flipkart; ex-Director of Growth, MX Player
We invited a curated set of 30 people (Founders, CTOs, Engineering heads, Data scientists) who had shown interest in the event, and who had sent us questions related to data science in advance. Deepak moderated the panel discussion, addressing the questions asked in advance by the audience, as well as adding more questions based on his experience.
Here’s an overview of the event.
Deepak: Why do you think startups need data science or a data science team?
Panel: It is important to set the foundation right. If you expect to have a product or a considerable amount of data to deal with in the future, you need to think about data science from day zero. If you don’t focus on getting the data quality right early on, you’ll end up paying a significant price later in data engineering work to fix it.
Deepak: How can data teams start small in the early days?
Panel: In the beginning you don’t necessarily need a data scientist. You can start with the basics, like building the processes for collecting data and making sure that your data is of good quality. Starting with Excel sheets is also just fine initially. Invest in tools and complex systems only when absolutely necessary.
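As a rough illustration of what "making sure that your data is of good quality" can look like at the spreadsheet stage, here is a minimal sketch in Python. It is not something the panel presented. The column names (order_id, customer_email, amount), the sample rows, and the quality_report helper are all hypothetical, and the sketch assumes the sheet has been exported to CSV.

```python
import csv
import io
from collections import Counter


def quality_report(rows, required, key):
    """Count empty required fields and find duplicated key values.

    `rows` is a list of dicts (one per spreadsheet row). This helper is
    hypothetical and only illustrates two basic hygiene checks.
    """
    missing = Counter()
    seen = Counter()
    for row in rows:
        for field in required:
            # Treat absent, None, and whitespace-only values as missing.
            if not (row.get(field) or "").strip():
                missing[field] += 1
        seen[row.get(key)] += 1
    duplicates = sorted(k for k, n in seen.items() if n > 1)
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicate_keys": duplicates,
    }


# Made-up sample standing in for a CSV export of an early-stage spreadsheet.
SAMPLE = """order_id,customer_email,amount
1001,a@example.com,250
1002,,120
1002,b@example.com,120
1003,c@example.com,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = quality_report(rows, required=["customer_email", "amount"], key="order_id")
print(report)
```

Running a check like this on every export is one inexpensive way to catch missing values and duplicate records before they pile up. Which fields count as required, and which column acts as the key, depends entirely on your own data.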
Deepak: What should be the initial team composition?
Panel: Before hiring any researchers or ML people, make sure that your data is clean and can be queried with minimal latency. For that, you first need to build a data platform team of data engineers. Next come ML engineers, whom you can initially engage on a consulting basis and bring on full time when the need arises. Finally, you’ll need to hire data product managers to productize the data.
Deepak: What should you look for in the first hire of your data science team?
Panel: Your first data hire should have a strong grasp of first principles. If the founders are not data / ML people, it is absolutely important to get someone who has done this before. A lot of data science comes down to basic hygiene, and people new to the field may not realize their mistakes until it is too late. Hence, someone with a data background and prior experience should be preferred.
Deepak: Are there any trade-offs between starting with a SaaS tool that offers AutoML and data warehousing solutions vs hiring a good person and doing it on your own?
Panel: If the business mainly needs to analyze numbers and you don’t have to run heavy algorithms or optimization, then using a SaaS tool can work. However, it would be difficult if you have optimization and modeling requirements. That is because existing tools are not very mature when it comes to more complex requirements – while Google and Amazon are trying to sell ML on their clouds, not many people can use these offerings well. Eventually these tools will mature to the required standard, but as of now there is probably a gap between what is required and what is on offer. Net net, you’ll still need a person, and we would highly recommend hiring one.
Deepak: How do you evaluate the ROI of a data science project?
Panel: It is often difficult to estimate direct ROI. What is important is to have a data-driven culture of problem solving in your organization. Often we have set out to solve a particular problem and found a bigger, more interesting problem along the way, as an offshoot of the original solution. You cannot always plan for it. If you are observant about the problems around you and your team has a data mindset, more often than not you’ll be able to find critical problems that can impact the business in a meaningful way. If data is important for your business and your engineering team has to convince the founder of the need to build a data science team, then you are already in bad shape. This data-driven orientation should ideally come from the top.
Deepak: How does one build a data-driven culture within an organization?
Panel: One thing you should definitely do is empower your team members to ask questions. Don’t just do things because your CEO said so. Push people to look for data and come back with data points to justify their point of view. Encourage them to build a data-driven feedback culture. Your organization should be able to think in terms of data at every level when making decisions. This has to be driven from the top.
The audience participated actively and asked multiple questions to the panel. Below are a few questions which were relevant for a wider group.
Audience: How does the org structure of Data Science as a function look, and who does it report to?
Panel: It has to be central and should report directly to either the CEO or CTO. This helps management have a bird’s eye view of all the problems. Even though people within the data science team work on different projects, if they sit together and interact regularly they end up helping each other a lot. The underlying business and data infrastructure are the same, and a lot of the work they do – such as cleaning data and writing scripts – is common across functions and reusable. While modeling is different, it is actually a very small part of the work. The problem with having one data person in each team is that these people lack a clear career path, and you’ll see a lot of churn because they don’t know who to look up to.
Audience: We realized that both mathematical skills and coding skills are important for a data scientist. Should we hire for each skill separately or look for all-rounders who are good at both?
Panel: Don’t hire different people for mathematics and programming. Hiring will be slower, but it is better to find someone who has both skills. One needs to be a good programmer, not a reluctant one. Not knowing a particular language or every ML function is fine. But both skills are important: you should enjoy programming and you need to know math well.
Audience: How do you think about performance management of the data science team since the job is inherently probabilistic?
Panel: Great question. It is really difficult. In something like data science, the business impact should matter but that can’t be the only thing. Sometimes the problem can be very hard and at other times you can be plain lucky. One framework is a set of 7 ‘I’s to evaluate your team:
Initiative – Give talks, take up tough projects, help others
Implementation – How well do you execute?
Independence – Not bothering other people in the team too much
Innovation – Doing something out of the box
Integrity – Don’t try to fool the manager. Be honest about work
Impact – How much value are you giving back to the business?
Inspiration – Do your colleagues follow your lead because you are so often right?
P.S. – We will be posting a bite-size video series of the event on our Twitter handle. Follow us at @stellarisvp for more details.