Read as John Hooks talks about his career as a Data Scientist. Find him at www.thequintessentialdatascientist.blogspot.ca and on his Twitter feed in the sidebar of this interview.
What do you do for a living?
I am a Data Scientist. I am also a Council Member in the Technology and Media practice of a well-known Group of Researchers. My area of expertise as a consultant within this group and other engagements is Big Data Predictive Analytics. I am also an entrepreneur with a consistent track record of establishing ventures in high technology and information science.
How would you describe what you do?
The three major elements of my role as Data Scientist are database architecture design, business intelligence reporting, and data mining. I have worked professionally in all three of these areas, and all three are necessary elements of building the “business organism” in a company. Many companies operate in a “brainless” manner, with no integrated circulatory system providing the business information “nutrients” to support the business body, and no “digital nervous system” (à la Bill Gates) through which to deliver the products of the data mining done in the business brain. All three of these elements are necessary to provide the communication and support pathways that turn decisions based on data mining analyses into action and create profitability in the company.
What does your work entail?
Designing and building large and complex data sets, thinking strategically about uses of data and how data use interacts with data design, designing and implementing statistical data quality procedures around new data sources, performing data studies and data discovery around new data sources or new uses for existing data sources, implementing any software required for accessing and handling data appropriately, implementing and handing off data checking and updating procedures, performing statistical analyses with existing data sets, visualizing and reporting data findings creatively in a variety of formats, researching and implementing software and hardware related to mobile technologies.
What’s a typical work week like?
My weekly workload has systems maintenance and programming performance on one side, and on the other, pent-up demand (and weighty expectations) from clients eager to see their data crunched. Those clients want to know where to find their most valuable data source(s) but need help drinking from a “firehose” of often real-time and therefore perishable information. A typical day of the week for me is very interruption-driven, because as the person who knows where all the data is, I get a lot of questions like, ‘John, what is the relationship between this and that?’ and ‘John, do you think this is going to be true?’ and ‘John, do you think that’s not going to be true?’ All of this falls under the ‘predictive analysis’ hat I wear.
For many of my clients, it’s a question of whether I can help convert them from a [website] visitor to a lead, and from a lead to an opportunity, and from an opportunity to a revenue producing, loyal and repeat customer.
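As a rough illustration of that funnel, here is a minimal Python sketch that computes the conversion rate between consecutive stages. The stage names come from the answer above; the function name `funnel_conversion` and the counts are hypothetical, made up purely for the example.

```python
def funnel_conversion(stage_counts):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    rates = []
    for (prev_name, prev_n), (next_name, next_n) in zip(stage_counts, stage_counts[1:]):
        # Guard against an empty stage to avoid dividing by zero.
        rate = next_n / prev_n if prev_n else 0.0
        rates.append((prev_name, next_name, rate))
    return rates


# Hypothetical counts, for illustration only (not real client data).
stages = [("visitor", 1000), ("lead", 100), ("opportunity", 20), ("customer", 5)]

for src, dst, rate in funnel_conversion(stages):
    print(f"{src} -> {dst}: {rate:.1%}")
```

The stage with the lowest rate is usually the first place to look for the next data set to pull.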
To answer these and many other questions I may need to write some Hadoop jobs to get a good data set, and then analyze it using tools like a visualization engine chosen depending on the size and complexity of the data set. A lot of times I look at that and say, ‘Wow, there is absolutely nothing here.’ And I throw it away. And other times I look at it and I say, ‘Wow, this is close,’ but the actual piece of data that I need is still back in those back-end files. And then I am running another Hadoop job to generate another data set that is very similar to the last one but isn’t quite what I was looking for. At the end of the process, I share reports iteratively with the business owners, the people who can steer me in the right direction because they know what the value to the business is. I also know it myself, but you always need to talk to those people. And in the end, I’ve got a refined report that I can either produce repeatedly, or as a one-off, or hand to some of our data engineers to be truly automated. That is my general workflow over a couple of days to generate a set of reports on a specific set of data.
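The kind of job described above follows the map-and-reduce pattern that Hadoop is built around. The sketch below imitates that pattern in plain Python on a tiny in-memory list; it does not use Hadoop or any Hadoop API, and the record fields (`user`, `event`) and function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (key, 1) pair per record; the key here is the event type."""
    for record in records:
        yield record["event"], 1


def reduce_phase(pairs):
    """Sum the values for each key, as a reducer would after the shuffle step."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical log records standing in for the back-end files.
logs = [
    {"user": "a", "event": "visit"},
    {"user": "b", "event": "visit"},
    {"user": "a", "event": "signup"},
]

counts = reduce_phase(map_phase(logs))
print(counts)
```

Changing the key emitted in `map_phase` (say, to the user rather than the event) corresponds to the "run another job for a slightly different data set" step in the workflow.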
How did you get started?
I started out as a financial data analyst at the manufacturing and research facility of a large information technology company. Knowing the flow of money and other resources within the organization, and how the executives I worked for used it to make decisions, was invaluable. At the end of the day, if a Data Scientist has a Ph.D. but no business experience or domain expertise, then whatever effort they bring to the data flow is just, to use an old programming adage, “GIGO”: Garbage In, Garbage Out!
What do you like about what you do?
I really do like the feeling of power that comes with having a command of the tools necessary to mine, analyze, and apply the information stream flowing within an organization or ecosystem (e.g., social media). This puts me in a nearly irreplaceable position within my industry. I also like the reaction I receive (the “AHA” moment, if you will) when the data I curate is embraced by a client and sets into motion a variety of activities that generate positive results.
What do you dislike?
The recent explosion of data created by the “outer limits” of data collection technologies, especially mobile and embedded devices, is presenting new challenges for traditional Data Scientists. The number of disparate data points that need to be collected and analyzed is growing every day, and the real-time nature of most of the more valuable information means that new and often unproven ways of harnessing this power must be used. While many approaches are in play, no single “killer app” has emerged. In many ways, this has created a Data Artist role where different colors, shades, and palettes must be used. Constant research and learning are now the differentiator between an effective Data Scientist and a mere data collector.
How do you make money, or how are you compensated?
As a subject matter expert with 20+ years of business intelligence experience, I now work on an hourly rate or per diem basis. If you are just starting out or prefer working in a W-2 position at an organization, you will receive a salary commensurate with the level of responsibility you are given. With the recent shortage of data science skills relative to current and anticipated demand, these salaries have begun to rise. For those who look for other, more lucrative forms of compensation or have entrepreneurial leanings, there are currently abundant opportunities at venture-backed start-up companies that offer slightly less than market-rate salaries in exchange for an equity position in the venture and/or stock options.
How much money do Data Scientists make?
Starting salaries for Ph.D.s in statistical or predictive analysis range from $60K to $110K depending on the school attended and any additional skill sets possessed, such as an MBA or a Master’s in software engineering/programming. Chief Data Scientists at firms like LinkedIn or Facebook can make up to $175K in total compensation. Big Data Research consultants like me charge anywhere from $125 to $200/hr for complex “McKinsey Style” engagements that include strategy and implementation road maps. My advice to those considering this role is to get experience with one of the Big 3 consulting firms (IBM, Accenture, Booz Allen Hamilton for Gov’t), then work for a client or go independent.
This will keep your compensation competitive and allow you to solve multiple data issues at the same time. Avoid getting pigeon-holed in one industry or limiting your skill set to one vendor product (SAS, MATLAB, etc.). The highest paid Data Scientists have a large tool belt with new tools being added frequently (see my blog post for some of these: http://thequintessentialdatascientist.blogspot.com/2012/06/bigdataguru-s-big-data-tool-belt-part-1.html ).
How much money did/do you make starting out?
The pay scale today is different than when I first got into the information sciences industry. While there is more demand for Data Scientists than at any other time in history, entry-level positions tend to have lower salaries due to the limited hands-on experience of most candidates.
What education, schooling, or skills are needed to do this?
There is ongoing debate as to what skills and/or degrees are essential for success as a Data Scientist. Traditionally, hedge funds and insurance companies have favored Ph.D.s from MIT, Stanford, etc. in Quantitative Analysis or Statistics/Linear Algebra. Now it is also important to be proficient in programming languages like C, R, or ones designed specifically for large data sets like Julia, and in open-source frameworks such as Hadoop.
What is most challenging about what you do?
Currently, the biggest challenge that I and all Data Scientists face is the rapidly developing “internet of things” (see blog post: http://thequintessentialdatascientist.blogspot.com/2012/05/caution-big-data-embedded-device.html ). The amount of data expected to be generated and transferred between these embedded devices will dwarf the amount of data that will reside in relational databases. Today’s Data Scientists must “roll their own” schemas to address this and add new skills, such as embedded operating system programming, to their resumes.
What is most rewarding?
Having the “sexiest job on the planet”! The financial rewards are a nice perk, too.
What advice would you offer someone considering this career?
As glamorous as the recent articles in Forbes and studies by McKinsey and others make this job seem, it requires a lot of hard work and constant experimentation to get to a meaningful result. Very often, it comes down to a “Black Art” with a mixture of algorithms and “hunches”. You need to be a propeller head and a consummate communicator at the same time. It’s a pity, but some of the sales guys who work for the Big Data software companies will make 2x what you do and know very little about Data Science. But most Data Scientists are not in it for the money. It is “mind candy” like no other.
How much time off do you get/take?
Hardly any. It’s hard to when you are having this much fun and getting paid at the same time.
What is a common misconception people have about what you do?
That Data Scientists are geeks that sit in small rooms in front of High-Performance Computers all day and have no social life/can’t get a date. Some of the best Data Scientists I know (including myself) are the most fun at parties…enough said.
What are your goals/dreams for the future?
I am hoping to create a Data Science-based company that will bring together all the emerging technologies like Big Data, Cloud, and embedded computing platforms and provide seamless access to real-time data to solve both commercial and societal problems. I also hope to be the first trillionaire by net worth.
What else would you like people to know about your job/career?
The debate over the nature of the data scientist’s role (whether they should be trained as scientists, be domain experts in their organization’s work, or be machine learning programmers) is likely to continue for some time as demand rises for people who can analyze unstructured data sets that keep growing in size and complexity.