Ted Petrou is the author of Pandas Cookbook and founder of both Dunder Data and the Houston Data Science Meetup group. He worked as a data scientist at Schlumberger, where he spent the vast majority of his time exploring data. Ted received his Master's degree in statistics from Rice University and used his analytical skills to play poker professionally and teach math before becoming a data scientist.
A typical data scientist’s workflow in Python consists of firing up a Jupyter Notebook, importing NumPy, Pandas, Matplotlib, and Scikit-Learn into the workspace, and then completing a data analysis. The APIs of these libraries are well known and mostly stable, and they provide a powerful and flexible way to analyze data. These libraries have contributed enormously to Python’s success as a language of choice for data science, and they have increased the productivity of the data scientists who use them.

For data scientists who want to learn how to develop their own data science tools, however, these popular, easy-to-use libraries hide the complexity of the underlying Python code. In fact, producing data science results in Python is so easy that one only needs to know the very basics of the language along with the library’s API.

In this hands-on tutorial, we will build our own data analysis package from scratch. Specifically, our package will contain a DataFrame class with a Pandas-like API. We will make heavy use of the Python data model, whose special methods let our DataFrame work with Python operators. By the end of the tutorial, we will have built a Python package that you can import into your workspace and that can perform the most important operations available in Pandas.
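To give a feel for what "using the Python data model" means, here is a minimal sketch. It is not the tutorial's implementation. It assumes a hypothetical DataFrame that stores its data as a dict mapping column names to one-dimensional NumPy arrays, and it implements a few special methods: `__len__` so that `len(df)` works, `__getitem__` so that `df['a']` and `df[['a', 'b']]` select columns, `__add__` so that `df + 1` works, and `__repr__` for display. The `columns` property and the private `_data` attribute are illustrative names chosen for this sketch.

```python
import numpy as np


class DataFrame:
    """Hypothetical minimal DataFrame: a dict of column name -> 1-D NumPy array."""

    def __init__(self, data):
        # Convert every column to a NumPy array and check the shapes
        self._data = {}
        lengths = set()
        for name, values in data.items():
            arr = np.asarray(values)
            if arr.ndim != 1:
                raise ValueError(f"column {name!r} must be one-dimensional")
            lengths.add(len(arr))
            self._data[name] = arr
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")

    @property
    def columns(self):
        # Column names in insertion order
        return list(self._data)

    def __len__(self):
        # Called by len(df); returns the number of rows
        if not self._data:
            return 0
        return len(next(iter(self._data.values())))

    def __getitem__(self, key):
        # df['a'] -> one-column DataFrame; df[['a', 'b']] -> several columns
        if isinstance(key, str):
            key = [key]
        return DataFrame({name: self._data[name] for name in key})

    def __add__(self, other):
        # Called by df + other; assumes numeric columns and a scalar 'other'
        return DataFrame({name: values + other for name, values in self._data.items()})

    def __repr__(self):
        return f"DataFrame(rows={len(self)}, columns={self.columns})"


df = DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
print(len(df))           # uses __len__
print(df[["b", "a"]])    # uses __getitem__ and __repr__
print((df + 1)["a"])     # uses __add__
```

Because Python routes `len()`, square-bracket indexing, and `+` through these special methods, the class behaves like a built-in type without any Pandas-specific machinery. A fuller version would add row selection, arithmetic between two DataFrames, and many more methods.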