Data Science and Me, a History.

Clare Dunne
May 18, 2021

I’m a student at the Flatiron School Data Science Bootcamp. It’s a mouthful, and anyway the word ‘bootcamp’ feels weird in my mouth. No one’s asking me to climb a 20-foot wall or crawl under razor wire through some freezing-cold muck, or whatever they do at Barry’s.

So, inspired by Veep’s Mike McLintock, I now refer to it as ‘numbers camp’.

This blog post was supposed to be a beautiful, fun, dip-more-than-a-dive into library book data, but my internet gave up the ghost, and since the dataset I wanted to work with has one hundred and seven million rows, that post has been consigned to the drafts folder for this week.

What isn’t in the drafts folder is every one of my previous forays into data science: analyses I did before I even knew what an EDA (exploratory data analysis) was.

An Early One:

This has been haunting me for the last month or so. Consider yourself my confessor. I have a recollection of deciding, aged eight or nine, that I would count the occurrences of each word in a book I was reading. So off I went, armed with… a highlighter, or just a pen? The good news is I gave up very quickly, and did not save a draft.
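The word count abandoned above takes only a few lines of Python. This is a minimal sketch, assuming plain-text input; the sample sentence stands in for the book, and `count_words` is a hypothetical helper name used only for illustration.

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Count how often each word appears, ignoring case and punctuation."""
    # Lowercase everything, then keep runs of letters and apostrophes as words
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical stand-in for the book's text
sample = "The cat sat on the mat. The mat was flat."
counts = count_words(sample)
print(counts.most_common(2))  # [('the', 3), ('mat', 2)]
```

`Counter` does the tallying, so no highlighter is required.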

A Real One:

When I was 21 and studying for a degree in Energy Engineering, a college classmate and I built a spreadsheet that can only be described as cumbersome. Its aim, which we ultimately achieved, was to assess the feasibility of installing wave-energy converters off the west coast of Ireland. During installation, the waves could not exceed a maximum height and had to have at least a minimum period. We had 15 years of weather data in 15-minute increments, and were DELIGHTED that we had done enough work that our spreadsheet of 525,600 rows took over a dozen minutes to run all of its calculations. It even came equipped with macros written in VBA. NBD.

Findings here, for your perusal.
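The row count above checks out: fifteen 365-day years of 15-minute records is 15 × 365 × 96 = 525,600 rows, if leap days are ignored. The core test can also be sketched in Python. This is a hypothetical illustration, not the original spreadsheet or its VBA: it assumes each record holds a wave height and a wave period, reads "maximum height and minimum period" as height ≤ a limit and period ≥ a limit, and looks for runs of consecutive records long enough for an installation. The `Record` fields, the thresholds, and the sample data are all made up.

```python
from dataclasses import dataclass

STEPS_PER_DAY = 24 * 4                 # one record every 15 minutes
ROWS = 15 * 365 * STEPS_PER_DAY        # 525,600 rows for 15 years, leap days ignored


@dataclass
class Record:
    wave_height_m: float               # hypothetical field: wave height in metres
    wave_period_s: float               # hypothetical field: wave period in seconds


def installation_windows(records, max_height_m, min_period_s, min_steps):
    """Return (first, last) index pairs for runs of consecutive records in which
    height <= max_height_m and period >= min_period_s, keeping only runs at least
    min_steps records long."""
    windows = []
    start = None                       # index where the current suitable run began
    for i, r in enumerate(records):
        ok = r.wave_height_m <= max_height_m and r.wave_period_s >= min_period_s
        if ok and start is None:
            start = i                  # a suitable run begins
        elif not ok and start is not None:
            if i - start >= min_steps:
                windows.append((start, i - 1))
            start = None               # the run has ended
    # A run that lasts until the final record is closed here
    if start is not None and len(records) - start >= min_steps:
        windows.append((start, len(records) - 1))
    return windows


# Made-up sample: six 15-minute records, thresholds chosen for illustration only
data = [Record(1.2, 9.0), Record(1.4, 8.5), Record(2.8, 7.0),
        Record(1.0, 10.0), Record(0.9, 11.0), Record(1.1, 9.5)]
print(ROWS)                            # 525600
print(installation_windows(data, max_height_m=1.5, min_period_s=8.0, min_steps=2))
# [(0, 1), (3, 5)]
```

Requiring a minimum run length matters because a single calm 15-minute slot is no use if the installation takes hours; `min_steps` is where that duration would go.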

In which I even generate the dataset myself:

More college. This time I was working toward a master’s in education and, horrified by the thought of writing a qualitative thesis, I felt that some data collection was in order. The result? “Gender and Mathematics: A Case Study in Rural Ireland”.

The goal was to compile a questionnaire short enough for 15- to 18-year-olds to complete, and to analyze their responses alongside some publicly available data. My favorite finding was that accounting is the only subject available to Irish secondary school students that is not perceived as more suitable for either male or female students.

In which I only gather the data, and a dog:

A section about a Ph.D., but not mine (I don’t have one). The Dr. Dunne in question is my father, who is an Agricultural Scientist. One of his research questions involved predicting the amount of funding a farmer would receive from the European Union.

The survey was comprehensive, with 78 questions, and I was enlisted as an enumerator. I was driving around the countryside, thankful for Ireland’s gun laws and for not really having to worry about a prospective respondent pointing a gun at me, when a small dog ran out of the woods in front of the car. So I scooped him up and, after making sure no one was looking for him, kept him and named him Bones. He even gets a shout-out in Dr. Dunne’s acknowledgements.

Bones is the tall one.

A weather one:

California, 2018. I was hired as an intern, and then the building energy analyst quit, so after ten weeks on the job I was the building energy analyst. Nightmare. Now I was generating models that took hours to run: representations of some of San Francisco’s largest buildings that were somehow both exhausting in their detail and laughably oversimplified.

The water feature calculation was soothing by comparison. Given the dimensions of a pond-of-sorts, and the trend toward hotter summers in Sonoma County, did it make sense for the homeowners to try to use stored rainfall to keep the pond full over summer, or let it dry out like a natural creek bed?

Armed only with my wits, a CSV file full of weather data, and the specific latent heat of vaporization of water, I determined, sadly, but unsurprisingly, that their water feature was more likely to boil off with each passing year.
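The latent-heat argument is an energy balance: evaporating a mass m of water takes m·L_v of energy, so an energy input E per square metre can remove at most E / (L_v·ρ) of water depth, where ρ is the density of water. A minimal sketch, assuming all of that energy goes into evaporation (an upper bound that ignores wind, humidity, and heat lost elsewhere); the CSV column name and the three daily values are hypothetical, not the Sonoma weather data.

```python
import csv
import io

LATENT_HEAT_J_PER_KG = 2.45e6          # latent heat of vaporization of water near 20 °C
WATER_DENSITY_KG_PER_M3 = 1000.0


def evaporated_depth_mm(energy_mj_per_m2: float) -> float:
    """Depth of water (mm) that an energy input per square metre could evaporate,
    assuming ALL of that energy goes into vaporization (an upper-bound simplification)."""
    mass_kg_per_m2 = energy_mj_per_m2 * 1e6 / LATENT_HEAT_J_PER_KG
    depth_m = mass_kg_per_m2 / WATER_DENSITY_KG_PER_M3
    return depth_m * 1000.0


# Hypothetical daily export: date, energy available for evaporation (MJ/m^2)
weather_csv = """date,energy_mj_per_m2
2018-07-01,12.25
2018-07-02,14.70
2018-07-03,9.80
"""

total_mm = sum(evaporated_depth_mm(float(row["energy_mj_per_m2"]))
               for row in csv.DictReader(io.StringIO(weather_csv)))
print(f"Potential evaporation over the period: {total_mm:.1f} mm")  # 15.0 mm
```

With L_v = 2.45 MJ/kg, each MJ/m² corresponds to about 0.408 mm of water (the same conversion factor FAO-56 uses), so the three sample days add up to 15 mm. Comparing a running total like this against stored rainfall and the pond's depth is the kind of comparison the question above calls for.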

A crafting one:

This one is ongoing. As I go, I mark the completed cross stitches off a chart made from conditionally formatted, square cells in a spreadsheet. The stitches are counted, and with some linear regression analysis, the date on which I can expect to finish the bloody thing is updated. The current projected completion date is December 19th, 2022, and it disappears further over the horizon every time I fall down another data science or crafting rabbit hole.

It’s a graduation gift for the aforementioned Dr. Dunne, the cross stitch.
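The completion-date projection described above can be sketched as ordinary least squares on cumulative stitch counts: fit stitches against days elapsed, then solve the fitted line for the day it reaches the total. A minimal Python sketch, assuming the log is a list of (date, cumulative stitches) pairs; the function name, the log, and the 6,000-stitch total are hypothetical, and the spreadsheet itself may do this differently.

```python
from datetime import date, timedelta


def projected_finish(log, total_stitches):
    """Fit cumulative stitches = slope * days + intercept by ordinary least squares,
    then return the date on which the fitted line reaches total_stitches."""
    start = log[0][0]
    xs = [(d - start).days for d, _ in log]      # days since the first entry
    ys = [count for _, count in log]             # cumulative stitches completed
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need entries from at least two different days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                            # fitted stitches per day
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        raise ValueError("no upward trend, so no finish date to project")
    days_needed = (total_stitches - intercept) / slope
    return start + timedelta(days=round(days_needed))


# Hypothetical progress log: (date, cumulative stitches)
log = [(date(2021, 5, 1), 1000), (date(2021, 5, 11), 1500), (date(2021, 5, 21), 2000)]
print(projected_finish(log, total_stitches=6000))  # 2021-08-09
```

The sample log grows by exactly 50 stitches a day, so the fitted line reaches 6,000 stitches 100 days after the first entry. A straight-line fit assumes a steady pace, which is why any stretch without stitching pushes the projected date further out.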

The rest, as they say…

Is the first four weeks of my life at numbers camp.

So far it’s feeling like the right move for me.

Stay tuned for more adventures in data science, including the 107 million rows of Seattle Library checkout data that I’m concerned my computer and I are completely unprepared to deal with.

Clare Dunne

Data Scientist | Energy Engineer | STEM Educator | C2 in my English Leaving Cert, you have been warned.