One component of my day-to-day is retrieving reports and turning them into something useful. Simple, right? Ha! I bet you laughed, too. Data comes in all shapes and sizes…in fact, it usually comes in every form besides the one we need. As data scientists, our job isn’t only to interpret the data and provide someone with the results to make an informed decisions – although that’s great. Visualizations? Those are great, too. But, if you don’t start with the right data, you might as well not analyze or visualize it. …so what?
Well, canned reports suck (unless they’re super customized by your IT team, but who has time for that?!?). When I ask for a data pull, I hate being told “no” or “yes, I’ll get that to you in a few weeks.” Usually when I need a report, it’s because someone important needs to make a decision. So instead of waiting around for someone to do it for me, I put on my data manipulation cap and get ready to finesse some Excel files.
Enough with the back story, right? You need tools! Admittedly, my most common problem is having all the data I need…spread across a bunch of out-of-the-box reports. Easy you say? Sure, if I was only dealing with a few rows of data. We’re talking thousands of rows. So, here are a few tools I’ve come across that have turned my data nightmare into something usable:
- Merge Table Wizard for Microsoft Excel (the app with the world’s longest name) – Mac
- If you’re a PC user, use ActiveData
So let’s talk about these tools…
Consolidate is great when you have several entries for one company (or participant) that you need summed, averaged, or a number of other functions. It’s as easy as five steps:
- Select your data.
- Go to the Data tab.
- Choose the Consolidate function.
- Select your data (be sure to note where the label that you’re consolidating is at).
- Choose your function.
Done! Now you’ve got your data in one place. In my case, I needed each company’s entry to be consolidated into one entry, and I wanted the sum of all of their orders.
Next, I wanted to tie e-mail addresses to the company. That’s where my fancy application came into play. Here’s a quick walk-through on merging two reports to make one useful report.
- Open both reports.
- Open Merge Tables Wizard (Mac)
- ActiveData (PC)
- Select the data you want to merge on (pick the dataset with the majority of the data). This is your primary dataset.
- From the other worksheet (or, in my case, a workbook), select the data you want to merge.
- Your tool will likely verify that it identified the variable you want to merge on (i.e. company).
- The default is to add a new column for any data that does not exist in the primary.
- After that, the remaining steps are fluff. Select what you want and run it!
Boom. In several minutes, you just created a data file with all the data you needed. Of course, this is an oversimplification. But, you can apply it to anything – especially those nasty datasets.
So, what about you? What are your favorite tools for forcing Excel to comply?