Emily Singerhouse

Data Cleaning 101

The quality of your data, whatever type of data it is, is incredibly important. Are you working with data to measure the implementation of your programming? Are you trying to optimize community engagement, expand your reach, or show funders and decision-makers what you are up to? Are you just starting to conduct research? If any of these sound like your situation, data cleaning is an item you should add to your to-do list.


Here is a quick foundation (with pictures!) to get you started on your data cleaning journey.


What is data cleaning?

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. This helps promote the accuracy and quality of your data.



Why is data cleaning important?

  1. It removes errors or inconsistencies

  2. It will help you and your team be more efficient long-term

  3. A clean, organized dataset lets you and your team think more critically about the data and what it means.

What are the different types of data cleaning operations?

Reviewing

Read through the data to see what the dataset currently contains and to confirm that it is complete.
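For larger files, part of this review can be automated. The sketch below is one possible approach, not a prescribed method: it counts blank answers in each column. The column names and rows are made-up sample data, and `count_missing` is a hypothetical helper written for this illustration.

```python
# Hypothetical survey rows; an empty string stands for a missing answer.
rows = [
    {"name": "Ana", "city": "Duluth", "age": "34"},
    {"name": "Ben", "city": "", "age": "29"},
    {"name": "", "city": "St. Paul", "age": ""},
]


def count_missing(rows):
    """Return how many rows have a blank value in each column."""
    missing = {}
    for row in rows:
        for column, value in row.items():
            # Make sure every column shows up in the result, even with zero blanks.
            missing.setdefault(column, 0)
            if value is None or str(value).strip() == "":
                missing[column] += 1
    return missing


missing_counts = count_missing(rows)
print(missing_counts)
```

Columns with a high count are good candidates for a closer look before any analysis starts.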


Cleaning

Ensure all data values are uniform. For example, every “female” gender response to a survey should be recorded as “female” rather than “F.” This keeps the dataset consistent and makes it easier to read and categorize.
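As a rough illustration of making values uniform, the sketch below maps the variants found in the raw data to one standard label. The mapping `GENDER_LABELS` and the sample responses are assumptions made for this example; a real survey would need its own list of variants.

```python
# Hypothetical mapping from raw variants (lowercased) to one standard label.
GENDER_LABELS = {
    "f": "female",
    "female": "female",
    "m": "male",
    "male": "male",
}


def standardize_gender(value):
    """Return the standard label, or the trimmed original if it is not recognized."""
    trimmed = value.strip()
    return GENDER_LABELS.get(trimmed.lower(), trimmed)


responses = ["F", "female", " Female ", "M", "nonbinary"]
cleaned = [standardize_gender(response) for response in responses]
print(cleaned)
```

Values that are not in the mapping are left as they are (apart from stray spaces), so nothing is silently changed or lost.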





De-duplication

Remove duplicate records from the dataset so that each record appears only once.
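One way to sketch de-duplication is to pick the columns that identify a record and keep only the first row for each combination. The `drop_duplicates` helper and the sample rows below are hypothetical, and treating email as the identifying column is an assumption for this example.

```python
def drop_duplicates(rows, key_columns):
    """Keep the first row for each combination of key_columns, preserving order."""
    seen = set()
    unique_rows = []
    for row in rows:
        # Ignore case and stray spaces so "ANA@example.org" matches "ana@example.org".
        key = tuple(row[column].strip().lower() for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Made-up sample data: the third row repeats the first person.
rows = [
    {"name": "Ana Lopez", "email": "ana@example.org"},
    {"name": "Ben Ode", "email": "ben@example.org"},
    {"name": "ana lopez", "email": "ANA@example.org "},
]
unique_rows = drop_duplicates(rows, ["email"])
print(len(rows), "rows before,", len(unique_rows), "rows after")
```

Choosing the key columns is a judgment call: two different people can share a name, so a field such as an email address or ID number is usually a safer choice.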








Aggregating

Sort through the data and put it into summary form. For example, to find out how many people were surveyed in each city, we combine the individual rows into one count per city. In other words, we put the data into aggregate form.
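The city count described above could be sketched as follows. The names and cities are made-up sample data; `Counter` comes from Python's standard `collections` module.

```python
from collections import Counter

# Made-up survey rows, one per person.
rows = [
    {"name": "Ana", "city": "Duluth"},
    {"name": "Ben", "city": "St. Paul"},
    {"name": "Cam", "city": "Duluth"},
    {"name": "Dee", "city": "Rochester"},
]

# Count how many rows belong to each city.
people_per_city = Counter(row["city"] for row in rows)

# Print the summary, largest group first.
for city, count in people_per_city.most_common():
    print(city, count)
```

This works best after the cleaning step, since "Duluth" and "duluth " would otherwise be counted as two different cities.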







Filtering

Narrow the dataset down to a specific group. For example, we might filter the data so that it shows only the names of people in the Girls on the Run program.
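A minimal sketch of the program filter described above is shown here. The rows, the second program name, and the `filter_by_program` helper are invented for illustration; only Girls on the Run comes from the example in the text.

```python
# Made-up participant rows.
rows = [
    {"name": "Ana", "program": "Girls on the Run"},
    {"name": "Ben", "program": "Youth Soccer"},
    {"name": "Cam", "program": "girls on the run "},
]


def filter_by_program(rows, program):
    """Return only the rows whose program matches, ignoring case and stray spaces."""
    wanted = program.strip().lower()
    return [row for row in rows if row["program"].strip().lower() == wanted]


selected = filter_by_program(rows, "Girls on the Run")
names = [row["name"] for row in selected]
print(names)
```

The original list is left untouched, so the full dataset is still available for other filters.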









Merging

When you have multiple datasets, you can combine the parts you need from each into a new table.
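Merging usually relies on a field that both datasets share. The sketch below assumes that field is a participant `id`, which is an assumption made for this example, as are the two sample tables and the `merge_on` helper.

```python
# Two made-up tables that share an "id" column.
participants = [
    {"id": "1", "name": "Ana"},
    {"id": "2", "name": "Ben"},
    {"id": "3", "name": "Cam"},
]
survey_scores = [
    {"id": "1", "score": "8"},
    {"id": "3", "score": "6"},
]


def merge_on(left, right, key, columns):
    """Keep every row of `left` and add the chosen columns from the matching `right` row.

    Rows without a match get an empty string. If `right` repeats a key,
    the last row with that key is used.
    """
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key], {})
        combined = dict(row)  # copy so the original table is not changed
        for column in columns:
            combined[column] = match.get(column, "")
        merged.append(combined)
    return merged


merged = merge_on(participants, survey_scores, "id", ["score"])
for row in merged:
    print(row)
```

Ben has no survey score, so his row is kept with a blank score rather than dropped. Whether unmatched rows should be kept or removed depends on the question being asked.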














Appending

To append datasets means to stack them together to create a larger dataset.
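Stacking is simplest when both datasets have the same columns. The sketch below also handles the case where one dataset has an extra column by filling the gap with a blank. The `append_tables` helper and the sample tables are hypothetical.

```python
def append_tables(*tables):
    """Stack tables row-wise; columns missing from a table are filled with ''."""
    # Collect every column name, in the order first seen.
    columns = []
    for table in tables:
        for row in table:
            for column in row:
                if column not in columns:
                    columns.append(column)
    # Rebuild each row so all rows share the same columns.
    stacked = []
    for table in tables:
        for row in table:
            stacked.append({column: row.get(column, "") for column in columns})
    return stacked


# Made-up data from two survey rounds; the second round also asked for age.
first_round = [{"name": "Ana", "city": "Duluth"}]
second_round = [{"name": "Ben", "city": "St. Paul", "age": "29"}]

combined = append_tables(first_round, second_round)
for row in combined:
    print(row)
```

Unlike merging, appending adds rows rather than columns, so the result has one row for every row in the original datasets.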
















Transforming

Combining or transforming datasets into a new version. There is a trick to do this in Excel. Check this out.






Data Cleaning Best Practices

  1. Create a backup copy of the original data in a separate workbook to avoid losing any work in the event of a data management issue.

  2. Regularly back up the working file so all changes are saved. You don’t want to have to redo hours of work if there are any issues with technology!

  3. Keep the questionnaire close so you can double-check that the data fulfills the purpose of each question. This helps you catch mistakes in the data, such as someone mistyping numbers or entering information under the wrong question.
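The backup habits above can also be scripted. The sketch below is one possible approach under stated assumptions: the `back_up` helper, the `backups` folder name, and the file name in the usage note are all hypothetical. It copies a file into a backup folder next to it and adds a timestamp to the copy's name so earlier backups are not overwritten.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up(path, backup_dir="backups"):
    """Copy the file into backup_dir with a timestamp in its name and return the new path."""
    source = Path(path)
    target_dir = source.parent / backup_dir
    target_dir.mkdir(exist_ok=True)  # create the backup folder if it is missing
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also keeps the file's modification time
    return target
```

Calling `back_up("survey_responses.xlsx")` before each cleaning session would leave a dated copy of the working file in the `backups` folder, assuming a file with that name exists.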

SPS specializes in training agencies and organizations on research and evaluation methods. Contact us today to discuss more about how we can help you handle your data!