How to Prepare Your Text Data for Coding in QDAS

Learn best practices for preparing text-based data for analysis

Step 1: Transcribe

If you have qualitative data (from focus groups, interviews, etc.) that was recorded and saved as a video or audio file, the first step in data preparation is to transcribe it, that is, to convert the recordings into text. Here are some popular transcription services:

Otter.ai

Rev.ai

oTranscribe

Do not upload your research data to free or public AI tools (such as ChatGPT) for transcription, because these tools may use the data you submit to train their AI models. This kind of data sharing will likely violate participants' privacy and confidentiality unless your study consent form explicitly covers AI-based transcription. Before using any transcription service, read its data use and data sharing policy to make sure it complies with your institutional review board (IRB) requirements.


Step 2: De-identify

It is important to remove any data from your documents that could be used to identify participants. This includes obvious identifiers such as names and locations, but it can also include information that makes participants identifiable from context (for example, in studies involving small or specific subpopulations).

To de-identify your data, you can use the find-and-replace function in your word processing software to replace names and other identifiers with pseudonyms or ID numbers. Dedicated tools can also help with the de-identification process. One upcoming tool is De-ID, a secure data anonymization app that is not connected to AI and is slated for release later this year.

De-ID will use a simple, user-controlled process. You upload your text file and specify the anonymization parameters using fields (such as name, location, and age). The application then processes the document and suggests changes, which you review and accept to anonymize all potential identifiers. If you are interested in using De-ID, sign up for the first-access list here: https://bit.ly/DeIDapp
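If you prefer to script the find-and-replace approach instead of doing it by hand, the following is a minimal Python sketch using only the standard library. The entries in `REPLACEMENTS` (including the surname and the city) are hypothetical placeholders, and `deidentify` is an illustrative helper, not part of De-ID or any word processor. It matches whole words only and is case-sensitive, so it will not catch misspellings, nicknames, or contextual identifiers; you still need to read through every document yourself.

```python
import re

# Hypothetical mapping of identifiers to pseudonyms or ID numbers.
# Build your own list from your study records; these entries are placeholders.
REPLACEMENTS = {
    "Elena Ramirez": "P01",   # hypothetical full name
    "Elena": "P01",
    "Springfield": "[CITY]",  # hypothetical location
}


def deidentify(text: str, replacements: dict[str, str]) -> tuple[str, int]:
    """Replace whole-word, case-sensitive matches of each identifier.

    Returns the de-identified text and the number of replacements made.
    """
    if not replacements:
        return text, 0
    # Try longer identifiers first so "Elena Ramirez" wins over "Elena".
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    # A single pass over the text, so a pseudonym is never itself re-replaced.
    return pattern.subn(lambda m: replacements[m.group(0)], text)


if __name__ == "__main__":
    sample = "Thank you for joining me today, Elena. Elena Ramirez lives in Springfield."
    cleaned, count = deidentify(sample, REPLACEMENTS)
    print(cleaned)  # Thank you for joining me today, P01. P01 lives in [CITY].
    print(count, "replacements")  # 3 replacements
```

The replacement count gives you a quick sanity check: if it is lower than you expect for a document, an identifier may be spelled differently there and needs manual attention.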


Step 3: Format

Formatting means putting your documents into the layout you want to work with when you begin coding them. Some transcription services include time stamps or line numbers in their output; consider removing them if you prefer to read through your data without them. You can also insert extra spacing or bolded text to mark each change of speaker, as in the example text below:

SM: Thank you for joining me today, Elena. Before we start, I want to make sure you're comfortable with our conversation being recorded and transcribed for research purposes?

ER: Yes, absolutely. I'm happy to share my story if it helps with your research.

SM: Thank you. As I mentioned in our email, we're studying the experiences of individuals who have received service dogs. Could you begin by telling me a bit about yourself and what led you to getting a service dog?

In this example interview, a change in speaker is denoted by extra spacing between paragraphs as well as bolded text for the speaker initials. It is a good idea to do a “test run” by first formatting a single document and then importing it into your qualitative data analysis software. This allows you to view the document in the environment where you will be coding it. You can then assess the appearance of the document and make any desired formatting changes to the rest of your documents before importing them into the software.
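If your transcription service exports plain text, the cleanup described above (removing line numbers and time stamps, and separating speaker turns with a blank line) can be sketched in Python as follows. The raw layout it expects, an optional line number and time stamp followed by speaker initials and a colon, is an assumption; check it against your own export and adjust the regular expression. It also assumes line numbers and time stamps appear only on lines that start a speaker turn. The sketch handles spacing only; bolding the speaker initials is still easiest in your word processor.

```python
import re

# Assumed raw layout (hypothetical; adjust to your service's export), e.g.
# "12 [00:01:23] SM: Thank you for joining me today, Elena."
TURN_START = re.compile(
    r"^\s*(?:\d+\s+)?"                                # optional leading line number
    r"(?:[\[(]?\d{1,2}:\d{2}(?::\d{2})?[\])]?\s*)?"   # optional time stamp
    r"(?P<speaker>[A-Z]{1,3}):\s*(?P<speech>.*)$"     # speaker initials and their words
)


def format_transcript(raw: str) -> str:
    """Drop line numbers and time stamps; put a blank line between speaker turns."""
    turns: list[str] = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # existing blank lines are rebuilt when joining below
        match = TURN_START.match(line)
        if match:
            turns.append(f"{match['speaker']}: {match['speech'].strip()}")
        elif turns:
            # A wrapped line with no speaker label continues the previous turn.
            turns[-1] += " " + line.strip()
        else:
            turns.append(line.strip())
    # One blank line between turns marks each change of speaker.
    return "\n\n".join(turns)


if __name__ == "__main__":
    raw = (
        "1 [00:00:05] SM: Thank you for joining me today.\n"
        "2 [00:00:09] ER: Yes, absolutely. I'm happy\n"
        "to share my story.\n"
    )
    print(format_transcript(raw))
```

Running this on your "test run" document first, then importing the result into your software, lets you confirm the layout before processing the rest of your files.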

Dr. Michelle Salmona and Dr. Hannah G. Calvert are both affiliated with the Institute for Mixed Methods Research. Read more about their backgrounds on the "Leadership Team" page.