Completed Research

Leximancer Thematic Analysis Pipeline

Leximancer · Python · pandas · SPSS · R

About this project

Automated qualitative-to-quantitative thematic analysis pipeline using Leximancer for corpus-level concept mapping. Applied to interview transcripts and open-text survey responses in remote work research.

Background

The qualitative component of the PhD involved 60 semi-structured interviews with remote workers across Australia. Manually coding 60 interviews is tractable but slow. Leximancer provides a computational alternative: it generates concept maps from text corpora by identifying co-occurring concepts and their relative importance, which gives you a quantitative view of what themes are present and how they relate before you begin detailed manual coding.

The pipeline automates the preprocessing work that would otherwise be repetitive and error-prone: transcript cleaning (removing interviewer prompts, filler words, and PII), speaker diarisation (separating participant responses from interviewer questions), and stopword tuning (suppressing domain-specific terms that are frequent but not conceptually meaningful). The pipeline's output feeds directly into Leximancer, with no manual steps in between.
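To make the cleaning steps concrete, here is a minimal sketch rather than the project's actual code. It assumes transcripts are plain text with each turn prefixed by a speaker label such as `INTERVIEWER:` or `PARTICIPANT:`. The labels, the filler list, and the helper names `participant_turns` and `clean_turn` are all hypothetical, and email redaction stands in for a fuller PII pass.

```python
import re

# Assumed conventions for illustration only; real transcripts may differ.
PARTICIPANT_LABEL = "PARTICIPANT"
FILLERS = {"um", "uh", "er", "erm"}  # illustrative filler list
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
TURN_RE = re.compile(r"^\s*([A-Z]+)\s*:\s*(.*)$")


def participant_turns(transcript):
    """Return the participant's turns, dropping interviewer prompts."""
    turns = []
    current_speaker = None
    for line in transcript.splitlines():
        match = TURN_RE.match(line)
        if match:
            # A new labelled turn starts here.
            current_speaker = match.group(1)
            if current_speaker == PARTICIPANT_LABEL:
                turns.append(match.group(2))
        elif line.strip() and current_speaker == PARTICIPANT_LABEL and turns:
            # Unlabelled line: treat it as a continuation of the current turn.
            turns[-1] += " " + line.strip()
    return turns


def clean_turn(text):
    """Redact email addresses, then drop filler tokens."""
    text = EMAIL_RE.sub("[REDACTED]", text)
    kept = [
        token for token in text.split()
        if token.strip(",.!?;:").lower() not in FILLERS
    ]
    return " ".join(kept)
```

Calling `clean_turn` on each result of `participant_turns` yields participant-only text. Stopword tuning is left out here, since the suppressed terms would depend on the corpus.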

The most valuable output was the concept co-occurrence matrix, which I used to triangulate against the structural equation modelling (SEM) constructs before running confirmatory factor analysis (CFA). For example, if Leximancer identified "liminal space" and "disconnection" as co-occurring concepts in the interview data, but the survey items for Liminal Spaces didn't load onto a coherent factor, that signalled that the operationalisation needed revision. Running the qualitative and quantitative approaches in parallel created a feedback loop that strengthened both.
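The idea behind a concept co-occurrence matrix can be illustrated with a small sketch. This is not Leximancer's algorithm or export format. It assumes each text segment has already been reduced to a set of concept labels, and the function name `cooccurrence_counts` is hypothetical. It counts how many segments each pair of concepts shares.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(segment_concepts):
    """Count, for each concept pair, the segments in which both appear.

    segment_concepts: iterable of sets of concept labels, one set per segment.
    Returns a Counter keyed by alphabetically sorted (concept_a, concept_b).
    """
    counts = Counter()
    for concepts in segment_concepts:
        # Sorting gives each unordered pair one canonical key.
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return counts


# Illustrative segments, not data from the study.
segments = [
    {"liminal space", "disconnection"},
    {"liminal space", "disconnection", "commute"},
    {"commute"},
]
counts = cooccurrence_counts(segments)
```

Pairs with high counts are candidates to check against the survey constructs: if two concepts co-occur often in the interviews, the items meant to measure them would be expected to behave coherently in the factor analysis.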

Highlights

  • Concept map generation from unstructured interview transcripts
  • Theme co-occurrence matrices for qualitative pattern identification
  • Integration of Leximancer outputs with SEM constructs for triangulation
  • Preprocessing pipeline: transcript cleaning, speaker diarisation, stopword tuning
  • Used to validate and refine liminal space construct items pre-CFA