Explainable Personality Assessment Method using Heterogeneous Linguistic Features and Off-the-Shelf LLMs

SMIL-SPCRAS/ExPAM

ExPAM: Explainable Personality Assessment Method using Heterogeneous Linguistic Features and Off-the-Shelf LLMs


Elena Ryumina, Dmitry Ryumin, Maxim Markitantov, Alexey Karpov


Abstract

Many organizations are increasingly adopting personalization techniques to enhance user satisfaction. However, current systems generally lack the ability to automatically infer and interpret individual personality traits (PTs), which are key drivers of user behavior. Large Language Models (LLMs) are widely used, but they are still not well suited to reliable and explainable Personality Assessment (PA). To address this gap, we propose ExPAM, a novel Explainable Personality Assessment Method that leverages hybrid feature fusion and in-context learning with off-the-shelf LLMs to predict Big Five PTs from textual data. ExPAM explicitly grounds predictions in interpretable linguistic patterns without requiring LLM fine-tuning. The hybrid fusion is designed to enhance both predictive performance and model interpretability in PA. Specifically, transformer-based embeddings encode local contextual information, while features extracted with the Linguistic Inquiry and Word Count (LIWC) dictionary provide complementary global and local linguistic indicators of PTs. These interpretable feature patterns are incorporated into prompts that guide the LLM to generate both PT predictions and human-understandable explanations. Evaluated on the ChaLearn First Impressions v2 corpus, ExPAM outperforms models relying on either feature type alone, achieving a mean accuracy (mACC) of 0.891 and a Concordance Correlation Coefficient (CCC) of 0.333. Moreover, prompting the LLM with hybrid global-local patterns yields a relative CCC improvement of 9.6%. Qualitative interpretability analysis reveals trait-specific linguistic patterns, offering valuable insights for psychological research, computational linguistics, and paralinguistic studies.
The proposed method thus advances both accuracy and transparency in PA, with promising applications in psychological profiling, personnel selection, and personalized recommendation systems.


Framework Pipeline

Figure 1: Pipeline of ExPAM.


Materials

The project uses the ChaLearn First Impressions V2 corpus, which includes:

  • Video recordings of more than 3000 individuals.
  • Ground-truth personality trait scores (continuous values from 0 to 1) for the Big Five traits: Openness (O), Conscientiousness (C), Extraversion (E), Agreeableness (A), and non-Neuroticism (N).

The corpus is available upon registration, after which the raw data can be downloaded. The prepared data (CSV files with ASR transcripts) is available in src/prepered_dataframes.

Code Information

The codebase is structured as follows:

project_root/
├── figures/ # visualizations
├── src/
│ ├── prepered_dataframes/
│ │ ├── dev_full_with_ASR.csv
│ │ ├── test_full_with_ASR.csv
│ │ └── train_full_with_ASR.csv
│ ├── datasets.py # Custom PyTorch datasets and collate functions
│ ├── losses.py # Loss functions (LogCoshGL, etc.)
│ ├── measures.py # Evaluation metrics (CCC, MAE)
│ ├── models.py # Model architectures (BiLSTMAtt, MambaAtt, fusion_model)
│ ├── text_preprocessing.py # Embedding extraction and LIWC feature generation
│ ├── training_utils.py # Training loops, early stopping, checkpointing
│ └── utils.py # Helper functions
├── get_attention_weights.py # Generate attention weights for test set
├── get_explanation_for_LLM.py # Generate hybrid-based explanations for LLMs
├── get_explanation_with_LLM.py # Generate trait explanations + LLM refinement
├── refine_with_llm.py # Code for refined predictions and explanations using an LLM
├── train_single_models.py # Train base models (XLM, LIWC)
├── train_fusion_model.py # Train ensemble/fusion model
├── transcribe_with_asr.py # Generate ASR transcripts from audio
└── README.md # This file
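losses.py lists a loss named LogCoshGL whose exact definition is not given in this README. For orientation, here is a minimal sketch of the standard log-cosh regression loss that the name suggests; any additional "GL" component of the repository's version is not reproduced, and the function names below are hypothetical.

```python
import math
from typing import Sequence


def log_cosh(x: float) -> float:
    """Numerically stable log(cosh(x)).

    Uses log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2),
    which avoids overflow of cosh(x) for large |x|.
    """
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def log_cosh_loss(y_pred: Sequence[float], y_true: Sequence[float]) -> float:
    """Mean log-cosh error between predicted and ground-truth trait scores."""
    if len(y_pred) != len(y_true) or not y_pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(log_cosh(p - t) for p, t in zip(y_pred, y_true)) / len(y_pred)


if __name__ == "__main__":
    # Behaves like squared error for small residuals and like absolute error for large ones.
    print(log_cosh_loss([0.5, 0.7], [0.5, 0.4]))
```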

Usage Instructions

1. Setup Environment

# Clone repository
git clone https://github.com/SMIL-SPCRAS/ExPAM.git
cd ExPAM

# Install dependencies
pip install -r requirements.txt

2. Generate ASR Transcripts (Optional)

python transcribe_with_asr.py \
  --data_path "path/to/audio/" \
  --df_path "path/to/csvs/" \
  --whisper_model "openai/whisper-large-v3-turbo" \
  --device "cuda:0"

This adds a text_ASR column to your CSV files. The CSVs in src/prepered_dataframes already include the transcripts, so this step can be skipped when using them.

3. Train Single Models

Train single models on XLM-RoBERTa, JINA-v3, BERT, and BGE embeddings, as well as on LIWC features:

python train_single_models.py \
  --models BiLSTMAtt ReBiLSTMAtt MambaAtt ReMambaAtt \
  --encoders xlm jina-v3 bert bge liwc \
  --lrs 1e-5 1e-4 \
  --dropouts 0.1 0.0 \
  --hds 64 128 \
  --epochs 60 \
  --seed 42 \
  --patience 10 \
  --bs 32 \
  --save_dir "saved_single_models"
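The command above passes several values for the model, encoder, learning rate, dropout and hidden size. Assuming the script trains one model per combination of these values (an assumption; the README does not state the search strategy), the size of the resulting grid can be checked with a short sketch:

```python
import itertools

# Values copied from the example command above.
models = ["BiLSTMAtt", "ReBiLSTMAtt", "MambaAtt", "ReMambaAtt"]
encoders = ["xlm", "jina-v3", "bert", "bge", "liwc"]
lrs = [1e-5, 1e-4]
dropouts = [0.1, 0.0]
hds = [64, 128]

# Assumption: one training run per combination of the listed values.
grid = list(itertools.product(models, encoders, lrs, dropouts, hds))

if __name__ == "__main__":
    print(f"{len(grid)} configurations")  # 4 * 5 * 2 * 2 * 2 = 160
    for model, encoder, lr, dropout, hd in grid[:3]:
        print(model, encoder, lr, dropout, hd)
```

Under this assumption, 160 runs of up to 60 epochs each are launched, which is worth budgeting for before starting.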

4. Train Fusion Model

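Combine the predictions of the best single models (one based on deep features, one based on hand-crafted LIWC features). Replace the uppercase placeholders with the paths, architectures and encoder of the models selected in step 3: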

python train_fusion_model.py \
  --nn_model_path "saved_single_models/BEST MODEL BASED ON DEEP FEATURES" \
  --hc_model_path "saved_single_models/BEST MODEL BASED ON HAND-CRAFTED FEATURES" \
  --save_dir "saved_fusion_models" \
  --deep_model_architecture "BEST DEEP MODEL ARCHITECTURE" \
  --hc_model_architecture "BEST HAND-CRAFTED MODEL ARCHITECTURE" \
  --deep_encoder "BEST DEEP ENCODER" \
  --lr 1e-2 \
  --epochs 500 \
  --seed 42 \
  --patience 100 \
  --bs 128
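The fusion model is described in the Methodology section as concatenating the single-model predictions, followed by a single dense layer and a sigmoid. The forward pass can be sketched in plain Python as below; the function names are hypothetical, and the weights would be learned by train_fusion_model.py rather than set by hand.

```python
import math
from typing import List, Sequence

TRAITS = ["O", "C", "E", "A", "N"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse(deep_preds: Sequence[float], hc_preds: Sequence[float],
         weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Concatenate two models' trait predictions, apply one dense layer, then a sigmoid.

    deep_preds / hc_preds: five scores each (O, C, E, A, N).
    weights: 5 x 10 matrix and bias: 5 values (learned during training).
    """
    x = list(deep_preds) + list(hc_preds)  # 10-dimensional fusion input
    if (len(x) != 2 * len(TRAITS) or len(weights) != len(TRAITS)
            or len(bias) != len(TRAITS) or any(len(row) != len(x) for row in weights)):
        raise ValueError("expected 5 + 5 inputs, a 5 x 10 weight matrix and 5 biases")
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


if __name__ == "__main__":
    w = [[0.1] * 10 for _ in TRAITS]
    print(fuse([0.6, 0.5, 0.7, 0.4, 0.5], [0.5] * 5, w, [0.0] * 5))
```

Because the dense layer sees all ten inputs, each fused trait score can draw on both models' predictions for every trait, not only the same trait.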

5. Generate Attention Weights for All Train / Test Examples

This step is necessary for interpreting the hybrid model results and generating explanations:

python get_attention_weights.py \
  --nn_model_path "saved_single_models/BEST MODEL BASED ON DEEP FEATURES" \
  --hc_model_path "saved_single_models/BEST MODEL BASED ON HAND-CRAFTED FEATURES" \
  --save_dir "saved_fusion_models" \
  --deep_model_architecture "BEST DEEP MODEL ARCHITECTURE" \
  --hc_model_architecture "BEST HAND-CRAFTED MODEL ARCHITECTURE" \
  --deep_encoder "BEST DEEP ENCODER" \
  --dataset_path "src/prepered_dataframes/train_full_with_ASR.csv" \
  --subset "train"

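Run the command for both the train and the test subsets (for the test subset, point --dataset_path to src/prepered_dataframes/test_full_with_ASR.csv and set --subset "test"), because get_explanation_for_LLM.py and get_explanation_with_LLM.py read both train_attention_weights.pickle and test_attention_weights.pickle.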
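The Methodology section mentions local (token-level, normalized) attention and global attention aggregated across the training set. The internal format of the attention pickles is not documented here, so the following is only an illustrative sketch: it assumes raw per-token attention scores, normalizes them with a softmax per text, and averages each token's weight over the training texts. The function names are hypothetical, and averaging is an assumed aggregation rule.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw attention scores of one text into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_token_weights(
    examples: Sequence[Tuple[Sequence[str], Sequence[float]]]
) -> Dict[str, float]:
    """Average each token's local attention weight over all texts containing it."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for tokens, scores in examples:
        for token, weight in zip(tokens, softmax(scores)):
            sums[token] += weight
            counts[token] += 1
    return {token: sums[token] / counts[token] for token in sums}


if __name__ == "__main__":
    train = [(["i", "love", "people"], [0.1, 2.0, 1.0]),
             (["people", "bore", "me"], [1.5, 0.5, 0.2])]
    print(global_token_weights(train))
```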

6. Generate Explanations Without LLM for All Test Examples

To generate explanations for all test examples, which are then used to build the LLM prompts, run:

python get_explanation_for_LLM.py \
  --test_csv "src/prepered_dataframes/test_full_with_ASR.csv" \
  --train_weights "train_attention_weights.pickle" \
  --test_weights "test_attention_weights.pickle" \
  --liwc_path "LIWC2007.txt" \
  --save_pickle "test_explanations.pickle"
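According to the Methodology section, explanations are built from the top positive/negative tokens and categories. A simplified sketch of that idea follows. It assumes signed per-token relevance scores as input and uses a tiny hypothetical dictionary in the LIWC style (a trailing "*" matches any suffix); it is not the real LIWC2007 dictionary, and the function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical toy dictionary in the LIWC style: a trailing "*" matches any suffix.
TOY_DICT: Dict[str, List[str]] = {
    "happ*": ["posemo", "affect"],
    "friend*": ["social"],
    "worr*": ["negemo", "anx"],
    "we": ["pronoun", "social"],
}


def categories_for(token: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Return the categories of every dictionary pattern that matches the token."""
    cats: List[str] = []
    for pattern, pattern_cats in dictionary.items():
        if pattern.endswith("*"):
            matched = token.startswith(pattern[:-1])
        else:
            matched = token == pattern
        if matched:
            cats.extend(pattern_cats)
    return cats


def explain(tokens: Sequence[str], relevance: Sequence[float],
            dictionary: Dict[str, List[str]], k: int = 2) -> Dict[str, object]:
    """Pick the k most positive / negative tokens and sum relevance per category."""
    pairs = list(zip(tokens, relevance))
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    positive = [t for t, r in ranked if r > 0][:k]
    negative = [t for t, r in reversed(ranked) if r < 0][:k]
    cat_scores: Dict[str, float] = defaultdict(float)
    for token, r in pairs:
        for cat in categories_for(token, dictionary):
            cat_scores[cat] += r
    return {"positive_tokens": positive, "negative_tokens": negative,
            "category_scores": dict(cat_scores)}


if __name__ == "__main__":
    print(explain(["we", "are", "happy", "but", "worried"],
                  [0.3, 0.0, 0.5, -0.1, -0.4], TOY_DICT))
```

Summing relevance per category turns scattered token evidence into a few named linguistic indicators, which is easier to phrase in a prompt than a raw token list.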

7. Refine Predictions and Explanations with LLM for All Test Examples

Several LLMs were evaluated in four experimental setups (zero-shot, one-shot, few-shot, and explanation-based). In terms of performance measures (mACC, CCC), Falcon-H1-7B-Instruct outperformed the other models (see Figure 2).

Figure 2: Performance measures of the evaluated LLMs. ZS, OS, FS and EX refer to the zero-shot, one-shot, few-shot and explanation-based setups; T denotes the thinking mode.
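For reference, the two performance measures can be computed as sketched below. The sketch assumes that accuracy follows the ChaLearn First Impressions convention (1 minus the mean absolute error) and that mACC and the reported CCC are averaged over the five traits; whether measures.py does exactly this is an assumption. The function names are hypothetical.

```python
from statistics import fmean
from typing import Dict, Mapping, Sequence


def ccc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Concordance Correlation Coefficient with population (biased) moments."""
    n = len(y_true)
    mt, mp = fmean(y_true), fmean(y_pred)
    var_t = sum((t - mt) ** 2 for t in y_true) / n
    var_p = sum((p - mp) ** 2 for p in y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mt - mp) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2 * cov / denom


def accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Accuracy as 1 - mean absolute error (ChaLearn First Impressions convention)."""
    return 1 - fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def mean_scores(true_by_trait: Mapping[str, Sequence[float]],
                pred_by_trait: Mapping[str, Sequence[float]]) -> Dict[str, float]:
    """Average accuracy and CCC over the traits (keys such as "O", "C", "E", "A", "N")."""
    traits = list(true_by_trait)
    return {
        "mACC": fmean(accuracy(true_by_trait[t], pred_by_trait[t]) for t in traits),
        "mCCC": fmean(ccc(true_by_trait[t], pred_by_trait[t]) for t in traits),
    }
```

The contrast between the two numbers in the abstract (mACC 0.891, CCC 0.333) is typical: predictions clustered near the mean score give a small absolute error but little agreement with the spread of the labels, which CCC penalizes.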

To obtain refined predictions and explanations with an LLM, run:

python refine_with_llm.py \
  --prompt_type explanation \
  --prompt_pickle "test_explanations.pickle" \
  --input_csv "src/prepered_dataframes/test_full_with_ASR.csv" \
  --output_csv out_expl.csv \
  --log_file log_expl.txt \
  --llm_model_id tiiuae/Falcon-H1-7B-Instruct
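The explanation-based setup places the linguistic evidence in the prompt and reads refined scores back from the LLM's answer. The actual template lives in refine_with_llm.py and is not shown here; the sketch below uses a hypothetical template and a hypothetical "Trait: score" answer format only to illustrate the round trip.

```python
import re
from typing import Dict

TRAIT_NAMES = ["Openness", "Conscientiousness", "Extraversion",
               "Agreeableness", "Non-Neuroticism"]


def build_prompt(transcript: str, explanation: str, initial: Dict[str, float]) -> str:
    """Hypothetical explanation-based prompt (not the repository's template)."""
    lines = [
        "You are an expert in personality psychology.",
        f"Transcript: {transcript}",
        f"Linguistic evidence: {explanation}",
        "Initial predictions (ignore them unless the evidence supports them):",
    ]
    lines += [f"{name}: {initial[name]:.3f}" for name in TRAIT_NAMES]
    lines.append("Return one line per trait as '<Trait>: <score in [0, 1]>', "
                 "then a 200-word explanation.")
    return "\n".join(lines)


# One "Trait: score" pair per line; anything else in the answer is ignored.
SCORE_RE = re.compile(
    r"^[ \t]*(" + "|".join(map(re.escape, TRAIT_NAMES)) + r")[ \t]*:[ \t]*([01](?:\.\d+)?)[ \t]*$",
    re.MULTILINE,
)


def parse_scores(llm_output: str) -> Dict[str, float]:
    """Extract refined scores, clipped to [0, 1]; missing traits are absent from the result."""
    return {name: min(max(float(value), 0.0), 1.0)
            for name, value in SCORE_RE.findall(llm_output)}


if __name__ == "__main__":
    answer = "Openness: 0.62\nExtraversion: 0.71\nThe speaker uses many social words..."
    print(parse_scores(answer))
```

A strict, line-anchored answer format keeps parsing simple; any trait the model omits can then fall back to the fusion model's prediction.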

8. Generate Explanations With / Without LLM for One Example

For a specific video and trait:

python get_explanation_with_LLM.py \
  --train_weights "train_attention_weights.pickle" \
  --test_weights "test_attention_weights.pickle" \
  --video_name "BSfClgoqf00.001" \
  --test_csv "test_full_with_ASR.csv" \
  --run_llm \
  --llm_model_path "tiiuae/Falcon-H1-7B-Instruct" \
  --output_dir "results/BSfClgoqf00.001"

Methodology

  1. Data Preprocessing
  • Text normalization: lowercasing, contraction expansion, and punctuation removal.
  • Tokenization and embedding extraction with XLM-RoBERTa / JINA / BERT / BGE.
  • Token-level LIWC feature extraction via dictionary matching.
  2. Model Architecture
  • Single Models: BiLSTM + Attention (BiLSTMAtt) / Residual + BiLSTM + Attention (ReBiLSTMAtt) / Mamba + Attention (MambaAtt) / Residual + Mamba + Attention (ReMambaAtt), trained separately for each feature type.
  • Fusion Model: concatenates the predictions of the single models based on deep and hand-crafted features → single dense layer → sigmoid output.
  3. Interpretability
  • Global attention weights aggregated across the training set.
  • Local token-level attention weights, normalized and visualized.
  • Explanations generated from the top positive/negative tokens and categories.
  4. LLM Integration
  • The explanation-based prompt instructs the LLM to reassess the trait scores using the provided explanation, disregarding the initial predictions unless the explanation supports them.
  • Output: refined scores + a natural-language explanation (200 words).
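The text normalization step listed under Data Preprocessing (lowercasing, contraction expansion, punctuation removal) can be sketched with the standard library as follows. The contraction table is a small hypothetical subset, not the mapping used in text_preprocessing.py.

```python
import re

# Hypothetical subset of a contraction map; a real pipeline would use a fuller list.
CONTRACTIONS = {
    "i'm": "i am",
    "aren't": "are not",
    "can't": "cannot",
    "don't": "do not",
    "it's": "it is",
    "you're": "you are",
    "won't": "will not",
}

_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b")
_PUNCT_RE = re.compile(r"[^\w\s]")  # anything that is not a word character or whitespace


def normalize(text: str) -> str:
    """Lowercase, expand known contractions, drop punctuation, collapse whitespace."""
    text = text.lower().replace("\u2019", "'")  # unify curly apostrophes first
    text = _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(1)], text)
    text = _PUNCT_RE.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize("I'm happy, aren't you?"))
```

Contractions are expanded before punctuation is removed; otherwise the apostrophes would be stripped first and forms such as "aren't" could no longer be matched.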

Citations

If you use this work, please cite the following paper (currently under review):

@article{ryumina2025ber,
  title   = {ExPAM: Explainable Personality Assessment Method using Heterogeneous Linguistic Features and Off-the-Shelf LLMs},
  author  = {Ryumina, Elena and Ryumin, Dmitry and Markitantov, Maxim and Karpov, Alexey},
  journal = {PeerJ Computer Science},
  year    = {2026},
  note    = {Under review}
}

License

This project is released under the MIT License — see LICENSE for details.
