PersonaProfiler

PersonaProfiler is a small project that uses large language models (LLMs) to analyze conversations exported from ChatGPT (or any other provider) and deduce personal information about the user. The goal is to demonstrate how digital dialogues can reveal sensitive data: not only what is shared directly, but also what can be inferred from context.

I wrote a detailed Medium story about this project and its findings, using my own data. You can check it here.

Setup

(Optional) Create a virtual environment.
Install the dependencies: pip install -r requirements.txt
Export your conversations from ChatGPT into chatGPT-data.
- You are free to rename this folder, but you will also need to update the scripts accordingly.
- The scripts only support reading ChatGPT exports, so if you are using another provider you will need to adapt the loading code. In particular, the ChatGPT export contains a conversations.json file, which the scripts use to load the conversations.
Rename .env.example to .env and add your OpenAI API key to it.
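If you need to adapt the loading code, the following is a minimal sketch of reading conversations.json. It is not the code from the scripts. The function names are hypothetical, and the export layout it relies on (a list of conversations, each with a "mapping" of nodes that hold a "message" with "author", "create_time" and "content.parts") is an assumption you should check against your own export.

```python
import json
from pathlib import Path


def load_conversations(export_dir: str = "chatGPT-data") -> list[dict]:
    """Load the raw list of conversations from the export folder."""
    path = Path(export_dir) / "conversations.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def extract_messages(conversation: dict) -> list[tuple[str, str]]:
    """Return (role, text) pairs for one conversation, oldest first.

    Assumed layout: conversation["mapping"] maps node ids to nodes, and a
    node may carry a "message" with an author role and text parts.
    """
    timed = []
    for node in conversation.get("mapping", {}).values():
        message = node.get("message")
        if not message:
            continue  # structural nodes without a message
        role = (message.get("author") or {}).get("role", "unknown")
        parts = (message.get("content") or {}).get("parts") or []
        # Keep only plain-text parts; skip images and other structured parts.
        text = "\n".join(p for p in parts if isinstance(p, str)).strip()
        if text:
            timed.append((message.get("create_time") or 0, role, text))
    timed.sort(key=lambda item: item[0])
    return [(role, text) for _, role, text in timed]
```

For another provider, only extract_messages would need to change, as long as it still returns (role, text) pairs.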

Running the code

The pipeline consists of three steps, each with a corresponding script:

Information Extraction (profile_extractor.py): For each conversation, a model (e.g., GPT-4 variants) is prompted to extract personal details and compile a running summary of deductions. Both the conversation and the latest summary are fed to the model, so the summary keeps growing and being corrected. After analyzing a conversation, the summary is written to outputs/{model_name}/summary_{i}, where i is the index of the conversation. The final summary contains the information from all conversations. Intermediate summaries are stored in case connectivity issues interrupt a run.
Profiling (profile_aggregator.py): Another model uses the comprehensive summary to generate a final profile in JSON format, covering aspects like personal attributes, professional background, lifestyle, and additional inferences.
Person Finder (webperson_finder.py): A model with web search capability attempts to locate the user online based on the profile details.
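The running-summary idea in the first step can be sketched as below. This is an illustration, not the code in profile_extractor.py. The model call is replaced by a hypothetical update_summary callable, so the sketch runs without an API key. Resuming from stored summaries is an assumption about how the intermediate files might be used.

```python
from pathlib import Path
from typing import Callable, Sequence


def run_extraction(
    conversations: Sequence[str],
    update_summary: Callable[[str, str], str],
    model_name: str = "gpt-4o",
    output_root: str = "outputs",
) -> str:
    """Fold conversations into one summary, checkpointing after each one.

    update_summary(previous_summary, conversation) stands in for the LLM
    call and must return the corrected, extended summary.
    """
    out_dir = Path(output_root) / model_name
    out_dir.mkdir(parents=True, exist_ok=True)

    summary = ""
    for i, conversation in enumerate(conversations):
        checkpoint = out_dir / f"summary_{i}"
        if checkpoint.exists():
            # Assumed resume behavior: reuse work saved before an interruption.
            summary = checkpoint.read_text(encoding="utf-8")
            continue
        summary = update_summary(summary, conversation)
        checkpoint.write_text(summary, encoding="utf-8")
    return summary
```

Because every step depends on the previous summary, the conversations have to be processed in order. The checkpoint files are what make it cheap to restart after a failed request.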

Feel free to change the OpenAI model used in each step by specifying it in the main of the corresponding script. By default, information extraction uses gpt-4o, profiling uses o3-mini, and the person finder uses gpt-4o with web search capabilities.

A Jupyter notebook, data_analysis.ipynb, is provided with some code to inspect the conversations and all the model outputs.
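When inspecting the outputs by hand, it can help to list the stored summaries in conversation order. The helper below is a hypothetical sketch and not part of the notebook. It only assumes the outputs/{model_name}/summary_{i} naming described above, and sorts numerically so that summary_10 comes after summary_9.

```python
from pathlib import Path


def list_summaries(model_name: str = "gpt-4o", output_root: str = "outputs"):
    """Return (index, path) pairs for stored summaries, sorted by index."""
    out_dir = Path(output_root) / model_name
    indexed = []
    for path in out_dir.glob("summary_*"):
        # stem drops a file extension if there is one, e.g. summary_3.txt
        suffix = path.stem.split("_")[-1]
        if suffix.isdigit():
            indexed.append((int(suffix), path))
    return sorted(indexed)
```

The last entry of the returned list points to the most complete summary, which is the one the profiling step builds on.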

Disclaimer

This project is intended for educational purposes only. It demonstrates the potential privacy implications of using chatbots and LLMs. Please handle all personal data responsibly and avoid publicly sharing sensitive information.
