This project is a desktop application that analyzes medical texts (in PDF format) to extract clinical entities. It uses the Apache cTAKES clinical text analysis system to identify entities like medications, symptoms, diseases, anatomical sites, and procedures.
- PDF to Text Conversion: Converts uploaded PDF files into plain text using the CloudConvert API.
- Clinical Entity Extraction: Identifies and categorizes clinical entities from the text using a cTAKES server.
- Entity Enrichment: Enriches the extracted entities with additional information from:
  - SNOMED CT: Retrieves standard medical terms and concepts.
  - Wikipedia: Fetches descriptions and images.
  - PubMed: Searches for and retrieves related medical articles.
- GUI: A PyQt5-based graphical user interface for file upload, entity visualization, and information access.
- Data Storage: Saves extracted data and summaries in JSON and text formats.
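To make the enrichment step more concrete, here is a minimal sketch of how lookup requests for an extracted entity could be built. It is not the project's actual code: the function names are hypothetical, and it assumes direct calls to the public NCBI E-utilities ESearch endpoint and the Wikipedia REST summary endpoint rather than the wrappers in apis/.

```python
from urllib.parse import quote, urlencode

# Public endpoints; the project's own modules may wrap these differently.
PUBMED_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
WIKIPEDIA_SUMMARY = "https://en.wikipedia.org/api/rest_v1/page/summary/"


def pubmed_search_url(term: str, max_results: int = 5) -> str:
    """Build an ESearch URL that asks PubMed for matching article IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": max_results}
    return f"{PUBMED_ESEARCH}?{urlencode(params)}"


def wikipedia_summary_url(title: str) -> str:
    """Build a Wikipedia REST URL for a page summary (hypothetical helper)."""
    # Wikipedia page titles use underscores in place of spaces.
    return WIKIPEDIA_SUMMARY + quote(title.replace(" ", "_"), safe="")
```

Either URL can then be fetched with requests.get(...). Building the URL separately from the network call keeps the lookup logic easy to test offline.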
- Clone the repository:
    git clone https://github.com/your-username/Doctor-s-Basecode-UCL.git
- Install the required dependencies:
    pip install -r requirements.txt
- Run the application:
    python3 ctakes.py
- The application will open a window. Click "Choose file" to upload a PDF file.
- Optionally, check the "Download Articles" and "Wikipedia Descriptions" boxes to fetch additional information.
- Click "Run" to start the analysis.
- Once the analysis is complete, a new window will display the extracted entities, categorized into:
  - Medicine
  - Symptoms
  - Anatomy
  - Procedures
  - Diseases/Disorders
- Click an entity to view its SNOMED code, then use the "Wikipedia" or "Articles" button to view the fetched information.
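The five categories shown in the results window correspond to cTAKES annotation types. The sketch below shows one way raw annotations could be grouped into those categories. The response shape (a dict keyed by annotation type, each holding a list of mentions with a "text" field) and the function name are assumptions for illustration; the actual server response and the project's own code may differ.

```python
from collections import defaultdict

# cTAKES annotation type -> category label shown in the results window.
CATEGORY_BY_TYPE = {
    "MedicationMention": "Medicine",
    "SignSymptomMention": "Symptoms",
    "AnatomicalSiteMention": "Anatomy",
    "ProcedureMention": "Procedures",
    "DiseaseDisorderMention": "Diseases/Disorders",
}


def group_entities(ctakes_response: dict) -> dict:
    """Group mention texts by display category (hypothetical helper)."""
    grouped = defaultdict(list)
    for mention_type, mentions in ctakes_response.items():
        category = CATEGORY_BY_TYPE.get(mention_type)
        if category is None:
            continue  # skip annotation types the GUI does not display
        for mention in mentions:
            text = mention.get("text", "").strip().lower()
            # Keep each entity once per category, in first-seen order.
            if text and text not in grouped[category]:
                grouped[category].append(text)
    return dict(grouped)
```

Lowercasing before the duplicate check is a design choice in this sketch, so that "Aspirin" and "aspirin" appear as a single entry.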
├── apis/
│ ├── cloudconvert_api.py
│ ├── entrez_api.py
│ ├── snomed_api.py
│ └── wikipedia_api.py
├── classes/
│ ├── entity_class.py
│ └── summary_class.py
├── data/
├── gui/
│ └── gui.py
├── tests/
├── ctakes.py
├── requirements.txt
└── README.md
- ctakes.py: The main script that orchestrates the entire process.
- gui/gui.py: Defines the PyQt5 GUI.
- apis/: Modules for interacting with external APIs.
- classes/: Contains the Entity and Summary classes.
- tests/: Unit tests for the application.
- data/: Directory for storing extracted data and summaries.
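As a rough illustration of how the Entity and Summary classes might relate to the files written to data/, here is a minimal sketch. The field names, the save method, and the output file naming are all hypothetical and are not taken from classes/entity_class.py or classes/summary_class.py.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Entity:
    # Hypothetical fields; the real Entity class may store more or different data.
    text: str
    category: str
    snomed_code: str = ""


@dataclass
class Summary:
    source_file: str
    entities: list = field(default_factory=list)

    def save(self, directory: str = "data") -> Path:
        """Write the summary as JSON, named after the source PDF."""
        out_dir = Path(directory)
        out_dir.mkdir(parents=True, exist_ok=True)
        out_path = out_dir / (Path(self.source_file).stem + ".json")
        # asdict() converts nested Entity objects into plain dicts.
        out_path.write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")
        return out_path
```

With these definitions, Summary("report.pdf", [Entity("aspirin", "Medicine")]).save() would write data/report.json.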
The project uses the following libraries:
- requests
- wikipedia
- cloudconvert
- PyQt5
- package
This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE.md file for details.