Professional data processing, analysis, and OCR extraction system.
Python 3.10+ | MIT License | Code style: black
Refined projects that demonstrate technical growth and professional architecture:
- Natural Language Processing with Disaster Tweets - Advanced NLP with transformers
- Public Data Pipeline for Business Insights - Professional ETL pipeline in the cloud
- SQL for Budget Data Analysis - Government data analysis with optimized SQL
- Kotlin Data Pipeline - Polyglot pipeline with Kotlin
Repository with professional modules for data engineering, data analysis, and OCR information extraction. Integrated system for Brazilian fiscal invoice processing, database management, and structured data analysis.
- CSV Handler (src/csv_handler.py) - Safe CSV file processing with path validation
- Database Manager (src/database.py) - SQLAlchemy manager with parameterized queries (SQL injection safe)
- Invoice OCR (src/invoice_ocr.py) - Consolidated PDF invoice data extractor with Tesseract + OpenCV
- Invoice Analysis (src/invoice_analysis.py) - Invoice data analysis and Excel export
TRABALHOSPython/
├── src/
│ ├── csv_handler.py # CSV manager
│ ├── database.py # Database manager
│ ├── invoice_ocr.py # OCR extractor (consolidated)
│ └── invoice_analysis.py # Invoice analysis
├── tests/ # Unit tests
├── examples/ # Example scripts
├── requirements.txt # Project dependencies
├── .gitignore # Professional gitignore
└── README.md # This file
# Clone the repository
git clone https://github.com/bellDataSc/TRABALHOSPython.git
cd TRABALHOSPython
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Setup tesseract (if using OCR)
# Windows: https://github.com/UB-Mannheim/tesseract/wiki
# Linux: sudo apt-get install tesseract-ocr
# Mac: brew install tesseract

from src.csv_handler import read_csv, save_csv
# Read CSV
df = read_csv('data.csv')
# Save CSV
save_csv(df, 'output.csv')

from src.database import Database
db = Database("sqlite:///database.db")
# Safe query (parameterized)
df = db.query(
"SELECT * FROM users WHERE name LIKE :pattern",
{"pattern": "%John%"}
)
db.close()

from pathlib import Path
from src.invoice_ocr import InvoiceOCR
ocr = InvoiceOCR(dpi=300)
df = ocr.process_directory(
directory=Path("pdfs"),
output_excel=Path("result.xlsx")
)

- Type hints in all functions
- Google-style docstrings
- Context managers for resource management
- Parameterized queries (SQL injection safe)
- Dataclasses for data structures
- Structured logging
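
Several of the practices listed above (type hints, a context manager for resource cleanup, and a parameterized query) can be sketched together as follows. This sketch uses the standard-library `sqlite3` module rather than the repository's SQLAlchemy-based `Database` class, and the names `open_db` and `find_users` are hypothetical.

```python
import logging
import sqlite3
from collections.abc import Iterator
from contextlib import contextmanager

logger = logging.getLogger(__name__)


@contextmanager
def open_db(url: str = ":memory:") -> Iterator[sqlite3.Connection]:
    """Yield a SQLite connection and close it even if the body raises."""
    conn = sqlite3.connect(url)
    try:
        yield conn
    finally:
        conn.close()
        logger.info("connection closed: %s", url)


def find_users(conn: sqlite3.Connection, pattern: str) -> list[str]:
    """Return user names matching a LIKE pattern bound as a named parameter."""
    rows = conn.execute(
        # The pattern is passed separately, never interpolated into the SQL.
        "SELECT name FROM users WHERE name LIKE :pattern ORDER BY name",
        {"pattern": pattern},
    )
    return [name for (name,) in rows]


with open_db() as conn:
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (:name)",
        [{"name": "John Smith"}, {"name": "Mary Johnson"}, {"name": "Ana Lima"}],
    )
    matches = find_users(conn, "%John%")
```

Because the pattern travels as a bound parameter, input such as `%' OR '1'='1` is treated as a literal search string and not as SQL.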
| Original | Refactored | Benefit |
|---|---|---|
| Pandas.py | csv_handler.py | Type hints + error handling |
| Query.py | database.py | SQLAlchemy + security |
| dadosdog.py + extracdog.py | invoice_ocr.py | Consolidated into 1 file |
| script.py | invoice_analysis.py | Dataclasses + type hints |
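
To give a sense of what "Dataclasses + type hints" in the last row might look like, here is a small sketch. The `InvoiceRecord` class, its fields, and the sample values are invented for illustration and are not taken from `src/invoice_analysis.py`.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class InvoiceRecord:
    """One extracted invoice (hypothetical fields)."""

    number: str
    issuer: str
    total: Decimal


def total_by_issuer(records: list[InvoiceRecord]) -> dict[str, Decimal]:
    """Sum invoice totals per issuer."""
    totals: dict[str, Decimal] = {}
    for record in records:
        totals[record.issuer] = totals.get(record.issuer, Decimal("0")) + record.total
    return totals


# Made-up sample data, only to exercise the function.
records = [
    InvoiceRecord("001", "ACME Ltda", Decimal("150.00")),
    InvoiceRecord("002", "ACME Ltda", Decimal("49.90")),
    InvoiceRecord("003", "Beta SA", Decimal("10.10")),
]
summary = total_by_issuer(records)
```

Using `Decimal` instead of `float` for monetary totals avoids binary rounding surprises when amounts are summed.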
# Run tests
pytest tests/
# With coverage
pytest --cov=src tests/
# Type checking
mypy src/
# Linting
flake8 src/
black --check src/

- Fork the project
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.
Isabel Cruz - @bellDataSc
LinkedIn: belcruz | Medium: @belgon | GitHub: bellDataSc