Commit Graph

8 Commits

SHA1 Message Date
0976909cc8 rabbitmq 2026-01-15 21:20:57 +08:00
0ed3c8f94d chore(job_crawler): update docker-compose to use pre-built image
- Replace inline Dockerfile build configuration with pre-built image reference
- Change app service to use `job-crawler:latest` image instead of building from context
- Simplifies docker-compose configuration and enables faster container startup
- Assumes the image is built separately via the deploy script with the `--no-cache` flag (see the compose sketch after this entry)
2026-01-15 20:44:31 +08:00
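A minimal sketch of what the updated `app` service block could look like after this change; the service name, restart policy, and env file are assumptions, not taken from the repository:

```yaml
# Hypothetical excerpt of docker-compose.yml after the change:
# the inline `build:` section is gone; the service references the
# image that the deploy script builds separately.
services:
  app:
    image: job-crawler:latest   # pre-built by the deploy script
    restart: unless-stopped     # assumed restart policy
    env_file:
      - .env                    # assumed environment file
    depends_on:
      - kafka                   # Kafka service defined in the same file
```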
1d8778037f chore(job_crawler): add no-cache flag to Docker build in deploy script
- Add --no-cache flag to docker build command to ensure fresh image builds
- Prevents cached layers from being reused, guaranteeing that the latest dependencies are installed
- Improves the reliability of the deployment process by avoiding stale build artifacts (see the sketch after this entry)
2026-01-15 20:32:58 +08:00
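The corresponding deploy-script line is presumably along these lines (a sketch; the tag matches the `job-crawler:latest` reference in docker-compose.yml):

```bash
# Build a fresh image, ignoring all cached layers.
docker build --no-cache -t job-crawler:latest .
```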
53288327a1 chore(job_crawler): enhance deploy script with Kafka logging and reset functionality
- Add logs-kafka command to view Kafka container logs separately
- Implement reset command to clean data volumes and reinitialize services
- Add a confirmation prompt before the destructive reset operation (see the sketch after this entry)
- Update help text to clarify logs command shows application logs
- Improve command case statement alignment for better readability
- Add documentation for new reset command with data volume cleanup
- Separate clean command documentation to focus on image pruning only
2026-01-15 18:03:21 +08:00
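A sketch of how the new commands could slot into the script's case statement; the compose service names and the exact cleanup steps are assumptions:

```bash
# Hypothetical excerpt of deploy.sh; service names are illustrative.
case "$1" in
    logs)       docker-compose logs -f app ;;     # application logs only
    logs-kafka) docker-compose logs -f kafka ;;   # Kafka container logs
    reset)
        # Destructive: confirm before wiping data volumes.
        read -r -p "This deletes all data volumes. Continue? [y/N] " answer
        if [ "$answer" = "y" ] || [ "$answer" = "Y" ]; then
            docker-compose down -v   # stop services and remove volumes
            docker-compose up -d     # reinitialize services
        fi
        ;;
esac
```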
3cacaf040a feat(job_crawler): enhance logging and tracking for data filtering and Kafka production
- Add logging when API returns empty data to track offset progression
- Track expired job count separately from valid filtered jobs
- Initialize produced counter to handle cases with no filtered jobs
- Consolidate logging into a single comprehensive info log per batch (see the sketch after this entry)
- Log includes: total fetched, valid, expired, and Kafka-produced counts
- Improves observability for debugging data flow and filtering efficiency
2026-01-15 17:59:12 +08:00
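A sketch of the consolidated per-batch summary; `keep`, `produce`, and the logger name are hypothetical stand-ins for whatever the service actually uses:

```python
import logging

logger = logging.getLogger("job_crawler")  # assumed logger name

def process_batch(jobs: list[dict], keep, produce) -> None:
    """Filter one batch and emit a single consolidated info log (sketch).

    `keep` is the date-filter predicate and `produce` sends jobs to Kafka,
    returning the number of messages produced; both are hypothetical
    injection points rather than names from the actual code.
    """
    valid = [job for job in jobs if keep(job)]
    expired = len(jobs) - len(valid)

    produced = 0               # initialized up front so the summary log is
    if valid:                  # correct even when no jobs pass the filter
        produced = produce(valid)

    logger.info(
        "batch done: fetched=%d valid=%d expired=%d produced=%d",
        len(jobs), len(valid), expired, produced,
    )
```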
3acc0a9221 feat(job_crawler): implement reverse-order incremental crawling with real-time Kafka publishing
- Add comprehensive sequence diagrams documenting container startup, task initialization, and incremental crawling flow
- Implement reverse-order crawling logic (from latest to oldest) to optimize performance by processing new data first
- Add real-time Kafka message publishing after each batch filtering instead of waiting for task completion
- Update progress tracking to store last_start_offset for accurate incremental crawling across sessions
- Enhance crawler service with improved offset calculation and batch processing logic
- Update configuration files to support new crawling parameters and Kafka integration
- Add progress model enhancements to track crawling state and handle edge cases
- Improve main application initialization to properly handle lifespan events and task auto-start
This change enables efficient incremental data collection: new data is prioritized and published immediately, reducing latency and improving system responsiveness (the reverse-order walk is sketched after this entry).
2026-01-15 17:46:55 +08:00
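A sketch of the reverse-order walk under simplifying assumptions (batch-aligned offsets; `fetch_page`, `publish`, and `save_progress` are hypothetical injection points):

```python
def crawl_incremental(total: int, batch: int, last_start_offset: int,
                      fetch_page, publish, save_progress) -> None:
    """Crawl from the newest offset back toward the oldest (sketch).

    Stops at `last_start_offset`, the newest offset already covered by a
    previous session, so only new data is fetched.
    """
    newest = max(total - batch, 0)   # start offset of the newest page
    offset = newest
    while offset >= last_start_offset:
        jobs = fetch_page(offset=offset, limit=batch)
        if jobs:
            publish(jobs)            # publish each batch immediately rather
        else:                        # than after the whole task completes
            print(f"empty page at offset={offset}, continuing")
        offset -= batch              # step toward older data

    save_progress(last_start_offset=newest)  # next session stops here
```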
63cd432a0c docs(job_crawler): add deployment guide and scripts for Linux/Mac/Windows
- Add comprehensive DEPLOY.md with quick start instructions for all platforms
- Add deploy.sh script for Linux/Mac with build, up, down, restart, logs, status, and clean commands (usage sketched after this entry)
- Add deploy.bat script for Windows with equivalent deployment commands
- Include manual deployment steps using docker and docker-compose
- Document configuration setup and environment variables
- Add production environment recommendations for external Kafka, data persistence, and logging
- Include troubleshooting section for common deployment issues
- Provide health check and service status verification commands
2026-01-15 17:12:51 +08:00
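Taken together, the commands above suggest usage along these lines (a sketch; exact output and flags depend on the actual script):

```bash
./deploy.sh build     # build the Docker image
./deploy.sh up        # start all services in the background
./deploy.sh status    # verify service health
./deploy.sh logs      # follow application logs
./deploy.sh restart   # restart services
./deploy.sh down      # stop and remove containers
./deploy.sh clean     # prune unused images
```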
ae681575b9 feat(job_crawler): initialize job crawler service with Kafka integration
- Add technical documentation (技术方案.md) with system architecture and design details
- Create FastAPI application structure with modular organization (api, core, models, services, utils)
- Implement job data crawler service with incremental collection from third-party API
- Add Kafka service integration with Docker Compose configuration for message queue
- Create data models for job listings, progress tracking, and API responses
- Implement REST API endpoints for data consumption (/consume, /status) and task management
- Add progress persistence layer using SQLite for tracking collection offsets
- Implement date filtering logic to extract data published within the last 7 days (see the sketch after this entry)
- Create API client service for third-party data source integration
- Add configuration management with environment-based settings
- Include Docker support with Dockerfile and docker-compose.yml for containerized deployment
- Add logging configuration and utility functions for date parsing
- Include requirements.txt with all Python dependencies and README documentation
2026-01-15 17:09:43 +08:00
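A minimal sketch of the 7-day filter, assuming a string timestamp field; the format and the naive-UTC handling are illustrative, not taken from the crawler code:

```python
from datetime import datetime, timedelta, timezone

def within_seven_days(published_at: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> bool:
    """Return True if a job's publish date falls within the last 7 days (sketch)."""
    published = datetime.strptime(published_at, fmt).replace(tzinfo=timezone.utc)
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)
    return published >= cutoff
```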