Commit Graph

5 Commits

1a5d14e0e7 Revert "rabbitmq"
This reverts commit 0976909cc8.
2026-01-15 22:08:12 +08:00
0976909cc8 rabbitmq 2026-01-15 21:20:57 +08:00
3cacaf040a feat(job_crawler): enhance logging and tracking for data filtering and Kafka production
- Add logging when API returns empty data to track offset progression
- Track expired job count separately from valid filtered jobs
- Initialize produced counter to handle cases with no filtered jobs
- Consolidate logging into single comprehensive info log per batch
- Log includes: total fetched, valid, expired, and Kafka-produced counts
- Improves observability for debugging data flow and filtering efficiency
2026-01-15 17:59:12 +08:00
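The batch-counting logic this commit describes (track expired jobs separately from valid ones, pre-initialize the produced counter, emit one consolidated info log) can be sketched roughly as below. The names `summarize_batch` and `is_expired` are hypothetical, not taken from the repository, and actually producing to Kafka is stubbed out.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("job_crawler")

def summarize_batch(jobs, is_expired):
    """Split a fetched batch into valid and expired jobs, then emit a
    single consolidated log line per batch (hypothetical helper)."""
    produced = 0  # initialized up front so batches with no valid jobs still report 0
    valid = []
    expired = 0
    for job in jobs:
        if is_expired(job):
            expired += 1
        else:
            valid.append(job)
    # In the real service this is where valid jobs would be produced to Kafka;
    # here we only count them.
    produced = len(valid)
    logger.info(
        "batch done: fetched=%d valid=%d expired=%d produced=%d",
        len(jobs), len(valid), expired, produced,
    )
    return len(jobs), len(valid), expired, produced
```

One log line per batch keeps the four counts adjacent, which makes filtering efficiency easy to eyeball in aggregated logs.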
3acc0a9221 feat(job_crawler): implement reverse-order incremental crawling with real-time Kafka publishing
- Add comprehensive sequence diagrams documenting container startup, task initialization, and incremental crawling flow
- Implement reverse-order crawling logic (from latest to oldest) to optimize performance by processing new data first
- Add real-time Kafka message publishing after each batch filtering instead of waiting for task completion
- Update progress tracking to store last_start_offset for accurate incremental crawling across sessions
- Enhance crawler service with improved offset calculation and batch processing logic
- Update configuration files to support new crawling parameters and Kafka integration
- Add progress model enhancements to track crawling state and handle edge cases
- Improve main application initialization to properly handle lifespan events and task auto-start
This change enables efficient incremental data collection: new data is prioritized and published immediately, reducing latency and improving system responsiveness.
2026-01-15 17:46:55 +08:00
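The reverse-order crawl this commit describes (walk from the latest records down toward the point reached by the previous session, recorded as `last_start_offset`) could be planned roughly as follows. This is a sketch under the assumption that the third-party API is offset-paginated with the newest records at the highest offsets; `plan_reverse_offsets` is a hypothetical name, not from the repository.

```python
def plan_reverse_offsets(total, batch_size, last_start_offset=None):
    """Return (offset, limit) pages from newest to oldest, stopping at the
    offset the previous crawl already reached (hypothetical helper)."""
    # Data below this floor was covered by an earlier session.
    floor = last_start_offset if last_start_offset is not None else 0
    pages = []
    offset = total  # newest records sit at the top of the offset range (assumption)
    while offset > floor:
        start = max(offset - batch_size, floor)
        pages.append((start, offset - start))
        offset = start
    return pages
```

Persisting the last page's start offset after each batch is what lets the next session resume without re-fetching already-seen data.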
ae681575b9 feat(job_crawler): initialize job crawler service with kafka integration
- Add technical documentation (技术方案.md) with system architecture and design details
- Create FastAPI application structure with modular organization (api, core, models, services, utils)
- Implement job data crawler service with incremental collection from third-party API
- Add Kafka service integration with Docker Compose configuration for message queue
- Create data models for job listings, progress tracking, and API responses
- Implement REST API endpoints for data consumption (/consume, /status) and task management
- Add progress persistence layer using SQLite for tracking collection offsets
- Implement date filtering logic to extract data published within 7 days
- Create API client service for third-party data source integration
- Add configuration management with environment-based settings
- Include Docker support with Dockerfile and docker-compose.yml for containerized deployment
- Add logging configuration and utility functions for date parsing
- Include requirements.txt with all Python dependencies and README documentation
2026-01-15 17:09:43 +08:00
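The date-filtering rule in this commit (keep only data published within 7 days) amounts to a simple cutoff check. A minimal sketch, assuming timezone-aware publish timestamps; `within_last_week` is a hypothetical name, not from the repository.

```python
from datetime import datetime, timedelta, timezone

def within_last_week(published_at, now=None, days=7):
    """Return True if `published_at` falls within the last `days` days
    (hypothetical helper; expects timezone-aware datetimes)."""
    now = now or datetime.now(timezone.utc)
    return now - published_at <= timedelta(days=days)
```

Passing `now` explicitly keeps the check deterministic in tests and lets the crawler evaluate a whole batch against one snapshot of the clock.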