Commit Graph

5 Commits

1a5d14e0e7 Revert "rabbitmq"
This reverts commit 0976909cc8.
2026-01-15 22:08:12 +08:00
0976909cc8 rabbitmq 2026-01-15 21:20:57 +08:00
3cacaf040a feat(job_crawler): enhance logging and tracking for data filtering and Kafka production
- Add logging when API returns empty data to track offset progression
- Track expired job count separately from valid filtered jobs
- Initialize produced counter to handle cases with no filtered jobs
- Consolidate logging into single comprehensive info log per batch
- Log includes: total fetched, valid, expired, and Kafka-produced counts
- Improves observability for debugging data flow and filtering efficiency
2026-01-15 17:59:12 +08:00
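The batch-counting logic this commit describes (track expired jobs separately from valid ones, pre-initialize the produced counter, emit one consolidated info log) can be sketched roughly as below. The names `summarize_batch` and `is_expired` are hypothetical, not taken from the repository, and actually producing to Kafka is stubbed out.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("job_crawler")

def summarize_batch(jobs, is_expired):
    """Split a fetched batch into valid and expired jobs, then emit a
    single consolidated log line per batch (hypothetical helper)."""
    produced = 0  # initialized up front so batches with no valid jobs still report 0
    valid = []
    expired = 0
    for job in jobs:
        if is_expired(job):
            expired += 1
        else:
            valid.append(job)
    # In the real service this is where valid jobs would be produced to Kafka;
    # here we only count them.
    produced = len(valid)
    logger.info(
        "batch done: fetched=%d valid=%d expired=%d produced=%d",
        len(jobs), len(valid), expired, produced,
    )
    return len(jobs), len(valid), expired, produced
```

One log line per batch keeps the four counts adjacent, which makes filtering efficiency easy to eyeball in aggregated logs.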
3acc0a9221 feat(job_crawler): implement reverse-order incremental crawling with real-time Kafka publishing
- Add comprehensive sequence diagrams documenting container startup, task initialization, and incremental crawling flow
- Implement reverse-order crawling logic (from latest to oldest) to optimize performance by processing new data first
- Add real-time Kafka message publishing after each batch filtering instead of waiting for task completion
- Update progress tracking to store last_start_offset for accurate incremental crawling across sessions
- Enhance crawler service with improved offset calculation and batch processing logic
- Update configuration files to support new crawling parameters and Kafka integration
- Add progress model enhancements to track crawling state and handle edge cases
- Improve main application initialization to properly handle lifespan events and task auto-start
This change enables efficient incremental data collection: new data is prioritized and published immediately, reducing latency and improving system responsiveness.
2026-01-15 17:46:55 +08:00
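The reverse-order crawl this commit describes (walk from the latest records down toward the point reached by the previous session, recorded as `last_start_offset`) could be planned roughly as follows. This is a sketch under the assumption that the third-party API is offset-paginated with the newest records at the highest offsets; `plan_reverse_offsets` is a hypothetical name, not from the repository.

```python
def plan_reverse_offsets(total, batch_size, last_start_offset=None):
    """Return (offset, limit) pages from newest to oldest, stopping at the
    offset the previous crawl already reached (hypothetical helper)."""
    # Data below this floor was covered by an earlier session.
    floor = last_start_offset if last_start_offset is not None else 0
    pages = []
    offset = total  # newest records sit at the top of the offset range (assumption)
    while offset > floor:
        start = max(offset - batch_size, floor)
        pages.append((start, offset - start))
        offset = start
    return pages
```

Persisting the last page's start offset after each batch is what lets the next session resume without re-fetching already-seen data.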
ae681575b9 feat(job_crawler): initialize job crawler service with kafka integration
- Add technical documentation (技术方案.md) with system architecture and design details
- Create FastAPI application structure with modular organization (api, core, models, services, utils)
- Implement job data crawler service with incremental collection from third-party API
- Add Kafka service integration with Docker Compose configuration for message queue
- Create data models for job listings, progress tracking, and API responses
- Implement REST API endpoints for data consumption (/consume, /status) and task management
- Add progress persistence layer using SQLite for tracking collection offsets
- Implement date filtering logic to extract data published within 7 days
- Create API client service for third-party data source integration
- Add configuration management with environment-based settings
- Include Docker support with Dockerfile and docker-compose.yml for containerized deployment
- Add logging configuration and utility functions for date parsing
- Include requirements.txt with all Python dependencies and README documentation
2026-01-15 17:09:43 +08:00
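The date-filtering rule in this commit (keep only data published within 7 days) amounts to a simple cutoff check. A minimal sketch, assuming timezone-aware publish timestamps; `within_last_week` is a hypothetical name, not from the repository.

```python
from datetime import datetime, timedelta, timezone

def within_last_week(published_at, now=None, days=7):
    """Return True if `published_at` falls within the last `days` days
    (hypothetical helper; expects timezone-aware datetimes)."""
    now = now or datetime.now(timezone.utc)
    return now - published_at <= timedelta(days=days)
```

Passing `now` explicitly keeps the check deterministic in tests and lets the crawler evaluate a whole batch against one snapshot of the clock.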