Commit Graph

8 Commits

SHA1 Message Date
0976909cc8 rabbitmq 2026-01-15 21:20:57 +08:00
0ed3c8f94d chore(job_crawler): update docker-compose to use pre-built image
- Replace inline Dockerfile build configuration with pre-built image reference
- Change app service to use `job-crawler:latest` image instead of building from context
- Simplifies docker-compose configuration and enables faster container startup
- Assumes the image is built separately via the deploy script with the `--no-cache` flag (see the compose sketch after this entry)
2026-01-15 20:44:31 +08:00
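A minimal sketch of what the updated `app` service block could look like after this change; the service name, restart policy, and env file are assumptions, not taken from the repository:

```yaml
# Hypothetical excerpt of docker-compose.yml after the change:
# the inline `build:` section is gone; the service references the
# image that the deploy script builds separately.
services:
  app:
    image: job-crawler:latest   # pre-built by the deploy script
    restart: unless-stopped     # assumed restart policy
    env_file:
      - .env                    # assumed environment file
    depends_on:
      - kafka                   # Kafka service defined in the same file
```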
1d8778037f chore(job_crawler): add no-cache flag to Docker build in deploy script
- Add --no-cache flag to docker build command to ensure fresh image builds
- Prevents cached layers from being reused, guaranteeing that the latest dependencies are installed
- Improves the reliability of the deployment process by avoiding stale build artifacts (see the sketch after this entry)
2026-01-15 20:32:58 +08:00
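The corresponding deploy-script line is presumably along these lines (a sketch; the tag matches the `job-crawler:latest` reference in docker-compose.yml):

```bash
# Build a fresh image, ignoring all cached layers.
docker build --no-cache -t job-crawler:latest .
```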
53288327a1 chore(job_crawler): enhance deploy script with Kafka logging and reset functionality
- Add logs-kafka command to view Kafka container logs separately
- Implement reset command to clean data volumes and reinitialize services
- Add a confirmation prompt before the destructive reset operation (see the sketch after this entry)
- Update help text to clarify logs command shows application logs
- Improve command case statement alignment for better readability
- Add documentation for new reset command with data volume cleanup
- Separate clean command documentation to focus on image pruning only
2026-01-15 18:03:21 +08:00
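A sketch of how the new commands could slot into the script's case statement; the compose service names and the exact cleanup steps are assumptions:

```bash
# Hypothetical excerpt of deploy.sh; service names are illustrative.
case "$1" in
    logs)       docker-compose logs -f app ;;     # application logs only
    logs-kafka) docker-compose logs -f kafka ;;   # Kafka container logs
    reset)
        # Destructive: confirm before wiping data volumes.
        read -r -p "This deletes all data volumes. Continue? [y/N] " answer
        if [ "$answer" = "y" ] || [ "$answer" = "Y" ]; then
            docker-compose down -v   # stop services and remove volumes
            docker-compose up -d     # reinitialize services
        fi
        ;;
esac
```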
3cacaf040a feat(job_crawler): enhance logging and tracking for data filtering and Kafka production
- Add logging when API returns empty data to track offset progression
- Track expired job count separately from valid filtered jobs
- Initialize produced counter to handle cases with no filtered jobs
- Consolidate logging into a single comprehensive info log per batch (see the sketch after this entry)
- Log includes: total fetched, valid, expired, and Kafka-produced counts
- Improves observability for debugging data flow and filtering efficiency
2026-01-15 17:59:12 +08:00
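A sketch of the consolidated per-batch summary; `keep`, `produce`, and the logger name are hypothetical stand-ins for whatever the service actually uses:

```python
import logging

logger = logging.getLogger("job_crawler")  # assumed logger name

def process_batch(jobs: list[dict], keep, produce) -> None:
    """Filter one batch and emit a single consolidated info log (sketch).

    `keep` is the date-filter predicate and `produce` sends jobs to Kafka,
    returning the number of messages produced; both are hypothetical
    injection points rather than names from the actual code.
    """
    valid = [job for job in jobs if keep(job)]
    expired = len(jobs) - len(valid)

    produced = 0               # initialized up front so the summary log is
    if valid:                  # correct even when no jobs pass the filter
        produced = produce(valid)

    logger.info(
        "batch done: fetched=%d valid=%d expired=%d produced=%d",
        len(jobs), len(valid), expired, produced,
    )
```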
3acc0a9221 feat(job_crawler): implement reverse-order incremental crawling with real-time Kafka publishing
- Add comprehensive sequence diagrams documenting container startup, task initialization, and incremental crawling flow
- Implement reverse-order crawling logic (from latest to oldest) to optimize performance by processing new data first
- Add real-time Kafka message publishing after each batch filtering instead of waiting for task completion
- Update progress tracking to store last_start_offset for accurate incremental crawling across sessions
- Enhance crawler service with improved offset calculation and batch processing logic
- Update configuration files to support new crawling parameters and Kafka integration
- Add progress model enhancements to track crawling state and handle edge cases
- Improve main application initialization to properly handle lifespan events and task auto-start
This change enables efficient incremental data collection: new data is prioritized and published immediately, reducing latency and improving system responsiveness (the reverse-order walk is sketched after this entry).
2026-01-15 17:46:55 +08:00
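A sketch of the reverse-order walk under simplifying assumptions (batch-aligned offsets; `fetch_page`, `publish`, and `save_progress` are hypothetical injection points):

```python
def crawl_incremental(total: int, batch: int, last_start_offset: int,
                      fetch_page, publish, save_progress) -> None:
    """Crawl from the newest offset back toward the oldest (sketch).

    Stops at `last_start_offset`, the newest offset already covered by a
    previous session, so only new data is fetched.
    """
    newest = max(total - batch, 0)   # start offset of the newest page
    offset = newest
    while offset >= last_start_offset:
        jobs = fetch_page(offset=offset, limit=batch)
        if jobs:
            publish(jobs)            # publish each batch immediately rather
        else:                        # than after the whole task completes
            print(f"empty page at offset={offset}, continuing")
        offset -= batch              # step toward older data

    save_progress(last_start_offset=newest)  # next session stops here
```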
63cd432a0c docs(job_crawler): add deployment guide and scripts for Linux/Mac/Windows
- Add comprehensive DEPLOY.md with quick start instructions for all platforms
- Add deploy.sh script for Linux/Mac with build, up, down, restart, logs, status, and clean commands (usage sketched after this entry)
- Add deploy.bat script for Windows with equivalent deployment commands
- Include manual deployment steps using docker and docker-compose
- Document configuration setup and environment variables
- Add production environment recommendations for external Kafka, data persistence, and logging
- Include troubleshooting section for common deployment issues
- Provide health check and service status verification commands
2026-01-15 17:12:51 +08:00
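Taken together, the commands above suggest usage along these lines (a sketch; exact output and flags depend on the actual script):

```bash
./deploy.sh build     # build the Docker image
./deploy.sh up        # start all services in the background
./deploy.sh status    # verify service health
./deploy.sh logs      # follow application logs
./deploy.sh restart   # restart services
./deploy.sh down      # stop and remove containers
./deploy.sh clean     # prune unused images
```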
ae681575b9 feat(job_crawler): initialize job crawler service with Kafka integration
- Add technical documentation (技术方案.md) with system architecture and design details
- Create FastAPI application structure with modular organization (api, core, models, services, utils)
- Implement job data crawler service with incremental collection from third-party API
- Add Kafka service integration with Docker Compose configuration for message queue
- Create data models for job listings, progress tracking, and API responses
- Implement REST API endpoints for data consumption (/consume, /status) and task management
- Add progress persistence layer using SQLite for tracking collection offsets
- Implement date filtering logic to extract data published within the last 7 days (see the sketch after this entry)
- Create API client service for third-party data source integration
- Add configuration management with environment-based settings
- Include Docker support with Dockerfile and docker-compose.yml for containerized deployment
- Add logging configuration and utility functions for date parsing
- Include requirements.txt with all Python dependencies and README documentation
2026-01-15 17:09:43 +08:00
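A minimal sketch of the 7-day filter, assuming a string timestamp field; the format and the naive-UTC handling are illustrative, not taken from the crawler code:

```python
from datetime import datetime, timedelta, timezone

def within_seven_days(published_at: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> bool:
    """Return True if a job's publish date falls within the last 7 days (sketch)."""
    published = datetime.strptime(published_at, fmt).replace(tzinfo=timezone.utc)
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)
    return published >= cutoff
```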