- Replace inline Dockerfile build configuration with pre-built image reference
- Change app service to use `job-crawler:latest` image instead of building from context
- Simplifies docker-compose configuration and enables faster container startup
- Assumes the image is built separately via the deploy script using the `--no-cache` flag
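The resulting service definition might look like the following sketch (the service name, image tag, and `kafka` dependency are illustrative, assuming the deploy script tags the image `job-crawler:latest`):

```yaml
# docker-compose.yml (fragment): reference a pre-built image
services:
  app:
    # was an inline build:
    #   build:
    #     context: .
    #     dockerfile: Dockerfile
    image: job-crawler:latest
    depends_on:
      - kafka
```

Since the compose file no longer builds anything, `docker-compose up` only has to pull or reuse the tagged image, which is what makes startup faster.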
- Add --no-cache flag to docker build command to ensure fresh image builds
- Prevents stale cached layers from being reused, guaranteeing up-to-date dependencies
- Improves reliability of deployment process by avoiding stale artifacts
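A minimal sketch of the corresponding build step in the deploy script (the image name and tag are assumptions, not taken from the actual script):

```sh
# deploy.sh (fragment): build the application image without reusing
# cached layers, so dependency updates are always picked up.
docker build --no-cache -t job-crawler:latest .
```

The trade-off is longer build times, since every layer is rebuilt from scratch on each deploy.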
- Add logs-kafka command to view Kafka container logs separately
- Implement reset command to clean data volumes and reinitialize services
- Add confirmation prompt for destructive reset operation
- Update help text to clarify logs command shows application logs
- Improve command case statement alignment for better readability
- Add documentation for new reset command with data volume cleanup
- Separate clean command documentation to focus on image pruning only
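The destructive `reset` command described above might be sketched like this (the function name, volume handling via `docker-compose down -v`, and prompt wording are assumptions; the point is that the confirmation guards the data-wiping step):

```sh
# deploy.sh (fragment): destructive reset with confirmation prompt
reset_services() {
    printf "This will DELETE all data volumes. Continue? [y/N] "
    read -r answer
    case "$answer" in
        [yY]*)
            docker-compose down -v   # stop services and remove data volumes
            docker-compose up -d     # reinitialize services from scratch
            ;;
        *)
            echo "Reset cancelled."
            ;;
    esac
}
```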
- Add logging when API returns empty data to track offset progression
- Track expired job count separately from valid filtered jobs
- Initialize produced counter to handle cases with no filtered jobs
- Consolidate logging into single comprehensive info log per batch
- Log includes: total fetched, valid, expired, and Kafka-produced counts
- Improves observability for debugging data flow and filtering efficiency
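The per-batch accounting might be sketched as follows (the freshness predicate `is_fresh` and the Kafka `produce` callable are hypothetical names standing in for the real filter and producer):

```python
import logging

logger = logging.getLogger("crawler")

def process_batch(jobs, is_fresh, produce):
    """Filter one batch, publish fresh jobs, and emit one consolidated log line."""
    if not jobs:
        # Empty API response: log it so offset progression can be audited.
        logger.info("API returned empty data for this batch")

    valid = [j for j in jobs if is_fresh(j)]
    expired = len(jobs) - len(valid)   # expired count tracked separately

    produced = 0                       # initialized so the count exists even
    for job in valid:                  # when nothing passed the filter
        produce(job)
        produced += 1

    # One comprehensive info log per batch instead of many scattered lines.
    logger.info("batch: fetched=%d valid=%d expired=%d produced=%d",
                len(jobs), len(valid), expired, produced)
    return {"fetched": len(jobs), "valid": len(valid),
            "expired": expired, "produced": produced}
```

Returning the counts (rather than only logging them) also makes the batch accounting easy to unit-test.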
- Add comprehensive sequence diagrams documenting container startup, task initialization, and incremental crawling flow
- Implement reverse-order crawling logic (from latest to oldest) to optimize performance by processing new data first
- Add real-time Kafka message publishing after each batch filtering instead of waiting for task completion
- Update progress tracking to store last_start_offset for accurate incremental crawling across sessions
- Enhance crawler service with improved offset calculation and batch processing logic
- Update configuration files to support new crawling parameters and Kafka integration
- Add progress model enhancements to track crawling state and handle edge cases
- Improve main application initialization to properly handle lifespan events and task auto-start
This change enables efficient incremental data collection where new data is prioritized and published immediately, reducing latency and improving system responsiveness.
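The reverse-order, publish-per-batch flow might be sketched as follows (the names `fetch_page`, `publish_batch`, and `last_start_offset` are illustrative, not the service's actual API; offsets are assumed to grow with newer data):

```python
def crawl_incremental(fetch_page, publish_batch, total, batch_size, progress):
    """Crawl from the newest offset down toward the last crawled position.

    `progress["last_start_offset"]` records where the previous session began,
    so this session only covers offsets newer than that.
    """
    stop_at = progress.get("last_start_offset", 0)  # oldest offset still needed
    start = total                                   # newest data lives at the end

    offset = start
    while offset > stop_at:
        lo = max(offset - batch_size, stop_at)
        batch = fetch_page(lo, offset - lo)         # fetch the range [lo, offset)
        # Publish right after filtering each batch instead of at task
        # completion, so downstream consumers see new data with low latency.
        publish_batch(batch)
        offset = lo

    # Persist where this session started so the next run is incremental.
    progress["last_start_offset"] = start
    return progress
```

Because the newest range is fetched and published first, an interrupted session still delivers the freshest data, and the persisted `last_start_offset` keeps later sessions from re-crawling old offsets.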
- Add comprehensive DEPLOY.md with quick start instructions for all platforms
- Add deploy.sh script for Linux/Mac with build, up, down, restart, logs, status, and clean commands
- Add deploy.bat script for Windows with equivalent deployment commands
- Include manual deployment steps using docker and docker-compose
- Document configuration setup and environment variables
- Add production environment recommendations for external Kafka, data persistence, and logging
- Include troubleshooting section for common deployment issues
- Provide health check and service status verification commands
- Add technical documentation (技术方案.md, "technical design") with system architecture and design details
- Create FastAPI application structure with modular organization (api, core, models, services, utils)
- Implement job data crawler service with incremental collection from third-party API
- Add Kafka service integration with Docker Compose configuration for message queue
- Create data models for job listings, progress tracking, and API responses
- Implement REST API endpoints for data consumption (/consume, /status) and task management
- Add progress persistence layer using SQLite for tracking collection offsets
- Implement date filtering logic to extract data published within the last 7 days
- Create API client service for third-party data source integration
- Add configuration management with environment-based settings
- Include Docker support with Dockerfile and docker-compose.yml for containerized deployment
- Add logging configuration and utility functions for date parsing
- Include requirements.txt with all Python dependencies and README documentation
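The 7-day filter from the list above might be implemented roughly like this (the `publish_date` field name and ISO-8601 timestamp format are assumptions about the third-party API):

```python
from datetime import datetime, timedelta, timezone

def is_within_days(publish_date, days=7, now=None):
    """Return True if an ISO-8601 `publish_date` falls within the last `days` days."""
    now = now or datetime.now(timezone.utc)
    published = datetime.fromisoformat(publish_date)
    if published.tzinfo is None:
        # Assume naive timestamps from the API are UTC.
        published = published.replace(tzinfo=timezone.utc)
    return published >= now - timedelta(days=days)

def filter_recent(jobs, days=7):
    """Keep only jobs published within the window; drop expired ones."""
    return [j for j in jobs if is_within_days(j["publish_date"], days)]
```

Taking `now` as a parameter keeps the cutoff testable; in production it defaults to the current UTC time.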