A comprehensive intelligence retrieval system for collecting and aggregating information about Nairobi, Kenya from multiple sources.
Features:
- Multi-source data collection (news, social media, government, tourism, business)
- RESTful API built with FastAPI
- Automated scheduling for continuous data collection
- Intelligence brief generation
- Real-time trending-topic tracking
- Alert system for important updates
- Web scraping with rate limiting and caching
- Social media integration (Twitter, Instagram)
- NLP-powered categorization and processing
- Docker support for easy deployment
- CLI for manual operations
Components:
- Data models with SQLAlchemy
- Base collector class with an extensible architecture
- Source-specific collectors (news, social, government, tourism, business)
- Data processor for brief generation
- Scheduler for automated collection
- Comprehensive API endpoints
- CLI interface for manual control
Documentation:
- Complete README with setup instructions
- Quick start guide
- Example usage scripts
- Docker Compose configuration
- Environment configuration templates
Quick Start Guide
Get the Nairobi Information Collector up and running in minutes!
Prerequisites
- Python 3.9+ or Docker
- PostgreSQL (optional, SQLite works for development)
- API keys for various services (optional but recommended)
Installation
Option 1: Using Docker (Recommended)
# Clone the repository
git clone <repository-url>
cd nairobi-info-collector
# Copy environment file
cp .env.example .env
# Edit .env with your API keys
nano .env
# Start with Docker Compose
docker-compose up -d
# Check logs
docker-compose logs -f app
The API will be available at http://localhost:8000
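Containers can take a few seconds to start accepting connections. Below is a minimal, stdlib-only sketch that polls the API until it responds; the `/docs` URL comes from this guide, while the `probe` parameter is purely an illustrative hook for testing and is not part of the project itself:

```python
import time
import urllib.error
import urllib.request

def wait_for_api(url="http://localhost:8000/docs", timeout=30.0, probe=None):
    """Poll `url` every half second until it answers or `timeout` elapses.

    `probe` is injectable for testing; by default it issues a real HTTP GET
    and treats any successful response as "up".
    """
    if probe is None:
        def probe(u):
            try:
                urllib.request.urlopen(u, timeout=2)
                return True
            except (urllib.error.URLError, OSError):
                return False
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(0.5)
    return False
```

Call `wait_for_api()` after `docker-compose up -d`; it returns `True` as soon as the docs page answers, or `False` after the timeout.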
Option 2: Local Installation
# Clone the repository
git clone <repository-url>
cd nairobi-info-collector
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download NLP model
python -m spacy download en_core_web_sm
# Copy and configure environment
cp .env.example .env
nano .env
# Initialize database
python cli.py init-db
# Run the application
python -m app.main
Configuration
Required API Keys
Edit .env and add your API keys:
# Social Media (optional but recommended)
TWITTER_BEARER_TOKEN=your_twitter_bearer_token
GOOGLE_MAPS_API_KEY=your_google_maps_key
# NLP Processing (optional)
OPENAI_API_KEY=your_openai_key
# Database (for production)
DATABASE_URL=postgresql://user:password@localhost:5432/nairobi_info
Free Tier Options
You can start without API keys:
- News collection works without keys (web scraping)
- Government data works without keys
- Social media requires API keys
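The key-gating above can be sketched in a few lines. This is an illustrative sketch only, assuming the environment variable names from the `.env` example; the SQLite fallback filename and the collector names are placeholders, not the project's actual defaults (see `app/config.py` for the real list):

```python
import os

def database_url(env=os.environ):
    """Fall back to SQLite for development when DATABASE_URL is unset.

    The SQLite filename here is illustrative, not the project's default.
    """
    return env.get("DATABASE_URL", "sqlite:///./nairobi_info.db")

def enabled_collectors(env=os.environ):
    """News, government, tourism, and business collectors need no keys;
    social media only runs when a bearer token is configured.
    """
    collectors = ["news", "government", "tourism", "business"]
    if env.get("TWITTER_BEARER_TOKEN"):
        collectors.append("twitter")
    return collectors
```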
Usage
Web API
- Access the API documentation: open http://localhost:8000/docs in your browser for an interactive Swagger UI with all endpoints
- Get the latest brief:
curl http://localhost:8000/api/v1/brief/latest
- Search for information:
curl "http://localhost:8000/api/v1/search?q=restaurant&category=food"
- Get trending topics:
curl http://localhost:8000/api/v1/trending
Command Line Interface
# Collect news
python cli.py collect news
# Collect from all sources
python cli.py collect all
# Generate a brief
python cli.py brief --hours 24 --output brief.md
# Collect social media (requires API keys)
python cli.py collect social --platform twitter
Testing
Manual Collection Test
# Test news collection
python cli.py collect news
# Check the database
python -c "from app.database import SessionLocal; from app.models.data_models import InformationItem; db = SessionLocal(); print(f'Items collected: {db.query(InformationItem).count()}')"
Generate a Brief
# Generate and save brief
python cli.py brief --output my_brief.md
# View the brief
cat my_brief.md
Accessing the Data
Via API
import requests
# Get latest brief
response = requests.get("http://localhost:8000/api/v1/brief/latest")
brief = response.json()
# Search
response = requests.get(
    "http://localhost:8000/api/v1/search",
    params={"q": "nairobi", "limit": 10},
)
results = response.json()
Via Database
from app.database import SessionLocal
from app.models.data_models import InformationItem
db = SessionLocal()
items = db.query(InformationItem).limit(10).all()
for item in items:
    print(f"{item.title} - {item.category}")
db.close()
Automation
The application automatically:
- Collects data every 5 minutes (configurable)
- Generates briefs every 6 hours
- Updates trending topics in real-time
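The periodic behavior above boils down to a timer loop. Here is a simplified, stdlib-only stand-in for such a scheduler, assuming hypothetical `task` callables for collection and brief generation; the real scheduler reads its interval from `COLLECTION_INTERVAL_SECONDS` and may be implemented differently:

```python
import threading

def run_periodically(task, interval_seconds, stop_event):
    """Call `task` every `interval_seconds` until `stop_event` is set.

    Event.wait returns False on timeout (run the task again) and True
    once the event is set (exit the loop cleanly).
    """
    while not stop_event.wait(interval_seconds):
        task()
```

Starting one such loop per job (collection every 300 s, briefs every 6 h) reproduces the schedule described above; setting the event shuts a loop down without killing a thread mid-task.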
To change collection frequency:
# In .env
COLLECTION_INTERVAL_SECONDS=300 # 5 minutes
Troubleshooting
Database connection errors
# Check PostgreSQL is running
docker-compose ps
# Reset database
docker-compose down -v
docker-compose up -d
No data being collected
- Check logs: docker-compose logs -f app
- Verify network connectivity
- Check API keys in .env
- Try manual collection: python cli.py collect news
Import errors
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
Next Steps
- Add API Keys: Configure Twitter, Google Maps, etc. for more data sources
- Customize Sources: Edit app/config.py to add or remove sources
- Set Up Monitoring: Configure Sentry for error tracking
- Deploy to Production: Use Docker Compose with proper environment variables
API Documentation
Full API documentation available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Support
For issues and questions:
- Check logs: tail -f logs/app.log
- View API health: http://localhost:8000/api/v1/health
- See stats: http://localhost:8000/api/v1/stats
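For scripted checks, the support endpoints above can be fetched with the standard library alone. A small sketch, assuming only the base URL and paths from this guide (the response schema is not specified here):

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # the base URL used throughout this guide

def endpoint(path, base=API_BASE):
    """Build the full URL for one of the support endpoints above."""
    return f"{base}/api/v1/{path.lstrip('/')}"

def fetch_json(path, base=API_BASE):
    """GET a support endpoint (e.g. 'health' or 'stats') and decode its JSON body."""
    with urllib.request.urlopen(endpoint(path, base)) as resp:
        return json.loads(resp.read())
```

For example, `fetch_json("health")` returns the health payload once the app is running locally.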