A comprehensive intelligence retrieval system for collecting and aggregating information about Nairobi, Kenya from multiple sources.
Features:
- Multi-source data collection (news, social media, government, tourism, business)
- RESTful API built with FastAPI
- Automated scheduling for continuous data collection
- Intelligence brief generation
- Real-time trending-topic tracking
- Alert system for important updates
- Web scraping with rate limiting and caching
- Social media integration (Twitter, Instagram)
- NLP-powered categorization and processing
- Docker support for easy deployment
- CLI for manual operations
Components:
- Data models with SQLAlchemy
- Base collector class with an extensible architecture
- Source-specific collectors (news, social, government, tourism, business)
- Data processor for brief generation
- Scheduler for automated collection
- Comprehensive API endpoints
- CLI interface for manual control
Documentation:
- Complete README with setup instructions
- Quick start guide
- Example usage scripts
- Docker Compose configuration
- Environment configuration templates
Quick Start Guide
Get the Nairobi Information Collector up and running in minutes!
Prerequisites
- Python 3.9+ or Docker
- PostgreSQL (optional, SQLite works for development)
- API keys for various services (optional but recommended)
Installation
Option 1: Using Docker (Recommended)
# Clone the repository
git clone <repository-url>
cd nairobi-info-collector
# Copy environment file
cp .env.example .env
# Edit .env with your API keys
nano .env
# Start with Docker Compose
docker-compose up -d
# Check logs
docker-compose logs -f app
The API will be available at http://localhost:8000
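Containers can take a few seconds to start accepting connections. Below is a minimal, stdlib-only sketch that polls the API until it responds; the `/docs` URL comes from this guide, while the `probe` parameter is purely an illustrative hook for testing and is not part of the project itself:

```python
import time
import urllib.error
import urllib.request

def wait_for_api(url="http://localhost:8000/docs", timeout=30.0, probe=None):
    """Poll `url` every half second until it answers or `timeout` elapses.

    `probe` is injectable for testing; by default it issues a real HTTP GET
    and treats any successful response as "up".
    """
    if probe is None:
        def probe(u):
            try:
                urllib.request.urlopen(u, timeout=2)
                return True
            except (urllib.error.URLError, OSError):
                return False
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(0.5)
    return False
```

Call `wait_for_api()` after `docker-compose up -d`; it returns `True` as soon as the docs page answers, or `False` after the timeout.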
Option 2: Local Installation
# Clone the repository
git clone <repository-url>
cd nairobi-info-collector
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download NLP model
python -m spacy download en_core_web_sm
# Copy and configure environment
cp .env.example .env
nano .env
# Initialize database
python cli.py init-db
# Run the application
python -m app.main
Configuration
Required API Keys
Edit .env and add your API keys:
# Social Media (optional but recommended)
TWITTER_BEARER_TOKEN=your_twitter_bearer_token
GOOGLE_MAPS_API_KEY=your_google_maps_key
# NLP Processing (optional)
OPENAI_API_KEY=your_openai_key
# Database (for production)
DATABASE_URL=postgresql://user:password@localhost:5432/nairobi_info
Free Tier Options
You can start without API keys:
- News collection works without keys (web scraping)
- Government data works without keys
- Social media requires API keys
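The key-gating above can be sketched in a few lines. This is an illustrative sketch only, assuming the environment variable names from the `.env` example; the SQLite fallback filename and the collector names are placeholders, not the project's actual defaults (see `app/config.py` for the real list):

```python
import os

def database_url(env=os.environ):
    """Fall back to SQLite for development when DATABASE_URL is unset.

    The SQLite filename here is illustrative, not the project's default.
    """
    return env.get("DATABASE_URL", "sqlite:///./nairobi_info.db")

def enabled_collectors(env=os.environ):
    """News, government, tourism, and business collectors need no keys;
    social media only runs when a bearer token is configured.
    """
    collectors = ["news", "government", "tourism", "business"]
    if env.get("TWITTER_BEARER_TOKEN"):
        collectors.append("twitter")
    return collectors
```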
Usage
Web API
- Access the API documentation: open http://localhost:8000/docs in your browser for an interactive Swagger UI with all endpoints
- Get the latest brief:
curl http://localhost:8000/api/v1/brief/latest
- Search for information:
curl "http://localhost:8000/api/v1/search?q=restaurant&category=food"
- Get trending topics:
curl http://localhost:8000/api/v1/trending
Command Line Interface
# Collect news
python cli.py collect news
# Collect from all sources
python cli.py collect all
# Generate a brief
python cli.py brief --hours 24 --output brief.md
# Collect social media (requires API keys)
python cli.py collect social --platform twitter
Testing
Manual Collection Test
# Test news collection
python cli.py collect news
# Check the database
python -c "from app.database import SessionLocal; from app.models.data_models import InformationItem; db = SessionLocal(); print(f'Items collected: {db.query(InformationItem).count()}')"
Generate a Brief
# Generate and save brief
python cli.py brief --output my_brief.md
# View the brief
cat my_brief.md
Accessing the Data
Via API
import requests
# Get latest brief
response = requests.get("http://localhost:8000/api/v1/brief/latest")
brief = response.json()
# Search
response = requests.get(
    "http://localhost:8000/api/v1/search",
    params={"q": "nairobi", "limit": 10},
)
results = response.json()
Via Database
from app.database import SessionLocal
from app.models.data_models import InformationItem
db = SessionLocal()
items = db.query(InformationItem).limit(10).all()
for item in items:
    print(f"{item.title} - {item.category}")
db.close()
Automation
The application automatically:
- Collects data every 5 minutes (configurable)
- Generates briefs every 6 hours
- Updates trending topics in real-time
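The periodic behavior above boils down to a timer loop. Here is a simplified, stdlib-only stand-in for such a scheduler, assuming hypothetical `task` callables for collection and brief generation; the real scheduler reads its interval from `COLLECTION_INTERVAL_SECONDS` and may be implemented differently:

```python
import threading

def run_periodically(task, interval_seconds, stop_event):
    """Call `task` every `interval_seconds` until `stop_event` is set.

    Event.wait returns False on timeout (run the task again) and True
    once the event is set (exit the loop cleanly).
    """
    while not stop_event.wait(interval_seconds):
        task()
```

Starting one such loop per job (collection every 300 s, briefs every 6 h) reproduces the schedule described above; setting the event shuts a loop down without killing a thread mid-task.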
To change collection frequency:
# In .env
COLLECTION_INTERVAL_SECONDS=300 # 5 minutes
Troubleshooting
Database connection errors
# Check PostgreSQL is running
docker-compose ps
# Reset database
docker-compose down -v
docker-compose up -d
No data being collected
- Check logs: docker-compose logs -f app
- Verify network connectivity
- Check API keys in .env
- Try manual collection: python cli.py collect news
Import errors
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
Next Steps
- Add API Keys: Configure Twitter, Google Maps, etc. for more data sources
- Customize Sources: Edit app/config.py to add or remove sources
- Set Up Monitoring: Configure Sentry for error tracking
- Deploy to Production: Use Docker Compose with proper environment variables
API Documentation
Full API documentation available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Support
For issues and questions:
- Check logs: tail -f logs/app.log
- View API health: http://localhost:8000/api/v1/health
- See stats: http://localhost:8000/api/v1/stats
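For scripted checks, the support endpoints above can be fetched with the standard library alone. A small sketch, assuming only the base URL and paths from this guide (the response schema is not specified here):

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # the base URL used throughout this guide

def endpoint(path, base=API_BASE):
    """Build the full URL for one of the support endpoints above."""
    return f"{base}/api/v1/{path.lstrip('/')}"

def fetch_json(path, base=API_BASE):
    """GET a support endpoint (e.g. 'health' or 'stats') and decode its JSON body."""
    with urllib.request.urlopen(endpoint(path, base)) as resp:
        return json.loads(resp.read())
```

For example, `fetch_json("health")` returns the health payload once the app is running locally.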