Nairobi Information Collector
An advanced intelligence retrieval system designed to collect, verify, and synthesize comprehensive information about Nairobi, Kenya, from multiple reliable digital sources.
Features
- Multi-Source Data Collection: Gathers information from news sites, social media, government portals, tourism platforms, and business sources
- Real-Time Updates: Continuously collects and updates information
- Structured Data: Organizes information into categories (News, Events, Culture, Economy, etc.)
- RESTful API: Easy-to-use API endpoints for accessing collected data
- Automated Scheduling: Runs collectors at scheduled intervals
- Data Verification: Tracks sources and reliability levels
- Categorization: Automatically categorizes information by type
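The automatic categorization above can be sketched as a simple keyword matcher. This is a minimal illustration only — the `CATEGORY_KEYWORDS` table and `categorize` helper are hypothetical, and the actual NLP-powered pipeline is likely more sophisticated:

```python
# Hypothetical keyword-based categorizer; the real project uses NLP processing.
CATEGORY_KEYWORDS = {
    "food": ["restaurant", "cafe", "menu"],
    "events": ["festival", "concert", "exhibition"],
    "economy": ["startup", "investment", "market"],
}

def categorize(text: str, default: str = "news") -> str:
    """Return the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return default
```

A keyword table like this makes a reasonable fallback when an NLP model (or an OPENAI_API_KEY) is unavailable.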
Architecture
nairobi-info-collector/
├── app/
│   ├── main.py          # FastAPI application entry point
│   ├── config.py        # Configuration management
│   ├── models/          # Data models
│   ├── collectors/      # Source-specific data collectors
│   ├── processors/      # Data processing and NLP
│   ├── api/             # API endpoints
│   ├── database/        # Database connection and setup
│   └── scheduler/       # Task scheduling
├── requirements.txt     # Python dependencies
├── .env                 # Environment variables
└── docker-compose.yml   # Docker setup
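The collectors/ directory follows an extensible base-class pattern: each source-specific collector implements raw fetching, while normalization into a common record is shared. A minimal sketch — the class and field names here are illustrative assumptions, not the project's actual implementation:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class InfoItem:
    """Assumed normalized record shape shared by all collectors."""
    title: str
    url: str
    category: str
    source: str
    collected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class BaseCollector(ABC):
    """Common interface each source-specific collector extends."""
    source_name = "base"

    @abstractmethod
    def fetch_raw(self) -> list[dict]:
        """Retrieve raw records from the source (HTTP, API, scraping)."""

    def collect(self) -> list[InfoItem]:
        """Fetch and normalize in one call; shared by every subclass."""
        return [self.normalize(record) for record in self.fetch_raw()]

    def normalize(self, raw: dict) -> InfoItem:
        return InfoItem(
            title=raw.get("title", "").strip(),
            url=raw.get("url", ""),
            category=raw.get("category", "news"),
            source=self.source_name,
        )

class NewsCollector(BaseCollector):
    """Example subclass; a real one would scrape or call a news API."""
    source_name = "news"

    def fetch_raw(self) -> list[dict]:
        return [{"title": " Sample headline ", "url": "https://example.com/1"}]
```

Adding a new source then only requires subclassing `BaseCollector` and implementing `fetch_raw`.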
Installation
Prerequisites
- Python 3.9+
- PostgreSQL (or SQLite for development)
- Redis (for caching and task queue)
Setup
- Clone the repository:
git clone <repository-url>
cd nairobi-info-collector
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Configure environment variables:
cp .env.example .env
# Edit .env with your configuration
- Initialize the database:
python -m app.database.db init
- Run the application:
uvicorn app.main:app --reload
Using Docker
docker-compose up -d
API Endpoints
Get Latest Brief
GET /api/v1/brief/latest
Returns the most recent intelligence brief.
Get Information by Category
GET /api/v1/info/{category}
Categories: news, events, culture, economy, food, social, travel, places, community
Search Information
GET /api/v1/search?q={query}&category={category}&from={date}&to={date}
Get Trending Topics
GET /api/v1/trending
Get Real-Time Alerts
GET /api/v1/alerts
Data Sources
News & Media
- Nation Africa
- Standard Media
- Citizen Digital
- BBC Africa
- Business Daily Africa
Government & Public
- Nairobi City County
- Kenya Open Data Portal
- NTSA, KCAA, KNBS
Tourism
- TripAdvisor
- Google Maps
- Airbnb Experiences
Social Media
- Twitter/X (via API)
- Instagram (via unofficial APIs)
- TikTok trending
- YouTube
Business
- TechCabal
- StartUp Kenya
- LinkedIn insights
Configuration
Edit the .env file to configure:
# Database
DATABASE_URL=postgresql://user:password@localhost:5432/nairobi_info
# API Keys
TWITTER_API_KEY=your_key
GOOGLE_MAPS_API_KEY=your_key
OPENAI_API_KEY=your_key # For NLP processing
# Collection Settings
COLLECTION_INTERVAL=300 # seconds
MAX_ITEMS_PER_SOURCE=100
# Cache
REDIS_URL=redis://localhost:6379
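Internally, config.py presumably reads these variables. A stdlib-only sketch of how they might be loaded with development defaults — the `Settings` dataclass and the default values are assumptions, not the project's actual config code:

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    database_url: str
    redis_url: str
    collection_interval: int
    max_items_per_source: int

def load_settings(env=os.environ) -> Settings:
    """Read configuration from the environment, falling back to dev defaults."""
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///nairobi_info.db"),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        collection_interval=int(env.get("COLLECTION_INTERVAL", "300")),
        max_items_per_source=int(env.get("MAX_ITEMS_PER_SOURCE", "100")),
    )
```

Accepting `env` as a parameter keeps the loader easy to test without touching the real environment.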
Usage Examples
Python Client
import requests
# Get latest brief
response = requests.get("http://localhost:8000/api/v1/brief/latest")
brief = response.json()
# Search for specific information
response = requests.get(
    "http://localhost:8000/api/v1/search",
    params={"q": "restaurant opening", "category": "food"},
)
results = response.json()
CLI
# Trigger manual collection
python -m app.collectors.run --source news
# Generate brief
python -m app.processors.generate_brief
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
Ethical Considerations
- Respects robots.txt
- Implements rate limiting
- Uses official APIs where available
- Caches responses to minimize requests
- Only collects publicly available information
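The rate limiting mentioned above can be implemented as a per-host minimum interval between requests. A minimal sketch with an injectable clock and sleep for testability — this is illustrative, not the project's actual limiter:

```python
import time

class RateLimiter:
    """Enforce a minimum delay between successive requests to the same host."""

    def __init__(self, min_interval: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: dict[str, float] = {}

    def wait(self, host: str) -> None:
        """Block until at least min_interval has passed since the last request."""
        now = self._clock()
        last = self._last.get(host)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
        self._last[host] = self._clock()
```

In practice a limiter like this would be paired with the standard library's `urllib.robotparser` to honor each site's robots.txt before fetching.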
License
MIT License
Support
For issues and questions, please open a GitHub issue.