Nairobi Information Collector
An intelligence retrieval system that collects, verifies, and synthesizes information about Nairobi, Kenya from multiple reliable digital sources.
Features
- Multi-Source Data Collection: Gathers information from news sites, social media, government portals, tourism platforms, and business sources
- Real-Time Updates: Continuously collects and updates information
- Structured Data: Organizes information into categories (News, Events, Culture, Economy, etc.)
- RESTful API: Easy-to-use API endpoints for accessing collected data
- Automated Scheduling: Runs collectors at scheduled intervals
- Data Verification: Tracks sources and reliability levels
- Categorization: Automatically categorizes information by type
Architecture
nairobi-info-collector/
├── app/
│ ├── main.py # FastAPI application entry point
│ ├── config.py # Configuration management
│ ├── models/ # Data models
│ ├── collectors/ # Source-specific data collectors
│ ├── processors/ # Data processing and NLP
│ ├── api/ # API endpoints
│ ├── database/ # Database connection and setup
│ └── scheduler/ # Task scheduling
├── requirements.txt # Python dependencies
├── .env # Environment variables
└── docker-compose.yml # Docker setup
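The collectors/ directory builds on a shared base class with an extensible architecture. As a rough sketch of that design (the names InfoItem and BaseCollector and the keyword map are illustrative assumptions, not the project's actual code), a source-specific collector might extend a base like this:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InfoItem:
    """One collected piece of information (illustrative model)."""
    title: str
    url: str
    source: str
    category: str = "news"
    collected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class BaseCollector(ABC):
    """Shared behaviour that source-specific collectors extend."""

    source_name: str = "base"

    # Naive keyword map, for illustration only; the README describes
    # NLP-powered categorization, which would replace this.
    KEYWORDS = {
        "events": ("festival", "concert", "exhibition"),
        "food": ("restaurant", "cafe", "menu"),
        "economy": ("startup", "market", "investment"),
    }

    @abstractmethod
    def fetch(self) -> list[InfoItem]:
        """Retrieve raw items from the underlying source."""

    def categorize(self, title: str) -> str:
        """Assign a category by simple keyword matching."""
        lowered = title.lower()
        for category, words in self.KEYWORDS.items():
            if any(word in lowered for word in words):
                return category
        return "news"  # default bucket
```

A news collector would then subclass BaseCollector, implement fetch() for its site, and inherit categorization and shared plumbing.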
Installation
Prerequisites
- Python 3.9+
- PostgreSQL (or SQLite for development)
- Redis (for caching and task queue)
Setup
- Clone the repository:
git clone <repository-url>
cd nairobi-info-collector
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Configure environment variables:
cp .env.example .env
# Edit .env with your configuration
- Initialize the database:
python -m app.database.db init
- Run the application:
uvicorn app.main:app --reload
Using Docker
docker-compose up -d
API Endpoints
Get Latest Brief
GET /api/v1/brief/latest
Returns the most recent intelligence brief.
Get Information by Category
GET /api/v1/info/{category}
Categories: news, events, culture, economy, food, social, travel, places, community
Search Information
GET /api/v1/search?q={query}&category={category}&from={date}&to={date}
Get Trending Topics
GET /api/v1/trending
Get Real-Time Alerts
GET /api/v1/alerts
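A client can assemble these endpoint URLs before sending a request. A minimal stdlib-only helper for the search endpoint (the base URL and parameter names follow the descriptions above; the search_url function itself is just an illustration):

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"


def search_url(q, category=None, date_from=None, date_to=None):
    """Build the /api/v1/search URL, including only the filters that are set."""
    params = {"q": q}
    if category:
        params["category"] = category
    if date_from:
        params["from"] = date_from
    if date_to:
        params["to"] = date_to
    return f"{BASE_URL}/api/v1/search?{urlencode(params)}"


print(search_url("road closure", category="news", date_from="2024-01-01"))
# http://localhost:8000/api/v1/search?q=road+closure&category=news&from=2024-01-01
```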
Data Sources
News & Media
- Nation Africa
- Standard Media
- Citizen Digital
- BBC Africa
- Business Daily Africa
Government & Public
- Nairobi City County
- Kenya Open Data Portal
- NTSA, KCAA, KNBS
Tourism
- TripAdvisor
- Google Maps
- Airbnb Experiences
Social Media
- Twitter/X (via API)
- Instagram (via unofficial APIs)
- TikTok trending
- YouTube
Business
- TechCabal
- StartUp Kenya
- LinkedIn insights
Configuration
Edit the .env file to configure:
# Database
DATABASE_URL=postgresql://user:password@localhost:5432/nairobi_info
# API Keys
TWITTER_API_KEY=your_key
GOOGLE_MAPS_API_KEY=your_key
OPENAI_API_KEY=your_key # For NLP processing
# Collection Settings
COLLECTION_INTERVAL=300 # seconds
MAX_ITEMS_PER_SOURCE=100
# Cache
REDIS_URL=redis://localhost:6379
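The config.py module can read these variables with development-friendly fallbacks. A minimal sketch, assuming a plain dataclass-based loader (the Settings fields mirror the variables above, but the actual config.py may be structured differently):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    collection_interval: int  # seconds between collection runs
    max_items_per_source: int
    redis_url: str


def load_settings(env=None) -> Settings:
    """Read settings from environment variables, falling back to dev defaults."""
    env = os.environ if env is None else env
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///nairobi_info.db"),
        collection_interval=int(env.get("COLLECTION_INTERVAL", "300")),
        max_items_per_source=int(env.get("MAX_ITEMS_PER_SOURCE", "100")),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
    )
```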
Usage Examples
Python Client
import requests
# Get latest brief
response = requests.get("http://localhost:8000/api/v1/brief/latest")
brief = response.json()
# Search for specific information
response = requests.get(
    "http://localhost:8000/api/v1/search",
    params={"q": "restaurant opening", "category": "food"},
)
results = response.json()
CLI
# Trigger manual collection
python -m app.collectors.run --source news
# Generate brief
python -m app.processors.generate_brief
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
Ethical Considerations
- Respects robots.txt
- Implements rate limiting
- Uses official APIs where available
- Caches responses to minimize requests
- Only collects publicly available information
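Two of these practices, rate limiting and robots.txt compliance, can be sketched with the standard library alone (RateLimiter is a hypothetical helper written for this example, not the project's actual class):

```python
import time
import urllib.robotparser


class RateLimiter:
    """Enforce a minimum delay between successive requests to the same host."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last_request: dict[str, float] = {}

    def wait(self, host: str) -> float:
        """Sleep if this host was hit too recently; return the seconds slept."""
        now = time.monotonic()
        last = self._last_request.get(host)
        delay = 0.0 if last is None else max(0.0, self.min_interval - (now - last))
        if delay:
            time.sleep(delay)
        self._last_request[host] = time.monotonic()
        return delay


# robots.txt check via the standard library parser
robots = urllib.robotparser.RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
```

Before each fetch, a collector would call limiter.wait(host) and skip any URL for which robots.can_fetch(user_agent, url) is False.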
License
MIT License
Support
For issues and questions, please open a GitHub issue.