# arXiv Product Validator - Complete Build
A comprehensive prompt for building the arXiv Product Validator – a Python CLI tool that validates product ideas by searching and analyzing academic research.
Create a complete Python CLI tool called "arXiv Product Validator" that helps users validate product ideas by searching academic research on arXiv.
## Project Overview
Build a command-line tool that:
1. Takes a product idea description as input
2. Generates 3-6 relevant search queries from the description
3. Searches arXiv API for academic papers across multiple categories
4. Analyzes papers to extract insights and market signals
5. Generates professional markdown reports and JSON exports
## Project Structure
Create these Python modules:
- main.py: CLI entry point with argument parsing and interactive mode
- arxiv_client.py: arXiv API wrapper with 3-second rate limiting
- query_generator.py: Converts product ideas to search queries and selects categories
- analyzer.py: Extracts themes, evidence, challenges, and market signals from papers
- reporter.py: Generates markdown and JSON reports
- requirements.txt: Python dependencies (requests>=2.31.0, feedparser>=6.0.10, python-dateutil>=2.8.2)
## CLI Features (main.py)
- Accept product idea as positional argument or prompt interactively
- Command-line options:
  - `-n, --num-papers`: Number of papers to analyze (default: 15, max: 30)
  - `-f, --format`: Output format (markdown, json, both; default: markdown)
  - `-o, --output-dir`: Custom output directory (default: validation_reports)
  - `-v, --verbose`: Show detailed progress
  - `--no-save`: Don't save reports to files
  - `-i, --interactive`: Force interactive mode
- Prompt interactively for preferences that were not supplied as options
- Progress indicators with emojis (🔍, ✅, 📊, etc.)
- Save reports with timestamps: validation_YYYYMMDD_HHMMSS.{md,json}
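
The option list above can be sketched with `argparse` from the standard library. This is a minimal sketch, not a definitive implementation: the helper names `build_parser`, `parse_args`, and `report_filename` are hypothetical, and clamping `--num-papers` into the range 1 to 30 (rather than rejecting out-of-range values) is an assumption.

```python
import argparse
from datetime import datetime
from typing import Optional

MAX_PAPERS = 30  # upper bound stated in the option list


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional idea and the options described in the spec."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Validate a product idea against arXiv research.",
    )
    # nargs="?" makes the idea optional so the tool can fall back to a prompt.
    parser.add_argument("idea", nargs="?", help="product idea description")
    parser.add_argument("-n", "--num-papers", type=int, default=15)
    parser.add_argument(
        "-f", "--format", choices=["markdown", "json", "both"], default="markdown"
    )
    parser.add_argument("-o", "--output-dir", default="validation_reports")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--no-save", action="store_true")
    parser.add_argument("-i", "--interactive", action="store_true")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Assumption: out-of-range values are clamped instead of raising an error.
    args.num_papers = max(1, min(args.num_papers, MAX_PAPERS))
    # Without a positional idea, interactive mode is the only way to get one.
    if args.idea is None:
        args.interactive = True
    return args


def report_filename(extension: str, now: Optional[datetime] = None) -> str:
    """Return validation_YYYYMMDD_HHMMSS.<extension>."""
    now = now or datetime.now()
    return f"validation_{now:%Y%m%d_%H%M%S}.{extension}"
```

Calling `parse_args()` with no arguments reads `sys.argv`, so the same function serves both the command-line and the interactive entry path.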
## arXiv Client (arxiv_client.py)
- ArxivClient class for querying arXiv API
- Methods:
  - `search(queries: list, categories: list, max_results_per_query: int)` -> list of papers
  - Each paper should have: title, authors, abstract, published_date, arxiv_id, categories, pdf_url
- Features:
  - Query the arXiv REST API with Atom feeds
  - Respect 3-second rate limiting between requests
  - Handle API errors gracefully
  - Parse XML responses using feedparser
  - Deduplicate results across multiple queries
  - Support up to 20 papers per query
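
The request-building, rate-limiting, and deduplication features above might look like the following sketch. The endpoint and the `search_query`, `start`, and `max_results` parameters come from the public arXiv API. The helper names `build_query_url`, `RateLimiter`, and `deduplicate` are hypothetical, and quoting each query as a phrase and joining categories with `OR` are assumptions. The clock and sleep functions are injectable only so the limiter can be exercised without real waiting.

```python
import time
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"
MIN_INTERVAL_SECONDS = 3.0   # rate limit from the spec
MAX_RESULTS_PER_QUERY = 20   # per-query cap from the spec


def build_query_url(query, categories, max_results):
    """Build an arXiv API URL for one query, restricted to the given categories."""
    search = f'all:"{query}"'  # assumption: match the query as a phrase in any field
    if categories:
        cat_clause = " OR ".join(f"cat:{code}" for code in categories)
        search = f"{search} AND ({cat_clause})"
    params = {
        "search_query": search,
        "start": 0,
        "max_results": min(max_results, MAX_RESULTS_PER_QUERY),
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"


class RateLimiter:
    """Blocks so that consecutive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval=MIN_INTERVAL_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        if self._last_call is not None:
            remaining = self._min_interval - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = self._clock()


def deduplicate(papers):
    """Keep the first occurrence of each arxiv_id, preserving order."""
    seen = set()
    unique = []
    for paper in papers:
        if paper["arxiv_id"] not in seen:
            seen.add(paper["arxiv_id"])
            unique.append(paper)
    return unique
```

In a full `ArxivClient.search`, one would call `wait()` before each HTTP request, fetch the URL with `requests.get(url, timeout=...)`, pass the response text to `feedparser.parse`, and run `deduplicate` over the combined results.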
## Query Generator (query_generator.py)
- QueryGenerator class to convert product ideas to search queries
  - `generate_queries(idea: str)` -> list of 3-6 search queries
  - Extract domain keywords, problem statements, and solution approaches
  - Support 20+ domain mappings (AI, e-commerce, sustainability, healthcare, etc.)
- QueryCategorizer class to select relevant arXiv categories
  - `auto_categorize(idea: str)` -> list of category codes
  - Support categories: cs.AI, cs.LG, cs.CY, econ.GN, stat.ML, q-bio.QM, math.OC, etc.
  - Return 5-10 relevant categories based on keywords
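
Keyword-based category selection could be sketched as below. The keyword-to-category table is purely illustrative (a real mapping would cover the 20+ domains mentioned above), and padding with fallback categories to reach the five-category minimum is an assumption. Keywords are matched on word boundaries so that a short keyword such as "ai" does not fire inside "sustainable".

```python
import re

# Hypothetical mapping for illustration; the real table would be much larger.
DOMAIN_CATEGORIES = {
    "ai": ["cs.AI", "cs.LG", "stat.ML"],
    "machine learning": ["cs.LG", "stat.ML"],
    "healthcare": ["q-bio.QM", "cs.CY"],
    "marketplace": ["econ.GN", "cs.CY"],
    "optimization": ["math.OC"],
    "sustainable": ["cs.CY", "econ.GN"],
}
# Assumption: these are used to pad the result up to the minimum size.
FALLBACK_CATEGORIES = ["cs.AI", "cs.LG", "cs.CY", "econ.GN", "stat.ML"]


def auto_categorize(idea, min_categories=5, max_categories=10):
    """Return arXiv category codes for an idea, most relevant first."""
    text = idea.lower()
    selected = []
    for keyword, codes in DOMAIN_CATEGORIES.items():
        # Word-boundary match avoids false hits inside longer words.
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            for code in codes:
                if code not in selected:
                    selected.append(code)
    for code in FALLBACK_CATEGORIES:
        if len(selected) >= min_categories:
            break
        if code not in selected:
            selected.append(code)
    return selected[:max_categories]
```

For example, `auto_categorize("AI-powered meal planning app")` puts the three AI-related categories first and then pads with fallbacks.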
## Paper Analyzer (analyzer.py)
- PaperAnalyzer class with analyze_papers(papers: list) -> analysis_dict
- Extract and return:
  - key_themes: Top 8 themes with frequency counts
  - supporting_evidence: Papers with solution keywords, ranked by relevance
  - challenges: Papers discussing limitations and risks
  - solution_approaches: Common technical methodologies
  - publication_timeline: Yearly distribution of papers
  - market_signals: Dict with research_volume, maturity, trend, competing_solutions, commercialization
- Research maturity levels: "Emerging", "Developing", "Mature"
- Trend detection: "Growing", "Stable", "Declining"
- Research volume levels: "Very Active" (20+), "Active" (10-19), "Emerging" (5-9), "Early Stage" (<5)
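
The volume levels, timeline, and trend labels above can be illustrated with a short sketch. The function names are hypothetical. Several choices are assumptions of this sketch rather than part of the spec: a count that sits exactly on a boundary is assigned to the higher level, `published_date` is taken to be an ISO-style string beginning with the year, and the trend compares the earlier half of the timeline with the later half using a 20% tolerance.

```python
from collections import Counter


def research_volume(paper_count):
    """Map a paper count to one of the four volume levels."""
    if paper_count >= 20:
        return "Very Active"
    if paper_count >= 10:
        return "Active"
    if paper_count >= 5:
        return "Emerging"
    return "Early Stage"


def publication_timeline(papers):
    """Count papers per publication year, sorted by year."""
    # Assumption: published_date looks like "2023-05-01..." (year first).
    counts = Counter(int(paper["published_date"][:4]) for paper in papers)
    return dict(sorted(counts.items()))


def detect_trend(timeline, tolerance=0.2):
    """Compare the earlier and later halves of the timeline."""
    years = sorted(timeline)
    if len(years) < 2:
        return "Stable"  # not enough history to call a trend
    half = len(years) // 2
    early = sum(timeline[year] for year in years[:half])
    late = sum(timeline[year] for year in years[-half:])
    if late > early * (1 + tolerance):
        return "Growing"
    if late < early * (1 - tolerance):
        return "Declining"
    return "Stable"
```

With an odd number of years the middle year is left out of both halves, which keeps the comparison symmetric.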
## Report Generator (reporter.py)
- MarkdownReporter class:
  - `generate_report()` -> formatted markdown string
  - Sections: Executive Summary, Research Overview, Supporting Evidence, Challenges & Risks, Solution Approaches, Market Signals, Top Papers, Recommendations
  - Include emoji formatting, links to arXiv papers, PDF download links
  - Professional styling with clear hierarchy
- JSONExporter class:
  - `export_json()` -> dict with structured data
  - Include all analysis, papers, metadata, and generated_at timestamp
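
As a rough illustration of the two reporters, the sketch below formats one paper as a markdown list item and assembles the JSON export. The function signatures and the layout of the exported dict (`metadata`, `analysis`, `papers`) are assumptions. The abstract-page link uses the standard `https://arxiv.org/abs/<id>` form.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def format_paper_entry(paper: dict) -> str:
    """Render one paper as a markdown bullet with abstract and PDF links."""
    abs_url = f"https://arxiv.org/abs/{paper['arxiv_id']}"
    return f"- [{paper['title']}]({abs_url}) ([PDF]({paper['pdf_url']}))"


def export_json(idea: str, papers: list, analysis: dict,
                generated_at: Optional[datetime] = None) -> dict:
    """Bundle analysis, papers, and metadata into a JSON-serializable dict."""
    generated_at = generated_at or datetime.now(timezone.utc)
    return {
        "metadata": {
            "idea": idea,
            "paper_count": len(papers),
            # ISO 8601 string so the dict can be passed straight to json.dumps
            "generated_at": generated_at.isoformat(),
        },
        "analysis": analysis,
        "papers": papers,
    }
```

Writing the export to disk is then a matter of `json.dump(export_json(...), file, indent=2)`.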
## Key Features
- Rate limiting: 3 seconds between API requests
- Multiple search queries per product idea for comprehensive coverage
- Automatic category selection based on keywords
- Relevance scoring and ranking
- Publication timeline analysis
- Market signal interpretation with visual indicators
- Professional report formatting
- Verbose mode for debugging
- Error handling for network issues and API timeouts
- Support for both interactive and command-line modes
## Documentation
- README.md: Complete usage guide with examples
- QUICKSTART.md: 2-minute getting started guide
- FEATURES.md: Comprehensive feature checklist
- ARCHITECTURE.md: Technical design documentation
- Setup shell script for automated installation
## Example Usage
```bash
python main.py
python main.py "AI-powered meal planning app"
python main.py "Mobile app to reduce food waste" -n 25 -f both -v
python main.py "Sustainable fashion marketplace" --no-save -i
```
## Expected Output
- Interactive CLI with clear prompts
- Real-time progress indicators
- Detailed markdown reports with tables, lists, and links
- JSON exports for programmatic access
- Summary display of key metrics
- Professional formatting with emojis and sections
Ensure the code is production-ready and well-documented, includes comprehensive error handling, and follows Python best practices.