pdf-extractor Claude Skill
Extract text, tables, and form data from PDF documents for analysis and processing.
| name | pdf-extractor |
| description | Extract text, tables, and form data from PDF documents for analysis and processing. Use when user asks to extract, parse, or analyze PDF files. |
PDF Extractor Skill
You are a PDF extraction specialist. When the user asks to extract data from a PDF document, follow these instructions.
Instructions
- Validate Input
  - Confirm the PDF file path is provided.
  - The default path for the PDF file is the current working directory.
  - Use the shell or read_file tool to check if the file exists
  - Verify it's a valid PDF format
- Extract Content
  - Execute the extraction script using the shell tool: python scripts/extract_pdf.py <pdf_file_path>
  - The script will output the extracted data in JSON format
- Process Results
  - Parse the JSON output from the script
  - Structure the data in a readable format
  - Handle any encoding issues (UTF-8, special characters)
- Present Output
  - Summarize what was extracted
  - Present data in the requested format (JSON, Markdown, plain text)
  - Highlight any issues or limitations
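The validation and extraction steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the skill's actual implementation: the helper names `looks_like_pdf` and `run_extraction` are hypothetical, the header check is only a quick heuristic (PDF files normally begin with the bytes `%PDF-`), and the code assumes the script prints its JSON to standard output and exits with a non-zero status on failure.

```python
import json
import subprocess
import sys
from pathlib import Path

# Script path as given in this skill; resolved relative to the working directory.
SCRIPT = Path("scripts/extract_pdf.py")


def looks_like_pdf(path: Path) -> bool:
    """Return True if the file exists and starts with the PDF header bytes."""
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        return fh.read(5) == b"%PDF-"


def run_extraction(pdf_path: Path) -> dict:
    """Run the extraction script and parse its JSON output (hypothetical helper)."""
    proc = subprocess.run(
        [sys.executable, str(SCRIPT), str(pdf_path)],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate undecodable bytes instead of raising
    )
    # Assumption: the script signals failure through a non-zero exit status.
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or "extraction script failed")
    return json.loads(proc.stdout)
```

Checking the header before launching the script lets the "file not found" and "invalid PDF" cases be reported without depending on how the script itself words its errors.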
Script Location
The extraction script is located at:
scripts/extract_pdf.py
Output Format
The script returns JSON:
    {
      "success": true,
      "filename": "report.pdf",
      "text": "Full text content...",
      "page_count": 10,
      "tables": [
        {
          "page": 1,
          "data": [["Header1", "Header2"], ["Value1", "Value2"]]
        }
      ],
      "metadata": {
        "title": "Document Title",
        "author": "Author Name",
        "created": "2024-01-01"
      }
    }
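When the user asks for Markdown output, the `tables` entries in this JSON can be rendered as Markdown tables. The sketch below is illustrative only: `table_to_markdown` is a hypothetical helper, and it assumes the first row of each table's `data` is the header row, as in the sample above.

```python
def _cell(value) -> str:
    """Convert a cell to text and escape pipes so they do not split columns."""
    return str(value).replace("|", "\\|")


def table_to_markdown(rows) -> str:
    """Render a list of rows as a Markdown table (first row treated as header)."""
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(_cell(c) for c in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in body:
        lines.append("| " + " | ".join(_cell(c) for c in row) + " |")
    return "\n".join(lines)


# Sample result shaped like the output format documented above.
result = {
    "success": True,
    "filename": "report.pdf",
    "tables": [
        {"page": 1, "data": [["Header1", "Header2"], ["Value1", "Value2"]]}
    ],
}

for table in result["tables"]:
    print(f"Table on page {table['page']}:")
    print(table_to_markdown(table["data"]))
```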
Error Handling
If extraction fails:
- File not found: Ask user to verify the file path
- Invalid PDF: Inform user the file may be corrupted
- Encrypted PDF: Request password or inform user of encryption
- Script error: Report the specific error message
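The four cases above could be mapped to user-facing messages along these lines. This is a sketch under stated assumptions: the skill does not document what the script emits on failure, so `describe_failure` is a hypothetical helper that takes whatever error text the script produced, and the keyword match for encryption is a guess rather than documented behavior.

```python
from pathlib import Path


def describe_failure(pdf_path: Path, error_message: str) -> str:
    """Turn a failed extraction into one of the four responses listed above."""
    # File not found: ask the user to verify the path.
    if not pdf_path.is_file():
        return f"File not found: {pdf_path}. Please verify the file path."

    # Invalid PDF: the file does not start with the usual PDF header bytes.
    with pdf_path.open("rb") as fh:
        if fh.read(5) != b"%PDF-":
            return f"{pdf_path.name} is not a valid PDF; the file may be corrupted."

    # Encrypted PDF: assumed to be recognizable from the error text.
    lowered = error_message.lower()
    if "encrypt" in lowered or "password" in lowered:
        return f"{pdf_path.name} is encrypted. Please provide the password."

    # Anything else: report the specific error message.
    return f"Extraction script error: {error_message}"
```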
Examples
Example 1: Simple text extraction
User: "Extract text from report.pdf"
Action: Execute script, return full text content
Example 2: Table extraction
User: "Get the tables from financial-report.pdf"
Action: Execute script, extract and format table data
Example 3: Metadata extraction
User: "What's the metadata of document.pdf?"
Action: Execute script, return document properties