pdf-extractor Claude Skill

Extract text, tables, and form data from PDF documents for analysis and processing.

8.5k Stars
1.9k Forks
2024/09/09

Install & Download

Download the skill and extract it to ~/.claude/skills/

name: pdf-extractor
description: Extract text, tables, and form data from PDF documents for analysis and processing. Use when user asks to extract, parse, or analyze PDF files.

PDF Extractor Skill

You are a PDF extraction specialist. When the user asks to extract data from a PDF document, follow these instructions.

Instructions

  1. Validate Input

    • Confirm that a PDF file path is provided.
    • If no directory is specified, assume the PDF file is in the current working directory.
    • Use the shell or read_file tool to check that the file exists.
    • Verify that it is a valid PDF.
  2. Extract Content

    • Execute the extraction script using the shell tool:
      python scripts/extract_pdf.py <pdf_file_path>
    • The script outputs the extracted data as JSON.
  3. Process Results

    • Parse the JSON output from the script
    • Structure the data in a readable format
    • Handle any encoding issues (UTF-8, special characters)
  4. Present Output

    • Summarize what was extracted
    • Present data in the requested format (JSON, Markdown, plain text)
    • Highlight any issues or limitations
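
Steps 1 to 3 above can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual implementation: the helper names (resolve_pdf_path, looks_like_pdf, run_extraction) are hypothetical, the header check is only a cheap sanity test rather than full PDF validation, and the code assumes the script prints its JSON to stdout and exits with a non-zero status on failure, which the skill description does not confirm.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Optional


def resolve_pdf_path(user_path: str, cwd: Optional[Path] = None) -> Path:
    """Resolve a user-supplied path; relative paths are taken relative to cwd."""
    base = cwd if cwd is not None else Path.cwd()
    path = Path(user_path).expanduser()
    return path if path.is_absolute() else base / path


def looks_like_pdf(path: Path) -> bool:
    """Cheap sanity check: the file exists and has a %PDF- header near its start."""
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        return b"%PDF-" in fh.read(1024)


def run_extraction(path: Path, script: str = "scripts/extract_pdf.py") -> dict:
    """Run the extraction script and parse the JSON it writes to stdout.

    Assumption: a non-zero exit status signals failure and stderr holds the reason.
    """
    proc = subprocess.run(
        [sys.executable, script, str(path)],
        capture_output=True,
        encoding="utf-8",  # decode output as UTF-8 to avoid locale-dependent mojibake
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or f"script exited with {proc.returncode}")
    return json.loads(proc.stdout)
```

A caller would typically chain these: resolve the path, stop early with a clear message if looks_like_pdf returns False, and only then call run_extraction.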

Script Location

The extraction script is located at: scripts/extract_pdf.py

Output Format

The script returns JSON:

{
  "success": true,
  "filename": "report.pdf",
  "text": "Full text content...",
  "page_count": 10,
  "tables": [
    {
      "page": 1,
      "data": [["Header1", "Header2"], ["Value1", "Value2"]]
    }
  ],
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "created": "2024-01-01"
  }
}
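
When the user asks for Markdown output, each entry in "tables" can be rendered as a Markdown table. The sketch below is one possible way to do that; table_to_markdown is a hypothetical helper, and it assumes the first row of "data" is the header row, as the sample above suggests.

```python
def table_to_markdown(table: dict) -> str:
    """Render one entry of the "tables" array as a Markdown table.

    Assumption: the first row of table["data"] holds the column headers.
    """
    rows = table.get("data") or []
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def cell(value) -> str:
        # Escape pipes and flatten newlines so a cell cannot break the table.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    def line(row) -> str:
        # Pad short rows so every line has the same number of columns.
        padded = list(row) + [""] * (width - len(row))
        return "| " + " | ".join(cell(v) for v in padded) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in range(width)) + " |"
    return "\n".join([line(header), separator, *(line(r) for r in body)])


sample = {"page": 1, "data": [["Header1", "Header2"], ["Value1", "Value2"]]}
print(table_to_markdown(sample))
# | Header1 | Header2 |
# | --- | --- |
# | Value1 | Value2 |
```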

Error Handling

If extraction fails:

  • File not found: Ask user to verify the file path
  • Invalid PDF: Inform user the file may be corrupted
  • Encrypted PDF: Request password or inform user of encryption
  • Script error: Report the specific error message
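
The four cases above could be mapped to user-facing messages along these lines. The skill does not specify how the script reports failures, so everything here is an assumption: the Failure enum, the helper names, and the keyword matching on the error text are illustrative only and would need adjusting to the script's real error output.

```python
from enum import Enum


class Failure(Enum):
    NOT_FOUND = "file_not_found"
    INVALID = "invalid_pdf"
    ENCRYPTED = "encrypted_pdf"
    SCRIPT = "script_error"


def classify_failure(error_text: str) -> Failure:
    """Guess the failure category from an error message (keyword heuristic)."""
    text = error_text.lower()
    if "no such file" in text or "not found" in text:
        return Failure.NOT_FOUND
    if "encrypt" in text or "password" in text:
        return Failure.ENCRYPTED
    if "corrupt" in text or "invalid pdf" in text or "not a pdf" in text:
        return Failure.INVALID
    return Failure.SCRIPT


def user_message(error_text: str) -> str:
    """Turn an error message into the response suggested for its category."""
    kind = classify_failure(error_text)
    if kind is Failure.NOT_FOUND:
        return "The file was not found. Please verify the file path."
    if kind is Failure.ENCRYPTED:
        return "The PDF is encrypted. Please provide the password."
    if kind is Failure.INVALID:
        return "The file does not appear to be a valid PDF and may be corrupted."
    # Anything unrecognized is reported verbatim as a script error.
    return f"The extraction script failed: {error_text}"
```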

Examples

Example 1: Simple text extraction

User: "Extract text from report.pdf"
Action: Execute script, return full text content

Example 2: Table extraction

User: "Get the tables from financial-report.pdf"
Action: Execute script, extract and format table data

Example 3: Metadata extraction

User: "What's the metadata of document.pdf?"
Action: Execute script, return document properties
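
All three examples run the same script and differ only in which part of the JSON output is returned. A small sketch of that selection step, assuming the field names from the Output Format section; select_output is a hypothetical helper, not part of the skill:

```python
from typing import Any

# Top-level fields of the script output that a request can target.
SELECTABLE_FIELDS = ("text", "tables", "metadata")


def select_output(result: dict, want: str) -> Any:
    """Return only the part of the extraction result the user asked for."""
    if not result.get("success"):
        raise ValueError("extraction did not succeed")
    if want not in SELECTABLE_FIELDS:
        raise KeyError(f"unknown field: {want}")
    return result.get(want)


result = {
    "success": True,
    "filename": "report.pdf",
    "text": "Full text content...",
    "page_count": 10,
    "tables": [{"page": 1, "data": [["Header1", "Header2"], ["Value1", "Value2"]]}],
    "metadata": {"title": "Document Title", "author": "Author Name", "created": "2024-01-01"},
}
print(select_output(result, "metadata")["title"])  # Document Title
```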
