Claude Cowork Data Processing

Use Claude Cowork to extract information from documents and data files such as saved HTML pages, PDFs, and Excel/CSV files. Data processing is an important use case for Anthropic's Cowork application.

HTML File Data Extraction

If you've saved web pages as HTML, Claude Cowork can extract data:

I've saved 20 e-commerce product pages as HTML in "product_pages" folder.

Extract from each:
- Product name (usually in <h1> or class="product-title")
- Price (look for "$" or "price" text)
- Description (first product intro paragraph, max 200 chars)
- Stock status (look for "in stock", "out of stock")
- Original filename (for reference)

Generate Excel "product_info.xlsx" with all fields.
If any field is not found, write "not found".
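
To make the extraction rules in this prompt concrete, here is a minimal Python sketch of the same heuristics using only the standard library. It is not how Claude Cowork works internally; the names `ProductParser` and `extract_product` are hypothetical, the price pattern assumes US-style dollar amounts, and the description field is left out for brevity.

```python
import re
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects the text of the first <h1> and all text nodes of a page."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self._title_done = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._title_done:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._title_done = True

    def handle_data(self, data):
        if self._in_h1:
            self.title_parts.append(data)
        self.text_parts.append(data)


def extract_product(html, filename):
    """Return name, price, stock status, and filename for one saved page."""
    parser = ProductParser()
    parser.feed(html)
    # Collapse runs of whitespace so the regex and phrase checks see clean text.
    text = " ".join(" ".join(parser.text_parts).split())
    title = " ".join("".join(parser.title_parts).split())
    # Assumed price format: "$" followed by digits, e.g. $12.50 or $1,299.
    price = re.search(r"\$\s?\d[\d,]*(?:\.\d{2})?", text)
    lowered = text.lower()
    # If a page mentions both phrases, this sketch reports "out of stock".
    if "out of stock" in lowered:
        stock = "out of stock"
    elif "in stock" in lowered:
        stock = "in stock"
    else:
        stock = "not found"
    return {
        "name": title or "not found",
        "price": price.group(0) if price else "not found",
        "stock": stock,
        "filename": filename,
    }
```

Calling `extract_product(page_html, "mug.html")` on each saved page yields one row per file, with "not found" filling any field the heuristics miss, as the prompt requests.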

PDF Data Extraction

Claude Cowork excels at extracting structured info from PDFs:

Invoice Processing

"Invoices" folder contains 100+ PDF invoices.

Extract:
- Invoice number (usually starts with INV-)
- Date (convert to YYYY-MM-DD)
- Vendor name
- Buyer name
- Pre-tax amount
- Tax amount
- Total (verify: pre-tax + tax = total)

Special handling:
1. If total verification fails, note "amount mismatch" in remarks
2. If required fields are missing, mark "incomplete" in status
3. Sort by total amount, descending

Output:
- "invoice_summary_[date].xlsx" - complete table
- "problem_invoices.txt" - list of problematic invoice filenames

After processing, tell me success count and anomaly count.
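
The special-handling rules in this prompt can be checked independently once the fields are extracted. The following Python sketch shows one way to express them; it assumes each invoice arrives as a dict of strings, that amounts are plain decimal strings such as "108.25", and that dates use one of the listed formats. The field names and the functions `normalize_date`, `check_invoice`, and `sort_by_total` are hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Assumed input date formats; real invoices may need more.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")
REQUIRED = ("invoice_number", "date", "vendor", "buyer", "pre_tax", "tax", "total")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def check_invoice(record):
    """Apply the completeness and total checks to one extracted record."""
    result = dict(record, status="ok", remarks="")
    if any(not record.get(field) for field in REQUIRED):
        result["status"] = "incomplete"
        return result
    # Keep the raw date if none of the assumed formats matches.
    result["date"] = normalize_date(record["date"]) or record["date"]
    # Decimal avoids float rounding errors when comparing money amounts.
    if Decimal(record["pre_tax"]) + Decimal(record["tax"]) != Decimal(record["total"]):
        result["remarks"] = "amount mismatch"
    return result


def sort_by_total(results):
    """Sort descending by total; records without a total are treated as 0."""
    return sorted(results, key=lambda r: Decimal(r.get("total") or "0"), reverse=True)
```

Running a check like this over the generated spreadsheet is a cheap way to confirm that the success and anomaly counts reported after processing are plausible.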

Excel/CSV Processing

Data Cleaning

"RawData" folder has 5 customer data Excel files.

**Step 1: Merge**
- Combine 5 files into one
- Add "source_file" column

**Step 2: Clean**
- Remove duplicate rows
- Standardize phone numbers: remove spaces/dashes, keep 10-11 digits
- Convert emails to lowercase
- Standardize dates to YYYY-MM-DD
- Trim whitespace from all columns

**Step 3: Validate**
- Check phone numbers are 10-11 digits, mark invalid in "data_quality" column
- Check emails contain @, mark invalid
- Check required fields (name, phone) not empty

**Output**:
- "customer_data_cleaned_[date].xlsx"
- "data_quality_report.txt" - duplicate and invalid counts

Tell me the total record count before cleaning and the number of issues found.
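
For reference, the clean and validate steps above correspond roughly to the following Python sketch. It assumes each row is a dict with "name", "phone", "email", and "source_file" keys, that duplicates are compared while ignoring "source_file", and that an empty email is not flagged because only name and phone are required. The functions `clean_row` and `dedupe` are hypothetical, and date standardization is omitted.

```python
import re


def clean_row(row):
    """Trim, normalize, and validate one customer record."""
    # Trim whitespace from every column; treat missing values as empty.
    cleaned = {k: str(v).strip() if v is not None else "" for k, v in row.items()}
    # Remove spaces and dashes from the phone number.
    cleaned["phone"] = re.sub(r"[\s\-]", "", cleaned.get("phone", ""))
    cleaned["email"] = cleaned.get("email", "").lower()

    issues = []
    if not cleaned.get("name"):
        issues.append("missing name")
    if not cleaned["phone"]:
        issues.append("missing phone")
    elif not re.fullmatch(r"\d{10,11}", cleaned["phone"]):
        issues.append("invalid phone")
    # Assumption: only non-empty emails are checked for "@".
    if cleaned["email"] and "@" not in cleaned["email"]:
        issues.append("invalid email")
    cleaned["data_quality"] = "; ".join(issues) or "ok"
    return cleaned


def dedupe(rows):
    """Drop rows identical to an earlier row, ignoring source_file."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k != "source_file"))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Cleaning before deduplicating matters here: two rows that differ only in spacing or letter case become identical after `clean_row`, so `dedupe` can catch them.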

Tips

Improve Accuracy

  1. Provide field location hints: Tell Claude Cowork where data typically appears
  2. Give format examples: Show expected output format
  3. Handle exceptions: Specify how to handle missing/invalid data

Performance

  1. Batch large jobs: Process files in chunks of 50
  2. Set checkpoints: Ask for an interim report after every 50 files
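
One way to apply both tips is to split the file list yourself and issue one prompt per batch. The Python sketch below illustrates the idea; the helper names `batches` and `checkpoint_prompts` and the prompt wording are hypothetical, not part of Claude Cowork.

```python
def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def checkpoint_prompts(filenames, size=50):
    """Build one prompt per batch, each asking for an interim report."""
    prompts = []
    for number, batch in enumerate(batches(sorted(filenames), size), start=1):
        prompts.append(
            f"Batch {number}: process {len(batch)} files, "
            f"from {batch[0]} to {batch[-1]}. "
            "Write an interim report before continuing."
        )
    return prompts
```

Sorting the filenames first keeps batch boundaries stable between runs, which makes it easier to resume from the last completed checkpoint.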

Security

  • ⚠️ Ensure Claude Cowork only accesses necessary folders
  • ⚠️ Don't include real sensitive info in prompts
  • ⚠️ Review output for accidental data exposure

Note: Claude Cowork processes local files, not live websites. Save web pages as HTML or export to CSV/Excel first.

MIT Licensed