Pangolinfo API Amazon Data Scraper Guide: From Integration to Enterprise Deployment

What Is Pangolinfo API and What Problems Does It Solve?

Pangolinfo API is an enterprise-grade data collection interface for businesses and developers who need fresh Amazon data at scale. Through standardized RESTful endpoints, users can retrieve product details, keyword search results, sales rankings, user reviews, sponsored ad placements, category nodes, and more on demand, across 15 major Amazon marketplaces worldwide.

Compared to traditional self-built scraping solutions, Pangolinfo API encapsulates distributed crawling, intelligent parsing, anti-bot countermeasures, and proxy scheduling on the server side, so developers only need to focus on business logic and data applications. According to platform statistics, enterprises using Pangolinfo API have reduced their data collection engineering investment by an average of 78%, while improving data acquisition stability from 82% to over 99.5%.

This article is a comprehensive developer guide covering authentication, synchronous and asynchronous calling methods, common data collection scenarios, response structure analysis, enterprise system integration, and AI Agent applications, with runnable code examples and production-tested best practices.

Pangolinfo API Ecosystem Overview

Before diving into technical details, let’s establish a holistic understanding of Pangolinfo’s data product ecosystem. The core products for Amazon data collection include four APIs and one Agent skill:

Core API Product Matrix

API Product	Core Capability	Typical Use Case	Data Freshness
Scrape API	General ecommerce page scraping: product details, search, rankings	Product research, competitor monitoring, price tracking	Minute-level
Reviews Scraper API	Dedicated review collection: all reviews, image/video reviews	User insights, negative review analysis, product improvement	Hour-level
AI Overview SERP API	Google AI Overview search result collection	SEO strategy, competitor search performance	Real-time
AMZ Data Tracker	Visual data monitoring and tracking dashboard	Operations monitoring, trend analysis, alerts	Scheduled sync

Data Coverage

Pangolinfo API currently supports 15 Amazon marketplaces: US, Canada, Mexico, UK, Germany, France, Italy, Spain, Netherlands, Sweden, Poland, Japan, Australia, UAE, and Singapore. Data types include:
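
As a rough sketch, the marketplace list above can be turned into a lookup table for the `site` parameter used in the request examples later in this guide. The domains are Amazon's public storefront domains; whether the API accepts each of them verbatim as a `site` value is an assumption (only `amazon.com` appears in this guide), and `MARKETPLACE_SITES` and `resolve_site` are illustrative names.

```python
# Country code -> storefront domain for the 15 supported marketplaces.
# Assumption: the API's `site` parameter takes the storefront domain, as "amazon.com" does.
MARKETPLACE_SITES = {
    "US": "amazon.com", "CA": "amazon.ca", "MX": "amazon.com.mx",
    "UK": "amazon.co.uk", "DE": "amazon.de", "FR": "amazon.fr",
    "IT": "amazon.it", "ES": "amazon.es", "NL": "amazon.nl",
    "SE": "amazon.se", "PL": "amazon.pl", "JP": "amazon.co.jp",
    "AU": "amazon.com.au", "AE": "amazon.ae", "SG": "amazon.sg",
}


def resolve_site(country_code: str) -> str:
    """Return the storefront domain for a country code, or raise ValueError."""
    code = country_code.strip().upper()
    if code not in MARKETPLACE_SITES:
        raise ValueError(f"Unsupported marketplace: {country_code!r}")
    return MARKETPLACE_SITES[code]
```

Validating the marketplace before sending a request avoids spending a round trip on a value the server would reject.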

Product Data: Title, brand, price (current/historical), stock status, shipping, seller info, variants, images, A+ content, bullet points, technical specs.

Ranking Data: Best Sellers, New Releases, Movers & Shakers, Most Wished For, Gift Ideas with real-time rankings and BSR category paths.

Review Data: Title, content, star rating, reviewer info, Vine badge, image/video reviews, helpful votes, review date.

Advertising Data: SP (Sponsored Products) ad placements, ad copy, position (top/middle/bottom/detail page).

Search Data: Keyword search results, ad distribution, organic vs paid ranking, search suggestions.

Output Format Options

All APIs support three output formats via the output_format parameter: json (default, structured and parsed), html (raw page HTML), markdown (for content processing and LLM input).

Getting Started: Authentication, Quotas, and Configuration

Step 1: Obtain Your API Key

Visit the Pangolinfo Console to register and create a project. In the “API Keys” section, click “Generate New Key” to get your exclusive API Key. We recommend creating separate Keys for different environments (dev/staging/production) for permission management and usage tracking.

API Keys follow the format pgo_xxxxxxxxxxxxxxxx and must be passed in each request header as Authorization: Bearer YOUR_API_KEY.
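
A minimal sketch of building the request headers without hard-coding the key. It assumes the key is stored in an environment variable named `PANGOLINFO_API_KEY` (the name used in the MCP configuration later in this guide); `build_headers` is an illustrative helper, not part of the SDK.

```python
import os
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Build the Authorization and Content-Type headers for a request."""
    # Fall back to the environment so keys stay out of source control.
    key = api_key or os.environ.get("PANGOLINFO_API_KEY", "")
    # Keys are documented as pgo_xxxxxxxxxxxxxxxx; catch obvious mistakes early.
    if not key.startswith("pgo_"):
        raise ValueError("API key is missing or does not start with 'pgo_'")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping one key per environment, as recommended above, then only requires changing the environment variable rather than the code.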

Step 2: Understand Credits and Billing

Pangolinfo API uses usage-based billing with “Page Credits” as the unit. Most successfully scraped pages consume 1 Credit, while sponsored ad requests consume 2. Rates by data type:

Data Type	Credits per Request	Notes
Product detail page	1	Single ASIN product details
Search listing page	1	Typically 20-24 products per page
Ranking page	1	50 products per page
Review page	1	10 reviews per page
Sponsored ads	2	Higher due to ad placement parsing

New registrations receive free Credits for testing. For production, choose a plan based on your volume and enable usage alerts.
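
To size a plan, the rates in the table can be turned into a quick estimate. The sketch below is illustrative only: the dictionary keys, function names, and the sample workload are assumptions, while the per-page rates and the 10-reviews-per-page figure come from the table.

```python
import math

# Credits per page, copied from the billing table (key names are illustrative).
CREDITS_PER_PAGE = {
    "product_detail": 1,
    "search": 1,
    "ranking": 1,
    "review": 1,
    "sponsored_ads": 2,
}
REVIEWS_PER_PAGE = 10  # from the table: 10 reviews per page


def estimate_credits(data_type: str, pages: int) -> int:
    """Credits consumed by `pages` successful pages of one data type."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return CREDITS_PER_PAGE[data_type] * pages


def review_pages(review_count: int) -> int:
    """Number of review pages needed to cover `review_count` reviews."""
    return math.ceil(review_count / REVIEWS_PER_PAGE)


# Hypothetical daily workload: 500 product pages, 100 reviews for each of
# 5 ASINs, and 20 sponsored ad requests.
daily_plan = [
    ("product_detail", 500),
    ("review", 5 * review_pages(100)),
    ("sponsored_ads", 20),
]
daily_total = sum(estimate_credits(kind, pages) for kind, pages in daily_plan)
print(f"Estimated credits per day: {daily_total}")
```

Because only successful pages are billed, an estimate like this is an upper bound for the planned workload.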

Step 3: Install the SDK (Optional)

pip install pangolinfo-api

The SDK encapsulates authentication, retries, error handling, and pagination. Recommended for production use.

Synchronous Integration: REST API Real-Time Calls

Synchronous integration is the most direct way to use Pangolinfo API. The client sends an HTTP request, the server performs the collection in real time, and the results come back in the same response. It suits latency-sensitive scenarios with relatively small data volumes.

Basic Request Structure

POST /v1/amazon/scrape HTTP/1.1
Host: api.pangolinfo.com
Authorization: Bearer pgo_your_api_key_here
Content-Type: application/json

Scenario 1: Product Detail Collection (Sync)

import requests
import json

API_KEY = "pgo_your_api_key_here"
BASE_URL = "https://api.pangolinfo.com/v1"

def get_product_details(site: str, asins: list):
    url = f"{BASE_URL}/amazon/product"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {
        "site": site,
        "asins": asins,
        "output_format": "json",
        "include_variants": True,
        "include_bsr": True,
        "include_seller_info": True
    }
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()

result = get_product_details("amazon.com", ["B08N5WRWNW", "B08N5M7S6K", "B09V3KXJPB"])
for product in result.get("products", []):
    print(f"ASIN: {product['asin']} | Price: {product['price']['current']} | Rating: {product['rating']['average']}")

Response Structure (Product Details)

{
  "status": "success",
  "request_id": "req_abc123def456",
  "products": [{
    "asin": "B08N5WRWNW",
    "title": "Apple AirPods (3rd Generation)",
    "brand": "Apple",
    "price": {"current": 149.99, "currency": "USD", "list_price": 179.00},
    "rating": {"average": 4.5, "count": 84532},
    "bsr": {"rank": 42, "category": "Electronics"},
    "images": {"main": "https://m.media-amazon.com/images/..."},
    "features": ["Spatial audio", "Adaptive EQ"],
    "variants": [{"asin": "B08N5WRWNW", "attribute": "Color", "value": "White"}],
    "seller": {"name": "Amazon.com", "is_amazon": true, "fulfillment": "FBA"},
    "availability": {"status": "In Stock"},
    "collected_at": "2026-05-25T08:30:00Z"
  }],
  "credits_used": 3,
  "credits_remaining": 997
}
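
The nested response is easier to consume once it is flattened. The sketch below assumes the field names shown in the sample response above; `ProductSummary` and `summarize_products` are illustrative names, and every nested field is treated as optional because the guide does not say which ones are always present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductSummary:
    asin: str
    title: str
    price: Optional[float]
    currency: Optional[str]
    rating: Optional[float]
    bsr_rank: Optional[int]


def summarize_products(response: dict) -> list[ProductSummary]:
    """Flatten a product-details response into simple records."""
    if response.get("status") != "success":
        raise ValueError(f"Unexpected status: {response.get('status')!r}")
    summaries = []
    for product in response.get("products", []):
        # `or {}` guards against a field that is present but null.
        price = product.get("price") or {}
        rating = product.get("rating") or {}
        bsr = product.get("bsr") or {}
        summaries.append(ProductSummary(
            asin=product["asin"],
            title=product.get("title", ""),
            price=price.get("current"),
            currency=price.get("currency"),
            rating=rating.get("average"),
            bsr_rank=bsr.get("rank"),
        ))
    return summaries
```

Reading nested fields with `.get()` keeps one incomplete product from aborting the whole batch.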

Scenario 2: Keyword Search Data Collection

def search_keywords(site: str, keyword: str, pages: int = 1):
    url = f"{BASE_URL}/amazon/search"
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    all_results = []
    for page in range(1, pages + 1):
        payload = {
            "site": site, "keyword": keyword, "page": page,
            "sort_by": "relevance", "output_format": "json",
            "include_sponsored": True, "include_bsr": True
        }
        response = requests.post(url, headers=headers, json=payload, timeout=45)
        response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error body
        data = response.json()
        all_results.extend(data.get("products", []))
    return all_results

products = search_keywords("amazon.com", "wireless earbuds", pages=2)
print(f"Total products: {len(products)}")

Scenario 3: Best Sellers Ranking Collection

def get_best_sellers(site: str, node_id: str, pages: int = 1):
    url = f"{BASE_URL}/amazon/bestsellers"
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    payload = {"site": site, "node_id": node_id, "pages": pages, "output_format": "json"}
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # surface HTTP errors, as in the product detail example
    return response.json()

result = get_best_sellers("amazon.com", "172282", pages=2)
for item in result.get("products", []):
    print(f"#{item['rank']} {item['title'][:50]}... | ${item['price']['current']}")

Scenario 4: Review Collection

def get_reviews(site: str, asin: str, pages: int = 3):
    url = f"{BASE_URL}/amazon/reviews"
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    payload = {"site": site, "asin": asin, "pages": pages, "sort_by": "recent", "include_images": True}
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # surface HTTP errors, as in the product detail example
    return response.json()

reviews = get_reviews("amazon.com", "B08N5WRWNW", pages=2)
for review in reviews.get("reviews", [])[:5]:
    print(f"⭐ {review['rating']}/5 | {review['title']} | by {review['author']}")

Sync API Performance Characteristics

Average response latency is 3-15 seconds, depending on target site speed and page complexity. The recommended timeout is 30-45 seconds. A single request accepts at most 50 ASINs or 5 search pages; for larger batches, use the async interface.
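
When a list is only slightly over the sync limit, it can be split into several requests instead of moving to the async interface. A minimal sketch, assuming the 50-ASIN limit stated above; `chunk_asins` is an illustrative helper.

```python
from typing import Iterable, Iterator

MAX_ASINS_PER_REQUEST = 50  # sync limit stated in this guide


def chunk_asins(asins: Iterable[str],
                size: int = MAX_ASINS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield de-duplicated ASINs in lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # dict.fromkeys removes duplicates while preserving the original order,
    # so a repeated ASIN is not requested (and billed) twice.
    unique = list(dict.fromkeys(asins))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]
```

Each yielded list can be passed as the `asins` argument of `get_product_details` from Scenario 1.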

Asynchronous Integration: Best Practices for Large-Scale Collection

When your business needs to collect thousands of products, hundreds of keywords, or continuously monitor multiple rankings, synchronous serial waiting becomes a bottleneck. Pangolinfo API’s async task mechanism allows submitting large batches at once, processed by distributed clusters, with results delivered via Webhook or polling.

Async Workflow

Phase 1: Task Submission. The client submits a batch to the task endpoint, and the server immediately returns a task ID without waiting for collection to finish.

Phase 2: Distributed Processing. Tasks are distributed across global collection nodes, with automatic retries, proxy rotation, and anti-bot handling.

Phase 3: Result Delivery. Upon completion, the server pushes results to the configured Webhook URL, or the client pulls them via a status query.

Submit Async Tasks

def submit_async_task(task_type: str, tasks: list):
    url = f"{BASE_URL}/amazon/tasks"
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    payload = {
        "task_type": task_type,
        "tasks": tasks,
        "webhook_url": "https://your-domain.com/webhook/pangolinfo",
        "webhook_secret": "your_webhook_secret",
        "priority": "normal",
        "output_format": "json"
    }
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    return response.json()

asins = ["B08N5WRWNW", "B08N5M7S6K", ...]  # 100 ASINs
tasks = [{"site": "amazon.com", "asin": asin} for asin in asins]
result = submit_async_task("product_details", tasks)
print(f"Batch ID: {result['batch_id']} | Tasks: {result['task_count']}")

Webhook Receiver (Flask Example)

from flask import Flask, request, jsonify
import hmac, hashlib

app = Flask(__name__)
WEBHOOK_SECRET = "your_webhook_secret"

@app.route('/webhook/pangolinfo', methods=['POST'])
def handle_webhook():
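    # Default to an empty string so a missing header fails verification instead of raising TypeError
    signature = request.headers.get('X-Pangolinfo-Signature', '')
    expected = hmac.new(WEBHOOK_SECRET.encode(), request.data, hashlib.sha256).hexdigest()
    # Compare as bytes: compare_digest rejects non-ASCII str arguments
    if not hmac.compare_digest(signature.encode(), expected.encode()):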
        return jsonify({"error": "Invalid signature"}), 401
    
    data = request.json
    if data['event'] == 'task.completed':
        save_to_database(data['task_id'], data['result'])
    elif data['event'] == 'task.failed':
        log_failure(data['task_id'], data['error'])
    return jsonify({"status": "ok"}), 200
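
To exercise the receiver locally, it helps to produce a signature the same way the handler expects one. The sketch below assumes, as the handler above does, that the signature is the hex-encoded HMAC-SHA256 of the raw request body; `sign_payload` and `verify_signature` are illustrative names, and the exact scheme should be confirmed against the official documentation.

```python
import hashlib
import hmac
from typing import Optional


def sign_payload(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, signature: Optional[str]) -> bool:
    """Constant-time check of a received signature; False if it is missing."""
    if not signature:
        return False
    expected = sign_payload(secret, body)
    # Compare bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

In a test, sign a JSON body with `sign_payload` and send it to the endpoint with the result in the `X-Pangolinfo-Signature` header. Verification must run against the raw bytes, because re-serializing parsed JSON can change the byte sequence.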

Active Polling

def check_task_status(batch_id: str):
    url = f"{BASE_URL}/amazon/tasks/{batch_id}"
    headers = {"Authorization": f"Bearer {API_KEY}"}
    response = requests.get(url, headers=headers, timeout=30)
    return response.json()

import time

batch_id = "batch_abc123"
while True:
    status = check_task_status(batch_id)
    print(f"Progress: {status['completed']}/{status['total']}")
    if status['status'] in ['completed', 'failed']:
        break
    time.sleep(10)
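
The loop above runs forever if a batch never reaches a terminal state. A hedged variant with an overall deadline and a growing delay between polls is sketched below. It assumes only the two terminal `status` values used above (`completed` and `failed`); `wait_for_batch` and its parameters are illustrative, and the status fetcher is passed in so the function can be tested without network access.

```python
import time
from typing import Callable


def wait_for_batch(fetch_status: Callable[[], dict],
                   timeout_s: float = 3600.0,
                   initial_delay_s: float = 5.0,
                   max_delay_s: float = 60.0,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the batch completes or fails, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError("Batch did not finish before the deadline")
        sleep(delay)
        # Double the delay after each poll, capped at max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

With the polling function above, a call would look like `wait_for_batch(lambda: check_task_status(batch_id))`.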

Sync vs Async Comparison

Dimension	Sync	Async
Latency	3-15s	Immediate task ID
Max batch size	50 ASINs / 5 pages	10,000 tasks/batch
Result delivery	HTTP response	Webhook / Polling
Use case	Real-time, small batch	Large-scale, scheduled
Retry	Client handles	Server auto-retries x3
Concurrency	100 req/min	No hard limit
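
The comparison can be condensed into a small routing rule. This is only a sketch of one reasonable policy built from the limits in the table (50 ASINs per sync request, 10,000 tasks per async batch); `choose_mode` is an illustrative name and the policy itself is an assumption, not a Pangolinfo recommendation.

```python
import math

SYNC_MAX_ASINS = 50       # from the comparison table
ASYNC_MAX_TASKS = 10_000  # from the comparison table


def choose_mode(task_count: int, latency_sensitive: bool) -> dict:
    """Pick sync or async and report how many requests or batches are needed."""
    if task_count <= 0:
        raise ValueError("task_count must be positive")
    # Small, latency-sensitive jobs fit in a single synchronous request.
    if latency_sensitive and task_count <= SYNC_MAX_ASINS:
        return {"mode": "sync", "batches": 1}
    # Everything else goes async, split to respect the batch size limit.
    return {"mode": "async", "batches": math.ceil(task_count / ASYNC_MAX_TASKS)}
```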

Enterprise Integration: Connecting Pangolinfo API to Your Data Systems and ERP

For mid-to-large enterprises, collecting data through the API is just the first step. Whether the value of that data is fully realized depends on integrating it seamlessly into existing data warehouses, ERPs, BI systems, or custom platforms.

Integration Pattern 1: Direct Database Write

import psycopg2
from datetime import datetime

def save_products_to_db(products: list, db_config: dict):
    conn = psycopg2.connect(**db_config)
    cursor = conn.cursor()
    insert_sql = """
        INSERT INTO amazon_products (asin, site, title, brand, current_price, currency,
            rating_avg, rating_count, bsr_rank, collected_at)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
        ON CONFLICT (asin, site) DO UPDATE SET
            title = EXCLUDED.title, current_price = EXCLUDED.current_price,
            rating_avg = EXCLUDED.rating_avg, collected_at = EXCLUDED.collected_at
    """
    try:
        for product in products:
            cursor.execute(insert_sql, (
                product['asin'], product['site'], product['title'],
                product.get('brand', ''), product.get('price', {}).get('current'),
                product.get('price', {}).get('currency', 'USD'),
                product.get('rating', {}).get('average'),
                product.get('rating', {}).get('count'),
                product.get('bsr', {}).get('rank'), datetime.now()
            ))
        conn.commit()
    except Exception:
        # Roll back the partial batch so a failed row does not leave the transaction open
        conn.rollback()
        raise
    finally:
        # Always release the cursor and connection, even on failure
        cursor.close()
        conn.close()

Integration Pattern 2: Message Queue Middleware

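import json
import redis

redis_client = redis.Redis(host='localhost', port=6379, db=0)

def publish_to_queue(stream: str, data: dict):
    # XADD appends an entry to a Redis Stream; consumers read it with XREAD or XREADGROUP
    redis_client.xadd(stream, {"data": json.dumps(data)})

# Reuses app, request, and jsonify from the Flask webhook example above and replaces
# its handler; keep the signature check from that example in production.
@app.route('/webhook/pangolinfo', methods=['POST'])
def handle_webhook():
    data = request.json
    if data['event'] == 'task.completed':
        publish_to_queue("amazon:data:products", data['result'])
    return jsonify({"status": "ok"}), 200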

Integration Pattern 3: Data Warehouse ETL Pipeline

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

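def extract_amazon_data(**context):
    # task_list and wait_for_completion are placeholders you need to define.
    # submit_async_task returns a dict, so pass on its batch_id rather than the dict itself.
    result = submit_async_task("product_details", task_list)
    return wait_for_completion(result["batch_id"])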

def transform_data(**context):
    raw_data = context['ti'].xcom_pull(task_ids='extract')
    return clean_data(raw_data)

def load_to_warehouse(**context):
    data = context['ti'].xcom_pull(task_ids='transform')
    bulk_insert(data)

with DAG('amazon_data_pipeline', start_date=datetime(2026, 1, 1),
     schedule_interval='0 6 * * *', catchup=False) as dag:
    extract = PythonOperator(task_id='extract', python_callable=extract_amazon_data)
    transform = PythonOperator(task_id='transform', python_callable=transform_data)
    load = PythonOperator(task_id='load', python_callable=load_to_warehouse)
    extract >> transform >> load

ERP Integration Example

class AmazonERPConnector:
    def __init__(self, api_key: str, erp_db):
        self.api_key = api_key
        self.erp = erp_db
        self.base_url = "https://api.pangolinfo.com/v1"
    
    def sync_product_catalog(self, site: str, category_node: str):
        products = self._fetch_category_products(site, category_node)
        erp_products = [{
            "sku": f"AMZ-{p['asin']}", "name": p['title'],
            "category": p['bsr']['category'], "sale_price": p['price']['current'],
            "supplier": p['brand'], "source_platform": "amazon"
        } for p in products]
        self.erp.bulk_upsert_products(erp_products)
        return len(erp_products)
    
    def update_competitive_pricing(self, site: str, sku_list: list):
        asins = [sku.replace("AMZ-", "") for sku in sku_list]
        products = self._fetch_products(site, asins)
        for product in products:
            self.erp.update_price_monitor(
                sku=f"AMZ-{product['asin']}",
                competitor_price=product['price']['current']
            )

AI Agent Integration: Letting LLMs Directly Invoke Amazon Data Capabilities

Connecting large language models with external data tools is reshaping analytical workflows. Pangolinfo Amazon Scraper Skill, based on MCP (Model Context Protocol), enables AI Agents to directly invoke Amazon data collection capabilities without writing code.

MCP Protocol Overview

MCP is an open standard defining communication between AI Agents and external tools. Through MCP, Agents understand tool inputs, invocation methods, and return formats for autonomous decision-making. Pangolinfo is among the first ecommerce data providers supporting MCP.

Configuring Pangolinfo Amazon Scraper Skill

{
  "mcpServers": {
    "pangolinfo-amazon": {
      "command": "npx",
      "args": ["-y", "@pangolinfo/amazon-scraper-mcp@latest"],
      "env": {"PANGOLINFO_API_KEY": "pgo_your_api_key_here"}
    }
  }
}

Agent Use Cases

Case 1: Market Research
User: “Analyze wireless earbuds competition on Amazon US. I need: 1) Brand count in top 3 pages; 2) Price distribution; 3) Products with 1000+ reviews.”
Agent invokes amazon_search, processes 60 products, generates structured report.

Case 2: Competitor Monitoring
User: “Monitor ASIN B08N5WRWNW. Alert me if price drops >10% or BSR falls below 100.”
Agent fetches data, compares with history, triggers alerts with recommended actions.

Case 3: Review Sentiment Analysis
User: “Get last 100 reviews for these 5 competitor ASINs. What do users like and dislike most?”
Agent collects 500 reviews, extracts themes: “Most liked: sound quality (87%), battery (72%); Most disliked: charging speed (23%).”
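
The alert rule in Case 2 reduces to a comparison between two snapshots. The sketch below is one possible reading: it treats “BSR falls below 100” as the rank number rising above 100, which means the product dropped out of the top 100. That reading, the flat snapshot shape, and the `check_alerts` name are all assumptions.

```python
def check_alerts(previous: dict, current: dict,
                 price_drop_pct: float = 10.0,
                 bsr_threshold: int = 100) -> list[str]:
    """Compare two snapshots shaped like {"price": float, "bsr": int}."""
    alerts = []
    old_price, new_price = previous["price"], current["price"]
    if old_price > 0:
        drop = (old_price - new_price) / old_price * 100
        if drop > price_drop_pct:
            alerts.append(f"Price dropped {drop:.1f}% ({old_price} -> {new_price})")
    # Alert only on the transition out of the top N, not on every later check.
    if previous["bsr"] <= bsr_threshold < current["bsr"]:
        alerts.append(f"BSR left the top {bsr_threshold}: "
                      f"{previous['bsr']} -> {current['bsr']}")
    return alerts
```

Alerting on the transition rather than the state avoids repeating the same BSR alert on every poll.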

Technical Implementation for Custom Agents

import requests
from langchain.agents import AgentType, initialize_agent
from langchain.tools import BaseTool
from pydantic import BaseModel, Field

class AmazonSearchInput(BaseModel):
    keyword: str = Field(description="Amazon search keyword")
    site: str = Field(default="amazon.com")
    pages: int = Field(default=1, ge=1, le=5)

class AmazonSearchTool(BaseTool):
    # Recent LangChain releases build tools on pydantic v2, which requires annotated fields
    name: str = "amazon_search"
    description: str = "Search Amazon products and return structured data"
    args_schema: type[BaseModel] = AmazonSearchInput

    def _run(self, keyword: str, site: str = "amazon.com", pages: int = 1):
        # API_KEY is the module-level constant defined in the sync examples
        response = requests.post("https://api.pangolinfo.com/v1/amazon/search",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"site": site, "keyword": keyword, "pages": pages}, timeout=45)
        response.raise_for_status()
        return response.json()

# llm must be a chat model instance you have already constructed.
# initialize_agent is LangChain's legacy agent API; the structured-chat agent type is
# used here because amazon_search takes more than one argument.
tools = [AmazonSearchTool()]
agent = initialize_agent(tools, llm, agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION)
result = agent.run("Find top 10 wireless earbuds under $100 with 4.5+ rating")

Production Best Practices and Performance Optimization

1. Request Deduplication and Caching

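import functools, hashlib, json
from datetime import timedelta
import redis

cache = redis.Redis()

def cached_api_call(ttl_seconds: int = 300):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Built-in hash() is salted per process for strings, so it cannot produce
            # keys shared across workers; use a stable digest of the arguments instead.
            raw_key = json.dumps([args, kwargs], sort_keys=True, default=str)
            digest = hashlib.sha256(raw_key.encode()).hexdigest()
            cache_key = f"pangolinfo:{func.__name__}:{digest}"
            cached = cache.get(cache_key)
            if cached is not None:
                return json.loads(cached)
            result = func(*args, **kwargs)
            cache.setex(cache_key, timedelta(seconds=ttl_seconds), json.dumps(result))
            return result
        return wrapper
    return decorator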

2. Rate Limiting and Backoff

import time
from functools import wraps

def rate_limit(max_per_minute: int = 100):
    min_interval = 60.0 / max_per_minute
    last_call_time = {}
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            key = func.__name__
            now = time.time()
            if key in last_call_time:
                elapsed = now - last_call_time[key]
                if elapsed < min_interval: time.sleep(min_interval - elapsed)
            last_call_time[key] = time.time()
            return func(*args, **kwargs)
        return wrapper
    return decorator
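
Rate limiting spaces calls out; backoff handles the calls that still fail. Below is a minimal sketch of exponential backoff with jitter, assuming transient failures surface as exceptions (for example from `raise_for_status()`). `retry_with_backoff` and its parameters are illustrative names, not part of the Pangolinfo SDK.

```python
import random
import time
from functools import wraps


def retry_with_backoff(max_attempts: int = 4, base_delay_s: float = 1.0,
                       max_delay_s: float = 30.0, retry_on=(Exception,),
                       sleep=time.sleep):
    """Retry a function, doubling the wait after each failed attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    delay = min(base_delay_s * 2 ** (attempt - 1), max_delay_s)
                    # Up to 10% jitter keeps many clients from retrying in lockstep.
                    sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator
```

In practice, `retry_on` would be narrowed to transient errors such as `requests.exceptions.ConnectionError` or `requests.exceptions.Timeout`, so that permanent failures like an invalid API key are not retried.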

3. Data Quality Validation

def validate_product_data(product: dict) -> tuple[bool, list]:
    errors = []
    if not product.get('asin'): errors.append("Missing ASIN")
    if not product.get('title') or len(product['title']) < 5: errors.append("Invalid title")
    price = product.get('price', {}).get('current')
    if price is None or price <= 0: errors.append("Invalid price")
    return len(errors) == 0, errors

4. Cost Control Strategies

Strategy	Approach	Savings
Incremental updates	Only changed fields, not full refresh	30-50%
Smart scheduling	Off-peak execution by timezone	10-20%
Local caching	Hot data cached 5-30 min	20-40%
Async priority	Large batches via async	15-25%
Data reuse	One collection run, multiple consumers	25-35%
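
The incremental-update strategy depends on knowing which products actually changed. One hedged way to do that is to fingerprint the fields you care about and process a product only when its fingerprint moves. The tracked field names follow the sample product response in this guide; `fingerprint`, `changed_products`, and the in-memory store are illustrative, and whether this lowers Credit usage depends on how it is combined with scheduling and caching.

```python
import hashlib
import json

# Fields whose changes matter downstream (names from the sample product response).
TRACKED_FIELDS = ("price", "rating", "bsr", "availability")


def fingerprint(product: dict, fields=TRACKED_FIELDS) -> str:
    """Stable digest of the tracked fields of one product."""
    subset = {name: product.get(name) for name in fields}
    payload = json.dumps(subset, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode()).hexdigest()


def changed_products(products: list, known: dict) -> list:
    """Return products whose fingerprint differs from `known`, updating it in place."""
    changed = []
    for product in products:
        digest = fingerprint(product)
        if known.get(product["asin"]) != digest:
            known[product["asin"]] = digest
            changed.append(product)
    return changed
```

In production the `known` mapping would live in Redis or the database rather than in process memory.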

Conclusion: From Data Access to Business Value

This article has covered the full workflow for collecting Amazon data with Pangolinfo API: synchronous REST calls and asynchronous batch processing; product details, keyword search, ranking monitoring, review collection, and ad placement analysis; and integration through direct database writes, message queue middleware, and AI Agents. Each section provides production-tested code examples and best practices.

The essence of ecommerce data collection isn’t technical complexity, but providing timely, accurate, complete data for business decisions. Choosing the right technical approach, building reliable data pipelines, and letting teams focus on insight extraction and strategy formulation: that’s what data-driven operations truly mean.

Start building your Amazon data capabilities today: visit Pangolinfo Scrape API for your API Key and full documentation, or explore Amazon Scraper Skill to give your AI Agent direct ecommerce data collection capabilities.

Frequently Asked Questions

What Amazon data collection scenarios does Pangolinfo API support?

Pangolinfo API supports keyword search, Best Sellers, New Releases, product details, reviews, sponsored ads, and browse node collection across 15 major Amazon marketplaces globally, plus Google AI Overview SERP data.

What is the difference between sync and async integration?

Sync integration returns results directly via REST API, suitable for real-time single requests with 3-15s latency. Async integration submits tasks and receives results via Webhook callbacks, ideal for large-scale batch collection.

How do I integrate Pangolinfo API with my ERP system?

Three approaches: 1) Direct REST API calls writing to ERP database; 2) Middleware service for periodic synchronization; 3) Webhook push mechanism for real-time data delivery.

What data formats does Pangolinfo API return?

Default is structured JSON. Also supports raw HTML and Markdown output formats via the output_format parameter.

How can AI Agents use Pangolinfo Amazon Scraper Skill?

Pangolinfo Amazon Scraper Skill uses MCP protocol, enabling AI Agents to invoke data collection via natural language commands.