Pangolinfo API Amazon Data Scraper Guide: From Integration to Enterprise Deployment

Pangolinfo
05/25, 2026

What Is Pangolinfo API and What Problems Does It Solve?

Pangolinfo API is an enterprise-grade data scraping interface designed for businesses and developers who need to acquire Amazon platform data at scale and with high timeliness. Through standardized RESTful endpoints, users can retrieve product details, keyword search results, sales rankings, user reviews, sponsored ad placements, category nodes, and more on demand, across 15 major Amazon marketplaces worldwide.

Compared to traditional self-built scraping solutions, Pangolinfo API encapsulates distributed crawling, intelligent parsing, anti-bot countermeasures, and proxy scheduling on the server side, so developers only need to focus on business logic and data applications. According to platform statistics, enterprises using Pangolinfo API have reduced engineering investment related to data scraping by an average of 78%, while improving data acquisition stability from 82% to over 99.5%.

This article is a comprehensive developer guide covering authentication, synchronous and asynchronous calling methods, common data scraping scenarios, response structure analysis, enterprise system integration, and AI Agent applications, with runnable code examples and production-tested best practices.

Pangolinfo API Ecosystem Overview

Before diving into technical details, let’s establish a holistic understanding of Pangolinfo’s data product ecosystem. The core products for Amazon data scraping include four APIs and one Agent skill:

Core API Product Matrix

| API Product | Core Capability | Typical Use Case | Data Freshness |
| --- | --- | --- | --- |
| Scrape API | General ecommerce page scraping: product details, search, rankings | Product research, competitor monitoring, price tracking | Minute-level |
| Reviews Scraper API | Professional review scraping: all reviews, image/video reviews | User insights, negative review analysis, product improvement | Hour-level |
| AI Overview SERP API | Google AI Overview search result scraping | SEO strategy, competitor search performance | Real-time |
| AMZ Data Tracker | Visual data monitoring and tracking dashboard | Operations monitoring, trend analysis, alerts | Scheduled sync |

Data Coverage

Pangolinfo API currently supports 15 Amazon marketplaces: US, Canada, Mexico, UK, Germany, France, Italy, Spain, Netherlands, Sweden, Poland, Japan, Australia, UAE, and Singapore. Data types include:

Product Data: Title, brand, price (current/historical), stock status, shipping, seller info, variants, images, A+ content, bullet points, technical specs.

Ranking Data: Best Sellers, New Releases, Movers & Shakers, Most Wished For, Gift Ideas with real-time rankings and BSR category paths.

Review Data: Title, content, star rating, reviewer info, Vine badge, image/video reviews, helpful votes, review date.

Advertising Data: SP (Sponsored Products) ad placements, ad copy, position (top/middle/bottom/detail page).

Search Data: Keyword search results, ad distribution, organic vs paid ranking, search suggestions.

Output Format Options

All APIs support three output formats via the output_format parameter: json (default, structured and parsed), html (raw page HTML), markdown (for content processing and LLM input).

Getting Started: Authentication, Quotas, and Configuration

Step 1: Obtain Your API Key

Visit the Pangolinfo Console to register and create a project. In the “API Keys” section, click “Generate New Key” to get your exclusive API Key. We recommend creating separate Keys for different environments (dev/staging/production) for permission management and usage tracking.

API Keys follow the format pgo_xxxxxxxxxxxxxxxx and must be passed in each request header as Authorization: Bearer YOUR_API_KEY.
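
As a minimal sketch of the header convention described above, the helper below builds the request headers from a key passed in directly or read from the environment. The function name `build_headers` is hypothetical, and reading the key from `PANGOLINFO_API_KEY` (the variable name used in the MCP configuration later in this guide) is an assumption for REST calls, not documented behavior.

```python
import os

def build_headers(api_key=None):
    """Return request headers carrying the API key as a Bearer token."""
    # Assumption: PANGOLINFO_API_KEY is reused here only for convenience.
    key = api_key or os.environ.get("PANGOLINFO_API_KEY", "")
    if not key.startswith("pgo_"):
        raise ValueError("API key missing or not in the pgo_... format")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }

headers = build_headers("pgo_your_api_key_here")
print(headers["Authorization"])  # Bearer pgo_your_api_key_here
```

Keeping the key out of source code and failing early on a malformed key makes it easier to run the same script against dev, staging, and production keys.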

Step 2: Understand Credits and Billing

Pangolinfo API uses usage-based billing with “Page Credits” as the unit. A successfully scraped page typically consumes 1 Credit, though rates vary by data type:

| Data Type | Credits per Request | Notes |
| --- | --- | --- |
| Product detail page | 1 | Single ASIN product details |
| Search listing page | 1 | Typically 20-24 products per page |
| Ranking page | 1 | 50 products per page |
| Review page | 1 | 10 reviews per page |
| Sponsored ads | 2 | Higher due to ad placement parsing |
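
To size a plan before committing to it, the rates above can be turned into a quick estimate. This is a rough sketch: the dictionary keys and the `estimate_credits` helper are illustrative names, and it assumes every request succeeds and is billed at the listed rate.

```python
# Credits per request, taken from the rates listed above.
CREDITS_PER_REQUEST = {
    "product": 1,
    "search": 1,
    "ranking": 1,
    "review": 1,
    "sponsored_ads": 2,
}

def estimate_credits(requests_by_type):
    """Sum the credits for a planned mix of requests."""
    return sum(CREDITS_PER_REQUEST[kind] * count
               for kind, count in requests_by_type.items())

# Daily plan: 500 product pages, 40 search pages, 10 sponsored-ad requests.
daily = estimate_credits({"product": 500, "search": 40, "sponsored_ads": 10})
print(daily)       # 560
print(daily * 30)  # 16800 credits over 30 days
```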

New registrations receive free Credits for testing. For production, choose a plan based on your volume and enable usage alerts.

Step 3: Install the SDK (Optional)

pip install pangolinfo-api

The SDK encapsulates authentication, retries, error handling, and pagination. Recommended for production use.

Synchronous Integration: REST API Real-Time Calls

Synchronous integration is the most direct way to use Pangolinfo API. The client sends an HTTP request, the server executes the scrape in real time, and the results come back in the same response. This suits latency-sensitive scenarios with relatively small data volumes.

Basic Request Structure

POST /v1/amazon/scrape HTTP/1.1
Host: api.pangolinfo.com
Authorization: Bearer pgo_your_api_key_here
Content-Type: application/json

Scenario 1: Product Detail Scraping (Sync)

import requests
import json

API_KEY = "pgo_your_api_key_here"
BASE_URL = "https://api.pangolinfo.com/v1"

def get_product_details(site: str, asins: list):
    url = f"{BASE_URL}/amazon/product"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {
        "site": site,
        "asins": asins,
        "output_format": "json",
        "include_variants": True,
        "include_bsr": True,
        "include_seller_info": True
    }
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()

result = get_product_details("amazon.com", ["B08N5WRWNW", "B08N5M7S6K", "B09V3KXJPB"])
for product in result.get("products", []):
    print(f"ASIN: {product['asin']} | Price: {product['price']['current']} | Rating: {product['rating']['average']}")

Response Structure (Product Details)

{
  "status": "success",
  "request_id": "req_abc123def456",
  "products": [{
    "asin": "B08N5WRWNW",
    "title": "Apple AirPods (3rd Generation)",
    "brand": "Apple",
    "price": {"current": 149.99, "currency": "USD", "list_price": 179.00},
    "rating": {"average": 4.5, "count": 84532},
    "bsr": {"rank": 42, "category": "Electronics"},
    "images": {"main": "https://m.media-amazon.com/images/..."},
    "features": ["Spatial audio", "Adaptive EQ"],
    "variants": [{"asin": "B08N5WRWNW", "attribute": "Color", "value": "White"}],
    "seller": {"name": "Amazon.com", "is_amazon": true, "fulfillment": "FBA"},
    "availability": {"status": "In Stock"},
    "collected_at": "2026-05-25T08:30:00Z"
  }],
  "credits_used": 3,
  "credits_remaining": 997
}
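
Downstream code usually wants a flat record rather than the nested structure. The sketch below works on a trimmed copy of the sample response above and tolerates missing nested fields; the `summarize` helper and the derived `discount_pct` field are illustrative and not part of the API.

```python
import json

# Trimmed copy of the sample response shown above.
sample = json.loads("""
{
  "status": "success",
  "products": [{
    "asin": "B08N5WRWNW",
    "price": {"current": 149.99, "currency": "USD", "list_price": 179.00},
    "rating": {"average": 4.5, "count": 84532},
    "bsr": {"rank": 42, "category": "Electronics"}
  }],
  "credits_used": 3
}
""")

def summarize(product):
    """Flatten one product record, tolerating missing nested fields."""
    price = product.get("price") or {}
    current, list_price = price.get("current"), price.get("list_price")
    discount = None
    if current is not None and list_price:
        # Discount relative to list price, as a percentage.
        discount = round((list_price - current) / list_price * 100, 1)
    return {
        "asin": product.get("asin"),
        "price": current,
        "discount_pct": discount,
        "bsr_rank": (product.get("bsr") or {}).get("rank"),
    }

rows = [summarize(p) for p in sample.get("products", [])]
print(rows[0])
```

Using `.get()` with a fallback at each level avoids a `KeyError` when a listing has no price or no BSR entry.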

Scenario 2: Keyword Search Data Scraping

def search_keywords(site: str, keyword: str, pages: int = 1):
    url = f"{BASE_URL}/amazon/search"
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    all_results = []
    for page in range(1, pages + 1):
        payload = {
            "site": site, "keyword": keyword, "page": page,
            "sort_by": "relevance", "output_format": "json",
            "include_sponsored": True, "include_bsr": True
        }
        response = requests.post(url, headers=headers, json=payload, timeout=45)
        response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
        data = response.json()
        all_results.extend(data.get("products", []))
    return all_results

products = search_keywords("amazon.com", "wireless earbuds", pages=2)
print(f"Total products: {len(products)}")

Scenario 3: Best Sellers Ranking Scraping

def get_best_sellers(site: str, node_id: str, pages: int = 1):
    url = f"{BASE_URL}/amazon/bestsellers"
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    payload = {"site": site, "node_id": node_id, "pages": pages, "output_format": "json"}
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # surface HTTP errors before parsing
    return response.json()

result = get_best_sellers("amazon.com", "172282", pages=2)
for item in result.get("products", []):
    print(f"#{item['rank']} {item['title'][:50]}... | ${item['price']['current']}")

Scenario 4: Review Scraping

def get_reviews(site: str, asin: str, pages: int = 3):
    url = f"{BASE_URL}/amazon/reviews"
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    payload = {"site": site, "asin": asin, "pages": pages, "sort_by": "recent", "include_images": True}
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()  # surface HTTP errors before parsing
    return response.json()

reviews = get_reviews("amazon.com", "B08N5WRWNW", pages=2)
for review in reviews.get("reviews", [])[:5]:
    print(f"⭐ {review['rating']}/5 | {review['title']} | by {review['author']}")

Sync API Performance Characteristics

Average response latency: 3-15 seconds, depending on target site speed and page complexity. Recommended timeout: 30-45 seconds. Maximum per request: 50 ASINs or 5 search pages. For larger batches, use the async interface.
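
When a list is only slightly over the per-request limit, splitting it client-side is often enough. The sketch below is a generic chunking helper, not part of the SDK; the placeholder IDs are made up for illustration.

```python
def chunked(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

asins = [f"ASIN{i:06d}" for i in range(120)]  # placeholder IDs, not real ASINs
batches = list(chunked(asins))
print([len(b) for b in batches])  # [50, 50, 20]
```

Each batch can then be passed to `get_product_details` in turn, keeping every call within the 50-ASIN limit.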

Asynchronous Integration: Best Practices for Large-Scale Scraping

When your business needs to collect thousands of products, hundreds of keywords, or continuously monitor multiple rankings, synchronous serial waiting becomes a bottleneck. Pangolinfo API’s async task mechanism allows submitting large batches at once, processed by distributed clusters, with results delivered via Webhook or polling.

Async Workflow

Phase 1: Task Submission. The client submits a batch to the task endpoint, and the server immediately returns a task ID without waiting for the scrape to finish.

Phase 2: Distributed Processing. Tasks are distributed across global scraping nodes with automatic retries, proxy rotation, and anti-bot handling.

Phase 3: Result Delivery. Upon completion, the server pushes results to the configured Webhook URL, or the client pulls them via a status query.

Submit Async Tasks

def submit_async_task(task_type: str, tasks: list):
    url = f"{BASE_URL}/amazon/tasks"
    headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
    payload = {
        "task_type": task_type,
        "tasks": tasks,
        "webhook_url": "https://your-domain.com/webhook/pangolinfo",
        "webhook_secret": "your_webhook_secret",
        "priority": "normal",
        "output_format": "json"
    }
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    return response.json()

asins = ["B08N5WRWNW", "B08N5M7S6K", ...]  # 100 ASINs
tasks = [{"site": "amazon.com", "asin": asin} for asin in asins]
result = submit_async_task("product_details", tasks)
print(f"Batch ID: {result['batch_id']} | Tasks: {result['task_count']}")

Webhook Receiver (Flask Example)

from flask import Flask, request, jsonify
import hmac, hashlib

app = Flask(__name__)
WEBHOOK_SECRET = "your_webhook_secret"

@app.route('/webhook/pangolinfo', methods=['POST'])
def handle_webhook():
    # Default to an empty string so compare_digest does not raise when the header is missing
    signature = request.headers.get('X-Pangolinfo-Signature', '')
    expected = hmac.new(WEBHOOK_SECRET.encode(), request.data, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return jsonify({"error": "Invalid signature"}), 401
    
    data = request.json
    if data['event'] == 'task.completed':
        save_to_database(data['task_id'], data['result'])
    elif data['event'] == 'task.failed':
        log_failure(data['task_id'], data['error'])
    return jsonify({"status": "ok"}), 200
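
To exercise the receiver locally before going live, a test payload can be signed the same way the handler verifies it. This sketch assumes, as the handler above does, that the signature is the hex-encoded HMAC-SHA256 of the raw request body. The helper names and the `task_123` ID are hypothetical.

```python
import hashlib
import hmac
import json

def sign_payload(secret, body):
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

def is_valid_signature(secret, body, signature):
    """Constant-time comparison; a missing header counts as invalid."""
    if not signature:
        return False
    return hmac.compare_digest(signature, sign_payload(secret, body))

secret = "your_webhook_secret"
body = json.dumps({"event": "task.completed", "task_id": "task_123"}).encode()
good = sign_payload(secret, body)

print(is_valid_signature(secret, body, good))      # True
print(is_valid_signature(secret, body, "0" * 64))  # False
print(is_valid_signature(secret, body, None))      # False
```

The signature must be computed over the exact bytes received, so verify against the raw body rather than a re-serialized copy of the parsed JSON.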

Active Polling

import time  # needed for time.sleep in the polling loop below

def check_task_status(batch_id: str):
    url = f"{BASE_URL}/amazon/tasks/{batch_id}"
    headers = {"Authorization": f"Bearer {API_KEY}"}
    response = requests.get(url, headers=headers, timeout=30)
    return response.json()

batch_id = "batch_abc123"
while True:
    status = check_task_status(batch_id)
    print(f"Progress: {status['completed']}/{status['total']}")
    if status['status'] in ['completed', 'failed']:
        break
    time.sleep(10)

Sync vs Async Comparison

| Dimension | Sync | Async |
| --- | --- | --- |
| Latency | 3-15s | Immediate task ID |
| Max batch size | 50 ASINs / 5 pages | 10,000 tasks/batch |
| Result delivery | HTTP response | Webhook / Polling |
| Use case | Real-time, small batch | Large-scale, scheduled |
| Retry | Client handles | Server auto-retries x3 |
| Concurrency | 100 req/min | No hard limit |

Enterprise Integration: Connecting Pangolinfo API to Your Data Systems and ERP

For mid-to-large enterprises, scraping data through the API is just the first step. How well the collected data is integrated into existing data warehouses, ERPs, BI systems, or custom platforms determines whether its value can be fully realized.

Integration Pattern 1: Direct Database Write

import psycopg2
from datetime import datetime

def save_products_to_db(products: list, db_config: dict):
    conn = psycopg2.connect(**db_config)
    cursor = conn.cursor()
    insert_sql = """
        INSERT INTO amazon_products (asin, site, title, brand, current_price, currency,
            rating_avg, rating_count, bsr_rank, collected_at)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
        ON CONFLICT (asin, site) DO UPDATE SET
            title = EXCLUDED.title, current_price = EXCLUDED.current_price,
            rating_avg = EXCLUDED.rating_avg, collected_at = EXCLUDED.collected_at
    """
    for product in products:
        cursor.execute(insert_sql, (
            product['asin'], product['site'], product['title'],
            product.get('brand', ''), product.get('price', {}).get('current'),
            product.get('price', {}).get('currency', 'USD'),
            product.get('rating', {}).get('average'),
            product.get('rating', {}).get('count'),
            product.get('bsr', {}).get('rank'), datetime.now()
        ))
    conn.commit()
    cursor.close()
    conn.close()

Integration Pattern 2: Message Queue Middleware

import redis
redis_client = redis.Redis(host='localhost', port=6379, db=0)

def publish_to_queue(channel: str, data: dict):
    redis_client.xadd(channel, {"data": json.dumps(data)})

@app.route('/webhook/pangolinfo', methods=['POST'])
def handle_webhook():
    data = request.json
    if data['event'] == 'task.completed':
        publish_to_queue("amazon:data:products", data['result'])
    return jsonify({"status": "ok"}), 200

Integration Pattern 3: Data Warehouse ETL Pipeline

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

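def extract_amazon_data(**context):
    # task_list and wait_for_completion are placeholders to be defined for your project
    result = submit_async_task("product_details", task_list)
    # submit_async_task returns the parsed response; the batch ID is one of its fields
    return wait_for_completion(result["batch_id"])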

def transform_data(**context):
    raw_data = context['ti'].xcom_pull(task_ids='extract')
    return clean_data(raw_data)

def load_to_warehouse(**context):
    data = context['ti'].xcom_pull(task_ids='transform')
    bulk_insert(data)

with DAG('amazon_data_pipeline', start_date=datetime(2026, 1, 1),
     schedule_interval='0 6 * * *', catchup=False) as dag:
    extract = PythonOperator(task_id='extract', python_callable=extract_amazon_data)
    transform = PythonOperator(task_id='transform', python_callable=transform_data)
    load = PythonOperator(task_id='load', python_callable=load_to_warehouse)
    extract >> transform >> load

ERP Integration Example

class AmazonERPConnector:
    def __init__(self, api_key: str, erp_db):
        self.api_key = api_key
        self.erp = erp_db
        self.base_url = "https://api.pangolinfo.com/v1"
    
    def sync_product_catalog(self, site: str, category_node: str):
        products = self._fetch_category_products(site, category_node)
        erp_products = [{
            "sku": f"AMZ-{p['asin']}", "name": p['title'],
            "category": p['bsr']['category'], "sale_price": p['price']['current'],
            "supplier": p['brand'], "source_platform": "amazon"
        } for p in products]
        self.erp.bulk_upsert_products(erp_products)
        return len(erp_products)
    
    def update_competitive_pricing(self, site: str, sku_list: list):
        asins = [sku.replace("AMZ-", "") for sku in sku_list]
        products = self._fetch_products(site, asins)
        for product in products:
            self.erp.update_price_monitor(
                sku=f"AMZ-{product['asin']}",
                competitor_price=product['price']['current']
            )

AI Agent Integration: Letting LLMs Directly Invoke Amazon Data Capabilities

Connecting large language models with external data tools is reshaping analytical workflows. Pangolinfo Amazon Scraper Skill, based on MCP (Model Context Protocol), enables AI Agents to directly invoke Amazon data scraping capabilities without writing code.

MCP Protocol Overview

MCP is an open standard defining communication between AI Agents and external tools. Through MCP, Agents understand tool inputs, invocation methods, and return formats for autonomous decision-making. Pangolinfo is among the first ecommerce data providers supporting MCP.

Configuring Pangolinfo Amazon Scraper Skill

{
  "mcpServers": {
    "pangolinfo-amazon": {
      "command": "npx",
      "args": ["-y", "@pangolinfo/amazon-scraper-mcp@latest"],
      "env": {"PANGOLINFO_API_KEY": "pgo_your_api_key_here"}
    }
  }
}

Agent Use Cases

Case 1: Market Research
User: “Analyze wireless earbuds competition on Amazon US. I need: 1) Brand count in top 3 pages; 2) Price distribution; 3) Products with 1000+ reviews.”
Agent invokes amazon_search, processes 60 products, generates structured report.

Case 2: Competitor Monitoring
User: “Monitor ASIN B08N5WRWNW. Alert me if price drops >10% or BSR falls below 100.”
Agent fetches data, compares with history, triggers alerts with recommended actions.

Case 3: Review Sentiment Analysis
User: “Get last 100 reviews for these 5 competitor ASINs. What do users like and dislike most?”
Agent collects 500 reviews, extracts themes: “Most liked: sound quality (87%), battery (72%); Most disliked: charging speed (23%).”

Technical Implementation for Custom Agents

from typing import Type
from langchain.tools import BaseTool
from pydantic import BaseModel, Field

class AmazonSearchInput(BaseModel):
    keyword: str = Field(description="Amazon search keyword")
    site: str = Field(default="amazon.com")
    pages: int = Field(default=1, ge=1, le=5)

class AmazonSearchTool(BaseTool):
    # BaseTool is a Pydantic model, so overridden fields need type annotations
    name: str = "amazon_search"
    description: str = "Search Amazon products and return structured data"
    args_schema: Type[BaseModel] = AmazonSearchInput
    def _run(self, keyword: str, site: str = "amazon.com", pages: int = 1):
        response = requests.post("https://api.pangolinfo.com/v1/amazon/search",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"site": site, "keyword": keyword, "pages": pages})
        return response.json()

from langchain.agents import initialize_agent, Tool
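# Tool requires a description; llm must be a LangChain-compatible model defined elsewhere
tools = [Tool(name="amazon_search", func=AmazonSearchTool()._run,
              description="Search Amazon products and return structured data")]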
agent = initialize_agent(tools, llm, agent="zero-shot-react-description")
result = agent.run("Find top 10 wireless earbuds under $100 with 4.5+ rating")

Production Best Practices and Performance Optimization

1. Request Deduplication and Caching

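import functools, hashlib, json, redis
from datetime import timedelta
cache = redis.Redis()

def cached_api_call(ttl_seconds: int = 300):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Built-in hash() is salted per process for strings, so keys would not
            # match across workers or restarts; use a stable digest instead.
            raw_key = json.dumps([args, kwargs], sort_keys=True, default=str)
            digest = hashlib.sha256(raw_key.encode()).hexdigest()
            cache_key = f"pangolinfo:{func.__name__}:{digest}"
            cached = cache.get(cache_key)
            if cached is not None: return json.loads(cached)
            result = func(*args, **kwargs)
            cache.setex(cache_key, timedelta(seconds=ttl_seconds), json.dumps(result))
            return result
        return wrapper
    return decorator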

2. Rate Limiting and Backoff

import time
from functools import wraps

def rate_limit(max_per_minute: int = 100):
    min_interval = 60.0 / max_per_minute
    last_call_time = {}
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            key = func.__name__
            now = time.time()
            if key in last_call_time:
                elapsed = now - last_call_time[key]
                if elapsed < min_interval: time.sleep(min_interval - elapsed)
            last_call_time[key] = time.time()
            return func(*args, **kwargs)
        return wrapper
    return decorator
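
The decorator above spaces out calls but does not retry failed ones. A common complement is exponential backoff with jitter, sketched below under stated assumptions: `with_backoff` is a hypothetical helper, not part of the Pangolinfo SDK, and the retry counts and delays are illustrative defaults. The `sleep` parameter is injectable so the demo runs without waiting.

```python
import random
import time

def with_backoff(func, max_attempts=4, base_delay=1.0, max_delay=30.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))  # up to 10% jitter

# Demo with a function that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

waits = []
print(with_backoff(flaky, sleep=waits.append))  # ok
print(len(waits))                               # 2
```

In practice `retry_on` would be narrowed to transient errors such as timeouts or HTTP 429/5xx responses, so that authentication or validation failures are not retried.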

3. Data Quality Validation

def validate_product_data(product: dict) -> tuple[bool, list]:
    errors = []
    if not product.get('asin'): errors.append("Missing ASIN")
    if not product.get('title') or len(product['title']) < 5: errors.append("Invalid title")
    price = product.get('price', {}).get('current')
    if price is None or price <= 0: errors.append("Invalid price")
    return len(errors) == 0, errors

4. Cost Control Strategies

| Strategy | Approach | Savings |
| --- | --- | --- |
| Incremental updates | Only changed fields, not full refresh | 30-50% |
| Smart scheduling | Off-peak execution by timezone | 10-20% |
| Local caching | Hot data cached 5-30 min | 20-40% |
| Async priority | Large batches via async | 15-25% |
| Data reuse | One scrape, multiple consumers | 25-35% |
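
One way to read the incremental-update strategy on the storage side is to compare each new snapshot with the previous one and pass along only the fields that changed. The sketch below assumes flat snapshot dictionaries; the `changed_fields` helper and the tracked field names are illustrative, not part of the API.

```python
def changed_fields(previous, current,
                   tracked=("price", "rating", "bsr_rank", "stock")):
    """Return {field: (old, new)} for tracked fields whose value differs."""
    return {
        field: (previous.get(field), current.get(field))
        for field in tracked
        if previous.get(field) != current.get(field)
    }

old = {"asin": "B08N5WRWNW", "price": 149.99, "rating": 4.5, "bsr_rank": 42}
new = {"asin": "B08N5WRWNW", "price": 139.99, "rating": 4.5, "bsr_rank": 42}

diff = changed_fields(old, new)
print(diff)  # {'price': (149.99, 139.99)}
```

An empty result means the record can be skipped entirely, which keeps database writes and downstream alerts proportional to actual change.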

Conclusion: From Data Access to Business Value

This article has covered the complete use of Pangolinfo API for Amazon data scraping: synchronous REST API calls and asynchronous batch processing; product details, keyword search, ranking monitoring, review scraping, and ad placement analysis; and direct database integration, message queue middleware, and AI Agent applications. Each section provides production-tested code examples and best practices.

The essence of ecommerce data scraping isn’t technical complexity but providing timely, accurate, complete data for business decisions. Choosing the right technical approach, building reliable data pipelines, and letting teams focus on insight extraction and strategy formulation: that’s what data-driven operations truly mean.

Start building your Amazon data capabilities today: visit Pangolinfo Scrape API for your API Key and full documentation, or explore Amazon Scraper Skill to give your AI Agent direct ecommerce data scraping capabilities. Read the Amazon Scrape API documentation.

Frequently Asked Questions

What Amazon data scraping scenarios does Pangolinfo API support?

Pangolinfo API supports scraping of keyword search results, Best Sellers, New Releases, product details, reviews, sponsored ads, browse nodes, and AI Overview SERP data across 15 major Amazon marketplaces globally.

What is the difference between sync and async integration?

Sync integration returns results directly via REST API, suitable for real-time single requests with 3-15s latency. Async integration submits tasks and receives results via Webhook callbacks, ideal for large-scale batch collection.

How do I integrate Pangolinfo API with my ERP system?

Three approaches: 1) Direct REST API calls writing to ERP database; 2) Middleware service for periodic synchronization; 3) Webhook push mechanism for real-time data delivery.

What data formats does Pangolinfo API return?

Default is structured JSON. Also supports raw HTML and Markdown output formats via the output_format parameter.

How can AI Agents use Pangolinfo Amazon Scraper Skill?

Pangolinfo Amazon Scraper Skill uses MCP protocol, enabling AI Agents to invoke data collection via natural language commands.
