A VPN is an essential component of IT security, whether you’re just starting a business or are already up and running. Most business interactions and transactions happen online and VPN
一套API打通多平台的数据管道|Multi-Platform Data Collection API Pipeline

No fluff. This article walks through how to build a stable, deep and extensible data pipeline across platforms—canonical schema, API design, anti-bot & observability, ETL & analytics, common pitfalls and code you can actually run.

Contents

  1. Why a single multi-platform API
  2. Supported platforms and key differences
  3. Canonical schema: Product/Listing/Offer/Seller/Review/AdSlot
  4. API design: endpoints, params, paging, idempotency and caching
  5. Anti-bot, observability and data quality
  6. ETL and analytics: normalization, variant aggregation, FX conversion, brand/category mapping
  7. Code examples
  8. Metrics and dashboards
  9. Case study: same air fryer across three platforms
  10. FAQ and compliance

1. Why a single multi-platform API

Cross-platform operations are noisy: different structures, different rhythms. Scripts break; manual work doesn’t scale. We want one pipeline that’s uniformcontrollableextensible and rollback-friendly.

  • Uniform: one API handles search/listing/detail/reviews/ad slots across platforms.
  • Controllable: rate limiting, retries, caching, degradation and canary in config, not hardcoded.
  • Extensible: adding a platform means adding mappings and an adapter, not rewriting the downstream.
  • Rollback-friendly: versioned adapters and quick rollback when signals drift.

2. Supported platforms & differences

  • Amazon: ASIN, complex variants, BuyBox & Sponsored signals, deep categories.
  • Walmart: itemId/SKU, mixed 1P/3P, list vs detail field differences, ads vary.
  • Shopee: composite item_id + shop_id, price ranges & promo, shop-level signals matter.
  • Lazada: SPU/SKU, heavy promos/coupons, category trees differ by site.
  • Tokopedia: shop-rich signals, region/logistics affect exposure, review pagination quirks.
  • eBay: Item ID, auction vs buy-now, clear Listing vs Offer boundary.
  • AliExpress: frequent platform promos, translation impacts matching, review geo distribution useful.
  • Mercado Libre: LATAM, multiple currencies & taxonomies, seller reputation model differs.
  • Etsy: handmade/custom, attributes irregular, images/copy more critical.
  • Rakuten/Flipkart: strong localization, search/sort differs from US/EU platforms.
  • Temu: fast-changing page structures; heavy promo-driven low-price strategy; require cautious adapter canary.
  • Jumia: Africa regions; logistics/stock volatility strongly impact conversion; category mapping needs local nuance.
  • Noon: Middle East; Arabic/English bilingual; currencies and taxes must be cleanly modeled.
  • Daraz: South Asia; promo-heavy with shop-weight; review logic similar to SEA but details differ.
  • Kaufland/Otto/Allegro: DE/PL local platforms; EAN/GTIN matching matters; descriptions are structured but badges vary.
  • Bol.com: NL local; brand aggregation pages common; mind merged listings across sellers (identify and split correctly).

Strategy: ship Amazon/Walmart/Shopee first; then expand regionally—SEA (Lazada/Tokopedia), US/EU (eBay/AliExpress), LATAM (Mercado Libre).

3. Canonical schema

  • Product: cross-platform product dimension (title/brand/images/attributes/gtin).
  • Listing: platform exposure & ranking (platform, category_path, rank_position, badges, is_sponsored).
  • Offer: price/stock snapshots (price, currency, original_price, availability, stock, timestamp).
  • Seller: shop/seller info (seller_id, seller_name, rating, store_metrics).
  • Review: rating & review counts, recency, breakdown.
  • AdSlot: ad slots and signals (slot_type, confidence, evidence text/icon/aria/css/context).

Keep fast-changing signals (variants, prices, ad slots) out of Product; store them as separate entities for history and comparability.

New: deep category examples (Appliances, Women’s Apparel, Mother & Baby, Small Appliances, Power Tools)

Categories behave differently. Use the canonical skeleton and map category-specific attributes carefully.

Appliances (fridge/washer/microwave)

  • Key attributes: brand, capacity (L), energy rating, dimensions (L×W×H), inverter, warranty.
  • Variants: color/capacity; Amazon/Walmart aggregate variants better; SEA platforms sometimes split variants into separate listings.
  • Mapping: Product.attributes.capacity_l, energy_rating; normalize units to cm.
  • Analysis: price index by capacity bands; stock volatility and replenishment speed as quality proxies; extract review keywords: noise/energy/after-sales.

Women’s Apparel (dress)

  • Key attributes: size (US/EU/ASIA), color, material, fit, season tags.
  • Variants: size × color; avoid platforms merging multiple designs under one SKU.
  • Mapping: attributes.size_standard, attributes.material; normalize colors to a standard palette.
  • Analysis: returns-related reviews are sensitive (size/quality); ad slots impact rankings more directly in apparel; strong seasonality—use time windows.

Mother & Baby (strollers/formula)

  • Key attributes: age/weight range, standards (EN/ASTM), formula type, safety certifications.
  • Variants: pack size/color; mind regional regulations for formula claims.
  • Mapping: attributes.age_range, attributes.safety_cert; preserve certification codes in cleaned text.
  • Analysis: review stability and negative review control matter; stock gaps strongly affect rank recovery.

Small Appliances (air fryer/blender)

  • Key attributes: capacity (L), power (W), coating type, preset modes.
  • Variants: capacity and color; promos are strong—analyze price elasticity with promo badges.
  • Mapping: attributes.capacity_l, attributes.power_w; normalize feature keywords (airfry/bake/grill).
  • Analysis: review keywords “coating/odor/cleanability”; split SoV into organic vs ads.

Power Tools (drills/grinders)

  • Key attributes: power/voltage, chuck size, RPM range, brushless, accessories list.
  • Variants: battery capacity/bundle size; same SPU may split into multiple listings across platforms.
  • Mapping: attributes.voltage_v, attributes.rpm, attributes.brushless.
  • Analysis: spec-driven segment; price correlates with accessory bundles; extract “durability/battery/heat” from reviews.

Engineering tip: namespace category-specific attributes under Product.attributes (e.g., appliance.*, fashion.*, baby.*, tools.*) to keep generic fields clean.

4. API design

{
  "endpoint": "/v1/scrape",
  "method": "POST",
  "request": {
    "platform": "amazon|walmart|shopee|lazada|tokopedia|ebay|aliexpress|ml|etsy|rakuten|flipkart",
    "intent": "search|listing|product_detail|offers|reviews|ads",
    "query": "air fryer",
    "region": "US|SG|ID|MY|PH|TH|VN|BR|MX|JP|EU",
    "language": "en|zh|id|ms|es|pt|jp",
    "page": 1,
    "page_size": 24,
    "options": { "wait_render": true, "include_ads": true, "include_seller": true, "timeout_ms": 30000 }
  },
  "response": {
    "items": [ { /* Canonical fields */ } ],
    "meta": { "page": 1, "page_size": 24, "total": 500, "trace_id": "..." },
    "cache": { "hit": false, "ttl": 60 }
  }
}
  • Paging: unify page/page_size. Convert cursor-based paging to this format.
  • Idempotency: within TTL, same query+region+intent returns consistent results. Support idempotency-key.
  • Caching: short TTL for hot queries & details; avoid long caching for reviews/ads.
  • Canary: adapter versions roll out gradually; quick rollback on anomalies.

5. Anti-bot, observability & quality

  • Fingerprint pool: diversify IP/UA/screen/timezone; human-like pacing.
  • Render waits: use wait_for_selector and fallbacks for delayed signals (Sponsored, badges).
  • Retry & degrade: fast retries; degrade to low-precision parsing or cached snapshots on persistent failures.
  • Coverage & accuracy: target ≥95% overall, ≥98% for core queries; multi-signal ad detection with confidence & evidence.
  • Metrics: latency, error rate, timeouts, retry counts, cache hit rate, freshness—on dashboards and alerts.

6. ETL & analytics

  • Extract: concurrent queues per platform/intent; rate-limited; retry with backoff.
  • Transform: mapping, category/brand normalization, FX conversion, variant aggregation, text cleaning.
  • Load: warehouse partitions by platform/date; Bronze/Silver/Gold layers for raw/cleaned/curated.

Variant tip: don’t average prices across color/size. Keep main SKU and price deltas by variant—much more useful for elasticity analysis.

7. Code examples

Python: async concurrency

import asyncio, aiohttp

API_BASE = "https://api.example.com"
API_KEY = "YOUR_API_KEY"

async def scrape(session, payload):
    async with session.post(f"{API_BASE}/v1/scrape", json=payload, headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }) as resp:
        if resp.status != 200:
            raise RuntimeError(f"HTTP {resp.status}")
        return await resp.json()

async def main():
    queries = [
        {"platform": "walmart", "intent": "search", "query": "air fryer", "region": "US", "language": "en"},
        {"platform": "shopee", "intent": "product_detail", "item_id": 123456789, "shop_id": 987654321, "region": "SG", "language": "en"},
        {"platform": "lazada", "intent": "search", "query": "air fryer", "region": "SG", "language": "en"}
    ]
    async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=30)) as session:
        tasks = [scrape(session, q) for q in queries]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        for r in results:
            print(type(r), str(r)[:200])

asyncio.run(main())

Node.js: streaming & dedupe

import fetch from "node-fetch";

const API_BASE = "https://api.example.com";
const API_KEY = process.env.API_KEY;

async function scrape(body) {
  const resp = await fetch(`${API_BASE}/v1/scrape`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${API_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
  return await resp.json();
}

function dedupe(items) {
  const seen = new Set();
  return items.filter(x => {
    const key = `${x.platform}:${x.global_id}`;
    if (seen.has(key)) return false; seen.add(key); return true;
  });
}

const walmart = await scrape({ platform: "walmart", intent: "search", query: "air fryer", region: "US", language: "en", options: { include_ads: true } });
const shopee = await scrape({ platform: "shopee", intent: "product_detail", item_id: 123456789, shop_id: 987654321, region: "SG", language: "en" });

const merged = dedupe([...(walmart.items||[]), shopee]);
console.log(merged.slice(0, 3));

Walmart ad-slot snippet

{
  "items": [
    {
      "global_id": "walmart:itemId:123456",
      "platform": "walmart",
      "title": "Air Fryer Pro",
      "is_sponsored": true,
      "adslot": {
        "slot_type": "search_top",
        "confidence": 0.93,
        "evidence": { "text": "Sponsored", "aria": "sponsored", "css": ".ad-badge" }
      }
    }
  ]
}

8. Metrics & dashboards

  • Price Index: median, quantiles, promo share by platform/category.
  • Availability Volatility: stock-outs and replenishment speed by day/week.
  • Share of Voice: brand exposure & ad-share across search listings.
  • Listing Velocity: new listings and rank lift; spot rising stars and fading SKUs.

9. Case study: the same air fryer across three platforms

Pick a comparable SKU (e.g., 4L air fryer). Pull top-N search and details from Amazon/Walmart/Shopee. Then:

  1. Price & promos: Amazon promo-heavy; Walmart steadier prices; Shopee wide promo-driven ranges.
  2. Ads & ranking: Amazon Sponsored dense at the top; Walmart ads more discrete; Shopee shop-weighted rankings.
  3. Reviews & conversion: review volume and recent growth beat single-point ratings as conversion proxies.

With the canonical schema, stitch all three and make a simple call: main push on Amazon; price strategy follows Walmart median; Shopee wins via promo pricing and shop exposure. Decisions backed by data—not gut feel.

10. FAQ & compliance

  • Don’t over-scrape: rate-limit, cache and respect platform ToS; use public product data only.
  • Variants: don’t mix variant prices; avoid treating child SKUs as separate products when comparing.
  • Multilingual: regex + dictionaries; don’t rely on English-only “Sponsored”.
  • Rollback: versioned adapters, canary rollout, automatic degrade on anomalies, on-call playbooks.

For enterprise-grade onboarding (platform list, quotas, regional configs), contact us. We ship region- and category-specific solutions.

Keywords: Multi-Platform Data Collection API · Walmart Data Scraping API · Shopee Data Collection Tool · Lazada · Tokopedia · eBay · AliExpress · Mercado Libre · Etsy · Rakuten · Flipkart

© 2025 Data Intelligence. For technical research and compliant business use only.

Our solution

Protect your web crawler against blocked requests, proxy failure, IP leak, browser crash and CAPTCHAs!

With Data Pilot, easily access cross-page, endto-end data, solving data fragmentation andcomplexity, empowering quick, informedbusiness decisions.

Weekly Tutorial

Sign up for our Newsletter

Sign up now to embark on your Amazon data journey, and we will provide you with the most accurate and efficient data collection solutions.

Unlock website data now!

Submit request → Get a custom solution + Free API test.

We use TLS/SSL encryption, and your submitted information is only used for solution communication.

联系我们,您的问题,我们随时倾听

无论您在使用 Pangolin 产品的过程中遇到任何问题,或有任何需求与建议,我们都在这里为您提供支持。请填写以下信息,我们的团队将尽快与您联系,确保您获得最佳的产品体验。

Talk to our team

If you encounter any issues while using Pangolin products, please fill out the following information, and our team will contact you as soon as possible to ensure you have the best product experience.