Python Pangolin API Tutorial

At 2 AM, Zhang Hao stared at the dense error logs on his screen, sweat beading on his forehead. The scraping system he had spent three full days building had been blocked by Amazon's anti-bot mechanisms just two hours after launch. Worse still, his boss expected the competitor price monitoring report the next morning. This wasn't his first encounter with such a predicament—the instability of self-built scrapers, escalating maintenance costs, and continuously rising technical barriers made him reconsider his data collection strategy.
It wasn't until he discovered Pangolin Scrape API that he realized how simple and efficient Python Pangolin API integration could be. This Python Pangolin API tutorial demonstrates, through 15+ complete code examples, how a few dozen lines of code can accomplish what previously required thousands of lines of scraping logic, with far superior stability and data quality. This shift not only freed up technical resources but also allowed the team to focus on data analysis and business logic rather than infrastructure maintenance.
Python Pangolin API Tutorial: Why Choose API Over Self-Built Scrapers
In the e-commerce data collection field, developers face a classic technical decision: invest resources in building a scraping system or integrate a mature API service? While the answer isn’t absolute, API solutions demonstrate clear advantages for most scenarios. Self-built scrapers appear to offer complete control and flexibility, but they hide enormous hidden costs—you need to continuously track target website structure changes, handle increasingly complex anti-scraping strategies, maintain proxy IP pool quality, and manage various exception scenarios. These tasks often consume tremendous team energy.
More critically, there are challenges of timeliness and scalability. When you need to collect data from tens of thousands of ASINs hourly, self-built solutions’ network resource costs escalate rapidly. When Amazon suddenly adjusts page structures causing parsing failures, you might need emergency overtime to fix code. Professional API services have systematically solved these problems—Pangolin Scrape API can support collection scales of tens of millions of pages daily with minute-level timeliness, having accumulated mature collection experience and parsing templates for various Amazon pages.
From a development efficiency perspective, API integration enables rapid business hypothesis validation. Rather than spending weeks building scraping infrastructure, you can complete API integration in days and immediately obtain data, investing energy in truly value-generating data analysis and application development. This agility is particularly important in the rapidly changing e-commerce environment—market opportunities are fleeting, and technical implementation speed often determines business competition outcomes.
Python Pangolin API Tutorial Step 1: Environment Setup & Dependencies
Before starting, we need to ensure our Python development environment is ready. While Pangolin API supports any programming language capable of making HTTP requests, Python has become the ideal choice due to its concise syntax, rich third-party library ecosystem, and widespread application in data processing. Python 3.8 or higher is recommended, as these versions offer significant improvements in performance and standard library functionality.
The first step in environment setup is installing necessary dependency packages. We primarily need the requests library for HTTP requests, plus some auxiliary libraries for data processing and storage. Open your terminal and execute the following command to complete installation:
pip install requests pandas python-dotenv schedule openpyxl
Here, requests is the HTTP client library, pandas handles data processing and analysis, python-dotenv helps us securely manage sensitive information like API keys, schedule enables scheduled task execution, and openpyxl lets pandas write the Excel reports we export later. After installation, it's recommended to create a dedicated project directory with a clear file structure—separating API call logic, data processing modules, and configuration files improves code maintainability; one possible layout is sketched below.
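The directory and file names here are only a suggestion, not anything the API requires:

pangolin-monitor/
├── .env              # API key and base URL (never committed)
├── client.py         # PangolinClient and request logic
├── monitors/         # BestSellersMonitor, PriceTracker, etc.
└── data/             # Exported reports and the local SQLite database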
Next, create a .env file to store your API key—this is an important security practice. Never hardcode keys in your code or commit them to version control systems; instead, manage them through environment variables. Create a .env file in your project root and add your API credentials:
PANGOLIN_API_KEY=your_api_key_here
PANGOLIN_BASE_URL=https://api.pangolinfo.com/scrape
API Authentication: Secure and Efficient Access
Pangolin API employs a key-based authentication mechanism that ensures security without adding development complexity. For each API call, you need to include your API key in request parameters or headers, and the server validates the key’s validity before returning data. Compared to complex authentication flows like OAuth, this approach is more direct and efficient for server-to-server call scenarios.
Let’s create a basic API client class that encapsulates authentication logic and common request handling. This object-oriented design makes future feature extensions more elegant:
import os
import requests
from dotenv import load_dotenv
from typing import Dict, Optional, Any


class PangolinClient:
    """Pangolin API base client class"""

    def __init__(self):
        load_dotenv()
        self.api_key = os.getenv('PANGOLIN_API_KEY')
        self.base_url = os.getenv('PANGOLIN_BASE_URL')
        if not self.api_key:
            raise ValueError("API key not configured, please check .env file")

    def _make_request(self, endpoint: str, params: Dict[str, Any]) -> Optional[Dict]:
        """Generic method for making API requests"""
        params['api_key'] = self.api_key
        try:
            response = requests.get(
                f"{self.base_url}/{endpoint}",
                params=params,
                timeout=30
            )
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            print(f"API request failed: {str(e)}")
            return None
This base client class implements several key functions: loading configuration from environment variables ensures security, a unified request method simplifies subsequent API calls, and exception catching mechanisms improve program robustness. In production, you might also need to add retry logic, request logging, response caching, and other enhancements, but this basic framework is sufficient for most scenarios.
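For example, request logging can be layered in with Python's standard logging module; the logger name and format below are placeholders rather than anything Pangolin prescribes:

import logging

# Hypothetical logging setup for the client module
logger = logging.getLogger("pangolin_client")
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

# Inside _make_request you could then record each call, e.g.:
#   logger.info("GET %s", endpoint)
#   logger.info("-> %s in %.2fs", response.status_code, response.elapsed.total_seconds())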
Python Pangolin API Complete Example: Fetching Product Details
Now let’s implement our first practical function—retrieving detailed Amazon product information. This is the most common data collection requirement and the best entry point for understanding API mechanics. Pangolin API supports direct ASIN-based retrieval of complete product data, including title, price, rating, stock status, variation information, and dozens of other fields.
Add a product details retrieval method to the PangolinClient class:
def get_product_details(self, asin: str, marketplace: str = 'US') -> Optional[Dict]:
    """Fetch product details data

    Args:
        asin: Amazon product ASIN code
        marketplace: Marketplace code like US/UK/DE

    Returns:
        Dictionary containing detailed product information, None if failed
    """
    params = {
        'type': 'product',
        'asin': asin,
        'marketplace': marketplace,
        'parse': 'true'  # Return parsed structured data
    }
    return self._make_request('', params)
Using this method is extremely simple—just a few lines of code to retrieve complete product data:
client = PangolinClient()
product_data = client.get_product_details('B08N5WRWNW')

if product_data:
    print(f"Product Title: {product_data.get('title')}")
    print(f"Current Price: {product_data.get('price', {}).get('value')}")
    print(f"Rating: {product_data.get('rating')}")
    print(f"Review Count: {product_data.get('reviews_total')}")
The returned data is in parsed JSON format with a clear structure that’s easy to process. You can directly extract needed fields for analysis or store the entire data object in a database for later use. Compared to writing HTML parsing code yourself, this approach offers not only higher development efficiency but also eliminates concerns about parsing failures due to page structure changes.
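To make that concrete, a product response might look roughly like the dictionary below. This shape is purely illustrative, pieced together from the fields used in this article's examples; consult the official documentation for the authoritative schema.

# Illustrative only -- not the official response schema
sample_product = {
    'asin': 'B08N5WRWNW',
    'title': 'Example Product Title',
    'price': {'value': 49.99, 'currency': 'USD'},
    'rating': 4.6,
    'reviews_total': 12873,
    'availability': 'In Stock'
}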
Error Handling Solutions: Building Robust Systems
In production environments, any external API call may encounter various exception scenarios—network fluctuations, temporary service unavailability, incorrect request parameters, quota limits, etc. A mature integration solution must properly handle these scenarios rather than simply letting the program crash or return errors. Let’s add comprehensive error handling and retry mechanisms to our API client.
First, define some custom exception classes to distinguish different error types:
class PangolinAPIError(Exception):
    """Base API call exception"""
    pass


class AuthenticationError(PangolinAPIError):
    """Authentication failure exception"""
    pass


class RateLimitError(PangolinAPIError):
    """Rate limit exceeded exception"""
    pass


class InvalidParameterError(PangolinAPIError):
    """Invalid parameter exception"""
    pass
Then improve the _make_request method with detailed error handling and intelligent retry logic:
import time
from typing import Dict, Optional, Any


def _make_request(self, endpoint: str, params: Dict[str, Any],
                  max_retries: int = 3) -> Optional[Dict]:
    """Make API request with retry mechanism"""
    params['api_key'] = self.api_key

    for attempt in range(max_retries):
        try:
            response = requests.get(
                f"{self.base_url}/{endpoint}",
                params=params,
                timeout=30
            )

            # Handle different scenarios based on HTTP status code
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 401:
                raise AuthenticationError("Invalid or expired API key")
            elif response.status_code == 429:
                # Rate limit exceeded, wait and retry
                wait_time = int(response.headers.get('Retry-After', 60))
                print(f"Rate limit exceeded, waiting {wait_time}s before retry...")
                time.sleep(wait_time)
                continue
            elif response.status_code == 400:
                error_msg = response.json().get('message', 'Parameter error')
                raise InvalidParameterError(f"Invalid request parameters: {error_msg}")
            else:
                response.raise_for_status()

        except requests.exceptions.Timeout:
            print(f"Request timeout, retry attempt {attempt + 1}...")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            else:
                raise PangolinAPIError("Request timeout, max retries reached")
        except requests.exceptions.ConnectionError:
            print(f"Network connection failed, retry attempt {attempt + 1}...")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                continue
            else:
                raise PangolinAPIError("Network connection failed, max retries reached")

    return None
This enhanced request method implements multi-layered protection: automatic retry with exponential backoff strategy for temporary network issues, compliance with server-returned wait times for quota limits, and throwing specific exception types for clear errors to facilitate upper-layer handling. This design enables your application to gracefully handle various exception scenarios, significantly improving system reliability.
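At the call site, these exception types let the upper layer decide how to react to each failure mode; a minimal sketch:

try:
    product_data = client.get_product_details('B08N5WRWNW')
except AuthenticationError:
    # Not recoverable at runtime: alert whoever manages the credentials
    print("Check PANGOLIN_API_KEY in your .env file")
except InvalidParameterError as e:
    # Skip this request instead of aborting the whole run
    print(f"Skipping invalid request: {e}")
except PangolinAPIError as e:
    # Timeouts or connection failures that survived all retries
    print(f"Temporary failure, will retry next cycle: {e}")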
Real-World Project 1: Amazon Bestseller Monitoring System
Having understood basic calling methods, let’s build our first complete real-world project—a Best Sellers ranking monitoring system. This system can periodically collect bestseller ranking data for specified categories, track ranking trend changes, and identify newly listed products and items with abnormal ranking fluctuations. For product selection and competitor analysis, this is an extremely valuable data source.
First, extend the PangolinClient class by adding a method to fetch bestseller data:
def get_bestsellers(self, category: str, marketplace: str = 'US',
                    page: int = 1) -> Optional[Dict]:
    """Fetch Best Sellers ranking data

    Args:
        category: Category URL or category ID
        marketplace: Marketplace code
        page: Page number

    Returns:
        Ranking data dictionary
    """
    params = {
        'type': 'bestsellers',
        'category': category,
        'marketplace': marketplace,
        'page': page,
        'parse': 'true'
    }
    return self._make_request('', params)
Next, create a ranking monitor class implementing complete logic for data collection, storage, and change analysis:
import json
from datetime import datetime
from typing import Dict

import pandas as pd


class BestSellersMonitor:
    """Best Sellers ranking monitor"""

    def __init__(self, client: PangolinClient):
        self.client = client
        self.history_file = 'bestsellers_history.json'
        self.load_history()

    def load_history(self):
        """Load historical data"""
        try:
            with open(self.history_file, 'r', encoding='utf-8') as f:
                self.history = json.load(f)
        except FileNotFoundError:
            self.history = {}

    def save_history(self):
        """Save historical data"""
        with open(self.history_file, 'w', encoding='utf-8') as f:
            json.dump(self.history, f, ensure_ascii=False, indent=2)

    def monitor_category(self, category: str, marketplace: str = 'US'):
        """Monitor rankings for the specified category"""
        print(f"Starting collection of Best Sellers for {category} category...")
        data = self.client.get_bestsellers(category, marketplace)
        if not data:
            print("Data collection failed")
            return

        timestamp = datetime.now().isoformat()
        products = data.get('products', [])

        # Extract key information
        current_ranking = {}
        for product in products:
            asin = product.get('asin')
            current_ranking[asin] = {
                'rank': product.get('rank'),
                'title': product.get('title'),
                'price': product.get('price', {}).get('value'),
                'rating': product.get('rating'),
                'reviews': product.get('reviews_count'),
                'timestamp': timestamp
            }

        # Compare with historical data to identify changes
        category_key = f"{marketplace}_{category}"
        if category_key in self.history:
            self.analyze_changes(category_key, current_ranking)

        # Update historical records
        self.history[category_key] = current_ranking
        self.save_history()
        print(f"Successfully collected {len(products)} product records")

    def analyze_changes(self, category_key: str, current_ranking: Dict):
        """Analyze ranking changes"""
        previous = self.history[category_key]

        # Identify newly listed products
        new_products = set(current_ranking.keys()) - set(previous.keys())
        if new_products:
            print(f"\nFound {len(new_products)} newly listed products:")
            for asin in new_products:
                product = current_ranking[asin]
                print(f"  - {product['title'][:50]}... (Rank: {product['rank']})")

        # Identify products with significant ranking changes
        print("\nProducts with significant ranking changes:")
        for asin in set(current_ranking.keys()) & set(previous.keys()):
            old_rank = previous[asin]['rank']
            new_rank = current_ranking[asin]['rank']
            rank_change = old_rank - new_rank

            if abs(rank_change) >= 10:  # Ranking change of 10 positions or more
                direction = "improved" if rank_change > 0 else "declined"
                print(f"  - {current_ranking[asin]['title'][:50]}...")
                print(f"    Rank {direction}: {old_rank} → {new_rank} ({abs(rank_change)} positions)")

    def export_to_excel(self, category_key: str, filename: str):
        """Export ranking data to Excel"""
        if category_key not in self.history:
            print("No historical data for this category")
            return

        data = self.history[category_key]
        df = pd.DataFrame.from_dict(data, orient='index')
        df.to_excel(filename, index_label='ASIN')
        print(f"Data exported to {filename}")
Using this monitoring system is straightforward—you can set up scheduled tasks for daily automatic collection:
import time
from datetime import datetime

import schedule

client = PangolinClient()
monitor = BestSellersMonitor(client)

# Define monitoring task
def daily_monitor():
    monitor.monitor_category('kitchen', 'US')
    monitor.export_to_excel('US_kitchen',
                            f'bestsellers_{datetime.now().strftime("%Y%m%d")}.xlsx')

# Execute daily at 9 AM
schedule.every().day.at("09:00").do(daily_monitor)

# Can also execute immediately once
daily_monitor()

# Keep the program running
while True:
    schedule.run_pending()
    time.sleep(60)
Real-World Project 2: Competitor Price Tracking System
Price monitoring is one of the most common and critical requirements in e-commerce operations. By continuously tracking competitor price changes, you can adjust your pricing strategy promptly, seize promotional opportunities, and analyze competitors' operational rhythms. Let's build a fully-featured price tracking system.
import sqlite3
import time

import pandas as pd


class PriceTracker:
    """Competitor price tracker"""

    def __init__(self, client: PangolinClient):
        self.client = client
        self.db_file = 'price_history.db'
        self.init_database()

    def init_database(self):
        """Initialize SQLite database"""
        conn = sqlite3.connect(self.db_file)
        cursor = conn.cursor()
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS price_records (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                asin TEXT NOT NULL,
                marketplace TEXT NOT NULL,
                price REAL,
                currency TEXT,
                availability TEXT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            )
        ''')
        # SQLite does not support inline INDEX clauses, so create the index separately
        cursor.execute('''
            CREATE INDEX IF NOT EXISTS idx_asin_time
            ON price_records (asin, timestamp)
        ''')
        conn.commit()
        conn.close()

    def track_product(self, asin: str, marketplace: str = 'US'):
        """Track a single product's price"""
        product_data = self.client.get_product_details(asin, marketplace)
        if not product_data:
            return False

        price_info = product_data.get('price', {})

        conn = sqlite3.connect(self.db_file)
        cursor = conn.cursor()
        cursor.execute('''
            INSERT INTO price_records (asin, marketplace, price, currency, availability)
            VALUES (?, ?, ?, ?, ?)
        ''', (
            asin,
            marketplace,
            price_info.get('value'),
            price_info.get('currency'),
            product_data.get('availability')
        ))
        conn.commit()
        conn.close()
        return True

    def track_multiple(self, asin_list: list, marketplace: str = 'US'):
        """Batch track multiple products"""
        success_count = 0
        for asin in asin_list:
            if self.track_product(asin, marketplace):
                success_count += 1
                print(f"✓ {asin} price recorded")
            else:
                print(f"✗ {asin} collection failed")
            time.sleep(1)  # Avoid overly frequent requests

        print(f"\nCompleted: {success_count}/{len(asin_list)} products")

    def get_price_history(self, asin: str, days: int = 30) -> pd.DataFrame:
        """Get a product's price history"""
        conn = sqlite3.connect(self.db_file)
        query = '''
            SELECT timestamp, price, availability
            FROM price_records
            WHERE asin = ?
              AND timestamp >= datetime('now', '-{} days')
            ORDER BY timestamp
        '''.format(days)
        df = pd.read_sql_query(query, conn, params=(asin,))
        conn.close()
        return df

    def detect_price_changes(self, asin: str, threshold: float = 0.05):
        """Detect abnormal price changes"""
        df = self.get_price_history(asin, days=7)
        if len(df) < 2:
            return None

        latest_price = df.iloc[-1]['price']
        previous_price = df.iloc[-2]['price']

        if previous_price and latest_price:
            change_rate = (latest_price - previous_price) / previous_price
            if abs(change_rate) >= threshold:
                return {
                    'asin': asin,
                    'previous_price': previous_price,
                    'current_price': latest_price,
                    'change_rate': change_rate,
                    'alert_type': 'price_drop' if change_rate < 0 else 'price_increase'
                }
        return None

    def generate_report(self, asin_list: list):
        """Generate price monitoring report"""
        alerts = []
        for asin in asin_list:
            alert = self.detect_price_changes(asin)
            if alert:
                alerts.append(alert)

        if alerts:
            print("\n=== Price Change Alerts ===")
            for alert in alerts:
                change_pct = alert['change_rate'] * 100
                symbol = "↓" if alert['alert_type'] == 'price_drop' else "↑"
                print(f"{symbol} {alert['asin']}: "
                      f"${alert['previous_price']:.2f} → ${alert['current_price']:.2f} "
                      f"({change_pct:+.1f}%)")
        else:
            print("No significant price changes detected")

        return alerts
This price tracking system uses SQLite database for historical data storage, supporting batch monitoring, price change detection, and report generation. In actual use, you can configure it like this:
# Initialize tracker
tracker = PriceTracker(client)

# Define competitor ASIN list
competitor_asins = [
    'B08N5WRWNW',
    'B07XJ8C8F5',
    'B09B8RWTK3'
]

# Execute price collection hourly
def hourly_price_check():
    tracker.track_multiple(competitor_asins)
    tracker.generate_report(competitor_asins)

schedule.every().hour.do(hourly_price_check)

# As before, keep the scheduler loop running
while True:
    schedule.run_pending()
    time.sleep(60)
Advanced Techniques: Data Collection Optimization Strategies
When your monitoring scale expands to hundreds or even thousands of ASINs, optimizing collection efficiency and cost becomes crucial. First is batch request processing—while APIs typically follow a single request-single response pattern, you can implement concurrency control at the application layer using Python’s concurrent.futures module to simultaneously initiate multiple requests, significantly improving throughput.
from concurrent.futures import ThreadPoolExecutor, as_completed


def batch_fetch_products(client: PangolinClient, asin_list: list, max_workers: int = 5):
    """Concurrently fetch product data in batches"""
    results = {}

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit all tasks
        future_to_asin = {
            executor.submit(client.get_product_details, asin): asin
            for asin in asin_list
        }

        # Collect results
        for future in as_completed(future_to_asin):
            asin = future_to_asin[future]
            try:
                data = future.result()
                if data:
                    results[asin] = data
                    print(f"✓ {asin}")
            except Exception as e:
                print(f"✗ {asin}: {str(e)}")

    return results
Another optimization direction is intelligent caching mechanisms. For data with low change frequency, such as basic product information, there’s no need for real-time collection every time. You can set cache validity periods and directly return cached data within the valid period:
from datetime import datetime, timedelta
from typing import Dict, Optional


class CachedPangolinClient(PangolinClient):
    """API client with caching functionality"""

    def __init__(self, cache_ttl: int = 3600):
        super().__init__()
        self.cache = {}
        self.cache_ttl = cache_ttl  # Cache validity period in seconds

    def get_product_details(self, asin: str, marketplace: str = 'US',
                            use_cache: bool = True) -> Optional[Dict]:
        """Fetch product details with caching support"""
        cache_key = f"{marketplace}_{asin}"

        # Check the cache first
        if use_cache and cache_key in self.cache:
            cached_data, cached_time = self.cache[cache_key]
            if datetime.now() - cached_time < timedelta(seconds=self.cache_ttl):
                print(f"Returning from cache: {asin}")
                return cached_data

        # Call the API to fetch fresh data
        data = super().get_product_details(asin, marketplace)

        # Update the cache
        if data:
            self.cache[cache_key] = (data, datetime.now())

        return data
Real-Time Data Scraping: Core Capability of Monitoring Systems
Whether for bestseller monitoring or price tracking, the foundation of all monitoring systems is periodic real-time data scraping. Data timeliness directly determines monitoring value—outdated price information may lead to incorrect pricing decisions, and delayed ranking data causes you to miss early market trend signals. This is why choosing an API service capable of providing minute-level data updates is so important.
Pangolin Scrape API demonstrates clear technical advantages in this regard. It not only supports collection scales of tens of millions of pages daily but, more importantly, guarantees data accuracy and completeness. Across the various Amazon page types, Pangolin has accumulated mature collection experience and parsing templates: product description fields on detail pages, the full review keywords and sentiment signals from the "Customers Say" section, and sponsored ad position collection rates reaching 98%. These are levels that self-built scrapers struggle to achieve.
Advertising data is a particular case in point: because Amazon allocates sponsored ad positions with a black-box algorithm, achieving high collection rates demands substantial engineering sophistication. Low collection rates directly distort the analysis of keyword traffic sources and, in turn, the advertising strategies built on that analysis. Professional API services have solved these challenges through sustained technical investment, letting developers focus on applying the data rather than collecting it.
From Data to Insights: Building Complete Analysis Pipelines
Collecting data is just the first step—transforming raw data into business insights is where core value lies. It’s recommended to establish a layered data processing architecture: a raw data layer for storing complete JSON data returned by APIs, a cleaning layer for data standardization and quality checks, an aggregation layer for summarizing statistics by business dimensions, and an application layer for generating various reports and alerts based on aggregated data.
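As a rough sketch of that layering (the helper name clean_product_record is hypothetical, and the aggregation shown is just one example), the cleaning and aggregation layers might look like this:

import pandas as pd

def clean_product_record(raw: dict) -> dict:
    """Cleaning layer: flatten one raw API response into a tidy record (illustrative)."""
    return {
        'asin': raw.get('asin'),
        'title': raw.get('title'),
        'price': raw.get('price', {}).get('value'),
        'rating': raw.get('rating')
    }

# Aggregation layer: summarize the cleaned records by business dimension
raw_responses = batch_fetch_products(client, competitor_asins)
df = pd.DataFrame(clean_product_record(r) for r in raw_responses.values())
summary = df.agg({'price': ['mean', 'min', 'max'], 'rating': ['mean']})
print(summary)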
In practical applications, collected data can be imported into data analysis tools for deep mining. For example, using pandas for time series analysis to identify cyclical patterns in price fluctuations, using machine learning models to predict ranking trend changes, or building competitor profiles to evaluate operational strategies from multiple dimensions. The prerequisite for these advanced analytical capabilities is having high-quality, continuously updated data sources.
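For instance, a first pass at time series analysis over the price history collected earlier (a sketch that assumes the tracker has been running long enough to accumulate data) could look like this:

# Smooth the price series and look for weekly patterns
history = tracker.get_price_history('B08N5WRWNW', days=90)
history['timestamp'] = pd.to_datetime(history['timestamp'])
daily = history.set_index('timestamp')['price'].resample('D').mean()

trend = daily.rolling(window=7, min_periods=1).mean()           # 7-day moving average
weekday_pattern = daily.groupby(daily.index.dayofweek).mean()   # cyclical pattern by weekday
print(weekday_pattern)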
Another important direction is combining monitoring data with your own operational data. A competitor's price change on its own is merely a signal, but if you simultaneously see your own traffic and conversion rates declining, an immediate response is needed. This kind of correlation analysis requires integrating different data sources, which is why a growing number of teams build their own data platforms on top of APIs, achieving end-to-end integration from collection to application.
In Conclusion: Technology Empowering Business Growth
Returning to Zhang Hao’s story from the beginning, when he shifted from self-built scrapers to API integration, he not only solved system stability issues but more importantly unleashed his team’s creativity. Time previously spent maintaining scraping infrastructure can now be invested in optimizing data analysis models and innovating business logic. This transformation brings not just efficiency improvements but a shift in mindset—from focusing on technical implementation details to concentrating on business value creation.
The complete Python Pangolin API integration walkthrough above shows that choosing the right technical solution can significantly reduce development costs, shorten launch cycles, and improve system reliability. Whether you're a developer just getting started with e-commerce data collection or a technical lead hoping to optimize an existing system, API integration deserves serious consideration. It represents a more professional and efficient technical path—letting specialized services handle specialized tasks while you focus on creating unique business value.
Starting today, try building your first monitoring system using the code examples provided in this article. Begin with simple product detail retrieval, gradually expand to bestseller monitoring and price tracking, and ultimately establish a complete competitor intelligence system. Technology’s value lies in application, and data’s value lies in insights. When you master these tools and methods, you possess the technical advantage to stand out in e-commerce competition.
Related Resources
- 📖 Pangolin API Complete User Guide – Official Technical Documentation
- 🌐 Pangolin Official Website – Learn More About Our Products
- 💡 API Pricing Plans – Choose the Right Package for You
- 🚀 Free Trial – Start Your Data Collection Journey Today
