Sync Pipeline
The sync pipeline is the core data collection engine. It is driven by `AppSyncer` and orchestrates the full data lifecycle for each app. The state of pipeline runs is persisted in the `sync_statuses` table (`status`, `current_step`, `progress_done`/`progress_total`, `failed_items`, `error_message`, `job_id`, `next_retry_at`).
Pipeline Flow

    SyncAppJob
        │
        ▼
    AppSyncer::syncAll(App)
        │
        ├─ 1. identity()     → App metadata (name, publisher, category, languages)
        │     ├─ findOrCreate Publisher
        │     ├─ findOrCreate StoreCategory
        │     └─ Pipeline stops on failure
        │
        ├─ 2. listings()     → StoreListing + StoreListingChange per active country + locale
        │     └─ detectChanges() compares checksum
        │
        ├─ 3. metrics()      → AppMetric (per country + day; Android under the `zz` sentinel)
        │     └─ Computes rating_delta, sets `is_available`
        │
        ├─ 4. finalize()     → `apps.last_synced_at`, summary fields, `unavailable_countries`
        │
        └─ 5. reconciling()  → ReconcileFailedItemsJob retries transient failures

Keyword density is not a pipeline step: `KeywordAnalyzer` is called on demand from the keyword endpoint and reads from the existing `StoreListing`. See the Keyword Density feature page.
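
The control flow above can be sketched in a few lines. This is a minimal illustration in Python, not the project's Laravel code: `SyncRun`, `sync_all`, and the `"running"` / `"completed"` status values are hypothetical names chosen for the example (the page only confirms `failed`), and the assumption that non-identity phases record their own failures and let the run continue is inferred from the phase descriptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SyncRun:
    """Hypothetical in-memory stand-in for a sync_statuses row."""
    status: str = "running"
    current_step: str = ""
    error_message: Optional[str] = None
    completed: list = field(default_factory=list)


def sync_all(phases: list) -> SyncRun:
    """Run (name, callable) phases in order; only an identity failure aborts."""
    run = SyncRun()
    for name, phase in phases:
        run.current_step = name
        try:
            phase()
            run.completed.append(name)
        except Exception as exc:
            if name == "identity":
                # Later phases must not run against an unresolved app.
                run.status = "failed"
                run.error_message = str(exc)
                return run
            # Assumption: other phases log to failed_items and the run goes on.
    run.status = "completed"
    return run
```

Passing the phases as plain callables keeps the ordering explicit and makes the "stop on identity failure" rule easy to test in isolation.
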
Phase Details
1. Identity
Fetches the app’s core metadata from the store.
- Tries the `us` storefront first and falls back to `apps.origin_country_code` on failure
- On 404: classified as `empty_response`; if no storefront can resolve identity, the pipeline stops this run (so later phases don’t run against an unresolved app)
- Updates: `display_name`, `icon_url`, `supported_locales`, `original_release_date`, `is_free`, `origin_country_code`
- Creates or links `Publisher` and `StoreCategory` records
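
The storefront fallback can be illustrated as follows. This is a sketch in Python rather than the project's own code; `StoreError`, `resolve_identity`, and the injected `fetch` callable are hypothetical names, and treating every store error as a reason to fall back is an assumption.

```python
from typing import Callable, Optional


class StoreError(Exception):
    """Hypothetical error raised when a storefront lookup fails (e.g. a 404)."""


def resolve_identity(
    fetch: Callable[[str], dict],
    origin_country_code: Optional[str],
) -> Optional[dict]:
    """Try the `us` storefront, then the app's origin storefront.

    Returns the metadata dict, or None when no storefront resolves the app.
    """
    candidates = ["us"]
    if origin_country_code and origin_country_code != "us":
        candidates.append(origin_country_code)
    for country in candidates:
        try:
            return fetch(country)
        except StoreError:
            continue  # try the next candidate storefront
    return None  # the caller stops the pipeline for this run
```

Returning `None` instead of raising lets the caller decide how to record the failed run.
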
2. Listings
For each active country, fetches the store listing in every locale the country supports.
- Creates a `StoreListing` record (unique `(app_id, version_id, locale)`)
- Writes `title`, `subtitle`, `promotional_text` (iOS-only), `description`, `whats_new`, `screenshots`, `icon_url`
- Produces a `checksum` over the listing contents
- If the checksum differs from the previous one and the previous listing belongs to a different `app_versions` row, compares each field and creates `StoreListingChange` records. Upserts within the same version are silent; this prevents phantom diffs when the scraper re-ingests the same version twice in one pass (e.g. a freshly-discovered single-version app no longer accumulates per-locale noise for that single version)
- Added/removed locales from the `supported_locales` comparison are marked as `locale_added` / `locale_removed`
- No record is written if the storefront does not return the locale
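
The change-detection rule in the list above can be sketched like this. It is an illustrative Python example, not the project's implementation: the SHA-256 over a JSON payload is an assumed checksum scheme, and the dictionary shape of a listing and of a change record is hypothetical.

```python
import hashlib
import json
from typing import Optional

FIELDS = ("title", "subtitle", "promotional_text", "description",
          "whats_new", "screenshots", "icon_url")


def listing_checksum(listing: dict) -> str:
    """Hash the listing fields in a fixed order (hash choice is an assumption)."""
    payload = json.dumps([listing.get(name) for name in FIELDS])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(previous: Optional[dict], current: dict) -> list:
    """Return per-field change records, or [] when nothing should be logged."""
    if previous is None:
        return []  # first listing for this locale: nothing to compare
    if previous["version_id"] == current["version_id"]:
        return []  # same-version upsert stays silent
    if listing_checksum(previous) == listing_checksum(current):
        return []  # new version, identical content
    return [
        {"field": name, "old": previous.get(name), "new": current.get(name)}
        for name in FIELDS
        if previous.get(name) != current.get(name)
    ]
```

Checking the version before the checksum is what keeps a re-ingest of the same version from producing diffs.
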
3. Metrics
Fetches per-country ratings and price.
- Creates an `AppMetric` record (unique `(app_id, country_code, date)`)
- Since Android metrics are global, they are stored under the `zz` sentinel country
- Persists: `rating`, `rating_count`, `rating_breakdown`, `price` (null = unknown, 0 = free), `installs_range`, `file_size_bytes`, `is_available`
- Computes `rating_delta` (change in `rating_count` since the previous day)
- If a 404 comes back for a country → marked as `empty_response`, `is_available = false` is written for that country, and it will not be retried
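
Two of the rules above, the `zz` sentinel and `rating_delta`, are simple enough to show directly. The following is a hedged Python sketch; the function names and the `"android"` platform string are assumptions, as is returning `None` when there is no previous-day row.

```python
from typing import Optional

GLOBAL_COUNTRY = "zz"  # sentinel country used for Android's global metrics


def metric_country(platform: str, country_code: str) -> str:
    """Android metrics are global, so they are always filed under `zz`."""
    return GLOBAL_COUNTRY if platform == "android" else country_code


def rating_delta(today_count: Optional[int],
                 previous_count: Optional[int]) -> Optional[int]:
    """Change in rating_count since the previous day."""
    if today_count is None or previous_count is None:
        return None  # assumption: no delta without both data points
    return today_count - previous_count
```
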
4. Finalize
- Update `apps.last_synced_at`
- Refresh summary fields and caches
- The `AppDetailResource`’s `unavailable_countries` field is derived from `app_metrics.is_available = false` rows
- `apps.is_available` reflects reachability in at least one storefront; the source of truth for per-country availability is `app_metrics`
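
The relationship between the per-country rows and the two derived values can be sketched as below. This is an illustrative Python example under the assumption that the caller passes one row per country (the most recent one); the function names and row shape are hypothetical.

```python
def unavailable_countries(latest_metrics: list) -> list:
    """Countries whose latest app_metrics row has is_available = false."""
    return sorted(
        row["country_code"]
        for row in latest_metrics
        if row["is_available"] is False
    )


def app_is_available(latest_metrics: list) -> bool:
    """True when the app is reachable in at least one storefront."""
    return any(row["is_available"] for row in latest_metrics)
```
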
5. Reconciling
- Examines the `failed_items` entries previous phases wrote for this run
- `ReconcileFailedItemsJob` queues them at `next_retry_at`, honoring the configured max retry count per reason tag
- Permanent reasons like `empty_response` are skipped, so there are no infinite retries
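
A rough sketch of how the retry filter might look, written in Python for illustration only. The shape of a `failed_items` entry (`reason`, `attempts`), the `timeout` and `server_error` reason tags, and every limit value here are hypothetical; only `empty_response` being permanent comes from the description above.

```python
PERMANENT_REASONS = {"empty_response"}  # never retried

# Hypothetical per-reason limits; the real values live in configuration.
MAX_ATTEMPTS = {"timeout": 3, "server_error": 5}
DEFAULT_MAX_ATTEMPTS = 3


def items_to_retry(failed_items: list) -> list:
    """Keep only entries that are transient and still under their retry cap."""
    retryable = []
    for item in failed_items:
        reason = item["reason"]
        if reason in PERMANENT_REASONS:
            continue
        limit = MAX_ATTEMPTS.get(reason, DEFAULT_MAX_ATTEMPTS)
        if item.get("attempts", 0) >= limit:
            continue  # retry budget exhausted for this reason
        retryable.append(item)
    return retryable
```
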
Sync Scheduling
The Laravel scheduler fires `appstorecat:apps:sync-tracked` on both platforms every 20 minutes; it pulls stale apps and dispatches a `SyncAppJob` to `sync-tracked-{platform}` for each one. Each tick is capped at `SYNC_{PLATFORM}_TRACKED_BATCH_SIZE` apps (default 5).
The command picks apps in tiered priority order so idle ticks still do useful work:
- Tracked apps (via `user_apps`)
- Competitor apps (`app_competitors.competitor_app_id`) that are not themselves tracked
- Any other available app, oldest first
Within each tier, apps that have never been synced are picked before apps with a stale `last_synced_at`.
| App Type | Refresh Interval | Queue |
|---|---|---|
| Tracked / competitor / backlog iOS | 24 hours | sync-tracked-ios |
| Tracked / competitor / backlog Android | 24 hours | sync-tracked-android |
Apps are only re-synced if their last_synced_at is older than the configured refresh interval.
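
The selection rules in this section (staleness check, tier order, never-synced first, batch cap) can be combined into one small function. This is a Python sketch for illustration, not the command's actual query; the numeric `tier` field (0 = tracked, 1 = competitor, 2 = other) and the dictionary shape are assumptions.

```python
from datetime import datetime, timedelta

REFRESH_INTERVAL = timedelta(hours=24)
BATCH_SIZE = 5  # default of SYNC_{PLATFORM}_TRACKED_BATCH_SIZE


def pick_batch(apps: list, now: datetime, batch_size: int = BATCH_SIZE) -> list:
    """Pick the apps to sync on one scheduler tick."""

    def is_due(app: dict) -> bool:
        last = app["last_synced_at"]
        return last is None or now - last > REFRESH_INTERVAL

    def sort_key(app: dict):
        last = app["last_synced_at"]
        # Lower tier first, never-synced before synced, then oldest first.
        return (app["tier"], last is not None, last or datetime.min)

    due = [app for app in apps if is_due(app)]
    return sorted(due, key=sort_key)[:batch_size]
```

Because the cap is applied after sorting, a tick with few stale tracked apps fills its remaining slots from the competitor and backlog tiers.
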
On-demand Refresh Queue
`AppController::show()` and `AppController::listing()` dispatch a `SyncAppJob` to `sync-on-demand-ios` / `sync-on-demand-android` when the visited app’s data is stale. The UI polls progress via `GET /apps/{platform}/{externalId}/sync-status`; the user can also trigger an explicit refresh via `POST /apps/{platform}/{externalId}/sync`. This keeps user-triggered refreshes on their own worker pool and prevents them from waiting behind the scheduled tracked queue.
Uniqueness Safeguards
The pipeline uses database uniqueness constraints to prevent duplicate data:
| Table | Uniqueness Criteria |
|---|---|
| `apps` | `(platform, external_id)` |
| `app_store_listings` | `(app_id, version_id, locale)` |
| `app_versions` | `(app_id, version)` |
| `app_metrics` | `(app_id, country_code, date)` |
| `sync_statuses` | `app_id` |
In addition, SyncAppJob enforces ShouldBeUnique with a 1-hour window per app ID.
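
To show what such a constraint buys, here is a small self-contained demonstration using Python's built-in `sqlite3` module as a stand-in database. The column set is trimmed for brevity, `insert_metric` is a hypothetical helper, and the project's real schema and database engine may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE app_metrics (
        app_id       INTEGER NOT NULL,
        country_code TEXT    NOT NULL,
        date         TEXT    NOT NULL,
        rating       REAL,
        UNIQUE (app_id, country_code, date)
    )
    """
)


def insert_metric(app_id: int, country_code: str, date: str, rating: float) -> bool:
    """Insert one row; return False if the unique key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO app_metrics VALUES (?, ?, ?, ?)",
                (app_id, country_code, date, rating),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A second insert for the same app, country, and day is rejected by the database itself, so two overlapping sync runs cannot produce duplicate rows even if the application-level check is skipped.
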
Error Handling
- Identity failure: Pipeline stops, `sync_statuses.status = failed` is written. Later phases do not run.
- 404 `empty_response`: The country/locale is marked as permanently “unavailable”; not retried.
- Transient failures (5xx, timeout): Written to `failed_items` with a reason tag; `ReconcileFailedItemsJob` retries based on the max-attempts rule per reason.
- Job-level retry: 3 attempts with `[30, 60, 120]` second backoff.
- Failed jobs: After all attempts, jobs land in the `failed_jobs` table for inspection.
- Throttle exceeded: The job waits for a slot (up to 300 seconds).
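
The split between permanent and transient failures could be expressed as a small classifier. This Python sketch is illustrative only: apart from `empty_response`, the reason tags (`timeout`, `server_error`, `unknown`) are hypothetical, and treating unrecognized failures as non-retryable is an assumption.

```python
from typing import Optional, Tuple


def classify_failure(status_code: Optional[int],
                     timed_out: bool = False) -> Tuple[str, bool]:
    """Return (reason_tag, retryable) for a failed store request."""
    if timed_out:
        return ("timeout", True)  # transient
    if status_code == 404:
        return ("empty_response", False)  # permanent, never retried
    if status_code is not None and 500 <= status_code <= 599:
        return ("server_error", True)  # transient
    return ("unknown", False)  # assumption: do not retry what we cannot classify
```
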