Submitting a Report Through FINRA fileX

A system-design walkthrough of submitting a regulatory report to FINRA through its fileX file transfer service: what fileX is, how its three access methods differ, why the HTTPS REST API is the right choice for a confirmable submission, what a complete submission means end to end, and the failure, security, audit, and operational concerns that matter in a brokerage pipeline.

Learning outcomes

This page prepares you to design, defend, and explain a real production task: submitting a regulatory report to FINRA through its fileX file transfer service. It is written for an Alpaca system design panel, where the seed prompt is deceptively small (“submit a report through fileX, which access method, and what does a complete submission look like”) and the real test is how clearly you reason about trade-offs in a regulated environment.

After studying this page, you can:

  • Explain why a self-clearing broker-dealer like Alpaca has to file reports to FINRA at all, and why fileX is the channel.
  • Describe what fileX is using a single faithful mental model (a bonded courier depot) and map every API call onto it.
  • Compare the three fileX access methods (SFTP, HTTPS REST, S3 Direct) by the guarantees each one gives, not by surface familiarity.
  • Justify choosing the HTTPS REST API for a report submission from the one property that decides it: programmatic delivery and acceptance confirmation.
  • Define a complete submission precisely as a confirmed downstream acceptance against a persisted tracking id, not a successful byte upload.
  • Walk the five-stage submission flow end to end, and reason about its failure modes, retries, idempotency, security, auditability, and operations.

Before we dive in

You need very little to start. A report here is one file: a structured data file (for example an Electronic Blue Sheets submission or a Rule 4530 filing) that a brokerage is legally required to send to FINRA. fileX is the service that receives it. Everything else on this page is about doing that one transfer so reliably and so provably that you would stake a compliance deadline on it.

Three words recur, so fix them now. An application is a FINRA system that consumes your file (CRD, ACATS, eFOCUS, Bluesheets, and so on). A sub-space is a directory inside an application: you upload into an inbound sub-space and download results from an outbound one. An entitlement is the permission, granted by FINRA, that says your account may submit, download, or both. Hold those three and the rest of the page assembles out of them.

This page assumes no prior knowledge of FINRA systems. It does assume you are comfortable with HTTP, OAuth2 bearer tokens, and the idea of a pre-signed object-storage URL.

Mental Model

The most common wrong model is that submitting a report means uploading a file. Under that model the job ends at HTTP 200, the bytes are in, you move on. That model will fail you in the interview and in production, because in a regulated pipeline a transfer that lands but is then rejected by the downstream system is not a filing; it is a missed obligation that nobody noticed.

The faithful model is a bonded courier depot. You do not walk into FINRA’s building. You authenticate at the front desk, the desk hands you a single-use locker key that expires in about a minute, you place exactly one sealed package in that locker, and the desk confirms it has the package. Then a back office collects the package, opens it, and stamps it accepted or rejected. A complete submission is the whole round trip ending in that accepted stamp, plus a receipt you kept from the moment you were handed the key. Keep this depot picture; every design decision below is a property of it.

Breaking it down

The concept builds in fourteen rungs. The first three set the context and the system, the next three make and execute the core decision, and the last eight scale it, prove it without the Tracking API, harden it, draw it as one architecture, design its data model, teach you to defend it, and rehearse the interview itself. Diagrams appear throughout; Go and SQL appear from rung 6 on, written so the patterns transfer to any upload pipeline, not only this one.

1. Why a brokerage ships files to FINRA

Start with the reason the task exists, because it shapes every requirement that follows. Alpaca is a developer-first brokerage infrastructure company, and since 2024 it operates as a fully self-clearing broker-dealer (Alpaca Clearing, a member of FINRA and SIPC). Self-clearing matters here: a firm that clears and custodies its own trades carries its own regulatory reporting obligations rather than handing them to a clearing partner. Those obligations include Electronic Blue Sheets, which carry trade and account-holder data that regulators use to reconstruct market activity, plus Rule 4530 filings, FOCUS reports, short interest, and more.

FINRA receives those bulk filings through fileX, its centralized secure file transfer service, and is actively migrating filing types such as Rule 4530 onto it. So “submit a report through fileX” is not a contrived exercise. It is the daily plumbing of a self-clearing brokerage, which is exactly why an Alpaca panel uses it: the API surface is small enough to fully specify in a two-hour call, and rich enough to expose how you think about reliability, confirmation, and audit under real regulatory stakes.

flowchart LR
  G["Report generator (internal)"] --> SUB["Submission service"]
  SUB -->|"1. authenticate"| FIP["FINRA Identity Platform (OAuth2)"]
  SUB -->|"2. upload and 3. track"| FX["FINRA fileX"]
  FX --> APP["Downstream FINRA application"]
  APP -->|"accepted or rejected"| FX

The diagram names the whole job: an internal system produces a report, a submission service authenticates, uploads, and tracks it, and the real verdict comes back from the downstream application, not from fileX alone.

2. The fileX depot in one picture

With the why in place, look at the system the way the depot model predicts. fileX gives each firm its own loading dock per application. Inside a dock are sub-spaces: an inbound bay where you drop submissions, an outbound bay where results appear, and an archive bay that keeps a timestamped copy of everything for thirty calendar days. A submission is simply a write into the inbound bay of the right application.
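
The dock layout maps directly onto paths. The sketch below builds them under two assumptions: that the REST shape /files/{orgId}/{application}/{sub-space}/{fileName} holds for every sub-space (this page only shows it for in), and that S3 keys use the {orgId}/{application}/{sub-space}/ prefix seen in the archive example in rung 8. The helper names are hypothetical.

```go
package main

import (
	"net/url"
	"path"
)

// SubSpace names one bay of an application's dock, using the path segments
// this page uses: in (inbound), out (outbound), and in_arcv (archive).
type SubSpace string

const (
	Inbound  SubSpace = "in"
	Outbound SubSpace = "out"
	Archive  SubSpace = "in_arcv"
)

// restFilePath builds a hypothetical REST path for a file in a sub-space.
// Each segment is escaped, so a stray '/' in a name cannot add a path level.
func restFilePath(org, app string, s SubSpace, file string) string {
	return "/" + path.Join("files", url.PathEscape(org), url.PathEscape(app),
		string(s), url.PathEscape(file))
}

// s3Prefix builds the object-key prefix for a sub-space, matching the shape
// of the archive example "0001/app1/in_arcv/".
func s3Prefix(org, app string, s SubSpace) string {
	return path.Join(org, app, string(s)) + "/"
}
```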

Two depot rules carry weight later. First, files in the inbound bay are immediately moved: fileX hands your file to the downstream application as soon as it picks it up, so the file disappears from the bay and you cannot reach back in to change it. Second, the badge system is the FINRA entitlement service, which grants each account one of three levels: read or download only, submit only, or submit and download. That separation of duties is a feature you will deliberately use.
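
One way to use that separation is to give each internal component the narrowest entitlement it needs. The sketch below encodes the three levels and checks a role against them. The type names, and the split into a submitter holding submit only and a reconciler holding download only, are assumptions for illustration, not a FINRA API.

```go
package main

// Entitlement is one of the three fileX entitlement levels described above.
// The type and constant names are illustrative, not a FINRA API.
type Entitlement int

const (
	DownloadOnly      Entitlement = iota + 1 // read or download only
	SubmitOnly                               // submit only
	SubmitAndDownload                        // both
)

func (e Entitlement) CanSubmit() bool   { return e == SubmitOnly || e == SubmitAndDownload }
func (e Entitlement) CanDownload() bool { return e == DownloadOnly || e == SubmitAndDownload }

// Role is a hypothetical internal component that holds one fileX account.
type Role struct {
	Name          string
	Ent           Entitlement
	NeedsSubmit   bool
	NeedsDownload bool
}

// leastPrivilegeOK reports whether a role's entitlement grants exactly what
// it needs: lacking a needed right fails, and so does holding an extra one.
func leastPrivilegeOK(r Role) bool {
	return r.Ent.CanSubmit() == r.NeedsSubmit && r.Ent.CanDownload() == r.NeedsDownload
}
```

Running this check in CI against a config of roles keeps an over-privileged account from creeping in unnoticed.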

The parts of a fileX dock start with the application: a FINRA system that consumes the file, such as Bluesheets, CRD, ACATS, eFOCUS, or 4530. In API paths it is the application segment, and each application defines its own inbound and outbound sub-spaces.

3. The three doors and what each guarantees

The depot has three doors, and the only question that matters is what each one guarantees, not which one feels familiar. All three move bytes securely. They differ sharply in authentication, in how many files move per operation, and, decisively, in whether you can confirm what happened to your file without phoning a human.

The three access methods
Standard SFTP on port 22, password authentication only (SSH keys are not supported yet, so set PreferredAuthentications to password and disable public-key auth). It needs firewall rules in both directions: you open outbound port 22 to FINRA's static IP addresses, and FINRA whitelists your outbound routable IPs. It supports put, get, cd, and ls, but not rename, move, or chmod, and has no append or resume. It moves any number of files per session and has no programmatic tracking: to learn a submission's fate you contact FINRA support.

The pattern to notice: SFTP and S3 Direct are throughput doors, built to move many or very large files. Only the HTTPS REST door closes the loop on a single file with a confirmation you can read in code. That single asymmetry is what the next rung turns into a decision.

4. Choosing the door for a report submission

Now combine the requirement with the guarantees. The requirement is a confirmed, auditable submission of one report. The decisive guarantee is programmatic confirmation, and only the HTTPS REST API has it: the fileX guide states that the Tracking API is supported only for uploads made through the HTTPS REST method. With SFTP or S3 Direct you would have to call FINRA support to find out whether a regulatory filing was accepted, which is not a foundation you can automate a compliance deadline on.

flowchart TD
  Q1{"Need programmatic per-file delivery and acceptance proof"}
  Q1 -->|"no, legacy or human bulk"| SFTP["SFTP"]
  Q1 -->|"yes"| Q2{"Single report of modest size"}
  Q2 -->|"yes"| REST["HTTPS REST API"]
  Q2 -->|"multi-gigabyte or thousands of files"| S3["S3 Direct with multipart"]

Three supporting reasons reinforce the choice rather than drive it. The REST upload call returns a tracking id before a single byte moves, giving you a durable receipt to persist for audit. OAuth2 client credentials are built for unattended machine-to-machine services, with no interactive login and no SSH key distribution. And outbound-443-only means no static-IP whitelisting in both directions. The one REST limit, one file per call, simply does not bite when a report is one file.

State the boundary too, because range is what a panel rewards. If a single report were multi-gigabyte, or you had to push thousands of files per cycle, S3 Direct with multipart upload becomes the stronger transport, and you would rebuild the confirmation loop yourself since it has no Tracking API. The honest framing is that REST is correct for this requirement, and the requirement is what selects it.
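
The decision can be written down as a function, which makes the boundary explicit and testable. This is a sketch: the 5 GiB cut-over is S3's single-PUT cap discussed in rung 7, while the 1000-files-per-cycle threshold is a hypothetical stand-in for "thousands of files" that you would tune from measured limits.

```go
package main

// Method names one of the three fileX access methods.
type Method string

const (
	SFTP     Method = "SFTP"
	REST     Method = "HTTPS REST API"
	S3Direct Method = "S3 Direct with multipart"
)

// singlePutLimit is S3's cap on a single PUT (5 GiB): the hard size boundary.
const singlePutLimit = int64(5) << 30

// bulkFilesPerCycle is a hypothetical threshold for "thousands of files per
// cycle"; tune it from measured fileX rate limits rather than trusting it.
const bulkFilesPerCycle = 1000

// chooseMethod encodes the decision flowchart: programmatic proof first,
// then whether one pre-signed PUT per file can carry the workload.
func chooseMethod(needProof bool, sizeBytes int64, filesPerCycle int) Method {
	if !needProof {
		return SFTP
	}
	if sizeBytes > singlePutLimit || filesPerCycle >= bulkFilesPerCycle {
		return S3Direct // confirmation must then be rebuilt by a reconciler
	}
	return REST
}
```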

Check yourself
An automated pipeline must submit one regulatory report and prove it was accepted by the downstream FINRA application, with no human in the loop. Which access method fits, and why?

5. What complete actually means

This rung is the one most candidates miss, and the depot model already told you the answer: complete is the accepted stamp, not the package in the locker. fileX reports a submission through two separate fields, and you must not confuse them. The status field tracks delivery: it moves from request received (the upload URL was issued) to received by FINRA (the bytes arrived through that URL). The downstream status field tracks the verdict from the application: it moves from received to one of accepted, rejected, or partially rejected.

A complete submission is downstream status accepted, recorded against a tracking id you persisted. An HTTP 200 on the byte upload only proves delivery. Saying this distinction out loud, before anyone asks, is the highest-signal move you can make in the interview.

Delivered versus complete
Delivered: the byte upload returned 200 and the tracking status reads received by FINRA. The package is in the locker. Nothing yet says the filing is valid, and treating this as done is how a rejected regulatory report goes unnoticed. Complete: the downstream status reads accepted.
stateDiagram-v2
  [*] --> RequestReceived: upload URL issued, trackingId returned
  RequestReceived --> ReceivedByFINRA: bytes uploaded, 200 OK
  ReceivedByFINRA --> Received: fileX hands file to the application
  Received --> Accepted: processed clean
  Received --> Rejected: errors, resubmit the whole file
  Received --> PartiallyRejected: errors, resubmit a corrected subset
  Accepted --> [*]
  Rejected --> RequestReceived: new submission, new trackingId
  PartiallyRejected --> RequestReceived: corrected subset, new trackingId

Notice the two reject paths return to the start as brand new submissions. fileX treats every upload as a separate submission with its own tracking id; there is no in-place update and no resume. That fact lands hard in the next rungs on idempotency.
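
Encoding the two fields as one decision keeps delivered and complete from blurring in code. A minimal sketch, assuming the status and downstreamStatus strings shown in rung 6 ("Received by FINRA", "ACCEPTED", "REJECTED", "PARTIALLY_REJECTED"); the Outcome names are hypothetical.

```go
package main

import "strings"

// Outcome is the pipeline's next step for one tracked submission.
type Outcome int

const (
	Pending        Outcome = iota // upload URL issued, delivery not yet confirmed
	Delivered                     // received by FINRA, no verdict yet: NOT complete
	Complete                      // downstream status ACCEPTED
	ResubmitWhole                 // REJECTED: fix and resubmit the whole file
	ResubmitSubset                // PARTIALLY_REJECTED: resubmit a corrected subset
)

// classify maps the two Tracking API fields to an Outcome. The verdict field
// wins; the delivery field only separates Pending from Delivered.
func classify(status, downstreamStatus string) Outcome {
	switch strings.ToUpper(strings.TrimSpace(downstreamStatus)) {
	case "ACCEPTED":
		return Complete
	case "REJECTED":
		return ResubmitWhole
	case "PARTIALLY_REJECTED":
		return ResubmitSubset
	}
	if strings.EqualFold(strings.TrimSpace(status), "Received by FINRA") {
		return Delivered
	}
	return Pending
}
```

Both resubmit outcomes start a brand new submission with a new tracking id, exactly as the state diagram shows.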

6. The submission flow end to end

Now walk the whole round trip. It is five stages, and each maps onto the depot.

The five-stage submission
1. Authenticate: POST to the FINRA Identity Platform with HTTP Basic client credentials and grant_type client_credentials. You receive a bearer token valid for about 12 hours. Cache it across submissions and refresh it on expiry or on a 401.

The sequence diagram shows the same flow with the actors and the confirmation loop made explicit.

sequenceDiagram
  participant SVC as Submission service
  participant FIP as FINRA Identity Platform
  participant FX as fileX REST API
  participant URL as Pre-signed S3 URL
  participant APP as Downstream application
  SVC->>FIP: POST oauth2 access_token (Basic client creds)
  FIP-->>SVC: bearer token, valid about 12h
  SVC->>FX: PUT upload request to inbound sub-space
  FX-->>SVC: url, expirationTime about 1 min, trackingId
  SVC->>URL: PUT bytes (Content-Length, octet-stream)
  URL-->>SVC: 200 OK
  loop poll with backoff
    SVC->>FX: GET tracking by id
    FX-->>SVC: status and downstreamStatus
  end
  APP-->>FX: processes file and sets downstreamStatus
  Note over SVC: complete when downstreamStatus is accepted


The literal calls, with hostnames left as placeholders so nothing here is a live link, look like this.

Stage 1  POST  {fip}/fip/rest/ews/oauth2/access_token?grant_type=client_credentials
               Authorization: Basic base64(clientId:clientSecret)
         200   { "access_token": "...", "expires_in": 43170, "token_type": "Bearer" }

Stage 2  PUT   {filex}/files/{orgId}/{application}/in/{fileName}
               Authorization: Bearer {access_token}
         200   { "url": "...", "expirationTime": "...", "trackingId": "343dab32-516b-4e04-..." }

Stage 3  PUT   {url}
               Content-Length: {bytes}
               Content-Type: application/octet-stream
         200   OK

Stage 4  GET   {filex}/tracking?id={trackingId}
               Authorization: Bearer {access_token}
         200   { "status": "Received by FINRA", "downstreamStatus": "ACCEPTED", ... }

That wire format becomes a small Go client. The first piece caches the bearer token and refreshes it just before the twelve-hour expiry, a pattern you reuse for any token-authenticated upstream.

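import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"sync"
	"time"
)

// Client talks to fileX and caches the OAuth2 bearer token from the FINRA
// Identity Platform. One Client is shared across every file in a cycle. The
// imports above also cover submitOne, putFile, and PollUntilTerminal below.
type Client struct {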
	http      *http.Client
	fipBase   string
	filexBase string
	clientID  string
	secret    string

	mu    sync.Mutex
	token string
	exp   time.Time
}

// bearer returns a cached token, refreshing a minute before expiry so the
// twelve-hour FIP token is reused across thousands of uploads.
func (c *Client) bearer(ctx context.Context) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token != "" && time.Now().Before(c.exp.Add(-time.Minute)) {
		return c.token, nil
	}
	u := c.fipBase + "/fip/rest/ews/oauth2/access_token?grant_type=client_credentials"
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, u, nil)
	if err != nil {
		return "", err
	}
	req.SetBasicAuth(c.clientID, c.secret)
	resp, err := c.http.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("fip token: status %d", resp.StatusCode)
	}
	var body struct {
		AccessToken string `json:"access_token"`
		ExpiresIn   int    `json:"expires_in"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	c.token = body.AccessToken
	c.exp = time.Now().Add(time.Duration(body.ExpiresIn) * time.Second)
	return c.token, nil
}

The submit path requests the one-minute URL, persists the tracking id receipt before sending a byte, then streams the file from disk. It records intent in a Ledger, the small dedup store defined later on this page, so a crash still leaves a receipt to reconcile.

type Report struct {
	InternalID string // our own id, the basis for deduplication
	Org, App   string
	FileName   string
	Path       string // streamed from disk, never buffered whole
}

type grant struct {
	URL        string `json:"url"`
	TrackingID string `json:"trackingId"`
}

// submitOne runs stages 2 and 3 with the receipt persisted up front.
func (c *Client) submitOne(ctx context.Context, led Ledger, r Report) (string, error) {
	tok, err := c.bearer(ctx)
	if err != nil {
		return "", err
	}
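	u := fmt.Sprintf("%s/files/%s/%s/in/%s", c.filexBase, r.Org, r.App, r.FileName)
	req, err := http.NewRequestWithContext(ctx, http.MethodPut, u, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+tok)
	req.Header.Set("Accept", "application/json")
	resp, err := c.http.Do(req)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK { // e.g. 401 or 403: no grant to decode
		resp.Body.Close()
		return "", fmt.Errorf("upload request: status %d", resp.StatusCode)
	}
	var g grant
	err = json.NewDecoder(resp.Body).Decode(&g)
	resp.Body.Close()
	if err != nil {
		return "", err
	}
	if g.URL == "" || g.TrackingID == "" { // never bind an empty receipt
		return "", fmt.Errorf("upload request: response lacks url or trackingId")
	}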
	if err := led.Bind(ctx, r.InternalID, g.TrackingID); err != nil { // receipt before bytes
		return g.TrackingID, err
	}
	return g.TrackingID, putFile(ctx, c.http, g.URL, r.Path)
}

// putFile streams the file to the pre-signed URL within its one-minute window.
func putFile(ctx context.Context, hc *http.Client, url, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	info, err := f.Stat()
	if err != nil {
		return err
	}
	req, _ := http.NewRequestWithContext(ctx, http.MethodPut, url, f)
	req.ContentLength = info.Size()
	req.Header.Set("Content-Type", "application/octet-stream")
	resp, err := hc.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("upload: status %d", resp.StatusCode)
	}
	return nil
}

Confirming is a poll for the downstream verdict, bounded by a deadline-aware context so it cannot loop past the filing SLA. Only ACCEPTED is complete.

type Status struct {
	Status           string `json:"status"`
	DownstreamStatus string `json:"downstreamStatus"`
}

// PollUntilTerminal polls the Tracking API until the downstream verdict is
// terminal or ctx (carrying the SLA deadline) is done. The caller treats
// ACCEPTED as complete and routes REJECTED or PARTIALLY_REJECTED to a correction.
func (c *Client) PollUntilTerminal(ctx context.Context, id string, every time.Duration) (Status, error) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		tok, err := c.bearer(ctx)
		if err != nil {
			return Status{}, err
		}
		u := fmt.Sprintf("%s/tracking?id=%s", c.filexBase, id)
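		req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
		if err != nil {
			return Status{}, err
		}
		req.Header.Set("Authorization", "Bearer "+tok)
		resp, err := c.http.Do(req)
		if err != nil {
			return Status{}, err
		}
		if resp.StatusCode != http.StatusOK { // an error body is not a Status
			resp.Body.Close()
			return Status{}, fmt.Errorf("tracking: status %d", resp.StatusCode)
		}
		var s Status
		err = json.NewDecoder(resp.Body).Decode(&s)
		resp.Body.Close()
		if err != nil {
			return s, err
		}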
		switch s.DownstreamStatus {
		case "ACCEPTED", "REJECTED", "PARTIALLY_REJECTED":
			return s, nil
		}
		select {
		case <-t.C:
		case <-ctx.Done():
			return s, ctx.Err()
		}
	}
}

7. Scaling to high volume and gigantic files

Rungs 4 through 6 optimized for one modest report and chose REST for its confirmation. Now stress the requirement the way a real Alpaca cycle does: a high number of files per cycle and individual files in the multi-gigabyte range. Two different pressures appear, and they pull the design in opposite directions, so treat them separately.

The first pressure is gigantic single files, and it has a sharp boundary that is easy to miss. Notice first that the REST byte upload already goes to a pre-signed S3 URL, so for an everyday report the bytes land in S3 either way and S3 Direct buys you nothing. The catch is that a pre-signed URL is a single PutObject, and S3 caps a single PUT at 5 GB. Above 5 GB the object cannot be uploaded in one PUT at all; it must use multipart, and multipart needs real S3 credentials to create the upload, send each part, and complete it, which a single fileX pre-signed URL does not give you. That is the actual reason to mint an S3 Direct token: not to reach S3, since REST already does, but to unlock multipart.

With the one-hour S3 token you upload a multipart object: the file is split into parts (5 MiB to 5 GiB each, up to 10000 parts per object), the parts upload concurrently, each part retries on its own, and the object is assembled server-side. A failed part costs one part, not the whole file. Even under 5 GB this is worth it when one stream is too slow or too fragile, because a lone pre-signed PUT is a single stream with a one-minute window to start and no resume.

Two operational limits shape this. fileX sets no hard size cap, but it recommends keeping a file under 60 GB and arranging anything larger with FINRA support, so a truly enormous filing is a conversation, not just a bigger upload. And the S3 security token expires after one hour and is scoped to one application, so a long bulk cycle must refresh it by calling the token endpoint again mid-run rather than minting a single token for everything.

The interactive sizing model for this rung compares a single REST PUT with parallel multipart and shows the part count and upload time as the file grows. Two ceilings show up: the verdict turns at 5 GB, the single-PUT limit above which multipart is mandatory, and once a file passes a few hundred gigabytes the part size has to grow to stay under the 10000-part cap (at 64 MiB parts, 10000 parts cover 625 GiB).

Sizing a gigantic-file upload: one stream versus parallel multipart. Model inputs: file size (1 to 500 GB), multipart part size (8 to 1024 MB), parallel part uploads (1 to 64 streams), and throughput per stream (25 to 400 MB/s). At its defaults (50 GB, 64 MB parts, 16 streams, 100 MB/s), one stream takes 8.53 minutes and the verdict is to use S3 multipart with parallel parts.

The second pressure is the high file count. One file is a function call; thousands of files is a throughput system. Feed them through a queue (RabbitMQ fits Alpaca’s stack) into a worker pool with bounded concurrency, and bound that concurrency on purpose: too many parallel connections is a real fileX limit (its SFTP method returns a too-many-connections error), and gigantic files must stream from disk rather than load into memory or the host runs out of room. Bounded concurrency is the backpressure that protects FINRA and your own process at once. There is also a control-plane cost to weigh: REST mints a fresh pre-signed URL per file, one file per call, so at thousands of files the single hour-long S3 token removes a fileX round-trip per file and the rate-limit pressure that comes with it.

The cost of the S3 Direct transport is the very property rung 4 prized: it has no Tracking API, so you lose programmatic confirmation. At scale you rebuild confirmation out of band. Keep your own submission ledger keyed by internal id, filename, and byte hash, then verify delivery by listing the archive sub-space through the REST list API (every upload, whatever its transport, lands a timestamped copy in the matching archive space for thirty days), and verify acceptance by reading the application’s results from its outbound sub-space. The shape becomes a hybrid: S3 Direct moves the bytes, while REST list calls and your ledger close the loop.

flowchart TD
  P["Report producer"] --> Q["Queue (bounded)"]
  Q --> W["Worker pool (bounded concurrency)"]
  W -->|"small file"| R["REST submit and track (rung 6)"]
  W -->|"gigantic file"| M["S3 multipart upload"]
  R --> L["Submission ledger"]
  M --> L
  L --> RC["Reconciler"]
  RC -->|"list archive sub-space"| FX["fileX REST list API"]
  RC -->|"read results"| OUT["Outbound sub-space"]
  RC --> DONE["Mark complete"]

The architecture falls out of those forces. A producer drops jobs on a queue. A bounded worker pool pulls them and routes each by size: a small file takes the REST submit-and-track path from rung 6, a gigantic file takes the S3 multipart path. A reconciler then sweeps the ledger against the archive listing and the results to mark each submission complete.

Now the Go. First, bound the fan-out so a big cycle never overwhelms either end. The same shape fits any high-fan-out IO task.

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// SubmitAll processes a whole cycle with bounded concurrency. SetLimit caps how
// many uploads run at once: the backpressure that keeps us under fileX connection
// limits and keeps gigantic files from exhausting host memory.
func SubmitAll(ctx context.Context, reports []Report, maxInFlight int, submit func(context.Context, Report) error) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(maxInFlight)
	for _, r := range reports {
		r := r
		g.Go(func() error {
			return submit(ctx, r) // route by size inside submit: REST or S3 multipart
		})
	}
	return g.Wait() // the first error cancels the rest through ctx
}

Next, the gigantic-file transport. The AWS SDK uploader turns one call into a concurrent multipart upload streamed from disk, with each failed part request retried on its own, the standard answer whenever a single object is too large for one request.

import (
	"context"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/feature/s3/manager"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// S3Token mirrors the credentials fileX returns from its S3TransferTokens
// endpoint: temporary keys scoped to the application's write paths, valid 1 hour.
type S3Token struct {
	Region, AccessKeyID, SecretAccessKey, SessionToken string
}

func s3FromFileXToken(t S3Token) *s3.Client {
	creds := credentials.NewStaticCredentialsProvider(t.AccessKeyID, t.SecretAccessKey, t.SessionToken)
	return s3.New(s3.Options{Region: t.Region, Credentials: aws.NewCredentialsCache(creds)})
}

// uploadGiant streams a multi-gigabyte file as a concurrent multipart upload: the
// manager splits it into parts, uploads them in parallel, retries any single
// failed part, and assembles them server-side. Body streams, so the file is
// never fully held in memory.
func uploadGiant(ctx context.Context, c *s3.Client, bucket, key, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	up := manager.NewUploader(c, func(u *manager.Uploader) {
		u.PartSize = 64 * 1024 * 1024 // 64 MiB; raise it so parts stay under the 10000 limit
		u.Concurrency = 16            // parallel parts in flight
	})
	_, err = up.Upload(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   f,
	})
	return err
}

Routing is then trivial: a worker calls submitOne for a small file or uploadGiant for a large one, the bounded pool caps the fan-out, and the reconciler restores the confirmation that S3 Direct does not give you. You get scale without losing the audit trail.

8. Confirmation without the Tracking API

Rung 7 said you “rebuild confirmation out of band” for S3 Direct. This rung makes that concrete, because it is the question that separates a glib answer from a real one. Your instinct is correct: the tracking id is returned only by the REST upload request, and the Tracking API reports only REST uploads, so an S3 upload (and an SFTP upload) has no tracking id and cannot be queried through that API at all. FINRA’s own guidance for SFTP and S3 is to contact support to check a file’s status. So the question is not “how do I call tracking for S3” (you cannot), it is “what does fileX let me observe instead”.

What fileX lets you observe is files, not a status endpoint. Recall the token you mint for S3 Direct: its response carries writePaths and readPaths. The token grants write to the inbound path (in) and, crucially, read to two locations FINRA writes back into: the application’s results path (out) and its archive path (in_arcv). That read grant is the entire feedback surface. You are not polling a status, you are reading the artifacts FINRA produces as your file moves through the system.

Those artifacts map cleanly onto the two REST status fields from rung 5, which is the key mental bridge.

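Two reads of the same verdict
Over REST: one Tracking API call by trackingId returns status and downstreamStatus: received by FINRA, then accepted, rejected, or partially rejected. This is available only for REST uploads. Over S3 Direct: the archive copy in in_arcv stands in for delivery, and the result in out stands in for the verdict.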
flowchart LR
  W["Your S3 write to in/"] --> MV["fileX picks it up (immediately moved)"]
  MV --> ARC["Copy lands in in_arcv (30 days, timestamp prefix)"]
  MV --> APP["Downstream application processes"]
  APP --> OUT["Result lands in out/"]
  ARC -. "read (delivery)" .-> RC["Your reconciler"]
  OUT -. "read (acceptance)" .-> RC
  RC --> LED["Ledger: delivered, then accepted or rejected"]

The diagram shows why delivery is reliable but acceptance is conditional. Every upload by any method is archived to in_arcv for thirty days, so the delivery signal is universal. Acceptance, though, depends on the application actually writing a machine-readable result to out, and that varies: CRD writes results and reports, eFOCUS writes to its out space, but ACATS exposes only an inbound space with no out at all. Where there is no result artifact, S3 cannot give you acceptance, and you fall back to REST for that filing or to a FINRA support request.
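
Because acceptance depends on the application, the reconciler has to know, per filing, where a verdict can come from. A sketch with a deliberately tiny, hypothetical table built only from the three applications named above; anything missing from the table is treated as having no machine-readable result.

```go
package main

// Signal says where a submission's downstream verdict can be read.
type Signal string

const (
	TrackingAPI Signal = "Tracking API"
	OutResult   Signal = "result file in the out sub-space"
	NoVerdict   Signal = "none: submit via REST or ask FINRA support"
)

// emitsResults is a hypothetical, deliberately incomplete table built only
// from the applications named on this page. Confirm each application's
// behavior in its own FINRA specification before relying on it.
var emitsResults = map[string]bool{
	"CRD":    true,
	"eFOCUS": true,
	"ACATS":  false, // inbound space only, no out
}

// acceptanceSignal picks the verdict source for one filing. Unknown
// applications map to NoVerdict, so a missing table entry never
// masquerades as a verdict source.
func acceptanceSignal(app string, uploadedViaREST bool) Signal {
	if uploadedViaREST {
		return TrackingAPI
	}
	if emitsResults[app] {
		return OutResult
	}
	return NoVerdict
}
```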

One correctness trap makes or breaks this loop: correlation. The archive only prepends a timestamp to your original filename, and re-uploads are separate submissions, so to map an archived or result file back to one submission you must make the name itself carry your own unique id. Bake the internal submission id into the filename (underscores are allowed, the # character is not), then a substring match on the timestamped archive name recovers the submission unambiguously.
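
A sketch of that naming rule, with hypothetical helper names and a hypothetical <base>_<id>.<ext> layout. Keeping '_' and '.' out of the id makes "_<id>." a delimiter-bounded token, which avoids the prefix trap where a bare substring match for sub-1 also hits sub-12. It assumes the archived name keeps your filename intact after the timestamp, as described above; whether result files in out echo the name varies by application.

```go
package main

import (
	"fmt"
	"strings"
)

// submissionFileName builds "<base>_<id>.<ext>". The id may not contain '_'
// or '.', so "_<id>." is a delimiter-bounded token, and no part may contain
// '#', which fileX does not allow in filenames.
func submissionFileName(base, id, ext string) (string, error) {
	if id == "" || strings.ContainsAny(id, "_.#") || strings.ContainsRune(base+ext, '#') {
		return "", fmt.Errorf("invalid filename parts %q, %q, %q", base, id, ext)
	}
	return fmt.Sprintf("%s_%s.%s", base, id, ext), nil
}

// matchesSubmission reports whether an archived name, whatever timestamp
// prefix fileX added, belongs to submission id. Matching "_<id>." instead of
// the bare id stops "sub-1" from also matching "sub-12".
func matchesSubmission(name, id string) bool {
	return strings.Contains(name, "_"+id+".")
}
```

Applied to confirmDelivered, the same idea means matching on "_"+uniqueID+"." rather than uniqueID alone.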

In Go, the delivery check is a list of the read-only archive prefix, matched on that unique id.

import (
	"context"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// confirmDelivered scans the archive path (in_arcv, granted by the token's
// readPaths) for a copy of our object. fileX archives every received file with a
// timestamp prefix, so we match on the unique id baked into the name. A hit is
// the S3 analogue of "Received by FINRA".
func confirmDelivered(ctx context.Context, c *s3.Client, bucket, archivePrefix, uniqueID string) (bool, error) {
	p := s3.NewListObjectsV2Paginator(c, &s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(archivePrefix), // e.g. "0001/app1/in_arcv/"
	})
	for p.HasMorePages() {
		page, err := p.NextPage(ctx)
		if err != nil {
			return false, err
		}
		for _, obj := range page.Contents {
			if strings.Contains(aws.ToString(obj.Key), uniqueID) {
				return true, nil // received and archived
			}
		}
	}
	return false, nil
}

Acceptance is a read of the application’s result object, then a parse you write per application, because the format is the application’s, not fileX’s.

import (
	"context"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// fetchResult downloads an application's result from the out path (also in the
// token's readPaths) so the caller can parse the per-submission verdict. Not
// every application emits one: ACATS, for example, exposes only an inbound space.
func fetchResult(ctx context.Context, c *s3.Client, bucket, key string) ([]byte, error) {
	out, err := c.GetObject(ctx, &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return nil, err
	}
	defer out.Body.Close()
	return io.ReadAll(out.Body)
}

So, to answer the reviewer directly: yes, you can build a feedback loop around S3, but it is not the Tracking API and it is not a tracking id. It is your reconciler reading the archive and results that fileX writes into paths your S3 token can read, correlated by an id you control, with REST list and download as the equivalent reads when you prefer them and FINRA support as the documented backstop. Keep the support escalation real: have the org id, user account, access method, application space, and filename ready, because that is exactly what FINRA support asks for.

9. Failure modes, retries, and idempotency

A round trip with this many hops fails in predictable ways, and the depot model tells you how to handle each one. For every failure, the fix is either a tighter sequence or your own bookkeeping. It is never a hope that fileX will save you.


Idempotency deserves its own honest sentence, because a panel will push on it. True exactly-once delivery is impossible here: fileX has no idempotency key and treats every PUT as a new submission. The achievable target is at-least-once delivery with internal deduplication. Persist your intent with an internal submission id before uploading, stamp that id onto the file as optional fileX metadata, and reconcile against the returned tracking id, so a double-send is at least detectable and reversible by escalation even though it cannot be prevented at the API. Retry transport failures with backoff; never auto-retry a content rejection.

Two small, reusable pieces carry this in Go. The first is a generic retry that backs off with full jitter and retries only errors marked retryable, so a content rejection never loops.

import (
	"context"
	"errors"
	"math/rand"
	"time"
)

// retryable is implemented by transport errors safe to retry. Content rejections
// (a REJECTED verdict) deliberately do not implement it, so they never loop.
type retryable interface{ Retryable() bool }

func isRetryable(err error) bool {
	var r retryable
	return errors.As(err, &r) && r.Retryable()
}

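// Retry runs op up to attempts times with exponential backoff and full jitter,
// stopping early on success, on a non-retryable error, or on context cancel.
// It never sleeps after the final attempt, and it guards the jitter draw so a
// zero base or an overflowed window cannot make rand.Int63n panic.
func Retry(ctx context.Context, attempts int, base time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1 // always run op at least once
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !isRetryable(err) {
			return err
		}
		if i == attempts-1 {
			break // out of attempts: return the last error without sleeping
		}
		window := base << i // exponential growth of the backoff window
		if window <= 0 {
			window = base // the shift overflowed; fall back to the base window
		}
		var delay time.Duration
		if window > 0 {
			delay = time.Duration(rand.Int63n(int64(window))) // full jitter in [0, window)
		}
		select {
		case <-time.After(delay):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}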

The second is the ledger that makes a duplicate detectable, since fileX cannot prevent one. Record intent before the upload, bind the tracking id after, and treat a second tracking id for one internal id as an alert.

import "context"

// Ledger is the at-least-once safety net: it records intent before an upload so a
// retried send is detectable, because fileX has no idempotency key of its own.
type Ledger interface {
	// Reserve records the internal id and byte hash once; firstTime is false if
	// the id already exists, which flags a retry of an in-flight submission.
	Reserve(ctx context.Context, internalID, fileHash string) (firstTime bool, err error)
	// Bind links a successful upload's trackingId back to the internal id. A
	// second trackingId for one internal id means a duplicate reached FINRA: alert.
	Bind(ctx context.Context, internalID, trackingID string) error
}

10. Security, auditability, and operations

Because this is a regulated brokerage pipeline, the non-functional concerns are first-class, not an afterthought. Group them so they are easy to recite.

For security, authenticate with OAuth2 client credentials over TLS 1.2 and keep credentials in a managed secret store, never in code or logs. Use a submit-only entitlement for this pipeline, so a leaked credential cannot exfiltrate downloads; this is the separation-of-duties feature from rung 2 put to work. Crucially, log no file contents: Blue Sheets carry account-holder and trade data, so you log the tracking id, a byte hash, and status transitions, not the payload.

For auditability, the artifact a compliance team or regulator will ask for is a record per submission: the internal id, the tracking id, the application and sub-space, the filename and a hash of the exact bytes sent, the submitter identity, every status transition with timestamps, and the final verdict. fileX keeps its own archived copy in the matching archive sub-space for thirty days, and the Tracking API covers REST uploads from the last ninety days, but your durable record is the source of truth.

For observability and operations, emit metrics for attempts, accept and reject rates, time-to-accepted, and retry counts; alert on rejections, on stuck submissions past the SLA, and on auth failures. Validate against the fileX customer-test and QA tiers, which have separate hostnames, before touching production; this also answers the inevitable question of how you test without hitting live FINRA systems. Since REST moves one file per call and URLs are short-lived, a queue with bounded concurrency per sub-space is the natural shape, and a periodic reconciliation job over non-terminal submissions closes the gap when an inline poll is lost.

Three more operational realities round this out. Credentials are not instant: a machine-to-machine fileX account is provisioned through FINRA’s entitlement program by the firm’s security administrator as a paper-based request, so the access your pipeline needs has a lead time to plan around, not a self-serve toggle. Service health is observable: fileX publishes a health endpoint you can gate retries on and use to surface upstream outages. And the support path is part of the design rather than an afterthought, because it is the only status route for SFTP and S3: when you escalate, FINRA support expects the org id, user account, access method, application space, and filename, so log those on every submission and the escalation becomes a lookup rather than an investigation.

11. Architecture and swimlanes, end to end

Every previous rung built one part. This rung steps back and draws the whole system at high level, then as swimlanes that show who does what, in what order, and where the two transport paths diverge. Treat these as the diagrams you would whiteboard in the panel, zooming from planes to lanes to pods.

Start with the logical architecture. Three planes: your internal plane, the FINRA plane, and the fileX storage plane on S3. The submission service authenticates once, the worker pool routes each file by size, and the ledger plus the reconciler are what make confirmation durable across both transport paths.

flowchart TB
  subgraph internal["Alpaca internal plane"]
    GEN["Report generators EBS 4530 FOCUS"]
    Q["Submission queue RabbitMQ"]
    SVC["Submission service"]
    POOL["Worker pool bounded concurrency"]
    LED["Submission ledger PostgreSQL"]
    REC["Reconciler scheduled"]
    SEC["Secret manager"]
    OBS["Metrics logs alerts"]
  end
  subgraph finra["FINRA plane"]
    FIP["FINRA Identity Platform OAuth2"]
    REST["fileX REST API"]
    TRK["Tracking API REST uploads only"]
    APP["Downstream applications"]
  end
  subgraph storage["fileX storage plane on AWS S3"]
    INB["in inbound"]
    OUTB["out results"]
    ARCV["in_arcv archive 30 days"]
  end
  GEN --> Q --> SVC --> POOL
  SEC -. "creds" .-> SVC
  POOL -->|small REST| REST
  POOL -->|gigantic S3| INB
  SVC -->|token| FIP
  REST --> INB
  INB --> APP
  APP --> OUTB
  INB --> ARCV
  POOL --> LED
  REC --> LED
  REC -->|list and download| OUTB
  REC -->|read| ARCV
  TRK -. "status" .-> REC
  REST -. "trackingId" .-> LED
  POOL -. "metrics" .-> OBS
  REC -. "metrics" .-> OBS

The REST path as a swimlane makes the ordering explicit: the receipt (trackingId) is bound in the ledger before any bytes move, the poll runs until a terminal verdict or the SLA, and only ACCEPTED marks complete while a reject branches to correction and an alert.

sequenceDiagram
  autonumber
  participant P as Submission service
  participant L as Ledger
  participant F as FINRA Identity Platform
  participant X as fileX REST API
  participant S as Pre-signed S3 URL
  participant D as Downstream application
  participant O as On-call
  P->>L: reserve internal id and byte hash
  P->>F: request OAuth2 token, cached ~12h
  F-->>P: bearer token
  P->>X: request upload URL for in sub-space
  X-->>P: url, expirationTime ~1m, trackingId
  P->>L: bind trackingId, receipt before bytes
  P->>S: PUT bytes octet-stream
  S-->>P: 200 OK
  loop poll with backoff until terminal or SLA
    P->>X: GET tracking by id
    X-->>P: status and downstreamStatus
  end
  D-->>X: sets downstreamStatus
  alt downstreamStatus ACCEPTED
    P->>L: mark complete
  else REJECTED or PARTIALLY_REJECTED
    P->>L: mark for correction
    P->>O: alert resubmit needed
  end

The S3 Direct path for bulk and gigantic files is a different swimlane, because there is no trackingId and no Tracking API. The worker mints a per-application token, uploads multipart, and the reconciler later reads the archive and results that fileX writes back into the paths the token can read.

sequenceDiagram
  autonumber
  participant W as Bulk worker
  participant F as FINRA Identity Platform
  participant T as fileX token API
  participant B as fileX S3 bucket
  participant D as Downstream application
  participant R as Reconciler
  participant L as Ledger
  W->>F: OAuth2 token
  F-->>W: bearer token
  W->>T: GET S3 transfer token, per app ~1h
  T-->>W: STS creds, read out and in_arcv, write in
  W->>B: multipart upload to in, parallel parts
  B-->>W: upload complete
  W->>L: mark uploaded, no trackingId for S3
  Note over B,D: fileX moves the file from in and archives a copy to in_arcv
  D->>B: writes result to out
  loop reconciliation sweep
    R->>B: list in_arcv for delivery and get out for acceptance
    R->>L: update delivered then accepted
  end
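On the multipart leg, the one number to get right is the part size. The following is a minimal sketch of that arithmetic using the S3 limits cited on this page: at most 10,000 parts per object, and every part except the last between 5 MiB and 5 GiB. `PartSize` and `Parts` are illustrative helpers. You would pass the resulting size to whatever multipart uploader you use.

```go
package main

import "fmt"

const (
	MiB      = int64(1) << 20
	minPart  = 5 * MiB        // S3 minimum for every part except the last
	maxPart  = 5 * 1024 * MiB // 5 GiB S3 maximum per part
	maxParts = int64(10000)   // S3 maximum number of parts per object
)

// PartSize returns the smallest part size that fits size bytes into at most
// 10,000 parts, never going below the 5 MiB floor.
func PartSize(size int64) (int64, error) {
	if size <= 0 {
		return 0, fmt.Errorf("size must be positive, got %d", size)
	}
	p := (size + maxParts - 1) / maxParts // ceil(size / maxParts)
	if p < minPart {
		p = minPart
	}
	if p > maxPart {
		return 0, fmt.Errorf("%d bytes exceeds 10,000 parts of 5 GiB", size)
	}
	return p, nil
}

// Parts is the resulting part count, ceil(size / partSize).
func Parts(size, partSize int64) int64 { return (size + partSize - 1) / partSize }
```

For a 60 GiB file, which is near fileX's recommended ceiling, this yields parts of about 6.1 MiB and exactly 10,000 of them. Anything under about 48.8 GiB simply uses the 5 MiB floor.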

Put both paths on one swimlane canvas and the shared spine appears: a producer lane, a submission and worker lane, a fileX lane, a downstream application lane, and a reconciler lane, with the small-file and gigantic-file routes rejoining at confirmation.

flowchart LR
  subgraph PR["Producer lane"]
    A1["Generate report file"]
    A2["Enqueue job"]
  end
  subgraph SVL["Submission and worker lane"]
    B1["Reserve in ledger"]
    B2["Authenticate via FIP"]
    B3["Route by size"]
    B4["REST submit and track"]
    B5["S3 multipart upload"]
  end
  subgraph FXL["fileX lane"]
    C1["Issue upload URL or token"]
    C2["Receive file in in"]
    C3["Archive to in_arcv"]
    C4["Hand to application"]
  end
  subgraph DSL["Downstream app lane"]
    D1["Process file"]
    D2["Write result to out"]
  end
  subgraph RCL["Reconciler lane"]
    E1["Read in_arcv and out"]
    E2["Mark delivered or accepted"]
  end
  A1 --> A2 --> B1 --> B2 --> B3
  B3 -->|small| B4 --> C1
  B3 -->|gigantic| B5 --> C2
  C1 --> C2 --> C3
  C2 --> C4 --> D1 --> D2
  C3 --> E1
  D2 --> E1 --> E2
  B4 -. "trackingId" .-> E2

Finally the runtime view, mapped onto Alpaca’s stack: Go services on GKE, RabbitMQ for the queue, PostgreSQL for the ledger and audit trail, a secret manager for credentials, and a single static egress so FINRA can whitelist it, all reaching FINRA and S3 over outbound 443.

flowchart TB
  subgraph gcp["Alpaca runtime on GCP GKE"]
    subgraph k8s["Kubernetes cluster"]
      API["Submission API Go"]
      WRK["Upload workers Go autoscaled"]
      RECP["Reconciler CronJob"]
    end
    MQ["RabbitMQ"]
    PG["PostgreSQL ledger and audit"]
    SM["Secret Manager"]
    MON["Prometheus and alerting"]
  end
  subgraph edge["Egress"]
    NAT["Static egress NAT"]
  end
  subgraph ext["FINRA and AWS"]
    FIP2["FINRA Identity Platform"]
    FXR["fileX REST API"]
    S3B["fileX S3 bucket"]
  end
  API --> MQ --> WRK
  API --> PG
  WRK --> PG
  RECP --> PG
  SM -. "creds" .-> API
  SM -. "creds" .-> WRK
  API --> NAT
  WRK --> NAT
  RECP --> NAT
  NAT -->|443| FIP2
  NAT -->|443| FXR
  NAT -->|443| S3B
  WRK -. "metrics" .-> MON
  RECP -. "metrics" .-> MON

Read together, these five views are the same system at five zoom levels, and each answers a different panel question: what are the parts, who acts when on the REST path, how the S3 path differs, where the two transports rejoin, and what it all runs on.

12. Designing the data model and components

Everything so far implies a small amount of state you must own, because fileX owns almost none of it for you. The spine is a submission ledger: one row per logical submission, an append-only event log for the audit trail, and a few indexes that make the reconciler cheap. Model it before you write handlers, because the schema is where idempotency, auditability, and the SLA all become concrete.

erDiagram
  SUBMISSION ||--o{ SUBMISSION_EVENT : has
  SUBMISSION {
    uuid internal_id PK
    text application
    text file_name
    bytea byte_hash
    text access_method
    text tracking_id
    text object_key
    text state
    text downstream_status
    timestamptz sla_deadline
    timestamptz completed_at
  }
  SUBMISSION_EVENT {
    bigint id PK
    uuid internal_id FK
    timestamptz at
    text kind
    text source
    jsonb detail
  }

The ledger row carries identity and current state; the event table is the immutable history a regulator can read. Two constraints do the heavy lifting: a unique internal id makes a retried submit collide at the database rather than duplicate at FINRA, and a partial index over non-terminal states keeps the reconciler sweep fast even as completed rows pile up.

-- One row per logical submission. The unique internal_id is the idempotency key
-- fileX does not provide, so a retried submit collides here, not at FINRA.
CREATE TABLE submission (
    internal_id       uuid PRIMARY KEY,
    application       text        NOT NULL,            -- bluesheets, crd, 4530, ...
    sub_space         text        NOT NULL DEFAULT 'in',
    file_name         text        NOT NULL,            -- carries internal_id for correlation
    byte_hash         bytea       NOT NULL,            -- sha-256 of the exact bytes sent
    size_bytes        bigint      NOT NULL,
    access_method     text        NOT NULL CHECK (access_method IN ('rest','s3','sftp')),
    object_key        text,                            -- S3 key when access_method = 's3'
    tracking_id       text,                            -- present only for REST uploads
    state             text        NOT NULL,            -- see the state machine below
    downstream_status text,                            -- RECEIVED/ACCEPTED/REJECTED/PARTIALLY_REJECTED
    sla_deadline      timestamptz NOT NULL,
    created_at        timestamptz NOT NULL DEFAULT now(),
    completed_at      timestamptz,
    UNIQUE (application, byte_hash)                     -- optional: also catch identical re-sends
);

-- Append-only audit trail: every transition, when it happened, and where the
-- signal came from (tracking API, archive listing, or results read).
CREATE TABLE submission_event (
    id          bigserial PRIMARY KEY,
    internal_id uuid        NOT NULL REFERENCES submission(internal_id),
    at          timestamptz NOT NULL DEFAULT now(),
    kind        text        NOT NULL,                  -- reserved, uploaded, delivered, accepted, ...
    source      text        NOT NULL,                  -- tracking | archive | results | client
    detail      jsonb       NOT NULL DEFAULT '{}'
);

-- The reconciler only scans open submissions, so index those, not the whole
-- table. A partial index stays small as completed rows accumulate.
CREATE INDEX submission_open_idx ON submission (sla_deadline)
    WHERE state NOT IN ('complete','needs_correction','escalated');
CREATE INDEX submission_tracking_idx ON submission (tracking_id);
CREATE INDEX submission_event_by_sub_idx ON submission_event (internal_id, at);

A submission moves through one state machine regardless of transport, which is exactly what lets the REST and S3 paths share a ledger.

stateDiagram-v2
  [*] --> Reserved: row written, internal id and hash
  Reserved --> Uploading: bytes in flight (REST PUT or S3 multipart)
  Uploading --> Uploaded: 200 OK or multipart complete
  Uploaded --> Delivered: archive copy seen or tracking received
  Delivered --> Accepted: downstream ACCEPTED
  Delivered --> Rejected: downstream REJECTED or PARTIALLY_REJECTED
  Accepted --> Complete
  Rejected --> NeedsCorrection
  Uploaded --> Escalated: no delivery before SLA
  Delivered --> Escalated: no verdict before SLA
  Complete --> [*]
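If you encode the diagram as a transition table, both the reconciler and the inline path can reject impossible moves, such as Reserved straight to Accepted, instead of silently overwriting state. The following is a minimal sketch. The state names follow the diagram in lower snake case to match the SQL `state` column, and `Advance` is an illustrative name. Terminal states have no outgoing edges, so they stay terminal.

```go
package main

import "fmt"

type State string

const (
	Reserved        State = "reserved"
	Uploading       State = "uploading"
	Uploaded        State = "uploaded"
	Delivered       State = "delivered"
	Accepted        State = "accepted"
	Rejected        State = "rejected"
	Complete        State = "complete"
	NeedsCorrection State = "needs_correction"
	Escalated       State = "escalated"
)

// allowed mirrors the state diagram edge for edge.
var allowed = map[State][]State{
	Reserved:  {Uploading},
	Uploading: {Uploaded},
	Uploaded:  {Delivered, Escalated},
	Delivered: {Accepted, Rejected, Escalated},
	Accepted:  {Complete},
	Rejected:  {NeedsCorrection},
}

// Advance returns to if the edge exists, otherwise the unchanged state and an error.
func Advance(from, to State) (State, error) {
	for _, s := range allowed[from] {
		if s == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal transition %s -> %s", from, to)
}
```

Writing each successful Advance as a submission_event row, in the same transaction as the state update, keeps the audit trail and the current state from drifting apart.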

The reconciler is then just a claim query plus the reads from rung 8. Many reconciler workers can run at once because the claim skips locked rows.

-- Claim a batch of open submissions whose deadline is approaching. SKIP LOCKED
-- lets many workers sweep in parallel without stepping on each other.
SELECT internal_id, application, file_name, object_key, access_method, tracking_id
FROM   submission
WHERE  state NOT IN ('complete','needs_correction','escalated')
  AND  sla_deadline < now() + interval '30 minutes'
ORDER  BY sla_deadline
LIMIT  100
FOR UPDATE SKIP LOCKED;

On the component side, hide fileX behind one interface so handlers depend on a seam, tests use a fake, and only a few integration tests touch the customer-test tier. The two implementations (REST and S3 Direct) share the ledger and the state machine above.

import (
	"context"
	"time"

	"golang.org/x/time/rate"
)

// Placeholder shapes so this sketch compiles on its own; the fields are
// illustrative, not fileX's documented response schema.
type (
	Report  struct{ Application, FileName, Path string }
	grant   struct{ URL, TrackingID string; Expires time.Time } // REST upload grant
	Status  struct{ Status, DownstreamStatus string }
	S3Token struct{ AccessKeyID, SecretAccessKey, SessionToken string; Expires time.Time }
)

// FileX is the seam every handler depends on. One implementation wraps REST,
// another wraps S3 Direct; both write the same ledger and state machine.
type FileX interface {
	RequestUpload(ctx context.Context, r Report) (grant, error)        // REST: url + trackingId
	PutBytes(ctx context.Context, url, path string) error              // REST: pre-signed PUT
	Track(ctx context.Context, trackingID string) (Status, error)      // REST only
	S3Token(ctx context.Context, app string) (S3Token, error)          // S3 Direct
	ListArchive(ctx context.Context, prefix, uniqueID string) (bool, error)
}

// guarded wraps any call with a shared rate limiter and the Retry helper from
// rung 9. The limiter wait sits inside the retried op, so every attempt,
// retries included, respects FINRA's connection limits, and transient failures
// back off without hammering the service.
func guarded[T any](ctx context.Context, lim *rate.Limiter, call func() (T, error)) (T, error) {
	var out T
	err := Retry(ctx, 4, 200*time.Millisecond, func() error {
		if err := lim.Wait(ctx); err != nil { // blocks until a token frees or ctx ends
			return err // not Retryable, so Retry stops here
		}
		var e error
		out, e = call()
		return e
	})
	return out, err
}

The object-key scheme is the last design choice, and it is what makes the rung 8 feedback loop work: bake the internal id into the filename so the timestamp-prefixed archive copy and any result file stay correlatable, for example EBS_<internalID>_<yyyymmdd>.zip. Underscores are allowed; the # character is not. With that one convention, a substring match on the archive listing recovers the submission, and the ledger ties every artifact back to one row.


13. Defending the design under questioning

The last rung is how you talk, because the panel grades reasoning, not recall. Use one template for every decision: claim, because evidence, with this trade-off, and here is when I would change my mind. The fileX choice becomes: I chose REST because it is the only method with a programmatic Tracking API and the requirement is confirmed auditable submission; the trade-off is one file per call and a one-minute URL window; I would switch to S3 Direct for multi-gigabyte or bulk and rebuild confirmation myself. The completeness claim becomes: complete is downstream accepted because a 200 only proves bytes landed in fileX, not that the application accepted the filing.

Before committing to any design, ask the questions that genuinely change it: which application and report (it sets the sub-space and naming rules), the file size and volume and cadence (the REST-versus-S3 hinge), the regulatory deadline and how much margin a reject-and-correct cycle needs, whether you also own downloading and reconciling result files, and whether the target application even emits a machine-readable result you can read back, since some applications such as ACATS expose only an inbound space and give you no acceptance artifact to confirm over S3. When you do not know something, state the assumption, design for it, and name the trigger that would change the design. That habit, more than any single API fact, is what reads as senior.

14. A simulated panel interview

A two-hour panel is a conversation, not a quiz, so rehearse it as one. First walk the arc of the call, then answer each question out loud before you reveal the model answer. The questions below are the ones this page prepares you for, including the one a sharp interviewer asks to test whether you really understand the transport.

The two-hour arc
0:00 Warm-up: background, a recent project, why Alpaca. Keep it to a few minutes and steer toward systems you have actually shipped.

Now rehearse: read each question, answer it aloud, then compare your answer with the model answer.

What the panel scores is visible in those answers: you defined done rigorously, you conceded a premise honestly and then drew the real boundary, you reached for the database to fix what the API cannot, and you asked before you designed. That judgment, more than any single fileX fact, is the signal.

Mastery Questions

  1. A teammate says the submission service is done because the byte upload returns 200 and the tracking status reads received by FINRA. Why is this wrong, and what would you check instead?

    Answer. Received by FINRA only means the bytes arrived in fileX, which is delivery, not acceptance. fileX reports the verdict in a separate field, the downstream status, which moves from received to accepted, rejected, or partially rejected once the downstream application processes the file. A complete submission is downstream status accepted, recorded against the persisted tracking id. If you stop at 200, a rejected regulatory filing looks successful and the obligation is silently missed. You would poll the Tracking API by id until the downstream status is terminal, treat only accepted as done, and route rejected or partially rejected to a correction and a new submission.

  2. The panel asks why you did not use SFTP or S3 Direct, given they move many files and large files better. How do you answer without sounding dogmatic?

    Answer. Lead with the requirement, then the deciding property. The requirement is a confirmed, auditable, automated submission of one report, and the deciding property is programmatic confirmation, which only the HTTPS REST method provides through its Tracking API. SFTP and S3 Direct have no tracking, so proving acceptance would mean contacting FINRA support, which you cannot build an automated compliance deadline on. REST also fits the shape: OAuth2 for unattended machine-to-machine work, outbound 443 only with no IP whitelisting, and a tracking id receipt issued before any bytes move. The one-file-per-call limit is irrelevant for a single report. Then show range: if the file were multi-gigabyte or you had to push thousands per cycle, S3 Direct with multipart would win on transport and you would rebuild the confirmation loop yourself. REST is correct for this requirement, not in the abstract.

  3. fileX has no idempotency key and treats every upload as a new submission. A transient network error causes your service to retry an upload that had actually succeeded. What goes wrong, and how do you bound the damage?

    Answer. The retry creates a second, independent submission with its own tracking id, so FINRA receives the report twice and processes both. You cannot prevent this at the API, so you aim for at-least-once delivery with your own deduplication rather than pretending you have exactly-once. Persist the intent to submit under an internal id before the first upload, stamp that id onto the file as optional fileX metadata, and reconcile every returned tracking id back to the internal id. A duplicate then shows up as two tracking ids for one internal id, which an operator can detect and escalate to FINRA to reverse. You also separate retry policy by cause: retry transport failures with bounded backoff and jitter, but never auto-retry a content rejection, since that is a verdict, not a glitch.

  4. How would you design the confirmation step so it neither hammers the Tracking API nor lets a stuck submission slip past a filing deadline?

    Answer. Poll with exponential backoff and jitter rather than a tight loop, since downstream processing takes time and the API should not be hammered. Cap the polling at an SLA threshold derived from the regulatory deadline minus enough margin for a reject-and-correct cycle; when the cap is reached without a terminal status, stop polling and alert on-call instead of looping forever. Back the inline poller with a periodic reconciliation job that sweeps all non-terminal submissions and re-queries tracking, so a submission whose inline poll was lost to a restart is still resolved. Persist every status transition with timestamps, both for the deadline math and for the audit trail. The combination gives timely confirmation, bounded load, and no silent misses.

  5. Alpaca needs to push thousands of files per cycle, several of them tens of gigabytes. How does your design change, and what do you give up?

    Answer. Two pressures, two answers. For the gigantic files, REST is the wrong transport: one file per call, a one-minute URL window, and no resume mean a long stream that fails near the end starts over. Switch those to S3 Direct, mint the one-hour S3 token, and upload multipart so the file splits into parts that upload in parallel and retry independently, keeping the part size large enough to stay under the 10000-part limit. For the high count, put the work behind a queue and a worker pool with bounded concurrency, streaming each file from disk so memory stays flat and FINRA never sees too many connections. What you give up is the Tracking API, which only covers REST uploads, so you rebuild confirmation out of band: a submission ledger keyed by internal id and byte hash, delivery verified by listing the archive sub-space through the REST list API, and acceptance verified from the application’s results. The honest summary is a hybrid: S3 Direct for throughput, REST and your own ledger to still prove every file landed and was accepted.

  6. Your team wants to move gigantic filings over S3 Direct. A reviewer asks: if the tracking id only comes from the REST upload, how do you confirm an S3 submission was received and accepted?

    Answer. Accept the premise, because it is right: the tracking id is returned only by the REST upload request, and the Tracking API reports only REST uploads, so an S3 upload has no tracking id and cannot be queried that way. fileX still gives you a feedback surface, just not a status API. The S3 token’s readPaths grant read access to two write-back locations: the application’s in_arcv archive and its out results. The archived copy appearing in in_arcv, retained thirty days with a timestamp-prefixed name, is the delivery signal, the S3 analogue of received by FINRA; the application’s result file in out is the acceptance signal, the analogue of downstream accepted. You correlate both back to your submission by baking a unique id into the filename, since re-uploads are separate submissions and the archive only adds a timestamp. The honest caveats: acceptance is readable only where the application emits a result (ACATS, for instance, exposes only an inbound space), so where it does not, you submit that file over REST to get real tracking or fall back to a FINRA support request, which is FINRA’s documented status path for S3 and SFTP. Because the same files are visible across methods, the clean design is a hybrid: S3 for throughput, then REST list on in_arcv and REST download on out, plus your own ledger, to close the loop.

Sources & evidence: 15 claims, 7 cited

fileX mechanics (access methods, OAuth2 via FINRA Identity Platform, pre-signed URLs, the status and downstreamStatus model, the Tracking API being REST-only, entitlements, retention) are sourced to the FINRA fileX User Guide v1.4.0 provided for the exercise. The regulatory context (Alpaca as a self-clearing broker-dealer since 2024, Blue Sheets and Rule 4530 filed through fileX) and the interview-format expectations are web-verified against FINRA.org, Alpaca, and public reporting. Interview-conduct details are reasonable inferences from those sources and are framed as such in the body.

  • Alpaca has operated as a fully self-clearing broker-dealer since 2024 (Alpaca Clearing, a member of FINRA and SIPC), so it carries its own regulatory reporting obligations. (verified)
  • FINRA receives bulk regulatory filings such as Electronic Blue Sheets and Rule 4530 filings through fileX, its centralized secure file transfer service, and is migrating filing types onto it. (verified)
  • fileX offers three access methods: SFTP on port 22 with password authentication, an HTTPS REST API on port 443 authenticated by an OAuth2 bearer token, and AWS S3 Direct using a one-hour S3 security token. (verified)
  • The fileX Tracking API is supported only for uploads made through the HTTPS REST method; SFTP and S3 Direct require contacting FINRA support to check a submission's status. (verified)
  • An HTTPS REST upload is two steps: a request that returns a url, an expirationTime of about one minute, and a trackingId, then a PUT of the bytes; the OAuth2 bearer token from the FINRA Identity Platform lasts about 12 hours. (verified)
  • fileX reports a submission through a status field (Request Received, then Received by FINRA) and a separate downstreamStatus field (RECEIVED, ACCEPTED, REJECTED, PARTIALLY_REJECTED); a complete submission is downstreamStatus ACCEPTED. (verified)
  • fileX has no idempotency key and treats every upload as a separate submission with its own trackingId; uploaded files are archived for 30 calendar days and tracking covers the last 90 days. (verified)
  • FINRA entitlements grant a fileX account one of read or download only, submit only, or submit and download, enabling separation of duties such as a submit-only pipeline account. (verified)
  • Alpaca's technical assessment includes a roughly two-hour panel of questions plus an exercise, with the broader process spanning a recruiter screen, hiring-manager round, a technical task, and a panel covering coding, test scenarios, and CI/CD. (verified)
  • For high-volume or gigantic-file cycles, S3 Direct with multipart upload is the stronger transport: fileX supports S3 multipart upload, and AWS S3 allows up to 10000 parts per object with each part between 5 MiB and 5 GiB, uploaded and retried independently. S3 Direct exposes no Tracking API, so confirmation must be rebuilt out of band. (verified)
  • The fileX S3 Direct token (from S3TransferTokens) is scoped to a single application and valid one hour; its response grants writePaths to the inbound (in) path and readPaths to the application's results (out) and archive (in_arcv) paths, which is the feedback surface available to S3 uploads. (verified)
  • fileX archives every received file, for all access methods, to a matching in_arcv sub-space for 30 calendar days with the original filename prefixed by a US-Eastern timestamp; the Tracking API covers only REST uploads, so for SFTP and S3 Direct FINRA directs firms to contact support to check status. (verified)
  • fileX sets no hard file-size limit but recommends keeping files under 60 GB and arranging larger transfers with FINRA support; an account is locked after connecting with invalid credentials and must be unlocked through support. (verified)
  • SFTP uses password authentication only (no SSH keys) over port 22 with FINRA-whitelisted static IPs and supports put, get, cd, and ls but not rename, move, or chmod; the HTTPS REST API returns JSON or XML and uses standard codes where 401 means missing entitlement or invalid credentials and 404 means the file or directory was not found. (verified)
  • The HTTPS REST upload transfers bytes via a pre-signed S3 PutObject URL, which is a single PUT; Amazon S3 caps a single PUT at 5 GB, so objects larger than 5 GB require multipart upload and multipart needs full S3 credentials. This is why fileX S3 Direct, not the REST pre-signed URL, is what enables uploads above the single-PUT limit. (verified)