Disney+Hotstar is the largest OTT provider in India and powers the Disney+ app in the MENA, SEA, and SAARC regions.
One of the key challenges faced by the platform is authenticating requests to origin APIs while also preventing potential exploits by hackers or malicious users. Authentication exploits can result in financial losses, availability issues, and a negative impact on the user experience.
In this blog, we’ll explore our journey of building a centralized and robust authentication mechanism using Emissary, the open-source Kubernetes-native API gateway (formerly known as Ambassador). We’ll discuss how our solution has evolved and how it effectively authenticates requests from millions of Hotstar users.
Let’s walk through our previous-generation solution for request authentication and see how requests flow through our systems.
At Disney+Hotstar, we use JWT tokens for request authentication. Previously, our services were exposed to clients via an AWS Application Load Balancer (ALB), so all requests hit the origin without authentication. As a result, our origin services had to integrate with our in-house token SDK to authenticate and decode the JWT token.
Limitations & Challenges
- Auth is every service’s responsibility: In this setup, each client-facing service needed a thorough understanding of authentication, which created a potential security risk. Additionally, distributing token secrets to numerous services violated the principle of “Least Privilege”. Any oversight in this process could lead to security breaches.
- Inconsistencies due to SDK versions: Differing versions of the token library across services made it difficult to roll out token upgrades, including signing key rotation, across teams and services.
- Inconsistent error responses: Unauthenticated error responses could be inconsistent across services, making it hard to maintain the business contract between clients and services.
Given these limitations, we decided to revisit our design and find a solution that would allow us to close these gaps.
To mitigate these challenges, we opted for a single ingress authentication layer that could serve as a safeguard for all external-facing APIs. After careful consideration, we chose Emissary Ingress, which is based on the high-performance Envoy proxy and offers a range of flexible plugins such as ExtAuth, RateLimit, Tracing, and more. This choice was well suited to our use cases and provided us with a high level of extensibility.
Architecture
To achieve granular control over APIs, we implemented authentication checks as plugins in the Emissary API Gateway. This allowed us to invoke a plugin only when specific criteria were met in the incoming request path, ensuring that each API had the appropriate level of authentication. As a result, we not only improved our security measures but also gained greater flexibility for customized authentication.
- Token Authentication: Basic authentication of the JWT token by checking the token signature, expiration time, and other relevant information.
- 3rd Party Auth Integration: Pluggable authentication for requests from third-party platforms, allowing for flexible customization of the authentication process.
- Silent token refresh: The token’s life cycle is managed entirely by a single Authentication service, transparent to origin services and clients.
- User session identity: To avoid the need to pass user tokens and perform validation across multiple services, we introduced a new identity structure called “Envelope”. This structure is generated once at the Gateway and can be consumed by all origin services on the request chain for common data access.
These enhancements allow us to securely manage user identity tokens and protect origin services from invalid requests.
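As a rough illustration of the signature and expiration checks mentioned above, here is a minimal sketch using only the Python standard library. It assumes an HS256 (shared-secret) token and an `exp` claim; the actual signing algorithm, claim names, and plugin code are not described in this post, and `sign_token` and `verify_token` are hypothetical helper names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    # JWT uses base64url without "=" padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Hypothetical helper: build an HS256-signed token for the demo."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        # Only accept the algorithm we expect (assumed HS256 here).
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        # Constant-time comparison of the signatures.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        # Covers a wrong segment count, bad base64, bad UTF-8, and bad JSON.
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or exp <= current:
        return None
    return claims
```

Running a check like this once at the gateway means the signing secret only has to live in one place, rather than in every origin service.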
Next, we will dive into how we solved four major challenges with the Gateway Authentication solution.
To securely propagate user identity information to our services without relying on the potentially fragile JWT token-based propagation, we introduced a new identity structure called “Envelope”. This structure is modeled as a Protocol Buffer and provides a uniform and secure way to propagate personal identity information to origin services.
- The Envelope can serve all the information contained in the token and also provides flexibility for serving enriched data based on our business requirements.
- Each Envelope is a short-lived identity token that is scoped to the lifetime of the client request, and is consumed and propagated entirely among internal services in our system.
- Downstream services can conveniently fetch the properties in the Envelope by integrating with our Envelope SDK, which is supported in various languages.
There are several use cases where downstream services need relevant customer data to serve a rich user experience. User cohorts are one such piece, and they play a critical role in the Hotstar ecosystem. We use cohort data to bucket groups of customers that show similar patterns, so that we can design effective engagement strategies for each cohort.
Let’s take a practical example to understand this better. We tag users who prefer watching certain sports into one cohort, and push a notification to them whenever a tournament relevant to them is being streamed on Hotstar. This ensures that our customers don’t miss out on their favorite content.
Another use case is customers whose subscription plan has just expired or is due to expire shortly: they are tagged into another cohort and are periodically reminded to renew their subscription.
We recognized that we could significantly improve the system NFRs (Non-functional Requirements) by enriching these properties once at the edge while generating the Envelope.
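To make the idea of enriching once at the edge more concrete, here is a small sketch. The real Envelope is a Protocol Buffer whose schema is not shown in this post, so a Python dataclass serialized as base64 JSON stands in for it. The field names (`user_id`, `subscription_plan`, `cohorts`), the claim names (`sub`, `plan`), and the `cohort_lookup` callback are all assumptions made for illustration.

```python
import base64
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Envelope:
    # Hypothetical fields; the real Protocol Buffer schema is not public.
    user_id: str
    subscription_plan: str
    cohorts: List[str] = field(default_factory=list)


def build_envelope(claims: dict, cohort_lookup: Callable[[str], List[str]]) -> Envelope:
    """Run once at the gateway: copy token claims and enrich them with cohorts."""
    user_id = claims["sub"]
    return Envelope(
        user_id=user_id,
        subscription_plan=claims.get("plan", "free"),
        cohorts=list(cohort_lookup(user_id)),
    )


def encode_envelope(envelope: Envelope) -> str:
    """Serialize for propagation to origin services (JSON stands in for protobuf)."""
    payload = json.dumps(asdict(envelope), separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_envelope(value: str) -> Envelope:
    """What a downstream SDK might do: parse the value instead of re-validating a JWT."""
    return Envelope(**json.loads(base64.urlsafe_b64decode(value)))
```

With this shape, the cohort lookup happens once per request at the gateway, and every downstream service reads the same enriched fields instead of each querying the cohort store on its own.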
Context
At times, it’s necessary to block user sessions by invalidating their tokens when they log out or if they’re flagged as malicious by our RiskEngine (read more in our RiskEngineBlog). To accomplish this, we need a contract between the Authentication Service and other components of our system for token force blocks.
There are also cases where certain events, such as a user purchasing a new subscription plan or upgrading their existing plan, require asynchronous updates to their token properties. This is where token force refresh becomes necessary. By implementing token force block and refresh, we ensure that our system remains secure and our users’ access stays up to date.
Solution
To solve this, two approaches naturally come to mind: storing the invalidation and refresh list in a data store like Redis, or caching it locally. However, both have drawbacks in production. With the first approach, checking Redis on every request becomes costly as traffic volume grows. With the second, space usage is a big concern because the set stored locally can be very large.
To address these problems, we introduced a Bloom filter that listens to token lifecycle events to keep itself up to date. A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. A check against a Bloom filter can only return either “highly possible in set” or “definitely not in set”.
“Highly possible in set” means there is a chance that the Bloom filter reports a user session as blocked when it actually is not. Blocking the wrong user would hurt our user experience, so we still do a deep check in Redis to rule out false positives. Since blocked and refreshed sessions make up a tiny percentage of total traffic, the majority of requests are filtered out by the Bloom filter without querying Redis.
By using a Bloom filter, we were able to reduce our space consumption by 40x.
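The two-step check described above (Bloom filter first, Redis only on a possible hit) can be sketched as follows. This is an illustrative toy, not our production code: the filter size, the number of hash functions, and the class and method names are assumptions, and an in-memory `set` stands in for Redis.

```python
import hashlib
from typing import Iterator, Set


class BloomFilter:
    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class SessionBlocklist:
    def __init__(self) -> None:
        self.filter = BloomFilter()
        self.store: Set[str] = set()  # stand-in for Redis (assumption)
        self.store_lookups = 0        # counts how often the deep check runs

    def on_block_event(self, session_id: str) -> None:
        # Called when a token lifecycle event (logout, risk flag) arrives.
        self.store.add(session_id)
        self.filter.add(session_id)

    def is_blocked(self, session_id: str) -> bool:
        if not self.filter.might_contain(session_id):
            return False  # the common case: no store lookup needed
        self.store_lookups += 1
        return session_id in self.store  # deep check rules out false positives
```

Because the filter never returns a false negative, a blocked session always reaches the deep check, while a false positive costs only one extra lookup.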
In this blog, we talked about why we re-architected the user authentication flow and how we built a new-age authentication system from scratch that takes care of user token validation, refresh, user logouts, changes in subscription, user data enrichment in the Envelope, and applying security features at the gateway. We also talked about interesting sub-problems around easing space constraints to perform authentication checks at high scale.
Want to build stuff like this? We’re hiring, and we’re always looking for smart engineers who love solving hard problems. Check out open roles at https://tech.hotstar.com.