Caching Strategies for Mobile Apps

By Ishan Khanna | Updated May 25, 2026
Tags: mobile-system-design, caching, performance, ios, android

I was once debugging a feed-loading issue on an Android app. Users were complaining that the app showed stale content from three days ago, but only sometimes, and only after they reopened the app from the background. The root cause? Our in-memory cache was getting wiped during OS-initiated process death, and the disk cache fallback had a broken TTL check that never expired entries. Two caching layers, both failing in different ways, combining into a bug that was nearly impossible to reproduce on a developer's device.

Caching comes up in every single mobile system design interview. But most candidates treat it as an afterthought — "and then we'll add a cache." That's not enough. Interviewers want to see that you understand which cache, where in the data flow, how it gets invalidated, and what happens when the OS kills your app mid-write. This article will get you there.

Why Caching Is Different on Mobile

If you've worked on backend systems, forget most of what you know about caching. Server-side caches run on machines with gigabytes of RAM, persistent processes, and reliable storage. Mobile is a different world.

The OS is hostile to your cache. On iOS, the system can terminate your app's process at any time when it's in the background. On Android, the OS aggressively kills background processes under memory pressure. Your carefully populated in-memory cache? Gone. No warning, no callback, no graceful shutdown.

Storage is shared and limited. Your app doesn't own the device. The user has photos, music, and other apps, all competing for the same storage. iOS can purge your app's Caches directory when the device runs low on disk space, and Android may likewise delete the files in your app's cache directory (context.cacheDir) when storage runs low. Anything you put there must be re-creatable.

Memory pressure is real. An iPhone might have 6GB of RAM, but your app gets a fraction of that. Hold too much in memory, and the OS will kill you. didReceiveMemoryWarning on iOS isn't a suggestion — it's a threat.

Network is unreliable and expensive. Users switch between Wi-Fi and cellular. They go through tunnels. They have data caps. A good caching strategy directly reduces network usage and makes your app feel fast even on a 3G connection.

Interview Tip: When discussing caching in an interview, start by acknowledging these constraints. Saying "mobile caching is tricky because the OS can kill your process and wipe your in-memory state at any time" immediately signals that you understand the platform.

Memory Cache vs Disk Cache

Most mobile caching architectures have two layers: memory (fast, volatile) and disk (slower, persistent). Understanding when to use each, and how they interact, is foundational.

| | Memory cache | Disk cache |
|---|---|---|
| Speed | Nanoseconds | Milliseconds |
| Survives app kill | No | Yes |
| Survives device reboot | No | Yes |
| Practical size limit | ~50-100 MB | Hundreds of MB, but the OS can purge it |
| Best for | Decoded images, parsed models, hot data | Serialized responses, image files, large datasets |
| iOS tooling | NSCache, Dictionary | FileManager, Core Data, SQLite |
| Android tooling | LruCache, HashMap | Room, SQLite, file storage |
| Eviction | Automatic under memory pressure (NSCache), size-bounded LRU (LruCache), or manual (Dictionary, HashMap) | Manual, or OS purge on low disk |

[Interactive diagram: What process death actually does. The same three entries (user_42, a parsed model; feed_page_1, an API response; avatar_a8f.jpg, an image) live in both memory (volatile) and disk (persistent) until the OS needs your RAM.]

On iOS this is a background termination; on Android it's the low-memory killer. Either way, your app gets no warning. Anything worth keeping must already be on disk.

Memory Cache

On iOS, NSCache is your best friend. It automatically evicts entries under memory pressure, it's thread-safe, and unlike NSMutableDictionary it doesn't copy its keys. On Android, LruCache gives you a thread-safe LRU eviction policy with a configurable size limit.

// iOS: Simple memory cache using NSCache
import Foundation

final class MemoryCache<Key: Hashable, Value> {
    private let cache = NSCache<WrappedKey, WrappedValue>()

    init(countLimit: Int = 100) {
        cache.countLimit = countLimit
    }

    func get(_ key: Key) -> Value? {
        cache.object(forKey: WrappedKey(key))?.value
    }

    func set(_ value: Value, forKey key: Key) {
        cache.setObject(WrappedValue(value), forKey: WrappedKey(key))
    }

    func remove(_ key: Key) {
        cache.removeObject(forKey: WrappedKey(key))
    }

    func removeAll() {
        cache.removeAllObjects()
    }

    // NSCache only stores class instances (AnyObject). Keys subclass NSObject so
    // NSCache can compare them through hash and isEqual(_:).
    private final class WrappedKey: NSObject {
        let key: Key
        init(_ key: Key) { self.key = key }
        override var hash: Int { key.hashValue }
        override func isEqual(_ object: Any?) -> Bool {
            guard let other = object as? WrappedKey else { return false }
            return key == other.key
        }
    }

    private final class WrappedValue {
        let value: Value
        init(_ value: Value) { self.value = value }
    }
}

// Android: Memory cache using LruCache
import android.util.LruCache

class MemoryCache<K : Any, V : Any>(maxSize: Int) {

    private val cache = object : LruCache<K, V>(maxSize) {
        override fun sizeOf(key: K, value: V): Int {
            // Returning 1 makes maxSize an entry count. To budget memory instead,
            // return each entry's size (e.g. bitmap.byteCount / 1024 for KB)
            return 1
        }
    }

    fun get(key: K): V? = cache.get(key)

    fun put(key: K, value: V) {
        cache.put(key, value)
    }

    fun remove(key: K) {
        cache.remove(key)
    }

    fun clear() {
        cache.evictAll()
    }
}
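To see what LruCache does internally, here is a minimal sketch built on java.util.LinkedHashMap in access order. The same approach gives you LRU behavior in plain JVM code and unit tests, where android.util.LruCache isn't available. The class name and the sizeOf lambda are illustrative, not an Android API. As with LruCache, the budget is measured in whatever unit sizeOf returns.

```kotlin
/** Size-bounded LRU cache on LinkedHashMap access order (illustrative sketch). */
class LruMemoryCache<K : Any, V : Any>(
    private val maxSize: Int,
    private val sizeOf: (V) -> Int = { 1 }, // default: maxSize is an entry count
) {
    // accessOrder = true: every get() moves the entry to the most-recently-used end
    private val map = LinkedHashMap<K, V>(16, 0.75f, true)
    private var currentSize = 0

    @Synchronized
    fun get(key: K): V? = map[key]

    @Synchronized
    fun put(key: K, value: V) {
        map.put(key, value)?.let { currentSize -= sizeOf(it) } // replacing an entry
        currentSize += sizeOf(value)
        trimToSize()
    }

    @Synchronized
    fun remove(key: K) {
        map.remove(key)?.let { currentSize -= sizeOf(it) }
    }

    private fun trimToSize() {
        val iterator = map.entries.iterator()
        // Iteration order is least recently used first, so evict from the front
        while (currentSize > maxSize && iterator.hasNext()) {
            val eldest = iterator.next()
            currentSize -= sizeOf(eldest.value)
            iterator.remove()
        }
    }
}
```

With maxSize = 2 and default sizing, touching "a" before inserting "c" makes "b" the least recently used entry, so "b" is the one evicted.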

Disk Cache

Disk caches persist across app restarts. Use them for data that's expensive to re-fetch: API responses, images, computed results. The trade-off is speed — reading from disk involves I/O, deserialization, and potentially database queries.

On iOS, write to the Caches directory (the OS can purge it, but it won't be backed up to iCloud). On Android, use context.cacheDir for the same semantics.
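The repositories later in this article read and write through a DiskCache type that is never defined. Below is a minimal, byte-oriented sketch in plain Kotlin/JVM. The class and its methods are hypothetical. On Android you would pass File(context.cacheDir, "responses") as the directory, and java.nio.file.Files requires API 26 or later there. The design choices:

- Keys are hashed with SHA-256 so any string becomes a safe file name.
- Each write goes to a temp file that is then atomically renamed.

With the atomic rename, a process killed mid-write leaves either the old entry or the new one, never a torn file.

```kotlin
import java.io.File
import java.io.FileNotFoundException
import java.nio.file.Files
import java.nio.file.StandardCopyOption
import java.security.MessageDigest

/** Minimal file-backed disk cache (hypothetical sketch). One file per key. */
class DiskCache(private val dir: File) {
    init {
        dir.mkdirs()
    }

    // SHA-256 turns any key (a URL, an ID) into a fixed-length, filesystem-safe name
    private fun fileFor(key: String): File {
        val digest = MessageDigest.getInstance("SHA-256").digest(key.toByteArray())
        return File(dir, digest.joinToString("") { "%02x".format(it) })
    }

    /** Returns the cached bytes, or null on a miss. */
    fun read(key: String): ByteArray? =
        try {
            fileFor(key).readBytes()
        } catch (e: FileNotFoundException) {
            null
        }

    /** Writes to a temp file, then atomically renames it over any previous entry. */
    fun write(key: String, bytes: ByteArray) {
        val target = fileFor(key)
        val tmp = File.createTempFile(target.name, ".tmp", dir)
        try {
            tmp.writeBytes(bytes)
            // On POSIX filesystems an atomic rename replaces an existing target
            Files.move(tmp.toPath(), target.toPath(), StandardCopyOption.ATOMIC_MOVE)
        } finally {
            tmp.delete() // no-op after a successful move; cleans up after a failure
        }
    }

    fun remove(key: String) {
        fileFor(key).delete()
    }
}
```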

Interview Tip: Always mention the two-tier approach in interviews: "I'd use an in-memory LRU cache for hot data backed by a disk cache for persistence. On a cache miss in memory, we check disk before going to the network."

Cache-Aside (Lazy Loading)

Cache-aside is the most common caching pattern on mobile. The application code manages the cache explicitly: check the cache first, fetch from the network on a miss, then populate the cache.

The flow is simple:

  1. Check memory cache
  2. On miss, check disk cache
  3. On miss, fetch from network
  4. Store result in both memory and disk cache
  5. Return the data

[Interactive diagram: Cache-aside, step by step. A single request travels from the App UI through Memory (~ns), Disk (~ms), and Network (100s of ms), stopping at the first tier that holds the data.]

The further right a request travels, the more it costs: memory is nanoseconds, disk is milliseconds, network is hundreds of milliseconds. Cache-aside's job is to stop the request as early as possible.

// iOS: Cache-aside pattern with two-tier caching
final class UserRepository {
    private let memoryCache = MemoryCache<String, UserProfile>()
    private let diskCache: DiskCache<UserProfile>
    private let apiClient: APIClient

    init(apiClient: APIClient, diskCache: DiskCache<UserProfile>) {
        self.apiClient = apiClient
        self.diskCache = diskCache
    }

    func getUser(id: String) async throws -> UserProfile {
        // 1. Check memory cache
        if let cached = memoryCache.get(id) {
            return cached
        }

        // 2. Check disk cache (a corrupt or unreadable entry counts as a miss)
        if let diskCached = try? diskCache.read(key: id) {
            memoryCache.set(diskCached, forKey: id)
            return diskCached
        }

        // 3. Fetch from network
        let user = try await apiClient.fetchUser(id: id)

        // 4. Populate both caches (a failed disk write shouldn't fail the request)
        memoryCache.set(user, forKey: id)
        try? diskCache.write(user, key: id)

        return user
    }
}

// Android: Cache-aside with coroutines
class UserRepository(
    private val memoryCache: MemoryCache<String, UserProfile>,
    private val userDao: UserDao,
    private val apiService: ApiService
) {
    suspend fun getUser(id: String): UserProfile {
        // 1. Check memory cache
        memoryCache.get(id)?.let { return it }

        // 2. Check disk cache (Room)
        userDao.getById(id)?.let { entity ->
            val profile = entity.toUserProfile()
            memoryCache.put(id, profile)
            return profile
        }

        // 3. Fetch from network
        val response = apiService.getUser(id)
        val profile = response.toUserProfile()

        // 4. Populate both caches
        memoryCache.put(id, profile)
        userDao.insert(profile.toEntity())

        return profile
    }
}

This pattern is straightforward, but watch out for a common mistake: the thundering herd (also called a cache stampede). If 10 UI components all request the same uncached user at the same time, you get 10 network calls. Deduplicate in-flight requests. On iOS, keep a dictionary of in-flight Tasks keyed by ID inside an actor. On Android, guard a map of in-flight Deferred results with a Mutex (the "single-flight" pattern).
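The single-flight idea can be sketched without any Android or coroutine dependencies, using a ConcurrentHashMap of CompletableFutures. The first caller for a key installs a future and runs the load. Everyone who arrives while that load is in flight waits on the same future. The SingleFlight name is hypothetical. In coroutine code you would store Deferred values instead, as the ImageLoader later in this article does.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ConcurrentHashMap

/** Deduplicates concurrent loads of the same key (illustrative sketch). */
class SingleFlight<K : Any, V : Any> {
    private val inFlight = ConcurrentHashMap<K, CompletableFuture<V>>()

    /** Returns the value for [key], running [load] at most once across concurrent callers. */
    fun get(key: K, load: (K) -> V): V {
        val mine = CompletableFuture<V>()
        val existing = inFlight.putIfAbsent(key, mine)
        if (existing != null) return existing.join() // another caller is loading: share it

        try {
            mine.complete(load(key))
        } catch (e: Throwable) {
            mine.completeExceptionally(e) // every waiter sees the same failure
        } finally {
            inFlight.remove(key, mine) // callers arriving after this start a fresh load
        }
        return mine.join()
    }
}
```

Failures reach every waiter: join() rethrows them wrapped in a CompletionException. Deduplication covers only concurrent callers. Once the load finishes, the next request for that key should be served by the cache.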

Write-Through and Write-Behind

Cache-aside handles reads. But what about writes? Two patterns dominate here, and the right choice depends on your consistency requirements.

Write-Through

With write-through, every write goes to the backend and the cache as part of one synchronous operation. The write isn't considered complete until the backend confirms it, and only then is the local cache updated. This gives you strong consistency, but writes are slower because you're waiting on the network.

When to use it: Banking apps, payment flows, anything where showing stale data could cause real harm. If a user transfers money, you need the backend to confirm before updating the local state.

// Android: Write-through for a banking app
suspend fun transferFunds(from: Account, to: Account, amount: BigDecimal) {
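    // Write to backend first: this is the source of truth. The call suspends until
    // the server responds and, assuming ApiService throws on errors, a failed
    // transfer never reaches the cache.
    apiService.transfer(
        fromId = from.id,
        toId = to.id,
        amount = amount
    )

    // Only update cache after backend confirms. If the API returns authoritative
    // balances, prefer those over local arithmetic.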
    val updatedFrom = from.copy(balance = from.balance - amount)
    val updatedTo = to.copy(balance = to.balance + amount)

    memoryCache.put(from.id, updatedFrom)
    memoryCache.put(to.id, updatedTo)
    accountDao.update(updatedFrom.toEntity())
    accountDao.update(updatedTo.toEntity())
}
// iOS: Write-through for a banking app
func transferFunds(from: Account, to: Account, amount: Decimal) async throws {
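    // Write to backend first: this is the source of truth. The call throws on
    // failure (assuming APIClient surfaces server errors), so a failed transfer
    // never reaches the cache.
    _ = try await apiClient.transfer(
        fromId: from.id,
        toId: to.id,
        amount: amount
    )

    // Only update cache after backend confirms. If the API returns authoritative
    // balances, prefer those over local arithmetic.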
    let updatedFrom = from.withBalance(from.balance - amount)
    let updatedTo = to.withBalance(to.balance + amount)

    memoryCache.set(updatedFrom, forKey: from.id)
    memoryCache.set(updatedTo, forKey: to.id)
    try diskCache.write(updatedFrom, key: from.id)
    try diskCache.write(updatedTo, key: to.id)
}

Write-Behind (Write-Back)

With write-behind, you update the cache immediately and return to the caller. The backend write happens asynchronously, often batched. This gives instant UI feedback but introduces eventual consistency.

When to use it: Social media likes, reactions, draft saves, analytics events. The user taps "like" and sees the heart fill immediately. The actual API call can happen a second later, or be batched with other pending writes.

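// Android: Write-behind for social media likes
import androidx.work.Constraints
import androidx.work.ExistingWorkPolicy
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager

class LikeRepository(
    private val likeDao: LikeDao,
    private val apiService: ApiService,
    private val workManager: WorkManager
) {
    suspend fun toggleLike(postId: String, isLiked: Boolean) {
        // 1. Update cache immediately so the UI responds instantly
        likeDao.upsert(LikeEntity(postId = postId, isLiked = isLiked, synced = false))

        // 2. Schedule background sync. The unique name plus REPLACE coalesces rapid
        //    toggles of the same post into one pending request; LikeSyncWorker (not shown)
        //    should sync rows with synced = false and return Result.retry() on failure.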
        val workRequest = OneTimeWorkRequestBuilder<LikeSyncWorker>()
            .setConstraints(
                Constraints.Builder()
                    .setRequiredNetworkType(NetworkType.CONNECTED)
                    .build()
            )
            .build()

        workManager.enqueueUniqueWork(
            "like_sync_$postId",
            ExistingWorkPolicy.REPLACE,
            workRequest
        )
    }
}
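
// iOS: Write-behind for social media likes
import BackgroundTasks

final class LikeRepository {
    private let likeStore: LikeStore   // local persistence (Core Data / SQLite)
    private let apiClient: APIClient

    init(likeStore: LikeStore, apiClient: APIClient) {
        self.likeStore = likeStore
        self.apiClient = apiClient
    }

    func toggleLike(postId: String, isLiked: Bool) async throws {
        // 1. Update cache immediately so the UI responds instantly
        try await likeStore.upsert(
            PendingLike(postId: postId, isLiked: isLiked, synced: false)
        )

        // 2. Schedule background sync. Submitting a request whose identifier is already
        //    pending replaces it, so rapid taps coalesce. The identifier must be listed
        //    under BGTaskSchedulerPermittedIdentifiers in Info.plist, and the handler
        //    registered for it at launch (not shown) should upload every unsynced like and
        //    resubmit on failure: BGTaskScheduler does not retry for you, and the system
        //    decides when processing tasks actually run.
        let request = BGProcessingTaskRequest(identifier: "com.app.likeSync")
        request.requiresNetworkConnectivity = true
        try BGTaskScheduler.shared.submit(request)
    }
}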

[Interactive diagram: One like tap, two write strategies. Write-through (strong consistency): the backend, as source of truth, is written first, and the local cache only after the server confirms. Write-behind (eventual consistency): the local cache is written first and the backend sync is deferred.]

Write-through buys strong consistency by making the user wait on the network. Write-behind buys instant feedback by accepting that the backend is briefly behind. Pick per feature, not per app.
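Neither platform snippet shows the batching itself, which lives in the sync worker. Here is a plain-Kotlin sketch of that core, with hypothetical names:

- Local state changes instantly.
- Pending writes are coalesced per post, so like, unlike, like becomes a single "liked" write.
- flush() ships everything in one batch. If the send fails, the batch stays queued so the next retry picks it up.

The send function stands in for your real network call.

```kotlin
/** Write-behind queue for like state (illustrative sketch). */
class LikeWriteBehind {
    private val local = HashMap<String, Boolean>()         // what the UI reads
    private val pending = LinkedHashMap<String, Boolean>() // unsynced, latest value per post

    @Synchronized
    fun isLiked(postId: String): Boolean = local[postId] ?: false

    @Synchronized
    fun toggle(postId: String, isLiked: Boolean) {
        local[postId] = isLiked   // UI updates immediately
        pending[postId] = isLiked // overwrites any earlier unsynced value for this post
    }

    @Synchronized
    fun pendingCount(): Int = pending.size

    /**
     * Sends all pending writes as one batch. [send] returns true on success;
     * on failure nothing is cleared, so the next flush retries the same writes.
     */
    fun flush(send: (Map<String, Boolean>) -> Boolean): Boolean {
        val batch = synchronized(this) { LinkedHashMap(pending) }
        if (batch.isEmpty()) return true
        if (!send(batch)) return false
        synchronized(this) {
            // Clear only entries that weren't changed again while the batch was in flight
            for ((postId, value) in batch) {
                if (pending[postId] == value) pending.remove(postId)
            }
        }
        return true
    }
}
```

The network call runs outside the lock, and only entries unchanged since the batch was taken are cleared, so a tap that lands mid-flight is not lost.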

Interview Tip: When an interviewer asks about caching for writes, name the pattern explicitly. "For the like button, I'd use write-behind caching — update the local state immediately so the UI is responsive, then sync to the backend asynchronously using WorkManager on Android or BGTaskScheduler on iOS." That kind of precision stands out.

Cache Invalidation

"There are only two hard things in computer science: cache invalidation and naming things." — Phil Karlton

It's a cliché because it's true. A cache that never expires serves stale data forever; a cache that expires too aggressively defeats the purpose of caching. Here are the three strategies that matter on mobile.

TTL-Based (Time-To-Live)

The simplest approach: each cached entry has an expiration timestamp. After that time, the entry is considered stale and must be re-fetched.

// Android: TTL-based cache entry
data class CacheEntry<T>(
    val value: T,
    val cachedAtMillis: Long,
    val ttlMillis: Long
) {
    val isExpired: Boolean
        get() = System.currentTimeMillis() - cachedAtMillis > ttlMillis
}

// Usage in repository
suspend fun getUser(id: String): UserProfile {
    userDao.getEntry(id)?.let { entry ->
        if (!entry.isExpired) return entry.value
    }

    val user = apiService.fetchUser(id)
    userDao.upsert(
        CacheEntry(
            value = user,
            cachedAtMillis = System.currentTimeMillis(),
            ttlMillis = 300_000L // 5 min TTL
        )
    )
    return user
}
// iOS: TTL-based cache entry
struct CacheEntry<T: Codable>: Codable {
    let value: T
    let cachedAt: Date
    let ttlSeconds: TimeInterval

    var isExpired: Bool {
        Date().timeIntervalSince(cachedAt) > ttlSeconds
    }
}

// Usage in repository
func getUser(id: String) async throws -> UserProfile {
    if let entry: CacheEntry<UserProfile> = try diskCache.read(key: id),
       !entry.isExpired {
        return entry.value
    }

    let user = try await apiClient.fetchUser(id: id)
    let entry = CacheEntry(value: user, cachedAt: Date(), ttlSeconds: 300) // 5 min TTL
    try diskCache.write(entry, key: id)
    return user
}

TTL works well when staleness is tolerable within a known window. User profiles? 5 minutes is fine. Stock prices? 10 seconds or less. A feed? Maybe 60 seconds.
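These windows can share one implementation, with a different ttlMillis per data type. Below is a sketch of an in-memory TTL cache. Its nowMillis clock parameter is a hypothetical injection point, so expiry can be tested without sleeping.

```kotlin
/** In-memory cache whose entries expire ttlMillis after they were stored (sketch). */
class TtlCache<K : Any, V : Any>(
    private val ttlMillis: Long,
    private val nowMillis: () -> Long = { System.currentTimeMillis() },
) {
    private class Entry<T>(val value: T, val storedAtMillis: Long)

    private val entries = HashMap<K, Entry<V>>()

    @Synchronized
    fun get(key: K): V? {
        val entry = entries[key] ?: return null
        if (nowMillis() - entry.storedAtMillis >= ttlMillis) {
            entries.remove(key) // expired: evict so the caller re-fetches
            return null
        }
        return entry.value
    }

    @Synchronized
    fun put(key: K, value: V) {
        entries[key] = Entry(value, nowMillis())
    }
}
```

For the windows above, you might create TtlCache(ttlMillis = 300_000) for profiles and TtlCache(ttlMillis = 10_000) for stock quotes. For in-memory entries, a monotonic clock such as Android's SystemClock.elapsedRealtime() avoids surprises when the user changes the device time. Persisted entries need wall-clock time, because the monotonic clock resets on reboot.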

[Interactive diagram: TTL in action. A cached user_42 entry drains toward expiry. In the demo, 12 seconds stands in for a 5-minute production TTL, and each miss costs a network round-trip.]

The art is picking the window: too long and users see stale data, too short and you're hitting the network anyway. Profile data tolerates minutes; stock prices tolerate seconds.

Event-Based Invalidation

Instead of guessing when data becomes stale, the server tells you. This is common with WebSocket connections or push notifications.

For example, in a chat app, when a message is edited on another device, the server pushes an event. The client receives it and invalidates the cached version of that message. This gives you real-time consistency without polling.
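Below is a minimal sketch of the client side, assuming hypothetical event types that the server pushes over a WebSocket or in a push payload. Each event maps to a precise eviction: either one message, or every message in a conversation. The next read for an evicted message misses and re-fetches.

```kotlin
/** Hypothetical server events delivered over a WebSocket or push payload. */
sealed interface CacheEvent {
    data class MessageEdited(val messageId: String) : CacheEvent
    data class ConversationDeleted(val conversationId: String) : CacheEvent
}

/** Message cache that evicts exactly what each server event makes stale (sketch). */
class MessageCache {
    private val messages = HashMap<String, String>()                    // messageId -> body
    private val byConversation = HashMap<String, MutableSet<String>>()  // conversationId -> messageIds

    @Synchronized
    fun put(conversationId: String, messageId: String, body: String) {
        messages[messageId] = body
        byConversation.getOrPut(conversationId) { mutableSetOf() }.add(messageId)
    }

    @Synchronized
    fun get(messageId: String): String? = messages[messageId]

    @Synchronized
    fun onEvent(event: CacheEvent) {
        when (event) {
            is CacheEvent.MessageEdited -> messages.remove(event.messageId)
            is CacheEvent.ConversationDeleted ->
                byConversation.remove(event.conversationId)?.forEach { messages.remove(it) }
        }
    }
}
```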

Version-Based Invalidation

Each piece of data has a version number or ETag. When fetching, you send your cached version. The server responds with either "304 Not Modified" (your cache is current) or the new data. This avoids transferring data when nothing has changed.

// Android: Version-based cache check
import java.io.IOException

class FeedRepository(
    private val feedDao: FeedDao,
    private val apiService: ApiService
) {
    suspend fun getFeed(): List<FeedItem> {
        val cachedVersion = feedDao.getCurrentVersion()

        return try {
            // Sends the cached ETag as If-None-Match
            val response = apiService.getFeed(ifNoneMatch = cachedVersion)

            when {
                response.code() == 304 -> {
                    // Cache is still valid
                    feedDao.getAll().map { it.toFeedItem() }
                }
                response.isSuccessful -> {
                    val items = response.body().orEmpty()
                    val newVersion = response.headers()["ETag"]
                    feedDao.replaceAll(items.map { it.toEntity() }, newVersion)
                    items
                }
                // Server error: keep serving cached data instead of crashing on a null body
                else -> feedDao.getAll().map { it.toFeedItem() }
            }
        } catch (e: IOException) {
            // Network error: fall back to cache
            feedDao.getAll().map { it.toFeedItem() }
        }
    }
}

// iOS: Version-based cache check
final class FeedRepository {
    private let feedStore: FeedStore
    private let apiClient: APIClient

    init(feedStore: FeedStore, apiClient: APIClient) {
        self.feedStore = feedStore
        self.apiClient = apiClient
    }

    func getFeed() async throws -> [FeedItem] {
        let cachedVersion = try feedStore.currentVersion()

        do {
            let (items, response) = try await apiClient.getFeed(ifNoneMatch: cachedVersion)

            if response.statusCode == 304 {
                // Cache is still valid
                return try feedStore.getAll()
            } else {
                let newVersion = response.value(forHTTPHeaderField: "ETag")
                try feedStore.replaceAll(items, version: newVersion)
                return items
            }
        } catch is URLError {
            // Network error — fall back to cache
            return try feedStore.getAll()
        }
    }
}

In practice, you often combine strategies. Use TTL as a baseline (don't hit the network more than once per minute), event-based for real-time features (chat messages, notifications), and version-based for large datasets where you want to avoid unnecessary transfers (feed, product catalog).

Image Caching

Image caching is the most common mobile caching use case. It's also the one interviewers expect you to know cold, because almost every mobile app displays images.

Libraries like Kingfisher and SDWebImage (iOS) or Coil and Glide (Android) all follow the same fundamental pattern: a three-tier waterfall.

Memory cache (decoded UIImage/Bitmap objects, ready to display) -> Disk cache (compressed image files on the file system) -> Network (download from the URL).

Here's what happens under the hood when you load an image:

  1. Hash the URL to create a cache key.
  2. Check memory cache — if the decoded image is there, return it immediately. This is why scrolling back up in a list feels instant.
  3. Check disk cache — if the compressed file exists, read it from disk, decode it into a displayable image, store the decoded image in the memory cache, return it.
  4. Download from network — fetch the image, write the compressed data to disk, decode it, store the decoded image in memory, return it.
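
// Android: Simplified image caching pipeline
import android.content.Context
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.util.LruCache
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.NonCancellable
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.async
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withContext
import okhttp3.OkHttpClient
import okhttp3.Request
import java.io.File
import java.io.IOException
import java.security.MessageDigest

object ImageLoader {
    // Memory budget: 1/8 of the app's max heap, measured in KB
    private val memoryCache = object : LruCache<String, Bitmap>(
        (Runtime.getRuntime().maxMemory() / 1024 / 8).toInt()
    ) {
        override fun sizeOf(key: String, value: Bitmap): Int = value.byteCount / 1024
    }
    private lateinit var diskCacheDir: File
    private val client = OkHttpClient()
    // Track in-flight downloads to avoid duplicate requests
    private val inFlightRequests = mutableMapOf<String, Deferred<Bitmap>>()
    private val mutex = Mutex()
    private val scope = CoroutineScope(SupervisorJob() + Dispatchers.IO)

    fun init(context: Context) {
        diskCacheDir = File(context.cacheDir, "ImageCache").apply { mkdirs() }
    }

    suspend fun loadImage(url: String): Bitmap {
        // 1. Memory cache
        memoryCache.get(url)?.let { return it }

        // 2. Disk cache (file I/O and decoding happen off the caller's thread)
        val file = File(diskCacheDir, url.sha256Hash())
        val fromDisk = withContext(Dispatchers.IO) {
            if (file.exists()) BitmapFactory.decodeFile(file.path) else null
        }
        if (fromDisk != null) {
            memoryCache.put(url, fromDisk)
            return fromDisk
        }

        // 3. Deduplicate in-flight requests
        val download = mutex.withLock {
            inFlightRequests.getOrPut(url) {
                scope.async {
                    val request = Request.Builder().url(url).build()
                    val bytes = client.newCall(request).execute().use { response ->
                        if (!response.isSuccessful) throw IOException("HTTP ${response.code}")
                        response.body?.bytes() ?: throw IOException("Empty body")
                    }
                    val bitmap = BitmapFactory.decodeByteArray(bytes, 0, bytes.size)
                        ?: throw IOException("Decoding failed")
                    // Write to a temp file, then rename, so process death mid-write
                    // never leaves a truncated image in the disk cache
                    val tmp = File(diskCacheDir, "${file.name}.tmp")
                    tmp.writeBytes(bytes)
                    tmp.renameTo(file)
                    memoryCache.put(url, bitmap)
                    bitmap
                }
            }
        }

        try {
            return download.await()
        } finally {
            // NonCancellable so a cancelled caller still cleans up; only remove our
            // own entry, never a newer download for the same URL
            withContext(NonCancellable) {
                mutex.withLock {
                    if (inFlightRequests[url] === download) {
                        inFlightRequests.remove(url)
                    }
                }
            }
        }
    }

    private fun String.sha256Hash(): String =
        MessageDigest.getInstance("SHA-256").digest(toByteArray())
            .joinToString("") { "%02x".format(it) }
}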

// iOS: Simplified image caching pipeline
import CryptoKit
import UIKit

enum ImageLoadingError: Error {
    case decodingFailed
}

// An actor serializes access to inFlightRequests, which would otherwise be a data race
actor ImageLoader {
    static let shared = ImageLoader()

    private let memoryCache = NSCache<NSString, UIImage>()
    private let diskCachePath: URL
    private let session = URLSession.shared
    // Track in-flight downloads to avoid duplicate requests
    private var inFlightRequests: [String: Task<UIImage, Error>] = [:]

    init() {
        let caches = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask)[0]
        diskCachePath = caches.appendingPathComponent("ImageCache")
        try? FileManager.default.createDirectory(at: diskCachePath, withIntermediateDirectories: true)
    }

    func loadImage(from url: URL) async throws -> UIImage {
        let key = url.absoluteString

        // 1. Memory cache
        if let cached = memoryCache.object(forKey: key as NSString) {
            return cached
        }

        // 2. Disk cache
        let filePath = diskCachePath.appendingPathComponent(key.sha256Hash)
        if let data = try? Data(contentsOf: filePath),
           let image = UIImage(data: data) {
            memoryCache.setObject(image, forKey: key as NSString)
            return image
        }

        // 3. Deduplicate in-flight requests
        if let existingTask = inFlightRequests[key] {
            return try await existingTask.value
        }

        let task = Task<UIImage, Error> {
            let (data, _) = try await session.data(from: url)
            guard let image = UIImage(data: data) else {
                throw ImageLoadingError.decodingFailed
            }
            // .atomic writes to a temporary file and renames it, so process death
            // mid-write can't leave a truncated image; a failed write isn't fatal
            try? data.write(to: filePath, options: .atomic)
            memoryCache.setObject(image, forKey: key as NSString)
            return image
        }

        inFlightRequests[key] = task
        defer { inFlightRequests.removeValue(forKey: key) }
        return try await task.value
    }
}

private extension String {
    /// Hex-encoded SHA-256, used as a filesystem-safe cache file name
    var sha256Hash: String {
        SHA256.hash(data: Data(utf8)).map { String(format: "%02x", $0) }.joined()
    }
}

A few things that real image caching libraries handle that you should mention in interviews:

  • Downsampling: Decoding a 4000x3000 photo into a UIImage consumes ~48MB of memory. If you're displaying it in a 200x200 cell, that's wasteful. Libraries downsample during decoding to match the target view size.
  • Progressive loading: Show a blurry placeholder first, then sharpen as more data arrives.
  • Disk size limits: Cap the disk cache at something reasonable (100-200MB) and evict oldest entries when the limit is hit.
  • Memory eviction on warning: Clear the memory cache entirely when the OS sends a low-memory warning.
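The arithmetic behind downsampling fits in a few lines. The sketch below follows the calculateInSampleSize approach from Android's guide to loading large bitmaps: pick the largest power-of-two sample size that still covers the target size in both dimensions. The platform APIs that use it:

- Android: read the image's dimensions first with BitmapFactory.Options.inJustDecodeBounds = true, then set inSampleSize and decode for real.
- iOS: ImageIO's CGImageSourceCreateThumbnailAtIndex with kCGImageSourceThumbnailMaxPixelSize plays the same role.

The function names here are illustrative.

```kotlin
/** Largest power-of-two sample size whose result still covers reqWidth x reqHeight. */
fun calculateInSampleSize(width: Int, height: Int, reqWidth: Int, reqHeight: Int): Int {
    var inSampleSize = 1
    if (height > reqHeight || width > reqWidth) {
        val halfHeight = height / 2
        val halfWidth = width / 2
        // Keep doubling while the next halving would still be at least the target size
        while (halfHeight / inSampleSize >= reqHeight && halfWidth / inSampleSize >= reqWidth) {
            inSampleSize *= 2
        }
    }
    return inSampleSize
}

/** Rough decoded size at 4 bytes per pixel (32-bit RGBA / ARGB_8888). */
fun estimatedDecodedBytes(width: Int, height: Int, sampleSize: Int): Long =
    (width / sampleSize).toLong() * (height / sampleSize) * 4
```

For the 4000x3000 photo in a 200x200 cell, the sample size is 8, which decodes to 500x375 pixels: roughly 0.75MB instead of 48MB. Remember that 200x200 points on a 3x screen is 600x600 pixels. That gives a sample size of 4 (1000x750, about 3MB).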

Interview Tip: Don't just say "I'd use Kingfisher for image loading." Say "Kingfisher implements a two-tier cache: decoded images in memory via NSCache, and image data on disk. It deduplicates in-flight downloads of the same URL, and with a DownsamplingImageProcessor it downsamples images to the display size, which prevents memory spikes when loading large photos in a scrolling list." That kind of precision stands out.

HTTP Caching Headers

HTTP has a built-in caching mechanism, and your mobile client should respect it. Three headers matter most.

Cache-Control — tells the client how long a response can be cached and under what conditions.

  • max-age=3600: cache for 1 hour.
  • no-cache: you can cache it, but must revalidate with the server before using it.
  • no-store: don't cache this at all (sensitive data).
  • private: only the client can cache this, not intermediate proxies.

ETag — a version identifier for the response. On subsequent requests, send If-None-Match: <etag>. If the data hasn't changed, the server returns 304 Not Modified with no body, saving bandwidth.

Last-Modified — a timestamp of when the resource was last changed. Works like ETag but with timestamps via If-Modified-Since.
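To make the directive semantics concrete, here is a sketch of the freshness decision a client cache makes for a stored response. It is deliberately simplified:

- It handles only max-age, no-cache, and no-store.
- It ignores Expires, s-maxage, stale-while-revalidate, and heuristic freshness.
- It ignores private, because a client cache is allowed to store private responses.

All names are hypothetical. In production, URLCache or OkHttp's Cache makes this decision for you.

```kotlin
/** The subset of Cache-Control directives this sketch understands. */
data class CacheDirectives(
    val maxAgeSeconds: Long?,
    val noCache: Boolean,
    val noStore: Boolean,
)

fun parseCacheControl(header: String): CacheDirectives {
    var maxAge: Long? = null
    var noCache = false
    var noStore = false
    for (raw in header.split(',')) {
        val directive = raw.trim().lowercase() // directive names are case-insensitive
        when {
            directive == "no-cache" -> noCache = true
            directive == "no-store" -> noStore = true
            directive.startsWith("max-age=") ->
                maxAge = directive.removePrefix("max-age=").trim('"').toLongOrNull()
        }
    }
    return CacheDirectives(maxAge, noCache, noStore)
}

enum class CacheDecision { DONT_STORE, REVALIDATE, USE_CACHED }

/** What to do with a stored response that is [ageSeconds] old. */
fun decide(directives: CacheDirectives, ageSeconds: Long): CacheDecision {
    val maxAge = directives.maxAgeSeconds
    return when {
        directives.noStore -> CacheDecision.DONT_STORE
        directives.noCache -> CacheDecision.REVALIDATE // stored, but must revalidate first
        maxAge != null && ageSeconds < maxAge -> CacheDecision.USE_CACHED // still fresh
        else -> CacheDecision.REVALIDATE // stale: send If-None-Match / If-Modified-Since
    }
}
```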

On iOS, URLSession respects these headers by default through the shared URLCache that URLSessionConfiguration.default uses. On Android, OkHttp respects them too, but only after you configure its built-in Cache; it is disabled by default.

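// Android: Setting up OkHttp with HTTP caching
import okhttp3.Cache
import okhttp3.CacheControl
import okhttp3.OkHttpClient
import java.io.File

val cacheDir = File(context.cacheDir, "http_cache")
val cacheSize = 50L * 1024 * 1024 // 50 MB

val client = OkHttpClient.Builder()
    .cache(Cache(cacheDir, cacheSize))
    .addInterceptor { chain ->
        var request = chain.request()

        // Force cache when offline (isNetworkAvailable() is an app-provided helper).
        // FORCE_CACHE yields a 504 if the response was never cached.
        if (!isNetworkAvailable()) {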
            request = request.newBuilder()
                .cacheControl(CacheControl.FORCE_CACHE)
                .build()
        }

        chain.proceed(request)
    }
    .build()
// iOS: Setting up URLSession with HTTP caching
let cacheDir = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("http_cache")

let config = URLSessionConfiguration.default
config.urlCache = URLCache(
    memoryCapacity: 10 * 1024 * 1024,  // 10 MB in memory
    diskCapacity: 50 * 1024 * 1024,    // 50 MB on disk
    directory: cacheDir
)
config.requestCachePolicy = .useProtocolCachePolicy

let session = URLSession(configuration: config)

// Force cache when offline
func makeRequest(url: URL) -> URLRequest {
    var request = URLRequest(url: url)
    if !isNetworkAvailable() {
        request.cachePolicy = .returnCacheDataDontLoad
    }
    return request
}

Sometimes you'll want to override the server's caching headers. Maybe the server sends no-cache for an endpoint, but you know the data is safe to cache for 5 minutes on the client. In that case, use a network interceptor (Android) or the willCacheResponse delegate (iOS) to rewrite the response headers:

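// Override server cache headers for specific endpoints. A network interceptor sits
// between OkHttp's cache and the server, so the cache stores the rewritten headers.
val feedClient = client.newBuilder()
    .addNetworkInterceptor { chain ->
        val response = chain.proceed(chain.request())

        if (chain.request().url.encodedPath.contains("/feed")) {
            response.newBuilder()
                .header("Cache-Control", "public, max-age=60")
                .build()
        } else {
            response
        }
    }
    .build()
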
// Override server cache headers for specific endpoints
func urlSession(
    _ session: URLSession,
    dataTask: URLSessionDataTask,
    willCacheResponse proposedResponse: CachedURLResponse
) async -> CachedURLResponse? {
    guard let response = proposedResponse.response as? HTTPURLResponse,
          let url = response.url, url.path.contains("/feed"),
          var headers = response.allHeaderFields as? [String: String] else {
        return proposedResponse
    }

    headers["Cache-Control"] = "public, max-age=60"

    guard let newResponse = HTTPURLResponse(
        url: url,
        statusCode: response.statusCode,
        httpVersion: nil,
        headerFields: headers
    ) else {
        return proposedResponse
    }

    return CachedURLResponse(response: newResponse, data: proposedResponse.data)
}

Interview Tip: Mentioning HTTP caching headers shows that you think about caching at every layer, not just in your application code. It's a signal of real-world experience.

Pagination and Cache

Most list-based features — feeds, search results, message histories — use pagination. Caching paginated data introduces specific challenges that interviewers love to explore.

The core problem: you cache page 1 of a feed. The user scrolls down, you fetch and cache page 2. Now the user pulls to refresh. New posts have been added at the top. Do you throw away your entire cache? Merge the new data in? What if an item that was on page 1 is now on page 2?

Offset-Based Pagination and Caching

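With offset-based pagination (?page=2&limit=20), cache keys map directly to page numbers. But an insertion at the top shifts everything down one position: the old item 20 becomes item 21. When the user scrolls and you fetch page 2, it now starts with an item already in your cached page 1, so you show a duplicate, and every cached page boundary is off by one. Deletions cause the opposite problem: items slide up across a page boundary and get skipped. This is why offset-based pagination caches poorly.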

Cursor-Based Pagination and Caching

Cursor-based pagination (?after=cursor_abc&limit=20) is more cache-friendly. Each page is anchored to a specific item, so insertions at the top don't affect existing pages.
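The difference is easy to reproduce with a toy in-memory server. FakeFeedServer is hypothetical and models only the two query styles. After a new post lands at the top, the offset client's page 2 repeats an item it already cached. The cursor client's page 2 continues exactly where it left off.

```kotlin
/** A toy feed "server" holding post IDs, newest first (hypothetical). */
class FakeFeedServer(initial: List<String>) {
    private val items = initial.toMutableList()

    /** A new post appears at the top of the feed. */
    fun publish(id: String) {
        items.add(0, id)
    }

    /** Offset pagination, like ?page=N&limit=L with 1-indexed pages. */
    fun pageByOffset(page: Int, limit: Int): List<String> =
        items.drop((page - 1) * limit).take(limit)

    /** Cursor pagination, like ?after=ID&limit=L; a null cursor means the first page. */
    fun pageAfter(cursor: String?, limit: Int): List<String> {
        if (cursor == null) return items.take(limit)
        val index = items.indexOf(cursor)
        require(index >= 0) { "Unknown cursor: $cursor" }
        return items.drop(index + 1).take(limit)
    }
}
```

Take posts A to F with a page size of 3, and publish a new post N after page 1 was cached. The offset page 2 is then C, D, E, where C is a duplicate. The cursor page 2 is D, E, F.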

[Interactive diagram: Why offset pagination breaks your cache. The server feed, newest first, holds posts A to F, and both an offset-based and a cursor-based client have cached A, B, C. A new post N is published and each client fetches page 2.]

Offset pages are positions, and positions shift when new items arrive. Cursor pages are anchored to item IDs, which never shift. That single difference is why cursor pagination caches cleanly.

The pattern I've used in practice:

// Android: Caching cursor-paginated feed data
class FeedCache(
    private val feedDao: FeedDao,
    private val apiService: ApiService
) {
    /** Load the next page after the given cursor. */
    suspend fun loadPage(afterCursor: String?): FeedPage {
        // Check if we have this page cached
        feedDao.getPage(afterCursor)?.let { cached ->
            if (!cached.isExpired) return cached
        }

        val page = apiService.getFeed(after = afterCursor, limit = 20)

        // Store items with their position metadata
        feedDao.insertPage(
            items = page.items,
            nextCursor = page.nextCursor,
            previousCursor = afterCursor,
            fetchedAt = System.currentTimeMillis()
        )

        return page
    }

    /** Pull-to-refresh: fetch new items and prepend to cache */
    suspend fun refresh(): List<FeedItem> {
        val latestCachedId = feedDao.getLatestItemId()

        // Fetch the newest page and look for overlap with our cache
        val latestPage = apiService.getFeed(after = null, limit = 20)

        val overlapIndex = latestPage.items.indexOfFirst { it.id == latestCachedId }
        return if (overlapIndex >= 0) {
            // Only insert items newer than the overlap
            val freshItems = latestPage.items.take(overlapIndex)
            feedDao.prependItems(freshItems)
            freshItems
        } else {
            // No overlap: more than a page of new posts arrived (or the cached head
            // was deleted), so the gap can't be stitched. Reset the cache.
            feedDao.clearAndInsert(latestPage.items, nextCursor = latestPage.nextCursor)
            latestPage.items
        }
    }
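}

// iOS: Caching cursor-paginated feed data
final class FeedCache {
    private let feedDao: FeedDao
    // Network client; `APIClient` is an assumed type name (ApiService on Android)
    private let apiClient: APIClient

    init(feedDao: FeedDao, apiClient: APIClient) {
        self.feedDao = feedDao
        self.apiClient = apiClient
    }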

    /// Load the next page after the given cursor.
    func loadPage(after cursor: String?) async throws -> FeedPage {
        // Check if we have this page cached
        if let cached = try feedDao.getPage(afterCursor: cursor),
           !cached.isExpired {
            return cached
        }

        let page = try await apiClient.getFeed(after: cursor, limit: 20)

        // Store items with their position metadata
        try feedDao.insertPage(
            items: page.items,
            nextCursor: page.nextCursor,
            previousCursor: cursor,
            fetchedAt: Date()
        )

        return page
    }

    /// Pull-to-refresh: fetch new items and prepend to cache
    func refresh() async throws -> [FeedItem] {
        let latestCachedId = try feedDao.getLatestItemId()

        // Fetch the newest page and look for overlap with our cache
        let newItems = try await apiClient.getFeed(after: nil, limit: 20)

        if let overlapIndex = newItems.items.firstIndex(where: { $0.id == latestCachedId }) {
            // Only insert items before the overlap
            let freshItems = Array(newItems.items.prefix(upTo: overlapIndex))
            try feedDao.prependItems(freshItems)
            return freshItems
        } else {
            // No overlap — gap too large, reset cache
            try feedDao.clearAndInsert(newItems.items, nextCursor: newItems.nextCursor)
            return newItems.items
        }
    }
}

The key insight for interviews: cursor-based pagination lets you append new pages to the cache without worrying about shifted offsets, and pull-to-refresh becomes a matter of finding the overlap point between fresh data and your cached data.

Cache Eviction Policies#

When a cache reaches its size limit, something has to go. Three policies show up in interviews.

LRU (Least Recently Used): evict the entry that hasn't been accessed for the longest time. This is the policy behind Android's LruCache and most image caching libraries. NSCache plays the same role on iOS, although Apple doesn't document its exact eviction order. LRU works well because recently accessed data is likely to be accessed again (temporal locality). For mobile, LRU is almost always the right answer.

LFU (Least Frequently Used): evict the entry with the fewest accesses. This keeps popular items in cache longer, but it has a cold-start problem: new items have low frequency and might get evicted before they have a chance to prove their popularity. Rarely used on mobile.

FIFO (First In, First Out): evict the oldest entry regardless of access patterns. It's simple to implement, but it doesn't adapt to usage patterns. It's useful for log buffers or event queues, where order matters more than access frequency.
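On the JVM, the difference between LRU and FIFO can come down to one constructor flag. The sketch below assumes plain Kotlin on the JVM rather than Android's LruCache. With `accessOrder = true`, `java.util.LinkedHashMap` moves an entry to the most-recent end on every read or write, so its eldest entry is the least recently used one. With `accessOrder = false`, it keeps insertion order, which makes it FIFO. `BoundedCache` and `survivingKeys` are illustrative names, not a library API.

```kotlin
import java.util.LinkedHashMap

/**
 * Bounded cache backed by LinkedHashMap.
 * accessOrder = true  -> reads and writes reorder entries, so eviction is LRU.
 * accessOrder = false -> insertion order is kept, so eviction is FIFO.
 */
class BoundedCache<K, V>(
    private val capacity: Int,
    accessOrder: Boolean,
) {
    private val map = object : LinkedHashMap<K, V>(16, 0.75f, accessOrder) {
        // Called after each insert; returning true drops the eldest entry.
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>?): Boolean =
            size > capacity
    }

    operator fun get(key: K): V? = map[key]

    operator fun set(key: K, value: V) {
        map[key] = value
    }

    /** Keys from next-to-evict to most recently used (or most recently inserted). */
    fun keys(): List<K> = map.keys.toList()
}

/** Replays one access trace against a capacity-3 cache and reports what survives. */
fun survivingKeys(accessOrder: Boolean): List<String> {
    val cache = BoundedCache<String, String>(capacity = 3, accessOrder = accessOrder)
    cache["feed_1"] = "..."
    cache["user_a"] = "..."
    cache["feed_2"] = "..."
    cache["feed_1"]          // the user scrolls back to feed_1
    cache["img_9"] = "..."   // a fourth entry forces one eviction
    return cache.keys()
}
```

Both caches see the same access trace. LRU keeps `feed_1` because the user came back to it and evicts `user_a` instead. FIFO evicts `feed_1` for being first in, even though it was just read. That's the "doesn't adapt to usage patterns" problem in two lines of output.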

Figure: Be the LRU cache. Capacity: 4 entries, ordered from most recently used to next to evict. Accessing a cached item moves it to the front; accessing a new item pushes the least recently used one out.

With LRU, anything you keep touching stays near the front, no matter how long ago it was first cached. That's temporal locality, and it's exactly how a user scrolling a feed behaves.

Why LRU wins on mobile: Mobile usage is bursty and recency-driven. A user scrolling through a feed will likely scroll back up. A user viewing a profile might tap back and view it again. LRU naturally keeps these recently-viewed items cached. LFU would keep items from yesterday's browsing session that happened to be viewed many times, wasting cache space on data the user no longer needs.

Interview Tip: If asked about eviction, say LRU and explain why it matches mobile access patterns. If the interviewer pushes, mention that LFU could work for something like a music app's "most played" cache, but for general purpose, LRU is the standard.

Presenting Caching in Interviews#

This might be the most important section. Knowing caching strategies is one thing. Weaving them into your system design naturally is what separates strong candidates from everyone else.

Don't bolt caching on at the end. The worst thing you can do is design your entire system, then say "and we could add caching too." By that point, it feels like an afterthought because it is one.

Instead, introduce caching when you design the data flow. When you draw the Repository layer, say: "The repository checks a memory cache first, then disk, then network. Let me show you the flow." This makes caching an integral part of your architecture, not an optimization you might add later.
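The read path in that sentence fits in a few lines. This is a synchronous Kotlin sketch under some assumptions. The memory tier is any `MutableMap`; in an app it would be a bounded LRU cache. `DiskTier` and `NetworkTier` are hypothetical interfaces standing in for something like a SQLite-backed DAO and an HTTP service. Expiry is left out so the tier ordering stays visible.

```kotlin
// Hypothetical disk tier (e.g. SQLite-backed); survives process death.
interface DiskTier<K, V> {
    fun read(key: K): V?
    fun write(key: K, value: V)
}

// Hypothetical network tier: the slowest and most expensive source.
fun interface NetworkTier<K, V> {
    fun fetch(key: K): V
}

class TieredRepository<K, V : Any>(
    private val memory: MutableMap<K, V>, // would be an LRU cache in a real app
    private val disk: DiskTier<K, V>,
    private val network: NetworkTier<K, V>,
) {
    fun get(key: K): V {
        // 1. Memory: fastest, but wiped whenever the OS kills the process.
        memory[key]?.let { return it }

        // 2. Disk: slower; promote the hit so the next read is a memory hit.
        disk.read(key)?.let { fromDisk ->
            memory[key] = fromDisk
            return fromDisk
        }

        // 3. Network: only on a miss in both tiers; fill both on the way back.
        val fresh = network.fetch(key)
        disk.write(key, fresh)
        memory[key] = fresh
        return fresh
    }
}
```

Clearing the memory map is a cheap way to simulate process death. The next read is served from disk and re-promoted to memory without touching the network.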

Here's a structure that works:

  1. During requirements gathering, ask: "What's the acceptable staleness for this data? Can we show data from 5 minutes ago, or does it need to be real-time?" This tells the interviewer you're already thinking about caching.

  2. During high-level design, draw the cache as a first-class component in your data flow diagram. Put it between the ViewModel and the Network layer, inside the Repository.

  3. During deep dive, explain your specific strategy: "For the feed, I'd use cache-aside with a 60-second TTL. For the user's profile, write-through so the UI stays consistent after edits. For likes, write-behind because we want instant UI feedback."

  4. During optimization, discuss eviction policies, size limits, and what happens under memory pressure.
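A 60-second TTL is only as good as the expiry check behind it. Here is a minimal cache-aside sketch that assumes an injectable clock, so expiry can be tested without sleeping. `TtlCache`, `Stamped`, and `loadCacheAside` are illustrative names. The cache only stores and answers; the calling code decides when to go to the source. That division of labour is what makes this cache-aside rather than read-through.

```kotlin
// A cached value plus the time it was fetched, in milliseconds.
data class Stamped<V>(val value: V, val fetchedAtMillis: Long)

class TtlCache<K, V>(
    private val ttlMillis: Long,
    private val now: () -> Long = System::currentTimeMillis, // injectable for tests
) {
    private val entries = HashMap<K, Stamped<V>>()

    /** Returns the cached value only while it is younger than the TTL. */
    fun getFresh(key: K): V? {
        val entry = entries[key] ?: return null
        val age = now() - entry.fetchedAtMillis
        return if (age < ttlMillis) entry.value else null
    }

    fun put(key: K, value: V) {
        entries[key] = Stamped(value, now())
    }
}

/** Cache-aside: the caller, not the cache, decides when to hit the source. */
fun <V : Any> loadCacheAside(
    cache: TtlCache<String, V>,
    key: String,
    fetch: (String) -> V,
): V {
    cache.getFresh(key)?.let { return it }
    val fresh = fetch(key)
    cache.put(key, fresh)
    return fresh
}
```

Note the strict `<`. With a 60,000 ms TTL, an entry fetched at 0 ms is still served at 59,999 ms and refetched at exactly 60,000 ms.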

Figure: ViewModel (observes state) → Repository (single source of truth for data access) → Cache Manager, checking Memory (LRU, ~ns) and then Disk (SQLite, ~ms) → Network Service, reached only on a miss (100s of ms, costs battery and data).

Draw this on the whiteboard: the cache lives inside the repository, between your ViewModel and the network — a first-class component, not an afterthought.

Match the strategy to the feature. In a single app, you'll use different caching strategies for different data:

  • User profile: Cache-aside, write-through, 5-minute TTL
  • Feed items: Cache-aside, cursor-based pagination, 60-second TTL
  • Likes/reactions: Write-behind with background sync
  • Images: Three-tier waterfall (memory, disk, network) with LRU eviction
  • Chat messages: Event-based invalidation via WebSocket, disk-persisted
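Write-behind is the entry in that list that is easiest to hand-wave, so it helps to be able to sketch it. The version below assumes a hypothetical `LikeApi` that returns `true` when the server accepted the write. The local cache is updated immediately for instant UI feedback. The server write is queued until `flush()` runs, which in an Android app might be triggered by a background job such as WorkManager. Only the latest desired state per post is kept, so rapid toggles collapse into one request.

```kotlin
// Hypothetical server call; returns true if the write was accepted.
fun interface LikeApi {
    fun setLiked(postId: String, liked: Boolean): Boolean
}

class LikeStore(private val api: LikeApi) {
    private val liked = HashMap<String, Boolean>()
    // Latest desired state per post; a newer toggle overwrites an unsent one.
    private val pending = LinkedHashMap<String, Boolean>()

    fun isLiked(postId: String): Boolean = liked[postId] ?: false

    fun toggle(postId: String) {
        val newValue = !isLiked(postId)
        liked[postId] = newValue   // cache updated now: instant UI feedback
        pending[postId] = newValue // server updated later
    }

    /** Sends queued writes; failed ones stay queued for the next attempt. */
    fun flush() {
        val iterator = pending.entries.iterator()
        while (iterator.hasNext()) {
            val (postId, value) = iterator.next()
            if (api.setLiked(postId, value)) iterator.remove()
        }
    }

    fun pendingCount(): Int = pending.size
}
```

A failed flush leaves the write queued, which is what lets a like survive a flaky connection. In this sketch the queue lives in memory. A real implementation would persist it to disk so that it also survives process death.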

When you can articulate this level of detail — matching specific strategies to specific features with clear reasoning — you're showing the interviewer that you've actually built these systems. That's what they're looking for.