Poshan Tracker Face Verification
🗄️ Investigation

Decompiling the Poshan Tracker app to inspect its face verification functionality
A Decode investigation reported that “AI Facial Recognition Is Denying Food To Pregnant Women Across India” on March 19. The article is worth reading and includes quotes from several people with lived experience talking about the exclusionary systems being designed.
As a public health professional who has been looking at Poshan Tracker and other digital intrusions intermittently, I agree with the political statement the article makes and stand by it. It has been pointed out for years that the obsession with digital health has compromised care in anganwadis. It has also been pointed out that surveillance apps in welfare are nothing but snake oil.
The eyesore
But when I was reading this article, the technical aspect of it didn’t make sense to me. The article seemed to rely on just the presence of Google ML Kit in the AndroidManifest.xml to state that ML Kit was being used for face verification/matching. It even included a quote from Google:
Google said that ML Kit “does not have facial recognition capabilities” and is not designed to identify specific individuals
But it went on to frame this as a design flaw:
A Fundamental Design Flaw
Anoop, a tech researcher, said the problem goes beyond implementation and points to a fundamental design flaw: “At its core, ML Kit is designed to detect faces, not reliably recognise people over time.”
The hypothesis
As a critical thinker, I came up with alternative explanations:
- Maybe ML Kit was being used just to draw a rectangle around the face when the worker was taking the photograph.
- UIDAI is supposed to provide a face verification API (pdf, page 19). Maybe they are sending the photo to UIDAI to do the face authentication.
The methodology
Whatever be the hypothesis, the method to proceed would be to look at how the app really operates. There are two ways to do it:
- Dynamic analysis - look at the app while it is running
- Static analysis - look at the code
Dynamic analysis
If I were doing dynamic analysis, I would first check whether the app could do face verification/matching offline. If we could, for example, turn off the internet at the last moment between photo capture and verification, and it still gave a result, then we would know that the verification was being done offline. That would rule out the use of the Aadhaar authentication API.
If the app refused to function when the internet was off, then it would require a slightly more involved proxy setup where I routed all traffic from the app through software I control and checked whether it makes any internet call during that verification phase.
In any case, doing a dynamic analysis requires running the app (bypassing credentials) and interacting with the servers of Poshan Tracker. This I didn’t want to do.
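To illustrate what that proxy setup would involve: with mitmproxy (assuming the phone's traffic is routed through it and its CA certificate is trusted by the device, which itself can be non-trivial on modern Android), a minimal addon to log every request during the verification phase might look like this. This is a sketch I did not run against the app; the filename is hypothetical.

```python
# Hypothetical mitmproxy addon: logs every outgoing HTTP request so we
# can check whether the app phones home during the verification step.
captured = []

def request(flow):
    # mitmproxy invokes this hook once for each outgoing request
    line = f"{flow.request.method} {flow.request.pretty_host}{flow.request.path}"
    captured.append(line)
    print(line)
```

One would then run `mitmproxy -s log_requests.py`, start a verification, and see whether anything appears in the log between capture and result.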
Static analysis
With static analysis, you don’t disturb the system in any way. All you’re doing is looking at the code and trying to figure out what it does.
Unfortunately, the Government hasn’t asked the vendors here to publish the source code of Poshan Tracker despite it being built with public money. This is a violation of the Constitutional spirit and GoI’s policy on adoption of open source software.
So, it falls upon us to decompile the APK and try to figure out what it does. The more obfuscated the code is, the harder it becomes to be certain about what it does. But still we can find some properties by looking at the code.
The results
APK source
First, I downloaded the APK (XAPK) version 25.2.5 from two sources:
The md5sum from both sources was 72ceb4e1993e7cefe58775ac8571d460 and the size was 199M (208514024 bytes).
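If you want to verify such a hash yourself without the md5sum utility, a small Python equivalent (reading in chunks so a 199M file never has to fit in memory) is:

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()
```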
Extraction
I used apktool to decompress the XAPK, and then again to decompile the main APK inside it.
apktool d com.poshantracker_25.2.5.xapk
apktool d com.poshantracker_25.2.5.xapk.out/unknown/com.poshantracker.apk
I also used jadx-gui for the second step (to be able to view the result in a more user friendly way).
Needle in the haystack
With decompiled source code, reading the whole source code to find out the program flow or lifecycle of the app is very complicated (for me). So, I used search based investigation where I search for key terms that might show up in the app and then read the code around that term. This creates a major limitation in the methodology as it is quite possible that some code is included in the app without actually being used, and vice versa some code that is being used might not be discovered by the researcher.
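Conceptually, the search itself is nothing fancy; it is just a recursive grep over the decompiled tree, something like this sketch:

```python
import os

def grep_tree(root, needle):
    """Return paths of files under root whose text contains needle."""
    hits = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding='utf-8', errors='ignore') as f:
                    if needle in f.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file; skip it
    return hits
```

In practice, jadx-gui's built-in text search does the same thing with indexing, which is much faster over tens of thousands of files.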
In some sense, Anoop was doing the same, except Anoop stopped at the easiest step of looking at the AndroidManifest.xml and making inferences based on that.
Architecture
The present architecture of the Poshan Tracker android app includes:
- React Native - the frontend that is actually seen by people and is included in Decode’s photos.
- Kotlin - the glue code of Android
- Chaquopy - SPOILER ALERT: the face verification system in python
199 MB?
Usually an app like this would be about 50MB. But this one is almost 200MB. Why?
Use ncdu and you see that 48.2% of the size is contributed by assets and >80% of that is by chaquopy. Or alternatively,
$ eza -lDT -rs size --total-size --no-permissions --no-user --no-time --no-quotes -L2 | head -n 20
322M .
182M ├── assets
157M │ ├── chaquopy
10.0M │ ├── fonts
4.3M │ ├── models_bundled
1.5M │ ├── net
3.8k │ └── dexopt
68M ├── smali_classes2
38M │ ├── com
3.7M │ ├── okhttp3
3.3M │ ├── net
1.6M │ ├── r8
1.2M │ ├── q8
1.1M │ ├── zb
1.0M │ ├── Fa
978k │ ├── io
760k │ ├── b7
754k │ ├── okio
595k │ ├── f9
541k │ ├── ta
If they bundled an asset that adds almost 100MB to an app downloaded all over India by Anganwadi workers, then they probably really wanted that asset. (Although, looking at the fonts folder, one could also conclude that these developers didn’t care at all about reducing the bundle size.)
Chaquopy
Chaquopy is a way to run Python inside an Android app. It is very rarely used.
When we look inside the chaquopy folder inside assets, we see several files with .imy format with app.imy being the largest one.
$ ls -lh assets/chaquopy/
total 144M
-rw-r--r-- 1 asd asd 89M Mar 20 12:54 app.imy
-rw-r--r-- 1 asd asd 419K Mar 20 12:54 bootstrap.imy
drwxr-xr-x 6 asd asd 4.0K Mar 20 12:54 bootstrap-native
-rw-r--r-- 1 asd asd 5.3K Mar 20 12:54 build.json
-rw-r--r-- 1 asd asd 284K Mar 20 12:54 cacert.pem
-rw-r--r-- 1 asd asd 11M Mar 20 12:54 requirements-arm64-v8a.imy
-rw-r--r-- 1 asd asd 9.3M Mar 20 12:54 requirements-armeabi-v7a.imy
-rw-r--r-- 1 asd asd 2.8M Mar 20 12:54 requirements-common.imy
-rw-r--r-- 1 asd asd 13M Mar 20 12:54 requirements-x86_64.imy
-rw-r--r-- 1 asd asd 11M Mar 20 12:54 requirements-x86.imy
-rw-r--r-- 1 asd asd 1.4M Mar 20 12:54 stdlib-arm64-v8a.imy
-rw-r--r-- 1 asd asd 1.3M Mar 20 12:54 stdlib-armeabi-v7a.imy
-rw-r--r-- 1 asd asd 2.9M Mar 20 12:54 stdlib-common.imy
-rw-r--r-- 1 asd asd 1.5M Mar 20 12:54 stdlib-x86_64.imy
-rw-r--r-- 1 asd asd 1.4M Mar 20 12:54 stdlib-x86.imy
These are just zip files and can be extracted with unzip.
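You can confirm this with Python's zipfile module; despite the unusual .imy extension, is_zipfile() looks at the file's magic bytes, not its name. A small sketch:

```python
import zipfile

def extract_imy(path, dest):
    """Extract a Chaquopy .imy archive, which is just a zip file."""
    assert zipfile.is_zipfile(path), f"{path} is not a zip archive"
    with zipfile.ZipFile(path) as z:
        z.extractall(dest)
        return z.namelist()
```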
The most important one (and the largest one) is app.imy and I unzipped that to see the following:
$ eza --tree --long app
drwxr-xr-x - asd 21 Mar 13:41 app
drwxr-xr-x - asd 1 Feb 1980 ├── face_matching_models
.rw-r--r-- 22M asd 1 Feb 1980 │ ├── dlib_face_recognition_resnet_model_v1.dat
.rw-r--r-- 100M asd 1 Feb 1980 │ └── shape_predictor_68_face_landmarks.dat
.rw-r--r-- 4.0k asd 1 Feb 1980 └── script.pyc
$ file app/script.pyc
app/script.pyc: Byte-compiled Python module for CPython 3.8 (magic: 3413), timestamp-based, .py timestamp: Tue Feb 18 05:59:10 2025 UTC, .py size: 5165 bytes
So you have a folder with two face matching models and a compiled python script.
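As an aside, the CPython version that `file` reports comes from the first two bytes of the .pyc header, the "magic" word; 3413 corresponds to CPython 3.8. A minimal sketch of reading it:

```python
def pyc_magic(header: bytes) -> int:
    """Return the magic word from a .pyc header (first two bytes, little-endian)."""
    return int.from_bytes(header[:2], 'little')

# CPython 3.8 .pyc files start with bytes 55 0d 0d 0a, i.e. magic 3413
```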
We will come back to this folder after briefly looking at what we could find with React Native and the decompiled java.
Keywords search in decompiled java
Within jadx I tried several keywords like “https”, “uidai”, “face”, “verification”, “matching”, “ekyc”, and so on. I was able to find two very important classes: FaceRecognitionActivity and FaceVerificationActivity.
For example, here’s one function:
public final void b(InterfaceC0288b callback, List messageArray, String faceEmbedding, double d10, int i10, double d11, double d12, boolean z10, boolean z11, boolean z12, boolean z13, boolean z14) {
    j.f(callback, "callback");
    j.f(messageArray, "messageArray");
    j.f(faceEmbedding, "faceEmbedding");
    Intent intent = new Intent(this.f31064a, (Class<?>) FaceVerificationActivity.class);
    intent.putExtra("messageArray", (String[]) messageArray.toArray(new String[0]));
    intent.putExtra("checkFaceDirection", z10);
    intent.putExtra("checkEyesOpen", z11);
    intent.putExtra("checkFaceWidthRatio", z12);
    intent.putExtra("checkImageBlurry", z13);
    intent.putExtra("checkImageBrightness", z14);
    intent.putExtra("faceEmbedding", faceEmbedding);
    intent.putExtra("verificationThreshold", d10);
    intent.putExtra("blurThreshold", i10);
    intent.putExtra("darkThresholdLightning", (float) d11);
    intent.putExtra("brightThresholdLightning", (float) d12);
    intent.addFlags(268435456);
    FaceVerificationActivity.f30985C0.b(new d(callback));
    this.f31064a.startActivity(intent);
}
With some searching around, I could see that there are functions decorated with @ReactMethod in ImageCaptureModule that call into these, like checkFacMatching:
@ReactMethod
public void checkFacMatching(ReadableArray readableArray, String str, Promise promise) {
    com.poshantracker.facerecognitionlibrary.b bVar = new com.poshantracker.facerecognitionlibrary.b(this.reactContext);
    int size = readableArray.size();
    String[] strArr = new String[size];
    for (int i10 = 0; i10 < size; i10++) {
        strArr[i10] = readableArray.getString(i10);
    }
    bVar.b(new b(promise), Arrays.asList(strArr), str, 0.4d, 2000, 90.0d, 200.0d, true, true, false, false, false);
}
Notice the number 0.4d. That’s the verification threshold being passed into the face matching module.
The FaceVerificationActivity calls into Chaquopy, fetching a Python function called create_embedding_and_match_faces:
PyObject module = python.getModule("script");
kotlin.jvm.internal.j.e(module, "getModule(...)");
this.f31019x0 = module;
if (module == null) {
    kotlin.jvm.internal.j.t("module");
    module = null;
}
Object obj = module.get((Object) "create_embedding_and_match_faces");
kotlin.jvm.internal.j.c(obj);
this.f31020y0 = (PyObject) obj;
So, we can see that this is all glue code. But we also see some important configuration parameters.
React Native
The react native code is bundled in assets/index.android.bundle and is in the following format:
$ file assets/index.android.bundle
assets/index.android.bundle: Hermes JavaScript bytecode, version 96
By decompiling this with hermes-dec one is able to see that checkFacMatching is indeed called.
One can also see several API endpoints, error/success messages, and other descriptive text in this part of the code.
Decompiling script.pyc
So, the one part which might hold answers to a lot of our questions is the Python script. Unfortunately, decompilers like decompile3, uncompyle6, and pycdc were erroring out. I then started watching a talk by Rocky Bernstein, a long-time decompiler maintainer, who says at the beginning of the talk that AI is not up to the mark. And I thought: let me try AI on this. I put it through Gemini Pro. Conversation. Here’s its output:
import os
import logging
from PIL import Image
import numpy as np
import json
import dlib
import io

logging.basicConfig(level=logging.DEBUG, format='PyTest: %(message)s')

def get_absolute_path(relative_path):
    return os.path.join(os.path.dirname(os.path.realpath(__file__)), relative_path)

def load_models():
    try:
        logging.debug('PyTest : Getting absolute path of models')
        path1 = get_absolute_path('face_matching_models/shape_predictor_68_face_landmarks.dat')
        path2 = get_absolute_path('face_matching_models/dlib_face_recognition_resnet_model_v1.dat')
        pose_predictor_68_point = dlib.shape_predictor(path1)
        logging.debug('PyTest : Loaded 68 point model')
        face_encoder = dlib.face_recognition_model_v1(path2)
        logging.debug('PyTest : Loaded face encoder')
        face_detector = dlib.get_frontal_face_detector()
        logging.debug('PyTest : Loaded face detector')
        return face_detector, pose_predictor_68_point, face_encoder
    except Exception as e:
        logging.error(f'PyTest : An error occurred in load_models function: {e}')
        return None

def face_encodings(face_image, num_jitters=1, number_of_times_to_upsample=1):
    logging.debug('PyTest : Starting face_encodings function')
    face_detector, pose_predictor_68_point, face_encoder = load_models()
    face_locations = face_detector(face_image, number_of_times_to_upsample)
    logging.debug(f'PyTest : Detected {len(face_locations)} face(s)')
    if len(face_locations) == 0:
        logging.debug('Returning None')
        return None
    try:
        face_location = face_locations[0]
        raw_landmarks = pose_predictor_68_point(face_image, face_location)
        logging.debug('PyTest : Computed raw landmarks')
        if raw_landmarks is None:
            logging.debug('PyTest: Raw_landmarks are None')
            return None
    except Exception as e:
        logging.error(f'PyTest : Error occured in Face Encoding Function: {e}')
        return None
    encoding = np.array(face_encoder.compute_face_descriptor(face_image, raw_landmarks, num_jitters))
    logging.debug('PyTest : Computed encodings for face')
    return json.dumps(encoding.tolist())

def face_distance(face_encodings, face_to_compare):
    try:
        logging.debug('PyTest : calculating distance ')
        return np.linalg.norm(face_encodings - face_to_compare, axis=1)
    except Exception as e:
        logging.error(f'PyTest : An error occurred in face_distance function: {e}')
        return None

def create_embedding_and_match_faces(image, encoding1, threshold=0.6):
    try:
        if isinstance(encoding1, str):
            encoding1 = np.array(json.loads(encoding1))
        logging.debug(f'PyTest : encoding one {type(encoding1)}')
        logging.debug(f'PyTest : encoding two {encoding1}')
        encoding2 = create_embedding(image)
        logging.debug(f'PyTest : encoding two {type(encoding2)}')
        logging.debug(f'PyTest : encoding two {encoding2}')
        if encoding2 is None:
            logging.debug('PyTest : Return None due to empty')
            return None
        distance = face_distance([encoding1], encoding2)[0]
        logging.debug(f'PyTest : distance is {distance}')
        if distance <= threshold:
            return True, distance
        return False, distance
    except Exception as e:
        logging.error(f'PyTest : An error occurred in create_embedding_and_match_faces function: {e}')
        return None

def create_embedding(image):
    try:
        logging.debug('PyTest : Starting create_embedding function')
        image_obj = Image.open(io.BytesIO(image))
        logging.debug('PyTest : Opened image')
        image_np = np.array(image_obj)
        logging.debug('PyTest : Converted image to numpy array')
        encoding = face_encodings(image_np)
        logging.debug('PyTest : Returning first encoding')
        logging.debug(f'PyTest : {encoding}')
        return encoding
    except Exception as e:
        logging.error(f'PyTest : An error occurred in create_embedding1 function: {e}')
        return None
While I do not know if this decompilation is accurate, I do know that the strings that show up in the file are the same or similar.
Conclusion
So there is a great chance that the developers of Poshan Tracker are using dlib and the models shape_predictor_68_face_landmarks and dlib_face_recognition_resnet_model_v1 for doing the face matching/verification.
These are models created by Davis King, the main author of dlib, about 10 years ago. The details of how they were created are in https://github.com/davisking/dlib-models, but suffice it to say that a lot of the data comes from celebrity faces on the internet. dlib_face_recognition_resnet_model_v1 was trained on photos of 7485 individuals. And shape_predictor_68_face_landmarks was trained on images from the 300 Faces In-the-Wild challenge, handpicked after:
The images were downloaded from google.com by making queries such as “party”, “conference”, “protests”, “football” and “celebrities”.
References
- https://malea.winkler.site/Publications/icip2014a.pdf
- https://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf
- https://annas-archive.gd/scidb/10.1016/j.imavis.2016.01.002/
So, basically, the supplies that poor Indians should receive are gatekept by whether models trained on celebrity faces can find a match between a very old photo of them and their present face.
Remember the threshold of matching I talked about earlier? dlib’s default threshold, at which it reports its highest “performance”, is 0.6. Here’s what dlib says about distance:
In general, if two face descriptor vectors have a Euclidean distance between them less than 0.6 then they are from the same person, otherwise they are from different people.
But our techbros behind Poshan Tracker seem to have tightened it to 0.4. This means two faces will be declared a match only if the distance between them is less than 0.4. (I’m unsure, in the maze of obfuscated code, whether it is 0.4 or 0.6 that’s actually used. If it is 0.6, that’s better for the beneficiaries.)
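To make the effect of this choice concrete, here is a toy example with made-up 128-dimensional embeddings constructed to lie at Euclidean distance 0.45 from each other. Such a pair would count as "same person" at dlib's default 0.6 but be rejected at 0.4 (the numbers are illustrative, not from real faces):

```python
import numpy as np

rng = np.random.default_rng(0)
e1 = rng.normal(size=128)          # stand-in for a stored face embedding

# construct a second embedding at distance exactly 0.45 from the first
direction = rng.normal(size=128)
direction /= np.linalg.norm(direction)
e2 = e1 + 0.45 * direction

distance = float(np.linalg.norm(e1 - e2))
print(round(distance, 2))   # 0.45
print(distance <= 0.6)      # True  -> match at dlib's default threshold
print(distance <= 0.4)      # False -> no match at the stricter 0.4
```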
Contextualizing
There are biases in facial recognition, like the other-race effect. The age- and physiology-related changes in faces have already been spoken about in the Decode article.
These make the result of face matching algorithms probabilistic. That is, there is a non-zero chance that two photos of the same person will be declared as “not matching”.
This is the #1 argument against technology like Aadhaar in welfare, and has been written about numerous times. In the Supreme Court case too, it was clearly stated from the very beginning that Aadhaar is a probabilistic system.
When it comes to Poshan Tracker, it has reimplemented this probabilistic system and probably left it as a choice for a software engineer somewhere inside an air-conditioned office to decide how probabilistic it should be. The difference between a threshold of 0.4 and 0.6, for example, can change the result for thousands of individuals.
Limitations
Again, these are only descriptions of what I saw. It does not mean that this is exactly how the app operates. A conclusive answer on how the app operates can be given by someone who can read bytecode magically or by the developers releasing the original source code.
Supplements
You can find all the decompiled output in https://github.com/asdofindia/poshan-tracker-25.2.5
Subscribe to my newsletter where I send new links from all my blogs and curated links related to society, politics, etc.
Or find other ways to follow me