Educational Archives | ROC — https://roc.ai/category/educational/

Hardware Considerations when Architecting a Face Recognition System
https://roc.ai/2021/08/11/hardware-considerations-when-architecting-a-face-recognition-system/ — Wed, 11 Aug 2021
As the capabilities of automated face recognition algorithms continue to skyrocket, so does the number of face recognition (FR) applications being deployed. Whether it is using FR to unlock a phone, create an investigative lead to help identify a violent criminal, enable low-income persons to open a bank account online, or perform visitor management at a courthouse, more and more face recognition applications continue to be developed by dozens of different system integrators. However, depending on the application, different system architectures and software requirements will be needed. And, depending on the architecture, different algorithm requirements will emerge. This article will discuss these requirements across different face recognition applications so that readers can select the proper FR algorithm when building FR systems.
Face Recognition Applications
Identity Verification 1:1
Identity verification (1:1) is the process of validating a person against a claimed identity. For example, the person will claim to the system they are “John Doe”. The system would take a photo of the person (the facial “presentation”), generate an FR template from the photo, and compare it against the template on file for “John Doe”. If the presented identity matches the reference identity, then access is granted. This could mean a door opens, a bank account is accessed, or a phone is unlocked. It is important to note that face can be one of multiple authentication factors used for identity verification (e.g., passwords, or tokens).
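A minimal sketch of the 1:1 decision (using a toy cosine similarity over plain Python lists; the 0.75 threshold is illustrative, and a production system would use the vendor SDK's comparison score and calibrated threshold):

```python
import math

def cosine_similarity(a, b):
    """Toy similarity score between two feature vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def verify(probe, reference, threshold=0.75):
    """1:1 check: does the presented face match the claimed identity's
    enrolled template?  Access is granted only at or above the threshold."""
    return cosine_similarity(probe, reference) >= threshold

# A probe close to the enrolled template passes; a dissimilar one is rejected.
enrolled = [0.9, 0.1, 0.4]
assert verify([0.88, 0.12, 0.41], enrolled)       # same person: grant access
assert not verify([0.1, 0.95, -0.3], enrolled)    # different person: deny
```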
USE CASES
Bank Account Access
Secure Facility Access
Phone Unlock
Tax Return Filing
Analyst Driven Search 1:N
Analyst driven search (1:N) is the process of manually searching a face image (a “probe”) against a database of pre-processed FR templates (a “gallery”). For example, in a criminal investigation, an image of a suspect may be obtained from a variety of sources, such as a still frame from a security camera, an online photo, or picture captured by a witness. This probe photo of the criminal suspect would then be manually uploaded for search. In turn, a template would be created from the probe image, and it would then be compared against all the templates in the gallery database. After comparing the probe to the gallery, the most similar matching images in the gallery would be presented to the analyst for manual adjudication. This process is labor intensive in that the FR system is merely a filtering tool that will reduce the size of the database. A significant amount of time and effort is needed for the manual adjudication process.
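The filtering step described above can be sketched as follows (toy one-number templates and similarity; a real system compares high-dimensional feature vectors via the SDK):

```python
def search(probe, gallery, top_k=3):
    """1:N search: rank every gallery template by similarity to the probe and
    return the top candidates for manual adjudication.  The FR system only
    filters the gallery; the analyst makes the final identification."""
    def similarity(a, b):
        # Toy score: closer template values score higher.
        return -abs(a - b)

    ranked = sorted(gallery.items(), key=lambda kv: similarity(probe, kv[1]),
                    reverse=True)
    return [subject for subject, _ in ranked[:top_k]]

# The gallery subjects most similar to the probe head the candidate list.
candidates = search(0.52, {"A": 0.50, "B": 0.10, "C": 0.55, "D": 0.90})
# candidates == ["A", "C", "D"]
```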
USE CASES
Identification of a Bank Robber
from a surveillance video frame.
Identification of an Assaulter
from their online dating profile.
Identification of a Hit & Run Suspect
from a bystander’s cell phone camera.
Automated Search 1:N+1
Automated search (1:N+1) is typically performed in high-throughput applications such as traveler screening or video analytics. For example, a human trafficking investigation may require analyzing terabytes of images and video to identify the different persons present (both victims and perpetrators). This step involves generating templates for every input image processed. In the case of live streaming video, templates are generated at a rate of roughly five (5) video frames per second (FPS).
For video, after the templates have been generated they are often clustered into the different identities present. This clustering and tracking step involves cross-comparing all the templates. Without an efficient template comparison speed and clustering algorithm, this process can be very time consuming and generally grows quadratically in time as a function of the number of templates being clustered. Each clustered identity, or each individual template if no clustering is performed, is then searched against available watch-list galleries. Any probe template that matches a gallery template beyond a predetermined similarity threshold will trigger an identity match alert.
Alternatively, in passenger screening, either a single image is manually captured by an operator, or a live video stream of the passenger is captured and automatically distilled down to a single representative photograph. In the single-image case, the face image is captured, analyzed for quality conformance (e.g., using an automated quality metric and/or validation of ICAO compliance), and templatized. In the live-video case, five (5) to ten (10) FPS must be captured and templatized, followed by identity tracking and grouping, and finally cross-comparison of templates from the recent collection sequence, possibly applying spatio-temporal constraints.
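The clustering step can be sketched with a greedy, threshold-based pass. This is a simplification with toy one-number templates; real systems cluster high-dimensional templates with more sophisticated algorithms, but the quadratic cross-comparison cost is the same:

```python
def cluster_templates(templates, threshold=0.9):
    """Greedy identity clustering: assign each template to the first cluster
    whose representative it matches at or above the threshold, otherwise
    start a new cluster.  Cross-comparing n templates is O(n^2) in the worst
    case, which is why comparison speed dominates at scale."""
    def similarity(a, b):
        # Toy stand-in for an SDK comparison score (templates are numbers here).
        return 1.0 - abs(a - b)

    clusters = []  # each cluster is a list of templates; element 0 is its representative
    for t in templates:
        for c in clusters:
            if similarity(t, c[0]) >= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters

# Five video-frame templates collapse into two distinct identities.
clusters = cluster_templates([0.10, 0.12, 0.80, 0.11, 0.82])
# clusters == [[0.10, 0.12, 0.11], [0.80, 0.82]]
```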
The template for each passenger being screened can then be compared against multiple galleries, such as a passenger manifest or a No Fly List. Any probe template that matches a gallery template beyond a predetermined similarity threshold will trigger an identity match alert. Conversely, in the case of the passenger manifest, an alert would occur if the presented passenger does not match any person in the manifest.
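The watch-list portion of that screening flow (match-above-threshold raises an alert; the manifest's non-match alert is the mirror image) can be sketched as follows. The similarity function and the 0.8 threshold are toy placeholders for a vendor SDK's:

```python
def screen_passenger(passenger_template, watchlists, threshold=0.8):
    """Compare one screened template against every gallery; any score at or
    above the threshold raises an identity-match alert."""
    def similarity(a, b):
        # Toy stand-in for an SDK comparison score (templates are numbers here).
        return 1.0 - abs(a - b)

    alerts = []
    for list_name, gallery in watchlists.items():
        for subject, tmpl in gallery.items():
            if similarity(passenger_template, tmpl) >= threshold:
                alerts.append((list_name, subject))
    return alerts

# A passenger whose template closely matches a No Fly List entry triggers an alert.
alerts = screen_passenger(0.41, {"No Fly List": {"subject-17": 0.40},
                                 "Passenger Manifest": {"seat-12A": 0.95}})
# alerts == [("No Fly List", "subject-17")]
```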
USE CASES
Traveler / Passenger Screening
Watch-list Alerting from Live Video
Human Trafficking Investigations
Clustering Identities in Digital Media Evidence
Hardware Considerations
Algorithm Efficiency
Previous Rank One articles have provided significant insights into the various efficiency metrics that influence an FR algorithm’s deployability. For new readers we highly recommend reading those articles, particularly our initial article on the topic. To summarize these metrics:

  • Template generation speed is the time needed to initially process a face image or video frame.
  • Template size is the memory required to represent facial features of a processed face image.
  • Comparison speed is the time needed to measure the similarity between two facial templates.
  • Binary size is the amount of memory needed to load an algorithm’s model files and software libraries.

The performance of an FR algorithm across these metrics will dictate whether or not it can run on a given hardware system. And, across the FR industry, these efficiency metrics vary tremendously from vendor to vendor. The following graphic demonstrates how different metrics can influence the amount of CPU throughput or memory needed for a hardware system:
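To make the metrics concrete, here is a hypothetical back-of-the-envelope sizing calculation. The subject count, template size, and comparison rate below are illustrative assumptions, not vendor figures:

```python
def gallery_requirements(num_subjects, template_size_kb, comparisons_per_sec):
    """Estimate the RAM needed to hold a gallery in memory and the time for
    one exhaustive 1:N search on a single CPU core."""
    ram_gb = num_subjects * template_size_kb / (1024 * 1024)
    search_seconds = num_subjects / comparisons_per_sec
    return ram_gb, search_seconds

# A 10M-subject gallery with 1 KB templates and 10M comparisons/sec per core:
ram_gb, latency_s = gallery_requirements(10_000_000, 1.0, 10_000_000)
# roughly 9.5 GB of RAM, and about 1 second per single-core search
```

Doubling the template size doubles the RAM requirement, and halving the comparison speed doubles the search latency, which is why these two metrics drive 1:N hardware costs.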

Hardware Components
Different hardware and network resources may be available or desired for a given application. The common architectural components are:

  • Persistent server / desktop – low quantity, high cost, high processing power and memory. These systems will typically host FR libraries and/or system software. These systems will typically have server grade x64 processors and potentially GPU processors.
  • Embedded device – low-cost, high quantity devices with limited processing power and memory that can either host FR libraries on-edge or operate as a “thin-client” that passes imagery to a server or cloud system for processing. These systems typically have mobile grade ARM processors and potentially Neural Processing Units (NPUs).
  • Scalable cloud – arrays of server resources abstracted through a cloud resource management system.
  • Network – communication channels between devices. Networks will have varying amounts of bandwidth depending on their properties.

Depending on the application and available hardware resources, different FR system architectures need to be deployed. And, depending on the architecture used, different FR algorithm efficiency requirements will emerge. This is because of the differences in processing and memory resources across these different hardware systems:

Note that this article does not specifically cover GPU acceleration, whether through a traditional NVIDIA CUDA-enabled GPU or an embedded Neural Processing Unit (NPU), but readers can assign such hardware components to the “processor” category. The main distinction is that GPU acceleration generally decreases the cost per unit of throughput relative to CPU-only processing.
Architecture Options
In the remainder of this article we will walk through the various architectures encountered when developing a face recognition system and discuss the algorithm efficiency considerations for each.
Persistent server and Desktop systems
Server and desktop systems are typically used in analyst driven applications, such as forensic analysis of digital media evidence, systems with predictable workloads such as an identity document agency (e.g., a DMV), or high-value systems with infrequent use (e.g., a law enforcement search system). These systems will typically stay installed on the same computer for several years at a time.

Advantages

  • Hardware flexibility
  • Predictable cost
  • Predictable throughput
  • High throughput

Disadvantages

  • Hardware cost
  • Lack of redundancy
  • Lack of scalability
  • Lack of portability

Algorithm limitations when using a persistent server:

Identity Verification – 1:1

  • Slow template generation speed will reduce throughput/system response time
  • Large binary size will impact system restart speed
  • High hardware cost
  • Powerful network needed for decentralized sensors

Manual Identification – 1:N

  • Large template size will require significant memory resources
  • Slow template generation speed will delay search results
  • Slow comparison speed will delay search results

Automated Search – 1:N+1

  • Slow template generation speed will reduce throughput (e.g., video processing)
  • Large template and binary sizes will require significant memory resources
Embedded Devices
Embedded devices such as a phone or consumer electronic device are low cost and highly capable when running properly designed software. There are fundamental limits on what can be achieved by an embedded processor (e.g., ARM) and thus template generation speed and template size can play a major role in FR system requirements.

Advantages

  • Low hardware cost
  • Portability

Disadvantages

  • Limited hardware capacity
  • Limited power resources
  • Requires highly efficient algorithms

Algorithm limitations per FR application when using embedded devices

Identity Verification – 1:1

  • Slow template generation speed will cause major latency (> 3 seconds)
  • Large binary size will occupy a high percentage of available memory

Manual Identification – 1:N

  • Template size must be very small due to memory limits
  • Slow template generation speed will significantly delay search results
  • Slow comparison speed will significantly delay search results
  • Large binary size will occupy a high percentage of available memory

Automated Search – 1:N+1

  • Slow template generation speed will render video processing impossible
  • Template size must be very small due to memory limits
  • Large binary sizes will exhaust memory resources
Scalable Cloud
A scalable cloud architecture, such as Kubernetes, running on a scalable cloud hardware provider, can be highly valuable for application workflows that have varied and unpredictable throughputs.

Advantages

  • Highly scalable
  • Pay per usage
  • Redundancy
  • Fault tolerance

Disadvantages

  • Latency to instantiate new nodes
  • Memory limitations
  • Higher cost to initially implement

Algorithm limitations per FR application when using the cloud

Identity Verification – 1:1

  • Large binary size will slow container instantiation time
  • Poor network bandwidth will delay image transmission
  • Slow template generation speed will reduce throughput / system response time

Manual Identification – 1:N

  • Not typically advised: large template sizes or a large number of templates will make container instantiation very slow
  • Gallery size is typically too large to instantiate containers in less than 30 seconds

Automated Search – 1:N+1

  • Slow template generation speed makes video processing expensive
  • Large template size, large number of templates, and/or large binary size will make container instantiation very slow
  • Poor network bandwidth will prevent video transmission to the cloud
Summary
There are a wide range of considerations when building and deploying a face recognition system. This article walked through considerations related to the hardware used to deploy such a system and the algorithm properties needed to run effectively on that hardware. Such an understanding is critical because, while the majority of marketing focus on face recognition algorithms is on accuracy, the top 100 performers in NIST FRVT are often separated by less than 1% in accuracy. By contrast, the efficiency of an algorithm can vary by 5x to 10x and can be make-or-break when it comes to the successful deployment of a face recognition system. Rank One is the only Western-friendly vendor to consistently achieve top performance marks in both FR algorithm accuracy and efficiency. As such, regardless of the FR application or the available hardware resources, the ROC SDK is an ideal backbone for any FR system configuration. Contact our team today to begin your trial of our industry-leading face recognition algorithms and software libraries!

Rank One Continues Strong Performance in Most Recent DHS Biometric Technology Rally
https://roc.ai/2021/03/16/rank-one-continues-strong-performance-in-most-recent-dhs-biometric-technology-rally/ — Tue, 16 Mar 2021

In the Fall of 2020 the U.S. Department of Homeland Security (DHS) Science and Technology Directorate (S&T) hosted another Biometric Technology Rally to assess the capabilities of biometric algorithms for automated passenger identification in real-time screening systems. Due to the presence of COVID-19, the evaluation was expanded to include the identification of passengers wearing masks. 

Similar to the 2019 rally, Rank One Computing (ROC) emerged as a stand-out performer in the 2020 rally. The DHS-assigned alias for the ROC algorithm was “Owens” on unmasked faces and “Pond” on masked faces.

Rank One’s accuracy stood out when including errors from the acquisition systems:

  • On masked faces ROC was the only algorithm to achieve above 90% accuracy on two different acquisition systems.
  • On masked faces ROC had the second-highest accuracy of any algorithm, at 91.9%.

When excluding acquisition errors, even more impressive accuracy metrics were achieved:

  • On unmasked faces ROC achieved at least 99% accuracy with 3 of the 6 different acquisition systems
  • On masked faces ROC achieved above 94% accuracy with 4 of the 6 acquisition systems. Rank One was one of only two algorithms to achieve such accuracy.

As with previous years, the Rank One algorithm demonstrated that it is perhaps the only U.S.-developed technology that can be trusted for accurate and effective real-time screening applications.

In addition to the strong performance by Rank One, a few other vendors have recently proclaimed strong results in the DHS competition, including NEC, Innovatrics, and Corsight. However, despite strong accuracy performance from Rank One and certain other vendors, the difference in hardware efficiency between these solutions is a notable consideration.

Efficiency Matters

In terms of efficiency, while the 2019 DHS Rally had fixed limits on the efficiency of algorithm submissions (algorithms needed to perform template generation and matching in under 250ms), this year’s rally did not impose such restrictions. Thus, the matching algorithms submitted to this year’s rally had significant variations in hardware efficiency. 

Hardware efficiency, and in particular template generation speed, is an important consideration when deploying real-time screening systems. In fact, some face recognition solutions require spending more on additional hardware than on the software licensing fees themselves. This is why efficiency metrics are critical to consider when planning and procuring an enterprise-grade real-time screening capability.

Of the vendors who have publicly stated their participation in the 2020 DHS Biometric Rally, surprisingly, only Innovatrics participates in the industry-standard NIST FRVT Ongoing report. The following is the efficiency comparison between ROC and Innovatrics: 

Based on the FRVT benchmark, the Rank One algorithm can generate templates a staggering 7.5x faster than the Innovatrics algorithms, in addition to being an order of magnitude more efficient in memory usage. In real-time screening applications, this difference in template generation speed means ROC would require 7.5x less CPU hardware expense to achieve the same application workflow.

While NEC has surprisingly never participated in FRVT Ongoing, despite being deployed in several U.S. national security systems, they have submitted to FRVT 1:N. The following is the efficiency comparison between ROC and NEC:

Similarly, the NEC algorithm requires more than 3x the CPU throughput of the ROC algorithm to generate templates for faces encountered in an image or video frame. The total throughput difference to both generate a template and search it against a large gallery is more than 2x. The memory usage of the Rank One algorithm is an order of magnitude less than NEC’s.

Corsight had not submitted an algorithm to NIST FRVT benchmarking at the time this article was published.

Comparison to Iris Recognition Algorithms

In addition to the 10 face recognition matching algorithms submitted to the rally, three different iris matching algorithms were submitted. While iris recognition pairs very well with face recognition algorithms, as a stand-alone biometric all three solutions performed significantly worse than the Rank One face recognition algorithms in the rally. 

When including acquisition errors, the highest accuracy achieved on unmasked faces by any of the three iris algorithms was 80.4%. By comparison, the ROC algorithm achieved an accuracy of 98.1% in these same conditions. 

When excluding acquisition errors the highest accuracy achieved on unmasked faces by any of the three iris algorithms was 97.9%. By comparison, the ROC algorithm achieved an accuracy of 99.4% in these same conditions. 

When masks were present, the iris recognition algorithms were impacted further, with the best performing solution only achieving an accuracy of 67.3% including acquisition errors. By comparison, the ROC algorithm achieved an accuracy of 91.9% in these same conditions.  

Summary

Rank One Computing once again emerged as one of the most accurate matching algorithms in the DHS Biometric Rally. Impressively high accuracies were achieved on both masked and unmasked faces. In addition to the accuracies achieved by the ROC algorithm, it continues to be a stalwart of efficiency in the NIST FRVT benchmarks. The combination of top-tier accuracy and hardware efficiency puts Rank One in a category of its own and saves ROC’s partners and customers substantial amounts of money in hardware costs.

Understanding the Importance of Peak Memory Usage
https://roc.ai/2020/02/12/understanding-the-importance-of-peak-memory-usage/ — Wed, 12 Feb 2020

When building mobile, on-edge, or embedded face recognition applications, there is typically a small amount of memory (i.e., RAM) available. If a face recognition algorithm requires an extensive amount of RAM to perform enrollment and matching, it can increase hardware costs or, in the case of many embedded applications, render the concept infeasible altogether.

The latest versions of the NIST FRVT Ongoing benchmark are the first ever to provide the peak memory usage for each benchmarked algorithm, which measures how much memory is required to load an algorithm’s software libraries and statistical models.

Unfortunately for developers of embedded applications, most of the algorithms benchmarked in FRVT use too much memory to operate in such resource-constrained applications.

One of several reasons that Rank One has emerged as a leading solution provider for embedded systems (e.g., smart doorbells, automobiles, access control), mobile phones (e.g. device unlock, on-device search), and on-edge applications (e.g., smart security cameras) is that the algorithms shipped in the ROC SDK require significantly less memory than competing solutions.

According to the FRVT efficiency statistics (Tables 6 to 10 in the 6 January, 2020 FRVT Ongoing report), Rank One’s most recent algorithm used a mere 70MB of memory, lower than any other vendor with competitive accuracy rankings (the only vendors using less memory were 50x less accurate). The median memory usage across the 195 algorithms benchmarked was 730MB, and compared to certain face recognition vendors that advertise mobile face recognition solutions, Rank One uses up to 25x less memory. 

Here is a breakdown of the memory usage of Rank One versus relevant competing solutions: 

Memory usage per face recognition vendor:

  • Rank One: 70 MB
  • Cognitec: 498 MB
  • Idemia: 623 MB
  • Neurotechnology: 1,538 MB
  • Paravision: 1,570 MB
  • Aware: 1,820 MB
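A quick arithmetic check on the relative differences, using only the figures listed above:

```python
# Peak memory usage per vendor, in MB, from the FRVT figures quoted above.
peak_mb = {"Rank One": 70, "Cognitec": 498, "Idemia": 623,
           "Neurotechnology": 1538, "Paravision": 1570, "Aware": 1820}

# How many times more memory each vendor uses than Rank One.
ratios = {vendor: mb / peak_mb["Rank One"] for vendor, mb in peak_mb.items()}
# Aware's 1,820 MB works out to roughly 26x Rank One's 70 MB.
```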

Rank One’s ability to deliver software libraries that use so little memory relative to other solutions is based on a variety of trade secrets. The result of these techniques, though, is a unique offering in terms of both high-end accuracy and efficiency sufficient to operate on the extremely low-cost hardware required in embedded systems.

For cases where even 70MB of memory is too much (as is the case for certain Rank One partners), Rank One also ships a secondary “Fast” algorithm which uses less than 10MB RAM. 

Memory usage is one of several critical efficiency factors to consider when procuring a face recognition algorithm. Read more about this topic and other related procurement considerations in our previous articles.

Race and Face Recognition Accuracy: Common Misconceptions
https://roc.ai/2019/09/12/race-and-face-recognition-accuracy-common-misconceptions/ — Thu, 12 Sep 2019

There is a misconception that face recognition algorithms do not work on persons of color, or are otherwise inaccurate in general. This is not true. 

The truth is that across a wide range of applications, modern face recognition algorithms achieve remarkably high accuracy on all races, and accuracy continues to improve at an exponential rate.

The most comprehensive industry standard source for validating a face recognition algorithm is the U.S. National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT). For two decades, this program has benchmarked the accuracy of the leading commercially-available face recognition algorithms. 

Due to the rapid progression of face recognition technology in recent years, NIST FRVT introduced the “Ongoing” benchmark program, which is performed every few months on a rolling basis. FRVT Ongoing measures identity verification accuracy across millions of people and images, with wide variations in image capture conditions (constrained vs. unconstrained), and person demographics (age, gender, race, national origin). 

One dataset analyzed in depth in the NIST FRVT Ongoing benchmark is the Mugshot dataset. Performance reported on this dataset includes accuracy measurements on over one million images and persons, including accuracy breakouts across the following four demographic cohorts: Male Black, Male White, Female Black, and Female White.

In terms of overall face recognition accuracy, leading algorithms are extremely accurate on the Mugshot dataset. The top performing algorithm identified faces at 99.64% accuracy for True Positive / Same Person comparisons and 99.999% accuracy for False Positive / Different Person comparisons. There are another 50 different algorithms benchmarked that identified faces with at least an accuracy of 98.75% for Same Person comparisons and 99.999% accuracy for Different Person comparisons.

In terms of accuracy breakouts across the four race and gender cohorts, all of the top 20 algorithms are found to be the most accurate on Male Black subjects. 

The following score chart breaks down the accuracy rank for each of the four cohorts across the top 20 algorithms: 


As shown in the above tally, Male Black was the most accurate demographic cohort for the top 20 most accurate algorithms analyzed by NIST. While this is counter to conventional wisdom and the media’s narrative, the results are not particularly surprising. Here is why: 

Face recognition algorithms are highly accurate on all races.

For the above 20 algorithms, the median difference between the most accurate and least accurate cohort for a given algorithm was only 0.3%.

Academic institutions have been publicizing non-academic research.

The widespread belief that race significantly biases face recognition accuracy stems from a non-peer-reviewed investigative journalism article from Georgetown Law titled “The Perpetual Line-Up: Unregulated Police Face Recognition in America”. The sole source for the claim of racial bias was a peer-reviewed article written by my colleagues and me in 2012. That article was incomplete on the subject and not sufficient to serve as the sole source cited, as it indirectly has been since the publication of the Georgetown report.

Another common source cited for the inaccuracy on persons of color is from a study performed by the MIT Media Lab, which did not measure face recognition accuracy. In the “Gender Shades project”, the accuracy of detecting a person’s gender (as opposed to recognizing their identity) was measured. Two of the three algorithms studied were developed in China, and had poor accuracy at predicting the gender of the Female Black cohort. Still, this study has been widely cited as an example of face recognition being inaccurate on persons of color. Again, this study did not measure face recognition accuracy.

Fast forward, and there has been a recent campaign to ban face recognition applications outright, regardless of their purpose and societal value. These initiatives are often premised on the claim that the algorithms are inaccurate on persons of color, which, as we have shown, is not true.

A path forward

Given the disproportionate impact of the criminal justice system on Black persons in the United States, the concern regarding whether a person’s race could impact a technology’s ability to function properly is valid and important. However, the current dialogue and public perception has left out a lot of key factual information, and has further confused the public regarding how face recognition is used by law enforcement. 

In addition to the public being misled on the accuracy of FR algorithms with respect to a person’s race, the public has even been misled as to how law enforcement uses face recognition technology, as clarified in a recent article. In turn, cities like San Francisco have used such misinformation to compromise the safety of their constituents.

It is not in our nation’s interest to decide public policy based on politically motivated articles with weak scientific underpinnings. The benchmarks provided by NIST FRVT are currently the only reliable public source on face recognition accuracy as a function of race, and according to these benchmarks, all top-tier face recognition algorithms operating under certain conditions are highly accurate on both Black and White persons (as well as Male and Female).

Like this article? Subscribe to our blog or follow us on LinkedIn or Twitter to stay up to date on future articles.


Hardware requirements for video processing applications – Part 1: Template generation
https://roc.ai/2019/08/12/hardware-requirements-for-video-processing-applications-part-1-template-generation/ — Mon, 12 Aug 2019

When automated face recognition technology is used for analyzing streaming video, an important question is: how much computer hardware is needed?

The hardware required to process video depends on several factors which will be discussed in this article. After reading you should be able to determine hardware requirements for a particular application and algorithm. 

If you are unfamiliar with the efficiency metrics of relevance for face recognition algorithms, we recommend you first read our previous article on why efficiency metrics matter.

Enrolling frames: the computationally burdensome step in video processing

There are two face recognition steps in video processing applications that require processing power: (i) detecting faces and enrolling them into templates for each processed video frame, and (ii) searching the created templates against a gallery. If these concepts are unfamiliar, we encourage you to read our article on how face recognition works.

This article focuses on the efficiency requirements for enrolling faces in video frames into templates, which is the computational bottleneck for video processing applications. There are also computational demands for performing watch-list searches of templates processed in video frames against gallery databases. However, because watch-list searching is not nearly as computationally burdensome, we cover the computational cost of template comparison in video applications in a supplemental article.

Enrolling video frames is a CPU-intensive task. In order to assess the number of CPU cores required, you first need to determine the following information:

  • Number of Streams: the number of video streams that will be processed concurrently.
  • Max Faces: the maximum number of faces that will appear across all video feeds at a single given time.
  • Templates per Second: the number of templates per second the face recognition algorithm can generate on a single CPU core. Note that NIST FRVT Ongoing reports this statistic as Enrollment Speed, where Enrollment Speed = 1  / Templates per Second, which is the time it takes to generate a single template.
  • Frames per Second: the number of frames per second (fps) the algorithm will process (as recommended by your vendor and use case).

Using this information, the number of CPU cores required for enrollment is roughly determined as follows: 

Enrolling CPU Usage = Max Faces * Frames per Second / Templates per Second

If, however, the Enrolling CPU Usage is less than Number of Streams, then:

Enrolling CPU Usage = Number of Streams. 

For example, let’s suppose that your application will be processing 4 camera feeds (Number of Streams = 4) with a maximum of 10 faces at a time (Max Faces = 10), i.e., across all 4 camera feeds, at any given time, the largest number of faces that will appear at once is 10. Suppose further that your face recognition algorithm can generate 4 templates per second per CPU core (Templates per Second = 4) and the vendor recommends processing 5 fps (Frames per Second = 5). This works out to 10 * 5 / 4 = 12.5, which implies an Enrolling CPU Usage of 13 (i.e., 13 CPU cores), as you should always round up. 

As another example, let’s suppose that your application will be processing 6 camera feeds (Number of Streams = 6) with a maximum of 4 faces at a time (Max Faces = 4), and your face recognition algorithm can generate 4 templates per second per CPU core (Templates per Second = 4) and recommends processing 5 fps (Frames per Second = 5). This would mean 4 * 5 / 4 = 5. Because 5 is less than the Number of Streams, 6, the Enrolling CPU Usage is 6 (i.e., 6 CPU cores).
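The sizing rule and the two worked examples above can be reproduced with a short Python sketch (the function name and structure are our own illustration, not part of any vendor SDK):

```python
import math

def enrolling_cpu_cores(num_streams, max_faces, templates_per_second, frames_per_second):
    """Estimate the CPU cores needed to enroll faces from concurrent video streams."""
    cores = max_faces * frames_per_second / templates_per_second
    # Always round up, and never allocate fewer cores than there are streams.
    return max(math.ceil(cores), num_streams)

# Example 1: 4 streams, up to 10 faces, 4 templates/sec/core, 5 fps processed
print(enrolling_cpu_cores(4, 10, 4, 5))  # 13
# Example 2: 6 streams, up to 4 faces, 4 templates/sec/core, 5 fps processed
print(enrolling_cpu_cores(6, 4, 4, 5))   # 6
```

Note that this simple model assumes enrollment work parallelizes cleanly across cores; validate any final hardware plan against your vendor's guidance.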

Using these guidelines to determine hardware costs for a vendor solution

It is important to use the guidance in this article in conjunction with the information provided by your algorithm vendor, as well as their measured performance in the NIST FRVT Ongoing benchmarks. As always, a vendor who does not submit their algorithm to NIST FRVT should never be considered. 

The information you will need from your vendor is the enrollment speed and the recommended number of frames per second. Information you will determine yourself is the maximum number of faces appearing across all video streams at any given time, and the total number of video streams. Combined with the formula provided in this article, this will allow you to properly estimate the hardware requirements for your application.

Like this article? Subscribe to our blog or follow us on LinkedIn or Twitter to stay up to date on future articles.

Hardware requirements for video processing applications – Part 2: Template comparison https://roc.ai/2019/08/11/hardware-requirements-for-video-processing-applications-part-2-template-comparison/ Sun, 11 Aug 2019 16:52:22 +0000 https://roc.ai/?p=426 In this article we explain how to factor in the computational demand for template comparison in video processing applications. While this task is not as computationally burdensome as template generation, for larger-scale applications it can become meaningful.

The post Hardware requirements for video processing applications – Part 2: Template comparison appeared first on ROC.

While enrolling video frames into templates is the bottleneck for video processing applications in face recognition, there is also a computational cost for using the generated templates for search and identity verification. While the cost is often negligible, for large-scale applications it can become meaningful enough to need to be factored into procurement considerations. 

In this article we will discuss the computational considerations for template comparison tasks in video processing applications. We encourage first reading our article on the computational cost of generating templates in video processing applications, as this is the computational bottleneck. For readers unfamiliar with face recognition efficiency metrics, we recommend reading the article on why efficiency metrics matter as well.

Hardware required for searching and comparing templates

Templates generated during the enrollment process will typically be further processed by a template comparison algorithm for the following purposes: tracking, consolidating, searching, and/or verifying. In discussing the computational demands of these tasks, the following information is needed: 

  • Comparisons per Second: the number of template comparisons per second, per CPU core, that can be performed by the algorithm. Note that NIST FRVT Ongoing reports this statistic as Comparison Speed, where Comparison Speed = 1  / Comparisons per Second, which is the time it takes to perform a single template comparison.
  • Templates per Track: the number of templates retained in a “face track”, where a face track is a set, or subset, of consecutive templates of a person tracked in a video feed.
  • Max Faces: the maximum number of faces that will appear across all video feeds at a single given time.

Tracking

Typically a first step in using templates generated from video feeds is to track the templates into the different identities present in the video feed. 

Tracking faces in video generally requires all faces detected in a given video frame to be compared against all existing face tracks. In turn, the face templates can be assigned to the face track corresponding to the same identity. The computational cost of this operation is roughly determined as:

Tracking CPU Usage = (Max Faces)^2 * Templates per Track / Comparisons per Second

Note that Max Faces is representative of both the maximum number of faces present across all video feeds, and the number of face tracks being processed. 

Typically the Tracking CPU Usage is extremely low. For example, if there are at most 10 faces in a set of video feeds at a given time (i.e., Max Faces = 10), and each face track retains 5 templates (i.e., Templates per Track = 5), and Comparisons per Second = 1e7 (1e7 = 10 million), then Tracking CPU Usage = 10^2 * 5 / 1e7 = 0.00005, which is a fairly trivial fraction of CPU usage.

Though this approach does not require much processing power (if an algorithm has a fast comparison speed), some algorithms may use lower cost heuristics, such as adding templates to the track that is closest in spatial location (e.g., if faces in consecutive frames were detected in the same location, put them in the same track). However, approaches like this can suffer from poor tracking in the presence of densely located faces (e.g., two people close to each other) or other factors. Unless an algorithm has an unreasonably slow comparison speed, it should be assumed that the template comparisons described above are performed for tracking.

The output of the tracking step is typically sets of templates corresponding to the different identities present in the video feed. There will often be a subsequent task of either searching these templates against a gallery (1:N+1) or comparing them to a claimed identity (1:1), which are discussed below. However, it could also be the case that tracking and consolidating are performed only to store the identities, and no further comparisons will be required beyond tracking.

Consolidating

In order to maintain the Templates per Track, each time a new template is matched to a track during the Tracking step, a decision needs to be made whether or not to retain this template in the corresponding face track, and, if it is retained, which existing template to drop (in order to not exceed the Templates per Track limit). 

There are many different techniques that can be employed to determine which template to retain. In some cases this will be templates with the highest quality scores, in other cases this will be templates with different characteristics (e.g., different face pose angles). 

The most computationally exhaustive approach involves cross-matching all existing templates in a track, along with the newly detected template from the track. In turn, a template can be dropped based on the similarity information (e.g., drop a template that provides the least additional information). The computational cost of consolidating all tracks at once is:

Consolidating CPU Usage = (Max Faces * (Templates per Track + 1) * Templates per Track / 2) / Comparisons per Second

As with tracking, the CPU usage for consolidating is typically very low. For example, if there are at most 10 faces in a set of video feeds at a given time (i.e., Max Faces = 10), and each face track retains 5 templates (i.e., Templates per Track = 5), and Comparisons per Second = 1e7, then Consolidating CPU Usage = (10 * 6 * 5 / 2) / 1e7 = 0.000015. 
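For readers who want to run these numbers themselves, here is a minimal Python sketch of the tracking and consolidating estimates above (function names are illustrative, not from any SDK):

```python
def tracking_cpu_usage(max_faces, templates_per_track, comparisons_per_second):
    # Every face detected in a frame is compared against every template in
    # every existing face track (Max Faces also approximates the track count).
    return max_faces ** 2 * templates_per_track / comparisons_per_second

def consolidating_cpu_usage(max_faces, templates_per_track, comparisons_per_second):
    # Cross-match the (Templates per Track + 1) templates within each track:
    # (k + 1) * k / 2 pairwise comparisons per track.
    pairs_per_track = (templates_per_track + 1) * templates_per_track / 2
    return max_faces * pairs_per_track / comparisons_per_second

print(tracking_cpu_usage(10, 5, 1e7))       # 5e-05
print(consolidating_cpu_usage(10, 5, 1e7))  # 1.5e-05
```

Both results are tiny fractions of a single CPU core, matching the examples in the text.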

Searching

Oftentimes tracked faces are searched against a gallery in order to determine the person’s identity, which is also known as watch-list identification. This may be done for different reasons, including determining whether the person is on a security blacklist or a VIP whitelist. 

Typically this process is done once per face track. There are different ways that the templates in a face track can be searched against a gallery. For example, all the templates could be searched against the gallery, which is the most computationally burdensome approach (as well as the most comprehensive from an accuracy perspective). Or, a single template (such as the one with the highest quality score) can be used for a single search. We will assume that all templates in a face track are searched. 

The computational cost for searching a face track against a gallery of N templates is:

Searching CPU Usage = Max Faces * Templates per Track * N / Comparisons per Second

For example, if there are 10 face tracks (i.e., Max Faces = 10), 5 Templates per Track, a watch-list gallery with 1e4 templates (1e4 = 10,000), and a Comparisons per Second of 1e7, then Searching CPU Usage = 10  * 5 * 1e4 / 1e7 = 0.05. 

If an algorithm has a fast comparison speed, then there will not be a significant amount of CPU usage for searching. However, while the Rank One algorithm has a comparison speed of roughly 1e7 (i.e., 10M comparisons per CPU core, per second), the average NIST FRVT algorithm is roughly 10x slower, which would turn 0.05 CPU usage into 0.5 CPU usage, meaning, in this example, an additional CPU core would be required. Other NIST algorithms have 100x to 1000x slower comparison speeds, which creates significant CPU requirements to perform video-based watch-listing.

In addition to the wide fluctuation in comparison speeds, the number of templates in the gallery N can significantly influence the CPU usage. In the example above, N was set to 1e4. This number is somewhat meaningful in that larger gallery sizes typically cannot be searched with stable accuracy in watch-listing applications. However, depending on the level of security involved in an application, and in turn the number of human analysts available to adjudicate watch-list match alerts, the size of N can be upwards of 1e6 (i.e., 1 million) or even beyond 1e8 (i.e., 100 million). In these cases several additional CPU cores may be required. 
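The sensitivity of searching cost to gallery size and comparison speed is easy to see in a small Python sketch (our own illustrative code):

```python
def searching_cpu_usage(max_faces, templates_per_track, gallery_size, comparisons_per_second):
    # Every template in every face track is searched against all N gallery templates.
    return max_faces * templates_per_track * gallery_size / comparisons_per_second

# 10 face tracks, 5 templates each, 10,000-template watch list:
print(searching_cpu_usage(10, 5, 1e4, 1e7))  # 0.05 -- fast comparison speed
print(searching_cpu_usage(10, 5, 1e4, 1e6))  # 0.5  -- a 10x slower algorithm
```

Scaling the gallery from 1e4 to 1e6 templates multiplies the result by 100, which is where additional CPU cores start to be required.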

Verifying

It may be the case that face tracks are used for verifying a person’s identity. In this scenario the person claims to be a given identity, and face verification is performed by comparing the person’s presented face to the stored face template(s) corresponding to this person’s identity. The person may claim their identity by entering a pin, scanning an access card, providing an NFC token from their mobile phone, or other methods. 

Typically in these cases there is only one face in each video stream, as most access control identity verification systems are designed to process one user at a time. The Max Faces value could still be higher than 1, though, as a central server could be processing multiple video feeds / access control points. 

The computational cost for verifying from video streams is determined as follows, where Templates per Person is the number of templates stored for each identity in the system:

Verifying CPU Usage =  Max Faces * Templates per Track * Templates per Person / Comparisons per Second

For example, if there are 10 face tracks (i.e., Max Faces = 10), 5 Templates per Track, 5 Templates per Person stored, and a Comparisons per Second of 1e7, then Verifying CPU Usage = 10  * 5 * 5 / 1e7 = 0.000025. 

Aside from an algorithm with an extremely slow comparison speed, or a system that is processing a large number of face tracks (i.e., a high Max Faces), compute cost for verification is typically trivial. 
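For completeness, the verifying estimate can be sketched the same way (illustrative code only):

```python
def verifying_cpu_usage(max_faces, templates_per_track, templates_per_person, comparisons_per_second):
    # Compare every template in a face track against every template stored
    # for the claimed identity.
    return max_faces * templates_per_track * templates_per_person / comparisons_per_second

print(verifying_cpu_usage(10, 5, 5, 1e7))  # 2.5e-05
```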

Summarizing hardware costs for template comparison

Typically there is not a large CPU cost for the different tasks that may be performed after enrolling video frames (i.e., tracking, consolidating, searching, and/or verifying). However, certain factors may result in meaningful CPU resources being required. These include a slow template comparison speed, a large gallery for watch-listing/searching, and a large number of Max Faces (persons in the video streams at once). 

For watchlisting / searching applications, there will also be memory requirements that will be directly based on the algorithm’s template size. Please refer to our previous article on the implications of template size for this consideration. 

It is important to use the guidance in this article in conjunction with the information provided by your algorithm vendor, as well as their measured performance in the NIST FRVT Ongoing benchmarks. As always, a vendor who does not submit their algorithm to NIST FRVT should never be considered. 

How Forensic Face Recognition Works https://roc.ai/2019/06/12/how-forensic-face-recognition-works/ Wed, 12 Jun 2019 21:57:52 +0000 https://roc.ai/?p=401 Law enforcement primarily uses face recognition as a post-incident forensic tool to enable detectives and analysts to generate investigative leads in violent and harmful crimes. In this article we explain how forensic face recognition works, and how it is used by law enforcement in this country.  

The post How Forensic Face Recognition Works appeared first on ROC.

There is a misconception that law enforcement agencies in the U.S. use automated face recognition to actively surveil public spaces. Such a dragnet of mass real-time identification and surveillance would be a violation of the Fourth Amendment to the United States Constitution. While autocratic countries may intend to use face recognition technology for nefarious purposes, in the United States, and other nations with inalienable human rights, there is no systematic intent or process designed to exploit facial recognition technology in this manner.

Concerns that face recognition could be used to invade privacy are valid; however, this is not how it is being used by U.S. law enforcement.  To the contrary, law enforcement primarily uses face recognition as a post-incident forensic tool to enable detectives and analysts to generate investigative leads in violent and harmful crimes.

In this article we explain both how forensic face recognition works, and how it is used by law enforcement in this country.  

[Diagram: How forensic face recognition works]

Step 1: A violent or harmful crime occurs

While modern societies have become safer with each passing decade, there were still over a million incidents of violent crime in the U.S. in 2017. These incidents range from murder (17,284 incidents), to rape (135,755 incidents), to aggravated assault (810,825 incidents), amongst many other crimes. Similarly, cases of burglary, larceny, arson, and fraud take a tremendous toll on victims.

Step 2: An image of a perpetrator (or victim) is available.

This image of the perpetrator could come from a number of different sources. For example:

  • The victim of a sexual assault could have the perpetrator’s image from an online dating site they met on.
  • A store owner who was the victim of an armed robbery could have a camera system installed that captured the robber’s face.
  • A high density tourist area may have recorded footage of a terrorist leaving a bomb.
  • A video of an unidentified adult engaging in inappropriate acts with a child may emerge while a warrant is being served for a related crime.
  • A homeowner’s doorbell may capture a picture of a burglar.
  • A traffic camera may have captured a person’s face before a violent act of road-rage.

In certain cases it is instead a victim who needs to be identified. This could be a deceased person without identification, or a victim filmed in a child exploitation case.

Step 3: An investigator or analyst searches the image against a database

The photograph or video frame image of the unidentified person of interest, often referred to as a probe image, is sent to a detective, analyst, or operator who manages digital forensic evidence. This human operator in turn uses automated face recognition to search the probe image against an available database of face images (often referred to as the gallery).

The galleries that are available for this search-and-compare process will vary, depending on the agency and jurisdiction. For most law enforcement agencies this will include mugshot arrest images. For certain law enforcement agencies, depending on state and local laws, the database may also include images from other Government agencies that grant identification cards (e.g., DMVs), criminal watch lists, or data otherwise meaningful to share.

Step 4: A candidate list is returned that contains the closest matching faces

Once this automated search is complete, the operator of the system will receive back a rank-ordered list of the top matches, where the first result is the image in the gallery that has the highest similarity score to the probe image. The second result will be the image in the gallery with the second highest similarity score, and so on.

The number of candidate matches returned will vary depending on the configuration of the system. For example, in some configurations only images that exceed a certain similarity score are presented. In other systems, the top N results are returned, regardless of similarity score, where N may be 20, 50, or 100.

Step 5: Candidate list is examined by the analyst

The operator of the system, who has often been trained in facial comparison, will examine the returned candidate list to determine if any of the candidate images match the person of interest in the probe image.

When performing comparisons, the analyst will examine the various morphological features of the face and document the entire comparison process.  Most forensic search systems have automated tools that significantly improve an analyst’s ability to compare the two faces and document the process.

Step 6: If the analyst determines there is a high likelihood of a match, then an investigative lead report is generated

If the probe image from the person of interest has facial characteristics that indicate a strong match to a person in the gallery, then an investigative lead report is generated.

An investigative lead is not probable cause for arrest.  The detective investigating the crime will use the investigative lead generated from face recognition technology as a potential clue; a clue that could potentially lead to solving the case.

Public safety benefits, without harm

An investigative lead generated by the forensic face recognition process could be the difference between whether or not a person who inflicts harm upon others is identified. This investigative method has tremendous benefits in terms of public safety. When performed under proper standards, this forensic procedure does not have the propensity for harm mistakenly claimed by those who think this use is akin to active real-time surveillance.

Given the percolating misunderstanding of how law enforcement agencies use automated face recognition technology, we will summarize certain points covered in this article as they relate to myths around use of the technology in the U.S.:

  • There is not a mass network of cameras that are identifying persons in real-time across public spaces.
  • Automated face recognition in law enforcement is predominantly a post-incident method used when a harmful crime occurs, and is a key forensic crime-solving tool
  • The results from an automated face recognition search are carefully examined by an analyst or operator. If an analyst finds a strong likelihood of a match, this information is considered  an investigative lead, and is not probable cause for arrest.
  • There are no documented cases in the U.S. of invasion of privacy or wrongful arrest due to forensic face recognition despite over a decade of successful use.

Governing the use of face recognition technology is a good idea, but it must come from an informed point of view. When legislators make decisions based on campaigns of misinformation, the safety of their constituents suffers. Face recognition technology, used as a forensic process,  provides incredible benefits that greatly enhance the safety of our citizens without compromising guaranteed civil liberties or privacy rights.

10 Steps for Selecting a Face Recognition Algorithm https://roc.ai/2019/05/08/10-steps-for-selecting-a-face-recognition-algorithm/ Wed, 08 May 2019 21:47:33 +0000 https://roc.ai/?p=372 Follow these 10 steps to success when selecting a face recognition SDK or system.

The post 10 Steps for Selecting a Face Recognition Algorithm appeared first on ROC.

Step #1: Understand your application

Common face recognition applications include forensic search, real-time screening, identity deduplication, and access control. Each application will involve different types of facial imagery (constrained or unconstrained) and will have different technical requirements.

Step #2: Determine if you need an SDK or a system

An SDK allows engineers to build a system.

If you need a system, such as a turn-key surveillance system or a custom designed visitor management system, then you will be engaging with integrators and product developers who either use a third-party face recognition SDK, or develop their own face recognition algorithm. It is important to ask them which is the case, and, if they use a third-party SDK, who is the SDK vendor.

If instead you are a systems integrator or product developer that needs to license an SDK in order to develop a face recognition system and product, then you will be engaging with an SDK vendor.

Step #3: Determine your accuracy and efficiency requirements

The accuracy and hardware requirements of a face recognition system are critical factors for success.

From an accuracy perspective, while the capabilities of modern algorithms are exceptionally high, all face recognition systems will exhibit some degree of errors. Know ahead of time what error rate you can accept in your application.

In terms of efficiency, top-tier face recognition algorithms can have a substantial difference in efficiency with only a minimal difference in accuracy. And, depending on the efficiency of an algorithm, hardware costs can soar, or wholly prevent a concept from being developed. The following links discuss how different efficiency metrics (enrollment speed, template size, comparison speed, and binary size) pertain to forensic search, access control, and real-time screening.

Step #4: Determine your business requirements

Factors that may influence your decision making process include software licensing costs, hardware requirements, budget, technical support, license enforcement mechanisms, cloud versus on-premises, and vendor geographic location (e.g., most Western organizations do not want to use Russian or Chinese developed software in security infrastructure).

Step #5: Analyze the NIST FRVT Ongoing report

The National Institute of Standards and Technology (NIST) conducts FRVT Ongoing, an up-to-date evaluation of the accuracy and efficiency for all legitimate face recognition vendors. A face recognition algorithm that has not been submitted to FRVT Ongoing should never be considered. It takes orders of magnitude less time for a vendor to submit their algorithm for third-party testing to NIST than it will for you to measure these same metrics.

Table 1 and Table 2 in the FRVT Ongoing report will provide a snapshot of each algorithm’s accuracy and efficiency. These metrics can be referenced against your requirements (Step #3) based on the application and types of imagery you will encounter (Step #1).

While NIST FRVT has many other great reports, FRVT Ongoing is updated multiple times a year and is the only report guaranteed to be current. And, given how fast face recognition algorithms are improving, it is critical to analyze current algorithms, not algorithms from years past.

It is important to note that while the NIST report is the only reliable source for third-party benchmarking results, the accuracy and efficiency reported should only serve as a rough guideline and early filtering resource. A subsequent evaluation on internal facial imagery and hardware that closely replicates your operational setting will be needed to properly assess the fitness of an algorithm or system for your use-case.

Step #6: Perform an initial selection of one or more vendors to evaluate

With consideration to the tips above, choose a vendor, or vendors, that meet your technical and business requirements.

Step #7: Ask for an evaluation

Ask selected vendors, integrators, or suppliers for an evaluation of their technology, whether it is an SDK or a system.

Step #8: Evaluate the vendor’s technology

You have defined the types of imagery and workflows they will be operating on (Step #1), the accuracy and hardware/efficiency requirements (Step #3), business requirements (Step #4), cross-referenced these against the NIST results (Step #5), and received access to technology that seems to meet all of these requirements (Step #7).  

Now you can use the evaluation period to measure accuracy on an internally collected dataset. Along the way you will also learn about the API or GUI, ease of integration, and customer support for the SDK or system.

Step #9: Negotiate a licensing agreement

If everything checks out in Step #8 then you will next want to receive a contract from the SDK vendor, system integrator, or product supplier. If you are working with an SDK vendor, you should generally be afforded options to license on a monthly, yearly, or perpetual basis. Perpetual licenses should include evergreen licensing!

You need to understand how long you will have access to software licenses, the cost of those licenses, and whether or not you will receive algorithm and system improvements along the way. Without such information in a binding contract all the time and effort put into the procurement, integration, and/or deployment process could be completely undermined by business and legal disputes down the road.

Step #10: Build or deploy your game-changing system!

You have been diligent in following the first nine steps and can now move on with strong peace of mind.

Procuring a Face Recognition Algorithm: Efficiency Considerations https://roc.ai/2019/04/25/procuring-a-face-recognition-algorithm-efficiency-considerations/ Thu, 25 Apr 2019 17:07:27 +0000 https://roc.ai/?p=349 This article will equip you with the knowledge to assess the efficiency requirements of your face recognition system. In turn, you will be able to factor this important consideration into your procurement process and potentially eliminate certain algorithms before the time consuming step of performing internal evaluations.

The post Procuring a Face Recognition Algorithm: Efficiency Considerations appeared first on ROC.

Suppose you were offered a futuristic virtual reality system on par with The Matrix for just $100. You would purchase it, right?! Now imagine you needed 50,000 sq. ft. of space in your home to run the system. That would change the proposition quite a bit.

In many ways that analogy holds for modern face recognition (FR) algorithms. While accuracies have improved dramatically, the resources required to deploy a given solution vary widely by vendor.

Amongst the 70 most accurate NIST FRVT Ongoing algorithms in the Nov. 2018 report, there were the following fluctuations in efficiency metrics:

  • 15x difference in how fast images could be processed for facial detection and vectorizing (i.e., enrollment)
  • 40x difference in the length of the facial vector / template (i.e., template size)
  • 220x difference in the time it takes to compare two templates (i.e., comparison speed)

These differences in efficiency can cause hardware costs to soar or wholly prevent an application concept from being realized, which should concern anyone procuring a face recognition algorithm.

This article will equip you with the knowledge to assess the efficiency requirements of your face recognition system. In turn, you will be able to factor this important consideration into your procurement process and potentially eliminate certain algorithms before the time consuming step of performing internal evaluations.

Enrollment speed

Enrollment is the process of detecting faces in an image (or video frame) and creating templates that encode the identifying characteristics of each face. It is one of the two steps performed in automated face recognition, and can play a critical role in system design.

The faster the enrollment speed, the less computing power required. Unfortunately, as shown in the following histogram, enrollment speeds vary considerably across face recognition algorithms:

[Histogram: Enrollment speeds of the top 70 most accurate FRVT Ongoing algorithms]

The enrollment speeds provided in the histogram are from Table 1 in the November 2018 NIST FRVT Ongoing report, which measures enrollment speeds on images with a single face.

It is important to note that enrollment speed is generally a function of the number of faces in an image that need to be templatized, and, to a lesser extent, the size of the image (or video frame). Thus, an image with five faces present will take significantly longer to enroll than an image with one face present (usually about five times as long).

The reason the number of faces in an image is the primary factor for enrollment speeds is that for modern algorithms the representation step of the enrollment process is generally slower than the face detection step. The duration of the representation step will increase directly as a factor of the number of faces in the image. The duration of the detection step changes based on the size of the image and the minimum face size considered.

Enrollment speeds have important implications for the following applications:

    1. Video processing

      While most cameras capture at a rate of 30 frames-per-second (FPS), typically 5 FPS will suffice for face recognition algorithms. Depending on the enrollment speed of an algorithm, enrolling 5 FPS can make a major difference in the number of CPUs required for real-time processing. Particularly so when each frame may have multiple faces.

      For example, if an algorithm takes 800ms to enroll a frame with one face, then processing 5 FPS in real time requires 4 CPU cores (0.8s × 5 frames = 4 core-seconds of work per second). By contrast, an algorithm with an enrollment speed of 150ms can perform real-time processing on just 1 CPU core. 
    2. Enrolling a search database

      For search applications, templates must first be generated for each image in the database. The time it takes to enroll a database will be a function of the number of images in the database, the enrollment speed, and the number of CPU cores available.

      For example, let’s say a database of 10M images, each with a single face, needs to be processed for search, and the system is powered by an 8 CPU core server. An 800ms algorithm is going to require 0.8s × 10M images ÷ 8 cores = 1,000,000 seconds, or about 11.6 days. By contrast, a 150ms algorithm requires only about 2 days (assuming there is no bottleneck reading the images from storage). 
    3. Re-enrolling a search database

      When a vendor ships an improved algorithm it is typically the case that all of the database templates must be re-generated from the original images. And, given the massive rate of improvements in face recognition algorithms, database re-enrollment is more important than ever.

      To be clear, it is not the case that a system has to be updated every time a new algorithm is made available by a vendor. Further, some face recognition vendors unfortunately do not make algorithm updates readily available, or they are simply not innovating enough for there to be meaningful algorithm updates. However, assuming that you have (hopefully) selected a vendor that is delivering accuracy improvements on pace with the rest of the industry and you have licensing rights to these updates, at some point you will want to re-enroll your database to create templates from a newer version.

      The same time calculation for enrolling a database applies to re-enrolling it. A faster enrollment speed may be the difference in re-enrolling over the weekend on the same hardware that hosts the system, or having to purchase a separate server to perform re-enrollment over the course of a few weeks. 
    4. Enrolling a probe image

      For both 1:N search and 1:1 identity verification applications, the probe image needs to be enrolled prior to searching it against the gallery (1:N) or comparing it against the claimed identity (1:1). Oftentimes this enrollment is trivial because only a single image is involved; however, there are several cases where slow enrollment speeds can become an issue:

        1. Mobile devices: ARM processors, common in mobile devices, typically see enrollment slow by a factor of 2x to 3x.
        2. Battery powered devices: the less time enrollment takes, the less power consumed. If face recognition is persistent (e.g., mobile device unlock), slower enrollment speeds can result in unreasonable usage of the limited battery capacity.
        3. Time-sensitive applications: enrolling the presentation image is only the first step in a subsequent application (e.g., search or compare), so if an application requires a quick turn-around, slow enrollment speeds will negatively affect the user experience.
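The video-processing and database-enrollment arithmetic above can be wrapped into two small helper functions. This is a planning sketch, not vendor code; it assumes enrollment work parallelizes evenly across cores and that storage I/O is not the bottleneck.

```python
import math

def cores_for_video(enroll_ms: float, fps: float, faces_per_frame: int = 1) -> int:
    """CPU cores needed to keep up with a live camera feed in real time."""
    busy_seconds_per_second = enroll_ms * fps * faces_per_frame / 1000.0
    return math.ceil(busy_seconds_per_second)

def days_to_enroll(enroll_ms: float, n_images: int, n_cores: int) -> float:
    """Wall-clock days to (re-)enroll a database of single-face images,
    assuming perfect scaling across cores and no I/O bottleneck."""
    return enroll_ms * n_images / n_cores / 1000.0 / 86400.0

# The article's two scenarios: an 800ms algorithm versus a 150ms one.
print(cores_for_video(800, 5))                        # 4 cores
print(cores_for_video(150, 5))                        # 1 core
print(round(days_to_enroll(800, 10_000_000, 8), 1))   # 11.6 days
print(round(days_to_enroll(150, 10_000_000, 8), 1))   # 2.2 days
```

The same `days_to_enroll` figure applies to re-enrollment after an algorithm upgrade, which is why a faster algorithm can mean re-enrolling over a weekend on existing hardware instead of buying a separate server.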

Template Size

Templates store the facial identity measurements needed to compare two faces. A critical system design consideration as it pertains to template size is the speed difference when reading data (e.g., templates) from different storage mediums. Specifically, the read bandwidth of RAM versus persistent storage (e.g., a hard drive or network/cloud storage) can differ by a factor of 20x to 1,000x. Even at a 20x difference, searching templates stored on disk would be 20x slower than searching templates loaded into RAM.

Due to the above considerations, galleries of templates are first loaded into RAM before being searched. Once this is done, searching typically becomes compute bound as opposed to I/O bound.

Template sizes from the top 70 most accurate FRVT Ongoing algorithms

RAM is an expensive resource, particularly when a lot of it is required. The larger the template size, the more RAM is required, meaning the distribution of template sizes shown in the histogram above directly indicates the amount of RAM required to host search applications.

As an example, let us again consider a database of 10M faces. In this instance, we will consider two different algorithms, one with a template size of 200B (bytes) and one with 2,500B (2.5KB). These template sizes are within the distribution of template sizes by top-tier vendors in the NIST FRVT Ongoing tests.

  • In the case of the 200B algorithm: 200 bytes * 10M templates = 2GB of RAM required to load the templates.
  • In the case of the 2.5KB algorithm: 2,500 bytes * 10M templates = 25GB of RAM required to load the templates.

These differences in template size can equate to significantly higher hardware costs for a face recognition system.
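The RAM figures above follow from a one-line calculation, sketched here. Note it counts only the templates themselves; indexes, metadata, the operating system, and the SDK's own footprint (discussed under "Binary size" below) all add to the real requirement.

```python
def template_ram_gb(template_bytes: int, n_templates: int) -> float:
    """RAM (in GB) needed to hold an in-memory gallery of templates.
    Counts raw template storage only; excludes indexes, metadata,
    and the SDK's own memory footprint."""
    return template_bytes * n_templates / 1e9

# The article's 10M-face gallery under the two example template sizes.
print(template_ram_gb(200, 10_000_000))    # 2.0 GB
print(template_ram_gb(2500, 10_000_000))   # 25.0 GB
```

For a fixed hardware budget the relationship also inverts usefully: a server with 32GB of RAM can hold roughly 12x more 200B templates than 2.5KB templates.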

Comparison Speed

Like template size, comparison speed matters most in search systems.

For search applications, a probe template is typically compared against every template in the gallery in order to find any matches. The cost of a search therefore scales linearly with gallery size: one comparison per gallery template.

Comparison speeds from the top 70 most accurate FRVT Ongoing algorithms

In NIST FRVT Ongoing, comparison speeds among top-tier vendors range from 300 nanoseconds (ns) to 75,000ns, as illustrated in the above histogram. This is a 250x difference between the two extremes of top-accuracy vendors, which typically translates into a 250x difference in search speeds, or significantly more CPU cores needed to enable a timely search.

Consider searching a 10M template database with a single CPU core. With a 300ns comparison time it would take: 300e-9 * 10e6 = 3 seconds. With a 75,000ns comparison time it would take 750 seconds, or 12.5 minutes, a staggering difference between otherwise similar algorithms.
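That calculation generalizes to any gallery size and core count. The sketch below assumes a brute-force exhaustive search (no approximate-nearest-neighbor indexing) with comparisons spread evenly across cores.

```python
def search_seconds(comparison_ns: float, gallery_size: int, n_cores: int = 1) -> float:
    """Time for one exhaustive 1:N search: one probe-vs-gallery comparison
    per gallery template, parallelized evenly across cores."""
    return comparison_ns * gallery_size / n_cores / 1e9

# The article's 10M-template gallery on a single CPU core.
print(search_seconds(300, 10_000_000))     # 3.0 seconds
print(search_seconds(75_000, 10_000_000))  # 750.0 seconds (12.5 minutes)
```

To bring the slow algorithm's search down to the fast algorithm's 3 seconds would take 250 cores, which makes the hardware-cost implication of comparison speed concrete.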

It is worth noting that there is a strong correlation between template size and comparison speed. For example, in the most recent NIST FRVT Ongoing, the correlation coefficient between template size and comparison speed for the 99 algorithms benchmarked is 0.835.

Binary size

In addition to the templates loaded into RAM, the face recognition software itself occupies a generally fixed amount of memory. This memory comprises (i) code libraries and (ii) statistical models.

While some face recognition software development kits (SDKs) are carefully implemented to minimize library size, others are not designed with this consideration in mind. As a result, one implementation’s code may occupy 10MB of RAM while another’s occupies 1GB. SDKs with larger libraries may also have more dependencies that make them harder to package and install.

Unfortunately NIST FRVT does not currently measure algorithm binary size.

The models used by a face recognition algorithm are the parameters learned through the offline statistical learning process employed by face recognition vendors. As every viable face recognition algorithm uses machine learning (often referred to as “AI”), every vendor will have models that need to be loaded into RAM in order to enroll images to templates.

NIST FRVT does measure model size, though there are a few issues with how it is currently reported (e.g., Rank One’s model size is reported as 0 bytes as opposed to roughly 40MB). Regardless, model sizes range from 50MB to 4GB in the November 2018 FRVT Ongoing report, which is a tremendous difference.

The binary size of an SDK becomes particularly important for embedded devices. If an algorithm requires 4GB of RAM, it is generally not possible to use it on low-cost embedded devices such as mobile phones, access control devices, doorbells, automobiles, and the many other consumer electronic devices that are adding face recognition capabilities. For many of these applications the face recognition algorithm is but one of several features, and it is often limited to less than 100MB (or even 10MB) of RAM.

Applications

With these four efficiency metrics established, let us tie them back to specific applications.

Forensic Search:

  • Template size will impact how many images are searchable on a machine with a fixed amount of RAM.
  • Comparison speed will impact how long it takes to receive search results.
  • Enrollment speed will impact how long it takes to initially index or later upgrade a database.

Access Control:

  • Enrollment speed will impact how long it takes a system to process a request.
  • Enrollment speed will impact how much power is consumed, which is particularly important for battery powered devices.
  • Binary size will impact how much RAM is required, which is often limited in embedded devices.

Real-time screening:

  • Enrollment speed will impact how many CPU cores are required to process a camera feed.
  • Template size will impact how large of a watch-list can be searched.
  • Comparison speed will impact how long it takes to generate match-alerts.

Identity Deduplication:

  • Template size will impact how many images are searchable on a machine with a fixed amount of RAM.
  • Comparison speed will impact how long it takes to receive search results.
  • Enrollment speed will impact how long it takes to initially index or later upgrade a database.

Summary

The efficiency of a face recognition algorithm can significantly affect the hardware cost of an application, or even whether the application can be built at all. There are four efficiency metrics to consider when procuring a face recognition algorithm: enrollment speed, template size, comparison speed, and binary size. And, while there is a wide range of face recognition algorithms with similar accuracy, there can be massive differences in efficiency across these algorithms.

Before you begin the process of integrating and testing an algorithm, it is critical to understand how these different metrics may impact your application and to filter out any solutions that do not meet your hardware budget or application requirements.

The post Procuring a Face Recognition Algorithm: Efficiency Considerations appeared first on ROC.

Evergreen Licensing https://roc.ai/2019/04/09/evergreen-licensing/ Tue, 09 Apr 2019 23:50:32 +0000

The post Evergreen Licensing appeared first on ROC.

Perhaps no technology is improving as rapidly as automated face recognition.

For example, over the last four years Rank One has reduced the False Non-Match Rate of our algorithm by over 50x:

Face recognition error rates on unconstrained imagery vs. time

Other face recognition vendors are similarly improving their accuracy at a rapid pace. However, despite these relentless improvements, many vendors are also denying their paying customers algorithm updates.

This issue involves the intent of maintenance fees for perpetual licenses. Some vendors contend that maintenance fees do not entitle a partner or customer to algorithm updates. However, the technology is improving too rapidly for this to be a contentious issue.

Vendors who license under perpetual terms need to provide a clear path to continued algorithm improvements. Public safety and national security infrastructure often rely on these same systems, and the tremendous enhancements to face recognition capabilities should be provided to active paying customers.

Unfortunately, it is often government and law enforcement agencies that are the most impacted by these unscrupulous licensing practices. Imagine pouring hundreds or thousands of hours into the procurement of a system and seat license for your agency or organization, finally selecting a vendor, and then six months later learning that the technology you licensed is already obsolete. Further, you are expected to continue paying annual support for your license without receiving access to the algorithm updates.

Evergreen licensing is the solution to this problem.

If you are purchasing a perpetual license, be sure to ask if the vendor supports “evergreen licensing.” The concept itself is quite simple: as long as you continue to pay maintenance fees, you receive access to every algorithm update the vendor releases.

Provided you have selected a vendor who is continuously improving, you can rest assured that your organization will remain at the cutting edge of facial recognition capabilities without having to regularly conduct new procurements or purchase new licenses.
