Safer Schools Initiative Pilot Rolls Out in WV
May 8, 2023 | https://roc.ai/2023/05/08/safer-schools-initiative-pilot-rolls-out-in-wv/

We are honored to formally announce the launch of our Safer Schools Initiative pilot program. This critical initiative will improve safety for students, teachers, and staff – first in West Virginia public schools, then across the country, and around the world.

Our live video analytics platform ROC Watch will be used to identify visitors and help administrators manage building access before guests enter the front office. Through a fast, easy mobile enrollment process, staff and visitors set up a profile within a virtual badgeless visitor management system stored locally with each school.

ROC Watch can then be used to:

  • Approve or deny visitors before they enter the premises
  • Detect incidents or intruders, including long guns 
  • Configure smart alerts for any device
  • Count or locate missing children during an incident

Here at ROC.ai, we’ve been working closely with more than 50 schools in four West Virginia districts on the initial rollout of our live video analytics platform into existing school-based camera systems.

Last month, ROC Watch went live at West Fairmont Middle School and Marion County School District. This month, schools in Taylor, Doddridge, and Putnam counties will all follow in the implementation process, which entails installing software to enhance the capabilities of existing security camera systems.

During this pilot phase, we are working closely with our school-based and community partners in West Virginia to optimize our technology and meet our undeniably high standards of accuracy, speed, and precision. With the successful initial rollout for our Safer Schools Initiative pilot program, we are proud to lead the future of proactive technology solutions to address one of our nation’s most troubling challenges.

“We are thrilled to be working with schools in West Virginia to pilot this important innovation. School safety is a top priority for us. We are dedicated to creating cutting-edge solutions that will help build a safer future for children and educators everywhere,” said ROC CEO Scott Swann.

 

ROC Watch in Action

Detection and Prevention
Local law enforcement and school administrators can collaborate on proactive watchlists of prohibited individuals, such as discharged employees or expelled students. ROC Watch can automatically restrict access or send alerts to a designated authority.

Crisis Response
ROC Watch can detect a long gun near the building entrance, automatically restrict access, and alert authorities. It can also send an alert if an unapproved visitor attempts to pick up a child from school, so security or staff can intervene.

Post-Crisis Response
During an evacuation, ROC Watch can quickly count children to ensure no one is missing. ROC Watch can identify a missing student’s current location with or without a face in view, and trace an individual’s movement across multiple camera streams. 

Our Safer Schools Initiative aligns with the West Virginia Governor’s statewide WV School Safety Initiative to improve safety and security measures before, during, and after a potential incident.

We are especially grateful for the early support our pilot initiative has received here in West Virginia. Thank you to Marion County School District Superintendent Donna Heston, Ed.D., as Marion County Schools were the first to sign on as early adopters, followed by Taylor County (Superintendent Christy Miller), Doddridge County (Superintendent Adam Cheeseman), and Putnam County (Superintendent John Hudson). ROC also thanks U.S. Senators Joe Manchin III and Shelley Moore Capito, the West Virginia Department of Homeland Security, the WV State Police, the WV Fusion Center, the WV Department of Corrections, and the West Virginia State Board of Education for their support as we gift West Virginia schools these enhanced security capabilities.


Safer Schools Initiative Inquiries
For media and partnership inquiries, or general questions about the Safer Schools Initiative, contact our Vice President of Congressional Affairs & Community Outreach, Jessica Sell, at jessica.sell@roc.ai or 949-874-2347.

 


About Rank One Computing
Founded in 2015 to build faster, more accurate, and more reliable computer vision and biometric algorithms, Rank One Computing continuously raises the bar on American-made, ethically developed technology solutions. We protect millions around the world every day with our industry-leading multimodal software development kit, which powers third-party applications for fraud prevention, commercial security, and criminal investigations, as well as our own growing suite of full-stack video security and live analytics tools. In 2022, Rank One Computing opened its East Coast center of excellence in Morgantown, WV.


Reach Out to Learn More

The Safer Schools Initiative is currently accepting applications from West Virginia school districts interested in collaborating on a smarter, safer future for our students and educators. To learn more about the Safer Schools Initiative, apply for the West Virginia pilot, or request additional information, visit roc.ai/schools or email hi@roc.ai.

ROC.ai Achieves HubZone Certification from U.S. SBA
May 1, 2023 | https://roc.ai/2023/04/30/roc-ai-achieves-hubzone-certification-from-the-u-s-sba/

Rank One Computing (ROC.ai) is proud to announce that we have secured SBA HUBZone certification. This certification is a testament to our commitment to providing American-made, single-source technology solutions, and we’re thrilled to share this milestone with our partners and customers.

HUBZone is a Small Business Administration (SBA) program that helps small businesses located in historically underutilized business zones (HUBZones) gain access to federal procurement opportunities. The certification recognizes ROC.ai as a reliable and trustworthy business partner.

In order to qualify, an organization must:

  • Be at least 51% owned by US citizens (ROC.ai is 100% US-owned)
  • Have at least 35% of its employees living in HUBZones (a threshold we’ve met by hiring local talent)
  • Be headquartered in a HUBZone (ROC is proudly headquartered in a historically underutilized district of Denver, Colorado)

With our SBA HUBZone certification, we are better positioned to offer our innovative products and services to government agencies and organizations. This means more customers and communities can benefit from our cutting-edge technology solutions, including public schools.

At ROC.ai, we understand how important it is to deliver reliable and accurate solutions to our customers. We control the entire supply chain, from research and development to production and distribution, enabling us to offer high-quality products that meet your needs while ensuring the security and privacy of your data.

Our American-made single-source technology solutions are designed with safety and security in mind. We prioritize ethical practices in developing our algorithms and are proud of our ownership structure that reflects our values.

Looking Back at Boston Marathon Bombing: A Decade of Face Recognition Advancement
April 17, 2023 | https://roc.ai/2023/04/17/looking-back-at-boston-marathon-bombing-a-decade-of-face-recognition-advancement/

Ten years ago on April 15th, 2013, the city of Boston was struck by an unforgettable act of terror. Two bombs exploded near the finish line of the Boston Marathon, killing three people and forever changing the lives of so many more. 

A decade later, Rank One Computing CEO Scott Swann and Co-founder and Chief Scientist Brendan Klare recall their unique perspectives, and reflect on the last ten years of technology and innovation inspired by the tragic incident.

Recalling The Incident

Swann, then an 18-year FBI veteran, supported the FBI Science & Technology Branch within the Director’s Office in Washington, DC, as the case unfolded. Klare, like many, initially followed the disturbing news from home, but was later recruited by Swann to help identify technology solutions to address significant challenges that emerged from the investigation.

Swann recalls the immediate aftermath of the event: “I was working with FBI executives on the 7th floor. It was like a command center – so chaotic – things were happening really fast,” said Swann. “A video came in with the first big tip of who these guys were. My colleagues loaded it up on an iPad to get it in front of the President within minutes.”

A tragic act of terror had been captured publicly by so many cameras from so many angles, but law enforcement agencies unfortunately lacked automated solutions to help identify and track down the suspects. For the first time at scale, the FBI took the unprecedented step of opening its tip line for the public to submit photos and videos to aid the investigation. The response was overwhelming, and the FBI was quickly inundated by terabytes of multimedia from private citizens who wanted to help.

“It was almost too much video data to manage effectively at that time. There were great FBI agents and law enforcement officials working the events and at the end this is what really led to breaking the case,” said Swann. “I think the overload of data just opened up a lot of ‘what if’ questions. We knew we had a gap after this incident that we needed to address.”

Conducting a Major Issue Study

In the wake of the Boston Marathon Bombing, Swann recognized a significant technology limitation and suggested the need for a broad overarching assessment and industry study. So the 18-year FBI veteran led the charge for a Major Issue Study to better understand the video analytic landscape across commercial and government sectors, and to develop a comprehensive roadmap for managing video more effectively in the future.

Familiar with Klare’s graduate research and early work in the lab of Dr. Anil Jain, Swann eagerly recruited him as a consultant on the project. The assessment explored the scope of video processing challenges across government and commercial organizations, ultimately identifying massive cross-industry technology gaps, and recommending a path for prioritization and investment moving forward. 

There were no existing solutions that brought together all the capabilities needed at scale. “Child exploitation, gang affiliation, terrorist networks, criminal activity – as you think about the collection, analysis, and dissemination of such information, it was daunting,” said Klare.

At the time, face recognition technology wasn’t built for unconstrained environments with unusual angles or lighting, and it certainly wasn’t built to handle terabytes of data in seconds.

“We founded Rank One Computing in 2015 to focus on both accuracy AND efficiency, after experiencing these processing limitations firsthand,” said Klare. A few years later, the tables turned when Klare recruited Swann to join Rank One Computing as CEO. “He was the clear choice. He is an immense professional with incredible focus and vision, who carries weight and drives action,” said Klare.

While Swann moved on from the FBI after completing the study, the FBI’s Multimedia Exploitation Unit and others went on to stand up an impressive set of capabilities well beyond that initial set of recommendations.

“The post-event analysis of the Boston Marathon Bombing was my last big assignment at the FBI. The contributions Brendan and I made with the video study were effective in getting some attention to a gap in technology capabilities. Working in the FBI Director’s Office provided one of the best vantage points I could ever have in understanding the professionalism of the agents that handle these cases. Equipping these men and women with the best technology possible is a force multiplier in preventing future catastrophes and ensuring a rapid response to serve justice,” said Swann.

Fast Forward to Today

Since 2013, the field of biometric and computer vision technology has made incredible progress. Face recognition technology is now faster, more accurate, and more reliable than ever before, enabling even small local law enforcement agencies to identify potential suspects more efficiently.

ROC has remained positioned at the forefront of this progress, working to develop faster and more accurate algorithms that can be used to enhance safety and security in a variety of critical settings. As an organization, ROC works to apply the important lessons learned from real world scenarios to improve safety across the country and around the world, including in schools.

Currently in West Virginia public schools, ROC’s deployed system can automatically control building access, helping to keep unauthorized individuals out. “There’s nothing more critical than our next generation of leaders,” said Swann. “Face recognition is now the most accurate biometric technology available – overtaking both fingerprint and iris identification in recent years. Our solutions can now process near infinite volumes of data essentially instantly.”

As Klare explains, “We can now organize and track data sets with more than a million unique identities in no time at all, on any hardware, even in unconstrained environments. We can also automatically identify long guns. Our biometrics have been used by law enforcement agencies across the country to catch violent criminals.”

ROC SDK, a multimodal software development kit, powers the fastest, most accurate, scalable solutions for not only identifying criminal suspects using face recognition, but also tracking the movement of suspects through a crowd (“clustering”), and securing spaces with automated visitor identification and watchlist enforcement. 

And ROC Watch, a live video analytics platform, channels the power of ROC’s industry-leading algorithms into a convenient, intuitive SAAS solution, unlocking innovative safety tools for more community and commercial organizations.

But while we have come a long way in the past ten years, there is still much work to be done. As we reflect on the progress we have made, we must also acknowledge the challenges that lie ahead. Issues of privacy, bias, and misuse of technology continue to be a concern, and we share a responsibility with our customers to ensure that our solutions are developed and deployed in a responsible and ethical manner.

Released in 2021, ROC’s Code of Ethics was the first of its kind in the biometric and computer vision industries. ROC requires customers to abide by the code as part of their licensing agreements.

“There’s a lot we couldn’t do back then that we can proudly do today, but there are also many things still not possible today when it comes to computer vision and biometric technologies. We have to find the balance between pushing against limitations and recognizing boundaries,” said Klare. “As we look back on the role today’s technology could have played in the aftermath of the Boston Marathon bombing, we are reminded of the incredible potential it holds to make our world a safer and more secure place.”

The Pros and Cons of Face Recognition
March 29, 2023 | https://roc.ai/2023/03/29/the-pros-and-cons-of-face-recognition/

The capabilities of face recognition algorithms have skyrocketed over the last five years. With substantial gains comes substantial responsibility – and so we must now reexamine the role face recognition can play in the future of biometrics and identity technology, and closely consider potential limits that these technologies may never overcome.

Unlike fingerprint or iris recognition, facial appearances are public. In turn, facial images have become widely available – trillions of them scattered across the Internet, plus the endless hours of faces embedded in video streaming services. A nearly endless source of facial imagery exists in every corner of the digital world. Even relative to other non-biometric computer vision or machine learning classification tasks, the availability of facial appearance data vastly exceeds data available to train other classification algorithms. In many ways, due to the overwhelming legacy of widely distributed facial imagery, it’s likely that facial recognition will continue down this path to become the most accurate machine learning technology in the world.

In this article, we’ll explore classic considerations for facial biometric traits and discuss how they align with modern advances and use cases. Through this examination, we have identified the following “pros” and “cons” of face recognition technology:

 

Pros:

  • Highly Accurate
  • Convenient
  • Default Method for Humans

Cons:

  • Challenges with Familial Similarity
  • Challenges with Cosmetic Modification
  • Risks of Spoofing

Accuracy

Though historically face recognition has been the least accurate of the “Big Three” biometrics (facial appearance, fingerprint ridge patterns, and iris texture), in recent years face recognition technology capabilities have surged to become the world’s most accurate biometric technology.

 

 

 

Comparing Biometric Accuracy (Face, Fingerprint, Iris)

Sources:
Face: NIST FRVT Ongoing, Oct 2022 – Visa Dataset
Iris: NIST IREX IX report, April 2018 – Table D1, Single Eye (note: the current NIST IREX Ongoing does not include 1:1 measurements)
Fingerprint: NIST PFT, Oct 2022 – MINEX III Dataset, Single Finger

While these error rates are not a perfect apples-to-apples comparison, error rates produced by face recognition algorithms are significantly lower than those measured by any single finger or iris sample. 

Fingerprint has long since been considered the gold-standard for biometric accuracy (outside of invasive methods like DNA), so many are often surprised to hear that face recognition error rates are often substantially lower.

A closer look at the last five years of accuracy testing reveals an exponential decrease in error rates.

Evolution of ROC.ai Face Recognition Accuracy, 2017 – 2023

Over the past five years, when operating at a False Match Rate of 1 in 1 million, the False Non-Match Rate (FNMR) has decreased by over 50x.

Source: https://pages.NIST.gov/frvt/reportcards/11/rankone_014.html
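Benchmarks like the one above report the false non-match rate (FNMR) at a fixed false match rate (FMR) operating point. As a rough illustration of how such a metric is computed — using entirely synthetic score distributions, not NIST or ROC data — a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic similarity scores, purely illustrative: genuine (same-person)
# pairs score high, impostor (different-person) pairs score low.
genuine = rng.normal(0.80, 0.08, 10_000)
impostor = rng.normal(0.20, 0.08, 1_000_000)

def fnmr_at_fmr(genuine, impostor, target_fmr):
    """Pick the decision threshold that yields the target false match
    rate on impostor scores, then measure the false non-match rate of
    genuine scores at that same threshold."""
    threshold = float(np.quantile(impostor, 1.0 - target_fmr))
    fnmr = float(np.mean(genuine < threshold))
    return threshold, fnmr

threshold, fnmr = fnmr_at_fmr(genuine, impostor, target_fmr=1e-6)
```

Lowering the target FMR pushes the threshold up, which is why FNMR figures are only meaningful alongside the FMR at which they were measured.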

How did facial recognition become so accurate? 

 

A Public Biometric

The exponential accuracy progression of face recognition technology is largely due to the following two factors: 

  • Facial appearance has been the primary biometric trait throughout civilized human existence; and
  • Facial appearance is not private.

In terms of the privacy of facial imagery, historically speaking, our facial appearance is the single least private piece of information about ourselves. When meeting a stranger, people may exhibit reluctance to provide simple information such as their name. Yet they will provide their facial appearance immediately to countless strangers in day-to-day interactions. When in a public setting, the right to privacy of facial appearance simply does not and cannot reasonably exist. In fact, in some of the most liberal countries on the planet (e.g., France, Denmark), it is illegal for a person to conceal their face in public.

A tangible example of our lack of facial privacy in public settings can be seen when attending professional and/or televised sporting events. People pay hefty sums of money to attend games, knowing that their facial appearance may be broadcast on TV to audiences of millions.

 

 

Facial Privacy in Public Settings Examples: Fan Example 1 (a sleeping fan) and Fan Example 2 (a surprised fan).

In Fan Example 1, a fan was sleeping at a nationally televised baseball game. TV broadcasters noticed the fan sleeping and aired it nationally while discussing him at length. The fan attempted to sue the broadcaster (ESPN), but ultimately couldn’t find valid legal grounds and his suit was dismissed. 

In Fan Example 2, a fan was stunned after his football team lost a game in the final seconds. The fan was briefly aired on TV, but it was enough for Internet memesters to turn the image into a meme that went viral. The fan never consented to his image being widely distributed across the Internet, but at the same time he had no recourse to prevent this distribution. 

In both cases, the fans never consented to be broadcast on national television, or to further be spread online. Nor does any other fan when shown in the background of such televised events. Regardless, the lack of inherent privacy to one’s facial appearance means there is no recourse to prevent sharing facial appearance – aside from not appearing in public. 

Though precedent does not protect privacy of facial appearance, privacy concerns still exist regarding facial appearance and facial recognition. In fact, emerging technologies like highly-accurate automated facial recognition can create significant privacy issues due to the highly public nature of facial appearance. 

 

Deep Learning and Convolutional Neural Networks 

In the last decade, a technological revolution rapidly advanced the world of computer vision and machine learning. Inspired by the human visual processing system, this technology is called deep learning via convolutional neural networks. The technique applies highly tuned kernel convolutions to matrices of image pixels to yield powerful feature representations.

The number of parameters in these models usually falls in the range of hundreds of thousands to hundreds of millions. Learning the optimal model parameters requires yet another order of magnitude more imagery than the number of parameters, as suggested by the “rule of 10” heuristic in machine learning.
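The kernel convolutions described above can be sketched in a few lines of NumPy. This is a minimal, naive illustration with a single hand-picked edge kernel — in a real CNN, thousands of such kernels are stacked in layers and their weights are learned from training imagery rather than chosen by hand:

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 'valid' sliding-window convolution (cross-correlation, as
    CNN layers actually compute it): a weighted sum of the pixels under
    the kernel at every position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A hand-picked vertical-edge kernel (Sobel-like); learned CNN kernels
# play the same role but their weights come from training.
edge_kernel = np.array([[1.0, 0.0, -1.0],
                        [2.0, 0.0, -2.0],
                        [1.0, 0.0, -1.0]])

image = np.zeros((5, 5))
image[:, 2:] = 1.0  # right half bright: a vertical edge at column 2
feature_map = convolve2d(image, edge_kernel)
# The map responds strongly where the edge sits and is zero over the
# uniform regions - exactly the kind of feature a face model builds on.
```

Deep networks repeat this operation layer after layer, turning raw pixels into progressively more abstract feature representations.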

When provided with sufficient data, domain knowledge about the classification problem, machine learning methods, and GPU/supercomputing hardware, models can be trained that deliver truly stunning accuracy. In some cases, such as face recognition, these models can significantly surpass human performance.

 

 

NIST Face Recognition Algorithm Matching Accuracy

The plot above is from the NIST FRVT vendor scorecard for ROC.ai. It shows the matching accuracy of the face recognition algorithm on the same pairs of images that expert facial examiners were given three months to study and determine whether or not they were a match. Expert examiners achieved roughly 95% accuracy on the task, while several automated algorithms can now achieve perfect 100% accuracy in a matter of seconds. Shown above, the automated algorithm achieves perfect accuracy, and a significant separation in facial similarity score between the genuine comparisons of the same persons (the first 12 plotted values with very high facial similarity scores) and the impostor comparisons of different persons (the next 8 plotted values with low facial similarity scores). 
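The separation described above can be checked mechanically: if every genuine score exceeds every impostor score, any threshold in the gap yields 100% accuracy. A minimal sketch, using made-up scores that echo the plot's shape (12 high genuine values, 8 low impostor values — the numbers are invented, not NIST data):

```python
def perfectly_separable(genuine_scores, impostor_scores):
    """Perfect rank-ordering: every genuine comparison scores higher
    than every impostor comparison, so any threshold placed inside the
    gap classifies all pairs correctly."""
    margin = min(genuine_scores) - max(impostor_scores)
    return margin > 0, margin

# Illustrative scores only: 12 genuine (same-person) comparisons with
# high similarity, 8 impostor (different-person) comparisons with low
# similarity, mimicking the structure of the scorecard plot.
genuine = [0.91, 0.88, 0.95, 0.90, 0.87, 0.93,
           0.89, 0.92, 0.94, 0.90, 0.86, 0.96]
impostor = [0.12, 0.08, 0.15, 0.10, 0.05, 0.11, 0.09, 0.14]

ok, margin = perfectly_separable(genuine, impostor)
```

Here the lowest genuine score (0.86) still sits far above the highest impostor score (0.15), so the algorithm achieves perfect accuracy on these pairs regardless of where in that gap the threshold is placed.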

Face recognition has benefited tremendously from the combination of expansive data and incredibly powerful deep learning toolkits, and we expect error rates should continue to decline for years to come. 

However, as discussed previously, face recognition will also struggle to cross some hard boundaries. Eventually, the “Moore’s law”-like gains achieved by face recognition algorithms over the past five years may taper off.

 

Convenience 

In addition to accuracy, another chief benefit of face recognition technology is convenience. 

Using a face recognition system typically requires little effort. When paired with continuous authentication and real-time screening, they are often completely frictionless. When used as a method of access control, face recognition often requires less user effort and cooperation than fingerprint or iris recognition. 

Due to such convenience, face recognition is the only primary biometric trait that can be used successfully in a fully unconstrained manner. Indeed, accuracies on highly unconstrained benchmarks like IARPA Janus have gone from extremely low a decade ago to now approaching the accuracy of other biometric traits operating in fully cooperative settings.

Examples of Unconstrained Face Images in the IARPA Janus Dataset

Image source: B. Klare, B. Klein, E. Taborsky, A. Blanton, J. Cheney, K. Allen, P. Grother, A. Mah, and A.K. Jain, “Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

Convenience and ubiquity do come with a cost. While facial appearance is overwhelmingly public information, faces can also be linked to more sensitive private information. This consideration becomes especially important as face recognition accuracy and speed continue to advance.

 

Default Method for Humans

The most common method humans use to identify another person in day-to-day life is face recognition. Human face perception is such an important task – we have a large, dedicated region of the brain called the fusiform face area whose sole function is human face identification. 

It is natural for automated systems to prioritize compatibility with our manual, legacy methods. For this reason – inherent human understanding – face recognition is often preferable to other biometric traits.

Perhaps there is no better example of the human dependence on facial recognition than the Thatcher effect:

Example of the Thatcher effect

The two images above are the same, except that the first image is flipped upside down. We can hardly perceive the manipulations in the inverted image because inverted faces are not processed by the fusiform face area of the brain. The second image is immediately recognized as being altered and not an accurate face image.

One of the biggest reasons for society to continue investing in the use of properly developed automated face recognition technology is our inherent reliance on the facial biometric in our day-to-day lives. 

 

Limits of Face Recognition

While face recognition technology continues to deliver unprecedented convenience and accuracy, we must also address some fundamental limitations and challenges. 

Identical Twins

One of the more fundamental limits on face recognition is the challenge of identical twins and, to a lesser extent, familial relations. Though accounting for only roughly 0.3% of the population, identical twins create a substantial challenge for automated face recognition algorithms. The reason is fairly obvious, as illustrated by the following examples of identical twin pairs:

Identical twin examples: two children (source: https://flic.kr/p/2naHVPb) and two astronauts.
Over time, the challenge of identical twins eases slightly due to differences in environmental factors, but it is never fully removed.

There are a few algorithmic approaches that can be applied for identical twins. One option could be a focus on Level III facial features like freckles and moles, which are indeed unique between identical twins. 

A more realistic approach for operating facial recognition technology in the presence of identical twins could entail administrative declaration of twin status. Knowing this information in advance allows for improved differentiation between twins while treating the rest of the population normally.
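The administrative-declaration approach above can be sketched as a simple policy layer. This is a hypothetical illustration, not ROC's actual system: the registry, identity names, and threshold values are all invented, and the stricter threshold merely stands in for a secondary check such as Level III features (freckles, moles):

```python
# Hypothetical sketch of "administrative declaration of twin status":
# identities with a declared identical twin face a stricter match bar,
# while the rest of the population is treated normally. All names and
# threshold values are illustrative.

DEFAULT_THRESHOLD = 0.60
TWIN_THRESHOLD = 0.90  # stricter bar when a declared twin exists

# Hypothetical registry populated through an enrollment-time declaration.
declared_twins = {"alice": "amelia", "amelia": "alice"}

def accept_match(candidate_id, similarity_score):
    """Apply the twin-aware threshold only to declared twins; everyone
    else keeps the default operating point."""
    threshold = (TWIN_THRESHOLD if candidate_id in declared_twins
                 else DEFAULT_THRESHOLD)
    return similarity_score >= threshold
```

Under this policy, a similarity score of 0.75 would clear the default bar for an ordinary identity but would be rejected for a declared twin, prompting the secondary differentiation step.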

To a lesser extent, familial relationships also produce higher degrees of facial similarity, due to shared genetics. Even the “gold standard” of identification – DNA – faces similar limitations with twins and familial relationships. 

While facial appearance and DNA are driven by genetics, fingerprint friction ridge patterns and iris texture are not. Instead, they are formed environmentally (e.g., fingerprints are formed in the womb). As a result, fingerprint and iris algorithms do not share the same limitations with twins and relatives. 

 

Facial Spoofing

One of the biggest strengths for face recognition technology is the amount of publicly available data, which enables development of highly accurate algorithms – but this strength is a double-edged sword. 

The abundance of facial imagery on LinkedIn, Facebook, company websites, yearbook photos, and a wide range of other sources, creates substantial risk for identity fraud. Acquiring copies of face photos is exceptionally easy.

As we learned from Beyoncé’s 2016 Super Bowl appearance, it’s essentially impossible to remove a photo from the Internet. Because facial appearance is our single most public piece of information, there will be no solution in the form of removing facial images from everywhere they exist.

Instead, to protect individual identities, significant effort must be invested in the development of spoof detection algorithms (also referred to as “liveness” checks or “presentation attack detection” algorithms).

Liveness checks are critical when face recognition is used for ID verification through a mobile app or an unattended customer service kiosk. As the accuracy of face recognition algorithms continues to increase at exponential rates, these unattended access control applications grow increasingly common. For other use cases, such as forensic identification or identity validation when a human operator is present (e.g., at a border crossing), liveness checks are less critical.
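The gating role a liveness check plays in an unattended flow can be sketched as follows. This is a hedged illustration, not a real SDK API: the two-score interface and both threshold values are assumptions made for the example.

```python
# Illustrative sketch of a liveness-gated verification flow for an
# unattended kiosk or mobile app. The presentation attack detection
# ("liveness") score gates the face match: a spoofed capture is
# rejected before its match score is even consulted. Thresholds and
# the scoring interface are invented for illustration.

LIVENESS_THRESHOLD = 0.5
MATCH_THRESHOLD = 0.7

def verify(liveness_score, match_score):
    """Reject suspected presentation attacks first; only a live capture
    proceeds to the identity-match decision."""
    if liveness_score < LIVENESS_THRESHOLD:
        return "rejected: possible presentation attack"
    if match_score < MATCH_THRESHOLD:
        return "rejected: no match"
    return "accepted"
```

Ordering matters here: a printed photo of an enrolled user could produce a high match score, so the liveness gate must run before, not after, the match decision.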

While fingerprint and iris biometric modalities can also be spoofed, acquiring samples to use in an attack is far more difficult, due to their private nature.

 

Decorative Cosmetics and Cosmetic Surgery

While facial appearance is largely determined by genetics, facial appearance can also transform through the use of makeup, hormones, and cosmetic surgeries.

Early face recognition research often focused on overt traits like facial hair rather than cosmetics. Over time, however, it has become increasingly clear that while factors like facial hair present a fairly insignificant challenge to face recognition accuracy, cosmetics create more difficulty than originally understood.

The fact that cultural use of cosmetics is predominantly by female populations has in turn resulted in minor, but statistically significant, differences in facial recognition accuracy between males and females. While some may attribute the difference in accuracy between genders to some inherent bias in facial recognition algorithms, it is becoming increasingly clear that the decrease in accuracy is instead due to the latent factor of cosmetics use by females.

 

Single Trait

While we have ten fingers and two eyes, we only have one face. Thus, while collecting and managing samples from multiple fingers or irises is more challenging, doing so can improve accuracy. With face recognition, this opportunity does not exist.

 

Ground Truth Issues

Another challenge with face recognition technology is that authoritative databases often lack pristine identity labels. Whether due to fraud, human operator error, or other factors, prominent government databases all seem to contain identity label errors. These ground truth errors are not inherent to facial biometric algorithms themselves, but instead stem from the long-term legacy use of facial imagery in databases prior to automated face recognition.

Ground truth labeling errors can typically only cause an algorithm to measure higher error rates than actually exist (i.e., perform worse), so the error rates we capture in quantitative evaluations generally represent the upper bound for actual error rates on our algorithms. A portion of what’s measured as algorithm errors actually reflects data labeling errors. In other words, face recognition algorithms are more accurate than most benchmarks indicate.
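The "upper bound" intuition can be made concrete with a toy calculation. The rates below are invented for illustration: assume a hypothetical true false non-match rate for the algorithm and a small fraction of "mated" pairs whose identity labels are wrong. Since a mislabeled pair is really two different people, it almost always scores as a non-match and gets counted as an algorithm error:

```python
true_fnmr = 0.002        # hypothetical true false non-match rate of the algorithm
label_error_rate = 0.01  # hypothetical: 1% of "mated" pairs are mislabeled

# Correctly labeled pairs fail at the true rate; mislabeled pairs
# (genuinely different people) are counted as failures essentially always.
measured_fnmr = (1 - label_error_rate) * true_fnmr + label_error_rate * 1.0

# The benchmark reports roughly 6x the algorithm's actual error rate.
assert measured_fnmr > true_fnmr
print(f"true FNMR {true_fnmr:.4f} -> measured FNMR {measured_fnmr:.4f}")
```

Even a 1% label-error rate dominates the measured error here, which is why benchmark numbers are best read as an upper bound on the algorithm's actual error rate.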

Historically difficult to identify, these legacy errors in identity databases can reduce effectiveness of facial recognition algorithms. While ground truth errors are not easy to discover, automated face recognition algorithms have become astonishingly good at flagging potential errors for human review. However, this approach requires dedicated de-duplication processes.

Identity label errors present significant challenges across all biometric traits, though they are harder to uncover for fingerprint and iris than for face. Because the human brain is highly adept at comparing facial appearance, potential identity errors in face recognition databases are readily flagged when encountered. For fingerprint and iris recognition, identifying errors is far more difficult, time-consuming, and expensive, often requiring expert analysis.

Perhaps no method can reduce the incidence of ground truth errors more than the incorporation of multi-biometric modalities. Though many face systems support use cases where fingerprints are available, the use of both face and fingerprint biometrics significantly reduces the chances for fraud to exist in identity databases.

 

Summary

In the last two decades, the capabilities of automated face recognition technologies have undergone a stunning transformation. Once considered too inaccurate for use as a primary biometric trait, facial recognition is now the single most accurate biometric technology in the world. These exponential improvements will not likely slow anytime soon.

At the same time, every new technology comes with limitations. For facial recognition, that primarily includes identical twins and identity spoofing. 

The other two primary biometric traits, fingerprint and iris, also have their own advantages and limitations.

All factors considered, given the extreme ubiquity of facial appearance in our daily lives, and the astonishing accuracy of modern face recognition algorithms, minimal drawbacks exist for the technology. 

Most importantly, fusing face recognition with another mature biometric technology like fingerprint recognition can create a multi-modal biometric solution for nearly impervious unattended biometric identification.

Multi-modal biometrics are indeed the holy grail of identification. All biometric traits come with tradeoffs and weaknesses, but in scenarios where two or more disjoint traits can be measured together, the incidence of failure or fraudulent access becomes exceptionally low.
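This claim can be quantified under a strong (and here purely illustrative) independence assumption: if the two traits fail independently, the chance that a fused system requiring both is defeated is the product of the individual failure rates. The rates below are made up for illustration, not measurements:

```python
# Hypothetical per-modality failure rates (illustrative numbers only).
p_face_fail = 0.001
p_fingerprint_fail = 0.002

# If the modalities fail independently, a fused system that requires
# both to be fooled fails only when both fail simultaneously.
p_fused_fail = p_face_fail * p_fingerprint_fail  # ~2 in a million

assert p_fused_fail < p_face_fail
assert p_fused_fail < p_fingerprint_fail
```

In practice the failure modes are not perfectly independent, so the real reduction is smaller, but the multiplicative effect is why fusing disjoint traits is so powerful.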

This article is a summary of a presentation given at the National Institute of Standards and Technology (NIST) International Face Performance Conference (IFPC) on November 16th, 2022.

Put ROC multimodal biometrics to work for your business

Discover how our combination of face and fingerprint recognition can provide unparalleled accuracy for your identity verification and access control needs. Reach out now to upgrade your systems and safeguard your communities with only the best in multimodal biometric tech.

The post The Pros and Cons of Face Recognition appeared first on ROC.

ROC AI’s Fingerprint Algorithms Achieve Best-in-Class Accuracy https://roc.ai/2023/02/15/roc-ais-fingerprint-algorithms-achieve-best-in-class-accuracy/ Wed, 15 Feb 2023 19:33:29 +0000 https://roc.ai/?p=8754 Fingerprint capabilities delivered in ROC SDK v2.4 lead world in accuracy and efficiency, as demonstrated by the statistical results presented in this article.

The post ROC AI’s Fingerprint Algorithms Achieve Best-in-Class Accuracy appeared first on ROC.

ROC SDK v2.4 Fingerprint Analysis

The following report documents ROC AI’s performance in the National Institute of Standards and Technology (NIST) Proprietary Fingerprint Template (PFT) III benchmark. Competitor plots were generated on 6 February 2023.

With the explosive growth in the capabilities of face recognition algorithms, focus has in many ways shifted from what has been the most authoritative biometric trait of the last several decades: fingerprints.

While facial appearance has been the default biometric trait throughout human existence, this role has always operated through innate, subconscious cognitive activity. When instead examining the systematic use of biometric traits for identification purposes, the history of fingerprint recognition is substantially longer than that of any other trait. And, from an automated biometric perspective, fingerprint recognition has long held the distinction of being the most trusted biometric.

Fingerprint recognition systems are used by nearly every country in the world for a range of critical identification infrastructure tasks. And while they provide extreme trust, they also contend with significant computational efficiency bottlenecks. Specifically, the comparison speed for fingerprint algorithms has historically been extremely slow. This is due to fingerprint recognition being treated as a point-set matching problem with minutiae location, orientation, and type being used as the point sets. 
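The legacy formulation above treats a fingerprint template as a set of minutiae points, and comparison as aligning two point sets. A minimal sketch makes the efficiency bottleneck visible; the tolerances, scoring, and names here are invented for illustration and do not reflect any real matcher:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Minutia:
    x: float        # position on the print
    y: float
    angle: float    # ridge orientation in degrees
    kind: str       # e.g., "ending" or "bifurcation"

def match_score(probe, candidate, dist_tol=5.0, angle_tol=15.0):
    """Naive point-set comparison: count probe minutiae with a compatible
    counterpart in the candidate set. Note the nested loop: cost grows with
    the product of the two set sizes, which is why minutiae matching has
    historically been so much slower than a single vector comparison."""
    matched = 0
    for p in probe:
        for c in candidate:
            if (p.kind == c.kind
                    and abs(p.x - c.x) <= dist_tol
                    and abs(p.y - c.y) <= dist_tol
                    and abs(p.angle - c.angle) <= angle_tol):
                matched += 1
                break
    return matched / max(len(probe), 1)

a = [Minutia(10, 12, 90, "ending"), Minutia(40, 7, 45, "bifurcation")]
b = [Minutia(11, 13, 92, "ending"), Minutia(80, 80, 10, "ending")]
assert match_score(a, b) == 0.5  # one of the two probe minutiae matched
```

Real minutiae matchers also handle rotation, translation, and distortion, making the per-comparison cost even higher than this sketch suggests.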

As pattern recognition technology progresses in the modern era of deep learning, should fingerprint recognition systems still be bottlenecked by legacy constraints?

The answer is no, as fully demonstrated with the release of ROC AI’s new fingerprint recognition capabilities delivered in the ROC SDK v2.4.   

ROC AI’s fingerprint solutions leverage the same trade secrets that have enabled ROC SDK face recognition capabilities to lead the industry in combined accuracy and efficiency for the last five years. The end result is a fingerprint algorithm that:

  • Delivers best-in-class accuracy; and
  • Operates at an efficiency range that resets expectations on the scalability of fingerprint systems.

The remainder of this article provides dense statistics from the National Institute of Standards and Technology (NIST) Proprietary Fingerprint Template (PFT) III benchmark.

Performance metrics for the ROC SDK v2.4 fingerprint algorithm, per NIST’s scorecard, are as follows: 

In terms of performance relative to all other vendors who have submitted to the NIST PFT benchmark, ROC AI is the number one performer on the Nail-to-Nail benchmarks, ranking #1 in 8 of the 12 sensors, #2 in the other four sensors, and #1 in mean error rate across all sensors:

For the remaining PFT test sets, ROC AI was #2 in three of the four sets and top three in all sets:

In terms of mean error rate across all PFT III test sets (all Nail-to-Nail sensors, AZ, LA, Port of Entry, and US VISIT), ROC AI has the lowest mean error rate of all vendors:

Of course, similar to all ROC AI algorithms, it is not just accuracy but also efficiency that sets ROC AI apart. In fingerprint this is also the case:

The ROC SDK v2.4 fingerprint algorithm uses a substantially smaller template than any other vendor in NIST PFT, and has the fastest comparison speeds (more than 1000x faster than many key competitors).

The combination of lowest error rates and best computational efficiency truly puts ROC AI in a class of its own in the fingerprint industry:

Indeed, no other vendor can match this combination. In addition to accuracy and efficiency distinctions, the ROC AI fingerprint algorithms, like all of our algorithms, are developed entirely “in-house”, by ROC AI employees, in the United States of America. 

While the current ROC SDK v2.4 fingerprint algorithm catapults ROC AI to the top of the fingerprint industry, it is important to remember that this is in fact the first fingerprint algorithm released by ROC AI. Similar to the pace of improvements delivered in face recognition, multiple new fingerprint releases will be delivered by ROC AI in 2023 and beyond. And, ROC AI will continue to provide our customer-friendly “evergreen licensing” terms to our partners and customers.

Reach out now to start your journey toward integrating the best-in-class ROC AI fingerprint capabilities!

 


Hardware Considerations when Architecting a Face Recognition System https://roc.ai/2021/08/11/hardware-considerations-when-architecting-a-face-recognition-system/ Wed, 11 Aug 2021 14:00:50 +0000 https://roc.ai/?p=5504 The post Hardware Considerations when Architecting a Face Recognition System appeared first on ROC.

As the capabilities of automated face recognition algorithms continue to skyrocket, so does the number of face recognition (FR) applications being deployed. Whether it is using FR to unlock a phone, create an investigative lead to help identify a violent criminal, enable low-income persons to open a bank account online, or perform visitor management at a courthouse, more and more face recognition applications continue to be developed by dozens of different system integrators. However, depending on the application, different system architectures and software requirements will be needed. And, depending on the architecture, different algorithm requirements will emerge. This article will discuss these requirements across different face recognition applications so that readers can select the proper FR algorithm when building FR systems.
Face Recognition Applications
Identity Verification 1:1
Identity verification (1:1) is the process of validating a person against a claimed identity. For example, the person will claim to the system they are “John Doe”. The system would take a photo of the person (the facial “presentation”), generate an FR template from the photo, and compare it against the template on file for “John Doe”. If the presented identity matches the reference identity, then access is granted. This could mean a door opens, a bank account is accessed, or a phone is unlocked. It is important to note that face can be one of multiple authentication factors used for identity verification (e.g., passwords, or tokens).
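The 1:1 flow described above can be sketched in a few lines of Python. This is a toy illustration, not the ROC SDK API: templates are stood in for by plain float vectors, cosine similarity substitutes for a real matcher, and the names (`verify`, `enrolled_templates`, the threshold value) are hypothetical:

```python
import math

# Toy stand-in for an FR template: a fixed-length feature vector.
enrolled_templates = {"John Doe": [0.9, 0.1, 0.4]}

def cosine_similarity(a, b):
    """Similarity score between two templates (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def verify(claimed_identity, presented_template, threshold=0.95):
    """1:1 verification: compare the presentation against the enrolled
    template for the claimed identity; grant access only on a strong match."""
    reference = enrolled_templates.get(claimed_identity)
    if reference is None:
        return False  # no enrollment on file for this identity
    return cosine_similarity(presented_template, reference) >= threshold

# A presentation close to John Doe's enrollment is accepted...
assert verify("John Doe", [0.88, 0.12, 0.41])
# ...while a dissimilar one (an impostor) is rejected.
assert not verify("John Doe", [0.1, 0.9, 0.0])
```

The key property is that only a single template comparison is performed per access attempt, which is why 1:1 verification places the lightest demands on comparison speed.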
USE CASES
Bank Account Access
Secure Facility Access
Phone Unlock
Tax Return Filing
Analyst Driven Search 1:N
Analyst driven search (1:N) is the process of manually searching a face image (a “probe”) against a database of pre-processed FR templates (a “gallery”). For example, in a criminal investigation, an image of a suspect may be obtained from a variety of sources, such as a still frame from a security camera, an online photo, or picture captured by a witness. This probe photo of the criminal suspect would then be manually uploaded for search. In turn, a template would be created from the probe image, and it would then be compared against all the templates in the gallery database. After comparing the probe to the gallery, the most similar matching images in the gallery would be presented to the analyst for manual adjudication. This process is labor intensive in that the FR system is merely a filtering tool that will reduce the size of the database. A significant amount of time and effort is needed for the manual adjudication process.
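The filtering step described above can be sketched as a top-k search. As with the other sketches, this is an illustrative toy rather than the ROC SDK API: templates are plain float vectors, cosine similarity stands in for a real matcher, and the names are hypothetical:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(probe, gallery, k=3):
    """1:N analyst-driven search: compare the probe template against every
    gallery template and return the top-k candidates, most similar first,
    for manual adjudication by the analyst."""
    scored = [(identity, cosine_similarity(probe, template))
              for identity, template in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

gallery = {
    "suspect_a": [0.9, 0.1, 0.1],
    "suspect_b": [0.1, 0.9, 0.1],
    "suspect_c": [0.7, 0.3, 0.2],
}
candidates = search([0.88, 0.15, 0.1], gallery, k=2)
# The two most similar gallery entries come back first for human review.
assert [identity for identity, _ in candidates] == ["suspect_a", "suspect_c"]
```

Note that the probe is compared against every gallery template, so comparison speed and template size directly bound how large a gallery an analyst workstation can search interactively.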
USE CASES
Identification of a Bank Robber
from a surveillance video frame.
Identification of an Assaulter
from their online dating profile.
Identification of a Hit & Run Suspect
from a bystander’s cell phone camera.
Automated Search 1:N+1
Automated search (1:N+1) is typically performed in high-throughput applications such as traveler screening or video analytics. For example, a human trafficking investigation may require analyzing terabytes of images and video to identify the different persons present (both victims and perpetrators). This step involves generating templates for every input image processed. In the case of live streaming video, templates are generated at a rate of roughly five (5) video frames per second (FPS).

For video, after the templates have been generated they are often clustered into the different identities present. This clustering and tracking step involves cross-comparing all the templates. Without an efficient template comparison speed and clustering algorithm, this process can be very time consuming and generally grows exponentially in time as a function of the number of templates being clustered. Each clustered identity, or each individual template if no clustering is performed, is then searched against available watch-list galleries. Any probe template that matches a gallery template beyond a predetermined similarity threshold will trigger an identity match alert.

In the case of passenger screening, either a single image is manually captured by an operator, or a live video stream of a passenger is captured and automatically distilled down to a single representative photograph. In the single-image case, the face image is captured, analyzed for quality conformance (e.g., using an automated quality metric and/or validation of ICAO compliance), and templatized. In the live-video case, five (5) to ten (10) FPS needs to be captured and templatized, followed by identity tracking and grouping, and finally cross-comparing templates from the recent collection sequence and possibly applying spatio-temporal constraints.
The template for each passenger being screened can then be compared against multiple galleries, such as a passenger manifest or No Fly List. Any probe template that matches a gallery template beyond a predetermined similarity threshold will trigger an identity match alert. Or, in the case of the passenger manifest, if the presented passenger identity does not match any person in the manifest, a match alert would occur.
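The threshold-alerting logic above can be sketched as follows. This is a toy illustration, not the ROC SDK API: templates are plain float vectors, cosine similarity stands in for a real matcher, and the names (`screen`, `ALERT_THRESHOLD`, the watchlist entries) are hypothetical:

```python
import math

ALERT_THRESHOLD = 0.95  # illustrative; real thresholds are tuned per deployment

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def screen(probe, watchlists):
    """1:N+1 automated search: compare one probe template against every
    gallery; any score at or above the threshold triggers a match alert."""
    alerts = []
    for list_name, gallery in watchlists.items():
        for identity, template in gallery.items():
            score = cosine_similarity(probe, template)
            if score >= ALERT_THRESHOLD:
                alerts.append((list_name, identity, round(score, 3)))
    return alerts

watchlists = {
    "no_fly": {"subject_1": [0.2, 0.9, 0.3]},
    "manifest": {"passenger_42": [0.9, 0.2, 0.3]},
}
# This traveler matches the manifest entry but nobody on the no-fly list.
alerts = screen([0.88, 0.22, 0.31], watchlists)
assert [(lst, ident) for lst, ident, _ in alerts] == [("manifest", "passenger_42")]
```

Unlike analyst-driven search, no human is in the loop per comparison, so threshold selection directly trades false alerts against missed matches.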
USE CASES
Traveler Screening (passenger manifest, No Fly List)
Live Video Watch-list Alerting
Human Trafficking Investigations
Hardware Considerations
Algorithm Efficiency
Previous Rank One articles have provided significant insights into the various efficiency metrics that influence an FR algorithm’s deployability. For new readers we highly recommend reading those articles, particularly our initial article on the topic. To summarize these metrics:

  • Template generation speed is the time needed to initially process a face image or video frame.
  • Template size is the memory required to represent facial features of a processed face image.
  • Comparison speed is the time needed to measure the similarity between two facial templates.
  • Binary size is the amount of memory needed to load an algorithm’s model files and software libraries.

The performance of an FR algorithm across these metrics will dictate whether or not they can run on a given hardware system. And, across the FR industry there is a tremendous amount of variation in efficiency metrics across different vendors. The following graphic demonstrates how different metrics can influence the amount of CPU throughput or memory needed for a hardware system:
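As a back-of-the-envelope illustration of how these metrics translate into hardware requirements, consider two hypothetical algorithms. The numbers below are invented for illustration and are not measurements of any vendor:

```python
# Hypothetical efficiency metrics for two algorithms (illustrative only).
TEMPLATE_SIZE_BYTES = {"algo_small": 256, "algo_large": 8192}
COMPARISONS_PER_SEC = {"algo_small": 50_000_000, "algo_large": 500_000}

GALLERY_SIZE = 10_000_000  # 10M enrolled identities

for algo in ("algo_small", "algo_large"):
    # RAM needed to hold the entire gallery of templates in memory.
    gallery_ram_gib = TEMPLATE_SIZE_BYTES[algo] * GALLERY_SIZE / 1024**3
    # Time for one exhaustive 1:N search (single core, linear scan).
    search_seconds = GALLERY_SIZE / COMPARISONS_PER_SEC[algo]
    print(f"{algo}: {gallery_ram_gib:.1f} GiB RAM, {search_seconds:.1f} s per search")
```

With these illustrative figures, the small-template algorithm needs a few GiB of RAM and answers a 10M-gallery search in a fraction of a second, while the large-template one needs tens of GiB and tens of seconds per search: the same gallery, but an entirely different class of hardware.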

Hardware Components
Different hardware and network resources may be available or desired for a given application. The common architectural components are:

  • Persistent server / desktop – low quantity, high cost, high processing power and memory. These systems will typically host FR libraries and/or system software. These systems will typically have server grade x64 processors and potentially GPU processors.
  • Embedded device – low-cost, high quantity devices with limited processing power and memory that can either host FR libraries on-edge or operate as a “thin-client” that passes imagery to a server or cloud system for processing. These systems typically have mobile grade ARM processors and potentially Neural Processing Units (NPU’s).
  • Scalable cloud – arrays of server resources abstracted through a cloud resource management system.
  • Network – communication channels between devices. Networks will have varying amounts of bandwidth depending on their properties.

Depending on the application and available hardware resources, different FR system architectures need to be deployed. And, depending on the architecture used, different FR algorithm efficiency requirements will emerge. This is because of the differences in processing and memory resources across these different hardware systems:

Note that this article does not specifically cover GPU acceleration, whether through a traditional NVIDIA CUDA-enabled GPU or an embedded Neural Processing Unit (NPU), but readers can assign such hardware components to the “processor” category. The main distinction is that GPU acceleration generally decreases the throughput-cost for CPU dependent applications.
Architecture Options
In the remainder of this article we will walk through the various architectures encountered when developing a face recognition system, along with the algorithm efficiency considerations for each.
Persistent server and Desktop systems
Server and desktop systems are typically used in analyst driven applications, such as forensic analysis of digital media evidence, systems with predictable workloads such as an identity document agency (e.g., a DMV), or high-value systems with infrequent use (e.g., a law enforcement search system). These systems will typically stay installed on the same computer for several years at a time.

Advantages

  • Hardware flexibility
  • Predictable cost
  • Predictable throughput
  • High throughput

Disadvantages

  • Hardware cost
  • Lack of redundancy
  • Lack of scalability
  • Lack of portability

Algorithm limitations when using a persistent server:

Identity Verification – 1:1

  • Slow template generation speed will reduce throughput/system response time
  • Large binary size will impact system restart speed
  • High hardware cost
  • Powerful network needed for decentralized sensors

Manual Identification – 1:N

  • Large template size will require significant memory resources
  • Slow template generation will delay search results
  • Slow comparison speed will delay search results

Automated Search – 1:N+1

  • High template generation speed will reduce throughput (e.g., video processing)
  • Large template and binary sizes will require significant memory resources
Embedded Devices
Embedded devices such as a phone or consumer electronic device are low cost and highly capable when running properly designed software. There are fundamental limits on what can be achieved by an embedded processor (e.g., ARM) and thus template generation speed and template size can play a major role in FR system requirements.

Advantages

  • Low hardware cost
  • Portability

Disadvantages

  • Limited hardware capacity
  • Limited power resources
  • Requires highly efficient algorithms

Algorithm limitations per FR application when using embedded devices

Identity Verification – 1:1

  • Slow template generation speed will cause major latency (> 3 seconds)
  • Large binary size will occupy a high percentage of available memory

Manual Identification – 1:N

  • Template size must be very small due to memory limits
  • Slow template generation will significantly delay search results
  • Slow comparison speed will significantly delay search results
  • Large binary size will occupy a high percentage of available memory

Automated Search – 1:N+1

  • Slow template generation speed will render video processing impossible
  • Template size must be very small due to memory limits
  • Large binary sizes will exhaust memory resources
Scalable Cloud
A scalable cloud architecture, such as Kubernetes, running on a scalable cloud hardware provider, can be highly valuable for application workflows that have varied and unpredictable throughputs.

Advantages

  • Highly scalable
  • Pay per usage
  • Redundancy
  • Fault tolerance

Disadvantages

  • Latency to instantiate new nodes
  • Memory limitations
  • Higher cost to initially implement

Algorithm limitations per FR application when using the cloud

Identity Verification – 1:1

  • Large binary size will slow container instantiation time
  • Poor network bandwidth will delay image transmission
  • Slow template generation speed will reduce throughput / system response time

Manual Identification – 1:N

  • Not typically advised for this application
  • Large template size or a large number of templates will make container instantiation very slow
  • Gallery size is typically too large to instantiate containers in less than 30 seconds

Automated Search – 1:N+1

  • Slow template generation speed makes video processing expensive
  • Large template size, large number of templates, and/or large binary size will make container instantiation very slow
  • Poor network bandwidth will prevent video transmission to the cloud
Summary
There are a wide range of considerations when building and deploying a face recognition system. This article walked through the considerations tied to the hardware used to deploy such a system, and the algorithm properties needed to run effectively on that hardware.

This understanding is critical because, while the majority of marketing focus on face recognition algorithms is on accuracy, the top 100 performers in NIST FRVT are often separated by less than 1% in accuracy. By contrast, the efficiency of an algorithm can vary by 5x to 10x and can be make-or-break when it comes to the successful deployment of a face recognition system.

Rank One is the only Western-friendly vendor to consistently achieve top performance marks in both FR algorithm accuracy and efficiency. As such, regardless of the FR application or the available hardware resources, the ROC SDK is an ideal backbone for any FR system configuration. Contact our team today to begin your trial of our industry leading face recognition algorithms and software libraries!


Introducing the ROC Web API https://roc.ai/2020/06/12/introducing-the-roc-web-api/ Fri, 12 Jun 2020 18:41:14 +0000 https://roc.ai/?p=582 The ROC Web API enables developers to host an API server on their own hardware, and allows numerous client devices to easily connect and send facial images for processing server-side from an array of languages (C++, C#, Java, JavaScript, Objective-C, PHP, Python, and Ruby).

The post Introducing the ROC Web API appeared first on ROC.

Rank One Computing is excited to announce the availability of a new “ROC Web API” offering!

The ROC Web API differs from solutions by Microsoft or Amazon, which require users to upload face images to their servers with little-to-no transparency as to how those images are processed. Instead, the ROC Web API enables integrators to host their own Web service on hardware that they control. Thus, Rank One is able to provide this Web API offering without ever receiving access to the face imagery processed by the system. ROC integrators can configure a wide range of system workflows while ensuring that their users can fully explain where their images are (and are not) being transmitted and stored.

Technical Details

The ROC Web API interface facilitates the development of applications where computation happens remotely (server-side). Client-side applications need only integrate a single source file that defines the communication protocol, and do not need a license to the ROC SDK.

Currently in “beta”, the Web API consists of a provided server-side application for translating HTTP requests into ROC SDK function calls, and client-side single-source-file definitions of the communication protocol for C++, C#, Java, JavaScript, Objective-C, PHP, Python, and Ruby. The primary use cases supported by the Web API are template generation, 1:1 comparison, gallery enrollment, and 1:N search.

Developers familiar with our long-standing “remote galleries” feature will find that the Web API utilizes the same underlying software stack, but with the communication protocol exposed via ProtoBuf and messages wrapped in HTTP. The current implementation adheres to the following principle: one server process equals one gallery file equals one network port. Thus multiple gallery files are cleanly enabled by hosting separate server processes on different network ports.

Rank One developed proof-of-concept Web APIs using JSON RPC 2.0, FlatBuffers and Protocol Buffers, before ultimately settling on ProtoBuf due to the following considerations. While supported everywhere, JSON objects are inefficient for transmitting images, and their lack of a well-defined schema makes it difficult to document an API. On the opposite end of the spectrum, FlatBuffers are extremely efficient, but the asymmetry between how objects are read versus written makes them cumbersome for client-side applications using the output of one API call as the input to another (as one often wants to do). Protocol Buffers were “just right”, offering a well-defined schema that is both clean and efficient.

The Protobuf protocol implementation consists of two root message types: request and response. The root message types have subtypes that mirror each ROC SDK function call. For example, roc_at() decomposes into Request.At which expects a template index, and Response.At which contains a template.

Code Examples

We will now provide several examples illustrating how the client communicates with the server.

First, the server is initialized as follows:

$ ./roc-represent -k 5 --thumbnail ../data/roc.jpg roc.t
$ ./roc-serve 8080 --gallery roc.t --http --log-stdout

Server-side, the above commands construct a gallery of faces from a single image and then start the Web API on port 8080. Client devices can in turn communicate with the web server in C++, C#, Java, JavaScript, Objective-C, PHP, Python, or Ruby.

The following code snippets for Python, Java, and JavaScript demonstrate how to establish a connection with the Web API server, generate templates from face images (or video frames), and search those templates against a gallery.

Connecting with the Server

Python

import http.client  # Python 3 name for the legacy httplib module

def rpc(request):
    server = http.client.HTTPConnection(serverURL)
    server.request("POST", "/", request.SerializeToString(), { "Content-Type" : "application/octet-stream" })
    response = roc.Response()
    response.ParseFromString(server.getresponse().read())
    return response

Java

public static Roc.Response rpc(Roc.Request request) throws IOException {
    final URL url = new URL(serverUrl);
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    con.setRequestMethod("POST");
    con.setRequestProperty("Content-Type", "application/octet-stream");
    con.setDoOutput(true);
    con.getOutputStream().write(request.toByteArray());
   if (con.getResponseCode() ==  HttpURLConnection.HTTP_OK) {
       Roc.Response response = Roc.Response.parseFrom(con.getInputStream());
       if (response.hasError())
           throw new IOException(response.getError().getError());
       return response;
   } else {
       throw new IOException(String.valueOf(con.getResponseCode()));
   }
}

Template Generation

Python

def represent(imageFile):
    request = roc.Request()
    with open(imageFile, "rb") as f:  # binary mode: the image is raw bytes
        request.represent.image = f.read()
    request.represent.algorithm_id = roc.AlgorithmOptions.ROC_FRONTAL | roc.AlgorithmOptions.ROC_FR | roc.AlgorithmOptions.ROC_THUMBNAIL
    request.represent.min_size = 20
    request.represent.k = -1
    request.represent.false_detection_rate = 0.02
    request.represent.min_quality = -4
    response = rpc(request)
    return response.represent.templates

Java

public static java.util.List represent(String imageFile) throws IOException {
    final ByteString image = ByteString.readFrom(new FileInputStream(imageFile));
    final Roc.Request request = Roc.Request.newBuilder().setRepresent(Roc.Request.Represent.newBuilder()
                                    .setImage(image)
                                    .setAlgorithmIdValue(Roc.AlgorithmOptions.ROC_FRONTAL_VALUE | Roc.AlgorithmOptions.ROC_FR_VALUE | Roc.AlgorithmOptions.ROC_THUMBNAIL_VALUE)
                                    .setMinSize(20)
                                    .setK(-1)
                                    .setFalseDetectionRate(0.02f)
                                    .setMinQuality(-4.f)
                                    .build()
                                ).build();
    final Roc.Response response = rpc(request);
    return response.getRepresent().getTemplatesList();
}

Template Search

Python

def search(probeTemplate):
    request = roc.Request()
    request.search.probe.CopyFrom(probeTemplate)
    request.search.k = 3
    request.search.min_similarity = 0
    response = rpc(request)
    return response.search.candidates

Java

public static java.util.List search(Roc.Template probeTemplate) throws IOException {
    final Roc.Request request = Roc.Request.newBuilder().setSearch(Roc.Request.Search.newBuilder()
                                    .setProbe(probeTemplate)
                                    .setK(3)
                                    .setMinSimilarity(0.f)
                                    .build()
                                ).build();
    final Roc.Response response = rpc(request);
    return response.getSearch().getCandidatesList();
}

Complete HTML+JavaScript Example

Let’s now take a look at an example of the Web API using HTML+JavaScript to create a webpage that enables a user to execute a face recognition search

<!DOCTYPE HTML>
<html>
<body>
  <script src="js-browserify/roc.js"></script>

<script>
function rpc(requestProto, responseHandler) {
  const xhr = new XMLHttpRequest()
  xhr.responseType = "arraybuffer";
  xhr.onreadystatechange = function() {
    if (xhr.readyState == XMLHttpRequest.DONE) {
      if (xhr.response.byteLength == 0) {
        responseHandler(null, "No response from server!")
      } else {
        const response = proto.roc.Response.deserializeBinary(new Uint8Array(xhr.response))
        if (response.getResponsesCase() == proto.roc.Response.ResponsesCase.RESPONSES_NOT_SET) {
          responseHandler(null, "Invalid response from server!")
        } else if (response.getResponsesCase() == proto.roc.Response.ResponsesCase.ERROR) {
          responseHandler(null, response.getError().getError())
        } else {
          responseHandler(response, null)
        }
      }
    }
  }
  xhr.open("POST", "http://localhost:8080", true)
  xhr.setRequestHeader("Content-Type", "application/octet-stream");
  try {
    xhr.send(requestProto.serializeBinary())
  } catch (err) {
    responseHandler(null, err.message)
    throw(err)
  }
}

function at(index, thumbnailSetter, metadataSetter) {
  rpc((new proto.roc.Request()).setAt((new proto.roc.Request.At()).setIndex(index)),
      function(response, err) {
        const template = response?.getAt().getTemplate()
        thumbnailSetter?.(err ?? ("data:image/jpeg;base64," + template.getTn_asB64()))
        metadataSetter?.(err ?? template.getMd())
      })
}

function search(probe) {
  rpc((new proto.roc.Request()).setSearch((new proto.roc.Request.Search()).setProbe(probe)
                                                                          .setK(3)
                                                                          .setMinSimilarity(0)),
      function(response, err) {
        if (err) {
          document.getElementById("candidate_thumbnail").src = err
          document.getElementById("candidate_similarity").innerHTML = err
        } else {
          const candidate = response.getSearch().getCandidatesList()[0]
          at(parseInt(candidate.getIndex()),
             tn => document.getElementById("candidate_thumbnail").src = tn)
          document.getElementById("candidate_similarity").innerHTML = "Similarity: <b>" + candidate.getSimilarity().toFixed(3) + "</b>"
        }
      })
}

function enroll(id, templateCallback) {
  const fileReader = new FileReader()
  fileReader.onload = function(e) {
    rpc((new proto.roc.Request()).setRepresent((new proto.roc.Request.Represent()).setImage(new Uint8Array(fileReader.result))
                                                                                  .setAlgorithmId(proto.roc.AlgorithmOptions.ROC_FRONTAL | proto.roc.AlgorithmOptions.ROC_FR | proto.roc.AlgorithmOptions.ROC_THUMBNAIL)
                                                                                  .setMinSize(20)
                                                                                  .setK(-1)
                                                                                  .setFalseDetectionRate(0.02)
                                                                                  .setMinQuality(-4)),
        function(response, err) {
          templateCallback(response?.getRepresent().getTemplatesList()[0], err)
        })
  }
  fileReader.readAsArrayBuffer(document.getElementById(id).files[0])
}

function enrollProbe() {
  enroll("probe_image",
         (template, err) => {
           document.getElementById("probe_thumbnail").src = err ?? ("data:image/jpeg;base64," +  template.getTn_asB64())
           search(template)
         })
}

</script>

  <h1>Search</h1>
  <p>Probe Image <input id="probe_image" type="file" onchange="enrollProbe()"></p>
  <table>
    <tr>
      <td><img id="probe_thumbnail"></td>
      <td><img id="candidate_thumbnail"></td>
    </tr>
    <tr>
      <td></td>
      <td id="candidate_similarity"></td>
    </tr>
  </table>
  
</body>
</html>

In the above example, the face recognition search decomposes into three API calls:

  1. “represent” to generate the probe face template from the input image.
  2. “search” to compare the probe template against the gallery and obtain the highest matching candidates.
  3. “at” to retrieve the highest matching candidate for displaying.

Some of the key takeaways from this web-page example are:

  • js-browserify/roc.js is the single-source-file definition of the communication protocol.
  • The HTTP request is constructed using a platform-standard library, in this case XMLHttpRequest, with the body of the request being the ROC Protobuf message requestProto.serializeBinary().
  • After checking for an error response from the server, response.getResponsesCase() == proto.roc.Response.ResponsesCase.ERROR, responses are cast to their expected type based on the request.
  • Otherwise, this code should look quite familiar to current users of our SDK!

If you are interested in testing and building solutions using the ROC Web API, please contact us today to begin!


The post Introducing the ROC Web API appeared first on ROC.

Hardware requirements for video processing applications – Part 1: Template generation https://roc.ai/2019/08/12/hardware-requirements-for-video-processing-applications-part-1-template-generation/ Mon, 12 Aug 2019 16:52:05 +0000 https://roc.ai/?p=421 When automated face recognition technology is used for analyzing streaming video, an important question is: how much computer hardware is needed? The hardware required to process video depends on several factors which will be discussed in this article.

The post Hardware requirements for video processing applications – Part 1: Template generation appeared first on ROC.

When automated face recognition technology is used for analyzing streaming video, an important question is: how much computer hardware is needed?

The hardware required to process video depends on several factors which will be discussed in this article. After reading you should be able to determine hardware requirements for a particular application and algorithm. 

If you are unfamiliar with the efficiency metrics of relevance for face recognition algorithms, we recommend you first read our previous article on why efficiency metrics matter.

Enrolling frames: the computationally burdensome step in video processing

There are two face recognition steps in video processing applications that require processing power and hardware: (i) detecting faces in each processed video frame and enrolling them into templates, and (ii) searching the created templates against a gallery. If these concepts are unfamiliar, we encourage you to read our article on how face recognition works.

This article focuses on the efficiency requirements for enrolling faces in video frames into templates, which is the computational bottleneck for video processing applications. There are also computational demands for performing watch-list searches of templates processed in video frames against gallery databases. However, because watch-list searching is not nearly as computationally burdensome, we instead cover the computational cost of template comparison in video applications in a supplemental article.

Enrolling video frames is a CPU-intensive task. In order to assess the number of CPU cores required, you first need to determine the following information:

  • Number of Streams: the number of video streams that will be processed concurrently.
  • Max Faces: the maximum number of faces that will appear across all video feeds at a single given time.
  • Templates per Second: the number of templates per second the face recognition algorithm can generate on a single CPU core. Note that NIST FRVT Ongoing reports this statistic as Enrollment Speed, where Enrollment Speed = 1  / Templates per Second, which is the time it takes to generate a single template.
  • Frames per Second: the number of frames per second (fps) the algorithm will process (as recommended by your vendor and use case).

Using this information, the number of CPU cores required for enrollment is roughly determined as follows: 

Enrolling CPU Usage = Max Faces * Frames per Second / Templates per Second

If, however, the Enrolling CPU Usage is less than Number of Streams, then:

Enrolling CPU Usage = Number of Streams. 

For example, let’s suppose that your application will be processing 4 camera feeds (Number of Streams = 4) with a maximum of 10 faces at a time (Max Faces = 10), i.e., across all 4 camera feeds, at any given time, the largest number of faces that will appear at once is 10; your face recognition algorithm can generate 4 templates per second per CPU core (Templates per Second = 4); and it recommends processing 5 fps (Frames per Second = 5). This would mean 10 * 5 / 4 = 12.5, which implies the Enrolling CPU Usage is 13 (i.e., 13 CPU cores), as you should always round up. 

As another example, let’s suppose that your application will be processing 6 camera feeds (Number of Streams = 6) with a maximum of 4 faces at a time (Max Faces = 4), and your face recognition algorithm can generate 4 templates per second per CPU core (Templates per Second = 4) and recommends processing 5 fps (Frames per Second = 5). This would mean 4 * 5 / 4 = 5. Because 5 is less than the Number of Streams, 6, the Enrolling CPU Usage is 6 (i.e., 6 CPU cores).
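The sizing rule above can be captured in a few lines of Python. This is a sketch for illustration only; the function and argument names are ours, not part of any ROC SDK:

```python
import math

def enrolling_cpu_cores(num_streams, max_faces, templates_per_second, frames_per_second):
    # Core formula: Max Faces * Frames per Second / Templates per Second.
    usage = max_faces * frames_per_second / templates_per_second
    # Never provision fewer cores than the number of concurrent streams.
    usage = max(usage, num_streams)
    # Always round up to a whole number of CPU cores.
    return math.ceil(usage)

# First example above: 4 streams, 10 faces, 4 templates/sec/core, 5 fps -> 13 cores.
# Second example above: 6 streams, 4 faces, 4 templates/sec/core, 5 fps -> 6 cores.
```

Both worked examples from the text fall out of this function directly.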

Using these guidelines to determine hardware costs for a vendor solution

It is important to use the guidance in this article in conjunction with the information provided by your algorithm vendor, as well as their measured performance in the NIST FRVT Ongoing benchmarks. As always, a vendor who does not submit their algorithm to NIST FRVT should never be considered. 

The information you will need from your vendor is the enrollment speed, and the number of frames per second recommended. Information you will determine yourself is the maximum number of faces appearing across all video streams at any given time, and the total number of video streams. When used with the formula provided in this article, you will be able to properly estimate the hardware requirements for your application.  

Like this article? Subscribe to our blog or follow us on LinkedIn or Twitter to stay up to date on future articles.

Hardware requirements for video processing applications – Part 2: Template comparison https://roc.ai/2019/08/11/hardware-requirements-for-video-processing-applications-part-2-template-comparison/ Sun, 11 Aug 2019 16:52:22 +0000 https://roc.ai/?p=426 In this article we explain how to factor in the computational demand for template comparison in video processing applications. While this task is not as computational burdensome as template generation, for larger-scale applications it can become meaningful.

The post Hardware requirements for video processing applications – Part 2: Template comparison appeared first on ROC.

While enrolling video frames into templates is the bottleneck for video processing applications in face recognition, there is also a computational cost for using the generated templates for search and identity verification. While this cost is often negligible, for large-scale applications it can become meaningful enough that it needs to be factored into procurement considerations. 

In this article we will discuss the computational considerations for template comparison tasks in video processing applications. We encourage first reading our article on the computational cost of generating templates in video processing applications, as that is the computational bottleneck. For readers unfamiliar with face recognition efficiency metrics, we recommend reading the article on why efficiency metrics matter as well. 

Hardware required for searching and comparing templates

Templates generated during the enrollment process will typically be further processed by a template comparison algorithm for the following purposes: tracking, consolidating, searching, and/or verifying. In discussing the computational demands of these tasks, the following information is needed:

  • Comparisons per Second: the number of template comparisons per second, per CPU core, that can be performed by the algorithm. Note that NIST FRVT Ongoing reports this statistic as Comparison Speed, where Comparison Speed = 1  / Comparisons per Second, which is the time it takes to perform a single template comparison.
  • Templates per Track: the number of templates retained in a “face track”, where a face track is a set, or subset, of the consecutive templates of a person tracked in a video feed.
  • Max Faces: the maximum number of faces that will appear across all video feeds at a single given time.

Tracking

Typically, the first step in using templates generated from video feeds is to group the templates into the different identities present in the video feed. 

Tracking faces in video generally requires all faces detected in a given video frame to be compared against all existing face tracks. In turn, the face templates can be assigned to the face track corresponding to the same identity. The computational cost of this operation is roughly determined as:

Tracking CPU Usage = (Max Faces)^2 * Templates per Track / Comparisons per Second

Note that Max Faces is representative of both the maximum number of faces present across all video feeds, and the number of face tracks being processed. 

Typically the Tracking CPU Usage is extremely low. For example, if there are at most 10 faces in a set of video feeds at a given time (i.e., Max Faces = 10), and each face track retains 5 templates (i.e., Templates per Track = 5), and Comparisons per Second = 1e7 (1e7 = 10 million), then Tracking CPU Usage = 10^2 * 5 / 1e7 = 0.00005, which is a fairly trivial fraction of CPU usage.

Though this approach does not require much processing power (if an algorithm has a fast comparison speed), some algorithms may use lower cost heuristics, such as adding templates to the track that is closest in spatial location (e.g., if faces in consecutive frames were detected in the same location, put them in the same track). However, approaches like this can suffer from poor tracking in the presence of densely located faces (e.g., two people close to each other) or other factors. Unless an algorithm has an unreasonably slow comparison speed, it should be assumed that the template comparisons described above are performed for tracking.

The output of the tracking step is typically sets of templates corresponding to the different identities present in the video feed. There will often be a subsequent task of either searching these templates against a gallery (1:N+1) or comparing them to a claimed identity (1:1), which are discussed below. However, it could also be the case that tracking and consolidating are performed to store the identities, and no further comparisons are required beyond tracking.  

Consolidating

In order to maintain the Templates per Track, each time a new template is matched to a track during the Tracking step, a decision needs to be made whether or not to retain this template in the corresponding face track, and, if it is retained, which existing template to drop (in order to not exceed the Templates per Track limit). 

There are many different techniques that can be employed to determine which template to retain. In some cases this will be templates with the highest quality scores, in other cases this will be templates with different characteristics (e.g., different face pose angles). 

The most computationally exhaustive approach involves cross-matching all existing templates in a track, along with the newly detected template from the track. In turn, a template can be dropped based on the similarity information (e.g., drop a template that provides the least additional information). The computational cost of consolidating all tracks at once is:

Consolidating CPU Usage = (Max Faces * (Templates per Track + 1) * Templates per Track / 2) / Comparisons per Second

Similar to tracking, the CPU usage for consolidating is typically very low. For example, if there are at most 10 faces in a set of video feeds at a given time (i.e., Max Faces = 10), and each face track retains 5 templates (i.e., Templates per Track = 5), and Comparisons per Second = 1e7, then Consolidating CPU Usage = (10 * 6 * 5 / 2) / 1e7 = 0.000015. 
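The tracking and consolidating formulas above are easy to sanity-check in Python (function and variable names below are ours, for illustration only):

```python
def tracking_cpu_usage(max_faces, templates_per_track, comparisons_per_second):
    # Every detected face is compared against every template in every track.
    return max_faces ** 2 * templates_per_track / comparisons_per_second

def consolidating_cpu_usage(max_faces, templates_per_track, comparisons_per_second):
    # Cross-match the newly matched template against the existing templates in
    # each track: (T + 1) * T / 2 comparisons per track, for T = Templates per Track.
    return max_faces * (templates_per_track + 1) * templates_per_track / 2 / comparisons_per_second

# 10 faces, 5 templates per track, 10M comparisons/sec/core:
# tracking ~ 0.00005 of a core, consolidating ~ 0.000015 of a core.
```

Both results reproduce the worked examples in the text: a tiny fraction of a single CPU core.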

Searching

Oftentimes, tracked faces are searched against a gallery in order to determine the person’s identity, which is also known as watch-list identification. This may be done for different reasons, including determining whether the person is on a security blacklist or a VIP whitelist. 

Typically this process is done once per face track. There are different ways that the templates in a face track can be searched against a gallery. For example, all the templates could be searched against the gallery, which is the most computationally burdensome approach (as well as the most comprehensive from an accuracy perspective). Or, a single template (such as the one with the highest quality score) can be used for a single search. We will assume that all templates in a face track are searched. 

The computational cost for searching a face track against a gallery of N templates is:

Searching CPU Usage = Max Faces * Templates per Track * N / Comparisons per Second

For example, if there are 10 face tracks (i.e., Max Faces = 10), 5 Templates per Track, a watch-list gallery with 1e4 templates (1e4 = 10,000), and a Comparisons per Second of 1e7, then Searching CPU Usage = 10  * 5 * 1e4 / 1e7 = 0.05. 

If an algorithm has a fast comparison speed, then there will not be a significant amount of CPU usage for searching. However, while the Rank One algorithm performs roughly 1e7 comparisons per CPU core, per second (i.e., 10M), the average NIST FRVT algorithm is roughly 10x slower, which would mean 0.05 CPU usage becomes 0.5 CPU usage, which, in this example, means an additional CPU core would be required. Other NIST algorithms have 100x to 1000x slower comparison speeds, which creates significant CPU requirements to perform video-based watchlisting.

In addition to the wide fluctuation in comparison speeds, the number of templates in the gallery N can significantly influence the CPU usage. In the example above, N was set to 1e4. This number is somewhat meaningful in that larger gallery sizes typically cannot be searched with stable accuracy in watch-listing applications. However, depending on the level of security involved in an application, and in turn the number of human analysts available to adjudicate watch-list match alerts, the size of N can be upwards of 1e6 (i.e., 1 million) or even beyond 1e8 (i.e., 100 million). In these cases several additional CPU cores may be required. 
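A short Python sketch (names ours, for illustration) shows how comparison speed and gallery size drive the searching cost just described:

```python
def searching_cpu_usage(max_faces, templates_per_track, gallery_size, comparisons_per_second):
    # Every template in every face track is searched against all N gallery templates.
    return max_faces * templates_per_track * gallery_size / comparisons_per_second

# 10 tracks, 5 templates per track, 10K gallery, 1e7 cmp/sec -> 0.05 cores.
# Same workload with a 10x slower algorithm (1e6 cmp/sec)    -> 0.5 cores.
# Growing the gallery to 1M templates at 1e7 cmp/sec         -> 5 cores.
```

Note that the cost scales linearly in both gallery size and the inverse of comparison speed, which is why slow algorithms or very large watch-lists can require several additional CPU cores.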

Verifying

It may be the case that face tracks are used for verifying a person’s identity. In this scenario the person claims to be a given identity, and face verification is performed by comparing the person’s presented face to the stored face template(s) corresponding to this person’s identity. The person may claim their identity by entering a pin, scanning an access card, providing an NFC token from their mobile phone, or other methods. 

Typically in these cases there is only one face in each video stream, as most access control identity verification systems are designed to process one user at a time. The Max Faces value could still be higher than 1, though, as a central server could be processing multiple video feeds / access control points. 

The computational cost for verifying from video streams is determined as follows, where Templates per Person is the number of templates stored for each identity in the system:

Verifying CPU Usage =  Max Faces * Templates per Track * Templates per Person / Comparisons per Second

For example, if there are 10 face tracks (i.e., Max Faces = 10), 5 Templates per Track, 5 Templates per Person stored, and a Comparisons per Second of 1e7, then Verifying CPU Usage = 10  * 5 * 5 / 1e7 = 0.000025. 

Aside from an algorithm with an extremely slow comparison speed, or a system that is processing a large number of face tracks (i.e., a high Max Faces), the compute cost for verification is typically trivial. 
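The verification formula can likewise be reproduced in a couple of lines (again, the names are ours):

```python
def verifying_cpu_usage(max_faces, templates_per_track, templates_per_person, comparisons_per_second):
    # Each track's templates are compared against the claimed identity's stored templates.
    return max_faces * templates_per_track * templates_per_person / comparisons_per_second

# 10 tracks, 5 templates per track, 5 stored templates per person,
# 1e7 cmp/sec -> 0.000025 cores, matching the example above.
```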

Summarizing hardware costs for template comparison

Typically there is not a large CPU cost for the different tasks that may be performed after enrolling video frames (i.e., tracking, consolidating, searching, and/or verifying). However, certain factors may result in meaningful CPU resources being required. These include a slow template comparison speed, a large gallery for watch-listing/searching, and a large number of Max Faces (persons in the video streams at once). 

For watchlisting / searching applications, there will also be memory requirements that will be directly based on the algorithm’s template size. Please refer to our previous article on the implications of template size for this consideration. 

It is important to use the guidance in this article in conjunction with the information provided by your algorithm vendor, as well as their measured performance in the NIST FRVT Ongoing benchmarks. As always, a vendor who does not submit their algorithm to NIST FRVT should never be considered. 

Like this article? Subscribe to our blog or follow us on LinkedIn or Twitter to stay up to date on future articles.

Face Recognition Dictionary https://roc.ai/2018/11/01/face-recognition-dictionary/ Fri, 02 Nov 2018 02:32:19 +0000 https://roc.ai/?p=110 A comprehensive set of definitions and terms used when discussing face recognition technology.

The post Face Recognition Dictionary appeared first on ROC.

A comprehensive set of definitions and terms used when discussing face recognition technology.

This page will be a continuously evolving reference for the basic terminology used when evaluating, integrating, and operating face recognition algorithms. Please let us know if there are any definitions or descriptions you would like added!

Algorithm Evaluation

Accuracy – the rate at which the system makes a correct prediction regarding a person’s identity. Accuracy will range from 0.0 to 1.0, though this will also be expressed as percentages, in which case it will range from 0.0% to 100.0%. Accuracy = 1.0 – Error.

Error – the rate at which the system makes an incorrect prediction regarding a person’s identity. Error will range from 0.0 to 1.0, though this will also be expressed as percentages, in which case it will range from 0.0% to 100.0%. Error = 1.0 – Accuracy.

Type I error / false match / false positive / false acceptance – when two different persons are incorrectly determined to be the same person because a comparison of their face templates exceeds the specified similarity threshold.

False Match Rate (FMR) / False Accept Rate (FAR) – the frequency / percentage of comparisons that are false matches.

Type II error / false non-match / false negative / false rejection – when two instances of the same person are incorrectly determined to be different persons because a comparison of their templates falls below the specified similarity threshold.

False Non-Match Rate (FNMR) / False Reject Rate (FRR) – the frequency / percentage of comparisons that are false non-matches. FNMR is equivalent to 1.0 – True Accept Rate (TAR).

Receiver Operating Characteristic (ROC) curve – measures the tradeoff between false matches and false non-matches on a dataset of face images. The curve is generated by systematically adjusting the match threshold, and for each different threshold measuring the FAR and TAR. As the threshold increases both the FAR and TAR will decrease. 

Decision Error Tradeoff (DET) curve – similar to the ROC curve, measures the tradeoff between false matches and false non-matches. The difference between a DET curve and a ROC curve is that a DET curve plots FAR versus FRR, each typically on a logarithmic axis. Thus, the information reported is the same, but the presentation style is different.

Cumulative Match Characteristic (CMC) curve – measures the frequency with which a person in a probe image is matched to their same identity when being searched against a gallery. The x-axis of the plot contains the rank. The frequency plotted at rank 1 is the percentage of times the top match in the gallery is the same person. The frequency plotted at rank 2 is the percentage of times that at least one of the top two matches in the gallery is the same person. The frequency plotted at rank 3 is the percentage of times that at least one of the top three matches in the gallery is the same person. Etc.

Face Recognition API Concepts

Enrollment – the process of receiving an image or video frame, detecting all faces present, and outputting a template for each detected face.

Template – the numerical encoding of a face in an image.

Template comparison – the process of measuring the facial similarity between two templates.

Facial similarity – the similarity measured during the template comparison process. While the similarity will be a numerical value, and often ranges from 0 to 1, no assumptions can be made about the meaning of a given similarity score for an algorithm without knowledge of the underlying distribution, which will be different for every vendor.

Similarity thresholding – the process of converting a numerical similarity score measured between two face templates into a match or no-match determination. This typically involves a single static similarity threshold, such that any similarity score lower than the threshold is determined to be a no-match, and any similarity score greater than the threshold is determined to be a match.

Probe / Query – a template submitted for search against a gallery.

Gallery / Database – a collection of templates to be searched against.

Candidate match list – an ordered list of the top matching templates in a gallery to a submitted probe image. Templates are typically returned in decreasing similarity. Typically a trained facial examiner will make the final determination as to whether any of the images in the candidate match list are the same person as the probe image.

1:N search / human-guided search – the process of manually submitting a probe image to be searched against a gallery, receiving the candidate match list, and determining if a match exists. “1” refers to the single probe image, while “N” is an integer that represents the number of templates in the gallery.

1:(N+1) search / watch-list identification / open-set identification – the process of automatically searching a probe image against a gallery. As opposed to 1:N search, which will return a candidate list to a human examiner, watch-list identification will instead send match alerts if any of the gallery templates exceed a similarity threshold when compared against the probe template. Thus, the “+1” refers to the null hypothesis case of the person not being in the watch-list gallery. In this searching paradigm a human is only alerted when a match occurs, as opposed to a human always reviewing the search results when a probe is compared against a gallery.

1:1 / identity verification – the process of comparing two face templates and determining if they are a match using similarity thresholding.

N:N / facial clustering  – facial clustering is the process of taking a set of N face images and grouping them into their different identities. This process involves, either explicitly or implicitly, measuring the facial similarity between all N faces images, which means N*(N-1)/2 total facial comparisons. Depending on the efficiency of an algorithm, this process can be extremely slow and resource intensive if N is large.

There are a lot of different applications for facial clustering. One of the most common use cases is in child exploitation, where investigators serving warrants often end up with large amounts of digital evidence (images and videos) containing children (victims) and adults (perpetrators). Using facial clustering, the investigators can ingest these images and determine how many different persons are in the dataset. In turn, the images from these different identities can be passed into a 1:N search system to determine the identity of each person, and in turn either help rescue them (victims) or proceed with the criminal investigation (perpetrators).

Interocular distance (IOD) / Inter-pupillary distance (IPD) – the number of pixels (Euclidean distance) between the centers of the two eyes. It is common for a face recognition algorithm’s sensitivity to image resolution to be measured as accuracy versus IPD.

Minimum bounding box size – the smallest size face that will be searched for in an image. This is typically a single number, measured in pixels, which specifies the height and width of the square face bounding box. As the minimum bounding box size is set smaller, exponentially more face regions will be considered, which will slow down the enrollment speed and increase the chances of a false positive face detection.

Integration

System – software and hardware configured to perform a particular task(s). A system can be operated by a person(s) or another system.

Software – a series of instructions that are performed by a computer.

Hardware – physical devices, which may include a central processing unit (CPU), memory (e.g., RAM), storage, touchscreen, camera, etc.

Native software – byte-level machine code that is executed directly by a central processing unit (CPU). Native software is dependent on the software platform it was compiled for.

Software Platform – the CPU architecture and operating system used for running software. E.g., Ubuntu Linux 16.04 running on an x64 CPU.

Software Development Kit (SDK) – provides software libraries that perform specific functions, such as face detection and recognition, and are accessed through API‘s and command line interfaces. An SDK typically has little off-the-shelf utility, and it must instead be embedded into a system. An SDK is a critical component that powers nearly every system that exists.

In terms of face recognition, many developers of face recognition systems license SDK’s from third parties. There are also larger companies that both have their own software development kit and develop systems around it.

An effective SDK will require little to no installation, provide an intuitive documented API, and support a variety of software platforms.

Software Application – executable software designed for end-user interaction, such as a Graphical User Interface (GUI) or a command line interface (CLI).

End-user – a person who interacts with a software application or a system.

Software Library – a collection of software functions that are called by other software libraries or applications.

Software function – a set of computer instructions that are called and run based on provided input and output parameters. Input and output parameters are defined by the function.

Application Programming Interface (API) – the set of functions accessible to a developer. An API is written in a specific software language (e.g., C, C++, Java, Python, Go).

Native API – an API that accesses functions in native software that is running on the same machine that calls the API. With respect to face recognition, an SDK with a native API provides a system developer the most control over how their data is handled, as it should be the case that the SDK only performs the actions stated in the documentation, and the developer controls any data storage or transmission. Some SDKs may still perform unwanted data transmission and storage, which is not difficult to identify during security testing.

Turn-key system  – a system that needs no custom integration, modification, or complicated installation in order to work. It is simply a matter of “turning the key” and using the system. 

One-off system  – a system that is custom developed for use in only a single deployment. Typically a one-off system is derived from an existing system. 

Web API – an API that allows functions to be called between machines, one being the client machine that makes the web API call and the other being the host machine that receives and processes the API call. With respect to face recognition, a web API means the client machine will be required to send images and data to the host machine (e.g., a cloud server). It is not possible to know if the host machine is storing the images and data longer than necessary.

Frames Per Second (FPS) – the number of frames / images sampled from a video in a second. The frames are sampled at uniform, evenly spaced intervals. Standard cameras record video at up to 30 FPS. Face recognition algorithms typically do not benefit from sampling frames at a rate higher than 5 FPS.
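A minimal sketch of the sampling described above, assuming a 30 FPS source downsampled to 5 FPS by keeping every 6th frame (frame indices stand in for decoded images):

```python
def sample_frames(frames, source_fps=30, target_fps=5):
    """Keep frames at a uniform interval so the effective rate is target_fps."""
    step = max(1, source_fps // target_fps)  # every 6th frame for 30 -> 5
    return [f for i, f in enumerate(frames) if i % step == 0]

# One second of 30 FPS video yields 5 evenly spaced frames.
sampled = sample_frames(list(range(30)))
```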

Efficiency Metrics

Enrollment speed – the amount of time it takes to detect and templatize all faces in an image. Enrollment speed is typically measured on a single processing core, and will be dependent on the speed of the processor, the number of faces in the image, and the resolution of the image.

Comparison speed – the amount of time it takes to compare two templates and generate a similarity score.

Template size – the number of bytes required to represent a face. When performing 1:N search it is generally required to cache all N templates into the computer’s memory to provide quick responses. Thus, the amount of RAM required to load N templates will be N times the template size. In addition to requiring less memory, smaller templates enable larger galleries to be searched more quickly.
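The RAM requirement above is simple arithmetic. A hypothetical sketch, assuming an illustrative 256-byte template (not a ROC specification):

```python
def gallery_ram_bytes(n_templates, template_size_bytes=256):
    """RAM required to cache all N templates for 1:N search."""
    return n_templates * template_size_bytes

# 10 million enrolled faces at 256 bytes each is about 2.56 GB of RAM.
ram_gb = gallery_ram_bytes(10_000_000) / 1e9
```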

Binary size – the amount of memory (RAM) used by face recognition software, which comprises the code libraries and statistical models. Embedded devices (e.g., mobile phones) have limited memory resources, and some face recognition algorithms require more memory than is available on the entire device (let alone leaving memory for other applications).

Environmental Factors

Constrained  – when aspects of the capture environment (camera configuration, illumination, background, etc.) can be controlled. Face recognition algorithms typically do better in constrained / controlled environments.

Unconstrained – when aspects of the capture environment (camera configuration, illumination, background, etc.) cannot be controlled. Face recognition algorithms typically have increased error rates in unconstrained / non-controlled environments. Until around 2015 almost no commercial face recognition algorithms supported unconstrained environments. Today, viable face recognition vendors are expected to handle off-angle face images, varying illumination, different camera types, and other variations.

Facial pose angle – the orientation of a face relative to a camera, measured as Yaw, Pitch, and Roll.


Yaw angle  – rotation of the face about the Y-axis of the camera plane. E.g., when a person turns their face to the left or the right relative to the camera.

Pitch angle  – rotation of the face about the X-axis of the camera plane. E.g., when a person tilts their face up or down relative to the camera.


Roll angle – rotation of the face about the Z-axis of the camera plane.
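The three pose angles can be represented and used to gate image quality, as in this hypothetical sketch (the angle thresholds are illustrative, not vendor-specified):

```python
from dataclasses import dataclass

@dataclass
class Pose:
    yaw: float    # degrees; left/right turn about the Y-axis
    pitch: float  # degrees; up/down tilt about the X-axis
    roll: float   # degrees; in-plane rotation about the Z-axis

def is_near_frontal(pose, max_yaw=30.0, max_pitch=20.0):
    """Accept a face only if its pose is close enough to frontal."""
    return abs(pose.yaw) <= max_yaw and abs(pose.pitch) <= max_pitch
```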


Occlusion – when regions of the face are covered. E.g., due to sunglasses, scarf, hair, or capture conditions.

Applications

Identity Deduplication – the process of cross-referencing an identity document applicant’s face image against existing images in an identity document database. This process is typically semi-automated: a human investigator only intervenes if face images from two different identities generate a similarity score above a specified threshold, which is indicative of a fraudulent application. Identity deduplication is performed by agencies and organizations that grant identity documents, such as driver’s licenses and passports, and by financial institutions.
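The semi-automated workflow can be sketched as follows: compare the applicant against every enrolled identity and flag high-scoring pairs for a human investigator. Here `compare` is a placeholder for a vendor comparator, not ROC's algorithm:

```python
def flag_duplicates(applicant_template, database, compare, threshold=0.8):
    """database: dict of identity -> template.
    Returns identities whose similarity to the applicant exceeds the
    threshold, i.e., candidates for manual fraud review."""
    return [identity for identity, template in database.items()
            if compare(applicant_template, template) >= threshold]
```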

Forensic Search – when an analyst searches a gallery with a probe image in an attempt to determine the identity of the person in the probe image. Typically a forensic search system will provide the analyst with a list of the gallery images that have the highest similarity scores to the probe image. In turn, the analyst manually verifies whether any of the search results are the same person as the probe image.
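A hypothetical sketch of the ranking step: score every gallery template against the probe and return the top-k candidates for human review. The cosine similarity here is a stand-in comparator, not ROC's proprietary algorithm:

```python
import math

def similarity(a, b):
    """Stand-in cosine similarity between two template vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def forensic_search(probe, gallery, k=3):
    """gallery: dict of identity -> template vector.
    Returns the k identities most similar to the probe, best first."""
    scored = sorted(((similarity(probe, t), name) for name, t in gallery.items()),
                    reverse=True)
    return [name for _, name in scored[:k]]
```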

Access Control – a 1:1 verification system where a user claims their identity (e.g., via a username, ID card, or badge) and presents their face to the system for access. These systems range from mobile device unlock to accessing a secure facility.
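A minimal 1:1 verification sketch: look up the claimed identity's enrolled template and grant access only if the similarity score clears a threshold. `compare` and the threshold value are placeholders for illustration:

```python
def verify(claimed_id, live_template, enrolled, compare, threshold=0.6):
    """enrolled: dict of identity -> enrolled template.
    Returns True only when the live face matches the claimed identity."""
    enrolled_template = enrolled.get(claimed_id)
    if enrolled_template is None:
        return False  # unknown identity claim: deny access
    return compare(live_template, enrolled_template) >= threshold
```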

Real-time Screening – a 1:(N+1) watch-list identification system that analyzes a camera feed or streaming video, comparing each detected face against a watchlist gallery. While real-time screening shares similarities with identity deduplication, it is typically performed on much smaller galleries and receives streaming video as input.
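A hypothetical per-frame screening loop: every detected face is searched against the watchlist, a hit raises an alert, and all other faces fall into the implicit "+1" non-mated class. `compare` is again a placeholder comparator:

```python
def screen_frame(detected_templates, watchlist, compare, threshold=0.7):
    """watchlist: dict of identity -> template.
    Returns the watchlist identities matched in this frame."""
    alerts = []
    for face in detected_templates:
        for identity, template in watchlist.items():
            if compare(face, template) >= threshold:
                alerts.append(identity)
                break  # at most one alert per detected face
    return alerts
```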

Licensing

Perpetual license – a grant to use software in a specified manner (e.g., on a specific machine, with certain usage parameters) in perpetuity (forever).

Maintenance fees – fees collected on a perpetual license to provide software updates and technical support. Typically these fees are collected on an annual basis and are a percentage of the original license cost.

Computer Architecture

Random Access Memory (RAM) – volatile memory that can be rapidly read by a central processing unit (CPU). The term volatile means that the contents of the memory will be erased if power is lost, which differs from persistent storage mediums such as hard-drives. The notion of random access, or uniform read and write times, means that data can be read from any physical location in memory in roughly the same amount of time, or written to any location in memory in roughly the same amount of time.

As it pertains to face recognition, search applications generally require that the templates are stored in RAM so they can be rapidly compared against a probe template. This means there will be a latency when a search application is first initialized while the templates are read from persistent storage into RAM, and that there needs to be enough RAM capacity available to hold all of the templates in RAM at the same time.

Persistent storage – non-volatile memory that is typically 20x to 1000x slower to read from and write to than RAM, and can also have non-uniform read and write times. Examples of persistent storage include hard-drives, flash memory, cloud storage, and file servers (note that cloud storage and file servers are typically abstractions over configured hard-drives).

As it pertains to face recognition, templates, images, and videos need to be saved on a persistent storage medium. For verification applications, as they only need to reference a single template, a template can typically be read from the persistent storage when needed (as opposed to having all templates loaded into RAM first). For search applications, the templates need to be first read from persistent storage into RAM in order to enable searching the gallery in a reasonable period of time. Finally, any template generated by a face recognition application that is not saved to a persistent storage medium will eventually be lost.
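The startup latency described above can be sketched as a one-time load from disk into RAM. The JSON file format here is an assumption for illustration, not ROC's storage format:

```python
import json
import tempfile

def load_gallery(path):
    """One-time startup cost: read every enrolled template from
    persistent storage into a dict held in RAM."""
    with open(path) as f:
        return json.load(f)  # dict of identity -> template, now in memory

# Simulate a gallery persisted to disk, then load it at startup.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"alice": [0.1, 0.2]}, f)
    gallery_path = f.name

gallery = load_gallery(gallery_path)  # later searches hit RAM, not disk
```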

I/O bound  – when an algorithm or application has to wait longer to read or write data than it does to process the data. The “I” in I/O refers to Input, and the “O” refers to Output.

Compute bound  – when an algorithm or application has to wait longer to process data than it does to read or write the data.
