As more businesses move to an online-focused operating model, they open up new attack vectors for fraudsters – and often struggle to put adequate fraud detection safeguards in place.
In industries such as retail and advertising, where fraud has not been a major threat to offline operations in the past, developing effective online fraud prevention strategies can be a challenge. For example, the Interactive Advertising Bureau (IAB) estimates that fraudulent practices are currently costing the U.S. digital marketing, advertising and media industry USD 8.2 billion per year.
But the problem is not confined to industries where fraud is a relatively new challenge. Even in the insurance, banking, government and telecommunications sectors, where fraud prevention has been a high priority for many years, traditional methods of detection and prevention are struggling to cope with new, more sophisticated attacks.
In particular, the rise of well-coordinated criminal gangs means that it is no longer enough for organizations simply to detect large anomalies in individual transactions. When fraudsters work together and spread their activity across a large number of transactions, you need to be able to look for much subtler patterns in customer behaviors and relationships.
The evolving nature of online fraud – with most fraudsters now aiming to bleed their targets for multiple small pay-offs over time, instead of risking a single big score – also puts a premium on the ability to detect problems quickly. The sooner you can identify a suspicious behavior pattern or a group of individuals who seem to be colluding, the faster you can block their activity, which helps to prevent additional losses and minimize the drain on your bottom line.
How to Prevent Fraud Before You Pay the Price
Fundamentally, fraud detection depends on the ability to analyze the relationships between customers and transactions and to recognize patterns or trends in them.
Let’s take an example: a complex credit card fraud scheme. Fraudsters create a large number of false identities, establish fake credit histories, apply for credit cards, and then collude with unscrupulous businesses to process “purchases” that never actually take place. When the credit card company pays up, the merchants and fraudsters then share the proceeds.
To apply for a credit card, you need a mailing address, and since genuine mailing addresses are relatively difficult to obtain, the fraud ring is likely to re-use a small number of addresses for many applications. Similarly, merchants who are willing to take part in such a scheme are hard to find, so most of the transactions are likely to go through just a handful of merchants.
The result is that the fraudulent transactions share relationships that genuine transactions wouldn’t share. Imagine that Alice lives at 123 Fake Street, buys a diamond ring from Crooked Joe’s Jewelry Store, and then vanishes into thin air without paying her credit card bill. Then Bob, who supposedly lives at the same address, does the same thing. When their roommate Carol tries to make a purchase at Crooked Joe’s too, it is fair to assume that something may be amiss.
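To make the pattern concrete, here is a minimal Python sketch that links customers to addresses and merchants in a small in-memory graph and checks a new purchase against it. The `FraudGraph` class, the `defaulted` set and the rule "flag the purchase if at least two co-residents defaulted after buying from the same merchant" are hypothetical illustrations, not part of any particular fraud detection product.

```python
from collections import defaultdict


class FraudGraph:
    """Hypothetical in-memory graph linking customers to addresses and merchants."""

    def __init__(self):
        self.address_of = {}                   # customer -> address
        self.residents = defaultdict(set)      # address -> customers living there
        self.customers_of = defaultdict(set)   # merchant -> customers who bought there
        self.defaulted = set()                 # customers who never paid their bill

    def add_customer(self, customer, address):
        self.address_of[customer] = address
        self.residents[address].add(customer)

    def add_purchase(self, customer, merchant):
        self.customers_of[merchant].add(customer)

    def mark_default(self, customer):
        self.defaulted.add(customer)

    def is_suspicious(self, customer, merchant, threshold=2):
        """Illustrative rule: enough co-residents defaulted at this same merchant."""
        address = self.address_of.get(customer)
        if address is None:
            return False
        # Hop 1: customer -> address; hop 2: address -> other residents
        others = self.residents[address] - {customer}
        # Keep residents who bought from this merchant and then defaulted
        linked = {c for c in others
                  if c in self.customers_of[merchant] and c in self.defaulted}
        return len(linked) >= threshold


g = FraudGraph()
for name in ("Alice", "Bob", "Carol"):
    g.add_customer(name, "123 Fake Street")
for name in ("Alice", "Bob"):
    g.add_purchase(name, "Crooked Joe's Jewelry Store")
    g.mark_default(name)

print(g.is_suspicious("Carol", "Crooked Joe's Jewelry Store"))  # True
```

The check runs on a single incoming purchase, so in principle it could sit at the point of transaction; a real system would tune the rule and threshold against its own data.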
It’s an over-simplified example, but the point remains: if you can recognize the right patterns in the relationships between customers, addresses and merchants (or between vehicles, locations, insurance policies and claims, or whatever data you possess) you can potentially detect any future suspicious transaction that follows the same pattern.
However, there is a snag: most fraud detection schemes analyze these relationships retrospectively. Organizations feed historical transactions and customer data into their fraud detection models on a daily, weekly or monthly basis, and hope to identify suspicious transactions that occurred during the previous period.
The problem, of course, is that by the time the analysis is complete, the fraudsters have probably already received the money or goods – and recovering the loss may either be impossible, or require costly and time-consuming legal action.
Wouldn’t it be better if you could bring fraud analytics to the point of transaction and identify suspicious behavior before the claim is processed or the products are dispatched? That way, you could prevent losses before they happen and limit the impact of fraud on your bottom line.
Why Aren’t We Doing This Already?
Real-time fraud analytics can transform a company’s ability to protect itself from avoidable losses – but it is still a relatively uncommon strategy. Despite investing time in developing complex offline fraud detection models, most companies still only run those models on a monthly, weekly or daily basis.
The difficulty of achieving real-time fraud detection results from three main factors: the ever-growing complexity of fraud detection models, the massive growth in the volumes of data that these models need to process, and the ever-greater demands of today’s customers.
As volume and complexity grow, the system running the fraud detection algorithms is often unable to process the data quickly enough, and can take a long time to return a result. In such circumstances, it may be impossible to provide an instant, responsive user experience. Customers will not tolerate a system that requires them to wait on the phone or sit on a website for more than a few minutes, just to submit an insurance claim or purchase a product. For this reason, introducing a fraud detection system at the point of transaction is impractical unless you can ensure that it will deliver the right result within a few seconds at most.
The problem is that when most existing fraud detection systems were originally built, the technology required to traverse relationships between data in real time did not exist – or at least was not available in a scalable, production-ready form. Until relatively recently, the main options were either to use a relational database or to try a NoSQL alternative, for example a JSON document store such as MongoDB. Unfortunately, both types of technology have problems with datasets where the relationships between records are the primary topic of interest.
In a relational database, for example, traversing a relationship between records means writing a SQL query that joins the relevant tables together. As queries grow more complex, the number of joins increases, and ever-greater computational power is needed to return a result in a timely manner. This makes real-time analysis impractical at scale without cost-prohibitive investment in hardware.
With most types of NoSQL databases, you have a different type of problem. If you use a JSON document store, for example, you don’t have a well-defined way to link records together at the database level. Defining relationships and creating efficient ways to traverse them becomes a task for application developers to solve, since the database does not provide a native way to handle these tasks. Even if your developers are smart enough to build a solution that provides sufficient performance, the effort is almost certain to add significant cost and complexity to the application development process.
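As a rough illustration of why this work falls to the application, the sketch below models documents as plain Python dicts rather than calling any real document store's API. Each hypothetical customer document embeds the IDs of the merchants it bought from, so answering "who bought from merchant X?" means either scanning every document or building a reverse index in application code. The document shapes and field names are assumptions for illustration.

```python
# Hypothetical customer documents, shaped as a JSON document store might hold them.
# The merchant IDs are just values: the database does not treat them as links.
customers = [
    {"_id": "c1", "name": "Alice", "merchant_ids": ["m1", "m2"]},
    {"_id": "c2", "name": "Bob", "merchant_ids": ["m1"]},
    {"_id": "c3", "name": "Dave", "merchant_ids": ["m2"]},
]


def customers_of_by_scan(merchant_id, docs):
    """Naive approach: scan every document on every query."""
    return sorted(d["name"] for d in docs if merchant_id in d["merchant_ids"])


def build_reverse_index(docs):
    """Faster lookups need an index that the application must build and maintain."""
    index = {}
    for d in docs:
        for m in d["merchant_ids"]:
            index.setdefault(m, []).append(d["name"])
    return index


print(customers_of_by_scan("m1", customers))  # ['Alice', 'Bob']
index = build_reverse_index(customers)
print(sorted(index.get("m2", [])))            # ['Alice', 'Dave']
```

Every write that records a new purchase must now also update the index, which is the kind of extra application logic the paragraph above describes.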
The Answer: Graph Databases
Graph databases such as IBM Graph offer a third option that can help to solve this problem. In a graph database, both records and relationships are first-class citizens, which means that you can accomplish complex traversals from one record to another very quickly. To understand how this works, let’s compare it with the way a relational database handles traversals.
Imagine your relational database has two tables: Customers and Merchants. In the schema, you specify a relationship between them: each Customer can make purchases from multiple Merchants, and each Merchant can sell products to multiple Customers. Because this is a many-to-many relationship, it is typically stored in a third, linking table (for example, Purchases) that records which Customer bought from which Merchant. Each time you want to find out which Customers have bought products from a particular Merchant, you must join Customers, Purchases and Merchants together, and then search the combined result for records that fit your criteria.
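The following Python sketch uses the standard-library `sqlite3` module to show the join described above. Because a customer-to-merchant relationship is many-to-many, it assumes a hypothetical `Purchases` linking table; all table and column names are illustrative only.

```python
import sqlite3

# In-memory database with a hypothetical schema for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Merchants (id INTEGER PRIMARY KEY, name TEXT);
    -- Linking table: one row per customer-merchant purchase
    CREATE TABLE Purchases (
        customer_id INTEGER REFERENCES Customers(id),
        merchant_id INTEGER REFERENCES Merchants(id)
    );
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Dave');
    INSERT INTO Merchants VALUES (1, 'Crooked Joe''s Jewelry Store'), (2, 'Corner Shop');
    INSERT INTO Purchases VALUES (1, 1), (2, 1), (3, 2);
""")

# Which customers bought from a given merchant? Every query joins three tables.
rows = conn.execute("""
    SELECT c.name
    FROM Customers AS c
    JOIN Purchases AS p ON p.customer_id = c.id
    JOIN Merchants AS m ON m.id = p.merchant_id
    WHERE m.name = ?
    ORDER BY c.name
""", ("Crooked Joe's Jewelry Store",)).fetchall()

print([name for (name,) in rows])  # ['Alice', 'Bob']
```

Asking a follow-up question such as "who else shops where Alice shops?" adds further joins against Purchases for each extra hop, which is how query cost grows as relationship queries become more complex.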
By contrast, in a graph database, there are no separate tables storing your data, so the joining process is unnecessary. The database contains not only Customer and Merchant elements (known as “nodes” or “vertices”), but also relationship elements (or “edges”) that specify how the vertices are linked to each other. If a Customer buys from a new Merchant, a new edge element will be added to the database, linking the two objects and describing their relationship.
Because each edge directly links the vertices it connects, the database can follow relationships without computing joins at query time. That makes it practical to run sophisticated relationship-based queries in real time, and therefore to analyze unusual behavior patterns and detect fraud at the point of transaction.
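A minimal sketch of the idea, using a plain Python dictionary as a stand-in for graph storage: each vertex keeps direct references to its neighbors, so a multi-hop query becomes a series of lookups rather than a join. This is a conceptual model under that assumption, not a description of how IBM Graph or any other product stores data internally; the vertex naming scheme is hypothetical.

```python
from collections import defaultdict

# Adjacency list: vertex -> set of neighboring vertices (undirected edges).
graph = defaultdict(set)


def add_edge(a, b):
    """Recording a relationship just adds an edge between two vertices."""
    graph[a].add(b)
    graph[b].add(a)


def neighbors_within(start, hops):
    """Breadth-first traversal: vertices reachable from start in 1..hops steps."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # Follow every edge out of the current frontier, skipping visited vertices
        frontier = {n for v in frontier for n in graph[v]} - seen
        seen |= frontier
    return seen - {start}


add_edge("customer:Alice", "merchant:Crooked Joe's")
add_edge("customer:Bob", "merchant:Crooked Joe's")
add_edge("customer:Bob", "address:123 Fake Street")
add_edge("customer:Carol", "address:123 Fake Street")

# Two hops from Alice: her merchant, then everyone else who shops there.
print(sorted(neighbors_within("customer:Alice", 2)))
# ['customer:Bob', "merchant:Crooked Joe's"]
```

Each additional hop only follows the edges of the vertices already reached, so the work depends on the size of the neighborhood being explored rather than on the size of whole tables.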
Why Aren’t Graph Databases More Widely Used?
Although graphs can be much easier to visualize than relational data structures, many IT professionals are still apprehensive about the mechanics of using graph databases in practice. Mathematical graph theory, which underpins the way these databases store and traverse data, is a very complex subject – and there is often an assumption that to set up, manage and maintain a graph database, you need to understand a lot of advanced math.
Happily, now that graph database technology has matured, this is no longer the case. Graph databases such as IBM Graph generally offer web APIs that let you insert or extract data via simple HTTP requests. Querying a graph database has also become far more intuitive, thanks to a new generation of query languages like Gremlin, which abstract away the underlying complexity.
IBM Graph also comes as a fully managed cloud service – provisioned, maintained and supported 24/7 by IBM experts. The cloud infrastructure allows you to spin up new database instances in seconds, and there is no need to worry about basic administrative tasks such as patches and updates: IBM handles them for you.
The cloud service’s web interface lets you monitor and manage your Graph instances, load data, run queries and visualize the results with a few clicks. Alongside a host of powerful features for experienced users, it provides a user-friendly environment for beginners who want to learn more about graph technology, with sample data and queries to help you take your first steps and a set of easy-to-follow tutorials about graph-related technologies.
Built from the Ground up for Real-time Analytics
The key enabler of effective fraud detection systems is the ability to traverse and analyze the relationships between customers, products, places and other entities. Graph databases are built from the ground up to support this type of analysis, avoiding both the multi-table joins that slow relationship-heavy queries in relational databases and the application-side relationship handling that document stores require.
If you embrace the graph database approach, you can achieve real-time, transaction-level analysis of customer behavior quickly and with minimal investment. If you need to identify fraudulent activity by spotting patterns and relationships within a ring of fraudsters in your customer base, the most effective solution is a database that puts relationships first.