2.3. Anti-Money Laundering

This is a synthetic dataset generated to study AML techniques.

This AML Dataset and its generation techniques are described here.

The original data was published under the Community Data License Agreement Sharing 1.0 (CDLA-Sharing-1.0). This alternative format of the same data is published under the terms of the same license.

2.3.1. Dataset Modification from the Original

The original data was a single file. This format splits the data into two components compatible with property graph data structures.

  • There is a node file, Accounts, containing bank account information. A new property, acct_id, is generated by concatenating the bank number and the account number, separated by a vertical bar (|).

  • There is an edge file, Transactions, containing the transactions.
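The split described above can be sketched roughly as follows. This is only an illustration, not the script used to produce the published files: the field names (from_bank, from_account, to_bank, to_account, amount, src, dst) and the sample values are hypothetical, and only the acct_id construction (bank number, vertical bar, account number) comes from the description above.

```python
def make_acct_id(bank, account):
    """Join the bank number and account number with a vertical bar."""
    return f"{bank}|{account}"


def split_transactions(rows):
    """Split flat transaction rows into account nodes and transaction edges.

    The input field names are hypothetical placeholders, not the
    column names of the original file.
    """
    accounts = {}      # acct_id -> node properties, one entry per distinct account
    transactions = []  # one edge per input row
    for row in rows:
        src = make_acct_id(row["from_bank"], row["from_account"])
        dst = make_acct_id(row["to_bank"], row["to_account"])
        # setdefault keeps the first record seen for each account
        accounts.setdefault(
            src, {"acct_id": src, "bank": row["from_bank"], "account": row["from_account"]}
        )
        accounts.setdefault(
            dst, {"acct_id": dst, "bank": row["to_bank"], "account": row["to_account"]}
        )
        transactions.append({"src": src, "dst": dst, "amount": row["amount"]})
    return list(accounts.values()), transactions


# Made-up sample rows, for illustration only.
rows = [
    {"from_bank": "10", "from_account": "A1", "to_bank": "20", "to_account": "B7", "amount": 125.50},
    {"from_bank": "20", "from_account": "B7", "to_bank": "10", "to_account": "C3", "amount": 40.00},
]

nodes, edges = split_transactions(rows)
print([n["acct_id"] for n in nodes])
print(edges[0]["src"], "->", edges[0]["dst"])
```

Joining the bank number with the account number gives each node a single key, which keeps accounts distinct even if two banks happen to reuse the same account number.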

2.3.2. Downloadable Datasets

This alternative format is available in two sizes. The first is the full dataset, with more than 45 million transactions. The second contains only the first 1 million transactions and their associated accounts.

Full AML: 45,403,506 edges and 9,914,140 nodes

1 Million AML: 1,000,000 edges and 1,101,709 nodes
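A subset such as the 1 million transaction version could be derived along the following lines. This is a minimal sketch under stated assumptions: the function name first_n_with_accounts and the fields src, dst, and acct_id are hypothetical, and the sample data is made up. It is not the procedure actually used to build the published files.

```python
def first_n_with_accounts(transactions, accounts, n):
    """Keep the first n transactions and only the accounts they reference.

    Assumes each transaction is a dict with hypothetical 'src' and 'dst'
    keys holding acct_id values, and each account has an 'acct_id' key.
    """
    subset_edges = transactions[:n]
    used = set()
    for edge in subset_edges:
        used.add(edge["src"])
        used.add(edge["dst"])
    # Preserve the original account order while filtering
    subset_nodes = [acct for acct in accounts if acct["acct_id"] in used]
    return subset_edges, subset_nodes


# Made-up sample data, for illustration only.
accounts = [{"acct_id": a} for a in ["10|A1", "20|B7", "10|C3", "30|D9"]]
transactions = [
    {"src": "10|A1", "dst": "20|B7"},
    {"src": "20|B7", "dst": "10|C3"},
    {"src": "30|D9", "dst": "10|A1"},
]

edges, nodes = first_n_with_accounts(transactions, accounts, 2)
print(len(edges), "edges,", len(nodes), "nodes")
```

Because each transaction can reference up to two accounts, a subset can contain more nodes than edges, as the 1 Million AML counts above show.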