Payments Cards & Mobile, a leading magazine for global payments news, asked Cristina Soviany, CEO and VP R&D of Features Analytics, to discuss the applications of machine learning and big data in the financial industry. The article, entitled "Behind the Big Data hype", appeared in the Jan/Feb 2016 issue. It discusses how the data-rich payments industry can now use machine learning to derive intelligent actions for the business.
In the section of the article entitled "Does the size matter?" (pages 20-22), Soviany answers the question: "When it comes to data, is it really a case of bigger is better, though? How much data is enough?"
For Cristina Soviany, CEO of Features Analytics, a Belgian firm specializing in machine learning technologies, huge quantities of data alone are not enough to build accurate predictive models. Soviany notes that one can sometimes get by quite well with limited or small amounts of data. "The answer lies in the quality of data combined with the ability to enhance it with the right features, or sets of variables, able to detect hidden patterns but also evolve with the data," she says. When Features Analytics builds models to detect payment fraud, having the right type and amount of historical data is important: ideally 12-18 consecutive months of transaction data. The data also needs sufficient statistical coverage. "In the case of payment fraud solutions, we are used to building models where the fraud class size is 0.005 percent to 1 percent of total transaction numbers," explains Soviany. "If the volumes of data are large enough (more than hundreds of millions of transactions or samples, for example) and if the above percentages hold, then we have enough statistical data."

Good data quality enables Features Analytics to apply algorithms that select the best variables and to design new features that ensure the models learn and evolve. So it is not always a case of bigger is better. Both the quantity and the quality of data matter, as well as the underlying modeling technology that drives insights.
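To see why the quoted volumes matter, it helps to work out how many fraud examples a dataset actually contains at the fraud rates Soviany mentions (0.005 to 1 percent). The short sketch below does only that arithmetic; the function name `fraud_class_size` and the sample dataset sizes are illustrative assumptions, not part of Features Analytics' method.

```python
# Illustrative arithmetic only (hypothetical helper, not Features Analytics' code):
# expected number of fraud-class samples at the fraud rates quoted in the article.

def fraud_class_size(n_transactions: int, fraud_rate_percent: float) -> int:
    """Return the expected number of fraud samples for a rate given in percent."""
    return round(n_transactions * fraud_rate_percent / 100)


if __name__ == "__main__":
    # Assumed dataset sizes, chosen to span "small" to "hundreds of millions".
    for n in (1_000_000, 100_000_000, 500_000_000):
        low = fraud_class_size(n, 0.005)   # rarest fraud rate quoted
        high = fraud_class_size(n, 1.0)    # most common fraud rate quoted
        print(f"{n:>11,} transactions -> {low:,} to {high:,} fraud samples")
```

At the rarest quoted rate, one million transactions yield only about 50 fraud examples, while 100 million yield about 5,000, which illustrates why very large volumes are needed before the minority class has meaningful statistical coverage.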