What sets Plaid apart from its competitors in terms of data connectivity or data quality beyond access to specific institutions?
Founder & CEO at Venice
So when Plaid entered the scene, their biggest differentiator was the developer experience and the business model. Pre-Plaid, you needed to sign a contract with a minimum of $500 a month. You couldn't just get access to an API or sign up for an API key; you'd have to talk to a salesperson.
Then Plaid came onto the scene and said, "Hey developer, you can get started in two minutes for $0. We're going to create a much better developer experience around the documentation and the APIs than Yodlee did." That's what really differentiated Plaid from Yodlee, and what allowed Plaid to get onto the scene.
Yodlee has been playing catch up since. Now you can actually get a Yodlee plan for a much lower cost as well, but you can still see the difference in API documentation and ease of use. It's hard to build that, and it's a cultural thing, but they have of course been catching up in many areas.
With regard to data quality, I've heard things both ways. In our personal experience, it's really institution specific. It's hard to say in general that Plaid has better data quality than Yodlee or vice versa. What's a lot easier to tell is whether a specific bank is supported or not. It gets harder the deeper you want to go: "Well, for what percentage of the time is it supported? We say it's supported, but will it fail 100 percent of the time?" That, you can only know by trying.
Technically, Capital One was on Plaid's list of supported banks for a long time, but for an entire year and a half it was failing pretty much 99 percent of the time. And then the deeper you go: "What about the quality of the transactions themselves? How many months or years of history do you get back? What about the richness of that data?" The deeper you want to go, the harder it is to evaluate. By the time you get a few levels down, you really can't say which one works better for a specific use case without running real data and real traffic against both. It's quite challenging to say in the abstract.
That being said, with open banking becoming more and more adopted, I do see disparity in data quality becoming less and less of an issue. If all of these vendors are talking to the same underlying banking APIs that the banks are coming out with, then the data will actually just be the same.
It's only when you're scraping that there's a bigger concern around the quality of the scraper you built; once everything moves to standardized APIs, the quality should be much more standardized as well.
Now there's a whole other set of data activity called data enrichment, which is not the data that comes from the banks themselves, but the additional processing that the data aggregators do on top of it. I'm seeing an interesting trend where there are third-party players now, like Heron Data and others, whose entire purpose is to enrich data. They don't aggregate any data; they're breaking the aggregator apart, saying, "We're not going to do the job of aggregating data. We're just going to do the job of enriching data. You need to give us data from wherever you have it."