Using pysqldf with Pandas


Staff member
I was using the pandasql package to do some data manipulations with Pandas. My dataframe is enormous so I've been looking into ways to speed up my computations. One blog (found <a href="" rel="nofollow noreferrer">here</a>) claims that the pysqldf package is much faster. However, I can't get it to work with a Pandas dataframe.

Specifically, here's what I have:

from pysqldf import SQLDF
sqldf = SQLDF(globals()) #also tried with locals(), doesn't make a difference
#I know the sql is good, I've used it elsewhere (in R)
result = sqldf.execute("Select * from data where blah")

is a Pandas dataframe that I know exists, but I keep getting the error:

sqlite3.OperationalError: no such table: data

It appears from my web search, including the link above, that the usual operation of this package is with a sqlite db that has been stored in memory. However, the documentation (found <a href="" rel="nofollow noreferrer">here</a>) makes it seem like it can be used directly on a dataframe, like pandasql or R's sqldf.

The question: If the pysqldf package can really be used like this, how can I make it recognize my dataframe?