Using pysqldf with Pandas

admin

Administrator
Staff member
I was using the pandasql package to do some data manipulations with Pandas. My dataframe is enormous so I've been looking into ways to speed up my computations. One blog (found <a href="https://statcompute.wordpress.com/tag/pandasql/" rel="nofollow noreferrer">here</a>) claims that the pysqldf package is much faster. However, I can't get it to work with a Pandas dataframe.

Specifically, here's what I have:

Code:
from pysqldf import SQLDF
sqldf = SQLDF(globals()) #also tried with locals(), doesn't make a difference
#I know the sql is good, I've used it elsewhere (in R)
result = sqldf.execute("Select * from data where blah")

Here,
Code:
data
is a Pandas dataframe that I know exists, but I keep getting the error:

Code:
sqlite3.OperationalError: no such table: data

It appears from my web search, including the link above, that the usual operation of this package is with a sqlite db that has been stored in memory. However, the documentation (found <a href="https://pypi.python.org/pypi/pysqldf" rel="nofollow noreferrer">here</a>) makes it seem like it can be used directly on a dataframe, like pandasql or R's sqldf.

The question: If the pysqldf package can really be used like this, how can I make it recognize my dataframe?