I was using the pandasql package to do some data manipulations with Pandas. My dataframe is enormous so I've been looking into ways to speed up my computations. One blog (found <a href="https://statcompute.wordpress.com/tag/pandasql/" rel="nofollow noreferrer">here</a>) claims that the pysqldf package is much faster. However, I can't get it to work with a Pandas dataframe.
Specifically, here's what I have:
Code:
from pysqldf import SQLDF
sqldf = SQLDF(globals())  # also tried locals(); it makes no difference
# I know the SQL is valid; I've used it elsewhere (in R)
result = sqldf.execute("Select * from data where blah")
Here,
Code:
data
is a Pandas dataframe that I know exists, but I keep getting the error:
Code:
sqlite3.OperationalError: no such table: data
It appears from my web search, including the link above, that the usual operation of this package is with a sqlite db that has been stored in memory. However, the documentation (found <a href="https://pypi.python.org/pypi/pysqldf" rel="nofollow noreferrer">here</a>) makes it seem like it can be used directly on a dataframe, like pandasql or R's sqldf.
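For reference, here is the in-memory sqlite workflow I mean, done by hand with pandas and the standard library rather than through pysqldf. This is only a sketch of the fallback I'd rather avoid (the dataframe contents and query are made up for illustration):

```python
import sqlite3
import pandas as pd

# Illustrative dataframe standing in for my real (much larger) one.
data = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# Register the dataframe as a table named "data" in an in-memory sqlite db,
# then query it back with plain SQL.
conn = sqlite3.connect(":memory:")
data.to_sql("data", conn, index=False)
result = pd.read_sql_query("SELECT * FROM data WHERE x > 1", conn)
print(result)
```

This works, but it copies the dataframe into sqlite manually, which is exactly the bookkeeping I was hoping pysqldf would handle for me.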
The question: If the pysqldf package can really be used like this, how can I make it recognize my dataframe?