Recherche vectorisée des valeurs dans Pandas dataframe
J'ai deux pandas dataframes un appelé 'orders' et un autre appelé 'daily_prices'. daily_prices est comme suit:
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.50 155.74 75.89
2011-02-02 341.29 612.00 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.40 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82.00
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
Les Ordres sont les suivants:
direction size ticker prices
2011-01-10 Buy 1500 AAPL 339.44
2011-01-13 Sell 1500 AAPL 342.64
2011-01-13 Buy 4000 IBM 143.92
2011-01-26 Buy 1000 GOOG 616.50
2011-02-02 Sell 4000 XOM 79.46
2011-02-10 Buy 4000 XOM 79.68
2011-03-03 Sell 1000 GOOG 609.56
2011-03-03 Sell 2200 IBM 158.73
2011-06-03 Sell 3300 IBM 160.97
2011-05-03 Buy 1500 IBM 167.84
2011-06-10 Buy 1200 AAPL 323.03
2011-08-01 Buy 55 GOOG 606.77
2011-08-01 Sell 55 GOOG 606.77
2011-12-20 Sell 1200 AAPL 392.46
L'Index des deux dataframes est datetime.date. la colonne' prices 'dans le dataframe' orders ' a été ajoutée en utilisant une compréhension de liste pour parcourir toutes les commandes et rechercher le ticker spécifique pour la date spécifique dans le cadre de données 'daily_prices', puis en ajoutant cette liste en tant que colonne au dataframe' orders'. Je voudrais comme pour le faire en utilisant une opération de tableau plutôt que quelque chose qui boucle. peut-il être fait? j'ai essayé d'utiliser:
Daily_prices.ix[les dates,les tickers]
Mais cela renvoie une matrice de produit cartésien des deux listes. je veux le retour d'un vecteur colonne de seulement le prix d'un ticker pour une date spécifiée.
2 réponses
Utilisez notre ami lookup
, conçu précisément à cet effet:
In [17]: prices
Out[17]:
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.50 155.74 75.89
2011-02-02 341.29 612.00 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.40 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82.00
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
In [18]: orders
Out[18]:
Date direction size ticker prices
0 2011-01-10 00:00:00 Buy 1500 AAPL 339.44
1 2011-01-13 00:00:00 Sell 1500 AAPL 342.64
2 2011-01-13 00:00:00 Buy 4000 IBM 143.92
3 2011-01-26 00:00:00 Buy 1000 GOOG 616.50
4 2011-02-02 00:00:00 Sell 4000 XOM 79.46
5 2011-02-10 00:00:00 Buy 4000 XOM 79.68
6 2011-03-03 00:00:00 Sell 1000 GOOG 609.56
7 2011-03-03 00:00:00 Sell 2200 IBM 158.73
8 2011-06-03 00:00:00 Sell 3300 IBM 160.97
9 2011-05-03 00:00:00 Buy 1500 IBM 167.84
10 2011-06-10 00:00:00 Buy 1200 AAPL 323.03
11 2011-08-01 00:00:00 Buy 55 GOOG 606.77
12 2011-08-01 00:00:00 Sell 55 GOOG 606.77
13 2011-12-20 00:00:00 Sell 1200 AAPL 392.46
In [19]: prices.lookup(orders.Date, orders.ticker)
Out[19]:
array([ 339.44, 342.64, 143.92, 616.5 , 79.46, 79.68, 609.56,
158.73, 160.97, 167.84, 323.03, 606.77, 606.77, 392.46])
Ci-dessous pour la commodité de chacun de reproduire les résultats.
columns=['AAPL','GOOG','IBM','XOM']
index = ['2011-01-10','2011-01-13','2011-01-26','2011-02-02','2011-02-10','2011-03-03','2011-05-03','2011-06-03','2011-06-10','2011-08-01','2011-12-20']
prices = pd.DataFrame(columns=columns, index=index)
prices.iloc[0]=[339.44,614.21,142.78,71.57]
prices.iloc[1]=[342.64,616.69,143.92,73.08]
prices.iloc[2]=[340.82,616.50,155.74,75.89]
prices.iloc[3]=[341.29,612.00,157.93,79.46]
prices.iloc[4]=[351.42,616.44,159.32,79.68]
prices.iloc[5]=[356.40,609.56,158.73,82.19]
prices.iloc[6]=[345.14,533.89,167.84,82.00]
prices.iloc[7]=[340.42,523.08,160.97,78.19]
prices.iloc[8]=[323.03,509.51,159.14,76.84]
prices.iloc[9]=[393.26,606.77,176.28,76.67]
prices.iloc[10]=[392.46,630.37,184.14,79.97]
columns=['Date','direction','size','ticker','prices']
orders = pd.DataFrame(columns=columns)
orders.loc[0] = ['2011-01-10','Buy',1500,'AAPL',339.44]
orders.loc[1] = ['2011-01-13','Sell',1500,'AAPL',342.64]
orders.loc[2] = ['2011-01-13','Buy',4000,'IBM',143.92]
orders.loc[3] = ['2011-01-26','Buy',1000,'GOOG',616.50]
orders.loc[4] = ['2011-02-02','Sell',4000,'XOM',79.46]
orders.loc[5] = ['2011-02-10','Buy',4000,'XOM',79.68]
orders.loc[6] = ['2011-03-03','Sell',1000,'GOOG',609.56]
orders.loc[7] = ['2011-03-03','Sell',2200,'IBM',158.73]
orders.loc[8] = ['2011-06-03','Sell',3300,'IBM',160.97]
orders.loc[9] = ['2011-05-03','Buy',1500,'IBM',167.84]
orders.loc[10] = ['2011-06-10','Buy',1200,'AAPL',323.03]
orders.loc[11] = ['2011-08-01','Buy',55,'GOOG',606.77]
orders.loc[12] = ['2011-08-01','Sell',55,'GOOG',606.77]
orders.loc[13] = ['2011-12-20','Sell',1200,'AAPL',392.46]
lookupValues = prices.lookup(orders.Date, orders.ticker)
Puis résultat:
>>> prices
AAPL GOOG IBM XOM
2011-01-10 339.44 614.21 142.78 71.57
2011-01-13 342.64 616.69 143.92 73.08
2011-01-26 340.82 616.5 155.74 75.89
2011-02-02 341.29 612 157.93 79.46
2011-02-10 351.42 616.44 159.32 79.68
2011-03-03 356.4 609.56 158.73 82.19
2011-05-03 345.14 533.89 167.84 82
2011-06-03 340.42 523.08 160.97 78.19
2011-06-10 323.03 509.51 159.14 76.84
2011-08-01 393.26 606.77 176.28 76.67
2011-12-20 392.46 630.37 184.14 79.97
>>> orders
Date direction size ticker prices
0 2011-01-10 Buy 1500 AAPL 339.44
1 2011-01-13 Sell 1500 AAPL 342.64
2 2011-01-13 Buy 4000 IBM 143.92
3 2011-01-26 Buy 1000 GOOG 616.50
4 2011-02-02 Sell 4000 XOM 79.46
5 2011-02-10 Buy 4000 XOM 79.68
6 2011-03-03 Sell 1000 GOOG 609.56
7 2011-03-03 Sell 2200 IBM 158.73
8 2011-06-03 Sell 3300 IBM 160.97
9 2011-05-03 Buy 1500 IBM 167.84
10 2011-06-10 Buy 1200 AAPL 323.03
11 2011-08-01 Buy 55 GOOG 606.77
12 2011-08-01 Sell 55 GOOG 606.77
13 2011-12-20 Sell 1200 AAPL 392.46
>>> lookupValues
array([339.44, 342.64, 143.92, 616.5 , 79.46, 79.68, 609.56, 158.73,
160.97, 167.84, 323.03, 606.77, 606.77, 392.46])
>>>