Click here to Skip to main content
15,867,568 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Hi I have a Python Code, that I run in Jupyter Notebook, manipulating Data from an Excel File .XLS Document File, which isn't giving the desired output yet.

I would like the Correct Date to be shown, in the Date Column, in
the format Day Month Year, i.e. 13-4-2009. The Dates that do show, are in the format Year Month Day, but the year is either showing as 2008 or 1999, I would like all the Years to be 2009, as that is what the Timetable Year is.

And what should be typed in the Code, to get the Data, to display in the order it was in, in the original Excel File.

I say that because most if not all the Flypast appearances, don't show the date. So when the venues are shown, in alphabetical order, it makes it harder to know if the Flypast, was associated with a relevant display, i.e. when the aircraft were transiting to the display etc. If the Data is in the Original, non alphabetical order for the Venues, the relevant Flypasts should be underneath, I think, I may be wrong though.

In the Output, when I run the Python Code, I now have an additional Column, called Dakota and Fighters In The Same Display Slot. Because of the way, the data is collated, in the Source .xls File. Only the Venues, where the Aircraft are all in the same Display slot, Show here, I think because, when DAK and HS become DHS, as you will understand from the Code, and can see in the Original .XLS document when those values in the Duration Column i.e. Durn don't match i.e. 0:05 and 0:10, i.e. 5 and 10 minutes they don't show in the Output.

Only when both values, are the same do they show. I.e. when DHS are all in the same display slot, although for some reason for the Venue Woodstock, where both values are 0:15 The Row isn't showing, any ideas, why that is the case ?
The Woodstock Row, wasn't showing before either, can't work out why.

So all those Rows Showing, but in the, "In The Same Display Slot" Column I would like all The Venue Rows, where they are in the same Display Slot saying YES and the ones that are not in the same display slot saying NO.

Also I have noticed, that in the Date Column, it says No Date in the DataFrame Output. This is when the Date isn't present, in the 'Date' Column, in Rows Of Data in the .xls File. I.e. the Layout, of the Data in the .xls File being the issue. I have all the YES's bar the Venue 'WOODSTOCK' showing in the Dakota And Fighters In the Same Display Slot Column, and now would all the 'NO' Rows to show.

Here is a Link to a Picture, of a sample of the .xls File Data. Is there a way, to have the Date populate, the empty Columns in the Rows with Data in, up to the right point in the Dataframe ?

Here is the link, to a Picture of what I mean :-

https://www.flickr.com/photos/153177973@N08/49330715541/in/dateposted-public/

Kembles's DHS appearance on the 20th June, won't show a date in the DataFrame, as that Data is missing from that Row. The Date is higher up, but they have not included the Date for every Row, even though the relevant Data for the Date, is lower down. How could this be done for all Rows, giving the Correct Date 


Here is the Current Python Code, I run In Jupyter Notebook :-

Python
<pre>import pandas as pd
        
xls = pd.ExcelFile(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF Display Schedule 2009.xls')   
data = pd.read_excel(xls, sheet_name="Sheet1")      
pd.options.display.max_rows = 1000   
pd.options.display.max_columns = 1000
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID','Durn'])
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
df["Date"].fillna("No Date", inplace = True)   
df['A/C'].unique().tolist()  
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
print("=======")  
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['DISPLAY/'] == 'FLYPAST']
#selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/', 'Durn']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))   
def remove_duplicate_chars(s):
    return ''.join(set(s))
   
   
#####################
#####################
#####################
#####################
   
   
df=pd.DataFrame(aircraft)
#print(df) #original df
     
    
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
   
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS', 'SHSD'], value='DHS')
#print(df)#replaces the SD with DS and HSD,HDS and SHSD with DHS
     
  
    
df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
df = df.drop(df[df['Aircraft-combined'] == 'DD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'DL'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LDS'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LSSD'].index)
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column
  
  
  
#column names before alignment
#print("column names: before change  ", list(df.columns.values.tolist()))
#print("=======")
#print(df)
#print("=======")  
   
  
#to fix column name alignment 
df.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv')
df = pd.read_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv',names=['Venue','BID','Date','DISPLAY/','Durn','Aircraft-combined'], header=0 )
  
  
#column names before alignment
#print("column names:  after change ",list(df.columns.values.tolist()))
#print("=======")
#print(df)
#print("=======")
  
#convert df['Durn'] column to datetime foramt
df['Durn']= pd.to_datetime(df['Durn'],format='%H:%M:%S').apply(pd.Timestamp)
#print("changing datatype of column Durn----->",df['Durn'].dtype)
#print("=======")
#compare minutes with extracting minutes from datetime64 column 
comparelist=[5,10,15,20]
df['Dakota And Fighters In The Same Display Slot'] = ((df['DISPLAY/'] == 'DISPLAY') & (df['Durn'].dt.minute.isin(comparelist))).map({True:'YES', False:'NO'})
#print(df)
#print("applied... the condition to compare minutes for Durn column")
#print("=======")
  
  
#return df['Durn'] column datetime64 format to object type
df['Durn'] = pd.to_datetime(df['Durn'], format='%H:%M:%S').dt.time
#print("reverted datatype of column Durn back to ----->",df['Durn'].dtype)
#print("=======")
  
#print("\n\n*** FINAL RESULTSET ***\n\n")
  
pd.DataFrame(df)
 
#print(df)    #final resultset
#print("======="


And here is the Link to The Excel .XLS File in Question :-

http://web.archive.org/web/20090804234934/http://www.raf.mod.uk/bbmf/rafcms/mediafiles/F0ED6EA8_1143_EC82_2E4534A1036AA506.xls

Any help would be appreciated

Regards

Eddie Winch

What I have tried:

Update To Code, With Question :-

I have added the following to my Code :-

df["Date"].fillna(method='ffill', inplace = True)


Inplace of :-

df["Date"].fillna("No Date", inplace = True)


So that the 'No Dates', showing in the 'Date' Column for the Rows, now have been filled with the correct Date, bar the Year part, i.e. 2008, which should say 2009, I am not sure how to rectify that ?

In the following Part of my Code :-

<pre>#convert df['Durn'] column to datetime format
df['Durn']= pd.to_datetime(df['Durn'],format='%H:%M:%S').apply(pd.Timestamp)
#print("changing datatype of column Durn----->",df['Durn'].dtype)
#print("=======")

#compare minutes with extracting minutes from datetime64 column 
comparelist=[5,10,15,20,]
df['Dakota And Fighters In The Same Display Slot'] = ((df['DISPLAY/'] == 'DISPLAY') & (df['Durn'].dt.minute.isin(comparelist))).map({True:'YES', False:'NO'})

#print(df)
#print("applied... the condition to compare minutes for Durn column")
#print("=======")
 
#return df['Durn'] column datetime64 format to object type
df['Durn'] = pd.to_datetime(df['Durn'], format='%H:%M:%S').dt.time
#print("reverted datatype of column Durn back to ----->",df['Durn'].dtype)
#print("=======")


What will I need to change, so that the Rows not showing, due to the non matching values in the 'Durn' Column, do show and have NO showing in the 'Dakota And Fighters In The Same Display Slot' Column, for those Rows ? For the matching values, in comparelist=[5,10,15,20,] if they match they occupy both rows, the same Value, so 5+5 = 10, 10+10 = 20, 15+15 = 30, 20+20 = 40.

Whereas for those that don't match, i.e. 5 in one row and 10 in the other, 5+10 = 15. How would I write Code, to achieve what I am getting at ? Any help would be appreciated

Update 2 :-

I have added the following two lines to my Code :-

df['Date']= pd.to_datetime(df['Date'],format='%Y-%m-%d')
df['Date']= pd.to_datetime(df['Date']).dt.strftime('%d-%m-%Y')


So now the Dates are in the Date Format I want, bar the Year showing 2008, which I want showing as 2009. Also I want the Rows, going descending, in relation to the 'Date' Column, earliest Date to the Latest Date, in the DataFrame Output. What would I need to type, to achieve this ?

Regards

Eddie Winch
Posted
Updated 27-Jan-20 9:30am
v19
Comments
Richard MacCutchan 26-Jan-20 12:41pm    
I cannot figure out in your code where you show the dates. Also, looking at the worksheet the years in the entries seem to be for 1999 or 2008 so if you want them all to be 2009 you need to ignore the year in the source data.
Eddie Winch 27-Jan-20 9:16am    
Hi Richard, The Date comes up, simply by showing the Date Column in the Output. Can you see the issue with Dates, if you run the Code in Jupyter Notebook, and look at the Excel Document ?

I would like to post the Output I get, when I run the Code, but don't know how to ? Also I have updated, the 'What I have tried' Section, in my Post.

Eddie
Richard MacCutchan 27-Jan-20 9:29am    
Sorry, but I don't understand. Where is the actual Python code that reads the date from the worksheet and tries to display it?
Eddie Winch 27-Jan-20 9:40am    
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID','Durn'])
Richard MacCutchan 27-Jan-20 9:44am    
Sorry but I have no idea what that code is.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900