Click here to Skip to main content
15,881,139 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
Hi there I have the Following Code, which I run in Jupyter Notebook :-

<pre>import pandas as pd
import requests
from bs4 import BeautifulSoup
   
#res = requests.get("http://web.archive.org/web/20011108193342/http://www.raf.mod.uk/bbmf/calendar.html")     
res = requests.get("http://web.archive.org/web/20041020000138/http://www.raf.mod.uk/bbmf/displaydates.html")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table', align="CENTER")[0]
df = pd.read_html(str(table))
   
df = df[0]
  
  
##################
##################
##################
  
  
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
 
 
#make df[0] to list
list=[]
for i in df[0]:
    list.append(i)
  
#reverse the list to make split to sublist easier
list.reverse()
  
#split list to sublist using condition len(val)> 2 
size = len(list) 
idx_list = [idx + 1 for idx, val in
            enumerate(list) if len(val) > 2] 
res = [list[i: j] for i, j in
        zip([0] + idx_list, idx_list + 
        ([size] if idx_list[-1] != size else []))] 
  
#make monthname to numbers and print
for i in res:
    for j in range(len(i)):
        if i[j].upper()=='JUNE':
            i[j]='6'
        elif i[j].upper() =='MAY':
            i[j]='5'
        elif i[j].upper() == 'APRIL':
            i[j]='4'
        elif i[j].upper() =='JANUARY':
            i[j]='1'
        elif i[j].upper() == 'FEBRUARY':
            i[j]='2'
        elif i[j].upper() =='MARCH':
            i[j]='3'
        elif i[j].upper() == 'JULY':
            i[j]='7'        
        elif i[j].upper() =='AUGUST':
            i[j]='8'
        elif i[j].upper() == 'SEPTEMBER':
            i[j]='9'
        elif i[j].upper() =='OCTOBER':
            i[j]='10'
        elif i[j].upper() == 'NOVEMBER':
            i[j]='11'
        elif i[j].upper() =='DECEMBER':
            i[j]='12'       
  
  
#append string and append to new list
finallist=[]
for i in res:
    for j in range(len(i)):
        if j < len(i) - 1:
            #print(f'2004-{i[-1]}-{i[j]}')
            finallist.append(f'2004-{i[-1]}-{i[j]}')
#print(finallist)
finallist.reverse()
  
  
  
  
#print("\n=== ORIGINAL DF ===\n")
#print(df)
  
  
  
#convert dataframe to list
listtemp1=df.values.tolist()
  
#replace found below values with 0000_removable
removelist=['LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA','DATE','JUNE','JANUARY','FEBRUARY','MARCH','MAY','JULY','AUGUST','SEPTEMBER','OCTOBER','NOVEMBER','DECEMBER','APRIL']
for i in listtemp1:
    for j in range(len(i)):
        for place in removelist:
            if str(i[j]).upper()==place:
                i[j]='0000_removable'
            else:
                pass
  
                  
#remove sublists with the replaced values we redirected
dellist=['0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable']
res = [i for i in listtemp1 if i != dellist]
  
#assign back to dataframe DF3
df3=pd.DataFrame()
df3=pd.DataFrame(res, columns=['Date','LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA'])
#print("\n=== AFTER REMOVE month and column names from DF, assigned to new as DF3 ===\n")
#print(df3)
  
  
#now assign that sorted date list to dataframe DF3
idx = 0
df3.insert(loc=idx, column='DATE', value=finallist)
pd.options.display.max_rows = 500
  
#print("\n=== FINAL DF3 after joining the edited date format column list ===\n")
#print(df3)
  
  
#validation logic if needed compare processed date from new joined "edited_Date_format" column with already existing "Date" column
#df3['ED1']=  pd.to_datetime(df3['EDITED_DATE_FORMAT'],format='%Y-%m-%d').dt.day
#df3['validation of date'] = df3.apply(lambda x: str(x['ED1']) == x['Date'], axis=1)
  
  
#convert df3['EDITED_DATE_FORMAT'] column from object to datetime64 foramt
#df3['EDITED_DATE_FORMAT']= pd.to_datetime(df3['EDITED_DATE_FORMAT'],format='%Y-%m-%d')
  
  
  
##################
##################
##################
  
  
 
 
  
 
#df3 = df3.rename(columns=df.iloc[0])
#df3 = df.iloc[2:]
#df3.head(15)
pd.options.display.max_rows = 1000
pd.options.display.max_columns = 1000

df3['LANCASTER'] = df3['LANCASTER'].replace({'X': 'L'})
df3['HURRICANE'] = df3['HURRICANE'].replace({'X': 'H'})
df3['SPITFIRE'] = df3['SPITFIRE'].replace({'X': 'S'})
df3['SPITFIRE'] = df3['SPITFIRE'].replace({'X x 2': 'SS'})
df3['DAKOTA'] = df3['DAKOTA'].replace({'X': 'D'})


#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DAKOTA'].str.contains('X')) & (df3['SPITFIRE'].str.contains('X', na=True)) & (df3['LANCASTER'] != 'X')]    
#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DAKOTA'].str.contains('D')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['LANCASTER'] != 'L')]    
    
display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DATE'].str.contains('-10$|15$')) & (df3['DAKOTA'].str.contains('D')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['LANCASTER'] != 'L')]    
    
#Months = May Jun Jul Aug Sep
#Months = -5- -6- -7- -8- -9-

#print(display) 

display['DATE']= pd.to_datetime(display['DATE'],format='%Y-%m-%d')
display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%m-%d-%Y')
##added two lines above to convert date format

display=display.rename_axis('MyIdx')
display=display.sort_values(by=['DATE','MyIdx'])
display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')
#display.drop_duplicates(subset=['LOCATION', 'DATE'], keep='last', inplace=True)

#display.drop('LANCASTER', axis=1, inplace=True)
#display.drop('Date', axis=1, inplace=True)
display=display.dropna(subset=['SPITFIRE', 'HURRICANE'], how='all')
display=display[['LOCATION','DATE','DAKOTA','HURRICANE','SPITFIRE']]
display=display.fillna('--')
#display.reset_index(drop=True, inplace=True)
display.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF-2004-Code (Dakota With Fighters).csv')


display

display.sort_values(by=['DATE'])

#print(display)


And when I include that last line of Code, it correctly outputs the earliest days in the DataFrame Output first, i.e. 10 before 15, but not in the month order I want :-


LOCATION DATE DAKOTA HURRICANE SPITFIRE
MyIdx
176 Duxford 10-Jul-2004 D H S
177 Cirencester 10-Jul-2004 D H S
178 Brize Norton 10-Jul-2004 D H S
74 Shrivenham 20:00 10-Jun-2004 D H S
257 Campbletown 15-Aug-2004 D -- S
258 Sunderland 15-Aug-2004 D -- S
261 Scampton 15-Aug-2004 D -- S
200 RIAT Fairford 15-Jul-2004 D -- SS
22 Tilford 15-May-2004 D -- S
23 Abingdon 15-May-2004 D -- S
24 Hyde Heath Village 15-May-2004 D -- S

I want 10th June 2004 first then the 10th of July/s then the 15th of May's then the 15th of July Rows, then the 15th August Rows. How do I modify that line of Code, so that I can filter to get that order, without changing the index position of the Rows via code, which I know how to do ?

I mean add something to the first line of code e.g. the displays.sort_values Line, so that the Earlier month with a day, is shown 'favoured' before the later month with the same day ? i.e. 10-Jun-2004 is shown before 10-Jul-2004 , 15-May-2004 is shown before 15-Jul-2004 Rows etc. But still dates with day 10, showing before day 15 Rows.

(df3['DATE'].str.contains('-10$|15$'))


Was the part of the Code, I use to filter the days of the month, I want included
in the DataFrame Output.

Any help would be much appreciated.

Eddie Winch

What I have tried:

I run the Code as shown above, and get the stated Output.
Posted
Updated 21-Aug-20 7:14am
v2

1 solution

Parse each line, and convert the date information to a datetime — Basic date and time types — Python 3.8.5 documentation[^]
Those should sort "properly" where strings sort by finding the first different character pair and basing the whole comparison on the difference between those, ignoring the rest of the string completely.
 
Share this answer
 
v2
Comments
Eddie Winch 20-Aug-20 9:59am    
Hi OriginalGriff,

Sorry could you explain how to do that ? I am not sure where to modify the Code exactly, many thanks for your help.

Regards

Eddie Winch
OriginalGriff 20-Aug-20 10:50am    
Which bit is giving you problems?
Eddie Winch 20-Aug-20 10:56am    
I think I just need to modify the Line of Code :- display.sort_values(by=['DATE'])

and or the Line of Code :- (df3['DATE'].str.contains('-10$|15$')) to get the answer I want, could you suggest what I should add or change in those lines of Code, to get the DataFrame Output I want ?

Best Regards

Eddie
OriginalGriff 20-Aug-20 11:22am    
Read what I said again.
Eddie Winch 20-Aug-20 11:32am    
I am not sure what parsing each line means, I should be able to convert the date information to datetime though.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900