Click here to Skip to main content
15,886,032 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I have a Large CSV file with over 100,000 rows. I have a long string I want to insert in front of the first value at the start of each row.

The string to be added to the first value in each row is the same. The first value on each row is different. e.g. String to insert is "blah...blah " and the value for row 1 is "AAA" and the value for row 2 id "BBB". The new value for row 1 should be "blah...blah AAA" and the new value for row 1 should be "blah...blah BBB"

Is there a text editor that will allow me to accomplish this?

Any help will be appreciated.

What I have tried:

Searched the internet, Microsoft, etc.
Posted
Updated 11-Jan-22 3:05am

You could do it with Notepad++ : Notepad++ add to every line - Stack Overflow[^]
You might be able to use PowerShell's replace method to do the job, too. There's an article here[^] on that. If you can't make replace add text to the start of the line, then perhaps insert with a StartIndex of 0 might work: insert - PowerShell - SS64.com[^]
 
Share this answer
 
Comments
Maciej Los 11-Jan-22 10:40am    
5ed!
Solution 1 gets my vote - I'm a fan of Notepad++ but if you do decide to go down the CSV suggestion in Solution 2 here is a far quicker alternative and this also fills in some of the gaps from Solution 2

* Insert a new column A either by right-click on the A, select Insert from the context menu or keyboard shortcut Ctrl and Plus symbol
* Type formula = CONCAT("Blah blah ", B1) in what is now cell A1 and either double-click the (rather tiny) square in the bottom of cell A1, or if you want to get cramp, drag it down your 100,000 rows.
* Highlight the whole of column A, right-click Copy (or keyboard Ctrl and C), then Right-Click Paste, Special, Values. This is only necessary to stop you losing the data in the next step
* Delete Column B (highlight column and right-click Delete or keyboard Ctrl and Minus)
* Save
No need for an extra row nor an extra sheet

Things to be cautious of
- If you have any formatting on things like dates with preceding zeroes or ID strings with preceding zeroes then those zeroes will be lost (and it is a real PITA to get them back/not lose them in the first place)
- watch out for spurious extra lines being added at the end (or columns to the right)
- I have sometimes ended up with UTF-8-BOM encoding if I use Save As and don't choose the "right" CSV option

If you are going to utilise Excel then I would suggest using Power Query within Excel to read the existing CSV, retain any formatting and to add the new computed column.
E.g. the steps in Power Query would be
1. Source
2. Promoted Headers
3. Change Type <- delete this, this is what could lose you the formatting on preceding zeroes
4. Check any types you might really want to change for yourself
5. Add your custom column (note use "Blah blah " & B1 rather than "Blah blah" & " " & B1
6. Put the custom column at the front
7. Remove the original Column A

Here's the M-code I used
let
    Source = Csv.Document(File.Contents("C:\Dev\Demo.csv"),[Delimiter=",", Columns=5, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Added Custom" = Table.AddColumn(#"Promoted Headers", "Custom", each "Blah blah " & [#" Column2"]),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"Column1", "Custom", " Column2", " Column3", "date", ""}),
    #"Removed Columns" = Table.RemoveColumns(#"Reordered Columns",{"Column1"})
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Custom", "Column1"}})
in
    #"Renamed Columns"
And the before and after CSVs
Column1, Column2, Column3,date,id
a,a,A,01/01/2022,00001
b,b,B,02/02/2022,00002
c,c,C,03/03/2022,00003
Column1, Column2, Column3,date,id
Blah blah a,a,A,01/01/2022,00001
Blah blah b,b,B,02/02/2022,00002
Blah blah c,c,C,03/03/2022,00003
 
Share this answer
 
Comments
_Asif_ 11-Jan-22 9:24am    
Thanks for extending Solution 2 :) However, solution 2 should not take more than 2 minutes if one is fluent in Excel
CHill60 11-Jan-22 12:52pm    
You are presupposing the OP is fluent in Excel - not necessarily a pre-requisite for a coder. However, it is not just about "speed" - it's about redundant steps - the more steps there are in a process the more chances there are that it can go wrong. Perhaps I should have used the phrase "Far more efficient". For example
- why put a formula in cell B3 and not B2 (I guess you are assuming that there is a header line)
- why is that formula referring to the previous row
- why use & " " & A2 when you can just add a space to the cell.
- Why use that extra row at all when you could have just used = "blah blah " & A2
- Why copy to the entire column when you only need to go to row 100,000
- Why paste into a new sheet when you can just paste over the existing one
Maciej Los 11-Jan-22 10:40am    
5ed!
_Asif_ 12-Jan-22 3:19am    
My 2 Cents,
- why put a formula in cell B3 and not B2 (I guess you are assuming that there is a header line)
[Asif] You are absolutely right. CSV usually contains headers
- why is that formula referring to the previous row
[Asif] OP's "Blah Blah" is just an example, in real-world this constant text would definitely have meaning. Instead of hardcoding this text on each row, why not dedicate a row/column for this hard-coded text and reference it in all subsequent calls. Hence $A$1 is used. Secondly, if OP wants to change this constant text then he needs to change only once, the rest automatically gets updated.
- why use & " " & A2 when you can just add a space to the cell.
[Asif] Please refer to 2nd Point for The underlying reason
- Why use that extra row at all when you could have just used = "blah blah " & A2
[Asif] The reason for that extra row has been explained in the 2nd Point. Hard Coding is bad, bad, and bad
- Why copy to the entire column when you only need to go to row 100,000
[Asif] I meant the same, the entire column means from B3 till B100000
- Why paste into a new sheet when you can just paste over the existing one
[Asif] It's always good to have a rollback strategy. The second Sheet idea is to preserve the original data so that we can go back to the original data if something goes wrong.
CHill60 12-Jan-22 3:47am    
- CSV usually contains headers - not in my world. State your assumptions to be clear
- I wasn't referring to the "Blah Blah" - the formula in cell B3 refers to Cell A2 i.e. the previous row. I.e. a bug.
- Hard coding is bad, I agree. But this is not "hard-coding" - it's a very temporary excel formula
- Rollback strategy is - don't save the file.
So to be clear, I didn't "extend" your solution I offered an alternative and also pointed out a few flaws that the OP needs to be aware of
You can easily do this in Microsoft Excel. Excel can open .CSV files. All you need to do is

* Insert a new Column After A Column
* Insert a new Row Before Row 1
* Copy the Blah Blah text @A1
* Put Formula on B3 = $A$1 & " " & A2
* Copy B3 to entire B column
* Copy entire sheet and paste it as value in a new sheet
* Remove A Column and Row 1

and you have an updated sheet ready :)
 
Share this answer
 
Comments
Maciej Los 11-Jan-22 10:40am    
5ed!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900