Removing Duplicates of Entire Rows

Question

Hi guys,

I have thousands of rows with 106 columns. The first column holds a chromosome and location (chrom_pos), and its values can be duplicated. The remaining columns, numbered 1 to 105, correspond to the sample numbers. If a sample has a given chromosome and location, I want to put a 1 in that cell, so that at the end I can calculate, for each sample, the sum of the cells that contain a 1. The part I am having a tough time programming in Python is how to write this to a file when the same key appears more than once for different samples. How can I add the 1 to the right cell so that I can get the sums later on?

Thanks a lot in advance,

The code I have so far is found below:

Python

 from collections import defaultdict

 dic = defaultdict(list)
 dic[chro_pos].append(sample_num)   # done for each (chro_pos, sample_num) pair read from the input
 with open(file_out + ".txt", 'w') as outpt:
     outpt.write("chrom_pos" + "\t" + "\t".join(samp_num) + "\n")
     for k, val in dic.items():     # k is the chromosome:location; val is the list of sample numbers (1 to 105)
         for v in val:
             # This writes one line per (chrom_pos, sample) pair, so chrom_pos is duplicated and I don't want that.
             # I want one chrom_pos row with a 1 under every sample that has it.
             outpt.write(k + int(v) * "\t" + "1" + "\n")

Posted 11-Jan-16 15:31pm

Member 12258122

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

DeDenker · Accepted Answer · 2016-01-13T00:03:00

Solution 1

Write each val to a new list. For every following value, check whether it already exists in that list, and skip it if it does.
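A minimal sketch of that idea, assuming the input can be read as (chrom_pos, sample_num) pairs and that the goal is one row per chrom_pos with a 1 or 0 under each sample, followed by a row of per-sample sums. The names `build_presence`, `write_matrix`, `N_SAMPLES` and the sample data are made up for illustration; they are not from the question's code.

```python
import io
from collections import defaultdict

N_SAMPLES = 105  # sample columns are numbered 1..105, per the question


def build_presence(pairs):
    """Map each chrom_pos to the samples that contain it, skipping repeats."""
    seen = defaultdict(list)          # chrom_pos -> sample numbers already recorded
    for chrom_pos, sample_num in pairs:
        if sample_num in seen[chrom_pos]:
            continue                  # already in the list, so skip it
        seen[chrom_pos].append(sample_num)
    return seen


def write_matrix(presence, out, n_samples=N_SAMPLES):
    """Write one row per chrom_pos with 1/0 per sample, then a row of column sums."""
    header = ["chrom_pos"] + [str(s) for s in range(1, n_samples + 1)]
    out.write("\t".join(header) + "\n")
    totals = [0] * n_samples
    for chrom_pos, samples in presence.items():
        row = [1 if s in samples else 0 for s in range(1, n_samples + 1)]
        totals = [t + r for t, r in zip(totals, row)]
        out.write(chrom_pos + "\t" + "\t".join(map(str, row)) + "\n")
    out.write("sum\t" + "\t".join(map(str, totals)) + "\n")
    return totals


# Hypothetical input: (chrom_pos, sample_num) pairs, with one repeated pair
pairs = [("chr1:100", 1), ("chr1:100", 3), ("chr2:250", 3), ("chr1:100", 1)]
buf = io.StringIO()                   # stands in for the output file
totals = write_matrix(build_presence(pairs), buf, n_samples=3)
```

Because every chrom_pos is a dictionary key, it can only produce one output row, and the skip check keeps a repeated (chrom_pos, sample) pair from being counted twice. With many samples per key, a set instead of a list would make the membership check faster; the list is used here only to stay close to the wording of the answer.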

Posted 13-Jan-16 0:03am

DeDenker