How I Found Over 175,000 Downloads on Bepress for My Faculty

Posted by aretteen on June 20, 2019

The impact maps in the Bepress Author Dashboards, provided as part of the institutional repository and SelectedWorks system, do not fully capture the data associated with our faculty across the Digital Commons network. Our faculty have works stored in external (non-TAMU-operated) Bepress repositories, and the impact data associated with those works will not appear in the impact maps unless the metadata records for those works contain one of the faculty member’s e-mail addresses. As a result, faculty are reporting and relying on incomplete data when viewing their Dashboards.

While I was helping faculty understand the Bepress Author Dashboards, I noticed that some works in Digital Commons were not appearing in the Author Dashboard’s works list, even though the work had the author’s name in the metadata. Most of the works were accumulating downloads, but I could not find this data represented anywhere in the Dashboard.

Screenshot of one work by Professor Yu with 2,533 downloads associated with it.

Screenshot of Professor Yu’s Bepress Author Dashboard Works list that does not include any of the downloads associated with the object above.

Out of curiosity, I ran a name-based search for every faculty member at my school using the Digital Commons Network advanced search: I chose “Author” as the field, entered the author’s first and last name as a string, and selected “All repositories” so that I could search the whole network.

Screenshot showing the Bepress Digital Commons Advanced Search with settings to search all repositories by author name.

This search process produced some false positives, so I verified that each object was associated with the author I was searching for by cross-referencing CVs before adding the object’s direct URL to a spreadsheet to keep track of my discoveries. Then I checked whether each object was showing up in the dashboard.

Pro tip: the Bepress Author Dashboard works list can contain multiple works with the same title, and it is time-consuming to verify where each one comes from using the interface. I found cross-referencing the download count on the record with the download count in the works list to be the most efficient approach.

When the works list is ordered alphabetically, it is difficult to quickly identify which object is being captured in the dashboard. Instead, use the download count and compare it with the count on the object’s record.
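If you would rather not eyeball the counts, the download-count cross-check can be scripted. The Python sketch below is only an illustration, not a Bepress feature: the function name and the example URLs are hypothetical, and it assumes you have already copied the count from each record page and every count shown in the dashboard works list into your own data.

```python
from collections import Counter


def find_missing_records(record_downloads, dashboard_downloads):
    """Return record URLs whose download count has no unused match in the dashboard.

    record_downloads: dict mapping a record URL to the count shown on its record page.
    dashboard_downloads: list of counts shown in the Author Dashboard works list.
    Each dashboard count can be matched to only one record.
    """
    available = Counter(dashboard_downloads)
    missing = []
    # Visit records from most to least downloaded; ties keep their original order.
    for url, count in sorted(record_downloads.items(), key=lambda item: -item[1]):
        if available[count] > 0:
            available[count] -= 1  # consume one matching dashboard entry
        else:
            missing.append(url)
    return missing


# Hypothetical example data.
records = {
    "https://repo-a.example/article/1": 2533,
    "https://repo-a.example/article/2": 40,
    "https://repo-b.example/article/3": 40,
}
dashboard = [40, 12]
print(find_missing_records(records, dashboard))
# ['https://repo-a.example/article/1', 'https://repo-b.example/article/3']
```

Because matching is by count alone, two records with the same count (the two 40s above) only tell you that one of them is missing, not which one; those cases still need a look in the interface.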

Using this workflow, I was able to quickly identify many works as not being accounted for in the Author Dashboard. I created a Google Sheet to store a direct link to the object’s record page, and I recorded the number of downloads associated with the record. By doing this, I was able to add up how many downloads I was discovering, and in the end it was over 175,000 downloads associated with over 400 objects.
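The running total in the Google Sheet is just a sum over verified rows. Here is a minimal Python version, assuming each row holds the record URL, the matched faculty member, the download count, and a flag for whether the CV check confirmed the match. The class and field names are hypothetical, and repeated URLs are counted once so that an object found through two searches does not inflate the total.

```python
from dataclasses import dataclass


@dataclass
class FoundRecord:
    url: str        # direct link to the object's record page
    faculty: str    # faculty member the record was matched to
    downloads: int  # download count shown on the record page
    verified: bool  # True once the match was confirmed against the faculty member's CV


def tally(rows):
    """Return (number of objects, total downloads), skipping unverified rows and repeated URLs."""
    seen = set()
    total = 0
    for row in rows:
        if not row.verified or row.url in seen:
            continue
        seen.add(row.url)
        total += row.downloads
    return len(seen), total


# Hypothetical rows.
rows = [
    FoundRecord("https://repo-a.example/article/1", "Yu", 2533, True),
    FoundRecord("https://repo-a.example/article/1", "Yu", 2533, True),  # same object found twice
    FoundRecord("https://repo-b.example/article/7", "Yu", 15, False),   # false positive, not verified
    FoundRecord("https://repo-b.example/article/8", "Doe", 300, True),
]
print(tally(rows))  # (2, 2833)
```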

The spreadsheet tabulation gave me a good sense that this project was important enough to devote more time and energy to resolving. Upon consulting with my school’s customer support representative at Bepress, Aaron Doran, we were able to figure out the root cause of the issue: the dashboard system links objects to an author through the e-mail addresses entered in each object’s metadata record.

Screenshot showing author e-mail metadata field used by Bepress Author Dashboards.

Each SelectedWorks Author Profile has a primary e-mail address associated with it, and authors can also add further e-mail addresses to the profile by going into “Account Settings.”

In Account Settings, authors can input additional e-mail addresses to help capture objects with those e-mail addresses in the metadata record.

This is the key linking mechanism that the Author Dashboard uses to pull in data from across the Digital Commons network. Bepress was able to help capture a small amount of data for my school by adding known, additional e-mail addresses for faculty to profiles and merging some duplicate profiles that had been created over the years. However, for us, that barely scratched the surface of the missing data: by far, most of the records I discovered simply contained an author name and no e-mail address at all.
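To make the linking rule concrete, here is a small Python model of it as described above: a record shows up in a dashboard when any author e-mail in its metadata matches the profile's primary address or one of its additional addresses. This is a sketch, not Bepress's code; the classes are hypothetical, and case-insensitive comparison is an assumption.

```python
from dataclasses import dataclass, field


def normalize(email):
    # Assumption: addresses are compared case-insensitively, ignoring surrounding spaces.
    return email.strip().lower()


@dataclass
class Profile:
    primary_email: str
    additional_emails: list = field(default_factory=list)  # added under "Account Settings"

    def all_emails(self):
        return {normalize(e) for e in [self.primary_email, *self.additional_emails]}


@dataclass
class Record:
    url: str
    author_name: str
    author_emails: list = field(default_factory=list)  # often empty in practice


def dashboard_works(profile, records):
    """Records that share at least one author e-mail with the profile.

    The author name is deliberately ignored: in this model a name alone never links a record.
    """
    wanted = profile.all_emails()
    return [r for r in records if wanted & {normalize(e) for e in r.author_emails}]


# Hypothetical profile and records.
profile = Profile("yu@law.example.edu", ["yu@previous.example.edu"])
records = [
    Record("https://repo-a.example/article/1", "Yu", ["Yu@law.example.edu"]),
    Record("https://repo-b.example/article/2", "Yu"),  # name only, no e-mail
    Record("https://repo-c.example/article/3", "Yu", ["yu@previous.example.edu"]),
]
print([r.url for r in dashboard_works(profile, records)])
# ['https://repo-a.example/article/1', 'https://repo-c.example/article/3']

# Adding an address to record 2's metadata is enough to link it.
records[1].author_emails.append("yu@law.example.edu")
print(len(dashboard_works(profile, records)))  # 3
```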

My database-oriented mind figured it would be easy enough to have Bepress insert the e-mail addresses of faculty into the right records on the back-end, but this technical issue turned out to be more complex to resolve than that. Since each repository is independently managed, Bepress was not comfortable with unilaterally altering the metadata records of objects and requested that I reach out to every repository manager individually to ask them to make the changes themselves or give Bepress permission to do so. At this point, I tried my best to convince Bepress to figure out a better way for me and others to approach this problem, mainly because annual reporting deadlines were quickly approaching. The faculty members at my school who rely on the Author Dashboards for reporting purposes were unaware of the impact their work was generating, and in a world where impact data is increasingly important, I wanted to get everyone all of the data associated with them.

While Bepress was very helpful in working with me to resolve this issue, in the end, they would not budge on any centralized solution that would be relatively quick and painless compared to what I had to manage: a mix of Google Sheets, color-coding, and mass e-mailing repository managers with an explanation of the situation and my request for assistance. One useful takeaway that any Bepress repository manager can appreciate is that once the e-mail address is added to the metadata record, the data is piped into the Author Dashboard instantly; you do not need to wait for a queued update to process!

If you want to resolve this issue using the same workflow, take the initial Google Sheet with the direct object links and add “Faculty Member Name,” “Faculty E-mail,” and “Repository Manager E-mail” columns to it; after adding objects by author name, sort the Sheet by URL to identify and group institutions more easily. From there, we had to figure out the best repository contact for each institution, and then I created an e-mail template that explained the context of the issue and listed the affected objects with links, to make the metadata editing process a bit easier for the managers.

Screenshot of what my Google Sheet for tracking the progress on this project looked like.
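Sorting by URL groups institutions because records from the same repository share a host name. The Python sketch below takes that one step further and builds one draft request per repository from the Sheet rows. It assumes the rows are exported as dictionaries keyed by the column names above plus a hypothetical “URL” column for the direct object link, that each repository is identified by its host name and has a single manager contact, and that the message wording is only a placeholder for your own template.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_repository(rows):
    """Group Sheet rows by the host name in each record URL, sorted by host."""
    groups = defaultdict(list)
    for row in rows:
        groups[urlsplit(row["URL"]).hostname].append(row)
    return dict(sorted(groups.items()))


def draft_request(rows):
    """Placeholder request text listing each record and the e-mail address to add to it."""
    lines = [
        "Hello,",
        "",
        "The records below list our faculty as authors but contain no author e-mail,",
        "so their downloads do not reach the authors' Bepress Author Dashboards.",
        "Could you add the listed addresses, or allow Bepress to do so?",
        "",
    ]
    for row in rows:
        lines.append(f"- {row['URL']} ({row['Faculty Member Name']}: {row['Faculty E-mail']})")
    return "\n".join(lines)


def build_requests(rows):
    """Map each repository host to (manager e-mail, message body)."""
    requests = {}
    for host, group in group_by_repository(rows).items():
        manager = group[0]["Repository Manager E-mail"]  # assumes one contact per repository
        requests[host] = (manager, draft_request(group))
    return requests


# Hypothetical Sheet rows.
sheet = [
    {"URL": "https://repo-b.example/article/9", "Faculty Member Name": "Yu",
     "Faculty E-mail": "yu@law.example.edu", "Repository Manager E-mail": "ir@repo-b.example"},
    {"URL": "https://repo-a.example/article/1", "Faculty Member Name": "Yu",
     "Faculty E-mail": "yu@law.example.edu", "Repository Manager E-mail": "ir@repo-a.example"},
    {"URL": "https://repo-b.example/article/4", "Faculty Member Name": "Doe",
     "Faculty E-mail": "doe@law.example.edu", "Repository Manager E-mail": "ir@repo-b.example"},
]
for host, (to, body) in build_requests(sheet).items():
    print(host, to, body.count("\n- "))  # host, contact, number of records listed
```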

As can be expected from a process requiring action by dozens of people, several repository managers have not responded to my request. If Bepress were to implement a solution to this issue that could be managed centrally on the back-end, my faculty would be able to benefit from all of the impact data associated with them. However, I was pleasantly surprised to find that most of the repositories responded quickly, either granting permission or confirming that they had updated the metadata themselves; as of now, using this workflow, over 85% of the downloads have been added to Author Dashboards. Several people replied enthusiastically to my e-mail, which inspired me to document these processes and workflows so that others can recreate them and benefit. I imagine my school is not the only one with faculty whose content in the Digital Commons network is not being properly piped into the Author Dashboards.
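The 85% figure is weighted by downloads, not by number of objects. A minimal way to compute that kind of progress figure from the tracking sheet, assuming a hypothetical “in_dashboard” flag per row that you set once the downloads appear in the dashboard (the numbers below are made up):

```python
def percent_resolved(rows):
    """Share of tracked downloads (not objects) that now appear in Author Dashboards."""
    total = sum(row["downloads"] for row in rows)
    if total == 0:
        return 0.0
    resolved = sum(row["downloads"] for row in rows if row["in_dashboard"])
    return 100 * resolved / total


# Hypothetical rows.
sheet = [
    {"downloads": 2533, "in_dashboard": True},   # repository updated the record
    {"downloads": 300, "in_dashboard": True},    # repository gave Bepress permission
    {"downloads": 167, "in_dashboard": False},   # no response yet
]
print(f"{percent_resolved(sheet):.1f}%")  # 94.4%
```

Weighting by downloads lets a few heavily downloaded records dominate; counting objects instead would give 2 of 3 (about 67%) for the same example rows.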

Until Bepress comes up with a different approach, I think repository managers uploading content should try their best to include an e-mail address within the metadata. At times this may not be possible, and it will surely increase the time it takes to ingest objects, but it seems to me the loss of the impact data associated with these objects is a big enough concern to justify taking the time to add e-mail addresses. Otherwise, any impact stemming from an object is most likely not going to be reflected in its author’s impact and growth narrative, particularly for faculty who rely on the Author Dashboard to accurately represent their impact across the Digital Commons network.

If you’d like to indicate whether or not you’d be supportive of Bepress implementing a more efficient solution, or if you have any comments about this issue, please feel free to fill out this quick survey: https://tamu.qualtrics.com/jfe/form/SV_b8bmV4P7CR0ZMzz. You can also reach out to me if you have any questions about the workflow, or if you’d like to see some Google Sheet and e-mail templates to help you get started!

2 Comments on “How I Found Over 175,000 Downloads on Bepress for My Faculty”

Wilhelmina Randtke says:

June 26, 2019 at 9:12 am

The bigger problem here is using email address to disambiguate. Email address is unique to the person, and it’s a great way to distinguish and match records within a single institution. But when a faculty member changes institutions, the email address changes. Over time, the email address changes. Trying to disambiguate across digital libraries on a field that regularly turns over is a recipe for insanity and never-ending work.

Using email is the problem. Doesn’t matter what info is stored in the email field.

BePress should probably be using ORCID to match across institutions. Probably ORCID, because it allows self-registration, whereas something like VIAF does not. ORCID is more inclusive, with just about no procedural hurdles. (And, btw, ORCID and VIAF and LCNAF can all talk to one another, so if you provide for one in your metadata, then you can match across the others.) And, yes, I know ORCID is not widely adopted in the U.S., and that when it takes off, it’s extremely likely to take off in science first and be tied to research funding requirements.

I’m not sure what happened with BePress and ORCID. It was on their 2017 roadmap (https://www.bepress.com/tag/orcid/). It was also in our plans here in the FALSC Florida consortium for state university and state college libraries back in 2017. What we have seen is that ORCID institutional accounts allowing the push/pull API have been managed through the campus grants offices, not the campus libraries, so for the digital library platforms, ORCID fizzled into just metadata. We have it in our metadata profiles and have posted instructions on how to store it consistently in records. Most community buzz was about the push/pull API, which is likely off the table because at this time it appears to be the purview of the grants office, not the library. So, we implemented ORCID, but not in a press-release kind of way. Maybe BePress is in a similar situation, where it’s integrated into metadata but there is no push/pull API, so there isn’t a press release.

And here, in your statistics pulls, having the metadata (having a way to quickly disambiguate names) would be really useful.
Even without any push/pull API, ORCID would be so super useful.

I think it’s totally nuts to ask BePress to go in and “clean up” records across institutions. That’s the institution’s metadata; that’s not BePress’s metadata. Doing centralized metadata cleanup isn’t just technical. (Here, you downplay the technical, because you have a handful of faculty. You could manually do your faculty, and it’s within the realm of possibility to search each one. BePress has all the faculty, all over the world. The manual review and searching to “clean up” one faculty member at a time is infinite when there are infinite faculty. I do centralized metadata quality control from time to time, just for Florida. Trust me, it’s a bitch.) The communication takes more time than the technical. Were BePress to do what you are proposing, and alter email addresses on faculty members, BePress would HAVE TO contact each institution before altering their metadata. If it’s the descriptive metadata and not a field used just by BePress software, then the institution owns it. When we do centralized cleanup projects at the consortial level across digital libraries, we always send a spreadsheet of changes, clearly state what the changes are, and offer to meet up by phone or screensharing for Q&A. Then when we get approval, we make the change to the records which have been approved. For. Each. Institution. For. Each. Altered. Record. The communication takes longer than the analysis, or the tech to fix, and if it’s done wrong there can be forceful and long-lasting blowback. If BePress changes descriptive metadata, that would cause freak-out and paranoia and it would feel very Orwellian. The breach of trust would be a bigger issue than any kind of metadata quality issue. If BePress made any mistakes in the cleanup, and introduced errors, it would be a huge PR thing brought up at conference panels for years. If someone’s local metadata profile states that the correct author email is the affiliation at the time of publication, then BePress can’t alter that.

You have said, “they would not budge on any centralized solution that would be relatively quick and painless compared to what I had to manage: a mix of Google Sheets, color-coding, and mass e-mailing repository managers with an explanation of the situation and my request for assistance.” What you have described is pretty much exactly what you are asking BePress to do. It’s not less labor, just because it happens somewhere where you can’t see it. It’s not like little elves do it for BePress either.

Sorry to sound mean, but imagine if someone looked at your metadata, threw up their hands, and went to town “fixing problems”. It would not feel good. Only the very disengaged are cool with something like that.

Anyway, even if you could magic up a world where all author emails in all BePress sites were updated to current institutional affiliation email addresses, you’d have to keep on maintaining, and maintaining, and cleaning records forever.

It’s best to put efforts into looking at alternative ways to disambiguate author names. Email address is not the way.

My personal feeling is that the best solution (and also the least effort, because it is much easier to maintain than emails) to this issue would be:
1. Get law faculty at your institution to register for ORCID and to include their ORCID iD in the author affiliation info on articles going forward.
2. Lobby BePress to match on fields other than email address. Here, we allow matching on email address, ORCID, VIAF, or local institutional identifier. (Technical info on how we are matching is here https://wiki.duraspace.org/display/ISLANDORA/Entities+Solution+Pack . This is in the Islandora platform, not the BePress platform.) Setting up matching at all was the biggest effort. Once it was set up, making it configurable was minimal extra work.
3. Attempt to organize all the law schools to include the ORCID iD in records. Attempt to organize all the law journals to encourage ORCID iDs in institutional affiliations.
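A rough illustration of the matching described in the comment above: a profile and a record are linked if they share a value in any identifier field, so a record carrying an old e-mail address can still match through an ORCID iD. This Python sketch is not the Islandora Entities Solution Pack or Bepress code; the field names mirror the four identifiers listed, but the dict-of-lists layout is an assumption, and the ORCID iD is only a sample value.

```python
IDENTIFIER_FIELDS = ("email", "orcid", "viaf", "local_id")


def identifiers(entity):
    """Collect (field, normalized value) pairs from a profile or record dict."""
    found = set()
    for name in IDENTIFIER_FIELDS:
        for value in entity.get(name, []):
            found.add((name, value.strip().lower()))
    return found


def same_author(profile, record):
    """True if the two share any identifier value within the same field."""
    return bool(identifiers(profile) & identifiers(record))


# Hypothetical profile and records.
profile = {"email": ["yu@law.example.edu"], "orcid": ["0000-0002-1825-0097"]}
old_record = {"email": ["yu@previous.example.edu"], "orcid": ["0000-0002-1825-0097"]}
email_only = {"email": ["yu@previous.example.edu"]}

print(same_author(profile, old_record))  # True: the ORCID iD still matches
print(same_author(profile, email_only))  # False: the old address is not on the profile
```

Keying each value by its field name means an e-mail string can never accidentally match a VIAF or local identifier with the same text.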
- aretteen says:
  
  June 27, 2019 at 11:35 am
  
  Hi Wilhelmina:
  
  Thank you for your comments. You’ve brought up an important aspect of the greater issue as it relates to best practices in author disambiguation and potential solutions forward. However, I’d like to offer clarifications on your perspective.
  
  First, the main purpose of my blog post was to inform repository managers how the system currently works, point out the flaw that is linking by e-mail addresses in the metadata, and offer anyone interested in capturing available downloads a workflow that would accomplish that goal in the near future without having to wait for action from Bepress or for a coalition to form around convincing Bepress to implement an entirely separate technical solution. My blog post is not advocating for e-mail addresses to be the industry standard for author disambiguation, but rather offering a work-around for people who want to act now while a more elegant solution is developed over time.
  
  Second, my blog post is not asking Bepress to take responsibility for going in and “cleaning up” any metadata at all, let alone without the notice or participation of the institutions affected. My concept of a centralized solution is more in line with a well-thought-out, time-efficient procedure, supported by Bepress, that reduces the effort needed to fix this issue compared to what I had to do. I disagree that Bepress would have to replicate my workflow. Bepress has access to its databases in a way repository managers at institutions do not, and batch inserts and edits are certainly plausible. Your comment significantly downplays the effort it took to implement this work-around; it took me about four months to go through the entire process for my entire faculty, and for that whole time I had to keep track of a lot of moving parts. Moreover, the centralized process for making changes would be iterative: repository managers would be able to contact Bepress with the affected works and receive support so that the impact data can be piped into the Author Dashboard.
  
  In discussion with some others on this issue, a particularly elegant idea I heard was to add some kind of button to each page that lets managers flag the record for the institution to address. Additionally, it seemed practically easier to me to convince Bepress to implement a small patch to fix this issue rather than a complete redesign of functionality, at least in the short term. That said, I do agree with you in that there are better frameworks outside of an e-mail address that could solve this problem (like ORCID integration).
  
  Third, I think you have good points about institutional metadata and the impropriety of a third-party making changes to that metadata without the knowledge or participation of the institution. However, in this case at least, I think it is worthwhile to consider that while the e-mail address metadata may be descriptive, it is also a field used by Bepress software to make links between profiles and records. While an institution can view the data for its repositories regardless of the presence of an e-mail (so internal assessment is not impacted by this flaw), it is still the case that the author(s) of the work will not have that data piped into his or her dashboard if the repository does not include an e-mail address associated with the author(s). Because of this, I would think most of us would want Bepress to help populate this field.
  
  Fourth, while it is certainly not as robust as ORCID, Bepress SelectedWorks author profiles do have an area where authors can add old (even inactive) and current e-mail addresses that are associated with them. When added to this section, it is my understanding that any record containing that e-mail address will be piped into the author dashboard. I point this out in my blog post with a screenshot, but perhaps I did not explain it very clearly. I view this functionality as a poor analog to ORCID and other author disambiguation approaches, but it is at least available to us at the moment to take advantage of. The fundamental issue here, though, is that the majority of records I have reviewed simply contain no e-mail address at all. It is true that until a more elegant solution is implemented like you described, this will be a persistent issue that will require attention every time a record is created without an e-mail address in the metadata, which is why I urge at the end for repository managers to adopt the norm of adding e-mail addresses to Bepress record metadata. With attentive repository managers capturing as much data for faculty as possible in conjunction with a better, centralized solution from Bepress, I do believe the issue is manageable, even if it is not the ideal approach.
  
  On the ORCID front, I think this is actually taking hold in the US quite successfully, but there are still some significant barriers for legal faculty that I believe will make buy-in difficult. My primary idea here is that legal scholarship will need to introduce DOIs in the publication process so that ORCID profiles for law faculty can be easily populated by importing, versus having to manually enter the metadata for each scholarly object. I have been talking with Sheila Rabun from ORCID and several stakeholders, such as CrossRef, Altmetric, and Bepress, to try to build up a coalition of interested parties to push for DOI adoption in legal scholarship. Actually, I have a forthcoming article that I hope will help articulate these barriers and provide a framework that law journal publications can pick up and use to get started with DOI creation and assignment (I have a short abstract posted here about it, with more to come shortly: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3168863). Perhaps if ORCID and Bepress had a relationship that allowed easy import using Bepress Digital Commons identifiers, that would also help with the buy-in issue. Otherwise, it is my experience that faculty are experiencing scholarly profile overload and do not want to spend any more time manually populating and managing digital identities; and library budgets are continually stretched so thin that putting the burden on detail-oriented metadata librarians is not always a viable option.
  
  I’d also like to give some space to Bepress and acknowledge that (I believe) they built this system before persistent digital identifiers like ORCID became widely known and adopted, but also that Bepress has been extremely open and helpful in my interactions with them on this issue. Technical criticism is not always easy to receive, so I want to underscore how Bepress has been positive and receptive to this feedback and has only expressed a strong interest in making corrections to better serve our community. This issue has certainly generated a lot of interest from others, so I am optimistic that Bepress will implement a solution that works for everyone.
  
  Thanks again for your comments, Wilhelmina; if you don’t mind, I may reach out to you in the near future to talk about some of the technical details of author disambiguation so I can learn from your experiences and opinions on best approaches.
  
  Best regards,
  Aaron Retteen